SP NOTES
Synergy Institute of Engineering & Technology, Dhenkanal
Biju Patnaik University of Technology, Orissa
Nirjharinee Parida
Lecture notes
On
SYSTEM PROGRAMMING
Subject code: PCCS4202
Semester: 4th
Branch: Computer Science & Engineering
Course title: System Programming
Lectures: 32 hrs per semester
Lectures planned: 47 classes per semester
Course aim
To have an understanding of the foundations of the design of assemblers, loaders, linkers, macro processors, and compilers.
Course description
System software, machine structure, machine language, assembly language program, assembler, macro processor, loader, linker, programming language, formal systems, compiler.
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 1
CLASS-WISE NOTES ON SYSTEM PROGRAMMING
Lecture_no1
PREREQUISITES
Knowledge of one high-level procedural language, of assembly language, and of data structures and computer organization.
In order to understand system software, basic knowledge of the notion of a system, number representation, and coding schemes is essential. In addition, some awareness of the components of a program, programming techniques, data and file structures, and writing and analyzing algorithms is also necessary for a formal understanding of system programming. Moreover, the ability to apply logic and common sense is considered important for the development of system software.
AIM
To have an understanding of the foundations of the design of assemblers, loaders, linkers, macro processors, and compilers.
OBJECTIVES
• To understand the relationship between system software and machine architecture
• To know the design and implementation of assemblers
• To know the design and implementation of linkers and loaders
• To have an understanding of macro processors
• To know the function of a compiler
EXPECTED OUTCOME
This course provides the knowledge needed to design various system programs.
MODULE-1
Lecture_no2 Introduction to software, its types, machine structure
• Definition of software, its types, examples, its applications
• What is programming?
• What is system programming?
Ans: System programming is a branch of computer science and engineering that deals with the structure of the machine. The structure of a machine may be composed of memory, I/O, a processor or CPU (Central Processing Unit), a card reader, a printer, etc. In other words, system programming draws on knowledge of a high-level procedural language, data structures, and computer organization.
• Differentiate between system software and application software
• Define computer architecture
• Define computer organization
• General hardware organization of a computer system
Lecture_no3 Evolution of components of system programming
Assembler
An assembler translates assembly language into machine code. Assembly language consists of mnemonics for machine opcodes, so assemblers perform a 1:1 translation from mnemonic to a direct instruction. For example:
LDA 4 converts to 0001001000100100
Conversely, one instruction in a high-level language will translate to one or more instructions at machine level.
Advantages of using an Assembler
• Very fast in translating assembly language to machine code, because of the 1-to-1 relationship
• Assembly code is often very efficient (and therefore fast) because it is a low-level language
• Assembly code is fairly easy to understand due to the use of English-like mnemonics
Disadvantages of using an Assembler
• Assembly language is written for a certain instruction set and/or processor
• Assembly tends to be optimised for the hardware it is designed for, meaning it is often incompatible with different hardware
• Lots of assembly code is needed to do relatively simple tasks, and complex programs require lots of programming time
Compiler
A compiler is a computer program that translates code written in a high-level language into a lower-level language (object/machine code). The most common reason for translating source code is to create an executable program (converting from a high-level language into machine language).
Advantages of using a compiler
• Source code is not included, therefore compiled code is more secure than interpreted code
• Tends to produce faster code than interpreting source code
• Produces an executable file, and therefore the program can be run without need of the source code
Disadvantages of using a compiler
• Object code needs to be produced before a final executable file; this can be a slow process
• The source code must be 100% correct for the executable file to be produced
Interpreter
An interpreter program executes other programs directly, running through program code and executing it line by line. As it analyses every line, an interpreter is slower than running compiled code, but it can take less time to interpret program code than to compile and then run it; this is very useful when prototyping and testing code. Interpreters are written for multiple platforms, which means code written once can be run immediately on different systems without having to recompile for each. Examples of this include Flash-based web programs that will run on your PC, Mac, games console, and mobile phone.
Advantages of using an Interpreter
• Easier to debug (check errors) than a compiler
• Easier to create multi-platform code, as each different platform would have an interpreter to run the same code
• Useful for prototyping software and testing basic program logic
Disadvantages of using an Interpreter
• Source code is required for the program to be executed, and this source code can be read, making it insecure
• Interpreters are generally slower than compiled programs due to the per-line translation method
1) Name the three types of program translator and describe what level of language each translates.
Answer:
Assembler - Low level (assembly); Compiler - High level; Interpreter - High level
2) Why might you want to use a compiler instead of an interpreter?
A compiler makes faster, more secure code.
3) Why might you want to use an interpreter instead of a compiler?
Interpreters allow code to run on multiple platforms, and they allow you to debug and test code without having to re-compile the entire source code.
Loader
A loader is a system program that places the object program into main memory and prepares it for execution.
• Basic functions of a loader
– Allocation
– Linking
– Relocation
– Loading
Types of Loader
• Compile-and-go Loader
• Relocating Loader
• Direct Linking Loader
• Absolute Loader
• General Loader
• Dynamic Loader
Macro & Macro processor
Macro – A macro is a single-line abbreviation for a group of instructions.
MACRO            -------- Start of definition
INCR             -------- Macro name
A 1,DATA
A 2,DATA         Sequence of instructions to be abbreviated
A 3,DATA
MEND             -------- End of definition
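To make the idea of a macro as an abbreviation concrete, here is a minimal Python sketch of a macro processor for parameterless macros only. The function name `expand_macros`, the list-of-strings input format, and the `DATA DC F'5'` line are assumptions made for illustration; they are not part of these notes.

```python
def expand_macros(source_lines):
    """Expand parameterless macros delimited by MACRO ... MEND (illustrative sketch)."""
    macros = {}   # macro name -> list of body lines
    output = []
    lines = iter(source_lines)
    for line in lines:
        stripped = line.strip()
        if stripped == "MACRO":
            name = next(lines).strip()      # the line after MACRO holds the macro name
            body = []
            for body_line in lines:         # collect body lines until MEND
                if body_line.strip() == "MEND":
                    break
                body.append(body_line)
            macros[name] = body
        elif stripped in macros:
            output.extend(macros[stripped])  # a macro call is replaced by its body
        else:
            output.append(line)              # ordinary statement, copied unchanged
    return output


program = [
    "MACRO",
    "INCR",
    "A 1,DATA",
    "A 2,DATA",
    "A 3,DATA",
    "MEND",
    "INCR",            # first call
    "INCR",            # second call
    "DATA DC F'5'",    # hypothetical data definition
]
expanded = expand_macros(program)
```

Each call to INCR in the input is replaced by the three instructions of the definition, and the definition itself does not appear in the output.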
Linking and Linker
• Linking – The process of merging many object modules to form a single object program is called linking.
• Linker – The linker is the software program which binds many object modules to make a single object program.
Lecture_no4 Foundation of system programming
1) Formal System
• A formal system is an uninterpreted calculus.
It consists of
– Alphabets
– A set of words called axioms
– A finite set of relations called rules of inference or production rules
– Ex: Boolean algebra
2) Need of System Software
The basic need of system software is to achieve the following goals:
• To achieve efficient performance of the system
• To make effective execution of general user programs
• To make effective utilization of human resources
• To make available new, better facilities
Need of system programming
Lecture_no5 Evolution of operating system, General machine structure
OS, Job, Multiprogramming, Multitasking
Fragmentation, Paging, Scheduling
Facilities of computer OS
Operating System
• It is the collection of system programs which acts as an interface between the user and the computer hardware.
• The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient manner.
Functions of Operating System
• File handling and management
• Storage management (memory management)
• Device scheduling and management
• CPU scheduling
• Information management
• Process control (management)
• Error handling
• Protecting itself from users & protecting users from other users
What is MFT?
What is MVT?
What is fragmentation?
What is paging? Explain its various types.
What is scheduling?
What is meant by the term time sharing?
What is virtual memory?
What are simple batched systems? Explain.
Describe the evolution of operating systems.
Lecture_no6 Description of Von Neumann machine's components
General machine structure, Von Neumann's components, Block diagram and functions of each component of Von Neumann's machine
1. The von Neumann Computer Model
• Von Neumann computer systems contain three main building blocks:
-> the central processing unit (CPU)
-> memory
-> input/output devices (I/O)
• These three components are connected together using the system bus.
• The most prominent items within the CPU are the registers; they can be manipulated directly by a computer program. The following block diagram shows the major relationships between CPU components.
(Design of the Von Neumann architecture)
2. Components of the Von Neumann Model
-> Memory: Storage of information (data/program)
-> Processing Unit: Computation/processing of information
-> Input: Means of getting information into the computer, e.g. keyboard, mouse
-> Output: Means of getting information out of the computer, e.g. printer, monitor
-> Control Unit: Makes sure that all the other parts perform their tasks correctly and at the correct time
The von Neumann Machine
(connection between CPU & main memory)
MBR (Memory Buffer Register) = MBR contains a word to be stored in memory, or is used to receive a word from memory
MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR
IR (Instruction Register) = IR contains the 8-bit op-code of the instruction being executed
IBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instruction from a word in memory
PC (Program Counter) = PC contains the address of the next instruction pair to be fetched from memory
AC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily operands and results of ALU operations
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit.
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete.
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors, or devices for human interaction.
Instruction set: The set of codes that describe all the legal operations a particular computer can execute.
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform.
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture.
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location.
Read-only memory (ROM): A memory that is usually very similar to RAM, except it may only be accessed, not updated.
Store: The process of writing a value to a memory cell.
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system, Information flow in microflowchart for ADD/SUB/MULT
Memory operations
FETCH (addr)
1. Put addr into MAR
2. Tell memory unit to "load"
3. Memory copies data into MDR
STORE (addr, new-value)
1. Put addr into MAR
2. Put new-value into MDR
3. Tell memory unit to "store"
4. Memory stores data from MDR into memory cell
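The FETCH and STORE steps above can be mimicked in a few lines of Python. This is only a sketch: the class name `MemoryUnit` and the list-based memory are assumptions made for illustration, and a real memory unit works through bus signals rather than method calls.

```python
class MemoryUnit:
    """Toy memory with a memory address register (MAR) and memory data register (MDR)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0
        self.mdr = 0

    def fetch(self, addr):
        self.mar = addr                    # 1. put addr into MAR
        self.mdr = self.cells[self.mar]    # 2-3. "load": memory copies the cell into MDR
        return self.mdr

    def store(self, addr, new_value):
        self.mar = addr                    # 1. put addr into MAR
        self.mdr = new_value               # 2. put new-value into MDR
        self.cells[self.mar] = self.mdr    # 3-4. "store": memory writes MDR into the cell


mem = MemoryUnit(16)
mem.store(4, 99)
value = mem.fetch(4)
```

After the two calls, `value` holds the number that was stored, and MAR and MDR still hold the last address and data word that crossed the memory interface.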
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R1..R7) to evaluate the following expression. Ignore the remainder part of the division process.
( (a + b) / (a + d) ) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+ --------> ADD Ri, Rj ( Rj <- Ri + Rj )
* --------> MULT Ri, Rj ( Rj <- Ri * Rj )
/ --------> DIV Ri, Rj ( Rj <- Rj / Ri )
a = M[A] ------> content of memory location at address A; same for B, D, E
move A, D0      D0 = a
move B, D4      D4 = b
add D0, D4      D4 = a + b
move D, D1      D1 = d
add D0, D1      D1 = a + d
div D1, D4      D4 = (a + b) / (a + d)
move E, D2      D2 = e
mult D0, D2     D2 = (a * e)
mult D4, D2     D2 = ( (a + b) / (a + d) ) * (a * e)
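To check the register program by hand, it helps to run the same nine steps in a small simulation. The Python sketch below assumes non-negative integer operands and models DIV as integer division with the remainder dropped; the function name `evaluate` and the dictionary-based registers are illustrative only.

```python
def evaluate(a, b, d, e):
    """Run the nine-instruction program on a toy register file (illustrative sketch)."""
    memory = {"A": a, "B": b, "D": d, "E": e}
    reg = {}

    def move(src, dst):      # dst <- M[src]
        reg[dst] = memory[src]

    def add(ri, rj):         # Rj <- Ri + Rj
        reg[rj] = reg[ri] + reg[rj]

    def mult(ri, rj):        # Rj <- Ri * Rj
        reg[rj] = reg[ri] * reg[rj]

    def div(ri, rj):         # Rj <- Rj / Ri, remainder ignored
        reg[rj] = reg[rj] // reg[ri]

    move("A", "D0")          # D0 = a
    move("B", "D4")          # D4 = b
    add("D0", "D4")          # D4 = a + b
    move("D", "D1")          # D1 = d
    add("D0", "D1")          # D1 = a + d
    div("D1", "D4")          # D4 = (a + b) / (a + d)
    move("E", "D2")          # D2 = e
    mult("D0", "D2")         # D2 = a * e
    mult("D4", "D2")         # D2 = ((a + b) / (a + d)) * (a * e)
    return reg["D2"]


exact = evaluate(2, 10, 1, 3)      # (12 / 3) * 6
truncated = evaluate(2, 9, 1, 3)   # (11 / 3 -> 3) * 6
```

The second call shows the effect of ignoring the remainder: 11 / 3 is taken as 3 before the final multiplication.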
Problem 2)
a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0                move contents of memory location X to register D0
ii) move D0, X               move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1)    move contents from memory to memory
iv) bgtw label               branch if greater than to location "label"
Answer 2)
For each of the above, for part b), draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR <-- MBR
| decode |
PC <-- PC + 2       PC points to source address
MAR <-- PC
MBR <-- M[MAR]      address of X in MBR
MAR <-- MBR         move X to MAR
MBR <-- M[MAR]      read contents of location X
D0 <-- MBR
PC <-- PC + 2       PC points to next instruction
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR <-- MBR
| decode |
PC <-- PC + 2       PC points to destination address
MAR <-- PC
MBR <-- M[MAR]      read destination address X
MAR <-- MBR         MAR has X now
MBR <-- D0          source data in MBR
M[MAR] <-- MBR      data in effective address X (which is in MAR)
PC <-- PC + 2       PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a)
Source effective address = contents of a3 + $16
temp <-- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <-- temp
Micro-steps are
MAR <-- PC
MBR <-- M[MAR]          fetch
IR <-- MBR
| decode |
PC <-- PC + 2           PC points to source displacement
MAR <-- PC
MBR <-- M[MAR]          source displacement loaded
MAR <-- [A3 + MBR]      source effective address calculated
MBR <-- M[MAR]          source data in MBR
WR <-- MBR              saved temporarily since MBR is to be reused
PC <-- PC + 2           PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR]          destination displacement loaded
MAR <-- [A1 + MBR]      destination effective address calculated
MBR <-- WR              source data back to MBR
M[MAR] <-- MBR          source data moved to destination
PC <-- PC + 2           PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a)
MAR <-- PC
MBR <-- M[MAR]          fetch
IR <-- MBR
| interpret / decode |
If the branch condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR]          get displacement
PC <-- PC + MBR         execute
If the branch condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit, Register, Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes), and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword, or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. In these notes, the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 general-purpose registers in the 370 system. Register types include general-purpose registers, floating-point registers, control & status registers, data registers, and address registers.
Four fields of information in an assembly statement:
1) Name field – optional; must begin with a character from the alphabet and can consist of up to 8 characters
2) Opcode – mandatory field; must be followed by a blank
3) Operands – usually represent data that could be in main storage, in registers, or part of the instruction itself
4) Comments – used to clarify instructions
(iii) Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E, and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are stored only as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended-precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent, biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
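The packed decimal layout described in this section (digit nibbles followed by one sign nibble) can be checked with a small decoder. The following Python sketch is illustrative only; the function name `unpack_decimal` is an assumption, and unlike the PACK and UNPK instructions it does reject invalid nibbles.

```python
def unpack_decimal(data: bytes) -> int:
    """Decode a packed decimal field: two nibbles per byte, last nibble is the sign."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)      # high-order nibble (bits 0-3 in IBM numbering)
        nibbles.append(byte & 0x0F)    # low-order nibble (bits 4-7)
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("digit nibble must be 0-9")
    if sign <= 9:
        raise ValueError("sign nibble must be A-F")
    value = 0
    for d in digits:
        value = value * 10 + d
    # B and D mean negative; A, C, E and F mean positive
    return -value if sign in (0xB, 0xD) else value


positive = unpack_decimal(bytes([0x12, 0x3C]))   # digits 1 2 3, sign C
negative = unpack_decimal(bytes([0x12, 0x3D]))   # digits 1 2 3, sign D
```

Because the sign takes the final nibble, a field of n bytes always carries 2n - 1 digits, which is why the digit count is odd.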
Lecture_no9 Various instruction formats of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four, or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R indicates that the data resides in a register, so in this type both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second is a storage location. If there are 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): X also refers to a storage address; however, the address is specified in three pieces (index, base, and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I indicates that the operand consists of immediate data (meaning that the data is in the instruction itself). In this type the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S indicates that the data is in storage. Here both operands are in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
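As a rough illustration of the formats above, the Python sketch below extracts the fields of an RX instruction and derives the instruction length from the opcode. The length rule used here (the two high-order opcode bits select 2, 4, 4, or 6 bytes) comes from the System/360 architecture rather than from these notes, and the function names and the sample bytes are assumptions chosen for the example.

```python
def instruction_length(opcode: int) -> int:
    """Length in bytes, selected by the two high-order bits of the opcode."""
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[opcode >> 6]


def decode_rx(instr: bytes) -> dict:
    """Split a four-byte RX instruction into its fields."""
    if len(instr) != 4:
        raise ValueError("an RX instruction is four bytes long")
    return {
        "opcode": instr[0],
        "r1": instr[1] >> 4,                       # byte 1, bits 0-3: register number
        "x2": instr[1] & 0x0F,                     # byte 1, bits 4-7: index register
        "b2": instr[2] >> 4,                       # bytes 2-3: 4-bit base register ...
        "d2": ((instr[2] & 0x0F) << 8) | instr[3]  # ... and 12-bit displacement
    }


fields = decode_rx(bytes.fromhex("5810F00C"))      # sample bytes, chosen for illustration
length = instruction_length(fields["opcode"])
```

The 12-bit displacement explains why a base register is needed: on its own it can reach only 4096 bytes, so the effective address is formed as base + index + displacement.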
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S D
4. Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D) 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234 S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234] 10
7. Relative mode: Operand address is the content of PC + OpAddr. Example: ADD D $S
8. Indexed mode: Operand address is the content of an index register + OpAddr. Example: ADD D 500(S)
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
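Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
      MOVE R1 x
      MOVE R2 y
      MOVE R3 0
      SUB R1 R2
      BN Less
      BZ Zero
      MOVE R3 +1
      BR Done
Less  MOVE R3 -1
      BR Done
Zero  MOVE R3 0
Done  HLT
Each case must jump past the others; otherwise execution falls through from MOVE R3 +1 into Less and Zero, and R3 always ends up 0. BR is assumed here to be the machine's unconditional branch, and Done is a label added for the exit.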
Lecture_no11 Address modification using instruction as data ampusing index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language, machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language        Equivalent assembly
1000000100100101        LOAD R1 5
1000000101000101        LOAD R2 5
1010000100000110        ADD R0 R1 R2
1000001000000110        SAVE R0 6
1111111111111111        HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]         100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0         100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1             100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2        1010000100 00 01 10
SUBTRACT contents of 2 registers, store result in third
1010001000 RR RR RR
ex: R0 = R1 - R2        1010001000 00 01 10
Halt the program
1111111111111111
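The bit layouts listed above are enough to write a tiny interpreter for the simulated computer. The Python sketch below is one possible reading of those layouts; the function name `run`, the use of bit strings instead of integers, and the initial value placed in memory location 5 are assumptions made for illustration.

```python
def run(program, memory):
    """Execute 16-bit instruction strings on a machine with 4 registers."""
    reg = [0, 0, 0, 0]
    for word in program:
        if word == "1" * 16:                           # HALT
            break
        if word.startswith("100000010"):               # LOAD  RR MMMMM
            reg[int(word[9:11], 2)] = memory[int(word[11:], 2)]
        elif word.startswith("100000100"):             # STORE RR MMMMM
            memory[int(word[11:], 2)] = reg[int(word[9:11], 2)]
        elif word.startswith("100100010000"):          # MOVE  RR RR (destination first)
            reg[int(word[12:14], 2)] = reg[int(word[14:16], 2)]
        elif word.startswith("1010000100"):            # ADD   RR RR RR
            dst, a, b = (int(word[i:i + 2], 2) for i in (10, 12, 14))
            reg[dst] = reg[a] + reg[b]
        elif word.startswith("1010001000"):            # SUBTRACT RR RR RR
            dst, a, b = (int(word[i:i + 2], 2) for i in (10, 12, 14))
            reg[dst] = reg[a] - reg[b]
        else:
            raise ValueError("unknown instruction " + word)
    return reg


memory = [0] * 32
memory[5] = 21                      # sample input value
program = [
    "1000000100100101",             # LOAD R1 5
    "1000000101000101",             # LOAD R2 5
    "1010000100000110",             # ADD R0 R1 R2
    "1000001000000110",             # store R0 into Mem[6]
    "1111111111111111",             # HALT
]
registers = run(program, memory)
```

Running the five-instruction program doubles the value in memory location 5 and leaves the result in location 6. The sketch executes instructions strictly in order, which is sufficient here because the instruction set shown has no branches.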
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example, the assembler would always replace START with 10 (base 16). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000.
But it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device, specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
     LABEL    OPCODE   OPERANDS     COMMENTS
01   ;
02   ; Program to multiply an integer by the constant 6.
03   ; Before execution, an integer must be stored in NUMBER.
04   ;
05            .ORIG    x3050
06            LEA      R2,NUMBER
07            LDW      R2,R2,#0
08            LEA      R1,SIX
09            LDW      R1,R1,#0
0A            AND      R3,R3,#0     ; Clear R3. It will
0B                                  ; contain the product.
0C   ; The inner loop
0D   ;
0E   AGAIN    ADD      R3,R3,R2
0F            ADD      R1,R1,#-1    ; R1 keeps track of
10            BRp      AGAIN        ; the iterations
11   ;
12            HALT
13   ;
14   NUMBER   .BLKW    1
15   SIX      .FILL    x0006
16   ;
17            .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
AGAIN ADD R3,R3,R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
LEA R2,NUMBER
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program.
There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1,R1,#-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says: start with location x3050. As a result, the LEA R2,NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e., a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get by speaking English, but you will never feel at home, and life remains complicated. You can get by like this, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
Translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to symbolic labels used by the programmer.
Disassembler
Translates machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, externally defined symbols (library programs) must be indicated to the loader; the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
• Convert mnemonic operation codes to machine language equivalents
• Convert symbolic operands to machine addresses (pass 1)
• Build machine instructions in the proper format
• Convert data constants to internal (machine) representations
• Provide error checking
• Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START - Specify name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
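The step "convert symbolic operands to machine addresses (pass 1)" amounts to walking the source with a location counter and recording the address of every label. The Python sketch below shows one way this could look for the directives listed above. It assumes, purely for illustration, a 3-byte word and 3-byte machine instructions, input already split into (label, opcode, operand) tuples, and hypothetical mnemonics LDA, ADD, and STA; none of these details come from the notes.

```python
WORD_SIZE = 3   # assumed word and instruction size in bytes


def pass_one(statements):
    """Build the symbol table: label -> address."""
    symtab = {}
    locctr = 0
    for label, opcode, operand in statements:
        if opcode == "START":
            locctr = int(operand, 16)        # starting address, given in hexadecimal
            if label:
                symtab[label] = locctr       # program name
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError("duplicate symbol " + label)
            symtab[label] = locctr
        if opcode == "WORD":
            locctr += WORD_SIZE
        elif opcode == "RESW":
            locctr += WORD_SIZE * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":               # C'...' is characters, X'...' is hex digits
            kind, text = operand[0], operand[2:-1]
            locctr += len(text) if kind == "C" else (len(text) + 1) // 2
        else:
            locctr += WORD_SIZE              # any machine instruction
    return symtab


source = [
    ("COPY",   "START", "1000"),
    ("FIRST",  "LDA",   "FIVE"),
    (None,     "ADD",   "FIVE"),
    (None,     "STA",   "RESULT"),
    ("FIVE",   "WORD",  "5"),
    ("EOF",    "BYTE",  "C'EOF'"),
    ("BUF",    "RESB",  "10"),
    ("RESULT", "RESW",  "1"),
    (None,     "END",   "FIRST"),
]
symtab = pass_one(source)
```

With the symbol table in hand, a second pass can replace FIVE and RESULT in the operand fields by their addresses, which is why forward references such as LDA FIVE are not a problem in a two-pass design.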
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement has the following format:
[label] mnemonic [operands] [comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands, or parameters, of the command.
An assembly language statement contains the following fields:
-> Label field - An optional identifier that can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A…Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field - This field contains a mnemonic, which stands for an operation code or a pseudo-op. A machine code instruction may require additional information, i.e. operands.
-> Operand field - It specifies either the address or the data that the opcode requires.
-> Comment field - It allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
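The field layout above can be illustrated with a minimal sketch that splits one source line into its four fields. The notes do not fix a concrete syntax, so two conventions are assumed here purely for illustration: a label is the first token and ends with a colon, and a comment starts at the first `;`. The function name `parse_statement` is hypothetical.

```python
def parse_statement(line):
    """Split one source line into (label, mnemonic, operands, comment)."""
    # Assumption: a comment starts at the first ';' on the line.
    code, _, comment = line.partition(";")
    tokens = code.split()
    label = None
    # Assumption: a label is the first token and is suffixed by ':'.
    if tokens and tokens[0].endswith(":"):
        label = tokens.pop(0)[:-1]
    mnemonic = tokens.pop(0) if tokens else None
    operands = " ".join(tokens) or None
    return label, mnemonic, operands, comment.strip() or None


print(parse_statement("START: LDAA 24H ; load accumulator A"))
print(parse_statement("        RTS"))
```

Optional fields that are absent come back as `None`, which mirrors the square brackets in the statement format.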
Types of Assembly language statements
Assembly language statements are of 3 types:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structures and define their format
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler Design can be done in
– Single pass
– Two pass
(Fig Two-pass assembler: Source program, Pass 1, Intermediate file, Pass 2, Object codes; tables OPTAB and SYMTAB are used by both passes)
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
Generate object code(perform actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of a two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. One-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single-pass assembler
– Does everything in a single pass
– Cannot resolve forward references unless such restrictions are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
• Symbol Table (SYMTAB): used to store the values (addresses) assigned to labels
• Location Counter (LOCCTR): a variable used to help in the assignment of addresses; after each statement, LOCCTR <- LOCCTR + (instruction length)
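How the three data structures cooperate in Pass 1 can be sketched as follows. This is only an illustration under stated assumptions: statements arrive already split into (label, mnemonic, operand) triples, the `OPTAB` contents and the 3-byte word size are illustrative SIC-style values, and only the directives START, END, WORD, RESW and RESB are handled. The names `pass1` and `PROGRAM` are hypothetical.

```python
# Illustrative OPTAB: mnemonic -> (opcode, instruction length in bytes).
OPTAB = {"LDA": (0x00, 3), "ADD": (0x18, 3), "STA": (0x0C, 3)}
WORD_SIZE = 3  # assumption: 3-byte words

PROGRAM = [
    ("COPY", "START", "1000"),
    ("FIRST", "LDA", "ALPHA"),   # forward reference to ALPHA
    (None, "ADD", "BETA"),
    (None, "STA", "ALPHA"),
    ("ALPHA", "WORD", "5"),
    ("BETA", "RESW", "1"),
    (None, "END", "FIRST"),
]


def pass1(statements):
    """Assign addresses; return (SYMTAB, intermediate file)."""
    symtab, intermediate, locctr = {}, [], 0
    for label, mnemonic, operand in statements:
        if mnemonic == "START":
            locctr = int(operand, 16)          # starting address in hex
            intermediate.append((locctr, label, mnemonic, operand))
            continue
        intermediate.append((locctr, label, mnemonic, operand))
        if mnemonic == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr             # label gets the current LOCCTR
        # LOCCTR <- LOCCTR + (length of this statement)
        if mnemonic in OPTAB:
            locctr += OPTAB[mnemonic][1]
        elif mnemonic == "WORD":
            locctr += WORD_SIZE
        elif mnemonic == "RESW":
            locctr += WORD_SIZE * int(operand)
        elif mnemonic == "RESB":
            locctr += int(operand)
        else:
            raise ValueError(f"invalid operation code {mnemonic}")
    return symtab, intermediate


symtab, intermediate = pass1(PROGRAM)
for name, address in symtab.items():
    print(f"{name:6} {address:04X}")
```

Operands are not looked up in Pass 1 at all, which is why the forward reference to ALPHA causes no difficulty here; it would be resolved from SYMTAB in Pass 2.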
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
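One common way for a one-pass assembler to cope with future references is backpatching: an instruction that uses an undefined label is emitted with a blank address and remembered, and the address is filled in when the label is finally defined. The sketch below shows that idea only; it is not taken from the flowchart in these notes. It assumes every statement occupies exactly one location, and the name `assemble_one_pass` is hypothetical.

```python
def assemble_one_pass(statements):
    """statements: (label, mnemonic, operand_symbol) triples; fields may be None."""
    symtab = {}    # label -> address
    pending = {}   # undefined symbol -> indices of instructions waiting for it
    code = []      # assembled [mnemonic, address] pairs
    for label, mnemonic, operand in statements:
        address = len(code)  # assumption: one location per statement
        if label is not None:
            symtab[label] = address
            # Backpatch every earlier instruction that referred to this label.
            for index in pending.pop(label, []):
                code[index][1] = address
        if operand is None or operand in symtab:
            code.append([mnemonic, symtab.get(operand)])
        else:
            # Forward reference: leave the address blank and remember it.
            pending.setdefault(operand, []).append(address)
            code.append([mnemonic, None])
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code, symtab


code, symtab = assemble_one_pass([
    (None, "JMP", "DONE"),    # forward reference
    ("LOOP", "ADD", "LOOP"),  # backward reference
    ("DONE", "HLT", None),
])
print(code)
```

The price of the single pass is the extra `pending` list, which a two-pass assembler does not need because its symbol table is complete before translation starts.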
Assembler Design
Machine-dependent assembler features: instruction formats and addressing modes, program relocation
Machine-independent assembler features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1: H
Col 2~7: Program name
Col 8~13: Starting address (hex)
Col 14~19: Length of object program in bytes (hex)
Text record
Col 1: T
Col 2~7: Starting address in this record (hex)
Col 8~9: Length of object code in this record in bytes (hex)
Col 10~69: Object code; (69-10+1)/6 = 10 instructions
End record
Col 1: E
Col 2~7: Address of first executable instruction (hex) (END program_name)
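The column layout of the three record types can be made concrete with a small formatting sketch. The function names are hypothetical, and the sample program name, addresses and object codes are illustrative values only.

```python
def header_record(name, start, length):
    """H, program name in cols 2-7, start address and length as 6 hex digits."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    """T, start address, byte count (2 hex digits), then up to 60 hex columns."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 60 columns of object code")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_executable):
    """E followed by the address of the first executable instruction."""
    return f"E{first_executable:06X}"


print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x1000, ["141033", "482039"]))
print(end_record(0x1000))
```

Each object code string is given in hexadecimal, so two columns correspond to one byte; that is why the length field is half the number of columns.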
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1: M
Col 2~7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8~9: Length of the address field to be modified, in half-bytes
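What a loader does with one modification record can be sketched as follows, under the assumption that the loaded program is held in a `bytearray` and that the field to be modified occupies the low-order half-bytes of the bytes it spans. The function name `apply_modification` and the sample instruction bytes are illustrative.

```python
def apply_modification(memory, load_address, start, half_bytes):
    """Add load_address to the address field described by one M record.

    start      -- offset of the field from the beginning of the program
    half_bytes -- length of the field in half-bytes (4-bit units)
    """
    n_bytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[start:start + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1
    # Only the low-order half-bytes belong to the address field.
    relocated = ((field & mask) + load_address) & mask
    field = (field & ~mask) | relocated
    memory[start:start + n_bytes] = field.to_bytes(n_bytes, "big")


# A 4-byte instruction whose last 5 half-bytes hold the address 01036.
program = bytearray.fromhex("4B101036")
apply_modification(program, load_address=0x5000, start=1, half_bytes=5)
print(program.hex().upper())
```

With an odd field length such as 5, the leading half-byte of the first byte is not part of the address and is left unchanged.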
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight the salient aspects of a macro processor. The example is very similar to the assembly language instructions of Intel's 8-bit microprocessors.
Example
Macro definition:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X,Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Fig Source code (with macro), Macro Processor, Expanded code, Compiler/Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the macro call
INCRMT X,Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit
Macro definition:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X,Y
Macro expansion:
LOAD X
ADD Y
STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates its end
-> Prototype for the macro
The macro name followed by the macro parameter list; each parameter starts with &
-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro they are called parameters; in the macro invocation they are called arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, …
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called the formal and actual parameter lists, are specified in the prototype and macro call statements respectively, and establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A,&B (prototype)
INCRMT X,Y (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1 DATA2 DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
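The two parameter styles differ only in how each actual argument is matched to a formal parameter: by position or by name. A minimal sketch of that matching step is given below; the function name `bind_arguments` and its signature are hypothetical, and defaults for omitted parameters are not modelled.

```python
def bind_arguments(formals, positional=(), keywords=None):
    """Return a mapping formal parameter -> actual argument."""
    keywords = keywords or {}
    if len(positional) > len(formals):
        raise ValueError("too many positional arguments")
    # Positional: the i-th argument is paired with the i-th formal parameter.
    binding = dict(zip(formals, positional))
    # Keyword: each argument names the parameter it belongs to, in any order.
    for name, value in keywords.items():
        if name not in formals:
            raise ValueError(f"unknown parameter {name}")
        binding[name] = value
    return binding


print(bind_arguments(["&A", "&B"], positional=["X", "Y"]))
print(bind_arguments(["&a1", "&a2", "&a3"],
                     keywords={"&a3": "DATA1", "&a2": "DATA2", "&a1": "DATA3"}))
```

With keyword parameters the order of arguments in the call no longer matters, which is convenient when a macro has many parameters and most of them are omitted.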
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
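The three steps can be sketched with an MNT held as a dictionary and an MDT held as a list. This is a simplified illustration: AIF and AGO are not handled, nested calls are not supported, parameters are matched by position, and the function names `define_macros` and `expand` are hypothetical.

```python
def define_macros(lines):
    """Step 1: fill the MNT and MDT; return them with the remaining program."""
    mnt, mdt, program = {}, [], []
    source = iter(lines)
    for line in source:
        if line.strip() != "MACRO":
            program.append(line)
            continue
        # The statement after MACRO is the prototype: name and formal parameters.
        name, *formals = next(source).replace(",", " ").split()
        mnt[name] = (len(mdt), formals)   # where the definition starts in the MDT
        for body_line in source:
            mdt.append(body_line.strip())
            if body_line.strip() == "ENDM":
                break
    return mnt, mdt, program


def expand(program, mnt, mdt):
    """Steps 2 and 3: replace each macro call by its model statements."""
    output = []
    for line in program:
        tokens = line.replace(",", " ").split()
        if not tokens or tokens[0] not in mnt:
            output.append(line.strip())   # ordinary statement, copied as is
            continue
        index, formals = mnt[tokens[0]]
        actuals = dict(zip(formals, tokens[1:]))   # positional correspondence
        while mdt[index] != "ENDM":
            words = [actuals.get(word, word) for word in mdt[index].split()]
            output.append(" ".join(words))
            index += 1
    return output


source = ["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
          "INCRMT X,Y", "HALT"]
mnt, mdt, program = define_macros(source)
print(expand(program, mnt, mdt))
```

Because definitions are stored before any call is expanded, this sketch relies on the rule that every macro is defined before it is called.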
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked.
Subroutine - the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB, with pointers to the beginning and end of each definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
(Fig Macro processor tables: MACRO, CALL, NAMTAB, DEFTAB, ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain.
(Fig Macro processor flowchart: PROCESSLINE, DEFINE, EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is then ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader - In this type of loader, the instructions are read line by line, their machine code is obtained, and it is placed directly in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme - In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader - An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features ndash Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain Input/Output functions like Printf( ), Scanf( ), etc., which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an Input/Output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function Printf( ). A and Printf( ) would have to be linked with each other. But where in main storage shall we load A and Printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the Printf( ) function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and Printf( ) may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while Printf( ) extends from 100 to 150. But there is simply no way A and Printf( ) can co-exist at the same storage locations. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
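The constant added during relocation is simply the difference between where a segment is actually loaded and where the translator assumed it would start. The sketch below works this out for the A and Printf( ) case in which both were translated with start address 100. The load origin of 500, the segment sizes of 100 and 50 locations, and the function names are assumptions made for illustration.

```python
def allocate(segments, origin):
    """segments: (name, translated_start, size) tuples, loaded back to back.

    Returns name -> relocation constant (load address - translated start).
    """
    constants = {}
    load_address = origin
    for name, translated_start, size in segments:
        constants[name] = load_address - translated_start
        load_address += size   # the next segment follows immediately
    return constants


def relocate(relative_address, constant):
    """Relocation adds the segment's constant to a relative address."""
    return relative_address + constant


constants = allocate([("A", 100, 100), ("Printf", 100, 50)], origin=500)
print(constants)
print(relocate(120, constants["A"]))
```

Both segments keep their translated addresses internally; only the address fields that refer to locations inside a segment are adjusted by its constant, so A and Printf( ) no longer collide and no storage is wasted between them.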
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input supplied to a relocating loader is the object program together with information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
As we turn on the computer, there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor -
Performs linking prior to load time.
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking -
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called
» Also called dynamic linking, dynamic loading, or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader -
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
bullOverlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
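The path rule can be sketched with the example tree. The `PARENT` table below encodes that tree; the class `OverlayManager` and its `loads` log are hypothetical names used only to show which segments would have to be read into memory on each call.

```python
# The overlay tree from the example: each segment and its parent.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]  # path currently in memory; the root is always loaded
        self.loads = []         # log of segments read into memory

    def call(self, target):
        """Make sure the whole path to the call target is loaded."""
        wanted = path_to(target)
        keep = 0
        while (keep < len(wanted) and keep < len(self.loaded)
               and wanted[keep] == self.loaded[keep]):
            keep += 1
        if keep == len(wanted):
            return              # upward call: the path is already in memory
        # Segments past the common prefix overwrite their siblings.
        self.loads.extend(wanted[keep:])
        self.loaded = wanted


manager = OverlayManager()
for target in ["B", "ROOT", "C", "E"]:
    manager.call(target)
print(manager.loads)
print(manager.loaded)
```

Calling into C after B replaces only B, since A is common to both paths, while calling into E replaces the whole A branch with D and E.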
(Fig Overlay tree: ROOT with children A and D; A with children B and C; D with children E and F)
The essential difference between a linkage editor and a linking loader
(Fig Linking loader: Object program(s) and Library, Linking loader, Memory. Linkage editor: Object program(s) and Library, Linkage editor, Linked program, Relocating loader, Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into main memory.
-> Let's take a quick look at who performs each of the above four functions for our three loaders: absolute loader, relocating loader and direct linking loader, respectively.
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
Data used in Pass 1:
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Data used in Pass 2:
• A copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment's location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD & RLD cards
ESD (external symbol dictionary)
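The division of work between the two passes can be sketched as follows. This is a loose illustration rather than the card-based design itself: each object module is modelled as a Python dictionary instead of ESD, TXT and RLD cards, every word is assumed to hold either data or one whole address, and all field names (`defs`, `text`, `relocs`, `externs`) are hypothetical.

```python
def link_and_load(modules, ipla):
    """Two-pass direct linking sketch; returns (symbol table, memory image)."""
    # Pass 1: assign each module a load address and record external symbols.
    symbols, bases, pla = {}, {}, ipla
    for module in modules:
        bases[module["name"]] = pla
        for symbol, offset in module["defs"].items():
            symbols[symbol] = pla + offset
        pla += len(module["text"])          # PLA counter tracks the next free location

    # Pass 2: load the text, relocate local addresses, resolve external ones.
    memory = {}
    for module in modules:
        base = bases[module["name"]]
        words = list(module["text"])
        for offset in module["relocs"]:     # address constants local to the module
            words[offset] += base
        for offset, symbol in module["externs"].items():
            words[offset] = symbols[symbol]  # linking
        for offset, word in enumerate(words):
            memory[base + offset] = word
    return symbols, memory


module_a = {"name": "A", "defs": {"A": 0}, "text": [10, 2, 0],
            "relocs": [1], "externs": {2: "B"}}
module_b = {"name": "B", "defs": {"B": 0}, "text": [99],
            "relocs": [], "externs": {}}
symbols, memory = link_and_load([module_a, module_b], ipla=100)
print(symbols)
print(memory)
```

Two passes are needed for the same reason as in the assembler: module A refers to B before the loader has seen where B will be placed.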
Data Structures
Algorithm
Binary Symbolic Subroutine Loader -
The assembler provides the loader with:
-> The object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder -
• A binder is a program that performs the same functions as the direct linking loader
– allocation, relocation and linking
• It outputs the text to a file rather than to memory
– called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> A programming language plays a role not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level languages
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
• Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
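Both methods can be sketched on a small symbol table whose key is the symbol name. The table is assumed to be a list of (name, value) pairs, sorted by name for the binary search; the sample entries are illustrative.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(table, key):
    """Repeatedly halve a table sorted by name; return the index or -1."""
    low, high = 0, len(table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1   # continue in the left sub-array
        else:
            low = middle + 1    # continue in the right sub-array
    return -1


table = [("ALPHA", 0x1009), ("BETA", 0x100C), ("FIRST", 0x1000)]  # sorted by name
print(linear_search(table, "BETA"))
print(binary_search(table, "FIRST"))
print(binary_search(table, "GAMMA"))
```

Linear search makes no assumption about order, while binary search needs the table to stay sorted, which costs extra work whenever a symbol is inserted.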
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
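A minimal sketch of such a table is given below. The notes do not say which hash function or collision strategy is intended, so two choices are assumed for illustration: the hash function sums the character codes of the key modulo the number of slots, and collisions are resolved by linear probing. The class name `HashTable` and the default of 11 slots are hypothetical.

```python
class HashTable:
    def __init__(self, size=11):
        self.slots = [None] * size   # slots are numbered 0 .. size-1

    def _hash(self, key):
        # Assumed hash function: sum of character codes modulo the table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        index = self._hash(key)
        for _ in range(len(self.slots)):
            entry = self.slots[index]
            if entry is None or entry[0] == key:
                self.slots[index] = (key, value)
                return
            index = (index + 1) % len(self.slots)   # linear probing
        raise OverflowError("hash table is full")

    def get(self, key):
        index = self._hash(key)
        for _ in range(len(self.slots)):
            entry = self.slots[index]
            if entry is None:
                return None          # an empty slot ends the probe sequence
            if entry[0] == key:
                return entry[1]
            index = (index + 1) % len(self.slots)
        return None


table = HashTable()
table.put("AB", 1)
table.put("BA", 2)   # same character sum as "AB", so the two keys collide
print(table.get("AB"), table.get("BA"), table.get("ZZ"))
```

The keys "AB" and "BA" hash to the same slot, which shows why some collision strategy is always needed alongside the hash function.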
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3 A set of axioms or axiom schemata; each axiom must be a wff.
4 A set of inference rules.
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises. It can be interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
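The definition of a single-step derivation can be made concrete with a short Python sketch. It assumes, only for illustration, that every symbol is a single character, that a production is written as a pair (α, β) with α non-empty, and it uses a hypothetical two-rule grammar S -> aSb, S -> ab; the function name `immediate_derivations` is likewise an assumption.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each place where alpha occurs,
    mu is split as sigma + alpha + tau and gamma is sigma + beta + tau.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)   # next occurrence of alpha
    return results


# Hypothetical grammar used only for this example.
P = [("S", "aSb"), ("S", "ab")]
```

Starting from the string S and applying the function repeatedly yields the sentential forms aSb, ab, aaSbb, aabb and so on, which shows how a finite set of rules describes an infinite language.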
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that contains only terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
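As a rough illustration of how the restrictions nest, the following Python sketch classifies a set of production rules as Type 3, 2, 1 or 0. It assumes single-character symbols, writes a rule α → β as the pair (alpha, beta) with the empty string standing for ε, and uses small hypothetical grammars that are not the ones from the numbered examples; the function name `grammar_type` is also an assumption.

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return the most restrictive type (3, 2, 1 or 0) that all rules satisfy."""

    def start_to_empty(alpha, beta):
        return alpha == start and beta == ""

    def type1(alpha, beta):
        # Condition (a): |alpha| <= |beta| unless the rule is S -> empty.
        return start_to_empty(alpha, beta) or len(alpha) <= len(beta)

    def type2(alpha, beta):
        # Additionally, the left side is a single nonterminal.
        return type1(alpha, beta) and len(alpha) == 1 and alpha in nonterminals

    def type3(alpha, beta):
        # Additionally, the right side is a terminal, or a terminal
        # followed by a nonterminal.
        if not type2(alpha, beta):
            return False
        if start_to_empty(alpha, beta):
            return True
        if len(beta) == 1:
            return beta not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    # Condition (b): if S -> empty is a rule, S must not occur on a right side.
    has_empty_rule = any(start_to_empty(a, b) for a, b in productions)
    start_on_right = any(start in b for _, b in productions)
    if has_empty_rule and start_on_right:
        return 0

    for level, check in ((3, type3), (2, type2), (1, type1)):
        if all(check(a, b) for a, b in productions):
            return level
    return 0


# Hypothetical grammars for illustration.
regular = [("S", "aA"), ("A", "b")]
context_free = [("S", "aSb"), ("S", "ab")]
context_sensitive = [("S", "aSBc"), ("S", "abc"), ("cB", "Bc"), ("bB", "bb")]
```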
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
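To give a feel for what a pushdown automaton does, here is a small Python sketch of a stack-based recognizer for the balanced-parenthesis language, which a context-free grammar such as S → (S)S | ε generates. The grammar, the function name `accepts_balanced` and the choice of language are assumptions for this example; this particular language happens to need only a deterministic automaton.

```python
def accepts_balanced(text):
    """Recognize balanced parentheses using an explicit stack."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)        # push on an opening parenthesis
        elif ch == ")":
            if not stack:
                return False        # nothing left to match against
            stack.pop()             # pop the matching opening parenthesis
        else:
            return False            # symbol outside the alphabet
    return not stack                # accept only if the stack is empty
```

The stack is what a finite automaton lacks: it lets the recognizer remember an unbounded number of unmatched opening parentheses.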
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; more sequences are separated by the vertical bar "|", indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The "::=" means that the symbol on the left must be replaced with the expression on the right.
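The following Python sketch reads a tiny BNF specification and separates terminals from nonterminals using the rule just stated (a symbol that never appears on a left side is a terminal). The two-rule grammar for unsigned integers and the names `parse_bnf` and `terminals` are assumptions for this example; the sketch also assumes one rule per line and whitespace between symbols.

```python
BNF = """
<integer> ::= <digit> | <digit> <integer>
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
"""


def parse_bnf(text):
    """Map each left-side symbol to its list of alternatives."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # Alternatives are separated by "|"; each is a sequence of symbols.
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


def terminals(rules):
    """Symbols used on a right side that never appear on a left side."""
    used = {symbol for alts in rules.values() for alt in alts for symbol in alt}
    return used - set(rules)


rules = parse_bnf(BNF)
```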
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages. If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read, broken into its constituent pieces, and turned into an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
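The token list above can be reproduced with a small hand-written scanner in Python. This is a sketch under assumptions: the statement is taken to be total = count + rate * 10 (with * as the multiplication sign), and the function name `scan`, the `SYMBOLS` table and the token kind names are made up for the example.

```python
# Assumed one-character symbols and the names given to them in the list above.
SYMBOLS = {"=": "assignment", "+": "plus", "*": "multiplication"}


def scan(source):
    """Read the source one character at a time and group it into tokens."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                                  # blanks separate tokens
        elif ch.isalpha():
            start = i
            while i < len(source) and source[i].isalnum():
                i += 1
            tokens.append(("identifier", source[start:i]))
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("constant", source[start:i]))
        elif ch in SYMBOLS:
            tokens.append((SYMBOLS[ch], ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens


tokens = scan("total = count + rate * 10")
```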
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for Gnu C Compiler, but now standing for Gnu Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1 The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2 The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3 The lexeme y is mapped to the token <id,2>.
4 The lexeme + is mapped to the token <+>.
5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6 The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
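A regular-expression based sketch in Python shows how lexemes become <token-name, attribute-value> pairs and how identifiers are entered in a symbol table. The pattern, the function name `tokenize`, the use of `None` for an ignored second component, and storing the number's integer value as its attribute are all assumptions chosen for this example.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>[=+;]))"
)


def tokenize(source):
    """Return (tokens, symbol_table) for a statement such as 'x3 = y + 3;'."""
    symbol_table = []          # entry i (counting from 1) holds an identifier
    tokens = []
    source = source.rstrip()
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise ValueError(f"unexpected character at position {position}")
        position = match.end()
        if match.lastgroup == "id":
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.lastgroup == "number":
            tokens.append(("number", int(match.group("number"))))
        else:
            # Only one such symbol exists, so the second component is unused.
            tokens.append((match.group("symbol"), None))
    return tokens, symbol_table


tokens, table = tokenize("x3 = y + 3;")
```

Because blanks are skipped, differently spaced versions of the statement give the same token stream, while `x 3 = y + 3;` does not, since `x` and `3` become separate lexemes.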
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
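A small recursive-descent parser in Python can build the syntax tree for such an assignment. This is only a sketch: the tree is represented as nested tuples (operator, left, right), the function name `parse_assignment` is an assumption, the left-recursive rule for expr is handled with a loop that groups + to the left, and characters outside the tiny token set are silently ignored by the simplified tokenizer.

```python
import re


def parse_assignment(source):
    """Parse 'id = expr ;' and return the syntax tree as nested tuples."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source)
    position = 0

    def next_token():
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[position]
        position += 1
        return token

    def expect(symbol):
        token = next_token()
        if token != symbol:
            raise SyntaxError(f"expected {symbol!r} but found {token!r}")

    def operand():
        token = next_token()
        if token.isdigit():
            return int(token)                 # number
        if token[0].isalpha() or token[0] == "_":
            return token                      # id
        raise SyntaxError(f"unexpected token {token!r}")

    def expr():
        tree = operand()
        while position < len(tokens) and tokens[position] == "+":
            next_token()
            tree = ("+", tree, operand())     # operator becomes interior node
        return tree

    target = operand()
    if not isinstance(target, str):
        raise SyntaxError("left side of = must be an identifier")
    expect("=")
    value = expr()
    expect(";")
    if position != len(tokens):
        raise SyntaxError("extra input after ;")
    return ("=", target, value)


tree = parse_assignment("x3 = y + 3;")
```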
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators "int to real" and "real to int" as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
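The walk over the tree that produces these quads can be sketched in Python. The tree representation (nested tuples), the `types` and `names` dictionaries, and the function name `generate_quads` are assumptions for this example; only addition and the two conversions from the notes are handled, and an unused source operand is written as `None`.

```python
def generate_quads(tree, types, names):
    """Emit (operation, target, source1, source2) quads for an assignment."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def emit(operation, target, source1, source2=None):
        quads.append((operation, target, source1, source2))
        return target

    def convert(place, kind, wanted):
        # Insert a unary conversion only when the types differ.
        if kind == wanted:
            return place
        operation = "inttoreal" if wanted == "real" else "realtoint"
        return emit(operation, new_temp(), place)

    def walk(node):
        """Return (place, type) for an expression node."""
        if isinstance(node, int):
            return node, "int"
        if isinstance(node, str):
            return names[node], types[node]
        _, left, right = node                      # an addition node
        left_place, left_type = walk(left)
        right_place, right_type = walk(right)
        kind = "real" if "real" in (left_type, right_type) else "int"
        left_place = convert(left_place, left_type, kind)
        right_place = convert(right_place, right_type, kind)
        return emit("add", new_temp(), left_place, right_place), kind

    _, target, value = tree
    place, kind = walk(value)
    place = convert(place, kind, types[target])
    emit("assign", names[target], place)
    return quads


quads = generate_quads(
    ("=", "x3", ("+", "y", 3)),
    types={"x3": "int", "y": "real"},
    names={"x3": "id1", "y": "id2"},
)
```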
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1 Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2 The last two quads can be combined into
realtoint id1 temp2
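Both optimizations can be sketched as two small passes over a list of quads in Python. The quad layout (operation, target, source1, source2), the function name `optimize`, and the assumption that a temporary is never used again after being copied by assign are choices made for this example, not a general optimizer.

```python
def optimize(quads):
    """Fold constant int-to-real conversions, then merge 'op temp' + 'assign'."""
    # Pass 1: perform inttoreal on integer constants at compile time.
    constants = {}
    folded = []
    for operation, target, source1, source2 in quads:
        if operation == "inttoreal" and isinstance(source1, int):
            constants[target] = float(source1)
            continue                         # the conversion quad disappears
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        folded.append((operation, target, source1, source2))

    # Pass 2: 'op tempN ...' directly followed by 'assign x tempN'
    # becomes 'op x ...' (assumes tempN is not used afterwards).
    result = []
    for quad in folded:
        operation, target, source1, _ = quad
        if (operation == "assign" and result
                and str(source1).startswith("temp")
                and result[-1][1] == source1):
            previous_op, _, p1, p2 = result.pop()
            result.append((previous_op, target, p1, p2))
        else:
            result.append(quad)
    return result


original = [
    ("inttoreal", "temp1", 3, None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
optimized = optimize(original)
```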
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
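The idea of the parser calling the scanner on demand, instead of reading a token file produced by a separate pass, can be sketched with a Python generator. The names `scanner` and `parse_sum`, the shared `log` list and the tiny "operand + operand ;" input are assumptions used only to make the interleaving visible.

```python
import re


def scanner(source, log):
    """Generator: produces the next token only when the parser asks for it."""
    for match in re.finditer(r"[A-Za-z_]\w*|\d+|[=+;]", source):
        log.append(f"scan {match.group()}")
        yield match.group()


def parse_sum(tokens, log):
    """Collect the operands of 'operand + operand ... ;', pulling tokens lazily."""
    operands = []
    for token in tokens:            # each iteration resumes the scanner once
        if token == ";":
            break
        if token != "+":
            log.append(f"parse {token}")
            operands.append(token)
    return operands


log = []
operands = parse_sum(scanner("a + b;", log), log)
```

The log shows scanning and parsing interleaved in a single pass over the input, rather than all scanning followed by all parsing.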
Compilers
1 Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2 Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form or syntax of legal statements in the language.
3 Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4 Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5 Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6 Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7 Ambiguous Grammar
There is more than one possible parse tree for a given statement.
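Ambiguity can be made concrete by counting parse trees. The Python sketch below counts the parse trees that the ambiguous rule expr → id | expr + expr gives for a token sequence; the function name `count_parse_trees` and the token spelling "id" are assumptions for this example.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under  expr -> id | expr + expr."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from expr.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":
                # Choose this + as the top-level operator of the phrase.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens)) if tokens else 0
```

For `id + id` there is exactly one tree, but `id + id + id` already has two, depending on which + is taken as the root, so the grammar is ambiguous.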
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver calls Syntactic Analyzer; Syntactic Analyzer calls Contextual Analyzer and Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver calls Syntactic Analyzer, Contextual Analyzer and Code Generator
Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no2 Introduction to software, its types, machine structure
• Definition of software, its types, examples and applications
• What is programming?
• What is system programming?
Ans: System programming is a branch of computer science and engineering that deals with the structure of the machine. The structure of a machine may be composed of memory, I/O, a processor (CPU, Central Processing Unit), a card reader, a printer, etc. In other words, system programming draws on knowledge of a high-level procedural language, data structures and computer organization.
• Differentiate between system software and application software
• Define computer architecture
• Define computer organization
• General hardware organization of a computer system
Lecture_no3 Evolution of components of system programming
Assembler
An assembler translates assembly language into machine code. Assembly language consists of mnemonics for machine opcodes, so an assembler performs a 1:1 translation from each mnemonic to a machine instruction. For example:
LDA 4 converts to 0001001000100100
Conversely, one instruction in a high-level language will translate to one or more instructions at machine level.
Advantages of using an Assembler
• Very fast in translating assembly language to machine code, as there is a 1 to 1 relationship
• Assembly code is often very efficient (and therefore fast) because it is a low-level language
• Assembly code is fairly easy to understand due to the use of English-like mnemonics
Disadvantages of using an Assembler
• Assembly language is written for a certain instruction set and/or processor
• Assembly tends to be optimised for the hardware it is designed for, meaning it is often incompatible with different hardware
• Lots of assembly code is needed to do relatively simple tasks, and complex programs require lots of programming time
Compiler
A compiler is a computer program that translates code written in a high-level language into a lower-level language (object/machine code). The most common reason for translating source code is to create an executable program (converting from a high-level language into machine language).
Advantages of using a compiler
• Source code is not included, therefore compiled code is more secure than interpreted code
• Tends to produce faster code than interpreting source code
• Produces an executable file, and therefore the program can be run without need of the source code
Disadvantages of using a compiler
• Object code needs to be produced before a final executable file; this can be a slow process
• The source code must be 100% correct for the executable file to be produced
Interpreter
An interpreter program executes other programs directly, running through the program code and executing it line by line. As it analyses every line, an interpreter is slower than running compiled code, but it can take less time to interpret program code than to compile and then run it; this is very useful when prototyping and testing code. Interpreters are written for multiple platforms, which means code written once can be run immediately on different systems without having to recompile for each. Examples of this include Flash-based web programs that will run on your PC, Mac, games console and mobile phone.
Advantages of using an Interpreter
• Easier to debug (check errors) than with a compiler
• Easier to create multi-platform code, as each different platform would have an interpreter to run the same code
• Useful for prototyping software and testing basic program logic
Disadvantages of using an Interpreter
• Source code is required for the program to be executed, and this source code can be read, making it insecure
• Interpreters are generally slower than compiled programs due to the per-line translation method
1) Name the three types of program translator and describe what level of language each translates.
Answer:
Assembler - Low Level (Assembly); Compiler - High Level; Interpreter - High Level
2) Why might you want to use a compiler instead of an interpreter?
A compiler makes faster, more secure code.
3) Why might you want to use an interpreter instead of a compiler?
Interpreters allow code to run on multiple platforms, and they allow you to debug and test code without having to re-compile the entire source code.
Loader
A loader is a system program that places the object program into main memory and prepares it for execution.
• Basic functions of a loader
- Allocation
- Linking
- Relocation
- Loading
Types of Loader
• Compile-and-go Loader
• Relocating Loader
• Direct Linking Loader
• Absolute Loader
• General Loader
• Dynamic Loader
Macro & Macro processor
Macro - A macro is a single-line abbreviation for a group of instructions
MACRO          -------- Start of definition
INCR           -------- Macro name
A 1,DATA
A 2,DATA       Sequence of instructions to be abbreviated
A 3,DATA
MEND           -------- End of definition
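The macro definition above can be illustrated with a minimal sketch of a macro expander. It assumes parameterless macros, a macro name on the line following MACRO, and a macro call written as the bare name on its own line. The function name expand_macros and the ST 1,RESULT line in the usage note are hypothetical and are not part of the notes.

```python
def expand_macros(source_lines):
    """Expand parameterless macros defined with MACRO ... MEND."""
    macros = {}   # macro name -> list of body lines
    output = []
    lines = iter(source_lines)
    for line in lines:
        stripped = line.strip()
        if stripped == "MACRO":
            name = next(lines).strip()      # line after MACRO holds the macro name
            body = []
            for body_line in lines:         # collect the body up to MEND
                if body_line.strip() == "MEND":
                    break
                body.append(body_line.strip())
            macros[name] = body
        elif stripped in macros:
            output.extend(macros[stripped])  # macro call: substitute the body
        else:
            output.append(stripped)          # ordinary statement: copy unchanged
    return output
```

Given the INCR definition followed by the lines INCR, ST 1,RESULT, INCR, the sketch emits the three A instructions, then ST 1,RESULT, then the three A instructions again.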
Linking and Linker
• Linking - The process of merging many object modules to form a single object program is called linking
• Linker - The linker is the software program which binds many object modules to make a single object program
Lecture_no4 Foundation of system programming
1) Formal System
• A formal system is an uninterpreted calculus
It consists of
- Alphabets
- A set of words called axioms
- A finite set of relations called rules of inference or production rules
- Ex: Boolean algebra
2) Need of System Software
The basic need of system software is to achieve the following goals:
• To achieve efficient performance of the system
• To make effective execution of general user programs
• To make effective utilization of human resources
• To make available new, better facilities
Need of system programming
Lecture_no5 Evolution of operating system, General machine structure
OS, Job, Multiprogramming, Multitasking
Fragmentation, Paging, Scheduling
Facilities of a computer OS
Operating System
• It is the collection of system programs which acts as an interface between the user and the computer hardware
• The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient manner
Functions of Operating System
• File handling and management
• Storage management (Memory management)
• Device scheduling and management
• CPU scheduling
• Information management
• Process control (management)
• Error handling
• Protecting itself from the user & protecting the user from other users
What is MFT?
What is MVT?
What is fragmentation?
What is paging? Explain its various types.
What is scheduling?
What is meant by the term time sharing?
What is virtual memory?
What are simple batched systems? Explain.
Give an evolution of an operating system.
Lecture_no6 Description of Von Neumann machine's components
General machine structure, Von Neumann's components, block diagram and functions of each component of Von Neumann's machine
1 The von Neumann Computer Model
• Von Neumann computer systems contain three main building blocks:
-> the central processing unit (CPU)
-> memory
-> input/output devices (I/O)
• These three components are connected together using the system bus
• The most prominent items within the CPU are the registers; they can be manipulated directly by a computer program. The following block diagram shows the major relationships between CPU components
(Design of the Von Neumann architecture)
2 Components of the Von Neumann Model
-> Memory: Storage of information (data/program)
-> Processing Unit: Computation/processing of information
-> Input: Means of getting information into the computer, e.g. keyboard, mouse
-> Output: Means of getting information out of the computer, e.g. printer, monitor
• Control Unit: Makes sure that all the other parts perform their tasks correctly and at the correct time
The von Neumann Machine
(connection between CPU & Main memory)
MBR (Memory Buffer Register) = MBR contains a word to be stored in memory, or is used to receive a word from memory
MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR
IR (Instruction Register) = IR contains the 8-bit op-code of the instruction being executed
IBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instruction from a word in memory
PC (Program Counter) = PC contains the address of the next instruction pair to be fetched from memory
AC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily operands and results of ALU operations
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors or devices for human interaction
Instruction set: The set of codes that describe all the legal operations a particular computer can execute
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture
Random access memory (RAM): The typical computer memory where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location
Read-only memory (ROM): A memory that is usually very similar to RAM except that it may only be accessed, not updated
Store: The process of writing a value to a memory cell
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system, Information flow in microflowchart for ADD/SUB/MULT
Memory operations
FETCH (addr)
1 Put addr into MAR
2 Tell memory unit to "load"
3 Memory copies data into MDR
STORE (addr, new-value)
1 Put addr into MAR
2 Put new-value into MDR
3 Tell memory unit to "store"
4 Memory stores data from MDR into memory cell
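The FETCH and STORE steps can be traced with a small sketch. It assumes memory is a plain list of cells and that MAR and MDR are ordinary attributes. The class name MemoryUnit is hypothetical and only mirrors the numbered steps, not any real hardware interface.

```python
class MemoryUnit:
    """Toy memory with a memory address register (MAR) and memory data register (MDR)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0
        self.mdr = 0

    def fetch(self, addr):
        self.mar = addr                    # 1. put addr into MAR
        self.mdr = self.cells[self.mar]    # 2-3. "load": memory copies the cell into MDR
        return self.mdr

    def store(self, addr, new_value):
        self.mar = addr                    # 1. put addr into MAR
        self.mdr = new_value               # 2. put new-value into MDR
        self.cells[self.mar] = self.mdr    # 3-4. "store": memory copies MDR into the cell
```

After store(5, 42), a later fetch(5) leaves 5 in MAR and 42 in MDR.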
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R0 to R7) to evaluate the following expression. Ignore the remainder part of the division process.
( (a + b) / (a + d) ) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+ --------> ADD Ri, Rj ( Rj <- Ri + Rj )
* --------> MULT Ri, Rj ( Rj <- Ri * Rj )
/ --------> DIV Ri, Rj ( Rj <- Rj / Ri )
a = M[A] ------> content of memory location at address A; same for B, D, E
move A, D0     ; D0 = a
move B, D4     ; D4 = b
add D0, D4     ; D4 = a + b
move D, D1     ; D1 = d
add D0, D1     ; D1 = a + d
div D1, D4     ; D4 = (a + b) / (a + d)
move E, D2     ; D2 = e
mult D0, D2    ; D2 = (a * e)
mult D4, D2    ; D2 = ( (a + b) / (a + d) ) * (a * e)
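As a cross-check of the register program, here is a sketch that mirrors each instruction with a dictionary of registers. It assumes non-negative integer operands, so that Python's floor division matches "ignore the remainder". The function name evaluate is hypothetical.

```python
def evaluate(a, b, d, e):
    """Mirror the register program for ((a + b) / (a + d)) * (a * e), dropping the remainder."""
    reg = {}
    reg["D0"] = a                        # move A, D0
    reg["D4"] = b                        # move B, D4
    reg["D4"] = reg["D0"] + reg["D4"]    # add D0, D4   -> a + b
    reg["D1"] = d                        # move D, D1
    reg["D1"] = reg["D0"] + reg["D1"]    # add D0, D1   -> a + d
    reg["D4"] = reg["D4"] // reg["D1"]   # div D1, D4   -> (a + b) / (a + d)
    reg["D2"] = e                        # move E, D2
    reg["D2"] = reg["D0"] * reg["D2"]    # mult D0, D2  -> a * e
    reg["D2"] = reg["D4"] * reg["D2"]    # mult D4, D2  -> final result
    return reg["D2"]
```

For a = 6, b = 10, d = 2, e = 3 the quotient is 16 / 8 = 2 and the product a * e is 18, so the result left in D2 is 36.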
Problem 2)
a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0               ; move contents of memory location X to register D0
ii) move D0, X              ; move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1)   ; move contents from memory to memory
iv) bgt.w label             ; branch if greater than to location "label"
Answer 2)
For each of the above, for part b) draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine instruction:
0011 0000 0011 1000
Address of X
a) Micro-steps are:
MAR <-- PC
MBR <-- M[MAR]     ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2      ; PC points to source address
MAR <-- PC
MBR <-- M[MAR]     ; address of X in MBR
MAR <-- MBR        ; move X to MAR
MBR <-- M[MAR]     ; read contents of location X
D0 <-- MBR
PC <-- PC + 2      ; PC points to next instruction
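The micro-steps for move X, D0 can be followed in a short sketch. It assumes memory is a dictionary keyed by byte address that holds 16-bit words, and it does not model decoding. The function name and the sample addresses in the usage note are hypothetical.

```python
def execute_move_x_d0(memory, pc):
    """Trace the micro-steps of 'move X, D0'; returns (D0, new PC, IR)."""
    mar = pc
    mbr = memory[mar]    # fetch the instruction word
    ir = mbr             # decode would inspect IR here
    pc += 2              # PC points to the source address word
    mar = pc
    mbr = memory[mar]    # address of X now in MBR
    mar = mbr            # move X to MAR
    mbr = memory[mar]    # read contents of location X
    d0 = mbr
    pc += 2              # PC points to the next instruction
    return d0, pc, ir
```

With the instruction word 0x3038 at address 0, the address 0x0100 at address 2, and the value 77 stored at 0x0100, the sketch ends with D0 = 77 and PC = 4.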
ii) move D0, X
Machine instruction:
0011 1110 0000 0000
Address of X
a) Micro-steps are:
MAR <-- PC
MBR <-- M[MAR]     ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2      ; PC points to destination address
MAR <-- PC
MBR <-- M[MAR]     ; read destination address X
MAR <-- MBR        ; MAR has X now
MBR <-- D0         ; source data in MBR
M[MAR] <-- MBR     ; data in effective address X (which is in MAR)
PC <-- PC + 2      ; PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine instruction:
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
temp <-- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <-- temp
Micro-steps are:
MAR <-- PC
MBR <-- M[MAR]         ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2          ; PC points to source displacement
MAR <-- PC
MBR <-- M[MAR]         ; source displacement loaded
MAR <-- [A3 + MBR]     ; source effective address calculated
MBR <-- M[MAR]         ; source data in MBR
WR <-- MBR             ; saved temporarily since MBR is to be reused
PC <-- PC + 2          ; PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR]         ; destination displacement loaded
MAR <-- [A1 + MBR]     ; destination effective address calculated
MBR <-- WR             ; source data back to MBR
M[MAR] <-- MBR         ; source data moved to destination
PC <-- PC + 2          ; PC points to next instruction
iv) bgt.w label
Machine instruction:
0110 1110 0000 0000 ---- displacement ----
a) Micro-steps are:
MAR <-- PC
MBR <-- M[MAR]     ; fetch
IR <-- MBR
| interpret / decode |
If the branch condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR]     ; get displacement
PC <-- PC + MBR    ; execute
If the branch condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit, Register, Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. In these notes the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose register, floating-point register, control & status register, data register, address register).
Four fields of information:
1) the name field - optional field; must begin with a character from the alphabet and can consist of 8 characters
2) opcode - mandatory field; must be followed by a blank
3) operands - usually represent data that could be in main storage, in registers, or part of the instruction itself
4) comments - used to clarify instructions
(iii) Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
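The packed decimal layout can be made concrete with a small decoding sketch. It is stricter than PACK and UNPK, since it rejects invalid digit and sign nibbles. The function name unpack_decimal is hypothetical.

```python
POSITIVE_SIGNS = {0xA, 0xC, 0xE, 0xF}
NEGATIVE_SIGNS = {0xB, 0xD}

def unpack_decimal(data: bytes) -> int:
    """Decode packed decimal: two 4-bit digits per byte, last nibble is the sign."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)      # high-order nibble
        nibbles.append(byte & 0x0F)    # low-order nibble
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("digit nibbles must be 0-9")
    if sign not in POSITIVE_SIGNS | NEGATIVE_SIGNS:
        raise ValueError("sign nibble must be A-F")
    value = int("".join(str(d) for d in digits))
    return -value if sign in NEGATIVE_SIGNS else value
```

The two bytes 12 3C (hexadecimal) hold the digits 1, 2, 3 and the positive sign C, so they decode to +123. The bytes 12 3D decode to -123.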
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent biased by 64). Bits 8-31 (8-63 for a doubleword) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
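A sketch of how a fullword floating point value is interpreted under this layout follows. It assumes the word is passed as a 32-bit unsigned integer, with bit 0 as the most significant bit. The function name decode_hex_float is hypothetical.

```python
def decode_hex_float(word: int) -> float:
    """Decode a 32-bit hexadecimal floating point word: sign, 7-bit characteristic, 24-bit fraction."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    characteristic = (word >> 24) & 0x7F           # exponent biased by 64
    fraction = (word & 0xFFFFFF) / float(1 << 24)  # fraction read as 0.hhhhhh in base 16
    return sign * fraction * 16.0 ** (characteristic - 64)
```

For the word 42640000 (hexadecimal), the characteristic 42 gives an exponent of 2 and the fraction is 0.64 in base 16, which is 0.390625, so the value is 0.390625 * 16^2 = 100.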
Lecture_no9 Various instruction formats of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R indicates that data resides in a register, so in this format the operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second is a storage location. If there are 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, the storage address is specified in 3 pieces.
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I indicates that the operand consists of immediate data (meaning that the data is in the instruction itself). So in this format the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S indicates that there is data in storage. Here both operands have data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
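As an illustration of the RX layout, the following sketch splits a four-byte instruction into its fields. The notes only say that bytes 2-3 hold "a base and displacement". The split into a 4-bit base register number and a 12-bit displacement is the usual System/360 layout and is stated here as an assumption. The function name decode_rx and the field names are hypothetical.

```python
def decode_rx(instr: bytes) -> dict:
    """Split a four-byte RX instruction into opcode, R1, X2, B2 and D2."""
    if len(instr) != 4:
        raise ValueError("RX instructions are four bytes long")
    return {
        "opcode": instr[0],                         # byte 0
        "r1": instr[1] >> 4,                        # byte 1, bits 0-3: register number
        "x2": instr[1] & 0x0F,                      # byte 1, bits 4-7: index register
        "b2": instr[2] >> 4,                        # assumed: 4-bit base register
        "d2": ((instr[2] & 0x0F) << 8) | instr[3],  # assumed: 12-bit displacement
    }
```

For the bytes 58 30 C0 10 (hexadecimal), the sketch reports register 3, index 0, base register 12 and displacement 16.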
Addressing Mode
1 Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 3
2 Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3 Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4 Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D), 3
5 Direct mode: Absolute address of operand is explicitly specified in the instruction. Example: ADD 1234, S
6 Indirect mode: Absolute address of operand is the content of a memory address. Example: ADD [1234], 10
7 Relative mode: Content of PC + OpAddr. Example: ADD D, $S
8 Indexed mode: Content of an index register + OpAddr. Example: ADD D, 500(S)
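The addressing modes differ only in how the operand is located, which the following sketch makes explicit. It assumes registers and memory are dictionaries, and it omits implied mode because the register is fixed by the opcode. The mode names and the function operand_value are hypothetical.

```python
def operand_value(mode, field, regs, memory, pc=0):
    """Return the operand selected by an addressing mode and the instruction's operand field."""
    if mode == "immediate":
        return field                           # operand is the constant itself
    if mode == "register":
        return regs[field]                     # operand is in the named register
    if mode == "register_indirect":
        return memory[regs[field]]             # register holds the operand's address
    if mode == "direct":
        return memory[field]                   # field is the absolute address
    if mode == "indirect":
        return memory[memory[field]]           # field addresses a cell holding the address
    if mode == "relative":
        return memory[pc + field]              # address = PC + OpAddr
    if mode == "indexed":
        op_addr, index_reg = field
        return memory[regs[index_reg] + op_addr]  # address = index register + OpAddr
    raise ValueError(f"unknown addressing mode: {mode}")
```

With S = 3 in the register file, the indexed operand 500(S) is read from address 503.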
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instructions
Example: Write a program that compares two positive numbers x and y. The output (in R3) is -1 if x < y, 0 if x = y, +1 if x > y.
      MOVE R1 x
      MOVE R2 y
      MOVE R3 0
      SUB R1 R2
      BN Less
      BZ Zero
      MOVE R3 +1
      B Done
Less  MOVE R3 -1
      B Done
Zero  MOVE R3 0
Done  HLT
Note: the two unconditional branches to Done are needed so that execution does not fall through from one case into the next and overwrite R3. The mnemonic B for an unconditional branch is assumed here.
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language, machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
Very easy to write an Assembly Language -> Machine language translator
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result in third
1010001000 RR RR RR
ex: R0 = R1 - R2
1010001000 00 01 10
Halt the program
1111111111111111
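The bit patterns above are enough to sketch the assembly-to-machine-language translator for the simulated computer. The sketch assumes the operand order used in the earlier listing (register first, then memory address, as in LOAD R1 5 and SAVE R0 6) and treats SAVE and STORE as the same instruction. The names PREFIX and assemble are hypothetical.

```python
PREFIX = {
    "LOAD": "100000010", "STORE": "100000100", "SAVE": "100000100",
    "MOVE": "100100010000", "ADD": "1010000100", "SUB": "1010001000",
}

def _bits(value, width):
    """Return value as a zero-padded binary string of the given width."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")

def _reg(token):
    """Encode a register name such as 'R2' as two bits."""
    return _bits(int(token[1:]), 2)

def assemble(line):
    """Translate one assembly statement into a 16-bit machine word (as a bit string)."""
    op, *args = line.split()
    if op == "HALT":
        return "1" * 16
    if op in ("LOAD", "STORE", "SAVE"):
        reg, addr = args
        word = PREFIX[op] + _reg(reg) + _bits(int(addr), 5)   # RR then MMMMM
    else:
        word = PREFIX[op] + "".join(_reg(a) for a in args)    # only register fields
    if len(word) != 16:
        raise ValueError(f"wrong number of operands in: {line}")
    return word
```

Assembling LOAD R1 5 reproduces the first word of the sample program, 1000000100100101.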
Lecture_no14 Meaning of Assembly language program types, Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write:
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device, specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM, MASM, etc.
     LABEL    OPCODE   OPERANDS        COMMENTS
01   ;
02   ; Program to multiply an integer by the constant 6.
03   ; Before execution, an integer must be stored in NUMBER.
04   ;
05            .ORIG    x3050
06            LEA      R2, NUMBER
07            LDW      R2, R2, #0
08            LEA      R1, SIX
09            LDW      R1, R1, #0
0A            AND      R3, R3, #0      ; Clear R3. It will
0B                                     ; contain the product.
0C   ; The inner loop
0D   ;
0E   AGAIN    ADD      R3, R3, R2
0F            ADD      R1, R1, #-1     ; R1 keeps track of
10            BRp      AGAIN           ; the iterations
11   ;
12            HALT
13   ;
14   NUMBER   .BLKW    1
15   SIX      .FILL    x0006
16   ;
17            .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
LEA R2, NUMBER
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1 The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2 The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says to start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get by speaking English, but you will never feel at home, and life remains complicated. You can get by this way, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader; the assembler does not know the address of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler
-> Basic assembler directives:
• START - Specify name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax: Assembly language statements
Assembly language statements are entered one statement per line. Each statement follows the following format:
[label] mnemonic [operands] [comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) which is to be executed, and the second is the operands, or parameters, of the command.
An assembly language statement contains the following fields:
-> Label Field - An optional identifier that can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex:
START: LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode - This field contains a mnemonic, which stands for an operation code or a pseudo-op. A machine code instruction may require additional information, i.e. operands.
-> Operand Field - It specifies either the address or the data that the opcode requires.
-> Comment Field - Allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
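The four-field statement format can be illustrated with a small parsing sketch. It assumes that a label is recognised by its trailing colon and that a comment starts at a semicolon. The semicolon is an assumption of this sketch, since the notes only require white space before the comment. The function name parse_statement is hypothetical.

```python
def parse_statement(line):
    """Split '[label:] mnemonic [operands] [; comment]' into its four fields."""
    code, _, comment = line.partition(";")   # assumed: ';' starts the comment field
    parts = code.split()
    label = None
    if parts and parts[0].endswith(":"):     # a declared label is suffixed by a colon
        label = parts[0][:-1]
        parts = parts[1:]
    if not parts:
        raise ValueError("the mnemonic field is mandatory")
    return {
        "label": label,
        "mnemonic": parts[0],
        "operands": " ".join(parts[1:]) or None,
        "comment": comment.strip() or None,
    }
```

For the statement START: LDAA 24 H the sketch reports the label START, the mnemonic LDAA and the operand field 24 H, with no comment.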
Types of Assembly language statements
There are 3 types:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in
- Single pass
- Two pass
(fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
Separate the symbol, mnemonic op-code and operand fields. Determine the storage requirement for every assembly language statement and update the location counter. Build the symbol table (the table that is used to store each label and its corresponding value).
Pass II
Generate object code (perform the actual translation).
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless such restrictions are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels
• Location Counter (LOCCTR): a variable that is used to help in the assignment of addresses: LOCCTR <- LOCCTR + (instruction length)
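How the three data structures cooperate during Pass 1 can be sketched as follows. This is an illustrative sketch only: the OPTAB entries, the 3-byte word size and the function names are assumptions chosen for the example (a SIC-like machine), not definitions taken from these notes.

```python
# Hypothetical OPTAB: mnemonic -> (opcode, instruction length in bytes).
OPTAB = {"LDA": (0x00, 3), "ADD": (0x18, 3), "STA": (0x0C, 3)}
WORD_SIZE = 3  # assumption: one word is 3 bytes

def byte_length(operand):
    """Length of a BYTE constant written as C'...' or X'...'."""
    kind, body = operand[0], operand[2:-1]
    return len(body) if kind == "C" else len(body) // 2

def pass1(statements, start=0):
    """Assign addresses and build SYMTAB from (label, mnemonic, operand) tuples."""
    symtab = {}
    intermediate = []
    locctr = start
    for label, mnemonic, operand in statements:
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr          # label value = current LOCCTR
        intermediate.append((locctr, label, mnemonic, operand))
        # LOCCTR <- LOCCTR + (length of this statement)
        if mnemonic in OPTAB:
            locctr += OPTAB[mnemonic][1]
        elif mnemonic == "WORD":
            locctr += WORD_SIZE
        elif mnemonic == "RESW":
            locctr += WORD_SIZE * int(operand)
        elif mnemonic == "RESB":
            locctr += int(operand)
        elif mnemonic == "BYTE":
            locctr += byte_length(operand)
        else:
            raise ValueError(f"invalid operation code {mnemonic}")
    return symtab, intermediate

program = [
    ("FIRST", "LDA", "ALPHA"),
    (None, "ADD", "BETA"),
    (None, "STA", "GAMMA"),
    ("ALPHA", "WORD", "5"),
    ("BETA", "WORD", "7"),
    ("GAMMA", "RESW", "1"),
]
symtab, intermediate = pass1(program, start=0x1000)
print({name: hex(addr) for name, addr in symtab.items()})
```

Note that ALPHA, BETA and GAMMA are used before they are defined; Pass 1 does not need their values, it only records them for Pass 2.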
Lecture_no20 Multi-pass assembler with example
Pass 1
- Assign addresses to all statements in the program
- Save the values assigned to all labels for use in Pass 2
- Perform some processing of assembler directives
Pass 2
- Assemble instructions
- Generate data values defined by BYTE, WORD
- Perform processing of assembler directives not done in Pass 1
- Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
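One common way for a one-pass assembler to cope with future label references is to leave the address field empty and patch it when the label is finally defined. The sketch below shows that idea under simplifying assumptions: every instruction occupies one memory cell, and the function name and instruction mnemonics are hypothetical.

```python
def assemble_one_pass(statements):
    """Assemble (label, mnemonic, operand) tuples in a single pass.

    Assumption: each instruction occupies exactly one memory cell.
    """
    symtab = {}    # label -> address
    pending = {}   # undefined label -> cells whose address field awaits it
    code = []      # code[address] = [mnemonic, operand address or None]
    for label, mnemonic, operand in statements:
        address = len(code)
        if label is not None:
            symtab[label] = address
            # Patch every earlier instruction that referred to this label.
            for cell in pending.pop(label, []):
                code[cell][1] = address
        if operand is None:
            code.append([mnemonic, None])
        elif operand in symtab:
            code.append([mnemonic, symtab[operand]])
        else:
            # Forward reference: remember the cell and fill it in later.
            pending.setdefault(operand, []).append(address)
            code.append([mnemonic, None])
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code

source = [
    (None, "JMP", "END"),     # forward reference
    ("LOOP", "ADD", "LOOP"),  # backward (self) reference
    ("END", "HLT", None),
]
print(assemble_one_pass(source))
```

The price of the single pass is that generated code must stay available for patching, which is one of the restrictions such assemblers accept.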
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header
Col 1 H
Col 2~7 Program name
Col 8~13 Starting address (hex)
Col 14~19 Length of object program in bytes (hex)
Text
Col 1 T
Col 2~7 Starting address in this record (hex)
Col 8~9 Length of object code in this record in bytes (hex)
Col 10~69 Object code ((69-10+1)/6 = 10 instructions)
End
Col 1 E
Col 2~7 Address of first executable instruction (hex) (END program_name)
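The column layout above translates directly into fixed-width string formatting. The following sketch builds the three record types; the function names and the sample values are illustrative assumptions.

```python
def header_record(name, start, length):
    """H, program name (cols 2-7), start address (8-13), length (14-19)."""
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_record(start, object_code):
    """T, start address (cols 2-7), byte count (8-9), object code (10-69)."""
    if len(object_code) > 60 or len(object_code) % 2:
        raise ValueError("object code must be 2 to 60 hex digits, whole bytes")
    return f"T{start:06X}{len(object_code) // 2:02X}{object_code}"

def end_record(first_executable):
    """E, address of the first executable instruction (cols 2-7)."""
    return f"E{first_executable:06X}"

print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x1000, "141033482039001036"))
print(end_record(0x1000))
```

The 60-digit limit in text_record corresponds to columns 10 to 69, i.e. ten 3-byte instructions per record.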
Absolute Program
The program must be loaded at the specified (absolute) address
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1 M
Col 2-7 Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9 Length of the address field to be modified, in half-bytes
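A loader could apply modification records roughly as sketched below. The assumptions are stated in the comments: fields are big-endian, and when the length is an odd number of half-bytes the address occupies the low-order half-bytes of the bytes starting at the given location. The function names and the sample instruction are illustrative.

```python
def parse_modification_record(record):
    """Return (offset, length in half-bytes) from an M record."""
    if not record.startswith("M"):
        raise ValueError("not a modification record")
    return int(record[1:7], 16), int(record[7:9], 16)

def relocate(image, records, load_address):
    """Add load_address to each address field named by a modification record."""
    for record in records:
        offset, half_bytes = parse_modification_record(record)
        n_bytes = (half_bytes + 1) // 2
        field = int.from_bytes(image[offset:offset + n_bytes], "big")
        # Assumption: the address sits in the low-order half-bytes;
        # any remaining high-order bits are left unchanged.
        mask = (1 << (4 * half_bytes)) - 1
        patched = (field & ~mask) | ((field + load_address) & mask)
        image[offset:offset + n_bytes] = patched.to_bytes(n_bytes, "big")
    return image

# A 4-byte instruction whose last 5 half-bytes hold the relative address 01036.
image = bytearray.fromhex("4B101036")
relocate(image, ["M00000105"], load_address=0x5000)
print(image.hex().upper())
```

Because only the listed fields are touched, instructions that use base-relative or program-counter-relative addressing need no modification record at all.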
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to the assembly language instructions of Intel's 8-bit microprocessors.
Example
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
(macro definition)
INCRMT X, Y
(macro call)
LOAD X
ADD Y
STORE X
(macro expansion)
(fig Source code (with macros) -> Macro Processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module introduces a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X, Y
leads to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
(macro definition)
INCRMT X, Y
(macro call)
LOAD X
ADD Y
STORE X
(macro expansion)
The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition (written ENDM in the example above)
-> Prototype for the macro
Gives the macro name and the macro parameter list; each parameter begins with &
-> Body
The statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
- In the definition of the macro: parameter
- In the macro invocation: argument
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B    prototype
INCRMT X, Y      macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
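The positional pairing and substitution just described can be sketched in a few lines. The function name expand_positional is hypothetical, and plain text replacement is used as a simplification of what a real macro processor does.

```python
def expand_positional(formals, model_statements, actuals):
    """Pair formals with actuals by position and substitute into the body."""
    if len(formals) != len(actuals):
        raise ValueError("wrong number of actual parameters")
    binding = dict(zip(formals, actuals))   # &A -> X, &B -> Y
    expanded = []
    for statement in model_statements:
        # Replace longer names first so that &AB is not mistaken for &A.
        for formal in sorted(binding, key=len, reverse=True):
            statement = statement.replace(formal, binding[formal])
        expanded.append(statement)
    return expanded

body = ["LOAD &A", "ADD &B", "STORE &A"]
for line in expand_positional(["&A", "&B"], body, ["X", "Y"]):
    print(line)
```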
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
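With keyword parameters the order of the arguments in the call no longer matters, because each value names the parameter it belongs to. A minimal sketch of that binding step, with a hypothetical function name, is:

```python
def bind_keyword_arguments(formals, call_operands):
    """Map each formal parameter to the value given for its keyword."""
    binding = {}
    for operand in call_operands:
        keyword, sep, value = operand.partition("=")
        # Accept the keyword written with or without the leading &.
        formal = "&" + keyword.strip().lstrip("&")
        if not sep or formal not in formals:
            raise ValueError(f"bad keyword argument: {operand!r}")
        binding[formal] = value.strip()
    missing = [f for f in formals if f not in binding]
    if missing:
        raise ValueError(f"no value given for {missing}")
    return binding

print(bind_keyword_arguments(["&TYPE", "&CHANNEL"], ["CHANNEL=3", "TYPE=DIRECT"]))
```

The resulting binding can then be substituted into the model statements exactly as for positional parameters.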
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro Expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish the correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
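The three steps can be put together in a small sketch that keeps an MNT and an MDT. It is a simplification under stated assumptions: definitions appear before their calls, parameters are positional, and the conditional statements AIF and AGO are not handled. The function name and table layout are illustrative.

```python
def process_macros(source):
    """Expand macros in a list of source lines; definitions must precede calls."""
    mnt = {}      # Macro Name Table: name -> (MDT index of prototype, formals)
    mdt = []      # Macro Definition Table: prototype, body lines, ENDM
    output = []
    lines = iter(source)
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "MACRO":
            # Step 1: record the name in the MNT and the definition in the MDT.
            prototype = next(lines)
            name, *rest = prototype.split(None, 1)
            formals = [p.strip() for p in rest[0].split(",")] if rest else []
            mnt[name] = (len(mdt), formals)
            mdt.append(prototype)
            for body_line in lines:
                mdt.append(body_line)
                if body_line.strip() == "ENDM":
                    break
        elif fields and fields[0] in mnt:
            # Step 2: locate the macro and pair formals with actuals.
            index, formals = mnt[fields[0]]
            actuals = [a.strip() for a in line.split(None, 1)[1].split(",")] if len(fields) > 1 else []
            binding = dict(zip(formals, actuals))
            # Step 3: copy model statements from the MDT until ENDM.
            index += 1
            while mdt[index].strip() != "ENDM":
                statement = mdt[index]
                for formal in sorted(binding, key=len, reverse=True):
                    statement = statement.replace(formal, binding[formal])
                output.append(statement)
                index += 1
        else:
            output.append(line)   # ordinary assembly statement
    return output

program = ["MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
           "INCRMT X, Y", "HLT"]
print(process_macros(program))
```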
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked
Subroutine - the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
- Macro prototype
- Statements that make up the macro body
- References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
- Role: an index to DEFTAB
- Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
- Used during the expansion of a macro invocation
- Arguments are stored in ARGTAB according to their position
(fig Macro processor tables: NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader - In this type of loader, the instructions are read line by line, their machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme - In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader - An absolute loader is a kind of loader in which already relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other, so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300, while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200, while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields, and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader (produced by the assembler) is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer, there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called
» Also called dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique, one that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
(fig Overlay)
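The notion of a path, and the check an overlay manager makes on a downward call, can be sketched with the example tree. The dictionary layout and the function names are assumptions made for illustration only.

```python
# The example overlay tree: each segment maps to its child segments.
OVERLAY_TREE = {"ROOT": ["A", "D"], "A": ["B", "C"], "D": ["E", "F"]}

def path_to(segment, tree=OVERLAY_TREE, root="ROOT"):
    """Return the sequence of segments from the root down to segment."""
    if root == segment:
        return [root]
    for child in tree.get(root, []):
        sub_path = path_to(segment, tree, child)
        if sub_path:
            return [root] + sub_path
    return []   # segment is not below this root

def segments_to_load(target, loaded):
    """Segments on the path to target that are not yet in memory."""
    return [s for s in path_to(target) if s not in loaded]

print(path_to("E"))
print(segments_to_load("B", loaded=["ROOT"]))
```

Since siblings share memory, loading a segment implicitly evicts whichever sibling occupied that area; the sketch does not model the eviction.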
The essential difference between a linkage editor and a linking loader
(fig Linking loader: Object program(s) + Library -> Linking loader -> Memory. Linkage editor: Object program(s) + Library -> Linkage editor -> Linked program -> Relocating loader -> Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions:
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
Pass 1 uses:
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) cards
5. A copy of the input (TXT cards)
6. RLD (Relocation & Linkage Directory) cards
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Pass 2 uses:
- A copy of the object program
- IPLA parameter (Initial Program Load Address), supplied by the OS or the programmer to specify the address at which to load the first segment
- PLA counter (Program Load Address), to keep track of each segment's location
- External Symbol Directory (ESD, also called the external symbol dictionary), to store each external symbol and its corresponding assigned core address
- Local External Symbol Array (LESA), to establish the correspondence between the ESD IDs used in ESD & RLD cards
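The division of work between the two passes can be sketched on a much simplified model of an object deck. Everything about the data layout here is an assumption for illustration: each segment is a dictionary with a name, a length in words, its externally visible definitions (relative addresses), its text as a list of words, and RLD-style entries of the form (offset, symbol) meaning "add the address of symbol to the word at offset".

```python
def link_and_load(segments, ipla):
    """Two-pass sketch of a direct linking loader over simplified segments."""
    # Pass 1: allocate each segment and record every external symbol's address.
    symbols = {}
    pla = ipla                       # Program Load Address counter
    for seg in segments:
        symbols[seg["name"]] = pla
        for symbol, relative in seg["defs"].items():
            symbols[symbol] = pla + relative
        pla += seg["length"]
    # Pass 2: load the text, then relocate and link using the RLD entries.
    memory = {}
    for seg in segments:
        base = symbols[seg["name"]]
        for offset, word in enumerate(seg["text"]):
            memory[base + offset] = word
        for offset, symbol in seg["rld"]:
            memory[base + offset] += symbols[symbol]
    return symbols, memory

segments = [
    {"name": "MAIN", "length": 3, "defs": {},
     "text": [0, 1, 0], "rld": [(0, "SUB"), (1, "MAIN")]},
    {"name": "SUB", "length": 2, "defs": {"ENTRY": 1},
     "text": [7, 0], "rld": [(1, "ENTRY")]},
]
symbols, memory = link_and_load(segments, ipla=100)
print(symbols)
print(memory)
```

MAIN refers to SUB before SUB has been seen, which is why the symbol table must be complete (Pass 1) before any address is adjusted (Pass 2).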
Data Structures
Algorithm
Binary Symbolic Subroutine Loader - The assembler provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text to a file rather than memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force, and it is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
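Both methods can be sketched on a small symbol table. The table contents and function names are made up for the example; each entry is a (symbol name, value) pair, and the table is kept sorted by name so that binary search is applicable.

```python
def linear_search(table, key):
    """Compare every entry in turn; return its index or -1."""
    for index, (symbol, _value) in enumerate(table):
        if symbol == key:
            return index
    return -1

def binary_search(table, key):
    """Search a table sorted by symbol name; return its index or -1."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        symbol = table[mid][0]
        if symbol == key:
            return mid
        if key < symbol:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1

# Hypothetical symbol table, ordered by symbol name.
table = [("ALPHA", 0x1009), ("BETA", 0x100C), ("FIRST", 0x1000), ("GAMMA", 0x100F)]
print(linear_search(table, "FIRST"), binary_search(table, "FIRST"))
```

Linear search examines up to n entries, while binary search halves the remaining range at each step and so needs about log2(n) comparisons.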
• Ordered table (required for binary search)
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
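A minimal hash table along these lines is sketched below. The hash function (sum of character codes modulo the table size) and the use of linear probing to resolve collisions are choices made for this example; the notes do not prescribe either.

```python
class HashTable:
    """Fixed-size hash table with slots numbered from 0."""

    def __init__(self, size=11):
        self.slots = [None] * size

    def _hash(self, key):
        # Assumed hash function: sum of character codes modulo table size.
        return sum(ord(c) for c in key) % len(self.slots)

    def put(self, key, value):
        start = index = self._hash(key)
        # Linear probing: step to the next slot while this one is taken.
        while self.slots[index] is not None and self.slots[index][0] != key:
            index = (index + 1) % len(self.slots)
            if index == start:
                raise OverflowError("hash table is full")
        self.slots[index] = (key, value)

    def get(self, key):
        start = index = self._hash(key)
        while self.slots[index] is not None:
            if self.slots[index][0] == key:
                return self.slots[index][1]
            index = (index + 1) % len(self.slots)
            if index == start:
                break
        return None

symtab = HashTable()
symtab.put("AB", 0x1000)
symtab.put("BA", 0x1003)   # same character sum as "AB", so the slots collide
print(symtab.get("AB"), symtab.get("BA"), symtab.get("ZZ"))
```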
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
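As an illustration, the first two algorithms in the list can be sketched as follows (merge sort and quick sort are not shown). Both functions return a new sorted list and leave the input unchanged; the function names are ordinary descriptive choices.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and swap it into place."""
    items = list(items)
    for i in range(len(items) - 1):
        smallest = i
        for j in range(i + 1, len(items)):
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items

def insertion_sort(items):
    """Insert each item into its place among the already sorted items."""
    items = list(items)
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]   # shift larger items one place right
            j -= 1
        items[j + 1] = current
    return items

labels = ["GAMMA", "FIRST", "BETA", "ALPHA"]
print(selection_sort(labels))
print(insertion_sort(labels))
```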
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components:
- A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
- A syntax that defines which strings of symbols are in the language of our formal system.
- A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
- the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
- the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*, where T* denotes the set of all finite strings over T.
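To make Definition 2 concrete, the sketch below enumerates a finite slice of the set of all strings over an alphabet and filters it with a membership test. The alphabet {a, b}, the example language (n a's followed by n b's), and the names `strings_up_to` and `in_L` are illustrative assumptions.

```python
from itertools import product


def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len (a finite slice of T*)."""
    for n in range(max_len + 1):
        for symbols in product(sorted(alphabet), repeat=n):
            yield "".join(symbols)


T = {"a", "b"}


def in_L(s):
    """Membership test for a hypothetical language: n a's followed by n b's."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


# The language is just the subset of strings over T that pass the test.
L_slice = [s for s in strings_up_to(T, 4) if in_L(s)]
```

Over strings of length at most 4 there are 31 candidates, and only the empty string, "ab", and "aabb" belong to this language.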
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ ->G γ (a single-step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
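The three conditions of a single-step derivation translate almost directly into code. The following is a minimal sketch, assuming every symbol is a single character and every left-hand side is non-empty; the function name `derive_one_step` and the sample productions are hypothetical.

```python
def derive_one_step(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    productions is a list of (alpha, beta) pairs. For each occurrence of
    alpha in mu we split mu = sigma + alpha + tau and emit sigma + beta + tau.
    """
    results = []
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.append(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar: S -> aSb and S -> ab
P = [("S", "aSb"), ("S", "ab")]
steps = derive_one_step("aSb", P)
```

Applying the function repeatedly, starting from the start symbol, enumerates the strings that can be derived in the grammar.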
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that consists only of terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (N the nonterminal symbols, Σ the terminal alphabet, P the production rules, S the start symbol) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15: The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16: The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17: The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
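The nesting of the grammar types can be checked mechanically. The sketch below follows the conditions as stated in these notes (length condition and start-symbol condition for Type 1, single nonterminal on the left for Type 2, terminal or terminal-plus-nonterminal on the right for Type 3). It assumes single-character symbols and writes the empty string as ""; the function name `grammar_type` and the sample grammars are hypothetical.

```python
def grammar_type(nonterminals, productions, start):
    """Return the most restrictive type (0..3) that the grammar satisfies."""
    eps_rule = (start, "") in productions
    others = [(a, b) for a, b in productions if (a, b) != (start, "")]

    # Type 1: |alpha| <= |beta| except for S -> empty, and if S -> empty
    # is present then S appears on no right-hand side.
    length_ok = all(len(a) <= len(b) for a, b in others)
    start_ok = not (eps_rule and any(start in b for _, b in productions))
    if not (length_ok and start_ok):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a in nonterminals for a, _ in productions):
        return 1

    # Type 3: right-hand side is a terminal, or a terminal then a nonterminal.
    def right_side_ok(b):
        if len(b) == 1:
            return b not in nonterminals
        return len(b) == 2 and b[0] not in nonterminals and b[1] in nonterminals

    return 3 if all(right_side_ok(b) for _, b in others) else 2


t3 = grammar_type({"S", "A"}, [("S", "aA"), ("A", "b"), ("A", "aA")], "S")
t2 = grammar_type({"S"}, [("S", "aSb"), ("S", "ab")], "S")
t0 = grammar_type({"S", "B"}, [("S", "aB"), ("Bb", "b")], "S")
```

The last grammar contains a rule of the form Bb → b, whose left side is longer than its right side, so it fails condition (a) of Type 1 and is classified as Type 0.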
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only k·n squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
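As an illustration of these conventions, the sketch below reads a small BNF specification, written with the conventional ::= separator, into a dictionary and then separates terminals from nonterminals using the rule that terminals never appear on a left side. The sample grammar and the name `parse_bnf` are assumptions made for the example; symbols are assumed to be separated by spaces, one rule per line.

```python
def parse_bnf(text):
    """Parse BNF rules into {nonterminal: [alternative, ...]}.

    Each alternative is a list of symbols. This simple reader cannot
    express a terminal that is itself the vertical bar.
    """
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


GRAMMAR = """
<expr> ::= <term> | <expr> + <term>
<term> ::= <digit> | <term> <digit>
<digit> ::= 0 | 1
"""

rules = parse_bnf(GRAMMAR)
nonterminals = set(rules)
all_symbols = {s for alts in rules.values() for alt in alts for s in alt}
terminals = all_symbols - nonterminals
```

Here the nonterminals are the three bracketed names, and the terminals that remain are +, 0, and 1.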
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages. If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read and broken into a stream of strings, producing an intermediate form of the program. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
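The token list above can be reproduced with a small regular-expression scanner. This is a minimal sketch, assuming the multiplication sign is written * and that identifiers, integer constants, and the three operators are the only lexemes; the token names and the function `tokenize` are illustrative.

```python
import re

# One named group per token class; "skip" matches blanks that carry no meaning.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("total = count + rate * 10")
```

The scanner yields seven tokens, in the same order as the list in the notes.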
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
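A syntax tree of this kind can be built by a small recursive-descent parser. The sketch below works on a list of lexemes and represents the tree as nested tuples with the operator first. It is an assumption-laden illustration: the rule expr → expr + expr is ambiguous as written, so the code simply treats + as left-associative, and the name `parse_assignment` is hypothetical.

```python
def parse_assignment(lexemes):
    """Parse `id = expr ;` and return the syntax tree as nested tuples."""
    pos = 0

    def peek():
        return lexemes[pos] if pos < len(lexemes) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def operand():
        tok = take()
        if tok.isdigit():
            return ("number", int(tok))
        if tok.isidentifier():
            return ("id", tok)
        raise SyntaxError(f"unexpected lexeme {tok!r}")

    def expr():
        node = operand()
        while peek() == "+":  # each + makes the tree so far its left child
            take("+")
            node = ("+", node, operand())
        return node

    target = take()
    if not target.isidentifier():
        raise SyntaxError(f"cannot assign to {target!r}")
    take("=")
    tree = ("=", ("id", target), expr())
    take(";")  # the terminator is checked but is not part of the tree
    if peek() is not None:
        raise SyntaxError("trailing input after ;")
    return tree


tree = parse_assignment(["x3", "=", "y", "+", "3", ";"])
```

The semicolon is consumed but does not appear in the result, which matches the technical point that the tree stands for the assignment expression rather than the statement.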
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
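Generating quads by walking the tree can be sketched as a post-order traversal that allocates a fresh temporary for each interior node. The tuple encoding of the semantic tree and the name `gen_quads` below are assumptions; unused source fields are padded with "--" to match the layout above.

```python
def gen_quads(tree):
    """Walk a semantic tree and emit (operation, target, source1, source2) quads."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):  # leaf: an identifier or a constant
            return node
        op, *children = node
        args = [walk(child) for child in children]  # operands first
        target = new_temp()
        padding = ["--"] * (2 - len(args))
        quads.append((op, target, *args, *padding))
        return target

    op, target, source = tree
    if op != "assign":
        raise ValueError("expected an assignment at the root")
    quads.append(("assign", target, walk(source), "--"))
    return quads


# Assumed encoding of the tree for x3 = y + 3 after the type conversions
# were inserted, with x3 and y already replaced by id1 and id2.
semantic_tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
quads = gen_quads(semantic_tree)
```

Because operands are visited before their operator, the temporaries come out numbered in the same order as in the sequence shown in the notes.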
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
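Both optimizations can be expressed as simple rewrites over the list of quads. The following is a sketch only: the function name `optimize` is hypothetical, and the second rewrite assumes that a temporary which is immediately copied by an assign is not used anywhere later, which a real optimizer would have to verify.

```python
def optimize(quads):
    """Apply two easy optimizations to (operation, target, source1, source2) quads."""
    # Rewrite 1: perform inttoreal on integer constants at compile time and
    # substitute the resulting real constant wherever the temporary was used.
    folded = {}
    kept = []
    for op, target, s1, s2 in quads:
        s1, s2 = folded.get(s1, s1), folded.get(s2, s2)
        if op == "inttoreal" and s1.isdigit():
            folded[target] = s1 + ".0"
        else:
            kept.append((op, target, s1, s2))

    # Rewrite 2: "compute into a temporary, then assign the temporary" becomes
    # a single quad that computes straight into the assignment target.
    result = []
    for op, target, s1, s2 in kept:
        if op == "assign" and result and result[-1][1] == s1:
            prev_op, _, p1, p2 = result.pop()
            result.append((prev_op, target, p1, p2))
        else:
            result.append((op, target, s1, s2))
    return result


before = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
after = optimize(before)
```

For the running example the four quads shrink to two: an add with the constant 3.0 and a realtoint that writes directly to id1.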
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
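The idea of the parser calling the scanner on demand, instead of reading a token file written by an earlier pass, can be imitated with a generator. In this sketch the scanner and parser are deliberately trivial stand-ins (the scanner splits on blanks and the parser only collects tokens); the point is the interleaving recorded in the log, and all names are hypothetical.

```python
def scanner(source, log):
    """Produce one lexeme at a time, only when the consumer asks for it."""
    for lexeme in source.split():
        log.append(f"scan {lexeme}")
        yield lexeme


def parser(tokens, log):
    """Stand-in parser: each loop iteration requests one more token."""
    consumed = []
    for tok in tokens:
        log.append(f"parse {tok}")
        consumed.append(tok)
    return consumed


log = []
consumed = parser(scanner("x3 = y + 3 ;", log), log)
```

The log alternates between scan and parse entries, which shows the two phases running interleaved within a single pass over the input.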
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                  calls
Contextual Analyzer        Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    calls                  calls                  calls
Syntactic Analyzer         Contextual Analyzer    Code Generator
Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
MODULE-1
Lecture_no2 Introduction to software its types machine structure
• Definition of software, its types, examples, and its applications
• What is programming?
• What is system programming?
Ans: System programming is a branch of computer science and engineering that deals with the structure of the machine. The structure of a machine may be composed of memory, I/O processor, CPU (Central Processing Unit), card reader, printer, etc. In other words, system programming requires knowledge of a high-level procedural language, data structures, and computer organization.
• Differentiate between system software and application software
• Define computer architecture
• Define computer organization
• General hardware organization of a computer system
Lecture_no3 Evolution of components of system programming
Assembler
An assembler translates assembly language into machine code. Assembly language consists of mnemonics for machine opcodes, so assemblers perform a 1:1 translation from mnemonic to a direct instruction. For example:
LDA 4 converts to 0001001000100100
Conversely, one instruction in a high-level language will translate to one or more instructions at machine level.
Advantages of using an Assembler
• Very fast in translating assembly language to machine code, as there is a 1-to-1 relationship
• Assembly code is often very efficient (and therefore fast) because it is a low-level language
• Assembly code is fairly easy to understand due to the use of English-like mnemonics
Disadvantages of using an Assembler
• Assembly language is written for a certain instruction set and/or processor
• Assembly tends to be optimised for the hardware it's designed for, meaning it is often incompatible with different hardware
• Lots of assembly code is needed to do relatively simple tasks, and complex programs require lots of programming time
Compiler
A compiler is a computer program that translates code written in a high-level language into a lower-level language (object/machine code). The most common reason for translating source code is to create an executable program (converting from a high-level language into machine language).
Advantages of using a compiler
• Source code is not included, therefore compiled code is more secure than interpreted code
• Tends to produce faster code than interpreting source code
• Produces an executable file, and therefore the program can be run without need of the source code
Disadvantages of using a compiler
• Object code needs to be produced before a final executable file; this can be a slow process
• The source code must be 100% correct for the executable file to be produced
Interpreter
An interpreter program executes other programs directly, running through the program code and executing it line by line. As it analyses every line, an interpreter is slower than running compiled code, but it can take less time to interpret program code than to compile and then run it; this is very useful when prototyping and testing code. Interpreters are written for multiple platforms, which means code written once can be run immediately on different systems without having to recompile for each. Examples of this include Flash-based web programs that will run on your PC, Mac, games console, and mobile phone.
Advantages of using an Interpreter
• Easier to debug (check for errors) than compiled code
• Easier to create multi-platform code, as each different platform would have an interpreter to run the same code
• Useful for prototyping software and testing basic program logic
Disadvantages of using an Interpreter
• Source code is required for the program to be executed, and this source code can be read, making it insecure
• Interpreted programs are generally slower than compiled programs due to the per-line translation method
1) Name the three types of program translator and describe what level of language each translates.
Answer:
• Assembler - Low Level (Assembly)
• Compiler - High Level
• Interpreter - High Level
2) Why might you want to use a compiler instead of an interpreter?
Answer: A compiler makes faster, more secure code.
3) Why might you want to use an interpreter instead of a compiler?
Answer: Interpreters allow code to run on multiple platforms, and they allow you to debug and test code without having to re-compile the entire source code.
Loader
A loader is a system program that places the object program into main memory and prepares it for execution.
• Basic functions of a loader
– Allocation
– Linking
– Relocation
– Loading
Types of Loader
• Compile-and-go Loader
• Relocating Loader
• Direct Linking Loader
• Absolute Loader
• General Loader
• Dynamic Loader
Macro & Macro processor
Macro – A macro is a single-line abbreviation for a group of instructions.
MACRO -------- Start of definition
INCR -------- Macro name
A 1,DATA
A 2,DATA Sequence of instructions to be abbreviated
A 3,DATA
MEND -------- End of definition
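The effect of a macro processor on the definition above can be sketched in a few lines of Python. This is only an illustration, assuming parameterless macros, one statement per line, and comma-separated operands; the function name `expand_macros` and the extra `L 1,DATA` statement are made up for the example.

```python
def expand_macros(source_lines):
    """Expand parameterless macros written as MACRO / name / body / MEND."""
    macros = {}   # macro name -> list of body lines
    output = []
    lines = iter(source_lines)
    for line in lines:
        stripped = line.strip()
        if stripped == "MACRO":
            name = next(lines).strip()       # the line after MACRO is the macro name
            body = []
            for body_line in lines:          # collect the body up to MEND
                if body_line.strip() == "MEND":
                    break
                body.append(body_line)
            macros[name] = body
        elif stripped in macros:
            output.extend(macros[stripped])  # a macro call is replaced by its body
        else:
            output.append(line)              # ordinary statements pass through
    return output


program = [
    "MACRO",
    "INCR",
    "A 1,DATA",
    "A 2,DATA",
    "A 3,DATA",
    "MEND",
    "INCR",        # first macro call
    "L 1,DATA",    # hypothetical ordinary instruction
    "INCR",        # second macro call
]
expanded = expand_macros(program)
for statement in expanded:
    print(statement)
```

Each `INCR` call is replaced by the three `A` instructions, and the definition itself does not appear in the expanded program.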
Linking and Linker
• Linking – The process of merging many object modules to form a single object program is called linking.
• Linker – The linker is the software program which binds many object modules to make a single object program.
Lecture_no4 Foundation of system programming
1) Formal System
• A formal system is an uninterpreted calculus.
It consists of
– An alphabet
– A set of words called axioms
– A finite set of relations called rules of inference or production rules
– Ex: Boolean algebra
2) Need of System Software
The basic need of system software is to achieve the following goals:
• To achieve efficient performance of the system
• To make effective execution of general user programs
• To make effective utilization of human resources
• To make available new, better facilities
Need of system programming
Lecture_no5 Evolution of operating system, General machine structure
OS, Job, Multiprogramming, Multitasking
Fragmentation, Paging, Scheduling
Facilities of computer OS
Operating System
• It is the collection of system programs which acts as an interface between the user and the computer hardware.
• The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient manner.
Functions of Operating System
• File handling and management
• Storage management (memory management)
• Device scheduling and management
• CPU scheduling
• Information management
• Process control (management)
• Error handling
• Protecting itself from users & protecting users from other users
What is MFT?
What is MVT?
What is fragmentation?
What is paging? Explain its various types.
What is scheduling?
What is meant by the term time sharing?
What is virtual memory?
What are simple batched systems? Explain.
Give an evolution of an operating system.
Lecture_no6 Description of Von Neumann machine's components
General machine structure, Von Neumann's components, block diagram and functions of each component of Von Neumann's machine
1 The von Neumann Computer Model
• Von Neumann computer systems contain three main building blocks:
-> the central processing unit (CPU)
-> memory
-> input/output devices (I/O)
• These three components are connected together using the system bus.
• The most prominent items within the CPU are the registers; they can be manipulated directly by a computer program. The following block diagram shows the major relationships between CPU components.
(Design of the Von Neumann architecture)
2 Components of the Von Neumann Model
-> Memory: Storage of information (data/program)
-> Processing Unit: Computation/processing of information
-> Input: Means of getting information into the computer, e.g. keyboard, mouse
-> Output: Means of getting information out of the computer, e.g. printer, monitor
• Control Unit: Makes sure that all the other parts perform their tasks correctly and at the correct time
The von Neumann Machine
(connection between CPU & Main memory)
MBR (Memory Buffer Register) = MBR contains a word to be stored in memory, or is used to receive a word from memory
MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR
IR (Instruction Register) = IR contains the 8-bit op-code of the instruction being executed
IBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instruction from a word in memory
PC (Program Counter) = PC contains the address of the next instruction pair to be fetched from memory
AC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily operands and results of ALU operations
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors, or devices for human interaction
Instruction set: The set of codes that describe all the legal operations a particular computer can execute
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location
Read-only memory (ROM): A memory that is usually very similar to RAM, except it may only be accessed, not updated
Store: The process of writing a value to a memory cell
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system, information flow in microflowchart for ADD/SUB/MULT
Memory operations
FETCH (addr)
1 Put addr into MAR
2 Tell memory unit to "load"
3 Memory copies data into MDR
STORE (addr, new-value)
1 Put addr into MAR
2 Put new-value into MDR
3 Tell memory unit to "store"
4 Memory stores data from MDR into memory cell
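The FETCH and STORE steps above can be mirrored in a small Python model. This is a sketch only: the class name `MemoryUnit` and its attributes are illustrative, and the "tell memory unit" step is modelled as a plain list access.

```python
class MemoryUnit:
    """Toy memory with a Memory Address Register and a Memory Data Register."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0   # Memory Address Register
        self.mdr = 0   # Memory Data Register

    def fetch(self, addr):
        self.mar = addr                   # 1. put addr into MAR
        self.mdr = self.cells[self.mar]   # 2-3. "load": memory copies data into MDR
        return self.mdr

    def store(self, addr, new_value):
        self.mar = addr                   # 1. put addr into MAR
        self.mdr = new_value              # 2. put new-value into MDR
        self.cells[self.mar] = self.mdr   # 3-4. "store": MDR is written to the cell


memory = MemoryUnit(16)
memory.store(5, 42)
print(memory.fetch(5))
```

Every access goes through the two registers, which is why the micro-steps of an instruction always load MAR before touching memory.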
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process
( (a + b) / (a + d) ) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+ --------> ADD Ri, Rj ( Rj <- Ri + Rj )
* --------> MULT Ri, Rj ( Rj <- Ri * Rj )
/ --------> DIV Ri, Rj ( Rj <- Rj / Ri )
a = M[A] ------> content of memory location at address A; same for B, D, E
move A, D0 ; D0 = a
move B, D4 ; D4 = b
add D0, D4 ; D4 = a + b
move D, D1 ; D1 = d
add D0, D1 ; D1 = a + d
div D1, D4 ; D4 = (a + b) / (a + d)
move E, D2 ; D2 = e
mult D0, D2 ; D2 = (a * e)
mult D4, D2 ; D2 = ( (a + b) / (a + d) ) * (a * e)
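The register program above can be checked by stepping through it with a tiny interpreter. The sketch below assumes two-operand instructions of the form (operation, source, destination), integer division with the remainder ignored, and sample values for a, b, d, and e that are chosen only for illustration.

```python
def run(program, memory):
    """Execute (op, src, dst) triples on a register file; return the registers."""
    regs = {}
    for op, src, dst in program:
        if op == "move":
            regs[dst] = memory[src]               # dst <- M[src]
        elif op == "add":
            regs[dst] = regs[src] + regs[dst]     # Rj <- Ri + Rj
        elif op == "mult":
            regs[dst] = regs[src] * regs[dst]     # Rj <- Ri * Rj
        elif op == "div":
            regs[dst] = regs[dst] // regs[src]    # Rj <- Rj / Ri, remainder ignored
        else:
            raise ValueError("unknown operation: " + op)
    return regs


program = [
    ("move", "A", "D0"), ("move", "B", "D4"), ("add", "D0", "D4"),
    ("move", "D", "D1"), ("add", "D0", "D1"), ("div", "D1", "D4"),
    ("move", "E", "D2"), ("mult", "D0", "D2"), ("mult", "D4", "D2"),
]
memory = {"A": 6, "B": 14, "D": 4, "E": 3}   # sample values, not from the notes
regs = run(program, memory)
print(regs["D2"])   # ((6 + 14) // (6 + 4)) * (6 * 3)
```

With these sample values the quotient is 2 and the product a * e is 18, so D2 ends up holding 36.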
Problem 2)
a) Provide micro-instructions in the form of detailed flowchart for the execution of the following instructions
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used
i) move X, D0 : move contents of memory location X to register D0
ii) move D0, X : move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1) : move contents from memory to memory
iv) bgtw label : branch if greater than to location "label"
Answer 2)
For each of the above, for part b), draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]     fetch
IR <-- MBR
| decode |
PC <-- PC + 2      PC points to source address
MAR <-- PC
MBR <-- M[MAR]     address of X in MBR
MAR <-- MBR        move X to MAR
MBR <-- M[MAR]     read contents of location X
D0 <-- MBR
PC <-- PC + 2      PC points to next instruction
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]     fetch
IR <-- MBR
| decode |
PC <-- PC + 2      PC points to destination address
MAR <-- PC
MBR <-- M[MAR]     read destination address X
MAR <-- MBR        MAR has X now
MBR <-- D0         source data in MBR
M[MAR] <-- MBR     data in effective address X (which is in MAR)
PC <-- PC + 2      PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
temp <--- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <-- temp
Micro-steps are
MAR <-- PC
MBR <-- M[MAR]         fetch
IR <-- MBR
| decode |
PC <-- PC + 2          PC points to source displacement
MAR <-- PC
MBR <-- M[MAR]         source displacement loaded
MAR <-- [A3 + MBR]     source effective address calculated
MBR <-- M[MAR]         source data in MBR
WR <-- MBR             saved temporarily since MBR is to be reused
PC <-- PC + 2          PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR]         destination displacement loaded
MAR <-- [A1 + MBR]     destination effective address calculated
MBR <-- WR             source data back to MBR
M[MAR] <-- MBR         source data moved to destination
PC <-- PC + 2          PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]         fetch
IR <-- MBR
| interpret / decode |
If the condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR]         get displacement
PC <-- PC + MBR        execute
If the condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit, Register, Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964 and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes), and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword, or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. Here the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose register, floating-point register, control & status register, data register, address register)
Four fields of information in an assembly statement:
1) Name field: optional; must begin with a character from the alphabet and can consist of up to 8 characters
2) Opcode: mandatory field; must be followed by a blank
3) Operands: usually represent data that could be in main storage, in registers, or part of the instruction itself
4) Comments: used to clarify instructions
(iii) Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E, and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
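The packed decimal layout described above (two 4-bit digits per byte, with the last half-byte holding the sign) can be illustrated with a short Python sketch. The function names are invented for the example, the encoder always emits sign C for positive and D for negative, and the 16-byte length limit is not enforced.

```python
def pack_decimal(value):
    """Encode an integer as packed decimal: digit nibbles followed by a sign nibble."""
    sign = 0xD if value < 0 else 0xC
    digits = [int(ch) for ch in str(abs(value))]
    if len(digits) % 2 == 0:
        digits.insert(0, 0)            # pad so that the digit count is odd
    nibbles = digits + [sign]          # an even number of nibbles in total
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def unpack_decimal(data):
    """Decode packed decimal bytes, rejecting invalid digit or sign nibbles."""
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError("invalid packed decimal")
    magnitude = int("".join(str(d) for d in digits))
    return -magnitude if sign in (0xB, 0xD) else magnitude


print(pack_decimal(123).hex())    # digits 1 2 3 and sign C
print(pack_decimal(-45).hex())    # padded digits 0 4 5 and sign D
print(unpack_decimal(bytes([0x12, 0x3F])))   # sign F also counts as positive
```

Unlike PACK and UNPK, this decoder does check validity, which makes the rules for digit and sign values explicit.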
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
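As a worked illustration of the fullword (short) floating point format, the sketch below decodes a 32-bit value as sign, 7-bit characteristic biased by 64, and 24-bit hexadecimal fraction, so that value = sign x fraction x 16^(characteristic - 64). The function name is made up for the example, and only the short format is handled.

```python
def decode_hfp_short(word):
    """Decode a 32-bit System/360 short hexadecimal floating point value."""
    sign = -1.0 if (word >> 31) & 1 else 1.0       # bit 0
    characteristic = (word >> 24) & 0x7F           # bits 1-7, exponent + 64
    fraction = (word & 0xFFFFFF) / float(1 << 24)  # bits 8-31 as a fraction < 1
    return sign * fraction * 16.0 ** (characteristic - 64)


print(decode_hfp_short(0x41100000))   # fraction 1/16, exponent 1
print(decode_hfp_short(0xC276A000))   # negative, exponent 2
```

The first word decodes to 1.0 and the second to -118.625, because 0x76A000 / 2^24 multiplied by 16^2 is 118.625.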
Lecture_no9 Various instruction formats of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four, or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R tells that there is data residing in a register, so in this type both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If it has 2, the first is a register and the second is storage. If it has 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, the storage address is specified in 3 pieces (index, base, and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I tells that the operand consists of immediate data (meaning that the data is in the instruction itself). So in this type the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S tells that there is data in storage. Here both operands have data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
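Because the opcode is always in byte 0, a program can be split into instructions by looking only at that byte. On System/360 the two high-order bits of the opcode give the length: 00 means two bytes (RR), 01 and 10 mean four bytes (RX, RS, SI), and 11 means six bytes (SS). This rule comes from the System/360 architecture rather than from the text above, and the helper names below are invented for this sketch.

```python
def instruction_length(opcode):
    """Return the instruction length in bytes from the top two opcode bits."""
    top_two_bits = (opcode >> 6) & 0b11
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[top_two_bits]


def split_instructions(code):
    """Split a byte string into consecutive instructions."""
    position, instructions = 0, []
    while position < len(code):
        length = instruction_length(code[position])
        instructions.append(code[position:position + length])
        position += length
    return instructions


# AR (opcode 1A, RR), A (opcode 5A, RX), MVC (opcode D2, SS); operands are arbitrary
code = bytes.fromhex("1A12" "5A30C010" "D203A000B000")
for instruction in split_instructions(code):
    print(instruction.hex().upper(), len(instruction))
```

Since every length is even and the program starts on a two-byte boundary, each instruction found this way also starts on a two-byte boundary.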
Addressing Mode
1 Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2 Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3 Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4 Register indirect mode: Operand is at a memory address that is the content of a register specified by the instruction itself. Example: ADD (D), 3
5 Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234, S
6 Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234], 10
7 Relative mode: Operand address is the content of PC + OpAddr. Example: ADD D, $S
8 Indexed mode: Operand address is the content of an index register + OpAddr. Example: ADD D, 500(S)
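How each addressing mode locates its operand can be summarised in one small function. This is a sketch with made-up mode names, a dictionary standing in for memory, and sample register and memory contents; implied mode is omitted because its operand is fixed by the opcode.

```python
def operand(mode, field, regs, memory, pc=0):
    """Return the operand value selected by an addressing mode."""
    if mode == "immediate":
        return field                          # the constant itself
    if mode == "register":
        return regs[field]                    # content of the named register
    if mode == "register_indirect":
        return memory[regs[field]]            # register holds the operand address
    if mode == "direct":
        return memory[field]                  # field is the absolute address
    if mode == "indirect":
        return memory[memory[field]]          # memory cell holds the operand address
    if mode == "relative":
        return memory[pc + field]             # address = PC + OpAddr
    if mode == "indexed":
        address, index_register = field
        return memory[regs[index_register] + address]   # index register + OpAddr
    raise ValueError("unknown addressing mode: " + mode)


regs = {"S": 2, "D": 100}                     # sample register contents
memory = {100: 7, 1234: 100, 502: 9, 15: 3}   # sample memory contents
print(operand("register_indirect", "D", regs, memory))
print(operand("indirect", 1234, regs, memory))
print(operand("indexed", (500, "S"), regs, memory))
```

The register indirect and indirect cases both end up reading location 100; they differ only in where the address 100 is kept.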
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
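Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
       MOVE R1, x
       MOVE R2, y
       MOVE R3, 0
       SUB R1, R2
       BN Less
       BZ Zero
       MOVE R3, +1
       BR Done        ; unconditional branch (assumed mnemonic), skips the other cases
Less   MOVE R3, -1
       BR Done        ; unconditional branch (assumed mnemonic)
Zero   MOVE R3, 0
Done   HLT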
Lecture_no11 Address modification using instruction as data ampusing index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language        Equivalent assembly
1000000100100101        LOAD R1 5
1000000101000101        LOAD R2 5
1010000100000110        ADD R0 R1 R2
1000001000000110        SAVE R0 6
1111111111111111        HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD: contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE: contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE: contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD: contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT: contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 – R2
1010001000 00 01 10
Halt the program
1111111111111111
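The bit layouts listed above are enough to write a small decoder for the simulated computer. The sketch below is an illustration only: it takes each instruction as a 16-character string of 0s and 1s, matches the fixed prefix, and reads the register (RR) and memory (MMMMM) fields. The function name `decode` is invented for the example.

```python
def decode(word):
    """Translate one 16-bit machine word (as a bit string) into assembly text."""
    def reg(bits):
        return "R%d" % int(bits, 2)

    if len(word) != 16 or set(word) - {"0", "1"}:
        raise ValueError("expected 16 binary digits")
    if word == "1" * 16:
        return "HALT"
    if word.startswith("100000010"):       # LOAD RR MMMMM
        return "LOAD %s %d" % (reg(word[9:11]), int(word[11:], 2))
    if word.startswith("100000100"):       # STORE RR MMMMM
        return "STORE %s %d" % (reg(word[9:11]), int(word[11:], 2))
    if word.startswith("100100010000"):    # MOVE RR RR
        return "MOVE %s %s" % (reg(word[12:14]), reg(word[14:16]))
    if word.startswith("1010000100"):      # ADD RR RR RR
        return "ADD %s %s %s" % (reg(word[10:12]), reg(word[12:14]), reg(word[14:16]))
    if word.startswith("1010001000"):      # SUBTRACT RR RR RR
        return "SUB %s %s %s" % (reg(word[10:12]), reg(word[12:14]), reg(word[14:16]))
    raise ValueError("unknown instruction: " + word)


program = [
    "1000000100100101",
    "1000000101000101",
    "1010000100000110",
    "1000001000000110",
    "1111111111111111",
]
listing = [decode(word) for word in program]
for line in listing:
    print(line)
```

Decoding the five-word program gives LOAD R1 5, LOAD R2 5, ADD R0 R1 R2, STORE R0 6, and HALT, that is, it doubles the value in memory location 5 and stores the result in location 6.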
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write:
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000.
But it will be placed at the address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• Pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device specific to a particular computer architecture in contrast to most high-level programming languages which are generally portable across multiple systems Assembly language is converted into executable machine code by a utility program referred to as an assembler like NASM MASM etc
   LABEL  OPCODE OPERANDS      COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6
03 ; Before execution an integer must be stored in NUMBER
04 ;
05        ORIG x3050
06        LEA R2, NUMBER
07        LDW R2, R2, #0
08        LEA R1, SIX
09        LDW R1, R1, #0
0A        AND R3, R3, #0       ; Clear R3. It will
0B                             ; contain the product
0C ; The inner loop
0D ;
0E AGAIN  ADD R3, R3, R2
0F        ADD R1, R1, #-1      ; R1 keeps track of
10        BRp AGAIN            ; the iterations
11 ;
12        HALT
13 ;
14 NUMBER BLKW 1
15 SIX    FILL x0006
16 ;
17        END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
LEA R2, NUMBER
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1 The location contains the target of a branch instruction (for example, AGAIN in line 0E)
2 The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15)
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
ORIG
ORIG tells the assembler where in memory to place the program. In line 05, ORIG x3050 says start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
FILL
FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
BLKW
BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get through with this, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
Translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to symbolic labels used by the programmer.
Disassembler
Translates machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader; the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives
• START - Specify name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command
An assembly language statement contains the following fields:
-> Label Field - An identifier and an optional field. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode Field - This field contains a mnemonic. It stands for an operation code or pseudo-op, i.e. a machine code instruction; it may require additional information, i.e. operands.
-> Operand Field - It specifies either the address or the data that the opcode requires.
-> Comment Field - Allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are 3 types:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages amp disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structures and define the format of each data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler Design can be done in
ndashSingle pass
ndashTwo pass
(Fig Two-pass assembler: Source program -> Pass 1 (OPTAB, SYMTAB) -> Intermediate file -> Pass 2 (OPTAB, SYMTAB) -> Object codes)
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code, and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction or label which has not yet been scanned by the assembler). The separate passes of a two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references unless restrictions are imposed or the references are patched later
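One common way for a one-pass assembler to cope with forward references is to leave a hole in the generated code and fill it in once the label is finally defined (back-patching). The sketch below shows that idea for a toy language; the opcode numbers in OPTAB, the "LABEL:" syntax, and the one-word-per-instruction layout are all assumptions made for the example, not features of any real assembler.

```python
OPTAB = {"LOAD": 1, "ADD": 2, "STORE": 3, "JUMP": 4, "HALT": 5}   # hypothetical opcodes


def assemble_one_pass(lines):
    """Assemble in one pass, back-patching operands that were forward references."""
    symtab = {}    # label -> address
    pending = {}   # undefined label -> indexes of code entries waiting for it
    code = []      # list of [opcode, operand_address]; index = address
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].endswith(":"):
            label = fields.pop(0)[:-1]
            symtab[label] = len(code)               # location counter = next address
            for index in pending.pop(label, []):
                code[index][1] = symtab[label]      # fill in earlier forward references
        if not fields:
            continue
        mnemonic = fields[0]
        operand = fields[1] if len(fields) > 1 else None
        address = 0
        if operand is not None:
            if operand in symtab:
                address = symtab[operand]           # backward reference, already known
            else:
                pending.setdefault(operand, []).append(len(code))   # patch later
        code.append([OPTAB[mnemonic], address])
    if pending:
        raise ValueError("undefined symbols: " + ", ".join(sorted(pending)))
    return code


source = [
    "       JUMP SKIP",     # forward reference to SKIP
    "LOOP:  ADD LOOP",
    "SKIP:  JUMP LOOP",     # backward reference to LOOP
    "       HALT",
]
print(assemble_one_pass(source))
```

The operand of the first JUMP is emitted as 0 and corrected to 2 as soon as the label SKIP is reached, so the source text is still read only once.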
(detailed pass1 assembler flowchart)
Internal data structures
• Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents
• Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels
• Location Counter (LOCCTR): This is a variable that is used to help in the assignment of addresses. LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows. A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1: H
Col 2-7: Program name
Col 8-13: Starting address (hex)
Col 14-19: Length of object program in bytes (hex)
Text record
Col 1: T
Col 2-7: Starting address of the object code in this record (hex)
Col 8-9: Length of object code in this record in bytes (hex)
Col 10-69: Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1: E
Col 2-7: Address of first executable instruction (hex) (END program_name)
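The column layout above can be checked with a small sketch that builds the three record types as strings. The function names are hypothetical, and the object code is assumed to be supplied as a string of hex digits.

```python
def header_record(name, start, length):
    """Col 1 'H', cols 2-7 program name, cols 8-13 start, cols 14-19 length."""
    # '<6.6' pads the name to 6 columns and truncates it if it is longer.
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_record(start, object_code):
    """Col 1 'T', cols 2-7 start, cols 8-9 byte count, cols 10-69 object code."""
    if len(object_code) % 2 or len(object_code) > 60:
        raise ValueError("object code must be whole bytes and at most 60 hex digits")
    return f"T{start:06X}{len(object_code) // 2:02X}{object_code}"

def end_record(first_executable):
    """Col 1 'E', cols 2-7 address of the first executable instruction."""
    return f"E{first_executable:06X}"

records = [
    header_record("COPY", 0x1000, 0x107A),
    text_record(0x1000, "141033482039"),
    end_record(0x1000),
]
```

Columns 10-69 hold 60 hex digits, that is 30 bytes, which is where the figure of 10 three-byte instructions per text record comes from.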
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve
a) Modification record
b) Base relative addressing
c) Program-counter relative addressing
Modification record
Col 1: M
Col 2-7: Starting location of the address field to be modified, relative to the beginning of the program (hex)
Col 8-9: Length of the address field to be modified, in half-bytes (hex)
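A loader could apply a modification record roughly as sketched below: it adds the actual load address to the address field that the record points at. The sketch assumes the program text is held in a bytearray whose offset 0 is the beginning of the program, and that a field with an odd number of half-bytes occupies the low-order half-bytes of its bytes. The name apply_modification is hypothetical.

```python
def apply_modification(image, record, load_address):
    """Add load_address to the address field named by one M record."""
    offset = int(record[1:7], 16)        # cols 2-7: location of the field
    half_bytes = int(record[7:9], 16)    # cols 8-9: field length in half-bytes
    n_bytes = (half_bytes + 1) // 2      # bytes that contain the field
    field = int.from_bytes(image[offset:offset + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1   # selects only the address half-bytes
    relocated = ((field & mask) + load_address) & mask
    field = (field & ~mask) | relocated  # keep any leading half-byte unchanged
    image[offset:offset + n_bytes] = field.to_bytes(n_bytes, "big")

# A 4-byte instruction 4B101036 at offset 6; its 5 half-byte address field
# starts in the byte at offset 7.
image = bytearray(10)
image[6:10] = bytes.fromhex("4B101036")
apply_modification(image, "M00000705", 0x5000)
```

After the call the instruction reads 4B106036: the address 01036 has become 06036, while the opcode byte and the flag half-byte are untouched.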
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we present one more example to highlight the salient aspects of a macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example
Macro definition
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call
INCRMT X, Y
Macro expansion
LOAD X
ADD Y
STORE X
(fig Source code (with macros) -> Macro processor -> Expanded code -> Compiler or assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit introduces a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X, Y
leads to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit.
Macro definition
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call
INCRMT X, Y
Macro expansion
LOAD X
ADD Y
STORE X
The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in defining a macro
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition
-> Prototype for the macro
The prototype gives the macro name and its parameter list; each parameter starts with &
-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the macro definition the operands are called parameters; in the macro invocation they are called arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called
Macro parameters
i) Positional parameter
The prototype statement indicates how the operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Consider the prototype and macro call statements once again
INCRMT &A, &B prototype
INCRMT X, Y macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
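Positional substitution can be sketched in a few lines: the formal parameters of the prototype are paired with the actual parameters of the call by position, and every formal parameter in a model statement is then replaced. The function names and the comma-separated operand syntax are assumptions for illustration.

```python
import re

def operands(statement):
    """Split 'NAME op1, op2' into the list of its operands."""
    parts = statement.split(None, 1)
    return [p.strip() for p in parts[1].split(",")] if len(parts) > 1 else []

def expand(prototype, model_statements, call):
    """Expand one macro call using positional parameters."""
    formals = operands(prototype)   # e.g. ['&A', '&B']
    actuals = operands(call)        # e.g. ['X', 'Y']
    if len(formals) != len(actuals):
        raise ValueError("wrong number of arguments in macro call")
    binding = dict(zip(formals, actuals))   # pair by relative position
    return [
        # Replace each &NAME by its actual parameter; leave unknown names as-is.
        re.sub(r"&\w+", lambda m: binding.get(m.group(0), m.group(0)), statement)
        for statement in model_statements
    ]

expansion = expand(
    "INCRMT &A, &B",
    ["LOAD &A", "ADD &B", "STORE &A"],
    "INCRMT X, Y",
)
```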
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
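With keyword parameters the order of the arguments no longer matters, because each value is matched to its parameter by name. A minimal sketch of that binding step is given below; the function name bind_keywords and the rule that every parameter must be supplied are assumptions (real macro processors usually also allow default values).

```python
def bind_keywords(formals, operand_field):
    """Match 'KEY=value' arguments to the named formal parameters."""
    binding = {}
    for item in operand_field.split(","):
        keyword, _, value = item.strip().partition("=")
        keyword = keyword.lstrip("&")      # accept both TYPE= and &TYPE=
        if keyword not in formals:
            raise ValueError("unknown keyword parameter " + keyword)
        binding[keyword] = value
    missing = [name for name in formals if name not in binding]
    if missing:
        raise ValueError("missing arguments: " + ", ".join(missing))
    return binding

# The arguments may be written in any order.
gener = bind_keywords(["TYPE", "CHANNEL"], "CHANNEL=3, TYPE=DIRECT")
strg = bind_keywords(["a1", "a2", "a3"], "&a3=DATA1, &a2=DATA2, &a1=DATA3")
```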
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in the MDT in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked
Subroutine: the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB, with pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position
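How the three tables cooperate in a one-pass macro processor can be sketched as follows. This is an illustration only, under several assumptions: definitions are closed by MEND, operands are separated by commas, parameters are rewritten as ?1, ?2, ... in DEFTAB, and nested definitions or calls are not handled. The function names are hypothetical.

```python
import re

def split_statement(statement):
    """Return (operation name, list of operands) of one statement."""
    parts = statement.split(None, 1)
    ops = [p.strip() for p in parts[1].split(",")] if len(parts) > 1 else []
    return parts[0], ops

def macro_process(source):
    deftab = []   # prototypes and bodies, parameters in positional notation
    namtab = {}   # macro name -> (start, end) indexes into DEFTAB
    output = []
    lines = iter(source)
    for line in lines:
        if not line.strip():
            output.append(line)
            continue
        name, operands = split_statement(line)
        if name == "MACRO":                          # DEFINE
            prototype = next(lines)
            macro, params = split_statement(prototype)
            start = len(deftab)
            deftab.append(prototype)
            for body in lines:                       # read up to MEND
                if body.strip() == "MEND":
                    break
                for position, param in enumerate(params, 1):
                    body = re.sub(re.escape(param) + r"\b", f"?{position}", body)
                deftab.append(body)
            namtab[macro] = (start, len(deftab))
        elif name in namtab:                         # EXPAND
            argtab = operands                        # arguments stored by position
            start, end = namtab[name]
            for body in deftab[start + 1:end]:
                output.append(
                    re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1)) - 1], body)
                )
        else:
            output.append(line)                      # ordinary statement
    return output

expanded = macro_process([
    "MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "MEND",
    "INCRMT X, Y",
    "HALT",
    "INCRMT P, Q",
])
```

Storing ?1 and ?2 instead of &A and &B means that expansion needs no name lookup: the number is used directly as an index into ARGTAB.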
Two pass algorithm
» Pass 1: recognize macro definitions
» Pass 2: recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity, done by the loader, is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. Thus the program becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instructions are read line by line, their machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory
2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form
3) Absolute Loader
An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed. As the name itself says, the absolute loader does nothing other than loading.
Advantages
• It is simple to implement
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management
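An absolute loader is small enough to sketch completely. The version below assumes the object program uses the Header/Text/End record layout described earlier in these notes, with memory modelled as a bytearray; the function name absolute_load is hypothetical.

```python
def absolute_load(records, memory):
    """Copy every Text record to the address it names.

    Returns the address at which execution should begin (from the End record).
    """
    for record in records:
        kind = record[0]
        if kind == "T":
            address = int(record[1:7], 16)     # cols 2-7: where the bytes go
            length = int(record[7:9], 16)      # cols 8-9: number of bytes
            data = bytes.fromhex(record[9:9 + 2 * length])
            memory[address:address + length] = data
        elif kind == "E":
            return int(record[1:7], 16)        # first executable instruction
        # The Header record carries no code, so nothing is loaded for it.
    raise ValueError("object program has no End record")

memory = bytearray(0x2000)
entry = absolute_load(
    ["HCOPY  001000000006", "T00100006141033482039", "E001000"],
    memory,
)
```

No address in the object code is changed, which is why the program only works when it is loaded at exactly the addresses it was assembled for.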
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
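The A and printf() example can be worked through in a short sketch. It assumes each segment is described by its translated start address and its length, that segments are packed one after another from a chosen load origin, and that the positions of the address fields are known; all names and the word values are hypothetical.

```python
def allocate(segments, load_origin):
    """Give each segment a load address and its relocation constant.

    segments is a list of (name, translated start address, length).
    """
    plan = {}
    next_free = load_origin
    for name, translated_start, length in segments:
        constant = next_free - translated_start   # value added to each address
        plan[name] = (next_free, constant)
        next_free += length                       # pack segments back to back
    return plan

def relocate(words, address_fields, constant):
    """Add the relocation constant to the words that hold addresses."""
    return [
        word + constant if index in address_fields else word
        for index, word in enumerate(words)
    ]

# Both programs were translated with start address 100.
plan = allocate([("A", 100, 100), ("printf", 100, 50)], load_origin=500)

# Words 0 and 2 of A hold addresses; word 1 is an ordinary constant.
relocated = relocate([110, 7, 150], {0, 2}, plan["A"][1])
```

Only the address fields change: the constant 7 is left alone, which is the sense in which relocation is an adjustment of address fields rather than a movement of the program.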
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of the assembler for a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
As we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time
-> Postpones the linking function until execution time
» a subroutine is loaded and linked to the rest of the program when it is first called. This is also known as dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs
» all linking and relocation operations
» automatic library search
» loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique; they date back to before 1960 and are still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls do not require any linker help, since the entire path from the root is already loaded.
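The bookkeeping done by an overlay manager for the example tree can be sketched as below. Only the decision of which segments must be brought in is modelled; actual loading of code is not. The class name OverlayManager and the PARENT table are hypothetical.

```python
# The example overlay tree: each segment maps to its parent.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Sequence of segments from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]

class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]          # the root is loaded at program start

    def call(self, target):
        """Ensure the path to target is loaded; return what had to be loaded."""
        wanted = path_to(target)
        keep = 0                        # length of the part already in memory
        while (keep < min(len(wanted), len(self.loaded))
               and wanted[keep] == self.loaded[keep]):
            keep += 1
        newly_loaded = wanted[keep:]
        if newly_loaded:
            # New segments overwrite their siblings and everything below them.
            self.loaded = wanted
        return newly_loaded

manager = OverlayManager()
first = manager.call("B")    # the root calls a routine in B
second = manager.call("A")   # upward call: nothing to load
third = manager.call("E")    # D and E replace A and B
```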
(fig Overlay)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case the code can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution
2 The system can be dynamically reconfigured
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine is loaded into main memory
-> Let us take a quick look at which component performs each of the above four functions for our three loaders: absolute loader, relocating loader and direct linking loader
Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error, unless B is declared as an external symbol
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
Inputs and data structures used in pass 1
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory) card
7 END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Inputs and data structures used in pass 2
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Directory (ESD, also called the external symbol dictionary): stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD ID used in the ESD and RLD cards
Data Structures
Algorithm
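The two passes can be sketched on a much simplified object format. The sketch below is not the card-based algorithm itself: each segment is assumed to be a dictionary with a name, a length, the symbols it defines (relative addresses), its text as a list of words, and RLD entries of the form (offset, symbol) meaning "add the address of symbol to the word at offset". All of these names are hypothetical.

```python
def direct_linking_load(segments, ipla):
    """Load, relocate and link the given segments starting at address ipla."""
    # Pass 1: assign a load address to every segment and record the
    # absolute address of every external symbol.
    symbols = {}
    pla = ipla                                   # program load address counter
    for segment in segments:
        definitions = {segment["name"]: 0, **segment["defs"]}
        for symbol, relative in definitions.items():
            if symbol in symbols:
                raise ValueError("duplicate external symbol " + symbol)
            symbols[symbol] = pla + relative
        pla += segment["length"]

    # Pass 2: load the text, then adjust every word named by an RLD entry.
    memory = {}
    for segment in segments:
        base = symbols[segment["name"]]
        for offset, word in enumerate(segment["text"]):
            memory[base + offset] = word
        for offset, symbol in segment["rld"]:
            if symbol not in symbols:
                raise ValueError("undefined external symbol " + symbol)
            memory[base + offset] += symbols[symbol]   # relocation or linking
    return memory, symbols

segments = [
    {"name": "MAIN", "length": 3, "defs": {},
     "text": [10, 0, 1], "rld": [(1, "SUB"), (2, "MAIN")]},
    {"name": "SUB", "length": 2, "defs": {"ENTRY": 1},
     "text": [20, 21], "rld": []},
]
memory, symbols = direct_linking_load(segments, ipla=100)
```

An RLD entry that names the segment itself performs relocation, while one that names a symbol of another segment performs linking; both are the same addition once pass 1 has fixed every address.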
Binary Symbolic Subroutine Loader
The assembler assembles the program and provides the loader with
-> the object program + relocation information
-> prefixed with information about all other programs it references (the transfer vector)
-> the length of the entire program
-> the length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• It outputs the text to a file rather than to memory
- the output is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> A programming language plays a role not only in designing and implementing a system but also in teaching and supporting it
-> Language designers balance
- making computing convenient for people, with
- making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that is not limited to one type of computer or designed for one specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as its input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
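The three search methods can be compared on a small symbol table. The sketch below is illustrative: the table entries are (symbol name, address) pairs with made-up values, the hash function simply sums the character codes, and collisions are handled by keeping a short list in each slot (one of several possible strategies).

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _) in enumerate(table):
        if name == key:
            return index
    return -1

def binary_search(table, key):
    """Search a table sorted by name by repeatedly halving the range."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1

class HashTable:
    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]   # slots are numbered from 0

    def _slot(self, key):
        # Hash function: sum of the character codes modulo the table size.
        return sum(key.encode("ascii")) % len(self.slots)

    def put(self, key, value):
        chain = self.slots[self._slot(key)]
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)          # replace an existing entry
                return
        chain.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None

table = [("ALPHA", 0x1009), ("FIRST", 0x1000), ("ONE", 0x100C)]  # sorted by name
symtab = HashTable()
for name, address in table:
    symtab.put(name, address)
```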
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and we should keep in mind what we are trying to model
• A syntax that defines which strings of symbols are in the language of our formal system
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) describes the act of drawing a conclusion based on the form of the premises. It is interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain the nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written
μ =>G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
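The definition of a single-step derivation can be tried out with a minimal Python sketch. It assumes that strings are ordinary Python strings and that each production is a pair (alpha, beta); the function name derive_once and the sample grammar S -> aSb | ab are illustrative assumptions, not part of the notes.

```python
def derive_once(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each occurrence of alpha in mu,
    mu is split as sigma + alpha + tau and gamma = sigma + beta + tau.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar used only for illustration: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
print(sorted(derive_once("aSb", P)))
```

Applying derive_once repeatedly, starting from the start symbol, enumerates sentential forms; a string to which no rule applies yields the empty set.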
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that contains only terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (nonterminals N, terminal alphabet Σ, production rules P, start symbol S) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15: The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
Adding a production rule of the form Bb → b to the modified grammar will result in a non-Type 1 grammar, because condition (a) is violated.
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16: The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
Adding a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β that is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17: The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
Adding a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
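The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The following Python sketch is one possible reading of those conditions; it assumes that every grammar symbol is a single character, that a production is a pair (alpha, beta) with the empty string standing for ε, and that the function name grammar_type and the sample grammars in the usage lines are hypothetical.

```python
def grammar_type(nonterminals, productions, start):
    """Return the most restrictive type (0 to 3) whose conditions hold."""
    eps_rule = (start, "")                      # the rule S -> epsilon
    others = [p for p in productions if p != eps_rule]
    has_eps = eps_rule in productions

    # Type 1: (a) |alpha| <= |beta| except for S -> epsilon,
    #         (b) if S -> epsilon is present, S never occurs on a right side.
    cond_a = all(len(a) <= len(b) for a, b in others)
    cond_b = not (has_eps and any(start in b for _, b in productions))
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a in nonterminals for a, _ in productions):
        return 1

    # Type 3: every right side is a terminal, or a terminal then a nonterminal.
    def right_linear(b):
        if len(b) == 1:
            return b not in nonterminals
        return len(b) == 2 and b[0] not in nonterminals and b[1] in nonterminals

    return 3 if all(right_linear(b) for _, b in others) else 2


print(grammar_type({"S", "A"}, [("S", "aA"), ("A", "b")], "S"))
print(grammar_type({"S", "A"}, [("S", "aA"), ("A", "bb")], "S"))
```

The second call mirrors the remark above that adding a rule such as B → bb takes a grammar out of Type 3 while leaving it Type 2.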
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar "|", indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair < >.
The "::=" means that the symbol on the left must be replaced with the expression on the right.
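To make the notation concrete, here is a small Python sketch that reads a BNF specification and separates terminals from nonterminals using the rule stated above (a symbol is a nonterminal exactly when it appears on a left side). The two-rule grammar for binary numbers and the function name parse_bnf are assumptions chosen for illustration; the sketch also assumes one rule per line and that "|" never occurs as a terminal.

```python
def parse_bnf(text):
    """Map each left-side symbol to its list of alternatives.

    Every alternative is a list of symbols (split on whitespace).
    """
    rules = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("::=")
        rules[lhs.strip()] = [alt.split() for alt in rhs.split("|")]
    return rules


# Hypothetical specification: binary numbers of one or more digits.
spec = """
<digit> ::= 0 | 1
<number> ::= <digit> | <digit> <number>
"""

rules = parse_bnf(spec)
nonterminals = set(rules)
terminals = {s for alts in rules.values() for alt in alts for s in alt} - nonterminals
print(sorted(nonterminals), sorted(terminals))
```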
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems are very useful for explicitly and concisely defining sets of strings, such as computer languages. They are more useful still if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form is built. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of a Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
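The token list above can be reproduced with a short scanner. This is a minimal Python sketch built on the standard re module; the token names (IDENT, ASSIGN and so on) and the function name tokenize are assumptions made for the example, and only the symbols needed for this one statement are covered.

```python
import re

# (token name, regular expression) pairs; SKIP covers blanks between tokens.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into (token name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

The scanner yields seven tokens in the same order as the list above; blanks are consumed but never reported.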
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable: about 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The usual phase diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
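The mapping from lexemes to <token-name, attribute-value> pairs can be sketched as follows. This Python fragment is only an illustration: the function name scan, the use of a plain list as the symbol table, and the choice to store the number's text as its attribute are all assumptions. Note that re.findall silently skips characters that match no pattern, which a real scanner would report as errors.

```python
import re


def scan(source):
    """Return (tokens, symbol_table) for a statement such as 'x3 = y + 3;'."""
    symbol_table = []                  # entry i-1 holds the name for <id,i>
    tokens = []
    # No recursion is needed: identifiers, numbers and symbols are regular.
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))   # second component is ignored
    return tokens, symbol_table


print(scan("x3 = y + 3;"))
print(scan("x3   =y+ 3  ;") == scan("x3 = y + 3;"))
```

Differently spaced versions of the statement give identical token streams, while "x 3 = y + 3;" does not, because x and 3 become separate lexemes.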
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a tree rooted at an assignment statement.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition that the parse tree expresses.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. For the example above, the syntax tree has = at the root with children x3 and +, and the + node has children y and 3.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
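A syntax tree for the example statement can be built with a small hand-written parser. The Python sketch below assumes the token list is already available as (kind, text) pairs, represents tree nodes as nested tuples, and treats + as left-associative; the grammar rule expr → expr + expr does not fix the associativity, so that choice, like the function name parse_assignment, is an assumption of this sketch.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree as nested tuples."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        text = tokens[pos][1]
        pos += 1
        return text

    def operand():
        # expr -> number | id
        if pos < len(tokens) and tokens[pos][0] == "number":
            return ("number", expect("number"))
        return ("id", expect("id"))

    def expr():
        # expr -> expr + expr, handled by iteration (left-associative)
        nonlocal pos
        left = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            pos += 1
            left = ("+", left, operand())
        return left

    target = ("id", expect("id"))
    expect("=")
    tree = ("=", target, expr())
    expect(";")                        # the statement terminator
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return tree


tokens = [("id", "x3"), ("=", "="), ("id", "y"), ("+", "+"), ("number", "3"), (";", ";")]
print(parse_assignment(tokens))
```

The operator = ends up at the root and + as an interior node, with the identifiers and the number as leaves.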
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int".
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
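The walk over the semantic tree that produces these quads can be sketched in a few lines of Python. The tree encoding (nested tuples whose first element is the operation name) and the function name gen_quads are assumptions made for this illustration; leaves are plain strings such as "id2" or "3".

```python
def gen_quads(tree):
    """Walk the semantic tree bottom-up and emit (operation, target, source1, source2)."""
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):          # leaf: identifier or constant
            return node
        op, *children = node
        sources = [walk(child) for child in children]
        counter += 1
        temp = f"temp{counter}"            # idealized machine: unlimited temporaries
        padding = ["--"] * (2 - len(sources))
        quads.append((op, temp, *sources, *padding))
        return temp

    _, target, value = tree                # the root is the assignment
    quads.append(("assign", target, walk(value), "--"))
    return quads


# Semantic tree for x3 = y + 3 with the conversion operators inserted.
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
for quad in gen_quads(tree):
    print(*quad)
```

Each interior node receives a fresh temporary, which is why the output contains temp1 through temp3 even though some of them are used only once.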
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
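Both optimizations can be expressed as one pass over the list of quads. The Python sketch below is a deliberately narrow illustration, not a general optimizer: the function name optimize is hypothetical, constant folding is only done for inttoreal applied to an integer literal, and merging an assign into the preceding quad assumes that the temporary being copied is not used anywhere else.

```python
def optimize(quads):
    """Fold inttoreal on constants and merge a trailing assign into its producer."""
    constants = {}                         # temp name -> folded constant text
    out = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{src1}.0"    # conversion done at compile time
            continue
        if op == "assign" and out and out[-1][1] == src1:
            prev_op, _, a, b = out.pop()       # retarget the previous quad
            out.append((prev_op, target, a, b))
            continue
        out.append((op, target, src1, src2))
    return out


quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
for quad in optimize(quads):
    print(*quad)
```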
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, #3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code instruction, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
A grammar is ambiguous if there is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical single pass compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                  calls
Contextual Analyzer        Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi pass compiler:
Compiler Driver
    calls                  calls                  calls
Syntactic Analyzer         Contextual Analyzer    Code Generator
Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> Object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no3 Evolution of components of system programming
Assembler
An assembler translates assembly language into machine code. Assembly language consists of mnemonics for machine opcodes, so assemblers perform a 1:1 translation from mnemonic to a direct instruction. For example:
LDA 4 converts to 0001001000100100
Conversely, one instruction in a high-level language will translate to one or more instructions at machine level.
Advantages of using an Assembler
• Very fast in translating assembly language to machine code, as there is a 1-to-1 relationship.
• Assembly code is often very efficient (and therefore fast) because it is a low-level language.
• Assembly code is fairly easy to understand due to the use of English-like mnemonics.
Disadvantages of using an Assembler
• Assembly language is written for a certain instruction set and/or processor.
• Assembly tends to be optimised for the hardware it is designed for, meaning it is often incompatible with different hardware.
• Lots of assembly code is needed to do relatively simple tasks, and complex programs require lots of programming time.
Compiler
A compiler is a computer program that translates code written in a high-level language to a lower-level language (object/machine code). The most common reason for translating source code is to create an executable program (converting from a high-level language into machine language).
Advantages of using a compiler
• Source code is not included, therefore compiled code is more secure than interpreted code.
• Tends to produce faster code than interpreting source code.
• Produces an executable file, and therefore the program can be run without need of the source code.
Disadvantages of using a compiler
• Object code needs to be produced before a final executable file; this can be a slow process.
• The source code must be 100% correct for the executable file to be produced.
Interpreter
An interpreter program executes other programs directly, running through program code and executing it line by line. As it analyses every line, an interpreter is slower than running compiled code, but it can take less time to interpret program code than to compile and then run it; this is very useful when prototyping and testing code. Interpreters are written for multiple platforms; this means code written once can be run immediately on different systems without having to recompile for each. Examples of this include Flash-based web programs that will run on your PC, Mac, games console, and mobile phone.
Advantages of using an Interpreter
• Easier to debug (check errors) than with a compiler.
• Easier to create multi-platform code, as each different platform would have an interpreter to run the same code.
• Useful for prototyping software and testing basic program logic.
Disadvantages of using an Interpreter
• Source code is required for the program to be executed, and this source code can be read, making it insecure.
• Interpreters are generally slower than compiled programs due to the per-line translation method.
1) Name the three types of program translator and describe what level of language each translates.
Answer:
Assembler - low level (assembly)
Compiler - high level
Interpreter - high level
2) Why might you want to use a compiler instead of an interpreter?
A compiler makes faster, more secure code.
3) Why might you want to use an interpreter instead of a compiler?
Interpreters allow code to run on multiple platforms, and they allow you to debug and test code without having to recompile the entire source code.
Loader
A loader is a system program that places the object program into main memory and prepares it for execution.
• Basic functions of a loader
– Allocation
– Linking
– Relocation
– Loading
Types of Loader-
• Compile-and-go Loader
• Relocating Loader
• Direct Linking Loader
• Absolute Loader
• General Loader
• Dynamic Loader
Macro & Macro processor
Macro – A macro is a single-line abbreviation for a group of instructions.
MACRO -------- Start of definition
INCR -------- Macro name
A 1,DATA
A 2,DATA -------- Sequence of instructions to be abbreviated
A 3,DATA
MEND -------- End of definition
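How a macro processor uses such a definition can be sketched in Python. The fragment below assumes the simplest possible case: macros without parameters, one statement per line, the macro name alone on the line after MACRO, and a call written as the bare macro name. The function name expand_macros and the extra statement L 1,X in the usage lines are hypothetical.

```python
def expand_macros(lines):
    """Record MACRO ... MEND definitions and replace each call by the macro body."""
    macros = {}                            # macro name -> list of body lines
    output = []
    it = iter(lines)
    for line in it:
        word = line.strip()
        if word == "MACRO":
            name = next(it).strip()        # the line after MACRO names the macro
            body = []
            for body_line in it:           # collect lines up to MEND
                if body_line.strip() == "MEND":
                    break
                body.append(body_line)
            macros[name] = body
        elif word in macros:
            output.extend(macros[word])    # macro call: substitute the body
        else:
            output.append(line)            # ordinary statement: copy unchanged
    return output


source = ["MACRO", "INCR", "A 1,DATA", "A 2,DATA", "A 3,DATA", "MEND",
          "INCR", "L 1,X", "INCR"]
for line in expand_macros(source):
    print(line)
```

The definition itself produces no output; each later occurrence of INCR is replaced by the three A instructions.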
Linking and Linker
• Linking – The process of merging many object modules to form a single object program is called linking.
• Linker – The linker is the software program which binds many object modules to make a single object program.
Lecture_no4 Foundation of system programming
1) Formal System
• A formal system is an uninterpreted calculus.
It consists of
- An alphabet
- A set of words called axioms
- A finite set of relations called rules of inference or production rules
- Ex: Boolean algebra
2) Need of System Software
The basic need of system software is to achieve the following goals:
• To achieve efficient performance of the system
• To make effective execution of general user programs
• To make effective utilization of human resources
• To make available new and better facilities
Need of system programming
Lecture_no5 Evolution of operating system, General machine structure
OS, Job, Multiprogramming, Multitasking
Fragmentation, Paging, Scheduling
Facilities of computer OS
Operating System
• It is the collection of system programs which acts as an interface between the user and the computer hardware.
• The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient manner.
Functions of Operating System
• File handling and management
• Storage management (memory management)
• Device scheduling and management
• CPU scheduling
• Information management
• Process control (management)
• Error handling
• Protecting itself from users and protecting users from other users
What is MFT?
What is MVT?
What is fragmentation?
What is paging? Explain its various types.
What is scheduling?
What is meant by the term time sharing?
What is virtual memory?
What are simple batched systems? Explain.
Give the evolution of an operating system.
Lecture_no6 Description of Von Neumann machine components
General machine structure, Von Neumann's components, block diagram and functions of each component of Von Neumann's machine
1 The von Neumann Computer Model
• Von Neumann computer systems contain three main building blocks:
-> the central processing unit (CPU)
-> memory
-> input/output devices (I/O)
• These three components are connected together using the system bus.
• The most prominent items within the CPU are the registers; they can be manipulated directly by a computer program. The following block diagram shows the major relationships between CPU components.
(Design of the VonNeumann architecture)
2 Components of the Von Neumann Model
-> Memory: storage of information (data/program)
-> Processing unit: computation/processing of information
-> Input: means of getting information into the computer, e.g. keyboard, mouse
-> Output: means of getting information out of the computer, e.g. printer, monitor
• Control unit: makes sure that all the other parts perform their tasks correctly and at the correct time
The von Neumann Machine
(connection between CPU amp Main memory)
MBR (Memory Buffer Register) = MBR contains a word to be stored in memory, or is used to receive a word from memory.
MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR.
IR (Instruction Register) = IR contains the 8-bit opcode of the instruction being executed.
IBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instruction from a word in memory.
PC (Program Counter) = PC contains the address of the next instruction pair to be fetched from memory.
AC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily operands and results of ALU operations.
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit.
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete.
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors, or devices for human interaction.
Instruction set: The set of codes that describe all the legal operations a particular computer can execute.
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform.
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture.
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location.
Read-only memory (ROM): A memory that is usually very similar to RAM, except that it may only be accessed, not updated.
Store: The process of writing a value to a memory cell.
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system: information flow in microflowchart for ADD/SUB/MULT
Memory operations
FETCH (addr)
1. Put addr into MAR
2. Tell memory unit to "load"
3. Memory copies data into MDR
STORE (addr, new-value)
1. Put addr into MAR
2. Put new-value into MDR
3. Tell memory unit to "store"
4. Memory stores data from MDR into memory cell
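The FETCH and STORE steps can be tried out with a small simulation. The following is only a minimal sketch: the class name MemoryUnit and the helper names fetch and store are made up for illustration, and memory is assumed to be a simple list of cells.

```python
class MemoryUnit:
    """Toy memory with a MAR (address register) and an MDR (data register)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0  # memory address register
        self.mdr = 0  # memory data register

    def load(self):
        # "load" signal: memory copies the addressed cell into MDR
        self.mdr = self.cells[self.mar]

    def store(self):
        # "store" signal: memory copies MDR into the addressed cell
        self.cells[self.mar] = self.mdr


def fetch(memory, addr):
    memory.mar = addr        # 1. put addr into MAR
    memory.load()            # 2. tell memory unit to "load"
    return memory.mdr        # 3. data is now in MDR


def store(memory, addr, new_value):
    memory.mar = addr        # 1. put addr into MAR
    memory.mdr = new_value   # 2. put new-value into MDR
    memory.store()           # 3./4. memory stores MDR into the cell


memory = MemoryUnit(16)
store(memory, 3, 99)
print(fetch(memory, 3))
```

Every access goes through MAR and MDR, which is why these two registers appear in every micro-step sequence of this lecture.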
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R1-R7) to evaluate the following expression. Ignore the remainder part of the division process.
( (a + b) / (a + d) ) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+ --------> ADD Ri, Rj ( Rj <- Ri + Rj )
* --------> MULT Ri, Rj ( Rj <- Ri * Rj )
/ --------> DIV Ri, Rj ( Rj <- Rj / Ri )
a = M[A] ------> content of memory location at address A; same for B, D, E
move A, D0      ; D0 = a
move B, D4      ; D4 = b
add  D0, D4     ; D4 = a + b
move D, D1      ; D1 = d
add  D0, D1     ; D1 = a + d
div  D1, D4     ; D4 = (a + b) / (a + d)
move E, D2      ; D2 = e
mult D0, D2     ; D2 = (a * e)
mult D4, D2     ; D2 = ( (a + b) / (a + d) ) * (a * e)
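To check the register program by hand, each instruction can be mirrored as one assignment. This is a rough sketch only: the function name evaluate is invented, memory is modelled as a dictionary keyed by the address names A, B, D, E, and integer division stands in for "ignore the remainder".

```python
def evaluate(memory):
    reg = {}
    reg["D0"] = memory["A"]               # move A, D0
    reg["D4"] = memory["B"]               # move B, D4
    reg["D4"] = reg["D0"] + reg["D4"]     # add  D0, D4  -> a + b
    reg["D1"] = memory["D"]               # move D, D1
    reg["D1"] = reg["D0"] + reg["D1"]     # add  D0, D1  -> a + d
    reg["D4"] = reg["D4"] // reg["D1"]    # div  D1, D4  -> quotient only
    reg["D2"] = memory["E"]               # move E, D2
    reg["D2"] = reg["D0"] * reg["D2"]     # mult D0, D2  -> a * e
    reg["D2"] = reg["D4"] * reg["D2"]     # mult D4, D2  -> final result
    return reg["D2"]


print(evaluate({"A": 6, "B": 6, "D": 0, "E": 2}))
```

With a = 6, b = 6, d = 0, e = 2 the quotient (a + b) / (a + d) is 2 and a * e is 12, so the result left in D2 is 24.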
Problem 2)
a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0               move contents of memory location X to register D0
ii) move D0, X              move contents of register D0 to memory location X
iii) movea 22(a3),25(a1)    move contents from memory to memory
iv) bgtw label              branch if greater than to location "label"
Answer 2)
For each of the above, for part b), draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, address and data registers, ALU) and memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR  <-- MBR
| decode |
PC  <-- PC + 2      PC points to source address
MAR <-- PC
MBR <-- M[MAR]      address of X in MBR
MAR <-- MBR         move X to MAR
MBR <-- M[MAR]      read contents of location X
D0  <-- MBR
PC  <-- PC + 2      PC points to next instruction
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR  <-- MBR
| decode |
PC  <-- PC + 2      PC points to destination address
MAR <-- PC
MBR <-- M[MAR]      read destination address X
MAR <-- MBR         MAR has X now
MBR <-- D0          source data in MBR
M[MAR] <-- MBR      data in effective address X (which is in MAR)
PC  <-- PC + 2      PC points to next instruction
iii) movea 22(a3),25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
temp <-- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <-- temp
Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR  <-- MBR
| decode |
PC  <-- PC + 2      PC points to source displacement
MAR <-- PC
MBR <-- M[MAR]      source displacement loaded
MAR <-- [A3 + MBR]  source effective address calculated
MBR <-- M[MAR]      source data in MBR
WR  <-- MBR         saved temporarily since MBR is to be reused
PC  <-- PC + 2      PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR]      destination displacement loaded
MAR <-- [A1 + MBR]  destination effective address calculated
MBR <-- WR          source data back to MBR
M[MAR] <-- MBR      source data moved to destination
PC  <-- PC + 2      PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a)
MAR <-- PC
MBR <-- M[MAR]      fetch
IR  <-- MBR
| interpret / decode |
If the condition is TRUE:
PC  <-- PC + 2
MAR <-- PC
MBR <-- M[MAR]      get displacement
PC  <-- PC + MBR    execute
If the condition is FALSE:
PC  <-- PC + 4
Lecture_no8 Memory unit, Register, Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, each specified by the address of its leftmost (lowest-addressed) byte. Within a halfword, fullword, doubleword or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses of these units require aligning them on the corresponding boundaries. In these notes, the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose registers, floating-point registers, control and status registers, data registers, address registers).
An assembly statement has four fields of information:
1) Name field - an optional field; it must begin with a character from the alphabet and can consist of up to 8 characters
2) Opcode - a mandatory field; it must be followed by a blank
3) Operands - usually represent data that could be in main storage, in registers, or part of the instruction itself
4) Comments - used to clarify instructions
(iii)Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E, and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
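The packed decimal layout can be illustrated with a short decoder. This is just a sketch under the rules stated above; the function name unpack_decimal is invented, and unlike PACK and UNPK it does reject invalid digit and sign nibbles.

```python
def unpack_decimal(data: bytes) -> int:
    """Decode a packed decimal field: digit nibbles followed by one sign nibble."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)      # high-order nibble
        nibbles.append(byte & 0x0F)    # low-order nibble
    *digits, sign = nibbles            # the last nibble is the sign
    if any(d > 9 for d in digits):
        raise ValueError("digit nibble A-F is invalid")
    if sign < 0xA:
        raise ValueError("sign nibble 0-9 is invalid")
    value = int("".join(str(d) for d in digits))
    # B and D mean negative; A, C, E and F mean positive
    return -value if sign in (0xB, 0xD) else value


print(unpack_decimal(bytes([0x12, 0x3C])))   # digits 1 2 3, sign C
print(unpack_decimal(bytes([0x12, 0x3D])))   # digits 1 2 3, sign D
```

Two bytes hold three digits plus the sign, which is why a packed field always contains an odd number of digits.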
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
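As a worked illustration of the fullword format, the value of a number is sign x fraction x 16^(characteristic - 64). The sketch below decodes a 32-bit word that way; the function name decode_hex_float32 is invented for this example.

```python
def decode_hex_float32(word: int) -> float:
    """Decode a 32-bit hexadecimal floating point word (bit 0 is the leftmost bit)."""
    sign = -1.0 if (word >> 31) & 1 else 1.0      # bit 0
    characteristic = (word >> 24) & 0x7F          # bits 1-7, biased by 64
    fraction = (word & 0xFFFFFF) / float(1 << 24) # bits 8-31 as a fraction
    return sign * fraction * 16.0 ** (characteristic - 64)


print(decode_hex_float32(0x41100000))
print(decode_hex_float32(0xC276A000))
```

For 0x41100000 the characteristic is 65 and the fraction is 1/16, giving 16 x 1/16 = 1.0. For 0xC276A000 the sign bit is set, the characteristic is 66 and the fraction is 0x76A000 / 2^24, giving -118.625.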
Lecture_no9 Various instruction formats of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R tells that there is data residing in a register. In this type both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second is a storage location. If there are 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, the storage address is specified in 3 pieces (index, base and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I tells that the operand consists of immediate data (meaning that the data is in the instruction itself). In this type the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S tells that there is data in storage. Here both operands have data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
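Because the opcode sits in byte 0, the length of an instruction can be found before the rest of it is read. The sketch below relies on a System/360 convention that is not stated in these notes: the two high-order bits of the opcode select the length (00 = two bytes, 01 and 10 = four bytes, 11 = six bytes). The function name instruction_length is invented.

```python
def instruction_length(opcode: int) -> int:
    """Return the instruction length in bytes from the two high-order opcode bits."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in one byte")
    # 00 -> RR (2 bytes), 01 -> RX (4), 10 -> RS or SI (4), 11 -> SS (6)
    return (2, 4, 4, 6)[opcode >> 6]


for name, opcode in [("AR", 0x1A), ("L", 0x58), ("MVI", 0x92), ("MVC", 0xD2)]:
    print(name, instruction_length(opcode))
```

The four sample opcodes are the usual S/360 encodings of AR (RR), L (RX), MVI (SI) and MVC (SS).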
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4. Register indirect mode: Operand is in the memory address held in a register specified by the instruction itself. Example: ADD (D), 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234, S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234], 10
7. Relative mode: Content of PC + OpAddr. Example: ADD D, $S
8. Indexed mode: Content of an index register + OpAddr. Example: ADD D, 500(S)
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
      MOVE R1, x
      MOVE R2, y
      MOVE R3, 0
      SUB  R1, R2
      BN   Less
      BZ   Zero
      MOVE R3, +1
      HLT
Less  MOVE R3, -1
      HLT
Zero  MOVE R3, 0
      HLT
Lecture_no11 Address modification using instruction as data ampusing index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language        Equivalent assembly
1000000100100101        LOAD R1 5
1000000101000101        LOAD R2 5
1010000100000110        ADD R0 R1 R2
1000001000000110        STORE R0 6
1111111111111111        HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 - R2
1010001000 00 01 10
Halt the program
1111111111111111
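The bit layouts above are enough to write a small decoder for the simulated computer. The following is a minimal sketch; the function names decode and register are invented, and instructions are taken as 16-character strings of 0s and 1s.

```python
def register(bits: str) -> str:
    """Turn a 2-bit register field into a name such as R2."""
    return f"R{int(bits, 2)}"


def decode(word: str) -> str:
    """Translate one 16-bit machine word into its assembly form."""
    if len(word) != 16 or set(word) - {"0", "1"}:
        raise ValueError("expected a 16-bit binary string")
    if word == "1" * 16:
        return "HALT"
    if word.startswith("100000010"):       # LOAD  RR MMMMM
        return f"LOAD {register(word[9:11])} {int(word[11:], 2)}"
    if word.startswith("100000100"):       # STORE RR MMMMM
        return f"STORE {register(word[9:11])} {int(word[11:], 2)}"
    if word.startswith("100100010000"):    # MOVE  RR RR
        return f"MOVE {register(word[12:14])} {register(word[14:16])}"
    if word.startswith("1010000100"):      # ADD   RR RR RR
        fields = (word[10:12], word[12:14], word[14:16])
        return "ADD " + " ".join(register(f) for f in fields)
    if word.startswith("1010001000"):      # SUB   RR RR RR
        fields = (word[10:12], word[12:14], word[14:16])
        return "SUB " + " ".join(register(f) for f in fields)
    raise ValueError("unknown opcode")


program = [
    "1000000100100101",
    "1000000101000101",
    "1010000100000110",
    "1000001000000110",
    "1111111111111111",
]
for word in program:
    print(word, decode(word))
```

Running it on the five-word example program reproduces the assembly listing given earlier, one instruction per word.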
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; it sets the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM, MASM, etc.
   LABEL    OPCODE  OPERANDS       COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6.
03 ; Before execution, an integer must be stored in NUMBER.
04 ;
05          ORIG    x3050
06          LEA     R2, NUMBER
07          LDW     R2, R2, #0
08          LEA     R1, SIX
09          LDW     R1, R1, #0
0A          AND     R3, R3, #0     ; Clear R3. It will
0B                                 ; contain the product.
0C ; The inner loop
0D ;
0E AGAIN    ADD     R3, R3, R2
0F          ADD     R1, R1, #-1    ; R1 keeps track of
10          BRp     AGAIN          ; the iterations
11 ;
12          HALT
13 ;
14 NUMBER   BLKW    1
15 SIX      FILL    x0006
16 ;
17          END
Figure 7.1 An assembly language program
Each line of the program has up to four parts. Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
LEA R2, NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
ORIG
ORIG tells the assembler where in memory to place the program. In line 05, ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
FILL
FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
BLKW
BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you can get by speaking English, but you will never feel at home, and life remains complicated. You can get by this way, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader; the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler
-> Basic assembler directives
• START - Specify name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate a one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command
An assembly language statement contains the following fields:
-> Label field - An identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/opcode field - This field contains a mnemonic, which stands for an operation code (i.e. a machine code instruction) or a pseudo-op. It may require additional information, i.e. operands.
-> Operand field - It specifies either the address or the data that the opcode requires.
-> Comment field - Allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are three types:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages amp disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler design can be done in
- Single pass
- Two pass
(Fig: Two-pass assembler. Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; Pass 1 and Pass 2 both use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbol, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction or label which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless restrictions like those above are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels.
• Location Counter (LOCCTR): This is a variable that is used to help in the assignment of addresses. After each statement, LOCCTR <- LOCCTR + (instruction length).
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
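The division of work between the two passes can be sketched in a few lines. This is a toy under several assumptions: every instruction and WORD is 3 bytes, only the directives START, END, WORD and RESW are handled, the four OPTAB entries and their opcode values are sample values chosen for illustration, and no object file or listing is written. All function names are invented.

```python
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C, "J": 0x3C}  # sample opcodes
DIRECTIVES = {"START", "END", "WORD", "RESW"}


def parse(line):
    """Split '[label] mnemonic [operand]' into its three fields."""
    tokens = line.split()
    label = None
    if tokens[0] not in OPTAB and tokens[0] not in DIRECTIVES:
        label, tokens = tokens[0], tokens[1:]
    operand = tokens[1] if len(tokens) > 1 else None
    return label, tokens[0], operand


def pass1(source):
    """Assign addresses with LOCCTR and build SYMTAB."""
    symtab, intermediate, locctr = {}, [], 0
    for line in source:
        label, mnemonic, operand = parse(line)
        if mnemonic == "START":
            locctr = int(operand, 16)       # starting address is in hex
        elif label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, mnemonic, operand))
        if mnemonic in OPTAB or mnemonic == "WORD":
            locctr += 3
        elif mnemonic == "RESW":
            locctr += 3 * int(operand)
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement, using the completed SYMTAB."""
    object_code = []
    for address, mnemonic, operand in intermediate:
        if mnemonic in OPTAB:
            target = symtab[operand]        # forward references are resolved here
            object_code.append((address, f"{OPTAB[mnemonic]:02X}{target:04X}"))
        elif mnemonic == "WORD":
            object_code.append((address, f"{int(operand):06X}"))
    return object_code


source = [
    "PROG START 1000",
    "LDA FIVE",
    "ADD ONE",
    "STA RESULT",
    "FIVE WORD 5",
    "ONE WORD 1",
    "RESULT RESW 1",
    "END",
]
symtab, intermediate = pass1(source)
for address, code in pass2(symtab, intermediate):
    print(f"{address:04X} {code}")
```

In this sample, LDA FIVE refers to a label defined later in the source; Pass 2 can translate it only because Pass 1 has already placed FIVE in SYMTAB.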
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, and in that same pass it collects the labels, resolves future references, and does the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
• Machine-dependent assembler features: instruction formats and addressing modes, program relocation
• Machine-independent assembler features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
  Col 1       H
  Col 2-7     Program name
  Col 8-13    Starting address (hex)
  Col 14-19   Length of object program in bytes (hex)
Text record
  Col 1       T
  Col 2-7     Starting address in this record (hex)
  Col 8-9     Length of object code in this record in bytes (hex)
  Col 10-69   Object code; (69-10+1)/6 = 10 instructions
End record
  Col 1       E
  Col 2-7     Address of first executable instruction (hex) (END program_name)
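The column layout of the three record types maps directly onto string formatting. The sketch below is illustrative only: the function names are invented, every object code is assumed to be a 6-hex-digit (3-byte) instruction, and the sample program name and object codes are made-up values.

```python
def header_record(name, start, length):
    # Col 1 'H', cols 2-7 name (padded/truncated to 6), cols 8-13 start, cols 14-19 length
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_records(start, object_codes):
    """Pack object codes into Text records of at most 10 instructions each."""
    records = []
    for i in range(0, len(object_codes), 10):
        chunk = "".join(object_codes[i:i + 10])
        address = start + 3 * i                  # assumes 3-byte instructions
        length_in_bytes = len(chunk) // 2        # two hex digits per byte
        records.append(f"T{address:06X}{length_in_bytes:02X}{chunk}")
    return records


def end_record(first_instruction):
    return f"E{first_instruction:06X}"


print(header_record("COPY", 0x1000, 0x107A))
for record in text_records(0x1000, ["141033", "482039", "001036"]):
    print(record)
print(end_record(0x1000))
```

Ten 6-digit object codes fill columns 10-69 exactly, which is where the limit of 10 instructions per Text record comes from.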
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve
a. Modification record
b. Base relative
c. Program counter relative
Modification record
  Col 1     M
  Col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
  Col 8-9   Length of the address field to be modified, in half-bytes
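What the loader does with a Modification record can be sketched as follows. This is a simplified illustration: the function name relocate is invented, the object program is held in a bytearray, each record is given as an (offset, length in half-bytes) pair, and the field is assumed to be right-aligned in the bytes it occupies, so an odd length leaves the leading half-byte untouched.

```python
def relocate(code: bytearray, modification_records, load_address: int) -> bytearray:
    """Add the load address to every address field named by a modification record."""
    for offset, half_bytes in modification_records:
        n_bytes = (half_bytes + 1) // 2                 # bytes that hold the field
        field = int.from_bytes(code[offset:offset + n_bytes], "big")
        mask = (1 << (4 * half_bytes)) - 1              # low-order half-bytes only
        relocated = ((field & mask) + load_address) & mask
        field = (field & ~mask) | relocated             # keep the untouched bits
        code[offset:offset + n_bytes] = field.to_bytes(n_bytes, "big")
    return code


# Sample 4-byte instruction whose last 5 half-bytes hold the address 01036.
code = bytearray.fromhex("4B101036")
relocate(code, [(1, 5)], 0x5000)
print(code.hex().upper())
```

Here the field starts at offset 1 and is 5 half-bytes long; loading at address 5000 (hex) changes the address 01036 to 06036 and leaves the opcode and flag bits as they were.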
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Obj
Basic Macro Processor Functions
Macro Definition and Usage
In this section we present one more example to highlight the salient aspects of a macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example
MACRO
INCRMT &A, &B
LOAD &A          (macro definition)
ADD &B
STORE &A
ENDM

INCRMT X, Y      (macro call)
LOAD X           (macro expansion)
ADD Y
STORE X

(fig Source code with macros -> Macro processor -> Expanded code -> Compiler or assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the ENDM statement indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit introduces a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the call
INCRMT X, Y
leads to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit
MACRO
INCRMT &A, &B
LOAD &A          (macro definition)
ADD &B
STORE &A
ENDM

INCRMT X, Y      (macro call)
LOAD X           (macro expansion)
ADD Y
STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in defining a macro
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition (MEND plays the same role here as ENDM in the earlier example)
-> Prototype for the macro
Each parameter starts with &. The prototype has the form: name MACRO parameter list, and the definition is closed by MEND
-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement is expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro we speak of parameters; in the macro invocation we speak of arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro:
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first formal parameter, and so on. Considering the prototype and macro call statements once again:
INCRMT &A, &B    (prototype)
INCRMT X, Y      (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER ,,DIRECT,,,,,,3
ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
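The difference between positional and keyword parameters can be sketched in Python as follows. The function name bind_arguments and the way operands are passed in as a list of strings are assumptions made for this illustration.

```python
def bind_arguments(formals, call_operands):
    """Pair formal parameters with actual parameters.

    formals: names from the prototype without the leading '&', e.g. ["A", "B"].
    call_operands: operand strings from the call, e.g. ["X", "Y"] or ["B=Y", "A=X"].
    """
    binding = {}
    for position, operand in enumerate(call_operands):
        if operand == "":                       # omitted positional argument
            continue
        if "=" in operand:                      # keyword form: NAME=value
            name, value = operand.split("=", 1)
            name = name.lstrip("&")
            if name not in formals:
                raise KeyError(f"unknown parameter {name}")
        else:                                   # positional form: order decides
            name, value = formals[position], operand
        binding[name] = value
    return binding


if __name__ == "__main__":
    print(bind_arguments(["A", "B"], ["X", "Y"]))
    print(bind_arguments(["a1", "a2", "a3"], ["&a3=DATA1", "&a2=DATA2", "&a1=DATA3"]))
```

With keyword parameters the order of the operands in the call no longer matters, which is convenient when a macro has many parameters and most of them are omitted.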
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
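The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions: the MNT is a dictionary, the prototype operands are kept in the MNT entry rather than in the MDT, AIF and AGO are not handled, and the HALT statement in the sample program is hypothetical.

```python
def build_tables(source):
    """Step 1: fill the MNT and MDT from the definitions; return them and the rest."""
    mnt, mdt, main = {}, [], []
    lines = iter(source)
    for line in lines:
        if line.strip() != "MACRO":
            main.append(line)
            continue
        name, *operands = next(lines).replace(",", " ").split()   # prototype
        mnt[name] = (len(mdt), operands)      # where the definition starts in the MDT
        for body_line in lines:
            mdt.append(body_line.strip())
            if body_line.strip() == "ENDM":
                break
    return mnt, mdt, main


def expand(mnt, mdt, main):
    """Steps 2 and 3: replace each macro call by its model statements."""
    output = []
    for line in main:
        fields = line.replace(",", " ").split()
        if not fields or fields[0] not in mnt:
            output.append(line.strip())       # ordinary statement, copied unchanged
            continue
        index, formals = mnt[fields[0]]
        actuals = dict(zip(formals, fields[1:]))
        while mdt[index] != "ENDM":
            statement = mdt[index]
            # Replace longer names first so &AB is not mistaken for &A.
            for formal in sorted(actuals, key=len, reverse=True):
                statement = statement.replace(formal, actuals[formal])
            output.append(statement)
            index += 1
    return output


if __name__ == "__main__":
    program = ["MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
               "INCRMT X, Y", "HALT"]
    for statement in expand(*build_tables(program)):
        print(statement)
```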
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE) and macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB. It holds pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of an invocation. Arguments are stored in ARGTAB according to their position.
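The three tables can be illustrated with a short Python sketch. The notation ?1, ?2, ... for positional references and the function names define and expand are assumptions for this illustration; the notes only state that parameter references are converted to a positional notation.

```python
DEFTAB = []      # prototype and body lines, parameters stored as ?1, ?2, ...
NAMTAB = {}      # macro name -> (first, last) indexes into DEFTAB


def define(name, params, body):
    """DEFINE: enter a macro into NAMTAB and DEFTAB."""
    first = len(DEFTAB)
    DEFTAB.append(f"{name} {','.join(params)}")
    # Longer names first, so that &AB is not rewritten as the prefix &A.
    ordered = sorted(params, key=len, reverse=True)
    for line in body:
        for param in ordered:
            line = line.replace(param, f"?{params.index(param) + 1}")
        DEFTAB.append(line)
    NAMTAB[name] = (first, len(DEFTAB) - 1)


def expand(name, args):
    """EXPAND: load ARGTAB, then substitute ?n by the n-th argument."""
    argtab = list(args)                       # arguments stored by position
    first, last = NAMTAB[name]
    expanded = []
    for line in DEFTAB[first + 1:last + 1]:   # skip the prototype line
        for n in range(len(argtab), 0, -1):   # handle ?10 before ?1
            line = line.replace(f"?{n}", argtab[n - 1])
        expanded.append(line)
    return expanded


if __name__ == "__main__":
    define("INCRMT", ["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])
    print(DEFTAB)
    print(expand("INCRMT", ["X", "Y"]))
```

Storing the body in positional notation means that expansion never has to search for parameter names; it only indexes into ARGTAB.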
(fig Macro processor data structures: a MACRO definition is entered in NAMTAB and DEFTAB; a macro CALL uses DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain.
(fig Macro processor procedures: PROCESSLINE, DEFINE, EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is now ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instruction is read line by line, its machine code is obtained and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which already-relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf(), scanf() etc. which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called a relocating loader.
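The idea of adding a constant to each relative address can be sketched in Python. The representation of a program as a list of words with a parallel list of flags marking which words hold addresses is an assumption made for this illustration.

```python
def relocation_constant(translated_start, load_start):
    """The constant to add: where the program is loaded minus where it was translated."""
    return load_start - translated_start


def relocate(words, is_address, constant):
    """Add the constant to every word flagged as holding a relative address."""
    return [word + constant if flagged else word
            for word, flagged in zip(words, is_address)]


if __name__ == "__main__":
    # A was translated with start address 100 but is loaded at 151,
    # just after a routine occupying 100 to 150.
    constant = relocation_constant(100, 151)
    # Three words of A: a data value, an address 120 inside A, another data value.
    print(relocate([7, 120, 42], [False, True, False], constant))
```

Only the flagged words change, which is the sense in which relocation adjusts address fields rather than simply moving the program.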
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader, produced by the assembler, is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpone the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is called dynamic linking, dynamic loading or load on call.
-> Allow several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help since the entire path from the root is already loaded.
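The behaviour of the overlay manager on the example tree can be sketched in Python. The class and method names, and the idea of logging which segments are read in, are assumptions for this illustration and do not describe any particular linker.

```python
# Parent of each segment in the example tree (ROOT has no parent).
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root to the segment, e.g. ROOT, D, E."""
    path = [segment]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]   # path currently in memory
        self.loads = []          # log of segments that had to be read in

    def call(self, target):
        wanted = path_to(target)
        if self.loaded[:len(wanted)] == wanted:
            return               # upward or same-segment call: already in memory
        keep = 0                 # length of the common prefix of both paths
        while (keep < min(len(self.loaded), len(wanted))
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        self.loads.extend(wanted[keep:])          # siblings overwrite each other
        self.loaded = self.loaded[:keep] + wanted[keep:]


if __name__ == "__main__":
    manager = OverlayManager()
    for target in ["A", "B", "E", "D"]:
        manager.call(target)
    print(manager.loads)
```

In this run the call to E replaces A and B by D and E, because A and D are siblings and share the same memory, while the final call to D loads nothing.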
(fig Overlay tree: ROOT at the top with children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader respectively) for the above four functions:

Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error (undefined symbol) unless a special mechanism is provided to declare B as external.
Design of Direct linking loader
pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory) card
7 END card
pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD ID used in ESD & RLD cards
• ESD: external symbol dictionary
Data Structures
Algorithm
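A much simplified Python sketch of the two passes follows. It is an assumption-laden illustration, not the card-by-card algorithm: each segment is modelled as a dictionary with hypothetical keys (name, length, defs, text, relocate, external) standing in for the information carried by the ESD, TXT and RLD cards, and memory is a dictionary from address to word.

```python
def pass1(segments, ipla):
    """Assign a load address to each segment and build the external symbol table."""
    pla = ipla                                   # Program Load Address counter
    symbols = {}
    for seg in segments:
        seg["address"] = pla
        symbols[seg["name"]] = pla
        for symbol, offset in seg["defs"].items():
            symbols[symbol] = pla + offset       # relative -> absolute address
        pla += seg["length"]
    return symbols


def pass2(segments, symbols, memory):
    """Load the text, relocate address constants and resolve external references."""
    for seg in segments:
        base = seg["address"]
        text = list(seg["text"])
        for offset in seg["relocate"]:           # constants relative to this segment
            text[offset] += base
        for offset, symbol in seg["external"]:   # references to other segments
            text[offset] += symbols[symbol]
        for i, word in enumerate(text):
            memory[base + i] = word


if __name__ == "__main__":
    segments = [
        {"name": "A", "length": 3, "defs": {}, "text": [10, 2, 0],
         "relocate": [1], "external": [(2, "B")]},
        {"name": "B", "length": 2, "defs": {"BDATA": 1}, "text": [20, 30],
         "relocate": [], "external": []},
    ]
    table = pass1(segments, ipla=100)
    memory = {}
    pass2(segments, table, memory)
    print(table)
    print(memory)
```

Two passes are needed because a segment may refer to a symbol defined in a segment that appears later in the input, so all addresses must be assigned before any reference can be resolved.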
Binary Symbolic Subroutine Loader
The assembler assembles the program and provides the loader with:
-> The object program + relocation information
-> A prefix with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• It outputs the text to a file rather than to memory
- the output is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> The language designer's balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array (an ordered table) as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same. Otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
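Binary search and a small hash table can be sketched in Python as follows. The toy hash function (sum of character codes modulo the table size), the table size of 11 and the use of chaining to handle collisions are assumptions chosen for this illustration.

```python
def binary_search(table, key):
    """Return the index of key in the sorted list table, or -1 if absent."""
    low, high = 0, len(table) - 1
    while low <= high:
        middle = (low + high) // 2
        if table[middle] == key:
            return middle
        if key < table[middle]:
            high = middle - 1        # continue in the left sub-array
        else:
            low = middle + 1         # continue in the right sub-array
    return -1


def slot_of(key, size):
    """A toy hash function: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in key) % size


class HashTable:
    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]   # chaining: one list per slot

    def put(self, key, value):
        bucket = self.slots[slot_of(key, len(self.slots))]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)         # key already present: update
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.slots[slot_of(key, len(self.slots))]:
            if k == key:
                return v
        return None


if __name__ == "__main__":
    print(binary_search(["ADD", "LOAD", "STORE"], "LOAD"))
    symbols = HashTable()
    symbols.put("LOOP", 0x1003)
    print(symbols.get("LOOP"))
```

For a symbol table, the key is the symbol name and the stored value is typically its address.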
Sorting
We will discuss four comparison-sort algorithms:
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of the premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this as L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ =>G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
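The definition of a single-step derivation can be tried out with a small Python sketch. It assumes that every grammar symbol is a single character and that productions are given as (α, β) pairs; the sample grammar S -> aSb, S -> ab is an illustration, not taken from the notes.

```python
def immediate_derivations(mu, productions):
    """Return every string γ that is immediately derived from μ.

    productions: list of (alpha, beta) pairs, each meaning alpha -> beta.
    """
    results = []
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            # μ = σ α τ, so γ = σ β τ
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.append(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


if __name__ == "__main__":
    grammar = [("S", "aSb"), ("S", "ab")]
    print(immediate_derivations("aSb", grammar))
```

Applying the function repeatedly, starting from the start symbol, enumerates the sentential forms of the grammar.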
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
bullChomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b. If S → ε is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol
b. β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
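The Type 1, Type 2 and Type 3 conditions stated above can be checked mechanically. The following Python sketch assumes single-character symbols, treats every symbol that is not a nonterminal as a terminal, uses the empty string for ε, and uses small sample grammars that are illustrations rather than the grammars of the numbered examples.

```python
def grammar_type(nonterminals, productions, start="S"):
    """Return the highest type number (3, 2, 1 or 0) whose conditions hold.

    productions: list of (alpha, beta) string pairs; "" stands for the empty string.
    """
    has_empty = (start, "") in productions
    others = [(a, b) for a, b in productions if (a, b) != (start, "")]

    # Type 1, condition (a): |alpha| <= |beta| unless the rule is S -> empty.
    type1 = all(len(a) <= len(b) for a, b in others)
    # Type 1, condition (b): with S -> empty, S never appears on a right-hand side.
    if has_empty and any(start in b for _, b in productions):
        type1 = False
    if not type1:
        return 0
    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a in nonterminals for a, _ in productions):
        return 1

    # Type 3: right-hand side is a terminal, optionally followed by a nonterminal.
    def regular(beta):
        return (len(beta) in (1, 2) and beta[0] not in nonterminals
                and all(ch in nonterminals for ch in beta[1:]))

    return 3 if all(regular(b) for _, b in others) else 2


if __name__ == "__main__":
    print(grammar_type({"S"}, [("S", "aSb"), ("S", "ab")]))
    print(grammar_type({"S", "A"}, [("S", "aA"), ("A", "b"), ("S", "")]))
```

Because each type is defined here as a restriction of the previous one, a grammar with a rule such as A → ε for a nonterminal other than S is reported as Type 0 under these definitions.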
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
(fig Hierarchy of grammars)
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are strictly less expressive: they recognize only a proper subset of the context-free languages.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V, x, y ∈ (V ∪ T)* and v ∈ (V ∪ T)+ (that is, v is not empty).
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a (possibly empty) string of terminals and/or nonterminals. The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough for their languages to be recognized by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. Symbols that appear on a left side are nonterminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
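To make the notation concrete, the sketch below stores a small BNF grammar as a Python dictionary and checks whether a string can be derived from a nonterminal. The grammar itself (<number> and <digit>), the name derives, and the dictionary layout are hypothetical choices made for this illustration. The simple top-down search only terminates for grammars without left recursion.

```python
from functools import lru_cache

# A hypothetical BNF grammar, written here as a Python dictionary:
#   <number> ::= <digit> | <digit> <number>
#   <digit>  ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
GRAMMAR = {
    "<number>": [("<digit>",), ("<digit>", "<number>")],
    "<digit>": [(d,) for d in "0123456789"],
}

@lru_cache(maxsize=None)
def derives(symbols, text):
    """Return True if the tuple of symbols can be rewritten into exactly text."""
    if not symbols:
        return text == ""
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:
        # Terminal symbol: it must match the text literally.
        return text.startswith(first) and derives(rest, text[len(first):])
    # Nonterminal: try each alternative that the rule separates with |.
    return any(derives(alternative + rest, text) for alternative in GRAMMAR[first])
```

For example, derives(("<number>",), "2024") holds, while a string containing a letter or the empty string is rejected.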
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful for explicitly and concisely defining sets of strings, such as computer languages. They become more useful still if a canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, which corresponds to the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read and broken into its constituent pieces, from which an intermediate form is created. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase, this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
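The lexical analysis step described above can be sketched with regular expressions. This is a minimal illustration, not a production scanner: the token class names (IDENTIFIER, ASSIGN and so on) and the function name tokenize are assumptions, and the example statement is taken to be total = count + rate * 10, as implied by the token list.

```python
import re

# Token classes for the example statement; the class names are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Break source text into (token class, lexeme) pairs, dropping blanks."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

Calling tokenize("total = count + rate * 10") yields seven pairs, one for each token listed above, in the same order.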
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable: about 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice, the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a parse tree (shown in a figure in the original notes).
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the parse tree.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. For the example, the syntax tree has = at the root with children x3 and +, and the + node has children y and 3.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
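As a rough illustration of how a parser builds such a syntax tree, the following Python sketch parses statements of the form id = expr ; into nested tuples. The function name parse_assignment and the tuple layout are assumptions made here. Because the rule expr → expr + expr is left-recursive and ambiguous, the sketch swaps in an equivalent loop (an operand followed by any number of "+ operand" groups) and groups additions to the left.

```python
import re

def parse_assignment(source):
    """Parse 'id = expr ;' and return a syntax tree as nested tuples."""
    # Minimal scanner: identifiers, integer constants, and the symbols = + ;
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source)
    position = 0

    def next_token(expected=None):
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[position]
        if expected is not None and token != expected:
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        position += 1
        return token

    def operand():
        token = next_token()
        if token in ("=", "+", ";"):
            raise SyntaxError(f"expected an operand, found {token!r}")
        return ("number", int(token)) if token.isdigit() else ("id", token)

    def expr():
        # Loop form of the recursive rule: operand { + operand }
        tree = operand()
        while position < len(tokens) and tokens[position] == "+":
            next_token("+")
            tree = ("+", tree, operand())
        return tree

    target = operand()
    if target[0] != "id":
        raise SyntaxError("left-hand side must be an identifier")
    next_token("=")
    tree = ("=", target, expr())
    next_token(";")
    return tree
```

For x3 = y + 3; the result has = at the root, the identifier x3 as its left child, and a + node over y and 3 as its right child.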
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int": the constant 3 is converted to real before the addition, and the real sum is converted to int before it is assigned to x3.
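The insertion of conversion operators can be sketched as a walk over the syntax tree. The code below is a simplified illustration under several assumptions: trees are nested tuples such as ("+", left, right), only the types "int" and "real" exist, and the names annotate, inttoreal and realtoint are chosen here for the example.

```python
def annotate(node, types):
    """Return (new_node, type), inserting conversion operators where types differ.

    types maps each declared identifier to "int" or "real".
    """
    kind = node[0]
    if kind == "number":
        return node, ("int" if isinstance(node[1], int) else "real")
    if kind == "id":
        return node, types[node[1]]
    if kind == "+":
        left, left_type = annotate(node[1], types)
        right, right_type = annotate(node[2], types)
        if left_type != right_type:
            # Mixed int/real addition: convert the int operand to real.
            if left_type == "int":
                left = ("inttoreal", left)
            else:
                right = ("inttoreal", right)
            return ("+", left, right), "real"
        return ("+", left, right), left_type
    if kind == "=":
        value, value_type = annotate(node[2], types)
        target_type = types[node[1][1]]
        if value_type != target_type:
            # Convert the computed value to the type of the assignment target.
            value = ("inttoreal" if target_type == "real" else "realtoint", value)
        return ("=", node[1], value), target_type
    raise ValueError(f"unknown node kind {kind!r}")
```

With y declared real and x3 declared int, the tree for x3 = y + 3 gains an inttoreal node above the constant 3 and a realtoint node above the addition.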
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two sources and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each "quad" has the form
operation target source1 source2
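Walking the semantic tree to produce quadruples can be sketched as follows. This is an illustrative sketch, not the notes' own algorithm: the tree is assumed to be given as nested tuples, None stands for an unused field (written -- above), and the function name generate_quads is hypothetical.

```python
def generate_quads(tree):
    """Walk a semantic tree and emit (operation, target, source1, source2) quads."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        kind = node[0]
        if kind in ("id", "number"):
            return node[1]                       # leaves need no code
        if kind == "+":
            left = walk(node[1])
            right = walk(node[2])
            target = new_temp()
            quads.append(("add", target, left, right))
            return target
        if kind in ("inttoreal", "realtoint"):   # unary conversion operators
            source = walk(node[1])
            target = new_temp()
            quads.append((kind, target, source, None))
            return target
        raise ValueError(f"unknown node kind {kind!r}")

    if tree[0] != "=":
        raise ValueError("expected an assignment at the root")
    source = walk(tree[2])
    quads.append(("assign", tree[1][1], source, None))
    return quads
```

Applied to the tree for id1 = realtoint(id2 + inttoreal(3)), the function emits the same four quads as listed above, in the same order.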
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion at compile time and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
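Both optimizations can be expressed as a small pass over the list of quads. The sketch below is hypothetical (the function name optimize and the tuple layout are assumptions) and deliberately simple: it assumes each temporary is used only once, which holds for the example but would need a liveness check in a real compiler.

```python
def optimize(quads):
    """Fold constant int-to-real conversions and merge 'compute, then assign' pairs."""
    constants = {}      # temporary name -> constant value computed at compile time
    folded = []
    for op, target, src1, src2 in quads:
        # Substitute constants that were folded earlier.
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and isinstance(src1, int):
            constants[target] = float(src1)     # conversion done now, no quad emitted
        else:
            folded.append((op, target, src1, src2))

    result = []
    for quad in folded:
        op, target, src1, _ = quad
        # 'assign x t' right after the quad that computes t: write x directly.
        # Assumption: t is not used again later.
        if op == "assign" and result and result[-1][1] == src1:
            previous_op, _, a, b = result[-1]
            result[-1] = (previous_op, target, a, b)
        else:
            result.append(quad)
    return result
```

On the four quads of the example, the pass leaves two: an add of id2 and 3.0 into temp2, and a realtoint from temp2 into id1.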
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0    add float
RTOI R2, R1         real to int
ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
A grammar is ambiguous if there is more than one possible parse tree for a given statement.
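To see how quickly ambiguity grows, consider the rule expr → expr + expr | id used earlier in these notes. The short Python sketch below counts the distinct parse trees for a sum of n identifiers; the function name count_parse_trees is chosen here for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_parse_trees(operands):
    """Number of parse trees for id + id + ... + id with the given operand count,
    under the ambiguous rule expr -> expr + expr | id."""
    if operands == 1:
        return 1
    # Choose which + is the root: it splits the operands into a left and a right group.
    return sum(count_parse_trees(left) * count_parse_trees(operands - left)
               for left in range(1, operands))
```

A sum of two identifiers has a single tree, but id + id + id already has two, which is why such a grammar is usually rewritten or given associativity rules before it is used in a parser.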
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                    calls
Contextual Analyzer      Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    calls                calls                 calls
Syntactic Analyzer   Contextual Analyzer   Code Generator
Data flow (input/output): Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Interpreter
An interpreter program executes other programs directly, running through program code and executing it line by line. Because it analyses every line as it runs, interpreted code is slower than compiled code, but it can take less time to interpret program code than to compile and then run it; this is very useful when prototyping and testing code. Interpreters are written for multiple platforms, which means code written once can be run immediately on different systems without having to recompile for each. Examples of this include Flash-based web programs that will run on your PC, Mac, games console and mobile phone.
Advantages of using an Interpreter
• Easier to debug (check errors) than with a compiler
• Easier to create multi-platform code, as each different platform would have an interpreter to run the same code
• Useful for prototyping software and testing basic program logic
Disadvantages of using an Interpreter
• Source code is required for the program to be executed, and this source code can be read, making it insecure
• Interpreters are generally slower than compiled programs due to the per-line translation method
1) Name the three types of program translator and describe what level of language each translates.
Answer
Assembler - Low Level (Assembly)
Compiler - High Level
Interpreter - High Level
2) Why might you want to use a compiler instead of an interpreter?
A compiler produces faster, more secure code.
3) Why might you want to use an interpreter instead of a compiler?
Interpreters allow code to run on multiple platforms, and they allow you to debug and test code without having to recompile the entire source code.
Loader
A loader is a system program that places the object program into main memory and prepares it for execution.
• Basic functions of a loader
- Allocation
- Linking
- Relocation
- Loading
Types of Loader-
• Compile-and-go Loader
• Relocating Loader
• Direct Linking Loader
• Absolute Loader
• General Loader
• Dynamic Loader
Macro & Macro processor
Macro - A macro is a single-line abbreviation for a group of instructions.
MACRO        -------- Start of definition
INCR         -------- Macro name
A 1,DATA
A 2,DATA     Sequence of instructions to be abbreviated
A 3,DATA
MEND         -------- End of definition
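What a macro processor does with such a definition can be sketched in a few lines of Python. This is a deliberately minimal illustration that handles only parameterless macros; the function name expand_macros and the representation of the program as a list of text lines are assumptions made here.

```python
def expand_macros(lines):
    """Collect MACRO ... MEND definitions, then replace each later line that
    consists only of a macro name by the body of that macro."""
    macros = {}
    output = []
    iterator = iter(lines)
    for line in iterator:
        if line.strip() == "MACRO":
            name = next(iterator).strip()        # the line after MACRO is the macro name
            body = []
            for body_line in iterator:           # read the body up to MEND
                if body_line.strip() == "MEND":
                    break
                body.append(body_line)
            macros[name] = body
        elif line.strip() in macros:
            output.extend(macros[line.strip()])  # macro call: substitute the body
        else:
            output.append(line)
    return output
```

Given the INCR definition above followed by two lines that each contain only INCR, the output contains the three A instructions twice and no trace of the definition itself.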
Linking and Linker
• Linking - The process of merging many object modules to form a single object program is called linking.
• Linker - The linker is the software program which binds many object modules to make a single object program.
Lecture_no4 Foundation of system programming
1) Formal System
• A formal system is an uninterpreted calculus.
It consists of
- An alphabet
- A set of words called axioms
- A finite set of relations called rules of inference or production rules
- Ex: Boolean algebra
2) Need of System Software-
The basic need of system software is to achieve the following goals:
• To achieve efficient performance of the system
• To make effective execution of general user programs
• To make effective utilization of human resources
• To make available new, better facilities
Need of system programming
Lecture_no5 Evolution of operating system, General machine structure
OS, Job, Multiprogramming, Multitasking
Fragmentation, Paging, Scheduling
Facilities of computer OS
Operating System
• It is the collection of system programs which acts as an interface between the user and the computer hardware.
• The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient manner.
Functions of Operating System
• File handling and management
• Storage management (Memory management)
• Device scheduling and management
• CPU scheduling
• Information management
• Process control (management)
• Error handling
• Protecting itself from users & protecting users from other users
What is MFT?
What is MVT?
What is fragmentation?
What is paging? Explain its various types.
What is scheduling?
What is meant by the term time sharing?
What is virtual memory?
What are simple batched systems? Explain.
Give an evolution of an operating system.
Lecture_no6 Description of Von-Neumann machines components
General machine structure, Von Neumann's components, block diagram and functions of each component of Von Neumann's machine
1. The von Neumann Computer Model
• Von Neumann computer systems contain three main building blocks:
-> the central processing unit (CPU)
-> memory
-> input/output devices (I/O)
• These three components are connected together using the system bus.
• The most prominent items within the CPU are the registers; they can be manipulated directly by a computer program. The following block diagram shows the major relationships between CPU components.
(Design of the Von Neumann architecture)
2. Components of the Von Neumann Model
-> Memory: Storage of information (data/program)
-> Processing Unit: Computation/processing of information
-> Input: Means of getting information into the computer, e.g. keyboard, mouse
-> Output: Means of getting information out of the computer, e.g. printer, monitor
-> Control Unit: Makes sure that all the other parts perform their tasks correctly and at the correct time
The von Neumann Machine
(connection between CPU & main memory)
MBR (Memory Buffer Register) = MBR contains a word to be stored in memory, or is used to receive a word from memory.
MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR.
IR (Instruction Register) = IR contains the 8-bit op-code of the instruction being executed.
IBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instruction from a word in memory.
PC (Program Counter) = PC contains the address of the next instruction pair to be fetched from memory.
AC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily operands and results of ALU operations.
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit.
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete.
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors, or devices for human interaction.
Instruction set: The set of codes that describe all the legal operations a particular computer can execute.
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform.
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture.
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location.
Read-only memory (ROM): A memory that is usually very similar to RAM, except it may only be accessed, not updated.
Store: The process of writing a value to a memory cell.
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system Information flow in microflowchart for ADD/SUB/MULT
Memory operations

FETCH(addr)
    1. Put addr into MAR
    2. Tell memory unit to "load"
    3. Memory copies data into MDR

STORE(addr, new-value)
    1. Put addr into MAR
    2. Put new-value into MDR
    3. Tell memory unit to "store"
    4. Memory stores data from MDR into memory cell
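The FETCH and STORE steps above can be sketched as a small simulation. This is only an illustration: the class name MemoryUnit and the list-based memory are assumptions made here, not part of the notes.

```python
class MemoryUnit:
    """Toy memory with a Memory Address Register and a Memory Data Register."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0  # Memory Address Register
        self.mdr = 0  # Memory Data Register

    def fetch(self, addr):
        self.mar = addr                   # 1. put addr into MAR
        self.mdr = self.cells[self.mar]   # 2-3. "load": memory copies the cell into MDR
        return self.mdr

    def store(self, addr, new_value):
        self.mar = addr                   # 1. put addr into MAR
        self.mdr = new_value              # 2. put new-value into MDR
        self.cells[self.mar] = self.mdr   # 3-4. "store": memory copies MDR into the cell


memory = MemoryUnit(16)
memory.store(5, 42)
print(memory.fetch(5))
```

Every access goes through MAR and MDR, which is why the processor needs no other path to the memory cells.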
Problem 1)

a) Write an assembly program for a GPR machine having 8 general purpose registers (R1-R7) to evaluate the following expression. Ignore the remainder part of the division process.

    ( (a + b) / (a + d) ) * (a * e)

where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:

    +  -------->  ADD  Ri, Rj   ( Rj <- Ri + Rj )
    *  -------->  MULT Ri, Rj   ( Rj <- Ri * Rj )
    /  -------->  DIV  Ri, Rj   ( Rj <- Rj / Ri )

a = M[A] ------> content of memory location at address A; the same holds for B, D, E.

    move A, D0      D0 = a
    move B, D4      D4 = b
    add  D0, D4     D4 = a + b
    move D, D1      D1 = d
    add  D0, D1     D1 = a + d
    div  D1, D4     D4 = (a + b) / (a + d)
    move E, D2      D2 = e
    mult D0, D2     D2 = (a * e)
    mult D4, D2     D2 = ( (a + b) / (a + d) ) * (a * e)
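The register program above can be checked with a short simulation. It assumes the operators lost from the expression are / and *, as the comments on the div and mult steps suggest, and that the second operand is always the destination. The function name run and the sample values are made up for illustration.

```python
def run(program, memory):
    """Execute (op, src, dst) triples; the second operand is always the destination."""
    regs = {}
    for op, src, dst in program:
        if op == "move":
            regs[dst] = memory[src]              # load a variable from memory
        elif op == "add":
            regs[dst] = regs[src] + regs[dst]
        elif op == "mult":
            regs[dst] = regs[src] * regs[dst]
        elif op == "div":
            regs[dst] = regs[dst] // regs[src]   # integer division, remainder ignored
        else:
            raise ValueError(op)
    return regs


program = [
    ("move", "A", "D0"), ("move", "B", "D4"), ("add", "D0", "D4"),
    ("move", "D", "D1"), ("add", "D0", "D1"), ("div", "D1", "D4"),
    ("move", "E", "D2"), ("mult", "D0", "D2"), ("mult", "D4", "D2"),
]
memory = {"A": 6, "B": 14, "D": 4, "E": 3}   # sample values for a, b, d, e
result = run(program, memory)["D2"]
print(result)
```

With these sample values the program leaves ((6 + 14) / (6 + 4)) * (6 * 3) in D2.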
Problem 2)

a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.

b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.

    i)   move X, D0                 move contents of memory location X to register D0
    ii)  move D0, X                 move contents of register D0 to memory location X
    iii) movea 22(a3), 25(a1)       move contents from memory to memory
    iv)  bgtw label                 branch if greater than to location "label"

Answer 2)

For each of the above, for part b), draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).

i) move X, D0

Machine Instruction:
    0011 0000 0011 1000
    Address of X

a) Micro-steps are:
    MAR <-- PC
    MBR <-- M[MAR]        fetch
    IR  <-- MBR
    | decode |
    PC  <-- PC + 2        PC points to source address
    MAR <-- PC
    MBR <-- M[MAR]        address of X in MBR
    MAR <-- MBR           move X to MAR
    MBR <-- M[MAR]        read contents of location X
    D0  <-- MBR
    PC  <-- PC + 2        PC points to next instruction
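The micro-steps for move X, D0 can be traced with a small sketch. It assumes a byte-addressed memory of 16-bit words (hence PC + 2), modelled as a dictionary. The addresses 0x100 and 0x200 and the value 1234 are invented for the example.

```python
def move_mem_to_d0(memory, pc):
    """Trace the micro-steps of `move X, D0`; returns the final register contents."""
    regs = {"PC": pc, "MAR": 0, "MBR": 0, "IR": 0, "D0": 0}
    regs["MAR"] = regs["PC"]              # MAR <-- PC
    regs["MBR"] = memory[regs["MAR"]]     # MBR <-- M[MAR]   (fetch)
    regs["IR"] = regs["MBR"]              # IR  <-- MBR      (decode follows)
    regs["PC"] += 2                       # PC points to the word holding the address of X
    regs["MAR"] = regs["PC"]              # MAR <-- PC
    regs["MBR"] = memory[regs["MAR"]]     # address of X in MBR
    regs["MAR"] = regs["MBR"]             # MAR <-- MBR
    regs["MBR"] = memory[regs["MAR"]]     # read contents of location X
    regs["D0"] = regs["MBR"]              # D0  <-- MBR
    regs["PC"] += 2                       # PC points to next instruction
    return regs


# Instruction word at 0x100, address of X (0x200) in the next word, X holds 1234.
memory = {0x100: 0b0011000000111000, 0x102: 0x200, 0x200: 1234}
regs = move_mem_to_d0(memory, 0x100)
print(regs)
```

The same skeleton (fetch, decode, operand address fetch, data transfer) carries over to the other three instructions, with the last steps changed.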
ii) move D0, X

Machine Instruction:
    0011 1110 0000 0000
    Address of X

a) Micro-steps are:
    MAR <-- PC
    MBR <-- M[MAR]        fetch
    IR  <-- MBR
    | decode |
    PC  <-- PC + 2        PC points to destination address
    MAR <-- PC
    MBR <-- M[MAR]        read destination address X
    MAR <-- MBR           MAR has X now
    MBR <-- D0            source data in MBR
    M[MAR] <-- MBR        data in effective address X (which is in MAR)
    PC  <-- PC + 2        PC points to next instruction
iii) movea 22(a3), 25(a1)

Machine Instruction:
    0011 0011 0110 1011
    0000 0000 0001 0110
    0000 0000 0001 1001

a) Source effective address = contents of a3 + $16
       temp <--- M[a3 + $16]
   Destination effective address = contents of a1 + $19
       M[a1 + $19] <--- temp

Micro-steps are:
    MAR <-- PC
    MBR <-- M[MAR]        fetch
    IR  <-- MBR
    | decode |
    PC  <-- PC + 2        PC points to source displacement
    MAR <-- PC
    MBR <-- M[MAR]        source displacement loaded
    MAR <-- [A3 + MBR]    source effective address calculated
    MBR <-- M[MAR]        source data in MBR
    WR  <-- MBR           saved temporarily since MBR is to be reused
    PC  <-- PC + 2        PC points to destination displacement
    MAR <-- PC
    MBR <-- M[MAR]        destination displacement loaded
    MAR <-- [A1 + MBR]    destination effective address calculated
    MBR <-- WR            source data back to MBR
    M[MAR] <-- MBR        source data moved to destination
    PC  <-- PC + 2        PC points to next instruction
iv) bgtw label

Machine Instruction:
    0110 1110 0000 0000
    ---- displacement ----

a) Micro-steps are:
    MAR <-- PC
    MBR <-- M[MAR]        fetch
    IR  <-- MBR
    | interpret / decode |

    Condition TRUE:
        PC  <-- PC + 2
        MAR <-- PC
        MBR <-- M[MAR]    get displacement
        PC  <-- PC + MBR  execute
    Condition FALSE:
        PC  <-- PC + 4
Lecture_no8 Memory unit Register Data of IBM 360

General machine structure of IBM 360/370

The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964 and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.

(i) Memory unit

Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, specifying the leftmost (lowest-address) byte of the unit. Within a halfword, fullword, doubleword or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. Within these notes the unqualified term word refers to a fullword.

The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register

Registers are areas of storage that are set aside in the processing unit.

There are 16 registers in the 370 system (general purpose registers, floating-point registers, control & status registers, data registers, address registers).

Four fields of information:
    1) name field - optional field; must begin with a character from the alphabet and can consist of 8 characters
    2) opcode - mandatory field; must be followed by a blank
    3) operands - usually represent data that could be in main storage, in registers, or part of the instruction itself
    4) comments - used to clarify instructions
(iii) Data

Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.

Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.

Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
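Two of the data formats above can be illustrated with a short sketch. It assumes sign nibble C for positive and D for negative (two of the valid codes listed above), and a fullword floating point value with the 7-bit characteristic directly after the sign bit. The function names are chosen here for illustration only.

```python
def pack_decimal(value):
    """Encode an integer as packed decimal: one digit per nibble, sign nibble last."""
    digits = str(abs(value))
    sign = 0xD if value < 0 else 0xC
    if len(digits) % 2 == 0:          # digits plus sign nibble must fill whole bytes
        digits = "0" + digits
    nibbles = [int(d) for d in digits] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2))


def hex_float_to_python(word):
    """Decode a 32-bit hexadecimal floating point fullword into a Python float."""
    sign = -1.0 if word >> 31 else 1.0
    characteristic = (word >> 24) & 0x7F      # exponent biased by 64
    fraction = (word & 0xFFFFFF) / 16 ** 6    # bits 8-31 as a hexadecimal fraction
    return sign * fraction * 16.0 ** (characteristic - 64)


print(pack_decimal(-123).hex())
print(hex_float_to_python(0x42640000))
```

The leading zero added for an even digit count is what guarantees the "odd number of decimal digits" mentioned in the text.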
Lecture_no9 Various instruction format of IBM 360/370

(iv) Instruction

Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:

RR (two bytes): R tells that there is data residing in a register, so in this type the data is in registers.
    Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.

RS (four bytes): This instruction can have 2 or 3 operands, depending on how many the opcode indicates. If there are 2, the first is a register and the second is storage. If there are 3, the first and second are registers and the third is a storage location.
    Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.

RX (four bytes): The X also refers to a storage address; however, this format specifies the actual storage address in 3 pieces.
    Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.

SI (four bytes): I tells that the operand consists of immediate data (meaning that the data is in the instruction itself), so in this type the first operand is a storage location and the second is immediate data.
    Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.

SS (six bytes): S tells that there is data in storage. Here both operands have data in storage.
    Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.

Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
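As an illustration of the RX layout described above, the sketch below splits a four-byte instruction into its fields and forms the storage address from its 3 pieces. Two details are filled in from general knowledge of the S/360 rather than from these notes: bytes 2-3 hold a 4-bit base register number followed by a 12-bit displacement, and register number 0 in the base or index position means "no register". The function names and the sample bytes are illustrative.

```python
def decode_rx(instruction):
    """Split a four-byte RX instruction into (opcode, r1, x2, b2, d2)."""
    opcode = instruction[0]
    r1 = instruction[1] >> 4            # byte 1, bits 0-3: register number
    x2 = instruction[1] & 0x0F          # byte 1, bits 4-7: index register
    b2 = instruction[2] >> 4            # first 4 bits of bytes 2-3: base register
    d2 = ((instruction[2] & 0x0F) << 8) | instruction[3]   # 12-bit displacement
    return opcode, r1, x2, b2, d2


def effective_address(regs, x2, b2, d2):
    """Base + index + displacement; register number 0 contributes nothing."""
    base = regs[b2] if b2 else 0
    index = regs[x2] if x2 else 0
    return (base + index + d2) & 0xFFFFFF   # 24-bit addresses


fields = decode_rx(bytes.fromhex("5810C004"))
regs = [0] * 16
regs[12] = 0x1000
address = effective_address(regs, *fields[2:])
print(fields, hex(address))
```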
Addressing Mode

1. Implied mode: Operand is in a register implied by the OpCode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4. Register indirect mode: Operand is in the memory location whose address is the content of a register specified by the instruction itself. Example: ADD (D), 3
5. Direct mode: Absolute address of operand is explicitly specified in the instruction. Example: ADD 1234, S
6. Indirect mode: Absolute address of operand is the content of a memory address. Example: ADD [1234], 10
7. Relative mode: Address is the content of PC + OpAddr. Example: ADD D, $S
8. Indexed mode: Address is the content of an index register + OpAddr. Example: ADD D, 500(S)
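The addressing modes listed above differ only in how the operand is located. The sketch below makes that concrete for modes 2 to 8, using a dictionary for memory and for the register file. The function name operand and the sample contents are assumptions for illustration.

```python
def operand(mode, field, regs, memory, pc=0):
    """Return the operand selected by `field` under the given addressing mode."""
    if mode == "immediate":
        return field                              # the constant itself
    if mode == "register":
        return regs[field]                        # field names a register
    if mode == "register_indirect":
        return memory[regs[field]]                # register holds the address
    if mode == "direct":
        return memory[field]                      # field is the absolute address
    if mode == "indirect":
        return memory[memory[field]]              # field points to a cell holding the address
    if mode == "relative":
        return memory[pc + field]                 # content of PC + OpAddr
    if mode == "indexed":
        offset, index_reg = field
        return memory[regs[index_reg] + offset]   # index register + OpAddr
    raise ValueError(mode)


regs = {"S": 4, "D": 8}
memory = {4: 40, 8: 80, 1234: 8, 504: 7, 110: 9}
print(operand("indexed", (500, "S"), regs, memory))
```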
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
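Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.

            MOVE R1, x
            MOVE R2, y
            MOVE R3, 0
            SUB  R1, R2      R1 = x - y
            BN   Less        result negative: x < y
            BZ   Zero        result zero: x = y
            MOVE R3, +1      otherwise x > y
            BR   Done        skip the other cases (unconditional branch; mnemonic assumed)
    Less    MOVE R3, -1
            BR   Done
    Zero    MOVE R3, 0
    Done    HLT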
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language, machine language

Assembly Language
Assembly instructions are just shorthand for machine instructions.

    Machine language      Equivalent assembly
    1000000100100101      LOAD R1 5
    1000000101000101      LOAD R2 5
    1010000100000110      ADD R0 R1 R2
    1000001000000110      SAVE R0 6
    1111111111111111      HALT

(For all assembly instructions that compute a result, the first argument is the destination.)

It is very easy to write an assembly language -> machine language translator.

Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?

Exercise Solution

    STORE 1 R1
    MOVE R1 R2
    LOAD R2 1
    HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:

    1000000100100101
    1000000101000101
    1010000100000110
    1000001000000110
    1111111111111111

LOAD contents of memory location into register
    100000010 RR MMMMM
    ex: R0 = Mem[3]        100000010 00 00011

STORE contents of register into memory location
    100000100 RR MMMMM
    ex: Mem[4] = R0        100000100 00 00100

MOVE contents of one register into another register
    100100010000 RR RR
    ex: R0 = R1            100100010000 00 01

ADD contents of 2 registers, store result in third
    1010000100 RR RR RR
    ex: R0 = R1 + R2       1010000100 00 01 10

SUBTRACT contents of 2 registers, store result into third
    1010001000 RR RR RR
    ex: R0 = R1 - R2       1010001000 00 01 10

Halt the program
    1111111111111111
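Because every instruction of the simulated computer is a fixed bit pattern plus register and memory fields, a translator for it fits in a few lines. The sketch below is one possible version, not the course's own tool. It accepts both SAVE and STORE for the store instruction and expects the register first in both cases, which is an assumption, since the exercise solution writes the address first.

```python
def reg(name):
    """Two-bit register field, e.g. 'R2' -> '10'."""
    return format(int(name[1:]), "02b")


def assemble(line):
    """Translate one assembly statement into a 16-bit machine word (as a bit string)."""
    op, *args = line.replace(",", " ").split()
    if op == "LOAD":                      # LOAD Rd addr
        return "100000010" + reg(args[0]) + format(int(args[1]), "05b")
    if op in ("STORE", "SAVE"):           # SAVE Rs addr (register first, as in the listing)
        return "100000100" + reg(args[0]) + format(int(args[1]), "05b")
    if op == "MOVE":                      # MOVE Rd Rs
        return "100100010000" + reg(args[0]) + reg(args[1])
    if op == "ADD":                       # ADD Rd Ra Rb
        return "1010000100" + "".join(reg(a) for a in args)
    if op == "SUB":                       # SUB Rd Ra Rb
        return "1010001000" + "".join(reg(a) for a in args)
    if op == "HALT":
        return "1" * 16
    raise ValueError(f"unknown mnemonic: {op}")


program = ["LOAD R1 5", "LOAD R2 5", "ADD R0 R1 R2", "SAVE R0 6", "HALT"]
machine_code = [assemble(statement) for statement in program]
print("\n".join(machine_code))
```

Running it on the five-statement program reproduces the five machine words listed in the example.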
Lecture_no14 Meaning of Assembly language program types Literals

• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU

Example: START EQU 10

The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write:

    START EQU 10
    ORG START
    GOTO 10

• ORG

The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.

Example:

    ORG 0
    GOTO 10

The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at the address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.

• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
         LABEL    OPCODE   OPERANDS       COMMENTS
    01            ;
    02            ; Program to multiply an integer by the constant 6
    03            ; Before execution an integer must be stored in NUMBER
    04            ;
    05            .ORIG    x3050
    06            LEA      R2, NUMBER
    07            LDW      R2, R2, #0
    08            LEA      R1, SIX
    09            LDW      R1, R1, #0
    0A            AND      R3, R3, #0     ; Clear R3. It will
    0B                                    ; contain the product
    0C            ; The inner loop
    0D            ;
    0E   AGAIN    ADD      R3, R3, R2
    0F            ADD      R1, R1, #-1    ; R1 keeps track of
    10            BRp      AGAIN          ; the iterations
    11            ;
    12            HALT
    13            ;
    14   NUMBER   .BLKW    1
    15   SIX      .FILL    x0006
    16            ;
    17            .END
Figure 7.1 An assembly language program

Two of the parts (LABEL and COMMENTS) are optional.

Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode. The number of operands depends on the operation being performed.

    AGAIN ADD R3, R3, R2

The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.

    LEA R2, NUMBER

The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.

A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.

Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
    1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
    2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).

The location AGAIN is specifically referenced by the branch instruction in line 10:

    BRp AGAIN

If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
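The control flow of the program in Figure 7.1 can be mirrored line by line in a high-level language. The sketch below is only a paraphrase of the loop, with Python variables standing in for registers R1 to R3. The function name is chosen here for illustration.

```python
def multiply_by_six(number):
    """Mirror of the assembly loop: add NUMBER to the product six times."""
    r2 = number      # R2 <- value stored at NUMBER
    r1 = 6           # R1 <- value stored at SIX (iteration counter)
    r3 = 0           # AND R3, R3, #0 clears the product
    while True:
        r3 += r2     # AGAIN: ADD R3, R3, R2
        r1 -= 1      # ADD R1, R1, #-1
        if r1 <= 0:  # BRp AGAIN falls through once R1 is no longer positive
            break
    return r3


print(multiply_by_six(7))
```

The test at the bottom of the loop corresponds to BRp: the branch is taken only while the last result written (R1) is positive.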
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.

Pseudo-ops (Assembler Directives)

Actually, a more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.

.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.

.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.

.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler

Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home then, and life remains complicated. You can get through with this, but it is rather inappropriate. If things need a hurry, you should use the country's language.

Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.

Disassembler
To translate machine code to its assembly source.

Cross Assembler
The assembler may run on one machine and produce object code for another.

An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader: the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.

(Fig Program Translation Steps)

Fundamental assembler functions
    • Translate mnemonic operation codes to their machine language equivalents
    • Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives

-> The assembler can also process assembler directives.
-> Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
    • START - Specify name and starting address for the program
    • END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
    • BYTE - Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
    • WORD - Generate one-word integer constant
    • RESB - Reserve the indicated number of bytes for a data area
    • RESW - Reserve the indicated number of words for a data area
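The data directives above matter to the assembler mainly because each one advances the location counter by a known number of bytes. A minimal sketch of that calculation follows. It assumes 3-byte words (as on the SIC teaching machine, which these directive names come from) and operands written as C'...' or X'...' for BYTE. Both are assumptions, not statements from the notes.

```python
WORD_SIZE = 3  # assumption: 3-byte words, as on the SIC teaching machine


def directive_length(opcode, operand):
    """Number of bytes a data directive adds to the location counter."""
    if opcode == "WORD":
        return WORD_SIZE
    if opcode == "RESW":
        return WORD_SIZE * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":
        kind, text = operand[0], operand[2:-1]     # C'EOF' or X'F1'
        if kind == "C":
            return len(text)                       # one byte per character
        if kind == "X":
            return (len(text) + 1) // 2            # two hex digits per byte
    raise ValueError(f"not a data directive: {opcode} {operand}")


print(directive_length("BYTE", "C'EOF'"), directive_length("RESW", "10"))
```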
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command
An assembly language statement contains the following fields:

-> Label field: an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).

Ex-

    START LDAA 24 H

Here the label START is equal to the address of the instruction LDAA 24 H.

-> Mnemonic/Opcode field: this field contains a mnemonic, which stands for an operation code or pseudo-op. If it is a machine code instruction, it may require additional information, i.e. operands.

-> Operand field: specifies either the address or the data that the opcode requires.

-> Comment field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
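Splitting a statement into the four fields described above is the first thing an assembler does with each line. The sketch below shows one way to do it. It assumes the label is marked by a trailing colon (as the text says) and that a semicolon starts the comment. The semicolon convention is an assumption, since the notes only require white space before the comment.

```python
def parse_statement(line):
    """Split '[label:] mnemonic [operands] [; comment]' into its four fields."""
    code, _, comment = line.partition(";")
    tokens = code.split()
    label = None
    if tokens and tokens[0].endswith(":"):
        label = tokens.pop(0)[:-1]        # drop the colon that marks a label
    mnemonic = tokens[0] if tokens else None
    operands = " ".join(tokens[1:])
    return {"label": label, "mnemonic": mnemonic,
            "operands": operands, "comment": comment.strip()}


print(parse_statement("START: LDAA 24H ; load accumulator"))
```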
Types of Assembly language statements
There are 3 types of assembly language statements:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages amp disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
    1) Identify the information necessary to perform a task
    2) Specify the data structure and define the format of the data structure
    3) Determine the processing necessary to obtain and maintain the information
    4) Determine the processing necessary to perform a task

Assembler design can be done in:
    - Single pass
    - Two pass
(Fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement together with its assigned address and an error indicator.

Pass I
    - Separate the symbol, mnemonic op-code and operand fields
    - Determine the storage requirement for every assembly language statement and update the location counter
    - Build the symbol table (the table that is used to store each label and its corresponding value)

Pass II
    - Generate object code (perform the actual translation)
One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.

• Single Pass Assembler
    - Does everything in a single pass
    - Cannot resolve forward references without restrictions
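One common way of handling forward references in a single pass is backpatching: leave a hole in the generated code and fill it in when the label is finally defined. The notes do not describe this technique, so the sketch below is an illustration of the general idea only. It assumes every instruction occupies one word, and the function and variable names are invented.

```python
def one_pass(statements):
    """statements: (label, mnemonic, operand) triples; every instruction is one word."""
    symtab, code, pending = {}, [], {}
    for label, mnemonic, operand in statements:
        address = len(code)
        if label:
            symtab[label] = address
            for slot in pending.pop(label, []):    # patch earlier forward references
                code[slot] = (code[slot][0], address)
        if operand is None or operand in symtab:
            code.append((mnemonic, symtab.get(operand)))
        else:                                      # forward reference: leave a hole
            pending.setdefault(operand, []).append(address)
            code.append((mnemonic, None))
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return symtab, code


statements = [(None, "JUMP", "DONE"), ("LOOP", "ADD", "LOOP"), ("DONE", "HALT", None)]
symtab, code = one_pass(statements)
print(symtab, code)
```

The restriction this imposes is that the whole object code must stay in memory until the end of the pass, so that the holes can still be reached.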
(detailed pass1 assembler flowchart)
Internal data structures
    • The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
    • The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels.
    • Location Counter (LOCCTR): this is a variable that is used to help in the assignment of addresses. After each statement, LOCCTR <- LOCCTR + (instruction length).
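How the three data structures cooperate during Pass 1 can be sketched as follows. The opcode table is a tiny illustrative one, and the fixed 3-byte instruction and word length is an assumption borrowed from the SIC teaching machine. Neither comes from the notes.

```python
INSTRUCTION_LENGTH = 3                              # assumption: fixed-length instructions
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}     # small illustrative opcode table


def pass_one(statements, start=0):
    """statements: (label, opcode, operand) triples. Returns (SYMTAB, intermediate file)."""
    symtab, intermediate, locctr = {}, [], start
    for label, opcode, operand in statements:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr                  # label gets the current LOCCTR
        intermediate.append((locctr, opcode, operand))
        if opcode in OPTAB:
            locctr += INSTRUCTION_LENGTH            # LOCCTR <- LOCCTR + instruction length
        elif opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        else:
            raise ValueError(f"invalid operation code: {opcode}")
    return symtab, intermediate


source = [
    ("FIRST", "LDA", "ALPHA"), (None, "ADD", "ONE"), (None, "STA", "BETA"),
    ("ALPHA", "WORD", "5"), ("ONE", "WORD", "1"), ("BETA", "RESW", "1"),
]
symtab, intermediate = pass_one(source, start=0x1000)
print({name: hex(address) for name, address in symtab.items()})
```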
Lecture_no20 Multi-pass assembler with example
Pass 1
    - Assign addresses to all statements in the program
    - Save the values assigned to all labels for use in Pass 2
    - Perform some processing of assembler directives

Pass 2
    - Assemble instructions
    - Generate data values defined by BYTE, WORD
    - Perform processing of assembler directives not done in Pass 1
    - Write the object program and the assembly listing
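A minimal sketch of Pass 2 follows. It takes the addresses and symbol table that Pass 1 would have produced (written out literally here so that the example stands on its own) and generates object code. The instruction layout, a 2-hex-digit opcode followed by a 4-hex-digit address, and the opcode values are assumptions modelled on the SIC teaching machine.

```python
def pass_two(intermediate, symtab, optab):
    """Turn (address, opcode, operand) records into (address, object code) pairs."""
    object_code = []
    for address, opcode, operand in intermediate:
        if opcode in optab:
            target = symtab[operand]               # every label is known by now
            object_code.append((address, f"{optab[opcode]:02X}{target:04X}"))
        elif opcode == "WORD":
            object_code.append((address, f"{int(operand):06X}"))
        # RESW only reserves space, so it produces no object code
    return object_code


optab = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}    # illustrative opcode values
symtab = {"FIRST": 0x1000, "ALPHA": 0x1009, "ONE": 0x100C, "BETA": 0x100F}
intermediate = [
    (0x1000, "LDA", "ALPHA"), (0x1003, "ADD", "ONE"), (0x1006, "STA", "BETA"),
    (0x1009, "WORD", "5"), (0x100C, "WORD", "1"), (0x100F, "RESW", "1"),
]
object_code = pass_two(intermediate, symtab, optab)
print(object_code)
```

Because SYMTAB is complete before this pass starts, the forward references to ALPHA, ONE and BETA need no special treatment.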
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler

The differences between one-pass and two-pass assemblers are:

A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.

A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.

Assembler Design
Machine-dependent assembler features: instruction formats and addressing modes, program relocation.

Machine-independent assembler features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking.
Object Program
Header record
    Col 1        H
    Col 2~7      Program name
    Col 8~13     Starting address (hex)
    Col 14~19    Length of object program in bytes (hex)

Text record
    Col 1        T
    Col 2~7      Starting address in this record (hex)
    Col 8~9      Length of object code in this record in bytes (hex)
    Col 10~69    Object code ((69-10+1)/6 = 10 instructions)

End record
    Col 1        E
    Col 2~7      Address of first executable instruction (hex) (END program_name)
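The column layout above translates directly into fixed-width string formatting. The sketch below builds one record of each kind. The function names, the program name COPY and the addresses are illustrative values, not taken from the notes.

```python
def header_record(name, start, length):
    """Col 1 'H', cols 2-7 name, cols 8-13 start address, cols 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    """Col 1 'T', cols 2-7 start address, cols 8-9 byte count, cols 10-69 object code."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 60 hex digits (columns 10-69)")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_instruction):
    """Col 1 'E', cols 2-7 address of the first executable instruction."""
    return f"E{first_instruction:06X}"


print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x1000, ["001009", "18100C", "0C100F"]))
print(end_record(0x1000))
```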
Absolute Program
The program must be loaded at the specified (absolute) address.

Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.

How to solve:
    a. Modification record
    b. Base relative
    c. Program counter relative

Modification record
    Col 1      M
    Col 2-7    Starting location of the address field to be modified, relative to the beginning of the program
    Col 8-9    Length of the address field to be modified, in half-bytes
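What a loader does with a modification record can be sketched as follows. It assumes that the field to patch is right-aligned in the bytes it occupies (so a 5 half-byte field uses the low 20 bits of 3 bytes) and that the patch consists of adding the load address. The sample instruction bytes and load address are illustrative.

```python
def relocate(image, modifications, load_address):
    """image: bytearray of the program; modifications: (offset, half_bytes) pairs."""
    for offset, half_bytes in modifications:
        n_bytes = (half_bytes + 1) // 2
        field = int.from_bytes(image[offset:offset + n_bytes], "big")
        mask = (1 << (4 * half_bytes)) - 1          # selects the address field only
        patched = (field & ~mask) | ((field + load_address) & mask)
        image[offset:offset + n_bytes] = patched.to_bytes(n_bytes, "big")
    return image


# One 4-byte instruction whose 20-bit address field starts at byte offset 1.
image = bytearray.fromhex("4B101036")
relocate(image, [(1, 5)], load_address=0x5000)
print(image.hex().upper())
```

Only the fields named by modification records are touched, which is why instructions using base-relative or program-counter-relative addressing need no records at all.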
Lecture_no22 Macro language & Macro processor

MACRO PROCESSOR

The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.

A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.

Basic Macro Processor Functions

Macro Definition and Usage

In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.

Example:

    MACRO
    INCRMT &A, &B
    LOAD   &A          Macro
    ADD    &B          definition
    STORE  &A
    ENDM

    INCRMT X, Y        expands to:

    LOAD   X           macro
    ADD    Y           expansion
    STORE  X

(Fig Macro Program)
(Fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT XY
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place All macro calls in a program are expanded in this fashion
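The expansion of a single macro call can be sketched in a few lines. The tuple layout used for the definition and the function name expand are assumptions for illustration. Plain text replacement is used for parameter substitution, which is adequate here but would misfire if one parameter name were a prefix of another.

```python
def expand(definition, call):
    """definition: (name, formal parameters, model statements); call: 'NAME arg1,arg2'."""
    name, formals, body = definition
    call_name, _, arg_text = call.strip().partition(" ")
    if call_name != name:
        return [call]                                # not a macro call: copy unchanged
    actuals = [a.strip() for a in arg_text.split(",")]
    binding = dict(zip(formals, actuals))            # pair parameters by position
    expanded = []
    for statement in body:
        for formal, actual in binding.items():
            statement = statement.replace(formal, actual)
        expanded.append(statement)
    return expanded


INCRMT = ("INCRMT", ["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])
print(expand(INCRMT, "INCRMT X,Y"))
```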
Defining a Macro

Let us take another look at the macro definition unit:

    MACRO
    INCRMT &A, &B
    LOAD   &A
    ADD    &B
    STORE  &A
    ENDM

The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.

The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions

-> Directives used in a macro definition
    MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition (the role played by ENDM in the earlier example).

-> Prototype for the macro
    Each parameter starts with &.

        name   MACRO   parameter list
               (body)
               MEND

-> Body: the statements that will be generated as the expansion of the macro.

Macro Expansion

Each macro invocation statement will be expanded into the statements that form the body of the macro. Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).

In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.

    macro_name p1, p2, ...

Difference between macro call and procedure call

Macro call: statements of the macro body are expanded each time the macro is invoked.

Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written These operands are called parameters or arguments All parameters used in the prototype statement have names starting with the special character amp These parameters are known as formal parameters A macro call is written using parameter names which do not start with ft special character amp These are known as actual parameters
The lists of formal and actual parameters also called as formal and actual parameter lists specified in the prototype and macro call statements respectively establish a correspondence between each formal parameter and an actual parameter In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists Thus the first actual parameter in the
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 39
SP NOTES
fist is paired with the first of formal parameters etc Considering the prototype and macro call statements once again
INCRMT &A, &B    (prototype)
INCRMT X, Y      (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
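The positional pairing described above can be sketched in a few lines of Python. This is only an illustration: the function name expand_call is hypothetical, and the three-statement macro body is inferred from the expansion shown for INCRMT X, Y.

```python
def expand_call(formals, body, actuals):
    """Replace each formal parameter in the model statements by the
    actual parameter found at the same position in the macro call."""
    if len(formals) != len(actuals):
        raise ValueError("parameter count mismatch")
    binding = dict(zip(formals, actuals))       # &A -> X, &B -> Y
    expanded = []
    for statement in body:
        # Substitute token by token so only whole operands are replaced.
        tokens = [binding.get(tok, tok) for tok in statement.split()]
        expanded.append(" ".join(tokens))
    return expanded


# Prototype INCRMT &A, &B with an assumed three-statement body.
formals = ["&A", "&B"]
body = ["LOAD &A", "ADD &B", "STORE &A"]
print(expand_call(formals, body, ["X", "Y"]))
# prints ['LOAD X', 'ADD Y', 'STORE X']
```

Because the binding is built purely from list positions, swapping the actual parameters in the call swaps the operands in the generated statements.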
Example:
STRG DATA1, DATA2, DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example:
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro Expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the MEND statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
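The three steps can be sketched as follows. This is a minimal illustration under stated assumptions, not a full macro processor: the function names build_tables and expand are hypothetical, the prototype is kept in the MNT entry rather than in the MDT, and the conditional statements AIF and AGO are not handled.

```python
def build_tables(source):
    """Step 1: record every macro definition in the MNT and MDT."""
    mnt, mdt, rest = {}, [], []
    lines = iter(source)
    for line in lines:
        if line.strip() == "MACRO":
            # The line after MACRO is the prototype: name and formal parameters.
            name, *formals = next(lines).replace(",", " ").split()
            mnt[name] = (len(mdt), formals)     # MDT index + formal list
            for body_line in lines:
                mdt.append(body_line.strip())
                if body_line.strip() == "MEND":
                    break
        else:
            rest.append(line.strip())           # ordinary program statement
    return mnt, mdt, rest


def expand(mnt, mdt, program):
    """Steps 2 and 3: replace each macro call by its body from the MDT."""
    out = []
    for line in program:
        fields = line.replace(",", " ").split()
        if fields and fields[0] in mnt:
            start, formals = mnt[fields[0]]
            binding = dict(zip(formals, fields[1:]))
            i = start
            while mdt[i] != "MEND":             # copy until end of definition
                out.append(" ".join(binding.get(t, t) for t in mdt[i].split()))
                i += 1
        else:
            out.append(line)
    return out


source = ["MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "MEND",
          "INCRMT X, Y", "HALT"]
mnt, mdt, program = build_tables(source)
print(expand(mnt, mdt, program))
# prints ['LOAD X', 'ADD Y', 'STORE X', 'HALT']
```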
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB, with pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of an invocation. Arguments are stored in ARGTAB according to their position.
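A small sketch may help show how the three tables cooperate. The ?n form used for the positional notation, and the function names define and expand, are assumptions made for this illustration.

```python
def define(name, formals, body, deftab, namtab):
    """Enter a macro into DEFTAB, rewriting each &param as ?n (1-based position)."""
    start = len(deftab)
    deftab.append(name + " " + ",".join(formals))        # macro prototype
    # Longest names first, so &AB is not mistaken for &A followed by B.
    ordered = sorted(enumerate(formals, start=1), key=lambda p: -len(p[1]))
    for line in body:
        for n, formal in ordered:
            line = line.replace(formal, "?%d" % n)
        deftab.append(line)
    namtab[name] = (start, len(deftab) - 1)              # begin and end pointers


def expand(name, args, deftab, namtab):
    """Expand one invocation using ARGTAB, indexed by argument position."""
    argtab = list(args)
    start, end = namtab[name]
    out = []
    for line in deftab[start + 1:end + 1]:               # skip the prototype
        for n in range(len(argtab), 0, -1):              # ?10 before ?1
            line = line.replace("?%d" % n, argtab[n - 1])
        out.append(line)
    return out


deftab, namtab = [], {}
define("INCRMT", ["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"], deftab, namtab)
print(deftab)
print(expand("INCRMT", ["X", "Y"], deftab, namtab))
```

Storing ?1 and ?2 instead of parameter names means expansion never has to search for a name: each reference is a direct index into ARGTAB.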
Two pass algorithm
» Pass 1: recognize macro definitions
» Pass 2: recognize macro calls
» nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory, so the program becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf(), scanf() etc. which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to adjustment of address fields and not to movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
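The "add a constant to each relative address" step can be illustrated with a short sketch. The per-word relocation flags and the segment contents are hypothetical; real object formats record this information in their own ways.

```python
def relocate(words, relocation_bits, load_address):
    """Add the load address to every word flagged as a relative address."""
    return [w + load_address if flag else w
            for w, flag in zip(words, relocation_bits)]


# A segment translated as if it started at address 0 (made-up contents).
segment = [5, 12, 0, 7]     # words 1 and 3 hold relative addresses
flags = [0, 1, 0, 1]        # 1 = address field that must be adjusted
print(relocate(segment, flags, 200))
# prints [5, 212, 0, 207]
```

Only the flagged words change, which matches the point made above: relocation adjusts address fields, it does not merely move the program.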
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The assembler output that a relocating loader accepts consists of the object program and information about all other programs it references. In addition there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving the programmer complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: As we turn on the computer, there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is also called dynamic linking, dynamic loading, or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader:
» Performs all linking and relocation operations
» Performs automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
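The behaviour of the overlay manager on the example tree can be sketched as below. The class name OverlayManager and the PARENT table are hypothetical names for this illustration, and actual loading from disk is reduced to updating a list of resident segments.

```python
# Parent of each segment in the example tree.
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Sequence of segments from the root down to `segment`."""
    path = [segment]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]              # resident path, root first

    def call(self, target):
        """Ensure the whole path to `target` is resident; return what was loaded."""
        wanted = path_to(target)
        keep = 0                            # length of the common prefix
        while (keep < len(self.loaded) and keep < len(wanted)
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        if keep == len(wanted):             # upward call: already resident
            return []
        newly_loaded = wanted[keep:]
        # Segments below the common prefix are overlaid by their siblings.
        self.loaded = self.loaded[:keep] + newly_loaded
        return newly_loaded


manager = OverlayManager()
print(manager.call("B"))    # root calls into B: loads A and B
print(manager.call("A"))    # upward call from B to A: nothing to load
print(manager.call("E"))    # loads D and E, overlaying A and B
```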
(fig Overlay: ROOT at the top; A and D below ROOT; B and C below A; E and F below D)
The essential difference between a linkage editor and a linking loader
(fig: Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, execution must be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions:
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
Pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation and Linkage Directory)
7 END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Copy of the object program
IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
PLA counter (Program Load Address), to keep track of each segment location
External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA), to establish correspondence between the ESD IDs used in ESD and RLD cards
ESD (external symbol dictionary)
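The two passes can be sketched on a toy object format. Everything about the format here is an assumption for illustration: each module is a dictionary with hypothetical fields name, defs (symbols it defines, as relative addresses), text (its words) and rld (pairs of word offset and the symbol whose address must be added there). Real card formats (ESD, TXT, RLD, END) carry the same kinds of information in a different layout.

```python
def pass1(modules, ipla):
    """Assign a load address to each segment and build the external symbol table."""
    pla = ipla                                  # Program Load Address counter
    symtab, origins = {}, {}
    for mod in modules:
        origins[mod["name"]] = pla
        symtab[mod["name"]] = pla               # the segment name is itself a symbol
        for sym, rel in mod["defs"].items():
            symtab[sym] = pla + rel             # relative -> absolute address
        pla += len(mod["text"])
    return symtab, origins


def pass2(modules, symtab, origins):
    """Load the text, then adjust every address constant listed in the RLD."""
    memory = {}
    for mod in modules:
        base = origins[mod["name"]]
        for offset, word in enumerate(mod["text"]):
            memory[base + offset] = word
        for offset, sym in mod["rld"]:
            memory[base + offset] += symtab[sym]   # relocation and linking
    return memory


modules = [
    # Word 1 refers to external symbol B; word 2 is an address relative to A.
    {"name": "A", "defs": {}, "text": [10, 0, 2], "rld": [(1, "B"), (2, "A")]},
    {"name": "B", "defs": {"BDATA": 1}, "text": [20, 30], "rld": []},
]
symtab, origins = pass1(modules, ipla=100)
print(symtab)
print(pass2(modules, symtab, origins))
```

Two passes are needed because A refers to B before B's load address is known; pass 1 fixes every address, so pass 2 can resolve references in any order.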
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text to a file rather than to memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
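The three search methods can be compared on a small symbol table keyed by symbol name. This is a sketch under stated assumptions: the table contents are made up, the hash function (sum of character codes modulo the table size) is just one simple choice, and collisions are handled by chaining, which the notes do not prescribe.

```python
def linear_search(table, key):
    """Compare every entry in turn; works on an unordered table."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Halve the search range each step; the table must be sorted by name."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1          # continue in the left sub-array
        else:
            low = mid + 1           # continue in the right sub-array
    return -1


class HashTable:
    """Slots numbered from 0; each slot holds a chain of (key, value) pairs."""

    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]

    def _slot(self, key):
        # Hash function: sum of character codes modulo the table size.
        return sum(ord(c) for c in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (k, _v) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)    # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.slots[self._slot(key)]:
            if k == key:
                return v
        return None


table = [("LOOP", 12), ("ALPHA", 0), ("TEMP", 40)]
print(linear_search(table, "TEMP"))             # prints 2
print(binary_search(sorted(table), "LOOP"))     # prints 1
symbols = HashTable()
for name, address in table:
    symbols.put(name, address)
print(symbols.get("ALPHA"))                     # prints 0
```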
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
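Compact sketches of the four algorithms are given here for reference. They favour readability over efficiency (each returns a new list, and the quick sort simply takes the first element as pivot), so they should be read as illustrations rather than tuned implementations.

```python
def selection_sort(items):
    """Repeatedly move the smallest remaining element to the front."""
    a = list(items)
    for i in range(len(a)):
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a


def insertion_sort(items):
    """Insert each element into its place within the sorted prefix."""
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]         # shift larger elements to the right
            j -= 1
        a[j + 1] = current
    return a


def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def quick_sort(items):
    """Partition around a pivot, then sort the two partitions."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    return (quick_sort([x for x in rest if x < pivot]) + [pivot]
            + quick_sort([x for x in rest if x >= pivot]))


data = [5, 2, 9, 1, 5, 6]
for sort in (selection_sort, insertion_sort, merge_sort, quick_sort):
    print(sort.__name__, sort(data))
```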
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference, inference rule or transformation rule is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
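The definition translates almost directly into code: find every place where the left side α of some production occurs in μ, and splice in β. The sketch below assumes each symbol is a single character, and the grammar used (S -> aSb, S -> ab) is a made-up example.

```python
def immediate_derivations(mu, productions):
    """All strings gamma such that mu derives gamma in a single step.

    mu = sigma + alpha + tau and gamma = sigma + beta + tau
    for some production alpha -> beta in P.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:                      # every occurrence of alpha
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", P)))
# prints ['aaSbb', 'aabb']
```

Applying the function repeatedly, starting from the start symbol, enumerates the sentential forms of the grammar.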
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b If S → ε is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
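The restrictions that separate the grammar types can be checked mechanically. The sketch below assumes the usual reading of the conditions: Type 1 requires |α| ≤ |β| for every rule α → β other than S → ε, and S must not occur on a right-hand side if S → ε is present; Type 2 additionally requires α to be a single nonterminal; Type 3 additionally requires β to be a terminal, or a terminal followed by a nonterminal. Uppercase letters stand for nonterminals, lowercase letters for terminals, and the empty string for ε; these conventions and the function name grammar_type are choices made for this illustration.

```python
def grammar_type(productions, start="S"):
    """Return the highest type number (0-3) whose restrictions all rules obey."""
    rules = list(productions)
    has_empty = (start, "") in rules
    others = [(a, b) for a, b in rules if (a, b) != (start, "")]

    # Type 1: no rule shrinks the string, except possibly S -> empty.
    type1 = all(len(a) <= len(b) for a, b in others)
    if has_empty and any(start in b for _a, b in rules):
        type1 = False
    if not type1:
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a.isupper() for a, _b in rules):
        return 1

    # Type 3: right-hand side is a terminal, or a terminal then a nonterminal.
    def regular(b):
        return ((len(b) == 1 and b.islower())
                or (len(b) == 2 and b[0].islower() and b[1].isupper()))

    return 3 if all(regular(b) for _a, b in others) else 2


print(grammar_type([("S", "aA"), ("A", "b")]))          # prints 3
print(grammar_type([("S", "aSb"), ("S", "ab")]))        # prints 2
print(grammar_type([("S", "aBc"), ("Bc", "bc")]))       # prints 1
print(grammar_type([("S", "aB"), ("Bb", "b")]))         # prints 0
```

The last call shows why a rule of the form Bb → b breaks Type 1: its left side is longer than its right side.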
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; more sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. A canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
This corresponds to the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program --> Compiler --> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read, broken into its constituent pieces and converted into an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler -
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler -
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
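The token list above can be reproduced with a few lines of Python. This is only a minimal sketch: the token names and regular expressions in `TOKEN_SPEC` are assumptions chosen for this one statement, not a full scanner.

```python
import re

# Token categories follow the list above; the patterns are assumptions.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Break `source` into (kind, text) tokens, dropping blanks."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise ValueError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens

tokens = tokenize("total = count + rate * 10")
```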
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N = 7 and M ≈ 30, so the savings are considerable: roughly 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =   y   +   3   ;
    x3   =y+ 3  ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
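The mapping from lexemes to token pairs can be sketched as follows. The regular expression, the 1-based symbol-table index and the choice to keep the number as text are all assumptions made for illustration; a real scanner would also report characters it cannot match instead of silently skipping them.

```python
import re

# Lexemes of the example: identifiers, unsigned integers, and = + ;
LEXEME = re.compile(r"[A-Za-z_]\w*|\d+|[=+;]")

def scan(source):
    """Return (tokens, symbol_table) for a statement such as 'x3 = y + 3;'."""
    symbol_table = []              # entry i-1 holds the identifier with index i
    tokens = []
    for lexeme in LEXEME.findall(source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))   # kept as text; storage type decided later
        else:
            tokens.append((lexeme, None))       # second component is ignored
    return tokens, symbol_table

tokens, table = scan("x3   =y+ 3  ;")
```

Because no rule here refers to itself, a flat regular expression is enough, which is the point made above about scanning needing no recursion.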
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
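A syntax tree of the kind described above can be built with a small hand-written parser. The sketch below is one possible reading, not the notes' own algorithm: it treats the ambiguous rule for expr as left-associative addition, takes a list of lexeme strings as input, and represents tree nodes as nested tuples; all of these are assumptions.

```python
# Sketch: build a syntax tree for  id = expr ;  where
# expr is read as  operand { + operand }  (left-associative; an assumption).
def parse_assignment(tokens):
    """tokens: list of lexeme strings. Returns a nested-tuple syntax tree."""
    position = 0

    def next_token():
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[position]
        position += 1
        return token

    def operand():
        token = next_token()
        if token.isdigit():
            return ("number", token)
        if token.isidentifier():
            return ("id", token)
        raise SyntaxError(f"expected operand, got {token!r}")

    target = operand()
    if target[0] != "id" or next_token() != "=":
        raise SyntaxError("expected: id = expr ;")
    tree = operand()
    while position < len(tokens) and tokens[position] == "+":
        next_token()
        tree = ("+", tree, operand())      # the operator becomes the interior node
    if next_token() != ";":
        raise SyntaxError("missing ';'")
    return ("=", target, tree)

tree = parse_assignment(["x3", "=", "y", "+", "3", ";"])
```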
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
    inttoreal  temp1  3      --
    add        temp2  id2    temp1
    realtoint  temp3  temp2  --
    assign     id1    temp3  --
Each "quad" has the form
    operation  target  source1  source2
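Walking a tree to emit quads can be sketched as below. The nested-tuple encoding of the semantic tree and the `tempN` naming scheme are assumptions for this illustration; only the quad layout (operation, target, source1, source2) is taken from the text above.

```python
import itertools

def generate_quads(tree):
    """Walk a tree of nested tuples and emit (operation, target, source1, source2) quads.

    Leaves are strings such as 'id2' or '3'; interior nodes are
    (operation, child, ...). The root is ('assign', target, expression).
    """
    quads = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):
            return node                            # operand name or constant
        operation, *children = node
        sources = [walk(child) for child in children]
        target = f"temp{next(counter)}"            # fresh temporary for the result
        sources += ["--"] * (2 - len(sources))     # pad unary operations
        quads.append((operation, target, sources[0], sources[1]))
        return target

    operation, target, expression = tree
    quads.append((operation, target, walk(expression), "--"))
    return quads

semantic_tree = ("assign", "id1",
                 ("realtoint", ("add", "id2", ("inttoreal", "3"))))
quads = generate_quads(semantic_tree)
```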
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see:
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
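The two easy optimizations can be expressed as small passes over the list of quads. This is a simplified sketch under stated assumptions: quads are Python tuples, constants are decimal digit strings, and a temporary that is copied by `assign` is assumed not to be used anywhere else.

```python
def optimize(quads):
    """Fold int-to-real conversions of constants, then merge 'op temp' + 'assign x temp'."""
    constants = {}                 # temp name -> folded real constant
    folded = []
    for operation, target, source1, source2 in quads:
        if operation == "inttoreal" and source1.isdigit():
            constants[target] = f"{int(source1)}.0"   # conversion done at compile time
            continue
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        folded.append((operation, target, source1, source2))

    result = []
    for quad in folded:
        operation, target, source1, _ = quad
        # "op tempN a b" followed by "assign x tempN" becomes "op x a b".
        if operation == "assign" and result and result[-1][1] == source1:
            previous = result.pop()
            result.append((previous[0], target, previous[2], previous[3]))
        else:
            result.append(quad)
    return result

quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
optimized = optimize(quads)
```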
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2:
    LD   R1, id2
    ADDF R1, R1, #3.0    // add float
    RTOI R2, R1          // real to int
    ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
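The idea of the parser calling the scanner on demand, instead of reading a token file, maps naturally onto a Python generator. The sketch below is a toy: the "language" is just whitespace-separated integers joined by +, and the `log` list exists only to show how scanning and parsing interleave in a single pass.

```python
def scanner(source, log):
    """Yield lexemes one at a time; `log` records when each one is produced."""
    for lexeme in source.split():
        log.append(f"scan {lexeme}")
        yield lexeme

def parser(tokens, log):
    """Pull tokens on demand and sum the numbers joined by '+'."""
    total = int(next(tokens))
    log.append(f"parse {total}")
    for operator in tokens:                 # each step asks the scanner for more
        operand = int(next(tokens))
        log.append(f"parse {operator} {operand}")
        total += operand
    return total

log = []
result = parser(scanner("1 + 2 + 3", log), log)
```

Inspecting `log` afterwards shows scan and parse entries alternating, so no intermediate token file is ever built.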
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes -
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver
        calls
    Syntactic Analyzer
        calls                   calls
    Contextual Analyzer         Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver
        calls                   calls                    calls
    Syntactic Analyzer          Contextual Analyzer      Code Generator
    Input:  Source Text         Input:  AST              Input:  Decorated AST
    Output: AST                 Output: Decorated AST    Output: object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
- Linking
- Relocation
- Loading
Types of Loader -
• Compile-and-go Loader
• Relocating Loader
• Direct Linking Loader
• Absolute Loader
• General Loader
• Dynamic Loader
Macro & Macro processor
Macro - A macro is a single-line abbreviation for a group of instructions.
    MACRO        -------- Start of definition
    INCR         -------- Macro name
    A 1,DATA
    A 2,DATA     Sequence of instructions to be abbreviated
    A 3,DATA
    MEND         -------- End of definition
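What a macro processor does with such a definition can be sketched in a few lines. This is a deliberately minimal model: no parameters, no nested definitions, and the surrounding statements (`L 1,FIVE`, `ST 1,TEMP`) are hypothetical lines added only to show a call site.

```python
def expand_macros(lines):
    """One-level macro expansion: collect MACRO...MEND bodies, then replace calls."""
    macros = {}
    output = []
    iterator = iter(lines)
    for line in iterator:
        if line.strip() == "MACRO":
            name = next(iterator).strip()          # the line after MACRO is the name
            body = []
            for body_line in iterator:
                if body_line.strip() == "MEND":
                    break
                body.append(body_line)
            macros[name] = body                    # definition produces no output
        elif line.strip() in macros:
            output.extend(macros[line.strip()])    # macro call is replaced by its body
        else:
            output.append(line)
    return output

program = [
    "MACRO", "INCR", "A 1,DATA", "A 2,DATA", "A 3,DATA", "MEND",
    "L 1,FIVE",
    "INCR",
    "ST 1,TEMP",
]
expanded = expand_macros(program)
```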
Linking and Linker
• Linking - The process of merging many object modules to form a single object program is called linking.
• Linker - The linker is the software program which binds many object modules to make a single object program.
Lecture_no4 Foundation of system programming
1) Formal System
• A formal system is an uninterpreted calculus.
It consists of
- Alphabets
- A set of words called axioms
- A finite set of relations called rules of inference or production rules
- Ex: Boolean algebra
2) Need of System Software
The basic need of system software is to achieve the following goals:
• To achieve efficient performance of the system
• To make effective execution of general user programs
• To make effective utilization of human resources
• To make available new, better facilities
Need of system programming
Lecture_no5 Evolution of operating system, General machine structure
OS, Job, Multiprogramming, Multitasking
Fragmentation, Paging, Scheduling
Facilities of computer OS
Operating System
• It is the collection of system programs which acts as an interface between the user and the computer hardware.
• The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient manner.
Functions of Operating System
• File handling and management
• Storage management (memory management)
• Device scheduling and management
• CPU scheduling
• Information management
• Process control (management)
• Error handling
• Protecting itself from users & protecting a user from other users
What is MFT?
What is MVT?
What is fragmentation?
What is paging? Explain its various types.
What is scheduling?
What is meant by the term time sharing?
What is virtual memory?
What are simple batched systems? Explain.
Give an evolution of an operating system.
Lecture_no6 Description of Von-Neumann machines components
General machine structure, Von Neumann's components, block diagram and functions of each component of Von Neumann's machine
1. The von Neumann Computer Model
• Von Neumann computer systems contain three main building blocks:
-> the central processing unit (CPU)
-> memory
-> input/output devices (I/O)
• These three components are connected together using the system bus.
• The most prominent items within the CPU are the registers; they can be manipulated directly by a computer program. The following block diagram shows the major relationships between CPU components.
(Design of the Von Neumann architecture)
2. Components of the Von Neumann Model
-> Memory: storage of information (data/program)
-> Processing Unit: computation/processing of information
-> Input: means of getting information into the computer, e.g. keyboard, mouse
-> Output: means of getting information out of the computer, e.g. printer, monitor
• Control Unit: makes sure that all the other parts perform their tasks correctly and at the correct time
The von Neumann Machine
(connection between CPU & main memory)
MBR (Memory Buffer Register) = contains a word to be stored in memory, or is used to receive a word from memory
MAR (Memory Address Register) = specifies the address in memory of the word to be written from or read into the MBR
IR (Instruction Register) = contains the 8-bit op-code of the instruction being executed
IBR (Instruction Buffer Register) = is employed to hold temporarily the right-hand instruction from a word in memory
PC (Program Counter) = contains the address of the next instruction pair to be fetched from memory
AC and MQ (Accumulator and Multiplier Quotient) = are employed to hold temporarily operands and results of ALU operations
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit.
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete.
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors, or devices for human interaction.
Instruction set: The set of codes that describe all the legal operations a particular computer can execute.
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform.
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture.
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location.
Read-only memory (ROM): A memory that is usually very similar to RAM, except that it may only be accessed, not updated.
Store: The process of writing a value to a memory cell.
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system, information flow in microflowchart for ADD/SUB/MULT
Memory operations
FETCH (addr)
1. Put addr into MAR
2. Tell memory unit to "load"
3. Memory copies data into MDR
STORE (addr, new-value)
1. Put addr into MAR
2. Put new-value into MDR
3. Tell memory unit to "store"
4. Memory stores data from MDR into memory cell
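The FETCH and STORE steps above can be mimicked with a small class. This is a toy model for illustration only: the class name, the list used for memory cells, and the folding of the "load"/"store" signal into a single assignment are assumptions.

```python
class MemoryUnit:
    """Toy memory with an address register (MAR) and a data register (MDR)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0
        self.mdr = 0

    def fetch(self, addr):
        self.mar = addr                    # 1. put addr into MAR
        self.mdr = self.cells[self.mar]    # 2-3. "load": memory copies data into MDR
        return self.mdr

    def store(self, addr, new_value):
        self.mar = addr                    # 1. put addr into MAR
        self.mdr = new_value               # 2. put new-value into MDR
        self.cells[self.mar] = self.mdr    # 3-4. "store": MDR is written to the cell

memory = MemoryUnit(16)
memory.store(5, 42)
value = memory.fetch(5)
```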
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R1-R7) to evaluate the following expression. Ignore the remainder part of the division process.
    ( (a + b) / (a + d) ) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
    +  -------->  ADD Ri, Rj    ( Rj <- Ri + Rj )
    *  -------->  MULT Ri, Rj   ( Rj <- Ri * Rj )
    /  -------->  DIV Ri, Rj    ( Rj <- Rj / Ri )
a = M[A] ------> content of memory location at address A; same for B, D, E
    move A, D0      ; D0 = a
    move B, D4      ; D4 = b
    add  D0, D4     ; D4 = a + b
    move D, D1      ; D1 = d
    add  D0, D1     ; D1 = a + d
    div  D1, D4     ; D4 = (a + b) / (a + d)
    move E, D2      ; D2 = e
    mult D0, D2     ; D2 = (a * e)
    mult D4, D2     ; D2 = ( (a + b) / (a + d) ) * (a * e)
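One way to check the register program is to interpret it. The sketch below assumes the two-operand form "op source, destination" with the result left in the destination, integer division that drops the remainder, and sample values for a, b, d, e chosen only for the test.

```python
def run(program, memory):
    """Interpret 'op src, dst' lines; dst <- dst op src, as defined above."""
    registers = {}
    for line in program:
        operation, operands = line.split(maxsplit=1)
        source, destination = [part.strip() for part in operands.split(",")]
        if operation == "move":
            registers[destination] = memory[source]       # load from a memory address
        elif operation == "add":
            registers[destination] += registers[source]
        elif operation == "mult":
            registers[destination] *= registers[source]
        elif operation == "div":
            registers[destination] //= registers[source]  # remainder ignored
        else:
            raise ValueError(f"unknown operation {operation!r}")
    return registers

program = [
    "move A, D0", "move B, D4", "add D0, D4",
    "move D, D1", "add D0, D1", "div D1, D4",
    "move E, D2", "mult D0, D2", "mult D4, D2",
]
memory = {"A": 6, "B": 14, "D": 4, "E": 3}   # sample values (assumed)
registers = run(program, memory)
```

For these values the result in D2 equals ((6 + 14) // (6 + 4)) * (6 * 3).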
Problem 2)
a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0                 move contents of memory location X to register D0
ii) move D0, X                move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1)     move contents from memory to memory
iv) bgt.w label               branch if greater than to location "label"
Answer 2)
For each of the above, for part b), draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
    MAR <-- PC
    MBR <-- M[MAR]      fetch
    IR <-- MBR
    | decode |
    PC <-- PC + 2       PC points to source address
    MAR <-- PC
    MBR <-- M[MAR]      address of X in MBR
    MAR <-- MBR         move X to MAR
    MBR <-- M[MAR]      read contents of location X
    D0 <-- MBR
    PC <-- PC + 2       PC points to next instruction
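The micro-steps for "move X, D0" can be traced in code, one assignment per register transfer. The memory layout below (instruction word at address 0, address of X at address 2, X located at 0x100 and holding 1234) is an assumption made up for the trace.

```python
def move_x_to_d0(memory, pc):
    """Run the micro-steps for 'move X, D0'. Returns (ir, d0, pc)."""
    mar = pc              # MAR <-- PC
    mbr = memory[mar]     # MBR <-- M[MAR]   fetch
    ir = mbr              # IR  <-- MBR      (decode happens here)
    pc += 2               # PC points to source address
    mar = pc              # MAR <-- PC
    mbr = memory[mar]     # MBR <-- M[MAR]   address of X in MBR
    mar = mbr             # MAR <-- MBR      move X to MAR
    mbr = memory[mar]     # MBR <-- M[MAR]   read contents of location X
    d0 = mbr              # D0  <-- MBR
    pc += 2               # PC points to next instruction
    return ir, d0, pc

# Word-addressed dictionary standing in for memory (assumed layout).
memory = {0: 0b0011_0000_0011_1000, 2: 0x100, 0x100: 1234}
ir, d0, pc = move_x_to_d0(memory, pc=0)
```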
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
    MAR <-- PC
    MBR <-- M[MAR]      fetch
    IR <-- MBR
    | decode |
    PC <-- PC + 2       PC points to destination address
    MAR <-- PC
    MBR <-- M[MAR]      read destination address X
    MAR <-- MBR         MAR has X now
    MBR <-- D0          source data in MBR
    M[MAR] <-- MBR      data in effective address X (which is in MAR)
    PC <-- PC + 2       PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
    temp <-- M[a3 + $16]
Destination effective address = contents of a1 + $19
    M[a1 + $19] <-- temp
Micro-steps are
    MAR <-- PC
    MBR <-- M[MAR]        fetch
    IR <-- MBR
    | decode |
    PC <-- PC + 2         PC points to source displacement
    MAR <-- PC
    MBR <-- M[MAR]        source displacement loaded
    MAR <-- [A3 + MBR]    source effective address calculated
    MBR <-- M[MAR]        source data in MBR
    WR <-- MBR            saved temporarily since MBR is to be reused
    PC <-- PC + 2         PC points to destination displacement
    MAR <-- PC
    MBR <-- M[MAR]        destination displacement loaded
    MAR <-- [A1 + MBR]    destination effective address calculated
    MBR <-- WR            source data back to MBR
    M[MAR] <-- MBR        source data moved to destination
    PC <-- PC + 2         PC points to next instruction
iv) bgt.w label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a) Micro-steps are
    MAR <-- PC
    MBR <-- M[MAR]      fetch
    IR <-- MBR
    | interpret / decode |
If the condition is TRUE:
    PC <-- PC + 2
    MAR <-- PC
    MBR <-- M[MAR]      get displacement
    PC <-- PC + MBR     execute
If the condition is FALSE:
    PC <-- PC + 4
Lecture_no8 Memory unit Register Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. In these notes, the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii)Register-
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose register, floating-point register, control & status register, data register, address register).
Four fields of information -
1) The name field: optional field; must begin with a character from the alphabet and can consist of up to 8 characters.
2) Opcode: mandatory field; must be followed by a blank.
3) Operands: usually represent data that could be in main storage, in registers, or part of the instruction itself.
4) Comments: used to clarify instructions.
(iii)Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are stored only as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent, biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
Lecture_no9 Various instruction format of IBM 360370
( iv)Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R indicates that the data resides in a register, so in this type both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases (e.g. SVC) byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second is a storage location. If there are 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, this format specifies the storage address in 3 pieces (index, base and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I indicates that the operand is immediate data (meaning that the data is in the instruction itself). In this type the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S indicates that the data is in storage. Here both operands are in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
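To make the RX layout concrete, here is a small Python sketch that splits a four-byte RX instruction into its fields and forms the storage address. It goes slightly beyond the notes: it assumes the usual S/360 conventions that the base is a 4-bit register number followed by a 12-bit displacement, that register 0 used as base or index contributes nothing, and that addresses are 24 bits wide. The function names are invented for the sketch.

```python
def decode_rx(instr: bytes) -> dict:
    """Split a 4-byte RX instruction into its fields (illustrative helper)."""
    if len(instr) != 4:
        raise ValueError("RX instructions are four bytes long")
    return {
        "opcode": instr[0],                       # byte 0
        "r1": instr[1] >> 4,                      # byte 1, bits 0-3
        "x2": instr[1] & 0x0F,                    # byte 1, bits 4-7 (index register)
        "b2": instr[2] >> 4,                      # base register (assumed 4 bits)
        "d2": ((instr[2] & 0x0F) << 8) | instr[3] # displacement (assumed 12 bits)
    }


def effective_address(fields: dict, regs: list) -> int:
    """Index + base + displacement; register 0 means 'no register' (assumption)."""
    address = fields["d2"]
    if fields["x2"]:
        address += regs[fields["x2"]]
    if fields["b2"]:
        address += regs[fields["b2"]]
    return address & 0xFFFFFF                     # 24-bit addressing (assumption)
```

For example, the bytes 58 12 C0 10 (hex) decode to opcode 58, R1 = 1, index register 2, base register 12 and displacement 010.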
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 3
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4. Register indirect mode: Operand is at a memory address that is the content of a register specified by the instruction itself. Example: ADD (D), 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234, S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234], 10
7. Relative mode: Address of the operand is the content of PC + OpAddr. Example: ADD D, $S
8. Indexed mode: Address of the operand is the content of an index register + OpAddr. Example: ADD D, 500(S)
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
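Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
      MOVE R1, x
      MOVE R2, y
      MOVE R3, 0
      SUB  R1, R2
      BN   Less
      BZ   Zero
      MOVE R3, +1
      BR   Done
Less  MOVE R3, -1
      BR   Done
Zero  MOVE R3, 0
Done  HLT
The unconditional branches (BR Done) keep each case from falling through into the next one, so that R3 holds the required result when HLT is reached.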
Lecture_no11 Address modification using instruction as data ampusing index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 and 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 - R2
1010001000 00 01 10
Halt the program
1111111111111111
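The encodings above are enough to write a toy interpreter for the simulated computer. The Python sketch below is one possible way to do it; it assumes that the program is held separately from the 32 data memory locations and that instructions run strictly in sequence, since the instruction set shown has no branches. The names PROGRAM and run are invented for the sketch.

```python
PROGRAM = (
    "1000000100100101"   # LOAD  R1 <- Mem[5]
    "1000000101000101"   # LOAD  R2 <- Mem[5]
    "1010000100000110"   # ADD   R0 <- R1 + R2
    "1000001000000110"   # STORE Mem[6] <- R0
    "1111111111111111"   # HALT
)


def run(program: str, memory: list) -> list:
    """Execute a string of 16-bit instructions; returns the four registers."""
    regs = [0, 0, 0, 0]
    words = [program[i:i + 16] for i in range(0, len(program), 16)]
    for w in words:
        if w == "1" * 16:                               # HALT
            break
        if w.startswith("100000010"):                   # LOAD  RR MMMMM
            regs[int(w[9:11], 2)] = memory[int(w[11:16], 2)]
        elif w.startswith("100000100"):                 # STORE RR MMMMM
            memory[int(w[11:16], 2)] = regs[int(w[9:11], 2)]
        elif w.startswith("100100010000"):              # MOVE  RR RR
            regs[int(w[12:14], 2)] = regs[int(w[14:16], 2)]
        elif w.startswith("1010000100"):                # ADD   RR RR RR
            regs[int(w[10:12], 2)] = regs[int(w[12:14], 2)] + regs[int(w[14:16], 2)]
        elif w.startswith("1010001000"):                # SUB   RR RR RR
            regs[int(w[10:12], 2)] = regs[int(w[12:14], 2)] - regs[int(w[14:16], 2)]
        else:
            raise ValueError(f"unknown instruction {w}")
    return regs
```

Running the example program with 7 stored in memory location 5 leaves 14 in memory location 6, i.e. the program doubles the value at location 5.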
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (base 16). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device, specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
   LABEL   OPCODE  OPERANDS        COMMENTS
01         ;
02         ; Program to multiply an integer by the constant 6.
03         ; Before execution, an integer must be stored in NUMBER.
04         ;
05         ORIG    x3050
06         LEA     R2, NUMBER
07         LDW     R2, R2, #0
08         LEA     R1, SIX
09         LDW     R1, R1, #0
0A         AND     R3, R3, #0      ; Clear R3. It will
0B                                 ; contain the product.
0C         ; The inner loop
0D         ;
0E AGAIN   ADD     R3, R3, R2
0F         ADD     R1, R1, #-1     ; R1 keeps track of
10         BRp     AGAIN           ; the iterations
11         ;
12         HALT
13         ;
14 NUMBER  BLKW    1
15 SIX     FILL    x0006
16         ;
17         END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode of the corresponding machine instruction.
The number of operands depends on the operation being performed. For example, in
AGAIN ADD R3, R3, R2
the operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
In
LEA R2, NUMBER
the location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
ORIG
ORIG tells the assembler where in memory to place the program. In line 05, ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
FILL
FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the location labelled SIX is initialized to the value x0006.
BLKW
BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get by this way, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
A program that translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to the symbolic labels used by the programmer.
Disassembler
A program that translates machine code back to its assembly source.
Cross Assembler
An assembler that runs on one machine and produces object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START - Specify name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax of assembly language statements
Assembly language statements are entered one statement per line. Each statement has the following format:
[label] mnemonic [operands] [comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands, or parameters, of the command.
An assembly language statement contains the following fields:
-> Label field - an identifier, and an optional field. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field - contains a mnemonic, which stands for an operation code (i.e. a machine code instruction) or a pseudo-op. It may require additional information, i.e. operands.
-> Operand field - specifies either the address or the data that the opcode requires.
-> Comment field - allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are 3 types:
(i) Imperative statements
(ii) Declarative statements
(iii) Assembler directive statements
Advantages and disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structures and define the format of each data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in
- Single pass
- Two pass
(Fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbols, mnemonic op-code and operand fields.
• Determine the storage requirement for every assembly language statement and update the location counter.
• Build the symbol table (the table used to store each label and its corresponding value).
Pass II
• Generate object code (perform the actual translation).
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references without additional restrictions
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents.
• The Symbol Table (SYMTAB): used to store values (addresses) assigned to labels.
• The Location Counter (LOCCTR): a variable used to help in the assignment of addresses. After each statement is processed: LOCCTR <- LOCCTR + (instruction length).
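The way OPTAB, SYMTAB and LOCCTR work together in Pass 1 can be sketched in Python. This is a minimal sketch under several assumptions that are not stated in the notes: every machine instruction and every word is taken to be 3 bytes long, and the four opcodes in OPTAB are borrowed from the SIC teaching machine purely for illustration. The source program and the names pass1 and byte_constant_length are hypothetical.

```python
OPTAB = {"LDA": 0x00, "STA": 0x0C, "ADD": 0x18, "RSUB": 0x4C}  # assumed opcodes
INSTRUCTION_LENGTH = 3   # assumption: fixed-length instructions
WORD_LENGTH = 3          # assumption: 3-byte words


def byte_constant_length(operand):
    """Length in bytes of a BYTE operand such as C'EOF' or X'F1'."""
    kind, body = operand[0], operand[2:-1]
    if kind == "C":
        return len(body)
    if kind == "X":
        return len(body) // 2
    raise ValueError(f"bad BYTE operand: {operand}")


def pass1(lines):
    """lines: (label, opcode, operand) tuples. Returns SYMTAB, intermediate file, length."""
    symtab, intermediate = {}, []
    start = locctr = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            start = locctr = int(operand, 16)     # starting address is in hex
            intermediate.append((locctr, label, opcode, operand))
            continue
        intermediate.append((locctr, label, opcode, operand))
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr                # label gets the current LOCCTR
        if opcode in OPTAB:
            locctr += INSTRUCTION_LENGTH
        elif opcode == "WORD":
            locctr += WORD_LENGTH
        elif opcode == "RESW":
            locctr += WORD_LENGTH * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += byte_constant_length(operand)
        else:
            raise ValueError(f"invalid operation code: {opcode}")
    return symtab, intermediate, locctr - start


SOURCE = [
    ("SUM", "START", "1000"),
    ("FIRST", "LDA", "ALPHA"),
    ("", "ADD", "BETA"),
    ("", "STA", "GAMMA"),
    ("", "RSUB", ""),
    ("ALPHA", "WORD", "5"),
    ("BETA", "WORD", "7"),
    ("GAMMA", "RESW", "1"),
    ("", "END", "FIRST"),
]
```

Calling pass1(SOURCE) assigns FIRST the address 1000 (hex) and ALPHA, BETA and GAMMA the addresses 100C, 100F and 1012, which is exactly the information Pass 2 needs to resolve the forward references to ALPHA, BETA and GAMMA.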
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program.
• Save the values assigned to all labels for use in Pass 2.
• Perform some processing of assembler directives.
Pass 2
• Assemble instructions.
• Generate data values defined by BYTE, WORD.
• Perform processing of assembler directives not done in Pass 1.
• Write the object program and the assembly listing.
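A Pass 2 step can be sketched in the same spirit. The Python fragment below assumes that Pass 1 has already produced a complete symbol table (hard-coded here as SYMTAB), that an instruction is one opcode byte followed by a two-byte address, and that indexed addressing is not used. The opcodes are again borrowed from the SIC teaching machine; all names are illustrative.

```python
OPTAB = {"LDA": 0x00, "STA": 0x0C, "ADD": 0x18, "RSUB": 0x4C}       # assumed opcodes
SYMTAB = {"FIRST": 0x1000, "ALPHA": 0x100C, "BETA": 0x100F, "GAMMA": 0x1012}


def assemble_line(opcode, operand):
    """Return the object code for one statement as a hex string ('' if none)."""
    if opcode in OPTAB:
        address = 0
        if operand:
            if operand not in SYMTAB:
                raise ValueError(f"undefined symbol: {operand}")
            address = SYMTAB[operand]             # symbolic operand -> machine address
        return f"{OPTAB[opcode]:02X}{address:04X}"
    if opcode == "WORD":
        return f"{int(operand) & 0xFFFFFF:06X}"   # one-word integer constant
    if opcode == "BYTE":
        kind, body = operand[0], operand[2:-1]
        if kind == "C":
            return body.encode("ascii").hex().upper()
        return body.upper()                       # X'..' is already hexadecimal
    return ""                                     # RESB, RESW, START, END: no code
```

For example, LDA ALPHA assembles to 00100C and WORD 5 to 000005, while RESW 1 reserves storage and generates no object code.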
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 and 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1       H
Col 2-7     Program name
Col 8-13    Starting address (hex)
Col 14-19   Length of object program in bytes (hex)
Text record
Col 1       T
Col 2-7     Starting address in this record (hex)
Col 8-9     Length of object code in this record in bytes (hex)
Col 10-69   Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1       E
Col 2-7     Address of first executable instruction (hex) (END program_name)
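The record layout above translates almost directly into string formatting. The Python sketch below builds Header, Text and End records from a list of (address, object code) pairs; it starts a new Text record whenever the addresses are not contiguous (for example after reserved storage) or the 60 object-code columns are full. The function names and the sample data are assumptions made for the illustration.

```python
def header_record(name, start, length):
    # Col 1 'H', Col 2-7 name (padded/truncated to 6), Col 8-13 start, Col 14-19 length
    return f"H{name:<6.6}{start:06X}{length:06X}"


def end_record(first_executable):
    return f"E{first_executable:06X}"


def text_records(items):
    """items: list of (address, hex object code); '' means no code was generated."""
    records = []
    rec_start, rec_code, next_addr = None, "", None

    def flush():
        records.append(f"T{rec_start:06X}{len(rec_code) // 2:02X}{rec_code}")

    for addr, code in items:
        if not code:
            continue                      # reserved storage leaves a gap in addresses
        if rec_code and (addr != next_addr or len(rec_code) + len(code) > 60):
            flush()
            rec_code = ""
        if not rec_code:
            rec_start = addr
        rec_code += code
        next_addr = addr + len(code) // 2
    if rec_code:
        flush()
    return records
```

With six three-byte items starting at address 1000 (hex), this produces a single Text record beginning T00100012, where 12 (hex) is the 18 bytes of object code in the record.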
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a) Modification record
b) Base relative
c) Program counter relative
Modification record
Col 1     M
Col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9   Length of the address field to be modified, in half-bytes
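How a loader might use a Modification record can be sketched as follows. The Python function below adds the actual load address to the address field named by the record. It assumes that the program image is held in a bytearray indexed from the beginning of the program, and that when the length is an odd number of half-bytes the field starts at the low-order half of its first byte. The function name is hypothetical.

```python
def apply_modification(image: bytearray, record: str, load_address: int) -> None:
    """Add load_address to the address field described by an M record."""
    if not record.startswith("M"):
        raise ValueError("not a modification record")
    offset = int(record[1:7], 16)          # Col 2-7: start of the field
    half_bytes = int(record[7:9], 16)      # Col 8-9: field length in half-bytes
    nbytes = (half_bytes + 1) // 2         # whole bytes that contain the field
    field = int.from_bytes(image[offset:offset + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1     # selects only the address half-bytes
    relocated = ((field & mask) + load_address) & mask
    field = (field & ~mask) | relocated    # leave any leading half-byte untouched
    image[offset:offset + nbytes] = field.to_bytes(nbytes, "big")
```

As an example, take the four bytes 4B 10 10 36 and the record M00000105: the field is the last five half-bytes (01036), so loading the program at address 5000 (hex) changes the bytes to 4B 10 60 36.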
Lecture_no22 Macro language and Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we present one more example to highlight salient aspects of the macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example
Macro definition:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X,Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler/Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X,Y
leads to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
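Macro definition and expansion as described above can be sketched in a few lines of Python. This is a deliberately simple illustration, not the algorithm of any particular macro assembler: it assumes one statement per line, comma-separated operands without spaces, definitions placed before their first use, and no nested definitions or calls. The function name expand_macros is invented for the sketch.

```python
import re


def expand_macros(source_lines):
    """Collect MACRO ... ENDM definitions and expand every later macro call."""
    macros = {}            # macro name -> (formal parameters, model statements)
    output = []
    lines = iter(source_lines)
    for line in lines:
        fields = line.split()
        if fields == ["MACRO"]:
            # The statement after MACRO is the prototype: name and formal parameters.
            prototype = next(lines).split()
            formals = prototype[1].split(",") if len(prototype) > 1 else []
            body = []
            for body_line in lines:
                if body_line.split() == ["ENDM"]:
                    break
                body.append(body_line)
            macros[prototype[0]] = (formals, body)
        elif fields and fields[0] in macros:
            # A macro name in the mnemonic field is a call: substitute positionally.
            formals, body = macros[fields[0]]
            actuals = fields[1].split(",") if len(fields) > 1 else []
            mapping = dict(zip(formals, actuals))
            for body_line in body:
                output.append(re.sub(r"&\w+",
                                     lambda m: mapping.get(m.group(0), m.group(0)),
                                     body_line))
        else:
            output.append(line)
    return output
```

Given the INCRMT definition followed by the call INCRMT X,Y, the function returns the three statements LOAD X, ADD Y and STORE X in place of the call, and passes all other statements through unchanged.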
Defining a Macro
Let us take another look at the macro definition unit:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used during usage of Macro
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition.
-> Prototype for Macro
Each parameter starts with &
Name MACRO Parameter list
MEND
-> BODY: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro the operands are called parameters; in the macro invocation they are called arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro:
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, and so on. Considering the prototype and macro call statements once again:
INCRMT &A,&B (prototype)
INCRMT X,Y (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
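The difference between positional and keyword parameters comes down to how each argument is matched with a formal parameter. The following Python sketch shows one way to bind both kinds; the function name bind_arguments and the convention of passing formal names without the leading & are assumptions made for the illustration.

```python
def bind_arguments(formals, call_operands):
    """Pair each call operand with a formal parameter.

    formals: parameter names in prototype order, without the leading '&'.
    call_operands: operands of the macro call, e.g. ["DIRECT", "3"] or
    ["CHANNEL=3", "TYPE=DIRECT"].
    """
    bound = {}
    position = 0
    for operand in call_operands:
        if "=" in operand:
            # Keyword parameter: the keyword names the formal parameter directly.
            name, value = operand.split("=", 1)
            name = name.lstrip("&")
            if name not in formals:
                raise ValueError(f"unknown keyword parameter: {name}")
            bound[name] = value
        else:
            # Positional parameter: matched by its position in the list.
            if position >= len(formals):
                raise ValueError("too many positional arguments")
            bound[formals[position]] = operand
            position += 1
    return bound
```

With formals TYPE and CHANNEL, the positional call GENER DIRECT, 3 and the keyword call GENER CHANNEL=3, TYPE=DIRECT produce the same binding, which shows that keyword arguments may be written in any order.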
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in Step 3
Step 3
Process the statements in the macro definition as found in the MDT in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB, with pointers to the beginning and end of each definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
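The three tables can be sketched as ordinary Python containers. In the sketch below, define plays the role of the DEFINE sub-procedure and expand the role of EXPAND. The notation ?1, ?2, ... for positional parameter references and the use of MEND as the end marker stored in DEFTAB are assumptions made for this illustration, as is the class name.

```python
import re


class MacroProcessor:
    """Toy model of DEFTAB, NAMTAB and ARGTAB (illustrative only)."""

    def __init__(self):
        self.deftab = []     # prototype and body lines of every macro
        self.namtab = {}     # macro name -> (start index, end index) in DEFTAB

    def define(self, name, formals, body):
        """DEFINE: store the definition, converting parameters to ?n notation."""
        positions = {formal: f"?{i}" for i, formal in enumerate(formals, start=1)}
        start = len(self.deftab)
        self.deftab.append(f"{name} {','.join(formals)}")        # prototype
        for line in body:
            self.deftab.append(
                re.sub(r"&\w+", lambda m: positions.get(m.group(0), m.group(0)), line))
        self.deftab.append("MEND")
        self.namtab[name] = (start, len(self.deftab) - 1)

    def expand(self, name, arguments):
        """EXPAND: fill ARGTAB by position and substitute into the body."""
        argtab = list(arguments)
        start, end = self.namtab[name]
        expanded = []
        for line in self.deftab[start + 1:end]:                  # skip prototype, MEND
            expanded.append(
                re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1)) - 1], line))
        return expanded
```

After defining INCRMT with the parameters &A and &B, DEFTAB holds LOAD ?1, ADD ?2 and STORE ?1 between the prototype and MEND, and expanding with the arguments X and Y gives LOAD X, ADD Y, STORE X. Storing positions instead of names means that expansion needs only an indexed lookup in ARGTAB.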
(Fig Macro processor data structures: NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is then ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instructions are read line by line, their machine code is obtained, and it is put directly into main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know the memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf( ). A and printf( ) would have to be linked with each other. But where in main storage shall we load A and printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf( ) function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf( ) may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf( ) extends from 100 to 150. But there is simply no way A and printf( ) can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
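The adjustment described above can be sketched as follows. This is only an illustration under stated assumptions: a segment is modelled as a list of integer words, and a separate list says which words hold relative addresses. The names relocate, words and reloc_offsets are hypothetical and do not come from any real object format.

```python
def relocate(words, reloc_offsets, load_address):
    """Return a copy of the segment with every relative address adjusted.

    words         -- list of integers, the translated segment (assumed format)
    reloc_offsets -- indices of the words that hold relative addresses
    load_address  -- address at which the segment is actually loaded
    """
    relocated = list(words)
    for i in reloc_offsets:
        # Relocation adds the same constant to each relative address.
        relocated[i] = words[i] + load_address
    return relocated


# Segment translated as if it started at address 0; words 1 and 3 are addresses.
segment = [17, 4, 99, 2]
print(relocate(segment, [1, 3], 500))
```

Only the address fields change; the other words (instructions or constants) are copied untouched, which is the distinction the text draws between relocating a program and merely moving it.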
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of the assembler, which is the input to a relocating loader, is the object program together with information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic Linking-
The linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is also called dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader performs:
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960, and they are still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
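The behaviour of the overlay manager on the example tree can be sketched as below. This is a toy model, not a real overlay manager: the PARENT table, path_to and OverlayManager are hypothetical names, and the sketch assumes that loading a segment overwrites whatever was previously loaded at the same depth and below it.

```python
# Parent of each segment in the example overlay tree (ROOT has no parent).
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the list of segments from the root down to `segment`."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    """Toy overlay manager that keeps one path of the tree loaded."""

    def __init__(self):
        self.loaded = ["ROOT"]  # the root is loaded when the program starts

    def call(self, target):
        """Ensure the path to `target` is loaded; return the segments loaded now."""
        newly_loaded = []
        for depth, segment in enumerate(path_to(target)):
            if depth < len(self.loaded) and self.loaded[depth] == segment:
                continue  # this part of the path is already in memory
            # Siblings share memory: loading here evicts this depth and below.
            self.loaded = self.loaded[:depth] + [segment]
            newly_loaded.append(segment)
        return newly_loaded


manager = OverlayManager()
print(manager.call("B"))  # the root calls into B: A and B must both be loaded
print(manager.call("E"))  # switching to the D subtree replaces A and B
```

A call to a segment that is already on the loaded path returns an empty list, which mirrors the remark that upward calls need no extra loading.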
(fig Overlay: ROOT is the parent of A and D; A is the parent of B and C; D is the parent of E and F)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: Object program(s) and Library -> Linking loader -> Memory. Linkage editor: Object program(s) and Library -> Linkage editor -> Linked program -> Relocating loader -> Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution has to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at who performs each of the four functions (allocation, linking, relocation and loading) for our three loaders: absolute loader, relocating loader and direct linking loader.
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Loader              Loader
Linking      Programmer        Loader              Loader
Relocation   Assembler         Loader              Loader
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer, in program A, could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error, unless the symbol is declared to be external.
Design of Direct linking loader
pass1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory) card
7. END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment's location
• External Symbol Directory card (ESD), to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD IDs used in ESD & RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
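The two passes listed above can be sketched roughly as follows. The object-module layout used here is an assumption made for illustration, not the ESD/TXT/RLD card format: each module is a dictionary with a name, the symbols it defines (defs), its words (text), the indices of words needing relocation (relocs) and the indices of words that refer to an external symbol (refs). The names pass1 and pass2 are likewise hypothetical.

```python
def pass1(modules, ipla):
    """Assign a load address to each segment and build the symbol table."""
    symtab = {}
    pla = ipla  # the PLA counter starts at the Initial Program Load Address
    for module in modules:
        symtab[module["name"]] = pla
        for symbol, offset in module["defs"].items():
            symtab[symbol] = pla + offset
        pla += len(module["text"])  # the next segment follows this one
    return symtab


def pass2(modules, symtab):
    """Load the text, relocating addresses and resolving external references."""
    memory = {}
    for module in modules:
        base = symtab[module["name"]]
        for i, word in enumerate(module["text"]):
            if i in module["relocs"]:
                word += base  # relocation: add the segment's load address
            if i in module["refs"]:
                word += symtab[module["refs"][i]]  # linking: external address
            memory[base + i] = word
    return memory


modules = [
    {"name": "A", "defs": {}, "text": [10, 2, 0],
     "relocs": {1}, "refs": {2: "B"}},
    {"name": "B", "defs": {"BDATA": 1}, "text": [7, 8],
     "relocs": set(), "refs": {}},
]
symtab = pass1(modules, 100)
memory = pass2(modules, symtab)
print(symtab)
print(memory)
```

Two passes are needed because A refers to B before B's address is known; pass 1 fixes every address first, so pass 2 can resolve references in any order.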
Binary Symbolic Subroutine Loader-
The assembler assembles the program and provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
– allocation, relocation and linking
• Outputs the text to a file rather than to memory
– called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> The language designer's balance:
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array (an ordered table) as its input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
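The three search methods above can be sketched on a small table of symbol names. The hash function (sum of character codes modulo the table size) and the use of a list per slot to hold colliding keys are assumptions chosen for simplicity; the notes do not prescribe either.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for i, entry in enumerate(table):
        if entry == key:
            return i
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve the search range of a sorted table."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_table[mid] == key:
            return mid
        if key < sorted_table[mid]:
            high = mid - 1  # continue in the left sub-array
        else:
            low = mid + 1   # continue in the right sub-array
    return -1


def hash_slot(key, size):
    """Toy hash function: sum of character codes modulo the table size."""
    return sum(ord(c) for c in key) % size


def build_hash_table(keys, size):
    """Place each key in its slot; each slot is a list so collisions can coexist."""
    slots = [[] for _ in range(size)]
    for key in keys:
        slots[hash_slot(key, size)].append(key)
    return slots


def hash_search(slots, key):
    """Look only in the one slot the hash function names."""
    return key in slots[hash_slot(key, len(slots))]


symbols = ["ALPHA", "BETA", "COUNT", "LOOP", "TOTAL"]  # already in sorted order
slots = build_hash_table(symbols, 7)
print(linear_search(symbols, "LOOP"), binary_search(symbols, "LOOP"))
print(hash_search(slots, "COUNT"), hash_search(slots, "ZED"))
```

Linear search may inspect every entry, binary search needs the table to be ordered, and the hash lookup inspects a single slot regardless of table size.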
Sorting
We will discuss four comparison-sort algorithms
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar, and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ ⇒_G γ (a single-step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
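The definition can be made concrete with a small sketch that lists every string reachable from µ in a single step. It assumes, for illustration only, that each symbol is one character and that productions are given as (α, β) pairs of strings; the function name immediate_derivations is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string that can be derived from mu in a single step.

    productions -- list of (alpha, beta) pairs standing for rules alpha -> beta
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            # mu = sigma alpha tau, so the derived string is sigma beta tau
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Grammar with the two rules S -> aSb and S -> ab
productions = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", productions)))
```

Here σ = "a" and τ = "b" surround the single occurrence of S, so the two rules give exactly two immediate derivations.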
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure Hierarchy of grammars
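The conditions that separate Type 1, Type 2 and Type 3 grammars can be checked mechanically. The sketch below is an illustration under stated assumptions: every symbol is a single character, the empty string "" stands for ε, and productions are (α, β) pairs. The function name grammar_type is hypothetical.

```python
def grammar_type(nonterminals, productions, start="S"):
    """Return the most restrictive type (0 to 3) whose conditions all hold."""
    has_empty_rule = (start, "") in productions
    others = [(a, b) for a, b in productions if (a, b) != (start, "")]

    # Type 1, condition (a): no rule other than S -> ε may shrink the string.
    if any(len(a) > len(b) for a, b in others):
        return 0
    # Type 1, condition (b): with S -> ε present, S may not occur on a right side.
    if has_empty_rule and any(start in b for _, b in productions):
        return 0
    # Type 2: every left-hand side is a single nonterminal.
    if any(len(a) != 1 or a not in nonterminals for a, _ in others):
        return 1

    def is_regular(b):
        # Type 3: a terminal, or a terminal followed by a nonterminal.
        if len(b) == 1:
            return b not in nonterminals
        return len(b) == 2 and b[0] not in nonterminals and b[1] in nonterminals

    return 3 if all(is_regular(b) for _, b in others) else 2


regular = [("S", "aA"), ("A", "b"), ("A", "aA")]
print(grammar_type({"S", "A"}, regular))
print(grammar_type({"S", "A", "B"}, regular + [("A", "Ba")]))
print(grammar_type({"S", "B"}, [("S", "aB"), ("Bb", "b")]))
```

The last two calls mirror the remarks in the text: adding A → Ba gives a non-Type 3 grammar, and adding Bb → b violates condition (a) of Type 1.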
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
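As a small illustration of the notation, the sketch below reads a few BNF rules into a dictionary and separates terminals from nonterminals using the criterion stated above. The two-rule grammar for binary numbers and the helper names parse_bnf and terminals are made up for this example; the sketch also assumes one rule per line, symbols separated by spaces, and that the vertical bar is never itself a terminal.

```python
def parse_bnf(text):
    """Parse rules of the form '<symbol> ::= alt1 | alt2' into a dictionary."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # Each alternative is a sequence of symbols separated by spaces.
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)


GRAMMAR = """
<digit> ::= 0 | 1
<number> ::= <digit> | <digit> <number>
"""

rules = parse_bnf(GRAMMAR)
print(rules)
print(sorted(terminals(rules)))
```

The second rule is recursive: a number is a digit, or a digit followed by a number, which is how a finite set of rules describes numbers of any length.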
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
This is the case of a canonic system where all predicates are of degree one and no “cross-referencing” is used.
Lecture_no 43 Compiler
Compilers are basically “Language Translators”
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
• In the analysis part, the source program is read and broken into a stream of strings, producing an intermediate form of the source program.
• In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
– Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
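A scanner for this statement can be sketched with regular expressions. The token names and patterns below are assumptions made for the example (a real language would have many more token kinds), and the tokenize function is hypothetical.

```python
import re

# (token name, pattern) pairs; "skip" matches blanks, which are discarded.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"\d+"),
    ("assignment", r"="),
    ("plus", r"\+"),
    ("multiply", r"\*"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into a list of (token name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

The seven pairs produced correspond one for one to the seven tokens listed above; a character that matches no pattern is reported as an error.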
– Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called the parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73
SP NOTES
the latter constituting the output of the lexical analyzer For example any one of the following
x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not
x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and
A token is a <token-name, attribute-value> pair. For example
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
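The grouping of characters into lexemes and tokens described above can be sketched in a few lines of Python. This is only an illustration, not how any particular compiler does it: the token classes, the function name scan, and the use of a plain list as the symbol table are assumptions made for this sketch, and the statement is assumed to end with a semicolon as in C.

```python
import re

# Token classes for the tiny example language (assumed for illustration).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+;]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbol_table) for one line of source text."""
    symbols = []   # symbol table: entry i (1-based) is the i-th distinct identifier
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                            # non-significant blanks are dropped
        if kind == "id":
            if lexeme not in symbols:
                symbols.append(lexeme)
            tokens.append(("id", symbols.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))   # attribute kept as text here
        else:
            tokens.append((lexeme, None))       # operators need no attribute
    return tokens, symbols


if __name__ == "__main__":
    print(scan("x3 = y + 3;"))
```

Note that every rule in TOKEN_SPEC is a regular expression with no recursion, which matches the remark that scanning needs no recursive definitions.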
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversion where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one operand) conversion operators "int to real" and "real to int" as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
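Walking the semantic tree to produce three-address code can be sketched as a post-order traversal. The tuple encoding of the tree and the function name gen_three_address are assumptions made for this illustration; a real compiler would use its own tree classes.

```python
from itertools import count


def gen_three_address(tree):
    """Emit three-address code for an assignment tree by a post-order walk."""
    code = []
    temps = count(1)   # temp1, temp2, ... (unlimited "registers")

    def walk(node):
        if isinstance(node, str):            # leaf: identifier or constant
            return node
        op, *args = node
        operands = [walk(a) for a in args]   # children first (post-order)
        target = f"temp{next(temps)}"
        if len(operands) == 1:               # unary conversion operator
            code.append(f"{target} = {op}({operands[0]})")
        else:                                # binary operator
            code.append(f"{target} = {operands[0]} {op} {operands[1]}")
        return target

    op, target, expr = tree
    if op != "=":
        raise ValueError("expected an assignment at the root")
    code.append(f"{target} = {walk(expr)}")
    return code


# Semantic tree for x3 = y + 3 with the conversions inserted
# (y is real, x3 is integer; id1 is x3 and id2 is y).
TREE = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))

if __name__ == "__main__":
    print("\n".join(gen_three_address(TREE)))
```

Running the sketch on TREE reproduces the four instructions listed above, in the same order.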
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal  temp1  3      --
add        temp2  id2    temp1
realtoint  temp3  temp2  --
assign     id1    temp3  --
Each "quad" has the form
operation target source1 source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
   add temp2 id2 3.0
2. The last two quads can be combined into
   realtoint id1 temp2
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0    // add float
RTOI R2, R1         // real to int
ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
   calls -> Syntactic Analyzer
               calls -> Contextual Analyzer
               calls -> Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
   calls -> Syntactic Analyzer    (input: Source Text,    output: AST)
   calls -> Contextual Analyzer   (input: AST,            output: Decorated AST)
   calls -> Code Generator        (input: Decorated AST,  output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no4 Foundation of system programming
1) Formal System
• A formal system is an uninterpreted calculus.
It consists of
- Alphabets
- A set of words called axioms
- A finite set of relations called rules of inference or production rules
- Ex: Boolean algebra
2) Need of System Software
The basic need of system software is to achieve the following goals:
• To achieve efficient performance of the system
• To make effective execution of general user programs
• To make effective utilization of human resources
• To make available new, better facilities
Need of system programming
Lecture_no5 Evolution of operating system, General machine structure
OS, Job, Multiprogramming, Multitasking
Fragmentation, Paging, Scheduling
Facilities of computer OS
Operating System
• It is the collection of system programs which acts as an interface between the user and the computer hardware.
• The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient manner.
Functions of Operating System
• File handling and management
• Storage management (memory management)
• Device scheduling and management
• CPU scheduling
• Information management
• Process control (management)
• Error handling
• Protecting itself from the user and protecting the user from other users
What is MFT?
What is MVT?
What is fragmentation?
What is paging? Explain its various types.
What is scheduling?
What is meant by the term time sharing?
What is virtual memory?
What are simple batched systems? Explain.
Give the evolution of an operating system.
Lecture_no6 Description of Von Neumann machine components
General machine structure, Von Neumann's components, block diagram and functions of each component of Von Neumann's machine
1. The von Neumann Computer Model
• Von Neumann computer systems contain three main building blocks:
  -> the central processing unit (CPU)
  -> memory
  -> input/output devices (I/O)
• These three components are connected together using the system bus.
• The most prominent items within the CPU are the registers; they can be manipulated directly by a computer program. The following block diagram shows the major relationships between CPU components.
(Design of the Von Neumann architecture)
2. Components of the Von Neumann Model
-> Memory: storage of information (data/program)
-> Processing Unit: computation/processing of information
-> Input: means of getting information into the computer, e.g. keyboard, mouse
-> Output: means of getting information out of the computer, e.g. printer, monitor
• Control Unit: makes sure that all the other parts perform their tasks correctly and at the correct time
The von Neumann Machine
(connection between CPU & main memory)
MBR (Memory Buffer Register): contains a word to be stored in memory, or is used to receive a word from memory.
MAR (Memory Address Register): specifies the address in memory of the word to be written from or read into the MBR.
IR (Instruction Register): contains the 8-bit opcode of the instruction being executed.
IBR (Instruction Buffer Register): holds temporarily the right-hand instruction from a word in memory.
PC (Program Counter): contains the address of the next instruction pair to be fetched from memory.
AC and MQ (Accumulator and Multiplier-Quotient): hold temporarily operands and results of ALU operations.
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit.
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete.
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors, or devices for human interaction.
Instruction set: The set of codes that describe all the legal operations a particular computer can execute.
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform.
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture.
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location.
Read-only memory (ROM): A memory that is usually very similar to RAM, except it may only be accessed, not updated.
Store: The process of writing a value to a memory cell.
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system, information flow in microflowchart for ADD/SUB/MULT
Memory operations
FETCH(addr)
1. Put addr into MAR
2. Tell memory unit to "load"
3. Memory copies data into MDR
STORE(addr, new-value)
1. Put addr into MAR
2. Put new-value into MDR
3. Tell memory unit to "store"
4. Memory stores data from MDR into memory cell
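The two memory operations can be mimicked with a very small Python model. The class name MemoryUnit and its attribute names are assumptions made for this sketch; only the MAR/MDR steps come from the notes.

```python
class MemoryUnit:
    """Toy memory with a memory address register (MAR) and memory data register (MDR)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0
        self.mdr = 0

    def fetch(self, addr):
        self.mar = addr                    # 1. put addr into MAR
        self.mdr = self.cells[self.mar]    # 2-3. "load": memory copies the cell into MDR
        return self.mdr

    def store(self, addr, new_value):
        self.mar = addr                    # 1. put addr into MAR
        self.mdr = new_value               # 2. put new-value into MDR
        self.cells[self.mar] = self.mdr    # 3-4. "store": memory copies MDR into the cell


if __name__ == "__main__":
    memory = MemoryUnit(16)
    memory.store(5, 42)
    print(memory.fetch(5))
```

The point of the model is that the processor never touches a memory cell directly; every transfer goes through MAR and MDR.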
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R1R7) to evaluate the following expression Ignore the remainder part of the division process
( (a + b) / (a + d) ) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+  -------->  ADD  Ri, Rj   ( Rj <- Ri + Rj )
*  -------->  MULT Ri, Rj   ( Rj <- Ri * Rj )
/  -------->  DIV  Ri, Rj   ( Rj <- Rj / Ri )
a = M[A] ------> content of memory location at address A; same for B, D, E
move A, D0      D0 = a
move B, D4      D4 = b
add  D0, D4     D4 = a + b
move D, D1      D1 = d
add  D0, D1     D1 = a + d
div  D1, D4     D4 = (a + b) / (a + d)
move E, D2      D2 = e
mult D0, D2     D2 = (a * e)
mult D4, D2     D2 = ( (a + b) / (a + d) ) * (a * e)
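To check a register program like this one by hand, it helps to run it on sample values. The sketch below is an illustration only: it reads the expression as ((a + b) / (a + d)) * (a * e), which is what the register comments compute, assumes non-negative integer operands, and uses a made-up tuple encoding (op, source, destination) for the instructions.

```python
def run(program, memory):
    """Execute (op, src, dst) triples; the result always lands in dst."""
    regs = {}
    for op, src, dst in program:
        if op == "move":
            regs[dst] = memory[src]              # dst <- M[src]
        elif op == "add":
            regs[dst] = regs[src] + regs[dst]    # Rj <- Ri + Rj
        elif op == "mult":
            regs[dst] = regs[src] * regs[dst]    # Rj <- Ri * Rj
        elif op == "div":
            regs[dst] = regs[dst] // regs[src]   # Rj <- Rj / Ri, remainder ignored
        else:
            raise ValueError(f"unknown operation {op!r}")
    return regs


PROGRAM = [
    ("move", "A", "D0"),   # D0 = a
    ("move", "B", "D4"),   # D4 = b
    ("add", "D0", "D4"),   # D4 = a + b
    ("move", "D", "D1"),   # D1 = d
    ("add", "D0", "D1"),   # D1 = a + d
    ("div", "D1", "D4"),   # D4 = (a + b) / (a + d)
    ("move", "E", "D2"),   # D2 = e
    ("mult", "D0", "D2"),  # D2 = a * e
    ("mult", "D4", "D2"),  # D2 = ((a + b) / (a + d)) * (a * e)
]

if __name__ == "__main__":
    print(run(PROGRAM, {"A": 6, "B": 10, "D": 2, "E": 3})["D2"])
```

With a = 6, b = 10, d = 2, e = 3 the quotient is 16 / 8 = 2 and the product a * e is 18, so D2 ends up holding 36.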
Problem 2)
a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0               move contents of memory location X to register D0
ii) move D0, X              move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1)   move contents from memory to memory
iv) bgtw label              branch if greater than to location "label"
Answer 2)
For each of the above, for part b), draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR <-- MBR
| decode |
PC <-- PC + 2       PC points to source address
MAR <-- PC
MBR <-- M[MAR]      address of X in MBR
MAR <-- MBR         move X to MAR
MBR <-- M[MAR]      read contents of location X
D0 <-- MBR
PC <-- PC + 2       PC points to next instruction
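The micro-steps for move X, D0 can be traced mechanically. The following Python sketch is only an illustration: memory is modelled as a dictionary from byte addresses to 16-bit words, and the addresses 0x100 and 0x2000 and the data value 0x1234 are made-up sample values.

```python
def run_move_x_d0(memory, pc):
    """Trace the micro-steps of `move X, D0` and return (cpu state, trace)."""
    cpu = {"PC": pc, "MAR": 0, "MBR": 0, "IR": 0, "D0": 0}
    trace = []

    def step(register, value, note=""):
        cpu[register] = value
        trace.append(f"{register} <-- {value:#06x}  {note}".rstrip())

    def read_memory(note):
        step("MBR", memory[cpu["MAR"]], note)   # MBR <-- M[MAR]

    step("MAR", cpu["PC"])
    read_memory("fetch")
    step("IR", cpu["MBR"], "decode follows")
    step("PC", cpu["PC"] + 2, "PC points to source address")
    step("MAR", cpu["PC"])
    read_memory("address of X in MBR")
    step("MAR", cpu["MBR"], "move X to MAR")
    read_memory("read contents of location X")
    step("D0", cpu["MBR"])
    step("PC", cpu["PC"] + 2, "PC points to next instruction")
    return cpu, trace


if __name__ == "__main__":
    # Opcode word 0011 0000 0011 1000 at 0x100, followed by the address of X.
    sample = {0x100: 0b0011000000111000, 0x102: 0x2000, 0x2000: 0x1234}
    state, steps = run_move_x_d0(sample, 0x100)
    print("\n".join(steps))
```

Ten register transfers are recorded, one per micro-step in the list above, and D0 ends up holding the contents of location X.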
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR <-- MBR
| decode |
PC <-- PC + 2       PC points to destination address
MAR <-- PC
MBR <-- M[MAR]      read destination address X
MAR <-- MBR         MAR has X now
MBR <-- D0          source data in MBR
M[MAR] <-- MBR      data in effective address X (which is in MAR)
PC <-- PC + 2       PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
   temp <-- M[a3 + $16]
   Destination effective address = contents of a1 + $19
   M[a1 + $19] <-- temp
Micro-steps are
MAR <-- PC
MBR <-- M[MAR]        fetch
IR <-- MBR
| decode |
PC <-- PC + 2         PC points to source displacement
MAR <-- PC
MBR <-- M[MAR]        source displacement loaded
MAR <-- [A3 + MBR]    source effective address calculated
MBR <-- M[MAR]        source data in MBR
WR <-- MBR            saved temporarily since MBR is to be reused
PC <-- PC + 2         PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR]        destination displacement loaded
MAR <-- [A1 + MBR]    destination effective address calculated
MBR <-- WR            source data back to MBR
M[MAR] <-- MBR        source data moved to destination
PC <-- PC + 2         PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a)
MAR <-- PC
MBR <-- M[MAR]        fetch
IR <-- MBR
| interpret, decode |
If the condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR]        get displacement
PC <-- PC + MBR       execute
If the condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit, Register, Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes), and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword, or quadword, low numbered bytes are more significant than high numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. In these notes, the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose register, floating-point register, control & status register, data register, address register).
Four fields of information:
1) Name field: optional; must begin with a character from the alphabet and can consist of up to 8 characters.
2) Opcode: mandatory field; must be followed by a blank.
3) Operands: usually represent data that could be in main storage, in registers, or part of the instruction itself.
4) Comments: used to clarify instructions.
(iii) Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E, and F are positive and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent, biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
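The packed decimal layout (an odd number of digit nibbles followed by one sign nibble) is easy to try out in Python. The function names pack_decimal and unpack_decimal are made up for this sketch, and the choice of C for plus and D for minus when packing is an assumption (the text lists several acceptable sign codes). Unlike the PACK and UNPK instructions, this sketch does check validity.

```python
def pack_decimal(value):
    """Encode an integer as packed decimal: digit nibbles, then a sign nibble."""
    sign = 0xC if value >= 0 else 0xD          # C = positive, D = negative
    digits = str(abs(value))
    if len(digits) % 2 == 0:
        digits = "0" + digits                  # pad to an odd number of digits
    nibbles = [int(d) for d in digits] + [sign]
    if len(nibbles) // 2 > 16:
        raise ValueError("packed decimal is limited to 16 bytes")
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def unpack_decimal(data):
    """Decode packed decimal bytes back into an integer."""
    nibbles = []
    for byte in data:
        nibbles += [byte >> 4, byte & 0xF]
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError("invalid packed decimal")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0xB, 0xD) else value


if __name__ == "__main__":
    print(pack_decimal(1234).hex(), unpack_decimal(pack_decimal(-7)))
```

For instance, 1234 needs a leading zero to reach five digits and is stored in the three bytes 01 23 4C.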
Lecture_no9 Various instruction formats of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four, or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R tells that there is data residing in a register. So in this type both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on how many the opcode indicates. If it is 2, the first is a register and the second is storage. If it is 3, then the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, this specifies the actual storage address in 3 pieces.
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I tells that the operand consists of immediate data (meaning that the data is in the instruction itself). So in this type the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S tells that there is data in storage. Here both operands have data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
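As an illustration of the RX layout, the sketch below pulls the fields out of a four-byte instruction and forms the storage address from its three pieces. Some details are not stated in these notes and are taken from the System/360 architecture: the base-and-displacement halfword is a 4-bit base register number followed by a 12-bit displacement, a register number of 0 in the base or index position contributes nothing to the address, and addresses are 24 bits. The function names are made up for this sketch.

```python
def decode_rx(instr):
    """Split a 4-byte RX instruction into (opcode, r1, x2, b2, d2)."""
    if len(instr) != 4:
        raise ValueError("RX instructions are four bytes long")
    opcode = instr[0]
    r1 = instr[1] >> 4                        # byte 1 bits 0-3: register or modifier
    x2 = instr[1] & 0xF                       # byte 1 bits 4-7: index register
    b2 = instr[2] >> 4                        # base register (4 bits)
    d2 = ((instr[2] & 0xF) << 8) | instr[3]   # displacement (12 bits)
    return opcode, r1, x2, b2, d2


def effective_address(x2, b2, d2, regs):
    """Address = base + index + displacement; register 0 means 'no register'."""
    index = regs[x2] if x2 != 0 else 0
    base = regs[b2] if b2 != 0 else 0
    return (base + index + d2) & 0xFFFFFF     # 24-bit addressing


if __name__ == "__main__":
    # 58 10 C0 04: opcode 58 (Load), R1 = 1, no index, base = 12, displacement = 4
    fields = decode_rx(bytes([0x58, 0x10, 0xC0, 0x04]))
    registers = [0] * 16
    registers[12] = 0x1000
    print(fields, hex(effective_address(*fields[2:], registers)))
```

The "3 pieces" mentioned for the RX format are exactly the index register, the base register, and the displacement added together here.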
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4. Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D), 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234, S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234], 10
7. Relative mode: Address of the operand is the content of PC + OpAddr. Example: ADD D, $S
8. Indexed mode: Address of the operand is the content of an index register + OpAddr. Example: ADD D, 500(S)
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
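Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
        MOVE R1, x
        MOVE R2, y
        MOVE R3, 0
        SUB  R1, R2
        BN   Less
        BZ   Zero
        MOVE R3, +1
        BR   End
Less    MOVE R3, -1
        BR   End
Zero    MOVE R3, 0
End     HLT
An unconditional branch (written BR here) is needed after the +1 and -1 cases so that execution does not fall through into the next case and overwrite R3.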
Lecture_no11 Address modification using instruction as data ampusing index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an Assembly language -> Machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result in third
1010001000 RR RR RR
ex: R0 = R1 - R2
1010001000 00 01 10
Halt the program
1111111111111111
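Because each instruction of the simulated computer is just a fixed bit prefix followed by 2-bit register numbers and a 5-bit memory address, an assembly-to-machine-language translator takes only a few lines. The sketch below is an illustration under some assumptions: the mnemonic SUB for subtract and the helper names are made up, SAVE and STORE are treated as the same instruction, and operands are written register-first, separated by blanks.

```python
# Bit prefix and operand kinds (R = 2-bit register, M = 5-bit memory address).
OPCODES = {
    "LOAD":  ("100000010", "RM"),
    "STORE": ("100000100", "RM"),
    "SAVE":  ("100000100", "RM"),      # SAVE is treated as another name for STORE
    "MOVE":  ("100100010000", "RR"),
    "ADD":   ("1010000100", "RRR"),
    "SUB":   ("1010001000", "RRR"),    # mnemonic assumed for SUBTRACT
}


def encode_register(text):
    if not text.startswith("R") or not text[1:].isdigit() or int(text[1:]) > 3:
        raise ValueError(f"bad register {text!r}")
    return format(int(text[1:]), "02b")


def encode_address(text):
    if not text.isdigit() or int(text) > 31:
        raise ValueError(f"bad memory address {text!r}")
    return format(int(text), "05b")


def assemble_line(line):
    """Translate one assembly instruction into a 16-bit string of 0s and 1s."""
    mnemonic, *args = line.split()
    if mnemonic == "HALT":
        return "1" * 16
    prefix, kinds = OPCODES[mnemonic]
    if len(args) != len(kinds):
        raise ValueError(f"{mnemonic} expects {len(kinds)} operands")
    fields = [encode_register(a) if k == "R" else encode_address(a)
              for k, a in zip(kinds, args)]
    return prefix + "".join(fields)


if __name__ == "__main__":
    for source in ["LOAD R1 5", "LOAD R2 5", "ADD R0 R1 R2", "SAVE R0 6", "HALT"]:
        print(assemble_line(source), source)
```

Assembling the five-instruction example program reproduces the five machine words listed above.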
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for equals; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (base 16). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM, MASM, etc.
     LABEL    OPCODE   OPERANDS       COMMENTS
01   ;
02   ; Program to multiply an integer by the constant 6.
03   ; Before execution, an integer must be stored in NUMBER.
04   ;
05            .ORIG    x3050
06            LEA      R2, NUMBER
07            LDW      R2, R2, #0
08            LEA      R1, SIX
09            LDW      R1, R1, #0
0A            AND      R3, R3, #0     ; Clear R3. It will
0B                                    ; contain the product.
0C   ; The inner loop
0D   ;
0E   AGAIN    ADD      R3, R3, R2
0F            ADD      R1, R1, #-1    ; R1 keeps track of
10            BRp      AGAIN          ; the iterations
11   ;
12            HALT
13   ;
14   NUMBER   .BLKW    1
15   SIX      .FILL    x0006
16   ;
17            .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
LEA R2, NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program.
There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER, line 14, and SIX, line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the location labeled SIX is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get by this way, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
A program that translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to symbolic labels used by the programmer.
Disassembler
A program that translates machine code back to its assembly source.
Cross Assembler
An assembler that runs on one machine and produces object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader.
(Fig Assembler)
For example, externally defined symbols (such as library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine) representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START - Specify the name and starting address of the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate a one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) which is to be executed, and the second is the operands, or parameters, of the command.
An assembly language statement contains the following fields:
-> Label Field - an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode Field - this field contains a mnemonic, which stands for an operation code or a pseudo-op. A machine code instruction may require additional information, i.e. operands.
-> Operand Field - specifies either the address or the data that the opcode requires.
-> Comment Field - allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
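The field layout above can be illustrated with a small parser. This is only a sketch under stated assumptions: the names Statement and parse_statement are hypothetical, a label is taken to be a first token ending in a colon (as the notes describe), and a semicolon is assumed to open the comment field, which the notes do not specify.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    mnemonic: str
    operands: list
    comment: Optional[str]


def parse_statement(line: str) -> Statement:
    """Split one source line into [label] mnemonic [operands] [comment]."""
    # Assumption: ';' starts the comment field.
    code, sep, comment = line.partition(";")
    tokens = code.split()
    label = None
    # Assumption: a label is the first token and is suffixed by ':'.
    if tokens and tokens[0].endswith(":"):
        label = tokens[0][:-1]
        tokens = tokens[1:]
    if not tokens:
        raise ValueError("statement has no mnemonic")
    mnemonic = tokens[0].upper()
    # Everything after the mnemonic is the operand field, comma separated.
    operand_text = " ".join(tokens[1:])
    operands = [op.strip() for op in operand_text.split(",") if op.strip()]
    return Statement(label, mnemonic, operands, comment.strip() if sep else None)
```

For instance, parse_statement("START: LDAA 24H ; load") yields the label START, the mnemonic LDAA, one operand and the comment text; a line without a label or comment leaves those fields as None.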
Types of Assembly language statements
There are three types:
(i) Imperative statements
(ii) Declarative statements
(iii) Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structures and define the format of each data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in
– Single pass
– Two pass
(Fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes, with OPTAB and SYMTAB used by both passes)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
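The Pass I steps can be sketched as follows. This is a minimal illustration, not a complete assembler: it assumes 3-byte words with every machine instruction occupying one word, statements already split into (label, mnemonic, operand) tuples, and the directives START, END, BYTE, WORD, RESB and RESW described earlier. The names pass_one and statement_length are hypothetical.

```python
WORD_SIZE = 3  # assumption: 3-byte words, one word per machine instruction


def statement_length(mnemonic, operand):
    """Number of bytes a statement occupies in the object program."""
    if mnemonic == "RESW":
        return WORD_SIZE * int(operand)
    if mnemonic == "RESB":
        return int(operand)
    if mnemonic == "BYTE":
        digits = len(operand) - 3  # drop the leading C' or X' and the closing '
        return digits if operand[0] == "C" else (digits + 1) // 2
    return WORD_SIZE  # WORD or a machine instruction


def pass_one(statements):
    """Build SYMTAB and the intermediate file from (label, mnemonic, operand) tuples."""
    symtab = {}
    intermediate = []  # rows of (address, label, mnemonic, operand, error)
    locctr = 0
    for label, mnemonic, operand in statements:
        if mnemonic == "START":
            locctr = int(operand, 16)  # starting address is given in hex
            intermediate.append((locctr, label, mnemonic, operand, None))
            continue
        error = None
        if label is not None:
            if label in symtab:
                error = "duplicate symbol"
            else:
                symtab[label] = locctr  # the label's value is the current address
        intermediate.append((locctr, label, mnemonic, operand, error))
        if mnemonic == "END":
            break
        locctr += statement_length(mnemonic, operand)
    return symtab, intermediate
```

Each row of the intermediate file carries the source statement, its assigned address and an error indicator, which is what Pass II needs to generate object code.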
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of a two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references unless restrictions such as those above are imposed
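One common way for a one-pass assembler to cope with forward references, when the object code is kept in memory, is to leave the address field empty and patch it once the label is finally defined. The notes do not describe this technique, so the following is only an illustrative sketch. It assumes every instruction occupies one location and is stored as an [opcode, address] pair; assemble_one_pass and the opcode values in the usage below are hypothetical.

```python
def assemble_one_pass(statements, optab, start=0):
    """statements: (label, mnemonic, operand symbol or None) tuples."""
    symtab = {}
    pending = {}  # symbol -> indexes of code entries still waiting for its address
    code = []     # one [opcode, address] entry per instruction
    for label, mnemonic, operand in statements:
        location = start + len(code)
        if label is not None:
            symtab[label] = location
            # Patch every earlier instruction that referred to this label.
            for index in pending.pop(label, []):
                code[index][1] = location
        address = 0
        if operand is not None:
            if operand in symtab:
                address = symtab[operand]
            else:
                # Forward reference: remember where the address must go later.
                pending.setdefault(operand, []).append(len(code))
        code.append([optab[mnemonic], address])
    if pending:
        raise ValueError("undefined symbols: " + ", ".join(sorted(pending)))
    return code, symtab
```

With a table such as {"JMP": 1, "ADD": 2, "HALT": 3}, a jump to a label defined two statements later is first emitted with address 0 and corrected when the label is reached.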
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels.
• Location Counter (LOCCTR): This is a variable that is used to help in the assignment of addresses. After each statement is processed: LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
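The Pass 2 work can be sketched as a lookup in OPTAB and SYMTAB for each statement. The sketch assumes an instruction format of an 8-bit opcode followed by a 16-bit address, 3-byte words, and an intermediate file reduced to (address, mnemonic, operand) rows; the opcode values used in the usage below and the name pass_two are illustrative assumptions, not taken from the notes.

```python
def pass_two(intermediate, symtab, optab):
    """Return (address, object code in hex) pairs for statements that generate code."""
    object_code = []
    for address, mnemonic, operand in intermediate:
        if mnemonic in optab:
            # Assemble the instruction: opcode byte followed by the operand address.
            target = symtab[operand] if operand else 0
            object_code.append((address, f"{optab[mnemonic]:02X}{target:04X}"))
        elif mnemonic == "WORD":
            # One-word integer constant, stored as 3 bytes.
            object_code.append((address, f"{int(operand) & 0xFFFFFF:06X}"))
        elif mnemonic == "BYTE":
            if operand.startswith("C'"):
                text = "".join(f"{ord(ch):02X}" for ch in operand[2:-1])
            else:  # X'..' hexadecimal constant
                text = operand[2:-1].upper()
            object_code.append((address, text))
        # START, END, RESB and RESW generate no object code.
    return object_code
```

An undefined operand symbol raises KeyError here; a fuller assembler would instead set the error indicator and continue.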
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1 H
Col 2-7 Program name
Col 8-13 Starting address (hex)
Col 14-19 Length of object program in bytes (hex)
Text record
Col 1 T
Col 2-7 Starting address in this record (hex)
Col 8-9 Length of object code in this record in bytes (hex)
Col 10-69 Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1 E
Col 2-7 Address of first executable instruction (hex) (END program_name)
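The column layout above can be checked by formatting the three record types directly. This is a sketch that follows the column widths listed in the notes; the function names are hypothetical and the values in the usage are made up for illustration.

```python
def header_record(name, start, length):
    """Col 1 'H', cols 2-7 program name, cols 8-13 start, cols 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    """Col 1 'T', cols 2-7 start, cols 8-9 length in bytes, cols 10-69 object code."""
    body = "".join(object_codes)
    if len(body) > 60:  # columns 10-69 hold at most 60 hex digits
        raise ValueError("too much object code for one text record")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_executable):
    """Col 1 'E', cols 2-7 address of the first executable instruction."""
    return f"E{first_executable:06X}"
```

For example, header_record("COPY", 0x1000, 0x107A) pads the name to six columns and produces a 19-column record, and a text record holding three 3-byte instructions reports a length of 09.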
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to Solve
a Modification record
b Base relative
c Program counter relative
Modification record
Col 1 M
Col 2-7 Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9 Length of the address field to be modified, in half-bytes
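How a loader might apply one modification record can be sketched as below. The sketch assumes the program image is held as a bytearray whose first byte is the beginning of the program, that addresses are stored most significant byte first, and that a field with an odd number of half-bytes starts in the low-order half of its first byte. The name apply_modification is hypothetical.

```python
def apply_modification(image: bytearray, record: str, load_address: int) -> None:
    """Add the actual load address to the address field named by an M record."""
    if record[0] != "M":
        raise ValueError("not a modification record")
    offset = int(record[1:7], 16)       # cols 2-7: start of the field
    half_bytes = int(record[7:9], 16)   # cols 8-9: field length in half-bytes
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(image[offset:offset + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1  # selects only the address field
    relocated = ((field & mask) + load_address) & mask
    # Keep any bits outside the address field unchanged.
    field = (field & ~mask) | relocated
    image[offset:offset + nbytes] = field.to_bytes(nbytes, "big")
```

With an image containing 4B101036 and the record M00000105, loading at x5000 changes the 5 half-byte address field from 01036 to 06036 and leaves the rest of the instruction untouched.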
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we present one more example to highlight the salient aspects of a macro processor. The example is very similar to the assembly language instructions of Intel's 8-bit microprocessors.
Example
MACRO
INCRMT &A, &B
LOAD &A              Macro
ADD &B               definition
STORE &A
ENDM
INCRMT X, Y          LOAD X     macro
                     ADD Y      expansion
                     STORE X
ENDM Macro Program
(Fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler/Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the macro call
INCRMT X, Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit
MACRO
INCRMT &A, &B
LOAD &A              Macro
ADD &B               definition
STORE &A
ENDM
INCRMT X, Y          LOAD X     macro
                     ADD Y      expansion
                     STORE X
ENDM Macro Program
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
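-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition.
-> Prototype for the macro
Gives the macro name and its parameter list; each parameter starts with the special character &.
name MACRO parameter list
...
MEND
-> Body
The statements that will be generated as the expansion of the macro.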
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called the formal and actual parameter lists, are specified in the prototype and macro call statements respectively, and establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B prototype
INCRMT X, Y macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
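The difference between the two kinds of parameter shows up when the arguments of a call are bound to the formal parameters. The sketch below accepts both forms; bind_arguments is a hypothetical name, and the order TYPE, CHANNEL assumed for the GENER prototype in the usage is a guess, since the notes do not show that prototype.

```python
def bind_arguments(formals, argument_text):
    """Map formal parameter names (without '&') to the actual arguments of a call."""
    binding = {}
    position = 0
    for raw in (a.strip() for a in argument_text.split(",")):
        if not raw:
            continue
        if "=" in raw:
            # Keyword parameter: the argument names the parameter it belongs to.
            keyword, value = (part.strip() for part in raw.split("=", 1))
            keyword = keyword.lstrip("&")
            if keyword not in formals:
                raise ValueError(f"unknown parameter {keyword}")
        else:
            # Positional parameter: pair the argument with the next formal in order.
            if position >= len(formals):
                raise ValueError("too many positional arguments")
            keyword, value = formals[position], raw
            position += 1
        if keyword in binding:
            raise ValueError(f"parameter {keyword} given twice")
        binding[keyword] = value
    return binding
```

Both GENER DIRECT, 3 and GENER CHANNEL=3, TYPE=DIRECT produce the same binding, which is the point of keyword parameters: the order of the arguments no longer matters.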
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls For each macro call
i) locate the macro in MNT
ii) obtain information from MNT regarding position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in the MDT in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
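Steps 1 to 3 can be sketched with the MNT as a dictionary and the MDT as a list. This is a simplified illustration: it assumes definitions are delimited by MACRO and ENDM as in the INCRMT example, that every definition is properly closed, and it does not handle AIF or AGO. The function names are hypothetical.

```python
import re


def define_macros(lines):
    """Step 1: fill the MNT and MDT; return them with the remaining program lines."""
    mnt, mdt, program = {}, [], []
    i = 0
    while i < len(lines):
        if lines[i].strip() == "MACRO":
            prototype = lines[i + 1].split(None, 1)
            name = prototype[0]
            formals = [p.strip() for p in prototype[1].split(",")] if len(prototype) > 1 else []
            mnt[name] = (len(mdt), formals)  # where the definition starts in the MDT
            i += 2
            while lines[i].strip() != "ENDM":
                mdt.append(lines[i].strip())
                i += 1
            mdt.append("ENDM")
            i += 1
        else:
            program.append(lines[i])
            i += 1
    return mnt, mdt, program


def expand_program(mnt, mdt, program):
    """Steps 2 and 3: replace each macro call by its model statements."""
    output = []
    for line in program:
        fields = line.split(None, 1)
        if fields and fields[0] in mnt:
            index, formals = mnt[fields[0]]
            actuals = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
            binding = dict(zip(formals, actuals))
            while mdt[index] != "ENDM":
                # Replace each formal parameter by the corresponding actual parameter.
                output.append(re.sub(r"&\w+", lambda m: binding.get(m.group(0), m.group(0)), mdt[index]))
                index += 1
        else:
            output.append(line)
    return output
```

Applied to the INCRMT definition followed by the call INCRMT X, Y, the MDT holds the three model statements plus ENDM, and the call is replaced by LOAD X, ADD Y, STORE X.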
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  - Holds the macro prototype and the statements that make up the macro body
  - References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of a macro invocation
  - Arguments are stored in ARGTAB according to their position
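The conversion of parameter references to positional notation, and the later substitution from ARGTAB, can be sketched as two small functions. The ?n notation used here (?1 for the first parameter, ?2 for the second) is one conventional choice and is an assumption, as are the function names.

```python
import re


def to_positional(statement, formals):
    """Rewrite each &NAME reference as ?n, n being the parameter's position (from 1)."""
    positions = {name: n for n, name in enumerate(formals, start=1)}

    def replace(match):
        name = match.group(0)
        return f"?{positions[name]}" if name in positions else name

    return re.sub(r"&\w+", replace, statement)


def substitute(statement, argtab):
    """Replace each ?n by the argument stored at position n of ARGTAB."""
    return re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1)) - 1], statement)
```

Storing LOAD ?1 instead of LOAD &A in DEFTAB means that expansion needs no search for parameter names: the number is used directly as an index into ARGTAB.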
(Fig Processing of a macro definition and a macro call using NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain.
(Fig Macro processor procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning addresses to all the user subroutines and library subroutines. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity, done by the loader, is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader - In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created, and the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed. As the name itself says, it does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know about memory management.
Lecture_no27 Machine-Dependent Loader Features – Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls on certain Input/Output functions like printf(), scanf() etc., which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an Input/Output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to adjustment of address fields and not to movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
Relocating Loader: The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The input to a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer, there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor -
Performs linking prior to load time. A linkage editor
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking -
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is called dynamic linking, dynamic loading, or load on call.
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader -
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls do not require any linker help, since the entire path from the root is already loaded.
(fig Overlay)
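The behaviour of the overlay manager on the example tree can be sketched as follows. This is a simplified model: it only tracks which segments are in memory, treats everything below a replaced segment as unloaded, and the names PARENT, path_to and OverlayManager are hypothetical.

```python
# The example overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Sequence of segments from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]  # the root is loaded when the program starts

    def call(self, target):
        """Make sure the path to target is loaded; return the segments newly loaded."""
        wanted = path_to(target)
        keep = 0
        while (keep < min(len(wanted), len(self.loaded))
               and wanted[keep] == self.loaded[keep]):
            keep += 1
        newly_loaded = wanted[keep:]
        if not newly_loaded:
            return []  # upward call or same segment: nothing to load
        # Sibling segments share memory, so the old path below the common
        # prefix is overwritten by the new one.
        self.loaded = wanted
        return newly_loaded
```

Starting from the root, a call into B loads A and B, a later call back up into A loads nothing, and a call into E replaces A and B with D and E.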
The essential difference between a linkage editor and a linking loader
(Fig Linking loader: Object program(s) and Library -> Linking loader -> Memory. Linkage editor: Object program(s) and Library -> Linkage editor -> Linked program -> Relocating loader -> Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into main memory.
-> Let's take a quick look at who performs each of the above four functions for our three loaders (absolute loader, relocating loader and direct linking loader, respectively):
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages -
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols

1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)

• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD and RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
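The two passes described above can be sketched roughly as follows. This is only an illustrative sketch, not the algorithm from the course text: the module layout (a dictionary with `name`, `length`, `defs`, `text` and `rld` fields) and the function name `link_and_load` are assumptions made for the example, standing in for the ESD, TXT and RLD cards.

```python
def link_and_load(modules, ipla):
    """Two-pass sketch of a direct linking loader (hypothetical module format)."""
    # Pass 1: allocate each segment and build the external symbol table.
    symbol_table = {}
    bases = {}
    pla = ipla                       # Program Load Address counter
    for module in modules:
        bases[module["name"]] = pla
        symbol_table[module["name"]] = pla
        for symbol, offset in module["defs"].items():   # ESD-style definitions
            symbol_table[symbol] = pla + offset
        pla += module["length"]

    # Pass 2: load the text, then relocate and link address constants.
    memory = {}
    for module in modules:
        base = bases[module["name"]]
        for offset, word in enumerate(module["text"]):  # TXT-style contents
            memory[base + offset] = word
        for offset, symbol in module["rld"]:            # RLD-style entries
            memory[base + offset] += symbol_table[symbol]
    return symbol_table, memory


modules = [
    {"name": "A", "length": 3, "defs": {}, "text": [0, 5, 0], "rld": [(2, "B")]},
    {"name": "B", "length": 2, "defs": {}, "text": [7, 1], "rld": [(1, "B")]},
]
symbol_table, memory = link_and_load(modules, ipla=100)
print(symbol_table)   # addresses assigned to A and B
print(memory)         # loaded words after relocation and linking
```

Here the word at offset 2 of A refers to the external symbol B, while the word at offset 1 of B is an address constant relative to B itself; both are fixed up in pass 2 once pass 1 has assigned every segment a load address.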
Binary Symbolic Subroutine Loader-
The assembler provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion

Binder-
• A binder is a program that performs the same functions as the direct linking loader
  – allocation, relocation and linking
• Outputs the text in a file rather than memory
  – called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> The language designer's balance:
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that is not limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
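The three search methods described above can be sketched as follows. This is a minimal illustration that assumes the table is a list of (symbol name, value) pairs; the function names, the sample table and the simple character-sum hash function are choices made for this example only.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a table that is sorted by symbol name."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1      # continue in the left subarray
        else:
            low = mid + 1       # continue in the right subarray
    return -1


def hash_slot(key, size):
    """A deliberately simple hash function: character codes summed modulo size."""
    return sum(ord(ch) for ch in key) % size


table = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108), ("LOOP", 112)]
print(linear_search(table, "GAMMA"))
print(binary_search(table, "LOOP"))
print(hash_slot("LOOP", 11))
```

Linear search needs no ordering but may inspect every entry, binary search needs the table sorted by key, and the hash function computes a slot number directly from the key.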
Sorting
We will discuss four comparison-sort algorithms
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this $L \subseteq T^*$
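As a small illustration of Definitions 1 and 2 (the alphabet and the language below are chosen only as an example and do not come from the notes):

```latex
T = \{a, b\}, \qquad
T^* = \{\varepsilon, a, b, aa, ab, ba, bb, aaa, \dots\}, \qquad
L = \{a^n b^n \mid n \ge 1\} = \{ab, aabb, aaabbb, \dots\} \subseteq T^*
```

Here $T^*$ denotes the set of all finite concatenations of symbols of T, including the empty string $\varepsilon$, and L picks out only some of those strings.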
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s → α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written
$\mu \Rightarrow_G \gamma$ (a single-step derivation), if and only if
(1) $\mu = \sigma \alpha \tau$
(2) $\gamma = \sigma \beta \tau$
(3) $\alpha \to \beta \in P$ of G
where σ and τ represent arbitrary strings.
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that consists only of terminal symbols.
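The single-step derivation defined above can be tried out with a short sketch. It assumes that every symbol is one character and that a production α → β is stored as a pair of strings; the function name and the example grammar (S → aSb | ab) are illustrative choices, not part of the notes.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma with mu => gamma in a single step.

    For each production alpha -> beta and each way of writing
    mu = sigma + alpha + tau, the result sigma + beta + tau is collected.
    Left-hand sides are assumed to be non-empty.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar: S -> aSb | ab
productions = [("S", "aSb"), ("S", "ab")]
print(immediate_derivations("S", productions))
print(immediate_derivations("aSb", productions))
```

Starting from the start symbol and applying the function repeatedly yields sentential forms; those that no longer contain a nonterminal are sentences of the language.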
bullChomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar $\langle N, \Sigma, P, S \rangle$ (nonterminals N, terminal alphabet Σ, production rules P, start symbol S) that satisfies the following two conditions:
a. Each production rule $\alpha \to \beta$ in P satisfies $|\alpha| \le |\beta|$ if it is not of the form $S \to \varepsilon$.
b. If $S \to \varepsilon$ is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form $Bb \to b$ will result in a non-Type 1 grammar because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule $\alpha \to \beta$ satisfies $|\alpha| = 1$, that is, $\alpha$ is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form $aE \to EaE$ to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar $\langle N, \Sigma, P, S \rangle$ in which each of the production rules $\alpha \to \beta$ which is not of the form $S \to \varepsilon$ satisfies one of the following conditions:
a. $\beta$ is a terminal symbol
b. $\beta$ is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar $\langle N, \Sigma, P, S \rangle$ which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form $A \to Ba$ or of the form $B \to bb$ to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
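The conditions on the production rules can be checked mechanically. The sketch below is only an illustration under stated assumptions: each symbol is a single character, the empty string stands for ε, a production is a (left side, right side) pair of strings, and the function name `grammar_type` and the sample grammars are hypothetical.

```python
def grammar_type(nonterminals, productions, start):
    """Return the most restricted type (0 to 3) whose conditions the grammar meets."""
    def is_eps_rule(lhs, rhs):
        return lhs == start and rhs == ""

    eps_in_p = any(is_eps_rule(lhs, rhs) for lhs, rhs in productions)
    start_on_rhs = any(start in rhs for _lhs, rhs in productions)

    # Type 1, condition (a): |alpha| <= |beta| unless the rule is S -> epsilon.
    cond_a = all(len(lhs) <= len(rhs)
                 for lhs, rhs in productions if not is_eps_rule(lhs, rhs))
    # Type 1, condition (b): if S -> epsilon is in P, S is on no right-hand side.
    cond_b = not (eps_in_p and start_on_rhs)
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(lhs in nonterminals for lhs, _rhs in productions):
        return 1

    # Type 3: right side is a terminal, or a terminal followed by a nonterminal.
    def type3_rhs(rhs):
        if len(rhs) == 1:
            return rhs not in nonterminals
        return (len(rhs) == 2 and rhs[0] not in nonterminals
                and rhs[1] in nonterminals)

    if all(type3_rhs(rhs)
           for lhs, rhs in productions if not is_eps_rule(lhs, rhs)):
        return 3
    return 2


print(grammar_type({"S", "A"}, [("S", "aA"), ("A", "b")], "S"))
print(grammar_type({"S"}, [("S", "aSb"), ("S", "ab")], "S"))
print(grammar_type({"S", "B"}, [("S", "a"), ("Bb", "b")], "S"))
```

Because each type is defined as a restriction of the previous one, the function tests the conditions in order and stops at the first one that fails.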
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form $xAy \to xvy$, where $A \in V$ and $x, v, y \in (V \cup T)^*$.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by $A \to v$, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
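A BNF specification can be read into a simple data structure. The sketch below assumes one rule per line, the usual BNF separator `::=` between the two sides, and whitespace between symbols; the function names and the two sample rules are illustrative only.

```python
def parse_bnf(text):
    """Map each nonterminal to its list of alternatives (lists of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alt.split() for alt in right.split("|")]
        rules.setdefault(left.strip(), []).extend(alternatives)
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {symbol for alts in rules.values() for alt in alts for symbol in alt}
    return used - set(rules)


spec = """
<digit> ::= 0 | 1
<number> ::= <digit> | <digit> <number>
"""
rules = parse_bnf(spec)
print(rules)
print(terminals(rules))
```

Splitting on the vertical bar recovers the alternatives of each rule, and the set difference in `terminals` applies the rule of thumb stated above: anything that is used but never defined on a left side is a terminal.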
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful for explicitly and concisely defining sets of strings, such as computer languages. A canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form is built. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
  • In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
  • In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
  • In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
– Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
– Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink
The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language
The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate
Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following

x3 = y + 3;
x3  =  y  +  3  ;
x3   =y+ 3  ;

but not

x 3 = y + 3;

would be grouped into the lexemes x3, =, y, +, 3, and ;.

A token is a <token-name, attribute-value> pair. For example:

1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.

Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.

Note that we can define identifiers, numbers and the various symbols and punctuation without using recursion (compare with parsing below).
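The grouping of characters into lexemes and tokens can be sketched with a regular expression, which fits the remark that no recursion is needed. This is a minimal illustration rather than a real scanner: the function name `scan`, the pair representation of tokens and the choice to store a number's text as its attribute are assumptions made for the example, and only the symbols =, + and ; are recognized.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>[=+;]))"
)


def scan(source):
    """Return (tokens, symbol_table) for a small assignment statement."""
    symbol_table = []            # entry i (1-based) holds the i-th identifier
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at or after position {pos}")
        pos = match.end()
        kind = match.lastgroup
        lexeme = match.group(kind)
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))   # second component is ignored
    return tokens, symbol_table


print(scan("x3 = y + 3;"))
print(scan("x3   =y+ 3  ;"))
print(scan("x 3 = y + 3;"))
```

The first two calls produce the same token stream because blanks are not significant, while the third differs because the blank splits x3 into the identifier x and the number 3.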
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example
x3 = y + 3; would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning
Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it
(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator
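A syntax tree for the example statement can be built with a few lines of code. The sketch below is an assumption-laden illustration: it takes the lexemes as a plain list of strings, represents the tree as nested tuples with the operator first, and treats + as left-associative by looping instead of using the recursive rule directly. The function name `parse_assignment` is hypothetical.

```python
def parse_assignment(lexemes):
    """Parse  id = expr ;  and return the syntax tree as nested tuples."""
    pos = 0

    def next_lexeme():
        nonlocal pos
        if pos >= len(lexemes):
            raise SyntaxError("unexpected end of input")
        lexeme = lexemes[pos]
        pos += 1
        return lexeme

    def operand():
        lexeme = next_lexeme()
        if not lexeme.isalnum():
            raise SyntaxError(f"expected id or number, got {lexeme!r}")
        return lexeme

    target = operand()
    if target[0].isdigit():
        raise SyntaxError("left side of an assignment must be an identifier")
    if next_lexeme() != "=":
        raise SyntaxError("expected '='")
    tree = operand()
    while pos < len(lexemes) and lexemes[pos] == "+":
        pos += 1
        tree = ("+", tree, operand())   # operator as interior node
    if next_lexeme() != ";":
        raise SyntaxError("expected ';'")
    return ("=", target, tree)


print(parse_assignment(["x3", "=", "y", "+", "3", ";"]))
```

The result has the operators = and + as interior nodes and the operands x3, y and 3 as leaves, which is the shape of the syntax tree described above.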
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int" as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each "quad" has the form
operation target source1 source2
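Walking the tree to emit quadruples can be sketched as follows. The nested-tuple tree format, the function name `gen_quads` and the use of "--" for an unused source field are assumptions made for this example; the operation names are the ones used in the quads above.

```python
def gen_quads(tree):
    """Emit (operation, target, source1, source2) quads for an assignment tree."""
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):          # leaf: identifier or constant
            return node
        operation, *operands = node
        sources = [walk(operand) for operand in operands]
        sources += ["--"] * (2 - len(sources))   # pad unary operations
        counter += 1
        temp = f"temp{counter}"
        quads.append((operation, temp, sources[0], sources[1]))
        return temp

    _assign, target, expression = tree
    value = walk(expression)
    quads.append(("assign", target, value, "--"))
    return quads


# Semantic tree for the running example: id1 = realtoint(id2 + inttoreal(3))
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
for quad in gen_quads(tree):
    print(*quad)
```

Each interior node of the tree receives a fresh temporary as its target, so the children of a node are always emitted before the node itself.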
Code optimization
This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
   add temp2 id2 3.0
2. The last two quads can be combined into
   realtoint id1 temp2
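The two optimizations listed above can be sketched as a single pass over the quads. This is only an illustration: the function name `optimize` is hypothetical, and the sketch assumes that every temporary is used exactly once, immediately after it is defined, which holds for the running example but not in general.

```python
def optimize(quads):
    """Fold inttoreal of integer constants and merge a final copy into its producer."""
    constants = {}      # temporaries known to hold a constant
    result = []
    for operation, target, source1, source2 in quads:
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        if operation == "inttoreal" and source1.isdigit():
            constants[target] = source1 + ".0"      # conversion done at compile time
            continue
        if (operation == "assign" and result
                and result[-1][1] == source1 and source1.startswith("temp")):
            producer = result.pop()                 # write the result directly
            result.append((producer[0], target, producer[2], producer[3]))
            continue
        result.append((operation, target, source1, source2))
    return result


quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
for quad in optimize(quads):
    print(*quad)
```

The four quads of the example shrink to two, matching the replacements described in the list above.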
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0    // add float
RTOI R2, R1         // real to int
ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example one could have the entire front end as one pass
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST)
• A pass can correspond to a "phase", but it does not have to
• Sometimes a single pass corresponds to several phases that are interleaved in time
• What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:

            Compiler Driver
                  | calls
          Syntactic Analyzer
        calls /          \ calls
Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:

                      Compiler Driver
          calls /        calls |          \ calls
Syntactic Analyzer   Contextual Analyzer   Code Generator

Syntactic Analyzer:   input Source Text,    output AST
Contextual Analyzer:  input AST,            output Decorated AST
Code Generator:       input Decorated AST,  output object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
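The multi pass organisation described above can be sketched in a few lines of Python. This is only an illustrative toy, not a real compiler: the expression language (operands joined by "+"), the tuple-based AST, the symbol-to-address table and the PUSH/LOAD/ADD target instructions are all assumptions made for the example. Each function makes one complete traversal and hands a data structure to the next.

```python
def parse(source):
    """Pass 1 (syntactic analysis): source text -> AST."""
    tokens = source.split()
    if len(tokens) % 2 == 0:
        raise SyntaxError("expected: operand (+ operand)*")

    def operand(tok):
        return ("num", int(tok)) if tok.isdigit() else ("var", tok)

    ast = operand(tokens[0])
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+":
            raise SyntaxError("expected '+', got %r" % tokens[i])
        ast = ("add", ast, operand(tokens[i + 1]))
    return ast


def analyze(ast, symbols):
    """Pass 2 (contextual analysis): AST -> decorated AST with addresses."""
    kind = ast[0]
    if kind == "num":
        return ast
    if kind == "var":
        if ast[1] not in symbols:
            raise NameError("undeclared variable %r" % ast[1])
        return ("var", ast[1], symbols[ast[1]])
    return ("add", analyze(ast[1], symbols), analyze(ast[2], symbols))


def generate(ast):
    """Pass 3 (code generation): decorated AST -> hypothetical stack code."""
    kind = ast[0]
    if kind == "num":
        return ["PUSH %d" % ast[1]]
    if kind == "var":
        return ["LOAD %d" % ast[2]]
    return generate(ast[1]) + generate(ast[2]) + ["ADD"]


def compile_multi_pass(source, symbols):
    return generate(analyze(parse(source), symbols))


code = compile_multi_pass("a + 2 + b", {"a": 0, "b": 1})
```

A single pass compiler would interleave these three steps inside the parsing loop instead of building the intermediate AST.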
Lecture_no47 Surprise Test
THANK YOU
Lecture_no5 Evolution of operating system, General machine structure
OS, Job, Multiprogramming, Multitasking
Fragmentation, Paging, Scheduling
Facilities of computer OS
Operating System
• It is the collection of system programs which acts as an interface between the user and the computer hardware.
• The purpose of an operating system is to provide an environment in which a user can execute programs in a convenient manner.
Functions of Operating System
• File handling and management
• Storage management (Memory management)
• Device scheduling and management
• CPU scheduling
• Information management
• Process control (management)
• Error handling
• Protecting itself from users & protecting each user from other users
What is MFT?
What is MVT?
What is Fragmentation?
What is paging? Explain its various types.
What is scheduling?
What is meant by the term time sharing?
What is virtual memory?
What are simple batched systems? Explain.
Give an evolution of an operating system.
Lecture_no6 Description of Von Neumann machine's components
General machine structure, Von Neumann's components, Block diagram and functions of each component of Von Neumann's machine
1 The von Neumann Computer Model
• Von Neumann computer systems contain three main building blocks:
-> the central processing unit (CPU)
-> memory
-> input/output devices (I/O)
• These three components are connected together using the system bus.
• The most prominent items within the CPU are the registers; they can be manipulated directly by a computer program. The following block diagram shows the major relationships between CPU components.
(Design of the Von Neumann architecture)
2 Components of the Von Neumann Model
-> Memory: Storage of information (data/program)
-> Processing Unit: Computation/processing of information
-> Input: Means of getting information into the computer, e.g. keyboard, mouse
-> Output: Means of getting information out of the computer, e.g. printer, monitor
• Control Unit: Makes sure that all the other parts perform their tasks correctly and at the correct time
The von Neumann Machine
(connection between CPU & Main memory)
MBR (Memory Buffer Register) = MBR contains a word to be stored in memory, or is used to receive a word from memory
MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR
IR (Instruction Register) = IR contains the 8-bit op-code of the instruction being executed
IBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instruction from a word in memory
PC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from memory
AC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily operands and results of ALU operations
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit.
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete.
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors, or devices for human interaction.
Instruction set: The set of codes that describe all the legal operations a particular computer can execute.
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform.
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture.
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location.
Read-only memory (ROM): A memory that is usually very similar to RAM, except it may only be accessed, not updated.
Store: The process of writing a value to a memory cell.
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system, Information flow in microflowchart for ADD/SUB/MULT
Memory operations (MDR, the memory data register, plays the role of the MBR described earlier)
FETCH(addr)
1 Put addr into MAR
2 Tell memory unit to "load"
3 Memory copies data into MDR
STORE(addr, new-value)
1 Put addr into MAR
2 Put new-value into MDR
3 Tell memory unit to "store"
4 Memory stores data from MDR into memory cell
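The FETCH and STORE sequences above can be mirrored step by step in a small Python model. This is a minimal sketch under assumptions: the class name MemoryUnit, the 16-cell size and the helper names are invented for illustration; only the MAR/MDR steps come from the notes.

```python
class MemoryUnit:
    """Toy memory with the two interface registers MAR and MDR."""

    def __init__(self, size=16):
        self.cells = [0] * size
        self.mar = 0  # memory address register
        self.mdr = 0  # memory data register

    def load(self):
        # "load" signal: memory copies the addressed cell into MDR
        self.mdr = self.cells[self.mar]

    def store(self):
        # "store" signal: memory copies MDR into the addressed cell
        self.cells[self.mar] = self.mdr


def fetch(memory, addr):
    memory.mar = addr          # 1 put addr into MAR
    memory.load()              # 2 tell memory unit to "load"
    return memory.mdr          # 3 data is now in MDR


def store(memory, addr, new_value):
    memory.mar = addr          # 1 put addr into MAR
    memory.mdr = new_value     # 2 put new-value into MDR
    memory.store()             # 3-4 memory stores MDR into the cell


mem = MemoryUnit()
store(mem, 3, 99)
value = fetch(mem, 3)
```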
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R1..R7) to evaluate the following expression. Ignore the remainder part of the division process.
( (a + b) / (a + d) ) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+ --------> ADD Ri, Rj ( Rj <- Ri + Rj )
* --------> MULT Ri, Rj ( Rj <- Ri * Rj )
/ --------> DIV Ri, Rj ( Rj <- Rj / Ri )
a = M[A] ------> content of memory location at address A; same for B, D, E
move A, D0      ; D0 = a
move B, D4      ; D4 = b
add D0, D4      ; D4 = a + b
move D, D1      ; D1 = d
add D0, D1      ; D1 = a + d
div D1, D4      ; D4 = (a + b) / (a + d)
move E, D2      ; D2 = e
mult D0, D2     ; D2 = (a * e)
mult D4, D2     ; D2 = ( (a + b) / (a + d) ) * (a * e)
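To check the register program above, the following Python sketch interprets the nine instructions directly. It is an illustration only: the tuple encoding of instructions and the sample values of a, b, d, e are assumptions, and integer division (//) stands in for "ignore the remainder" on non-negative operands.

```python
def run(program, memory):
    """Execute (op, source, destination) triples on a register file."""
    regs = {}
    for op, src, dst in program:
        if op == "move":          # destination register <- M[source address]
            regs[dst] = memory[src]
        elif op == "add":         # Rj <- Ri + Rj
            regs[dst] = regs[src] + regs[dst]
        elif op == "mult":        # Rj <- Ri * Rj
            regs[dst] = regs[src] * regs[dst]
        elif op == "div":         # Rj <- Rj / Ri, remainder dropped
            regs[dst] = regs[dst] // regs[src]
        else:
            raise ValueError("unknown operation %r" % op)
    return regs


PROGRAM = [
    ("move", "A", "D0"),
    ("move", "B", "D4"),
    ("add", "D0", "D4"),
    ("move", "D", "D1"),
    ("add", "D0", "D1"),
    ("div", "D1", "D4"),
    ("move", "E", "D2"),
    ("mult", "D0", "D2"),
    ("mult", "D4", "D2"),
]

memory = {"A": 6, "B": 14, "D": 4, "E": 3}   # sample values, chosen arbitrarily
regs = run(PROGRAM, memory)
```

With these sample values the result register D2 holds ((6 + 14) / (6 + 4)) * (6 * 3).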
Problem 2)
a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0                ; move contents of memory location X to register D0
ii) move D0, X               ; move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1)    ; move contents from memory to memory
iv) bgtw label               ; branch if greater than to location "label"
Answer 2)
For each of the above, for part b) draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <- PC
MBR <- M[MAR]      ; fetch
IR <- MBR
| decode |
PC <- PC + 2       ; PC points to source address
MAR <- PC
MBR <- M[MAR]      ; address of X in MBR
MAR <- MBR         ; move X to MAR
MBR <- M[MAR]      ; read contents of location X
D0 <- MBR
PC <- PC + 2       ; PC points to next instruction
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <- PC
MBR <- M[MAR]      ; fetch
IR <- MBR
| decode |
PC <- PC + 2       ; PC points to destination address
MAR <- PC
MBR <- M[MAR]      ; read destination address X
MAR <- MBR         ; MAR has X now
MBR <- D0          ; source data in MBR
M[MAR] <- MBR      ; data in effective address X (which is in MAR)
PC <- PC + 2       ; PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
temp <- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <- temp
Micro-steps are
MAR <- PC
MBR <- M[MAR]      ; fetch
IR <- MBR
| decode |
PC <- PC + 2       ; PC points to source displacement
MAR <- PC
MBR <- M[MAR]      ; source displacement loaded
MAR <- [A3 + MBR]  ; source effective address calculated
MBR <- M[MAR]      ; source data in MBR
WR <- MBR          ; saved temporarily since MBR is to be reused
PC <- PC + 2       ; PC points to destination displacement
MAR <- PC
MBR <- M[MAR]      ; destination displacement loaded
MAR <- [A1 + MBR]  ; destination effective address calculated
MBR <- WR          ; source data back to MBR
M[MAR] <- MBR      ; source data moved to destination
PC <- PC + 2       ; PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a) MAR <- PC
MBR <- M[MAR]      ; fetch
IR <- MBR
| interpret / decode |
If the condition is TRUE:
PC <- PC + 2
MAR <- PC
MBR <- M[MAR]      ; get displacement
PC <- PC + MBR     ; execute
If the condition is FALSE:
PC <- PC + 4
Lecture_no8 Memory unit, Register, Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, each specified by the leftmost (lowest-addressed) byte of the unit. Within a halfword, fullword, doubleword or quadword, low numbered bytes are more significant than high numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. In these notes the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 general purpose registers in the 370 system (register kinds: general purpose register, floating-point register, control & status register, data register, address register).
Four fields of information in an assembly statement:
1) name field: optional; must begin with a character from the alphabet and can consist of up to 8 characters
2) opcode: mandatory field; must be followed by a blank
3) operands: usually represent data that could be in main storage, in registers, or part of the instruction itself
4) comments: used to clarify instructions
(iii)Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
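The packed decimal layout described above (decimal digits in 4-bit nibbles, followed by a 4-bit sign) can be illustrated in Python. This is a sketch, not IBM code: the function names are invented, and the choice of C for positive and D for negative output is an assumption (the notes list several valid sign codes).

```python
def pack_decimal(value):
    """Encode an integer as packed decimal: digit nibbles + sign nibble."""
    sign = 0xC if value >= 0 else 0xD
    digits = str(abs(value))
    if len(digits) % 2 == 0:
        digits = "0" + digits          # odd digit count fills whole bytes
    if len(digits) > 31:
        raise ValueError("more than 16 bytes needed")
    nibbles = [int(d) for d in digits] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def unpack_decimal(data):
    """Decode packed decimal bytes, validating digits and sign."""
    nibbles = []
    for byte in data:
        nibbles += [byte >> 4, byte & 0xF]
    digits, sign = nibbles[:-1], nibbles[-1]
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError("invalid packed decimal")
    magnitude = int("".join(str(d) for d in digits))
    return -magnitude if sign in (0xB, 0xD) else magnitude


packed = pack_decimal(-123)            # two bytes: 12 3D
```

Unlike PACK and UNPK, this sketch does check validity, which makes the rules about digit and sign nibbles easy to experiment with.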
On older models floating point numbers are stored only as fullword or doubleword values. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent biased by 64). Bits 8-31 (8-63 for a doubleword) are a hexadecimal fraction. For extended precision, the low order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
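As a worked illustration of the fullword format (sign in bit 0, characteristic in bits 1-7, hexadecimal fraction in bits 8-31), the value is sign × fraction × 16^(characteristic − 64). The Python function below is a sketch with an invented name; it handles only the 32-bit format.

```python
def decode_hex_float32(word):
    """Decode a 32-bit System/360 hexadecimal floating point word."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    characteristic = (word >> 24) & 0x7F          # exponent biased by 64
    fraction = (word & 0xFFFFFF) / float(1 << 24)  # 24-bit fraction < 1
    return sign * fraction * 16.0 ** (characteristic - 64)


hundred = decode_hex_float32(0x42640000)
negative = decode_hex_float32(0xC276A000)
```

For 0x42640000 the characteristic is 0x42 = 66, so the exponent is 2, and the fraction 0x640000 / 2^24 = 0.390625 gives 0.390625 × 16^2 = 100.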
Lecture_no9 Various instruction format of IBM 360370
(iv) Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R tells that there is data residing in a register, so in this type both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second a storage location. If there are 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; here the storage address is specified in 3 pieces (index, base and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I tells that the operand consists of immediate data (meaning that the data is in the instruction itself). In this type the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S tells that there is data in storage. Here both operands have data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
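The field layouts above can be made concrete with a small Python decoder. Two caveats: the function names are invented, and the length rule used in instruction_length (the two high-order bits of the opcode select 2, 4, 4 or 6 bytes) is a property of the System/360 encoding that the notes above do not state.

```python
def instruction_length(opcode):
    """Length in bytes, selected by the two high-order opcode bits."""
    top_bits = (opcode >> 6) & 0b11
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[top_bits]


def decode_rx(instr):
    """Split a four-byte RX instruction into its fields."""
    if len(instr) != 4:
        raise ValueError("RX instructions are four bytes long")
    return {
        "opcode": instr[0],                      # byte 0
        "r1": instr[1] >> 4,                     # byte 1, bits 0-3
        "x2": instr[1] & 0xF,                    # byte 1, bits 4-7 (index)
        "b2": instr[2] >> 4,                     # base register
        "d2": ((instr[2] & 0xF) << 8) | instr[3],  # 12-bit displacement
    }


fields = decode_rx(bytes.fromhex("5A10C010"))
```

The sample bytes 5A 10 C0 10 decode to opcode 0x5A with R1 = 1, no index register, base register 12 and displacement 16.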
Addressing Mode
1 Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2 Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3 Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4 Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D), 3
5 Direct mode: Absolute address of operand is explicitly specified in the instruction. Example: ADD 1234, S
6 Indirect mode: Absolute address of operand is the content of a memory address. Example: ADD [1234], 10
7 Relative mode: Address is content of PC + OpAddr. Example: ADD D, $S
8 Indexed mode: Address is content of an index register + OpAddr. Example: ADD D, 500(S)
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instructions
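Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
        MOVE R1, x
        MOVE R2, y
        MOVE R3, 0
        SUB  R1, R2     ; R1 <- x - y, sets the condition codes
        BN   Less       ; negative: x < y
        BZ   Zero       ; zero: x = y
        MOVE R3, +1     ; otherwise x > y
        BR   End        ; BR = unconditional branch (assumed mnemonic), so the +1 result is not overwritten
Less    MOVE R3, -1
        BR   End
Zero    MOVE R3, 0
End     HLT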
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      STORE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]    100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0    100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1    100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2    1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 - R2    1010001000 00 01 10
Halt the program
1111111111111111
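The bit patterns above are enough to write a small interpreter for the simulated computer. The Python sketch below is an illustration under assumptions: the program is kept in a separate list of 16-character bit strings rather than in the 32 memory locations, and the function name run is invented.

```python
def run(program, memory):
    """Interpret 16-bit instruction strings; return (registers, memory)."""
    regs = [0] * 4
    mem = list(memory)
    pc = 0
    while True:
        word = program[pc]
        pc += 1
        if word == "1" * 16:                       # HALT
            break
        if word.startswith("100000010"):           # LOAD  RR MMMMM
            regs[int(word[9:11], 2)] = mem[int(word[11:], 2)]
        elif word.startswith("100000100"):         # STORE RR MMMMM
            mem[int(word[11:], 2)] = regs[int(word[9:11], 2)]
        elif word.startswith("100100010000"):      # MOVE  RR RR
            regs[int(word[12:14], 2)] = regs[int(word[14:16], 2)]
        elif word.startswith("1010000100"):        # ADD   RR RR RR
            d, a, b = (int(word[i:i + 2], 2) for i in (10, 12, 14))
            regs[d] = regs[a] + regs[b]
        elif word.startswith("1010001000"):        # SUBTRACT RR RR RR
            d, a, b = (int(word[i:i + 2], 2) for i in (10, 12, 14))
            regs[d] = regs[a] - regs[b]
        else:
            raise ValueError("unknown instruction " + word)
    return regs, mem


PROGRAM = [
    "1000000100100101",   # LOAD R1 5
    "1000000101000101",   # LOAD R2 5
    "1010000100000110",   # ADD R0 R1 R2
    "1000001000000110",   # STORE R0 6
    "1111111111111111",   # HALT
]

initial = [0] * 32
initial[5] = 21
regs, mem = run(PROGRAM, initial)
```

Running the example program doubles the value in memory location 5 and stores the result in location 6.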
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write:
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example:
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
     LABEL    OPCODE  OPERANDS      COMMENTS
01   ;
02   ; Program to multiply an integer by the constant 6
03   ; Before execution an integer must be stored in NUMBER
04   ;
05            .ORIG   x3050
06            LEA     R2, NUMBER
07            LDW     R2, R2, #0
08            LEA     R1, SIX
09            LDW     R1, R1, #0
0A            AND     R3, R3, #0    ; Clear R3. It will
0B                                  ; contain the product
0C   ; The inner loop
0D   ;
0E   AGAIN    ADD     R3, R3, R2
0F            ADD     R1, R1, #-1   ; R1 keeps track of
10            BRp     AGAIN         ; the iterations
11   ;
12            HALT
13   ;
14   NUMBER   .BLKW   1
15   SIX      .FILL   x0006
16   ;
17            .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode. The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
LEA R2, NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1 The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2 The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the location labeled SIX is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get through with this, but it is rather inappropriate. If things need a hurry, you should use the country's language.
Assembler: A program that translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to symbolic labels used by the programmer.
Disassembler: A program that translates machine code back to its assembly source.
Cross Assembler: An assembler that runs on one machine and produces object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, externally defined symbols (library programs) must be indicated to the loader: the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler
-> Basic assembler directives:
• START: Specify name and starting address for the program
• END: Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE: Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD: Generate one-word integer constant
• RESB: Reserve the indicated number of bytes for a data area
• RESW: Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command
An assembly language statement contains the following fields:
-> Label Field: an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex:
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode Field: This field contains a mnemonic, which stands for an operation code or a pseudo-op, i.e. a machine code instruction. It may require additional information, i.e. operands.
-> Operand Field: specifies either the address or the data that the opcode requires.
-> Comment Field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
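The statement format [label] mnemonic [operands] [comment] can be split into its fields with a few lines of Python. This is a sketch under stated assumptions rather than any particular assembler's syntax: labels are recognised by a trailing colon, comments start at a semicolon, and operands are separated by commas. The names Statement and parse_statement are invented.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    mnemonic: str
    operands: List[str]
    comment: Optional[str]


def _split_first(text):
    """Return (first word, remainder) of a whitespace-separated string."""
    parts = text.split(None, 1)
    if not parts:
        return "", ""
    return parts[0], (parts[1] if len(parts) > 1 else "")


def parse_statement(line):
    code, sep, comment = line.partition(";")     # assumed comment marker
    first, rest = _split_first(code)
    label = None
    if first.endswith(":"):                      # assumed label marker
        label = first[:-1]
        first, rest = _split_first(rest)
    if not first:
        raise ValueError("statement has no mnemonic")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    return Statement(label, first, operands, comment.strip() if sep else None)


stmt = parse_statement("START: LDAA 24H ; load accumulator")
```

Only the mnemonic is mandatory here, which matches the optional square-bracketed fields in the format above.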
Types of Assembly language statements
There are 3 types:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages amp disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler design can be done in
- Single pass
- Two pass
(Fig Two pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
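The division of work between Pass I and Pass II can be sketched in Python. Everything concrete here is an assumption made for illustration: the five-entry OPTAB with invented opcodes, the rule that every statement occupies one word, the WORD directive, the "label:" syntax and the 16-bit object word (opcode in the high byte, address in the low byte).

```python
OPTAB = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04, "HALT": 0x0F}


def split_line(line):
    """Separate the label, mnemonic and operand fields of one statement."""
    fields = line.split()
    label = None
    if fields and fields[0].endswith(":"):
        label, fields = fields[0][:-1], fields[1:]
    mnemonic = fields[0]
    operand = fields[1] if len(fields) > 1 else None
    return label, mnemonic, operand


def pass1(source, start=0):
    """Assign addresses and build SYMTAB; return it with the intermediate file."""
    symtab, intermediate, locctr = {}, [], start
    for line in source:
        label, mnemonic, operand = split_line(line)
        if label is not None:
            if label in symtab:
                raise ValueError("duplicate label " + label)
            symtab[label] = locctr
        intermediate.append((locctr, mnemonic, operand))
        locctr += 1                      # every statement occupies one word
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement, now that every label has a value."""
    obj = []
    for loc, mnemonic, operand in intermediate:
        if mnemonic == "WORD":
            obj.append((loc, int(operand)))
            continue
        if operand is None:
            address = 0
        elif operand in symtab:
            address = symtab[operand]
        else:
            address = int(operand)       # numeric operand
        obj.append((loc, (OPTAB[mnemonic] << 8) | address))
    return obj


SOURCE = ["LOAD X", "ADD Y", "STORE Z", "HALT",
          "X: WORD 5", "Y: WORD 7", "Z: WORD 0"]
symtab, intermediate = pass1(SOURCE)
object_code = pass2(symtab, intermediate)
```

The operands X, Y and Z are forward references when first seen; they cause no difficulty because Pass II runs only after Pass I has completed the symbol table.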
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one pass assemblers. These one pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references in general; they must be restricted, or patched once the symbol is defined
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels
• Location Counter (LOCCTR): This is a variable that is used to help in the assignment of addresses: LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows.
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1: H
Col 2~7: Program name
Col 8~13: Starting address (hex)
Col 14~19: Length of object program in bytes (hex)
Text record
Col 1: T
Col 2~7: Starting address in this record (hex)
Col 8~9: Length of object code in this record in bytes (hex)
Col 10~69: Object code; (69-10+1)/6 = 10 instructions
End record
Col 1: E
Col 2~7: Address of first executable instruction (hex) (END program_name)
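The column layout above can be made concrete with a few formatting helpers. This is only a sketch: the function names are hypothetical, and the object code is assumed to arrive as strings of hex digits (two digits per byte).

```python
def header_record(name, start, length):
    # Col 1 "H", cols 2-7 name (padded to 6), cols 8-13 start, cols 14-19 length.
    return "H" + name.ljust(6)[:6] + format(start, "06X") + format(length, "06X")


def text_record(start, object_codes):
    body = "".join(object_codes)      # hex digits, two per byte
    if len(body) > 60:                # cols 10-69 hold at most 60 hex digits
        raise ValueError("too much object code for one text record")
    return "T" + format(start, "06X") + format(len(body) // 2, "02X") + body


def end_record(first_executable):
    return "E" + format(first_executable, "06X")


records = [
    header_record("COPY", 0x1000, 0x107A),
    text_record(0x1000, ["141033", "482039"]),
    end_record(0x1000),
]
```

With 6 hex digits per instruction, the 60-digit limit in text_record corresponds to the 10 instructions per record noted above.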
Absolute Program
The program must be loaded at the specified (absolute) address
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1: M
Col 2-7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9: Length of the address field to be modified, in half-bytes
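What a loader does with one modification record can be sketched as below. The helper name apply_modification is hypothetical, and one convention is assumed that the notes do not state: when the length is an odd number of half-bytes, the field begins in the low-order half of the starting byte and the high-order half is left untouched.

```python
def apply_modification(image, offset, half_bytes, load_address):
    """Add load_address to the address field that starts at byte `offset`.

    image is a bytearray holding the program as assembled for address 0.
    """
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(image[offset:offset + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1          # bits that belong to the address
    updated = (field & ~mask) | ((field + load_address) & mask)
    image[offset:offset + nbytes] = updated.to_bytes(nbytes, "big")


# One 4-byte instruction whose last 5 half-bytes (01036) are an address.
image = bytearray.fromhex("4B101036")
apply_modification(image, offset=1, half_bytes=5, load_address=0x5000)
```

After the call the address field holds 06036, while the opcode byte and the high half of byte 1 are unchanged.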
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of a macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
Macro definition
    MACRO
    INCRMT &A,&B
    LOAD &A
    ADD &B
    STORE &A
    ENDM
Macro call
    INCRMT X,Y
Macro expansion
    LOAD X
    ADD Y
    STORE X
(fig Macro program: Source code (with macro) → Macro Processor → Expanded code → Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X,Y
leads to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place All macro calls in a program are expanded in this fashion
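The substitution step can be sketched in a few lines. The function name expand_call is hypothetical; the sketch assumes every formal parameter is written as & followed by letters or digits and that the call supplies exactly one actual parameter per formal.

```python
import re


def expand_call(formals, model_statements, actuals):
    """Return the model statements with each formal replaced by its actual."""
    if len(formals) != len(actuals):
        raise ValueError("wrong number of operands in macro call")
    binding = dict(zip(formals, actuals))     # positional pairing
    return [
        re.sub(r"&\w+", lambda match: binding[match.group(0)], statement)
        for statement in model_statements
    ]


expansion = expand_call(
    ["&A", "&B"],
    ["LOAD &A", "ADD &B", "STORE &A"],
    ["X", "Y"],
)
```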
Defining a Macro
Let us take another look at the macro definition unit
Macro definition
    MACRO
    INCRMT &A,&B
    LOAD &A
    ADD &B
    STORE &A
    ENDM
Macro call
    INCRMT X,Y
Macro expansion
    LOAD X
    ADD Y
    STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion
Basic Macro Processor Functions
-> Directives used during usage of Macro
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition
-> Prototype for Macro
Each argument starts with &
    Name MACRO Parameter list
    (body)
    MEND
-> BODY
The statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, …
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again
INCRMT &A,&B prototype
INCRMT X,Y macro call
we see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements
LOAD X
ADD Y
STORE X
Example
STRG DATA1 DATA2 DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1 &a2=DATA2 &a1=DATA3
GENER TYPE=DIRECT CHANNEL=3
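The difference between the two binding rules can be sketched as follows. The function name bind_arguments is hypothetical, the operands are assumed to be already split into a list, and a keyword is accepted with or without the leading & since both spellings appear in the examples above.

```python
def bind_arguments(formals, operands):
    """Pair the operands of a macro call with the formal parameters.

    formals: e.g. ['&A1', '&A2', '&A3']; operands: strings taken from the call.
    An operand of the form KEYWORD=value is bound by name, any other by position.
    """
    binding = {}
    position = 0
    for operand in operands:
        if "=" in operand:
            keyword, value = operand.split("=", 1)
            name = "&" + keyword.lstrip("&")
            if name not in formals:
                raise ValueError("unknown keyword parameter " + keyword)
        else:
            name = formals[position]      # next unused position
            position += 1
            value = operand
        binding[name] = value
    return binding


formals = ["&A1", "&A2", "&A3"]
positional = bind_arguments(formals, ["DATA1", "DATA2", "DATA3"])
keyword = bind_arguments(formals, ["&A3=DATA1", "&A2=DATA2", "&A1=DATA3"])
```

With keywords the order of the operands no longer matters, which is why DATA1 ends up bound to &A3 in the second call.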
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in MDT
Step 2
Examine all statements in the assembly source program to detect macro calls For each macro call
i) locate the macro in MNT
ii) obtain information from MNT regarding position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in MDT in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked
Subroutine: the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  Macro prototype
  Statements that make up the macro body
  References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  Role: an index to DEFTAB
  Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  Used during the expansion of a macro invocation
  Arguments are stored in ARGTAB according to their position
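A minimal sketch of how the three tables might work together in a one-pass processor is given below. It is an illustration under several assumptions: the name process is hypothetical, only the body (not the prototype) is stored in DEFTAB, positional notation is written ?1, ?2, and nested definitions, nested calls and conditional expansion are ignored.

```python
import re


def process(lines):
    """Expand macros in `lines`; every macro is defined before it is called."""
    deftab = []    # DEFTAB: body statements in positional notation (?1, ?2, ...)
    namtab = {}    # NAMTAB: macro name -> (begin, end) indexes into DEFTAB
    output = []
    i = 0
    while i < len(lines):
        words = lines[i].split()
        if words == ["MACRO"]:                                  # DEFINE
            name, *formals = lines[i + 1].replace(",", " ").split()
            positions = {f: "?%d" % n for n, f in enumerate(formals, 1)}
            begin = len(deftab)
            i += 2
            while lines[i].strip() != "ENDM":
                deftab.append(re.sub(
                    r"&\w+",
                    lambda m: positions.get(m.group(0), m.group(0)),
                    lines[i]))
                i += 1
            namtab[name] = (begin, len(deftab))
        elif words and words[0] in namtab:                      # EXPAND
            argtab = lines[i].replace(",", " ").split()[1:]     # ARGTAB
            begin, end = namtab[words[0]]
            for statement in deftab[begin:end]:
                output.append(re.sub(
                    r"\?(\d+)",
                    lambda m: argtab[int(m.group(1)) - 1],
                    statement))
        else:                                                   # ordinary statement
            output.append(lines[i])
        i += 1
    return output


source = ["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
          "INCRMT X,Y", "HALT"]
expanded = process(source)
```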
(fig: macro processor data structures NAMTAB, DEFTAB and ARGTAB, used for MACRO definition and macro CALL)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain
(fig: macro processor flowchart with procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program thus becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instruction is read line by line, its machine code is obtained and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which the translator produces object files whose addresses are already fixed; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know about memory management.
Lecture_no27 Machine-Dependent Loader Features – Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf(), scanf() etc. which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
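The "add a constant to each relative address" rule can be shown on a tiny segment. The sketch assumes one relocation flag per word telling the loader whether that word holds an address; the flag representation and the name relocate are choices made for the example.

```python
def relocate(words, is_address, load_origin, translated_origin=0):
    """Return the segment's words adjusted for loading at load_origin.

    is_address[i] is True when words[i] is an address that must be adjusted;
    plain data words are copied unchanged.
    """
    constant = load_origin - translated_origin
    return [word + constant if flag else word
            for word, flag in zip(words, is_address)]


# A segment translated at origin 100 and loaded at 500: addresses move by 400,
# while the constant 7 stays as it is.
loaded = relocate([120, 7, 150], [True, False, True],
                  load_origin=500, translated_origin=100)
```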
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The assembler's output for a relocating loader is the object program and information about all other programs it references. In addition there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translations of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called
» Also called dynamic linking, dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
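The path rule can be sketched for the example tree as follows. The class OverlayManager and the table PARENT are hypothetical names, and the model is simplified: the manager keeps exactly one path resident and records which segments it had to read in, rather than managing real memory.

```python
# Overlay tree from the example: each segment maps to its parent.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Sequence of segments from the root down to `segment`."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.resident = ["ROOT"]   # path currently in memory, root first
        self.loads = []            # log of segments read in, in order

    def call(self, target):
        if target in self.resident:
            return                 # upward or local call: nothing to load
        path = path_to(target)
        keep = 0                   # length of the prefix shared with the resident path
        while (keep < min(len(path), len(self.resident))
               and path[keep] == self.resident[keep]):
            keep += 1
        self.loads.extend(path[keep:])   # siblings overwrite what was there
        self.resident = path


manager = OverlayManager()
for target in ["B", "C", "E"]:
    manager.call(target)
```

Calling B from the root loads both A and B; the later call to C reloads only C because A is still on the resident path.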
(fig Overlay: ROOT at the top with children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig: Linking loader: Object program(s) and Library → Linking loader → Memory. Linkage editor: Object program(s) and Library → Linkage editor → Linked program → Relocating loader → Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions

Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader

Subroutine Linkages
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment location
• External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes correspondence between the ESD ID used in ESD & RLD cards
ESD (external symbol dictionary)
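The division of work between the two passes can be sketched as below. This is a heavily simplified model, not the card-based design listed above: each segment is a dictionary with hypothetical keys (name, length, defines, text, rld), memory is a dictionary from address to word, and every RLD entry simply means "add the address of this symbol to the word at this offset".

```python
def loader_pass1(segments, ipla):
    """Assign a load address to each segment and build the external symbol table."""
    table = {}
    pla = ipla                                  # Program Load Address counter
    for segment in segments:
        table[segment["name"]] = pla
        for symbol, relative in segment["defines"].items():
            table[symbol] = pla + relative
        pla += segment["length"]
    return table


def loader_pass2(segments, ipla, table):
    """Load the text, then relocate and link using the table from pass 1."""
    memory = {}
    pla = ipla
    for segment in segments:
        for offset, word in enumerate(segment["text"]):
            memory[pla + offset] = word
        for offset, symbol in segment["rld"]:
            memory[pla + offset] += table[symbol]   # relocation or linking
        pla += segment["length"]
    return memory


segments = [
    {"name": "MAIN", "length": 3, "defines": {},
     "text": [10, 0, 2], "rld": [(1, "SUB"), (2, "MAIN")]},
    {"name": "SUB", "length": 2, "defines": {"ENTRY": 1},
     "text": [7, 8], "rld": []},
]
table = loader_pass1(segments, ipla=100)
memory = loader_pass2(segments, 100, table)
```

In this model an entry naming the segment itself, such as (2, "MAIN"), performs relocation, while an entry naming another segment, such as (1, "SUB"), performs linking.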
Data Structures
Algorithm
Binary Symbolic Subroutine Loader
The assembler assembles each program and provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
  – allocation, relocation and linking
• Outputs the text in a file rather than memory
  – called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that is not tied to a particular computer, may be designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
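The two comparison-based methods can be sketched on a small symbol table whose key is the symbol name. The table contents and the function names are made up for the example; both functions return the index of the entry, or -1 when the key is absent.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn (works on any table)."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(table, key):
    """Halve the search range each step (the table must be ordered by name)."""
    low, high = 0, len(table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1      # continue in the left sub-array
        else:
            low = middle + 1       # continue in the right sub-array
    return -1


# Ordered table of (symbol name, address) pairs.
symbols = [("ALPHA", 0x100C), ("FIRST", 0x1000), ("ONE", 0x100F)]
```

Linear search needs up to n comparisons for n entries, whereas binary search needs about log2(n), at the price of keeping the table ordered.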
Sorting
We will discuss four comparison-sort algorithms
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)
2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule, or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this $L \subset T^*$
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s → α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
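The definition above can be tried out with a small sketch. It assumes that strings are ordinary Python strings and that each production α -> β is a pair (alpha, beta) with a non-empty left side; the function name derive_one_step and the sample grammar S -> aSb | ab are illustrative choices, not part of the notes.

```python
def derive_one_step(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each occurrence of alpha in mu,
    split mu as sigma + alpha + tau and produce sigma + beta + tau.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar used only for illustration: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
step1 = derive_one_step("S", P)
step2 = derive_one_step("aSb", P)
```

Here step1 holds the strings derivable from S in one step and step2 those derivable from aSb in one step. Strings such as aSb that still contain a nonterminal are sentential forms but not sentences.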
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1215: The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
Adding a production rule of the form Bb → b to the modified grammar will result in a non-Type 1 grammar, because it violates condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216: The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
Adding a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1217: The grammar <N, Σ, P, S> which has the following production rules is of Type 3.
Adding a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
[Figure: Hierarchy of grammars]
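The conditions for Types 1, 2 and 3 can be checked mechanically. The following is a minimal sketch under stated assumptions: productions are (lhs, rhs) pairs of strings, upper-case letters are nonterminals, lower-case letters are terminals, the empty string stands for ε, and the input is already a valid Type 0 grammar. The function name grammar_type and the sample rules are hypothetical.

```python
def grammar_type(productions, start="S"):
    """Return the most restrictive type (0 to 3) whose conditions are met."""
    def is_nonterminal(ch):
        return ch.isupper()

    has_start_eps = (start, "") in productions
    start_on_rhs = any(start in rhs for _, rhs in productions)
    # Rules other than S -> epsilon are the ones the conditions constrain.
    rest = [(l, r) for l, r in productions if (l, r) != (start, "")]

    # Type 1: (a) |lhs| <= |rhs|, (b) S not on any right side if S -> epsilon
    if any(len(l) > len(r) for l, r in rest) or (has_start_eps and start_on_rhs):
        return 0
    # Type 2: the left-hand side is a single nonterminal
    if any(not (len(l) == 1 and is_nonterminal(l)) for l, _ in rest):
        return 1

    # Type 3: right side is a terminal, or a terminal followed by a nonterminal
    def regular(r):
        return (len(r) == 1 and not is_nonterminal(r)) or (
            len(r) == 2 and not is_nonterminal(r[0]) and is_nonterminal(r[1])
        )

    return 3 if all(regular(r) for _, r in rest) else 2


type3 = grammar_type([("S", "aA"), ("A", "b"), ("A", "bA")])
type2 = grammar_type([("S", "aSb"), ("S", "ab")])
type1 = grammar_type([("S", "aE"), ("aE", "EaE")])
type0 = grammar_type([("S", "aBb"), ("Bb", "b")])
```

The last two calls mirror the remarks in the examples: a rule of the form aE → EaE keeps a grammar from being Type 2, and a rule of the form Bb → b violates condition (a) of Type 1.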
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V, x, v, y ∈ (V ∪ T)*, and v is non-empty.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
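As an illustration of this notation, the sketch below reads a few BNF rules into a dictionary and then picks out the terminals as the symbols that never appear on a left side. It assumes one rule per line, symbols separated by blanks, and no literal | among the terminals; the function names and the sample grammar are hypothetical.

```python
def parse_bnf(text):
    """Map each nonterminal to its list of alternatives (lists of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alt.split() for alt in right.split("|")]
        rules[left.strip()] = alternatives
    return rules


def terminals(rules):
    """Symbols used on some right side that never appear on a left side."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)


# Illustrative grammar, not taken from the notes.
GRAMMAR = """
<expr> ::= <term> | <expr> + <term>
<term> ::= id | number
"""
rules = parse_bnf(GRAMMAR)
terms = terminals(rules)
```

With this grammar, rules["<expr>"] holds two alternatives and terms contains +, id and number.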
Formal Grammar-
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of this class. A general approach has been taken, so that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems are very useful for explicitly and concisely defining sets of strings, such as computer languages. They become more useful still if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, which corresponds to the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators convert texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read and broken down into an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end (shown in green in the figure) and the back end (shown in pink).
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N*M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the saving is considerable: roughly 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
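The mapping from lexemes to tokens described above can be sketched with regular expressions, which suffices because no recursion is needed. This is only an illustration under assumptions: the token set is limited to identifiers, unsigned integers and the symbols = + * ;, the symbol table is a plain list, and the names scan and TOKEN_RE are hypothetical.

```python
import re

# One alternative per token class; leading blanks are skipped as non-significant.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<sym>[=+*;]))")


def scan(source):
    """Return (tokens, symbol_table) for one source line."""
    symbol_table = []
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.lastgroup == "id":
            name = m.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            # attribute value = 1-based index into the symbol table
            tokens.append(("id", symbol_table.index(name) + 1))
        elif m.lastgroup == "number":
            tokens.append(("number", m.group("number")))
        else:
            tokens.append((m.group("sym"), None))
    return tokens, symbol_table


tokens, table = scan("x3 = y + 3;")
```

Scanning "x3   =y+ 3  ;" yields the same token list, while "x 3 = y + 3;" does not, because x and 3 become two separate lexemes.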
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree shown in the figure.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree in the figure corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
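A small recursive-descent sketch shows how a syntax tree with operators as interior nodes could be built for such an assignment. It assumes the statement ends with a semicolon as in C, that tokens arrive as (kind, value) pairs, and that + associates to the left; the left-recursive rule for expr is handled with a loop. The function name and the tuple shape of the tree are illustrative choices.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree as nested tuples."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind!r} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        # expr -> number | id
        if pos < len(tokens) and tokens[pos][0] == "number":
            return expect("number")
        return expect("id")

    def expr():
        # expr -> expr + expr, treated as operand (+ operand)*
        tree = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            expect("+")
            tree = ("+", tree, operand())
        return tree

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after statement")
    return ("=", target, value)


stmt = [("id", "x3"), ("=", None), ("id", "y"), ("+", None), ("number", "3"), (";", None)]
tree = parse_assignment(stmt)
```

The resulting tree has = at the root, the identifier x3 as its left child, and a + node with children y and 3 as its right child.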
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown in the figure.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
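Walking the semantic tree to produce quads can be sketched as a post-order traversal that allocates a fresh temporary for each interior node. The tree encoding (nested tuples with the operation first) and the function name gen_quads are assumptions made for this example; unused source slots are simply omitted rather than written as --.

```python
def gen_quads(tree):
    """Turn ("assign", target, expression_tree) into a list of quads."""
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):  # leaf: identifier or constant
            return node
        op, *children = node
        sources = [walk(child) for child in children]
        counter += 1
        temp = f"temp{counter}"
        quads.append((op, temp, *sources))  # operation, target, sources
        return temp

    _, target, value = tree
    quads.append(("assign", target, walk(value)))
    return quads


# Semantic tree for  x3 = y + 3  with the conversions already inserted.
semantic_tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
quads = gen_quads(semantic_tree)
```

For the example tree this reproduces the four quads listed above, in the same order.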
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion itself and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
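Both optimizations can be expressed as one pass over the quad list. The sketch below is deliberately narrow: it folds only inttoreal applied to an integer constant, and it merges an assign into the preceding quad only when that quad's target is a temporary, assuming the temporary is not used anywhere else. The function name optimize is hypothetical.

```python
def optimize(quads):
    """Fold constant int-to-real conversions and remove a trailing copy."""
    constants = {}  # temporary name -> folded constant
    out = []
    for op, target, *src in quads:
        src = [constants.get(s, s) for s in src]
        if op == "inttoreal" and src[0].isdigit():
            constants[target] = src[0] + ".0"  # conversion done at compile time
            continue
        if op == "assign" and out and out[-1][1] == src[0] and src[0].startswith("temp"):
            prev = out.pop()
            out.append((prev[0], target, *prev[2:]))  # write straight to target
            continue
        out.append((op, target, *src))
    return out


original = [
    ("inttoreal", "temp1", "3"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2"),
    ("assign", "id1", "temp3"),
]
optimized = optimize(original)
```

The four original quads shrink to two: the add with the constant 3.0 and the realtoint that writes directly to id1.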
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0    add float
RTOI R2, R1         real to int
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one pass. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code instruction, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical single pass compiler:
Compiler Driver --calls--> Syntactic Analyzer
Syntactic Analyzer --calls--> Contextual Analyzer
Syntactic Analyzer --calls--> Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi pass compiler:
Compiler Driver --calls--> Syntactic Analyzer (input: source text, output: AST)
Compiler Driver --calls--> Contextual Analyzer (input: AST, output: decorated AST)
Compiler Driver --calls--> Code Generator (input: decorated AST, output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit, because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no6 Description of Von-Neumann machines components
General machine structure
Von Neumann's components
Block diagram and functions of each component of Von Neumann's machine
1. The von Neumann Computer Model
• Von Neumann computer systems contain three main building blocks:
-> the central processing unit (CPU)
-> memory
-> input/output devices (I/O)
• These three components are connected together using the system bus.
• The most prominent items within the CPU are the registers; they can be manipulated directly by a computer program.
The following block diagram shows the major relationships between CPU components.
(Design of the Von Neumann architecture)
2. Components of the Von Neumann Model
-> Memory: storage of information (data/program)
-> Processing Unit: computation/processing of information
-> Input: means of getting information into the computer, e.g. keyboard, mouse
-> Output: means of getting information out of the computer, e.g. printer, monitor
• Control Unit: makes sure that all the other parts perform their tasks correctly and at the correct time.
The von Neumann Machine
(Connection between CPU & main memory)
MBR (Memory Buffer Register) = MBR contains a word to be stored in memory, or is used to receive a word from memory.
MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR.
IR (Instruction Register) = IR contains the 8-bit op-code of the instruction being executed.
IBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instruction from a word in memory.
PC (Program Counter) = PC contains the address of the next instruction pair to be fetched from memory.
AC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily operands and results of ALU operations.
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit.
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete.
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors, or devices for human interaction.
Instruction set: The set of codes that describe all the legal operations a particular computer can execute.
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform.
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture.
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location.
Read-only memory (ROM): A memory that is usually very similar to RAM, except that it may only be accessed, not updated.
Store: The process of writing a value to a memory cell.
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetchamp execution of general machine structure
Typical operating system Information flow in microflowchart for ADDSUBMULT
Memory operations
FETCH(addr)
1. Put addr into MAR
2. Tell memory unit to "load"
3. Memory copies data into MDR
STORE(addr, new-value)
1. Put addr into MAR
2. Put new-value into MDR
3. Tell memory unit to "store"
4. Memory stores data from MDR into memory cell
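The FETCH and STORE steps above can be mirrored in a few lines of code. This is a toy model, assuming memory is a list of integer cells and that the "load" and "store" signals are simply method calls; the class name MemoryUnit is hypothetical.

```python
class MemoryUnit:
    """Toy memory with an address register (MAR) and a data register (MDR)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0
        self.mdr = 0

    def fetch(self, addr):
        self.mar = addr                     # 1. put addr into MAR
        self.mdr = self.cells[self.mar]     # 2-3. "load": memory copies data into MDR
        return self.mdr

    def store(self, addr, new_value):
        self.mar = addr                     # 1. put addr into MAR
        self.mdr = new_value                # 2. put new-value into MDR
        self.cells[self.mar] = self.mdr     # 3-4. "store": MDR is written to the cell


mem = MemoryUnit(16)
mem.store(5, 42)
value = mem.fetch(5)
```

After these calls value is 42, and both registers still hold the address and data of the last operation.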
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R1-R7) to evaluate the following expression. Ignore the remainder part of the division process.
( (a + b) / (a + d) ) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+ --------> ADD Ri, Rj ( Rj <- Ri + Rj )
* --------> MULT Ri, Rj ( Rj <- Ri * Rj )
/ --------> DIV Ri, Rj ( Rj <- Rj / Ri )
a = M[A] ------> content of memory location at address A; the same holds for B, D, E
move A, D0      D0 = a
move B, D4      D4 = b
add D0, D4      D4 = a + b
move D, D1      D1 = d
add D0, D1      D1 = a + d
div D1, D4      D4 = (a + b) / (a + d)
move E, D2      D2 = e
mult D0, D2     D2 = (a * e)
mult D4, D2     D2 = ( (a + b) / (a + d) ) * (a * e)
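To check the program, it can be stepped through with a tiny interpreter. The sketch assumes each instruction is an (operation, source, destination) triple, that memory is a dictionary keyed by the symbolic addresses A, B, D, E, and that division discards the remainder as the problem states; the sample values are arbitrary.

```python
def run(program, memory):
    """Execute the two-operand instructions and return the register contents."""
    regs = {}
    for op, src, dst in program:
        if op == "move":
            regs[dst] = memory[src]            # Rj <- M[addr]
        elif op == "add":
            regs[dst] = regs[src] + regs[dst]  # Rj <- Ri + Rj
        elif op == "mult":
            regs[dst] = regs[src] * regs[dst]  # Rj <- Ri * Rj
        elif op == "div":
            regs[dst] = regs[dst] // regs[src] # Rj <- Rj / Ri, remainder ignored
    return regs


PROGRAM = [
    ("move", "A", "D0"), ("move", "B", "D4"), ("add", "D0", "D4"),
    ("move", "D", "D1"), ("add", "D0", "D1"), ("div", "D1", "D4"),
    ("move", "E", "D2"), ("mult", "D0", "D2"), ("mult", "D4", "D2"),
]
memory = {"A": 6, "B": 15, "D": 4, "E": 3}  # arbitrary sample values
regs = run(PROGRAM, memory)
```

With these values (a + b) / (a + d) is 21 / 10, which truncates to 2, and a * e is 18, so D2 ends up holding 36.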
Problem 2)
a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0 ; move contents of memory location X to register D0
ii) move D0, X ; move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1) ; move contents from memory to memory
iv) bgtw label ; branch if greater than to location "label"
Answer 2)
For each of the above, for part b) draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2 ; PC points to source address
MAR <-- PC
MBR <-- M[MAR] ; address of X in MBR
MAR <-- MBR ; move X to MAR
MBR <-- M[MAR] ; read contents of location X
D0 <-- MBR
PC <-- PC + 2 ; PC points to next instruction
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2 ; PC points to destination address
MAR <-- PC
MBR <-- M[MAR] ; read destination address X
MAR <-- MBR ; MAR has X now
MBR <-- D0 ; source data in MBR
M[MAR] <-- MBR ; data in effective address X (which is in MAR)
PC <-- PC + 2 ; PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
temp <-- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <-- temp
Micro-steps are
MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2 ; PC points to source displacement
MAR <-- PC
MBR <-- M[MAR] ; source displacement loaded
MAR <-- [A3 + MBR] ; source effective address calculated
MBR <-- M[MAR] ; source data in MBR
WR <-- MBR ; saved temporarily since MBR is to be reused
PC <-- PC + 2 ; PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR] ; destination displacement loaded
MAR <-- [A1 + MBR] ; destination effective address calculated
MBR <-- WR ; source data back to MBR
M[MAR] <-- MBR ; source data moved to destination
PC <-- PC + 2 ; PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a)
MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| interpret / decode |
TRUE (condition holds, branch taken):
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR] ; get displacement
PC <-- PC + MBR ; execute
FALSE (branch not taken):
PC <-- PC + 4
Lecture_no8 Memory unit Register Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964 and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, each specified by the address of its leftmost (lowest-address) byte. Within a halfword, fullword, doubleword or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses of these units require aligning them on the corresponding boundaries. In these notes the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose register, floating-point register, control & status register, data register, address register).
Four fields of information:
1) the name field: optional field; must begin with a character from the alphabet and can consist of 8 characters
2) opcode: mandatory field; must be followed by a blank
3) operands: usually represent data that could be in main storage, in registers, or part of the instruction itself
4) comments: used to clarify instructions
(iii) Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E and F are positive and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are stored only as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent, biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
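The packed decimal and fullword floating point layouts described above can be decoded with a short Python sketch. The function names decode_packed and decode_hfp_short are hypothetical, and unlike PACK and UNPK the packed decoder chooses to reject invalid digits and signs.

```python
def decode_packed(data):
    """Decode packed decimal: two digits per byte, last nibble is the sign."""
    digits = []
    for byte in data[:-1]:
        digits += [byte >> 4, byte & 0x0F]
    digits.append(data[-1] >> 4)          # last byte: one digit plus the sign
    sign = data[-1] & 0x0F
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError("invalid packed decimal field")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0xB, 0xD) else value


def decode_hfp_short(word):
    """Decode a 32-bit hexadecimal floating point fullword."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    characteristic = (word >> 24) & 0x7F        # bits 1-7, excess-64 exponent
    fraction = (word & 0xFFFFFF) / (1 << 24)    # bits 8-31 as a fraction below 1
    return sign * fraction * 16.0 ** (characteristic - 64)


print(decode_packed(bytes([0x12, 0x3C])))
print(decode_hfp_short(0x41100000))
```

Here the bytes 12 3C decode to +123 (digits 1, 2, 3 and sign C), and the word 41100000 decodes to 1.0, since the fraction 1/16 is scaled by 16 to the power (65 - 64).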
Lecture_no9 Various instruction format of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R indicates that the data resides in a register, so in this type the operands are in registers. Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second is a storage location. If there are 3, the first and second are registers and the third is a storage location. Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, the address is specified in 3 pieces (index, base and displacement). Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I indicates that the operand is immediate data (meaning that the data is in the instruction itself). In this type the first operand is a storage location and the second is immediate data. Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S indicates that the data is in storage. Here both operands are in storage. Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
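The RR and RX layouts can be unpacked with a few shifts and masks, as in the Python sketch below. Two details are not stated in these notes and are taken from the S/360 architecture: the top two bits of the opcode select the instruction length, and the two-byte base-and-displacement field is split into a 4-bit base register number and a 12-bit displacement. The function names are hypothetical.

```python
def instruction_length(opcode):
    """Length in bytes, selected by the two high-order opcode bits."""
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[opcode >> 6]


def decode_rr(instr):
    """RR: opcode in byte 0, two 4-bit register numbers in byte 1."""
    return {"opcode": instr[0], "r1": instr[1] >> 4, "r2": instr[1] & 0x0F}


def decode_rx(instr):
    """RX: register, index register, then base (4 bits) and displacement (12 bits)."""
    return {
        "opcode": instr[0],
        "r1": instr[1] >> 4,
        "x2": instr[1] & 0x0F,
        "b2": instr[2] >> 4,
        "d2": ((instr[2] & 0x0F) << 8) | instr[3],
    }


fields = decode_rx(bytes([0x5A, 0x10, 0xC0, 0x04]))
print(instruction_length(0x5A), fields)
```

For the bytes 5A 10 C0 04 the sketch yields register 1, no index register (0), base register 12 and displacement 4.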
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4. Register indirect mode: Operand is at the memory address that is the content of a register specified by the instruction itself. Example: ADD (D), 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234, S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234], 10
7. Relative mode: Content of PC + OpAddr. Example: ADD D, $S
8. Indexed mode: Content of an index register + OpAddr. Example: ADD D, 500(S)
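How each mode locates its operand can be summarized in a small Python sketch. The mode names, the dictionary-based registers and memory, and the function name fetch_operand are assumptions made for illustration; implied mode is omitted because its register is fixed by the opcode rather than by an operand field.

```python
def fetch_operand(mode, field, regs, memory, pc=0):
    """Return the operand value selected by an addressing mode."""
    if mode == "immediate":            # operand is the field itself
        return field
    if mode == "register":             # operand is in the named register
        return regs[field]
    if mode == "register_indirect":    # register holds the operand address
        return memory[regs[field]]
    if mode == "direct":               # field is the absolute address
        return memory[field]
    if mode == "indirect":             # field addresses a cell holding the address
        return memory[memory[field]]
    if mode == "relative":             # address = PC + field
        return memory[pc + field]
    if mode == "indexed":              # address = index register + field
        base, index_reg = field
        return memory[regs[index_reg] + base]
    raise ValueError("unknown addressing mode: " + mode)


regs = {"D": 4, "S": 2}
memory = {4: 40, 7: 70, 102: 11, 502: 55, 1234: 7}
print(fetch_operand("indexed", (500, "S"), regs, memory))
```

The indexed call mirrors the example 500(S): the effective address is 500 plus the content of S.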
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instructions
Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
     MOVE R1, x
     MOVE R2, y
     MOVE R3, 0
     SUB R1, R2
     BN Less
     BZ Zero
     MOVE R3, +1
     BR Done ; skip the other two cases
Less MOVE R3, -1
     BR Done
Zero MOVE R3, 0
Done HLT
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 - R2
1010001000 00 01 10
Halt the program
1111111111111111
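Because every instruction is a fixed bit pattern followed by register and memory fields, a translator for the simulated computer takes only a few lines of Python. The sketch below follows the operand order of the Lecture_no13 listing (register first, SAVE using the STORE bit pattern); the function names are hypothetical.

```python
# Opcode prefixes copied from the instruction formats of the simulated computer.
PREFIX = {
    "LOAD": "100000010",      # followed by RR MMMMM
    "SAVE": "100000100",      # STORE pattern, followed by RR MMMMM
    "MOVE": "100100010000",   # followed by RR RR
    "ADD": "1010000100",      # followed by RR RR RR
    "SUB": "1010001000",      # followed by RR RR RR
}


def reg_bits(name):
    """'R2' -> '10' (2-bit register number)."""
    return format(int(name[1:]), "02b")


def assemble(line):
    """Translate one assembly statement into a 16-bit string."""
    op, *args = line.split()
    if op == "HALT":
        return "1" * 16
    if op in ("LOAD", "SAVE"):
        return PREFIX[op] + reg_bits(args[0]) + format(int(args[1]), "05b")
    return PREFIX[op] + "".join(reg_bits(a) for a in args)


program = ["LOAD R1 5", "LOAD R2 5", "ADD R0 R1 R2", "SAVE R0 6", "HALT"]
machine_code = [assemble(line) for line in program]
print("\n".join(machine_code))
```

Running it on the five-statement program reproduces the five 16-bit words of the machine language example.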
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (base 16). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM, MASM, etc.
   LABEL   OPCODE  OPERANDS      COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6
03 ; Before execution an integer must be stored in NUMBER
04 ;
05         ORIG    x3050
06         LEA     R2, NUMBER
07         LDW     R2, R2, #0
08         LEA     R1, SIX
09         LDW     R1, R1, #0
0A         AND     R3, R3, #0    ; Clear R3. It will
0B                               ; contain the product
0C ; The inner loop
0D ;
0E AGAIN   ADD     R3, R3, R2
0F         ADD     R1, R1, #-1   ; R1 keeps track of
10         BRp     AGAIN         ; the iterations
11 ;
12         HALT
13 ;
14 NUMBER  BLKW    1
15 SIX     FILL    x0006
16 ;
17         END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode of the instruction.
The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
LEA R2, NUMBER
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
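The behaviour of the loop at AGAIN can be traced in Python. This is a minimal sketch in which plain variables stand in for the registers; the function name multiply_by_six is hypothetical.

```python
def multiply_by_six(number):
    """Mirror the multiply-by-6 program, one statement per instruction."""
    r2 = number          # LEA/LDW: R2 <- value stored at NUMBER
    r1 = 6               # LEA/LDW: R1 <- value stored at SIX
    r3 = 0               # AND R3, R3, #0 clears R3
    while True:
        r3 = r3 + r2     # AGAIN ADD R3, R3, R2
        r1 = r1 - 1      # ADD R1, R1, #-1
        if r1 <= 0:      # BRp AGAIN branches only while the result is positive
            break
    return r3            # HALT, product left in R3


print(multiply_by_six(7))
```

The body executes six times, once for each value of R1 from 6 down to 1, so R3 ends up holding six copies of NUMBER added together.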
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
ORIG
ORIG tells the assembler where in memory to place the program. In line 05, ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
FILL
FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
BLKW
BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get through with this, but it is rather inappropriate. If things need a hurry, you should use the country's language.
Assembler: To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler: To translate machine code to its assembly source.
Cross Assembler: The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader; the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine) representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler.
-> Basic assembler directives:
• START: Specify name and starting address for the program
• END: Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE: Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD: Generate one-word integer constant
• RESB: Reserve the indicated number of bytes for a data area
• RESW: Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command
An assembly language statement contains the following fields:
-> Label field: an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field: contains a mnemonic, which stands for an operation code or pseudo-op, i.e. a machine code instruction. It may require additional information, i.e. operands.
-> Operand field: specifies either the address or the data that the opcode requires.
-> Comment field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are 3 types:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages amp disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in
- Single pass
- Two pass
(Fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
Generate object code(perform actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of two-pass assemblers are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references
(detailed pass1 assembler flowchart)
Internal data structures
• the Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
• the Symbol Table (SYMTAB): used to store values (addresses) assigned to labels
• Location Counter (LOCCTR): a variable that is used to help in the assignment of addresses
LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
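The two passes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a full assembler: the opcode values in OPTAB are illustrative, every instruction and every WORD is assumed to occupy 3 bytes, object code is formed as a 2-digit opcode followed by a 4-digit address, and the function names are hypothetical.

```python
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}   # illustrative opcode values


def parse(line):
    """Split a statement into (label, opcode, operand)."""
    fields = line.split()
    if len(fields) == 3:
        return fields[0], fields[1], fields[2]
    if len(fields) == 2:
        return None, fields[0], fields[1]
    return None, fields[0], None


def pass1(lines):
    """Assign addresses with LOCCTR and build SYMTAB."""
    symtab, intermediate, locctr = {}, [], 0
    for line in lines:
        label, op, operand = parse(line)
        if op == "START":
            locctr = int(operand, 16)
            continue
        if op == "END":
            break
        if label is not None:
            if label in symtab:
                raise ValueError("duplicate symbol: " + label)
            symtab[label] = locctr
        intermediate.append((locctr, op, operand))
        if op in OPTAB or op == "WORD":
            locctr += 3
        elif op == "RESW":
            locctr += 3 * int(operand)
        elif op == "RESB":
            locctr += int(operand)
        else:
            raise ValueError("invalid operation code: " + op)
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement now that every label has a value."""
    object_code = []
    for locctr, op, operand in intermediate:
        if op in OPTAB:
            object_code.append((locctr, f"{OPTAB[op]:02X}{symtab[operand]:04X}"))
        elif op == "WORD":
            object_code.append((locctr, f"{int(operand):06X}"))
    return object_code


source = [
    "COPY START 1000",
    "FIRST LDA ALPHA",
    "ADD BETA",
    "STA GAMMA",
    "ALPHA WORD 5",
    "BETA WORD 7",
    "GAMMA RESW 1",
    "END FIRST",
]
symtab, intermediate = pass1(source)
for address, code in pass2(symtab, intermediate):
    print(f"{address:04X} {code}")
```

ALPHA, BETA and GAMMA are all forward references: they are used before they are defined, which is why their addresses are only available to pass 2 after pass 1 has completed SYMTAB.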
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header
Col 1: H
Col 2~7: Program name
Col 8~13: Starting address (hex)
Col 14~19: Length of object program in bytes (hex)
Text
Col 1: T
Col 2~7: Starting address in this record (hex)
Col 8~9: Length of object code in this record in bytes (hex)
Col 10~69: Object code ((69-10+1)/6 = 10 instructions)
End
Col 1: E
Col 2~7: Address of first executable instruction (hex) (END program_name)
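The three record layouts map directly onto fixed-width string formatting. The Python sketch below uses hypothetical function names and illustrative addresses and object codes; it assumes each object code is given as a string of hex digits.

```python
def header_record(name, start, length):
    """H, program name padded to 6 columns, start address, program length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    """T, start address, byte count, then at most 60 hex digits of object code."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 30 bytes of object code")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_executable):
    """E, address of the first executable instruction."""
    return f"E{first_executable:06X}"


codes = ["001009", "18100C", "0C100F", "000005", "000007"]
print(header_record("COPY", 0x1000, 0x12))
print(text_record(0x1000, codes))
print(end_record(0x1000))
```

The length field in the text record counts bytes, so five 3-byte object codes give 0F.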
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time The program can be loaded at different address of memory
How to solve
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1: M
Col 2-7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9: Length of the address field to be modified, in half-bytes
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding of macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X, Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler or Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement Appearance of a macro name in the mnemonic field amounts to a call on the macro The assembler replaces such a statement by the statement sequence comprising the macro This is known as macro expansion
INCRMT X, Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place All macro calls in a program are expanded in this fashion
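Macro expansion amounts to textual substitution of the call's operands into the stored statement sequence. Here is a minimal Python sketch, assuming the INCRMT definition is already held in a dictionary; the names MACRO_DEFS and expand are hypothetical.

```python
# Macro name -> (formal parameters, model statements), as in the INCRMT example.
MACRO_DEFS = {
    "INCRMT": (["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"]),
}


def expand(line):
    """Return the statements that replace one source line."""
    fields = line.split(None, 1)
    if not fields or fields[0] not in MACRO_DEFS:
        return [line]                      # ordinary statement, copied unchanged
    formals, body = MACRO_DEFS[fields[0]]
    actuals = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
    if len(actuals) != len(formals):
        raise ValueError("wrong number of operands for " + fields[0])
    expanded = []
    for statement in body:
        # Plain textual replacement; it assumes no parameter name is a prefix
        # of another one (for example &A and &AB).
        for formal, actual in zip(formals, actuals):
            statement = statement.replace(formal, actual)
        expanded.append(statement)
    return expanded


print(expand("INCRMT X, Y"))
```

Lines whose mnemonic field is not a macro name pass through unchanged, so the same function can be applied to every statement of a program.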
Defining a Macro
Let us take another look at the macro definition unit
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X, Y
Macro expansion:
LOAD X
ADD Y
STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion
Basic Macro Processor Functions
-> Directives used during usage of Macro
MACRO: indicates the beginning of a macro definition
MEND: indicates the end of a macro definition
-> Prototype for Macro
Each argument starts with &
name MACRO parameter list
...
MEND
-> BODY: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter
In the macro invocation: argument
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is invoked
Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, are specified in the prototype and macro call statements respectively and establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B prototype
INCRMT X, Y macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
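The difference between the two forms lies only in how each argument is matched to a formal parameter. A minimal Python sketch, assuming formal parameter names are kept without the leading & and that a keyword argument is recognized by its = sign; the function name bind_arguments is hypothetical.

```python
def bind_arguments(formals, call_args):
    """Pair each argument of a macro call with a formal parameter name."""
    bound = {}
    position = 0
    for arg in call_args:
        if "=" in arg:                        # keyword form: name=value
            key, value = arg.split("=", 1)
            key = key.lstrip("&")
            if key not in formals:
                raise ValueError("unknown parameter: " + key)
            bound[key] = value
        else:                                 # positional form: matched in order
            if position >= len(formals):
                raise ValueError("too many positional arguments")
            bound[formals[position]] = arg
            position += 1
    return bound


formals = ["a1", "a2", "a3"]
print(bind_arguments(formals, ["DATA1", "DATA2", "DATA3"]))
print(bind_arguments(formals, ["&a3=DATA1", "&a2=DATA2", "&a1=DATA3"]))
```

With keywords the arguments may be written in any order, since each one names the parameter it belongs to.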
Features of macro facility
(1) Macro instruction argument
(2) Macro calls within macro (nested macro calls)
(3) Macro instruction defining (macros within macro)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in MDT
Step 2
Examine all statements in the assembly source program to detect macro calls For each macro call
i) locate the macro in MNT
ii) obtain information from MNT regarding position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal
parameters and their values (ie actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables
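The three steps can be sketched as a small Python function. This is a simplified illustration, not the algorithm asked for in the course: it assumes a definition is written as a MACRO line, a prototype line, the model statements and ENDM; it ignores AIF and AGO; and the names process, mnt and mdt are chosen here for the example.

```python
def process(source):
    """Apply steps 1-3 to a source program given as a list of lines."""
    mnt = {}     # macro name -> (index of first body line in MDT, formals)
    mdt = []     # model statements of all macros, each body ended by ENDM
    output = []
    lines = iter(source)
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "MACRO":
            # Step 1: the next line is the prototype, e.g. "INCRMT &A,&B"
            proto = next(lines).split(None, 1)
            formals = proto[1].replace(" ", "").split(",") if len(proto) > 1 else []
            mnt[proto[0]] = (len(mdt), formals)
            for body_line in lines:
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        elif fields and fields[0] in mnt:
            # Step 2: macro call; pair actual with formal parameters
            start, formals = mnt[fields[0]]
            actuals = "".join(fields[1:]).split(",") if len(fields) > 1 else []
            binding = dict(zip(formals, actuals))
            # Step 3: copy model statements from the MDT until ENDM
            i = start
            while mdt[i] != "ENDM":
                stmt = mdt[i]
                # longest names first, so &AB is not mistaken for &A
                for formal in sorted(binding, key=len, reverse=True):
                    stmt = stmt.replace(formal, binding[formal])
                output.append(stmt)
                i += 1
        else:
            output.append(line.strip())
    return output

demo = process(["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A",
                "ENDM", "START", "INCRMT X,Y", "END"])
```

Because definitions are entered in the tables as they are met and calls are expanded in the same scan, the sketch relies on every macro being defined before it is called.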
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked
Subroutine - the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB, with pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position
(fig Macro processor data structures NAMTAB DEFTAB ARGTAB)
Two pass algorithm
» Pass1: Recognize macro definitions
» Pass2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain
(fig Macro processor flowchart PROCESSLINE DEFINE EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is now ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader - In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme - In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. When the source program is first translated an object program is generated; if the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader - An absolute loader is a kind of loader for which already relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
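Since an absolute loader only copies code to the addresses already fixed in the object file, it can be sketched very briefly. The record layout used here (a list of address and data pairs plus an entry point) is an assumption made for illustration, not a real object file format.

```python
def absolute_load(memory, records, entry_point):
    """Copy each (address, data) record to exactly the address it names.
    No allocation, linking or relocation is performed."""
    for address, data in records:
        if address < 0 or address + len(data) > len(memory):
            raise ValueError(f"record at {address} does not fit in memory")
        memory[address:address + len(data)] = data
    return entry_point  # address to which control is transferred

memory = bytearray(64)
start = absolute_load(memory, [(16, b"\x01\x02\x03"), (32, b"\xff")],
                      entry_point=16)
```

All address decisions (16 and 32 in this example) were made before loading, which is why the burden falls on the programmer.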
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls on certain input/output functions like printf( ), scanf( ) etc. which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf( ). A and printf( ) would have to be linked with each other. But where in main storage shall we load A and printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf( ) function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf( ) may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf( ) extends from 100 to 150. But there is simply no way A and printf( ) can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
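The rule "add a constant to each relative address" can be sketched as follows. The segment is modelled as a list of integer words and the list address_fields says which words hold addresses; both are assumptions for the illustration, as are the sample values.

```python
def relocate(words, address_fields, translated_origin, load_origin):
    """Add the relocation constant to every word flagged as an address."""
    constant = load_origin - translated_origin
    relocated = list(words)          # leave the original segment untouched
    for index in address_fields:
        relocated[index] += constant
    return relocated

# A segment translated to start at 100 whose words 1 and 3 hold addresses.
segment = [7, 104, 9, 100, 0]
moved = relocate(segment, [1, 3], translated_origin=100, load_origin=150)
```

Only the flagged words change (104 becomes 154 and 100 becomes 150); the other words are instructions or data and are copied as they are.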
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader is the object program together with information about all other programs it references. In addition there is relocation information: the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
Dynamic Linking-
The linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is called dynamic linking, dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
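The behaviour of the overlay manager on the example tree can be sketched as below. The class OverlayManager, the PARENT table and the loads log are names invented for this illustration; a real overlay manager works on memory regions rather than on lists of names.

```python
# Parent of every segment in the example tree (ROOT has no parent).
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Segments from the root down to the given segment."""
    path = [segment]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]

class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]  # path currently held in memory
        self.loads = []         # log of segments brought into memory

    def call(self, target):
        """Make sure every segment on the path to target is in memory."""
        for depth, segment in enumerate(path_to(target)):
            if depth >= len(self.loaded) or self.loaded[depth] != segment:
                # A sibling (or nothing) occupies this region: overwrite it
                # and drop everything that was loaded below it.
                del self.loaded[depth:]
                self.loaded.append(segment)
                self.loads.append(segment)

manager = OverlayManager()
manager.call("B")   # root calls a routine in B: A and B are loaded
manager.call("C")   # C replaces its sibling B, A stays
manager.call("E")   # D replaces A, then E is loaded
```

After these three calls the path in memory is ROOT, D, E, which matches the path for E given above.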
(fig Overlay tree: ROOT at the top, A and D below ROOT, B and C below A, E and F below D)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution
2 The system can be dynamically reconfigured
Disadvantages
The linking and loading need to be postponed until execution. If a subroutine is needed during execution, execution has to be suspended until the required subroutine is loaded into main memory
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader respectively) for the above four functions
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory) card
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD & RLD cards
• ESD (external symbol dictionary)
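The division of work between the two passes can be sketched on a heavily simplified object format. Everything about the format is an assumption for illustration: each segment is a dictionary whose defs entry plays the role of the ESD cards, text the TXT cards and rld the RLD cards; words are plain integers and memory is a Python list.

```python
def pass1(segments, ipla):
    """Assign a load address to every segment and external symbol."""
    symbols, pla = {}, ipla              # pla is the PLA counter
    for seg in segments:
        symbols[seg["name"]] = pla       # start address of the segment
        for sym, rel in seg["defs"].items():
            symbols[sym] = pla + rel     # symbols exported by the segment
        pla += len(seg["text"])
    return symbols

def pass2(segments, symbols, memory):
    """Load the text and adjust the address constants listed in each RLD."""
    for seg in segments:
        base = symbols[seg["name"]]
        memory[base:base + len(seg["text"])] = seg["text"]
        for offset, sym in seg["rld"]:
            memory[base + offset] += symbols[sym]
    return memory

segments = [
    {"name": "MAIN", "defs": {}, "text": [10, 0, 2],
     "rld": [(1, "SUB"), (2, "MAIN")]},   # word 1 refers to SUB, word 2 to MAIN+2
    {"name": "SUB", "defs": {"ENTRY": 1}, "text": [20, 21], "rld": []},
]
symbols = pass1(segments, ipla=100)
memory = pass2(segments, symbols, [0] * 110)
```

Pass 1 must finish before any text is loaded, because MAIN refers to SUB, whose address is known only after all segment lengths have been seen.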
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler assembles each program and provides the loader with
-> the object program + relocation information
-> prefixed with information about all other programs it references (the transfer vector)
-> the length of the entire program
-> the length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• It outputs the text to a file rather than to memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> The language designer's balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search
Binary search takes a sorted array (an ordered table) as input. It compares the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
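Both methods can be sketched on a small symbol table whose key is the symbol name. The table contents and the function names are made up for the example; each entry is a (name, address) pair and the functions return the index of the entry, or -1 when the key is absent.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn."""
    for index, (name, _) in enumerate(table):
        if name == key:
            return index
    return -1

def binary_search(table, key):
    """Repeatedly halve a table that is sorted by name."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1   # continue in the left sub-array
        else:
            low = mid + 1    # continue in the right sub-array
    return -1

# sorted() orders the entries by name, as binary search requires.
symtab = sorted([("LOOP", 104), ("ALPHA", 100), ("SUM", 112), ("COUNT", 108)])
```

Linear search works on any table, while binary search gives a correct answer only when the table is ordered on the key.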
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
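A minimal sketch of such a table is given below. The hash function (sum of character codes modulo the table size) and the use of linear probing for collisions are choices made for this example only; the notes do not prescribe either.

```python
class HashTable:
    def __init__(self, size=11):
        self.slots = [None] * size  # slot i holds a (key, value) pair or None

    def _hash(self, key):
        # Hypothetical hash function: sum of character codes mod table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        start = self._hash(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)   # linear probing
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("hash table is full")

    def get(self, key):
        start = self._hash(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None:
                return None            # empty slot reached: key is absent
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

table = HashTable()
table.put("AB", 1)
table.put("BA", 2)   # same character sum as "AB", so it collides
```

The two keys hash to the same slot, and the second one is placed in the next free slot found by probing.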
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and what kind of model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model
• A syntax that defines which strings of symbols are in the language of our formal system
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of the premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this $L \subseteq T^*$
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string $\gamma$ is immediately derived from a string $\mu$ in a grammar G, written $\mu \to_G \gamma$ (a single step derivation), if and only if
(1) $\mu = \sigma\alpha\tau$
(2) $\gamma = \sigma\beta\tau$
(3) $\alpha \to \beta \in P$ of G
where $\sigma$ and $\tau$ represent arbitrary strings
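The definition can be checked mechanically: for every production and every place where its left side occurs in the string, split the string into the left context, the left side and the right context, and substitute the right side. The sketch below assumes strings of single-character symbols and a small grammar invented for the example; the function name immediate_derivations is hypothetical.

```python
def immediate_derivations(mu, productions):
    """All strings gamma that can be derived from mu in a single step.

    productions is a list of (alpha, beta) pairs with non-empty alpha.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)     # gamma = sigma beta tau
            start = mu.find(alpha, start + 1)
    return results

# Example grammar (assumed): S -> aSb and S -> ab.
P = [("S", "aSb"), ("S", "ab")]
step = immediate_derivations("aSb", P)
```

From aSb one step reaches either aaSbb or aabb; here the left and right contexts are a and b respectively.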
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol $\Sigma$
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, P, S> that satisfies the following two conditions
a Each production rule $\alpha \to \beta$ in P satisfies $|\alpha| \le |\beta|$ if it is not of the form $S \to \epsilon$
b If $S \to \epsilon$ is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form $Bb \to b$ will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule $\alpha \to \beta$ satisfies $|\alpha| = 1$, that is, $\alpha$ is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form $aE \to EaE$ to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, P, S> in which each of the production rules $\alpha \to \beta$ which is not of the form $S \to \epsilon$ satisfies one of the following conditions
a $\beta$ is a terminal symbol
b $\beta$ is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1217 The grammar <N, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form $A \to Ba$ or of the form $B \to bb$ to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form $A \to x$ where $A \in V$ and $x \in (V \cup T)^*$. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form $xAy \to xvy$ where $A \in V$ and $x, v, y \in (V \cup T)^*$.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by $A \to v$, while x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only $kn$ squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
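A BNF specification written in this form is easy to read into a data structure. The sketch below assumes one rule per line, symbols separated by spaces and no literal | among the terminals; the two-rule grammar for binary numbers and the function names are made up for the example.

```python
def parse_bnf(text):
    """Return {nonterminal: [alternative, ...]}; each alternative is a
    list of symbols."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules

def terminals(rules):
    """Symbols that never appear on a left side."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)

GRAMMAR = parse_bnf("""
<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1
""")
```

In this grammar <number> and <digit> appear on a left side and are therefore nonterminals, while 0 and 1 never do and are the terminals.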
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings such as computer languages. A canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, a canonic system in which all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators convert text from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---> Compiler ---> Target program
Compiler Analysis-Synthesis Model
In the analysis part, the source program is read, broken into its constituent parts, and converted into an intermediate form. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of a Compiler
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
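The scanning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token patterns, the function name scan, and the choice to number symbol-table entries from 1 are hypothetical and cover just the symbols used in the example statement, not a full C scanner.

```python
import re

# Token patterns are illustrative assumptions, not a full C lexer.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("symbol", r"[=+;]"),
    ("blank", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbol_table) for a source string."""
    tokens = []
    symbol_table = []  # position + 1 is the attribute value of an identifier
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "blank":
            continue  # non-significant blanks are dropped
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))  # second component is ignored
    return tokens, symbol_table
```

Calling scan("x3 = y + 3;") gives the same token list for every spacing variant shown earlier, while "x 3 = y + 3;" scans differently because x and 3 become separate lexemes. No recursion is needed, only regular expressions.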
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
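A minimal sketch of how a syntax tree with operators as interior nodes might be built for the assignment expression (without the trailing semicolon). The rule expr → expr + expr is ambiguous and left-recursive as written, so this sketch assumes + groups to the left and uses a loop instead. The function names and the nested-tuple tree representation are hypothetical.

```python
def parse_assignment(tokens):
    """Parse  id = expr  and return a syntax tree as nested tuples.

    `tokens` is a list of lexemes such as ["x3", "=", "y", "+", "3"].
    """
    if len(tokens) < 3 or not tokens[0].isidentifier() or tokens[1] != "=":
        raise SyntaxError("expected: id = expr")
    tree, rest = parse_expr(tokens[2:])
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return ("=", tokens[0], tree)


def parse_expr(tokens):
    """expr -> operand { + operand }, grouping + to the left (an assumption)."""
    left, rest = parse_operand(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_operand(rest[1:])
        left = ("+", left, right)  # operator is the interior node
    return left, rest


def parse_operand(tokens):
    """operand -> number | id"""
    if not tokens:
        raise SyntaxError("expected an operand")
    lexeme = tokens[0]
    if lexeme.isdigit() or lexeme.isidentifier():
        return lexeme, tokens[1:]
    raise SyntaxError(f"expected number or id, got {lexeme!r}")
```

For the lexemes of x3 = y + 3 the result is ("=", "x3", ("+", "y", "3")): the assignment operator at the root, with the identifier and the sum as its children.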
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
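Walking the semantic tree to emit quads can be sketched as follows. The nested-tuple tree format and the name gen_quads are assumptions made for illustration; None stands for the unused "--" field.

```python
def gen_quads(tree):
    """Walk a semantic tree (nested tuples) and emit quads.

    Each quad is (operation, target, source1, source2); unused fields are None.
    """
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):  # leaf: identifier or constant
            return node
        op, *children = node
        sources = [walk(child) for child in children]  # operands first
        counter += 1
        temp = f"temp{counter}"  # a fresh temporary holds this node's value
        padded = (sources + [None, None])[:2]
        quads.append(("add" if op == "+" else op, temp, padded[0], padded[1]))
        return temp

    op, target, expr = tree
    if op != "=":
        raise ValueError("expected an assignment at the root")
    quads.append(("assign", target, walk(expr), None))
    return quads
```

Applied to the tree ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3")))), this yields the same four quads as listed above, in the same order.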
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
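The two optimizations above can be sketched as local rewrites on a list of quads. This is a toy illustration, not how a production optimizer works: the function name optimize is hypothetical, and the second rewrite assumes the temporary being copied is not used anywhere else.

```python
def optimize(quads):
    """Apply two local rewrites to a list of (op, target, src1, src2) quads."""
    constants = {}  # temp name -> folded real constant
    folded = []
    for op, target, src1, src2 in quads:
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{int(src1)}.0"  # convert at compile time
            continue
        folded.append((op, target, constants.get(src1, src1), constants.get(src2, src2)))

    result = []
    for quad in folded:
        op, target, src1, src2 = quad
        if op == "assign" and result and result[-1][1] == src1:
            # The previous quad computed src1 only to copy it: write to target directly.
            prev_op, _, prev1, prev2 = result.pop()
            result.append((prev_op, target, prev1, prev2))
        else:
            result.append(quad)
    return result
```

On the four quads of the running example this leaves two: add temp2 id2 3.0 and realtoint id1 temp2.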
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
127 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    | calls
Syntactic Analyzer
    | calls                | calls
Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    | calls                | calls                  | calls
Syntactic Analyzer     Contextual Analyzer      Code Generator
    in: Source Text        in: AST                  in: Decorated AST
    out: AST               out: Decorated AST       out: object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
2 Components of the Von Neumann Model
-> Memory: storage of information (data/program)
-> Processing Unit: computation/processing of information
-> Input: means of getting information into the computer, e.g. keyboard, mouse
-> Output: means of getting information out of the computer, e.g. printer, monitor
• Control Unit: makes sure that all the other parts perform their tasks correctly and at the correct time
The von Neumann Machine (connection between CPU & main memory)
MBR (Memory Buffer Register) = MBR contains a word to be stored in memory, or is used to receive a word from memory
MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR
IR (Instruction Register) = IR contains the 8-bit op-code of the instruction being executed
IBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instruction from a word in memory
PC (Program Counter) = PC contains the address of the next instruction pair to be fetched from memory
AC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily operands and results of ALU operations
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit.
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete.
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors, or devices for human interaction.
Instruction set: The set of codes that describe all the legal operations a particular computer can execute.
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform.
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture.
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location.
Read-only memory (ROM): A memory that is usually very similar to RAM, except that it may only be accessed, not updated.
Store: The process of writing a value to a memory cell.
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system information flow in microflowchart for ADD/SUB/MULT
Memory operations
FETCH(addr)
1. Put addr into MAR
2. Tell memory unit to "load"
3. Memory copies data into MDR
STORE(addr, new-value)
1. Put addr into MAR
2. Put new-value into MDR
3. Tell memory unit to "store"
4. Memory stores data from MDR into memory cell
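The FETCH and STORE sequences above can be mimicked with a small Python class. This is a toy model under stated assumptions: the class name MemoryUnit and its attribute names are hypothetical, and the "load" and "store" signals are collapsed into ordinary assignments.

```python
class MemoryUnit:
    """Toy memory with a memory address register (MAR) and data register (MDR)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0
        self.mdr = 0

    def fetch(self, addr):
        self.mar = addr                   # 1. put addr into MAR
        self.mdr = self.cells[self.mar]   # 2-3. "load": memory copies data into MDR
        return self.mdr

    def store(self, addr, new_value):
        self.mar = addr                   # 1. put addr into MAR
        self.mdr = new_value              # 2. put new-value into MDR
        self.cells[self.mar] = self.mdr   # 3-4. "store": MDR is written to the cell
```

After store(5, 99), a later fetch(5) returns 99 and leaves MAR = 5 and MDR = 99, which shows that the processor only ever talks to memory through these two registers.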
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R0-R7) to evaluate the following expression. Ignore the remainder part of the division process.
((a + b) / (a + d)) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+ --------> ADD Ri, Rj (Rj <- Ri + Rj)
* --------> MULT Ri, Rj (Rj <- Ri * Rj)
/ --------> DIV Ri, Rj (Rj <- Rj / Ri)
a = M[A] ------> content of memory location at address A; same for B, D, E
move A, D0 ; D0 = a
move B, D4 ; D4 = b
add D0, D4 ; D4 = a + b
move D, D1 ; D1 = d
add D0, D1 ; D1 = a + d
div D1, D4 ; D4 = (a + b) / (a + d)
move E, D2 ; D2 = e
mult D0, D2 ; D2 = (a * e)
mult D4, D2 ; D2 = ((a + b) / (a + d)) * (a * e)
Problem 2)
a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0 ; move contents of memory location X to register D0
ii) move D0, X ; move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1) ; move contents from memory to memory
iv) bgtw label ; branch if greater than to location "label"
Answer 2)
For each of the above, for part b), draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2 ; PC points to source address
MAR <-- PC
MBR <-- M[MAR] ; address of X in MBR
MAR <-- MBR ; move X to MAR
MBR <-- M[MAR] ; read contents of location X
D0 <-- MBR
PC <-- PC + 2 ; PC points to next instruction
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2 ; PC points to destination address
MAR <-- PC
MBR <-- M[MAR] ; read destination address X
MAR <-- MBR ; MAR has X now
MBR <-- D0 ; source data in MBR
M[MAR] <-- MBR ; data in effective address X (which is in MAR)
PC <-- PC + 2 ; PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
temp <-- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <-- temp
Micro-steps are
MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2 ; PC points to source displacement
MAR <-- PC
MBR <-- M[MAR] ; source displacement loaded
MAR <-- [A3 + MBR] ; source effective address calculated
MBR <-- M[MAR] ; source data in MBR
WR <-- MBR ; saved temporarily since MBR is to be reused
PC <-- PC + 2 ; PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR] ; destination displacement loaded
MAR <-- [A1 + MBR] ; destination effective address calculated
MBR <-- WR ; source data back to MBR
M[MAR] <-- MBR ; source data moved to destination
PC <-- PC + 2 ; PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000
---- displacement ----
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| interpret, decode |
If the condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR] ; get displacement
PC <-- PC + MBR ; execute
If the condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit, Register, Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes), and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword, or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. In these notes the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose registers, floating-point registers, control & status registers, data registers, address registers).
Four fields of information:
1) the name field: an optional field; it must begin with a character from the alphabet and can consist of up to 8 characters
2) opcode: a mandatory field; it must be followed by a blank
3) operands: usually represent data that could be in main storage, in registers, or part of the instruction itself
4) comments: used to clarify instructions
(iii)Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E, and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
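The packed decimal layout described above can be sketched in Python. The function names are hypothetical, and the sketch always writes C for positive and D for negative, which are two of the valid sign codes listed above; when decoding it accepts all six.

```python
def pack_decimal(value):
    """Encode an integer as packed decimal bytes (digits, then a sign nibble)."""
    digits = str(abs(value))
    if len(digits) % 2 == 0:
        digits = "0" + digits  # odd digit count, so digits + sign fill whole bytes
    if len(digits) > 31:
        raise ValueError("packed decimal holds at most 31 digits (16 bytes)")
    sign = "D" if value < 0 else "C"
    return bytes.fromhex(digits + sign)


def unpack_decimal(data):
    """Decode packed decimal bytes back to an integer, checking validity."""
    nibbles = data.hex().upper()
    digits, sign = nibbles[:-1], nibbles[-1]
    if sign not in "ABCDEF" or not digits.isdigit():
        raise ValueError("invalid packed decimal")
    magnitude = int(digits)
    return -magnitude if sign in "BD" else magnitude
```

For example, 123 packs into the two bytes 12 3C, and -45 packs into 04 5D after a leading zero digit is added to make the digit count odd.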
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent, biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
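As a sketch of the fullword format just described (sign in bit 0, characteristic in bits 1-7, fraction in bits 8-31), the following Python function decodes a 32-bit word. The function name is hypothetical; the word is passed as an ordinary integer with bit 0 as the most significant bit.

```python
def decode_hex_float32(word):
    """Decode a 32-bit System/360 fullword floating point value to a Python float."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    characteristic = (word >> 24) & 0x7F      # bits 1-7, exponent biased by 64
    fraction = (word & 0xFFFFFF) / 2 ** 24    # bits 8-31, a base-16 fraction
    return sign * fraction * 16.0 ** (characteristic - 64)
```

For instance, the word 41100000 (hex) has characteristic 65 and fraction 1/16, so its value is 1/16 × 16 = 1.0. Note that the exponent is a power of 16, not of 2.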
Lecture_no9 Various instruction formats of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four, or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R tells that there is data residing in a register. So in this type the data is in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on what the opcode indicates. If it is 2, the first is a register and the second a storage location. If it is 3, then the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, this format specifies the actual storage address in 3 pieces.
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I tells that the operand consists of immediate data (meaning that the data is in the instruction itself). So in this type the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S tells that there is data in storage. Here both operands have data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
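Since the opcode sits in byte 0, the machine can tell how long an instruction is before reading the rest of it. The notes above do not spell out the rule; in the System/360 architecture the two high-order bits of the opcode give the length (00 for the two-byte RR format, 01 for RX, 10 for RS and SI, 11 for the six-byte SS format). A minimal sketch, with a hypothetical function name:

```python
def instruction_length(opcode):
    """Return the length in bytes of a System/360 instruction from its opcode byte."""
    top_two_bits = (opcode >> 6) & 0b11
    # 00 -> RR (2 bytes), 01 -> RX (4), 10 -> RS or SI (4), 11 -> SS (6)
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[top_two_bits]
```

For example, opcode 1A (AR, an RR instruction) gives 2, opcode 5A (A, an RX instruction) gives 4, and opcode D2 (MVC, an SS instruction) gives 6.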
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4. Register indirect mode: Operand is at a memory address that is the content of a register specified by the instruction itself. Example: ADD (D), 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234, S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234], 10
7. Relative mode: Operand address is the content of PC + OpAddr. Example: ADD D, $S
8. Indexed mode: Operand address is the content of an index register + OpAddr. Example: ADD D, 500(S)
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instructions
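Example: Write a program that compares two positive numbers x and y. The output is -1 if x < y, 0 if x = y, and +1 if x > y.
      MOVE R1, x
      MOVE R2, y
      MOVE R3, 0
      SUB R1, R2
      BN Less
      BZ Zero
      MOVE R3, +1
      BR Done ; unconditional branch (assumed mnemonic) so that +1 is not overwritten
Less  MOVE R3, -1
      BR Done
Zero  MOVE R3, 0
Done  HLT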
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memorylocation into register
100000010 RR MMMMMex R0 = Mem[3]100000010 00 00011
STORE contents of registerinto memory location
100000100 RR MMMMMex Mem[4] = R0100000100 00 00100
MOVE contents of oneregister into another register
100100010000 RR RRex R0 = R1100100010000 00 01
ADD contents of 2 registersstore result in third
1010000100 RR RR RRex R0 = R1 + R21010000100 00 01 10
SUBTRACT contents of 2registers store result into third
1010001000 RR RR RRex R0 = R1 ndash R2
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 23
SP NOTES
1010001000 00 01 10
Halt the program 1111111111111111
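The instruction formats above are regular enough that a translator can be sketched in a few lines. The following Python sketch is only an illustration: the names FORMATS, bits and encode are hypothetical, and operands are passed in the order of the bit fields (so LOAD and STORE take the register first, then the memory address).

```python
# Opcode prefix and operand field widths, copied from the formats in the notes.
FORMATS = {
    "LOAD":  ("100000010",    (2, 5)),     # RR MMMMM
    "STORE": ("100000100",    (2, 5)),     # RR MMMMM
    "MOVE":  ("100100010000", (2, 2)),     # RR RR
    "ADD":   ("1010000100",   (2, 2, 2)),  # RR RR RR
    "SUB":   ("1010001000",   (2, 2, 2)),  # RR RR RR
    "HALT":  ("1" * 16,       ()),
}

def bits(value, width):
    """Return value as a binary string of exactly `width` bits."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")

def encode(op, *operands):
    """Encode one instruction as a 16-character string of 0s and 1s."""
    prefix, widths = FORMATS[op]
    if len(operands) != len(widths):
        raise ValueError(f"{op} expects {len(widths)} operands")
    return prefix + "".join(bits(v, w) for v, w in zip(operands, widths))

# The five-instruction example program from the notes.
program = [
    encode("LOAD", 1, 5),
    encode("LOAD", 2, 5),
    encode("ADD", 0, 1, 2),
    encode("STORE", 0, 6),
    encode("HALT"),
]
```

Each entry of program is one 16-bit word; joining the entries reproduces the bit string of the example program.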
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example:
START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write:
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example:
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
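How EQU and ORG influence placement without producing code can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function name place is hypothetical, numbers are read as hexadecimal, and every instruction is assumed to occupy exactly one address.

```python
def place(source):
    """Return {address: instruction text} after processing EQU and ORG."""
    symbols = {}   # label -> value given by an EQU directive
    memory = {}    # address -> instruction stored there
    location = 0   # address at which the next instruction will be stored
    for line in source:
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 3 and fields[1] == "EQU":
            symbols[fields[0]] = int(fields[2], 16)   # define, emit nothing
        elif fields[0] == "ORG":
            operand = fields[1]
            # ORG emits nothing; it only moves the location counter.
            location = symbols[operand] if operand in symbols else int(operand, 16)
        else:
            memory[location] = " ".join(fields)
            location += 1                             # assumed: one address each
    return memory
```

With the two examples above, ORG 0 places GOTO 10 at address 0, while ORG START (with START EQU 10) places it at address 10 hexadecimal.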
• Pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
   LABEL   OPCODE OPERANDS      COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6.
03 ; Before execution, an integer must be stored in NUMBER.
04 ;
05         .ORIG  x3050
06         LEA    R2, NUMBER
07         LDW    R2, R2, #0
08         LEA    R1, SIX
09         LDW    R1, R1, #0
0A         AND    R3, R3, #0    ; Clear R3. It will
0B                              ; contain the product.
0C ; The inner loop
0D ;
0E AGAIN   ADD    R3, R3, R2
0F         ADD    R1, R1, #-1   ; R1 keeps track of
10         BRp    AGAIN         ; the iterations
11 ;
12         HALT
13 ;
14 NUMBER  .BLKW  1
15 SIX     .FILL  x0006
16 ;
17         .END

Figure 7.1 An assembly language program

Two of the parts (LABEL and COMMENTS) are optional.

Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode. The number of operands depends on the operation being performed. In
AGAIN ADD R3, R3, R2
the operands to be added are obtained from register 2 and from register 3, and the result is placed in register 3. We represent the registers 0 through 7 as R0, R1, ..., R7.

In
LEA R2, NUMBER
the location whose address is to be read is given the label NUMBER, and the destination into which that address is loaded is register 2. A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.

Labels
Labels are symbolic names used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, there is no need to give it a label.

Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.

Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, a pseudo-op is strictly a message to the assembler to help it in the assembly process.

.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says to start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.

.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the location labelled SIX (the eleventh word of the resulting program) is initialized to the value x0006.

.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
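The behaviour of the program in Figure 7.1 can be traced with a short Python sketch. It is only an illustration: the function name is hypothetical, Python integers stand in for the registers, and 16-bit overflow is ignored.

```python
def multiply_by_six(number):
    """Mirror the register usage of the Figure 7.1 program."""
    r2 = number          # lines 06-07: R2 <- value stored at NUMBER
    r1 = 6               # lines 08-09: R1 <- value stored at SIX
    r3 = 0               # line 0A: clear R3, which will hold the product
    while True:
        r3 += r2         # line 0E (AGAIN): R3 <- R3 + R2
        r1 -= 1          # line 0F: R1 counts the remaining iterations
        if r1 <= 0:      # line 10: BRp branches only while R1 is positive
            break
    return r3            # value left in R3 when HALT is reached
```

The loop body runs six times, so R3 ends up holding NUMBER added to itself six times.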
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you can get by speaking English, but you will never feel at home, and life remains complicated. You can get through this way, but it is rather inappropriate. If things need to move quickly, you should use the country's language.

Assembler
Translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to the symbolic labels used by the programmer.

Disassembler
Translates machine code back to its assembly source.

Cross Assembler
An assembler that runs on one machine and produces object code for another.

An assembler is a program that accepts an assembly language program as input and produces its machine language equivalent along with information for the loader.

(Fig: Assembler)
For example, externally defined symbols (library programs) must be indicated to the loader: the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.

(Fig: Program Translation Steps)

Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer

Basic Assembler Functions
• Convert mnemonic operation codes to machine language equivalents
• Convert symbolic operands to machine addresses (pass 1)
• Build machine instructions in the proper format
• Convert data constants to internal (machine) representations
• Perform error checking
• Write the object program and assembly listing files

Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START - Specify the name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate a one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax of assembly language statements

Assembly language statements are entered one statement per line. Each statement has the following format:

[label] mnemonic [operands] [comment]

The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands, or parameters, of the command.

An assembly language statement contains the following fields:
-> Label field - an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).

Ex:

START LDAA 24 H

Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field - contains a mnemonic, which stands for an operation code or a pseudo-op. When it is a machine code instruction, it may require additional information, i.e. operands.
-> Operand field - specifies either the address or the data that the opcode requires.
-> Comment field - allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements

There are three types:

(i) Imperative statements

(ii) Declarative statements

(iii) Assembler directive statements

Advantages and disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task

Assembler design can be done in
- Single pass
- Two pass

(Fig: Two-pass assembler. Source program -> Pass 1 (uses OPTAB, builds SYMTAB) -> Intermediate file -> Pass 2 (uses OPTAB and SYMTAB) -> Object codes)

The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code, and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table used to store each label and its corresponding value)

Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler

The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of a two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. One-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.

• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless such restrictions are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB): used to store the values (addresses) assigned to labels
• The Location Counter (LOCCTR): a variable used to help in the assignment of addresses, updated as LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives

Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
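The division of work between the two passes can be sketched in Python. This is a minimal sketch under stated assumptions, not the assembler of the flowcharts: the four opcodes and their values are illustrative, every instruction and WORD is assumed to occupy 3 bytes, and each source statement is given as a (label, opcode, operand) tuple.

```python
# Illustrative operation code table: mnemonic -> one-byte opcode (assumed values).
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C, "J": 0x3C}

def pass1(statements, start=0):
    """Assign an address to every statement and build SYMTAB."""
    symtab, intermediate, locctr = {}, [], start
    for label, op, operand in statements:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate label: {label}")
            symtab[label] = locctr
        intermediate.append((locctr, op, operand))
        if op in OPTAB or op == "WORD":
            locctr += 3                      # LOCCTR <- LOCCTR + length
        elif op == "RESW":
            locctr += 3 * int(operand)       # reserve words, no code
        else:
            raise ValueError(f"invalid operation: {op}")
    return symtab, intermediate

def pass2(symtab, intermediate):
    """Translate each statement now that every label has a value."""
    object_code = []
    for address, op, operand in intermediate:
        if op in OPTAB:
            object_code.append((address, f"{OPTAB[op]:02X}{symtab[operand]:04X}"))
        elif op == "WORD":
            object_code.append((address, f"{int(operand):06X}"))
    return object_code

program = [
    ("", "LDA", "FIVE"),        # forward reference, resolved in pass 2
    ("", "ADD", "FIVE"),
    ("", "STA", "RESULT"),
    ("FIVE", "WORD", "5"),
    ("RESULT", "RESW", "1"),
]
symtab, intermediate = pass1(program, start=0x1000)
object_code = pass2(symtab, intermediate)
```

Because SYMTAB is complete before pass2 runs, the forward reference to FIVE in the first statement needs no special handling.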
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows.

A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references, and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.

A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
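One common way for a one-pass assembler to cope with future references is to remember which instructions are waiting for a label and patch them when the label is finally defined. The Python sketch below illustrates that idea only; it is an assumed technique (backpatching), not necessarily the scheme of the flowcharts in these notes. The opcode values are illustrative, every statement is assumed to be a 3-byte instruction with one symbolic operand, and statements are (label, opcode, operand) tuples.

```python
# Illustrative operation code table (assumed values).
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C, "J": 0x3C}

def one_pass(statements, start=0):
    """Assemble in a single scan, patching forward references later."""
    symtab = {}    # label -> address, filled in as labels are scanned
    pending = {}   # undefined label -> indexes of instructions waiting for it
    code = []      # [address, opcode, operand address or None]
    locctr = start
    for label, op, operand in statements:
        if label:
            symtab[label] = locctr
            for index in pending.pop(label, []):
                code[index][2] = locctr          # fill in the earlier holes
        if operand in symtab:
            target = symtab[operand]
        else:
            target = None                        # forward reference: leave a hole
            pending.setdefault(operand, []).append(len(code))
        code.append([locctr, OPTAB[op], target])
        locctr += 3
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return [f"{opcode:02X}{target:04X}" for _, opcode, target in code]
```

For example, one_pass([("", "J", "END"), ("LOOP", "LDA", "LOOP"), ("END", "STA", "LOOP")]) leaves a hole in the first instruction until END is defined at address 6.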
Assembler Design
Machine-dependent assembler features: instruction formats and addressing modes; program relocation.

Machine-independent assembler features: literals, symbol-defining statements, expressions, program blocks, control sections, and program linking.
Object Program
Header record
  Col 1       H
  Col 2-7     Program name
  Col 8-13    Starting address (hex)
  Col 14-19   Length of object program in bytes (hex)
Text record
  Col 1       T
  Col 2-7     Starting address in this record (hex)
  Col 8-9     Length of object code in this record in bytes (hex)
  Col 10-69   Object code ((69 - 10 + 1) / 6 = 10 instructions)
End record
  Col 1       E
  Col 2-7     Address of first executable instruction (hex) (END program_name)
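The column layout above maps directly onto fixed-width string formatting. The following Python sketch shows one way to produce the three record types; the function names are hypothetical, and it assumes every instruction's object code is 6 hex digits so that a 60-digit text record never splits an instruction.

```python
def header_record(name, start, length):
    """H, program name (cols 2-7), start address (8-13), length (14-19)."""
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_records(start, object_code):
    """Split object code into T records of at most 60 hex digits (30 bytes)."""
    hex_digits = "".join(object_code)
    records = []
    for offset in range(0, len(hex_digits), 60):
        body = hex_digits[offset:offset + 60]
        address = start + offset // 2            # two hex digits per byte
        records.append(f"T{address:06X}{len(body) // 2:02X}{body}")
    return records

def end_record(first_executable):
    """E followed by the address of the first executable instruction."""
    return f"E{first_executable:06X}"
```

A program named COPY that starts at 1000 (hex) and is 107A (hex) bytes long gets the header HCOPY  00100000107A, with the name padded to six columns.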
Absolute Program
The program must be loaded at the specified (absolute) address.

Relocatable Program

The actual starting address is not known until load time. The program can be loaded at different addresses in memory.

How to solve the relocation problem:
a. Modification record
b. Base-relative addressing
c. Program-counter-relative addressing

Modification record
  Col 1     M
  Col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
  Col 8-9   Length of the address field to be modified, in half-bytes
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro-processing assembler substitutes the entire block.

A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we present one more example to highlight the salient aspects of a macro processor. The example is written in a style very similar to the assembly language of Intel's 8-bit microprocessors.

Example:
MACRO
INCRMT  &A, &B
LOAD    &A          (macro definition)
ADD     &B
STORE   &A
ENDM

INCRMT  X, Y        (macro call)

LOAD    X           (macro expansion)
ADD     Y
STORE   X

(Fig: Source code (with macro) -> Macro processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit names a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. The appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. The call

INCRMT X, Y

leads to the insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place. All macro calls in a program are expanded in this fashion.
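The definition and expansion of INCRMT described above can be sketched in Python. This is a simplified illustration rather than a full macro processor: the function name is hypothetical, each statement is a (mnemonic, operands) pair, model statements are assumed to carry a single operand, and nested definitions or calls are not handled.

```python
def expand_macros(source):
    """Return the program with every macro call replaced by its body."""
    macros = {}      # macro name -> (formal parameters, model statements)
    output = []
    i = 0
    while i < len(source):
        mnemonic, operands = source[i]
        if mnemonic == "MACRO":
            name, formals = source[i + 1]        # the prototype statement
            body = []
            i += 2
            while source[i][0] != "ENDM":        # collect the model statements
                body.append(source[i])
                i += 1
            macros[name] = (formals.split(","), body)
        elif mnemonic in macros:                 # a macro call
            formals, body = macros[mnemonic]
            binding = dict(zip(formals, operands.split(",")))
            for op, operand in body:
                output.append((op, binding.get(operand, operand)))
        else:
            output.append((mnemonic, operands))  # ordinary statement
        i += 1
    return output

source = [
    ("MACRO", ""),
    ("INCRMT", "&A,&B"),
    ("LOAD", "&A"),
    ("ADD", "&B"),
    ("STORE", "&A"),
    ("ENDM", ""),
    ("INCRMT", "X,Y"),
]
expanded = expand_macros(source)
```

The definition unit itself produces no output; only the call INCRMT X,Y contributes statements to expanded.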
Defining a Macro
Let us take another look at the macro definition unit:

MACRO
INCRMT  &A, &B
LOAD    &A
ADD     &B
STORE   &A
ENDM

The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.

The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions

-> Directives used when defining a macro

MACRO indicates the beginning of a macro definition; MEND (written ENDM in the example above) indicates the end of a macro definition.

-> Prototype for a macro

Each parameter begins with &:

name MACRO parameter list
  ...
MEND

-> Body: the statements that will be generated as the expansion of the macro

Macro Expansion

Each macro invocation statement is expanded into the statements that form the body of the macro.

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).

In the definition of the macro we speak of parameters; in the macro invocation we speak of arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro:

macro_name p1, p2, ...

Difference between macro call and procedure call

Macro call: the statements of the macro body are expanded each time the macro is invoked.

Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.

Macro parameters

(i) Positional parameter

The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.

The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, and so on. Consider the prototype and macro call statements once again:

INCRMT &A, &B       (prototype)

INCRMT X, Y         (macro call)

We see that X is paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:

LOAD X

ADD Y

STORE X
Example
STRG DATA1 DATA2 DATA3
GENER DIRECT3
(ii) Keyword parameter

Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter. Because each argument is named, the arguments may appear in any order.

Example:
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3

GENER TYPE=DIRECT, CHANNEL=3
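Binding keyword arguments to formal parameters amounts to a lookup by name instead of by position. The Python sketch below illustrates this; the function name is hypothetical, and it assumes the call operands are written as comma-separated NAME=value pairs.

```python
def bind_keyword_arguments(formals, call_operands):
    """Pair each NAME=value argument with the formal parameter it names."""
    binding = {}
    for item in call_operands.split(","):
        keyword, _, value = item.strip().partition("=")
        if keyword not in formals:
            raise ValueError(f"unknown keyword parameter: {keyword}")
        binding[keyword] = value
    return binding
```

For the GENER call above, bind_keyword_arguments(["TYPE", "CHANNEL"], "CHANNEL=3, TYPE=DIRECT") gives the same binding as the original argument order, since position no longer matters.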
Features of macro facility
(1) Macro instruction arguments

(2) Macro calls within macros (nested macro calls)

(3) Macro instructions defining macros (macros within macros)

(4) Conditional macro expansion

Lecture_no24 Design of 1-pass macro processor
Single pass
» Every macro must be defined before it is called
» A one-pass processor can alternate between macro definition and macro expansion
» Nested macro definitions may be allowed, but nested calls are not

Step 1

Scan all macro definitions one by one. For each macro defined:

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT

Step 2

Examine all statements in the assembly source program to detect macro calls. For each macro call:

i) locate the macro in the MNT

ii) obtain information from the MNT regarding the position of the macro definition in the MDT

iii) process the macro call statement to establish the correspondence between all formal parameters and their values (i.e. actual parameters)

iv) expand the macro call by following the procedure given in Step 3

Step 3

Process the statements in the macro definition as found in the MDT, in their expansion-time order, until the ENDM statement is encountered. The conditional assembly statements AIF and AGO enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine

After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.

The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called

One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE) and macro invocation (EXPAND)
What are the data structures used by the macro processor?

Data structures:
o MACRO DEFINITION TABLE (DEFTAB)
  - Holds the macro prototype and the statements that make up the macro body
  - References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of a macro invocation
  - Arguments are stored in ARGTAB according to their position
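How the three tables cooperate can be sketched in Python. This is a minimal sketch under stated assumptions: the function names define and expand echo the DEFINE and EXPAND sub-procedures but their signatures are hypothetical, statements are plain strings, and the positional notation is written ?1, ?2, ... (an assumed spelling).

```python
import re

DEFTAB = []    # prototype and body lines; parameters stored in positional notation
NAMTAB = {}    # macro name -> (index of first line, index of last line) in DEFTAB

def define(name, formals, body):
    """Enter a macro into DEFTAB and NAMTAB, rewriting &X as ?n."""
    first = len(DEFTAB)
    DEFTAB.append(f"{name} {','.join(formals)}")          # the prototype
    # Try longer parameter names first so &AB is not mistaken for &A.
    pattern = "|".join(re.escape(f) for f in sorted(formals, key=len, reverse=True))
    for line in body:
        if formals:
            line = re.sub(pattern, lambda m: f"?{formals.index(m.group(0)) + 1}", line)
        DEFTAB.append(line)
    NAMTAB[name] = (first, len(DEFTAB) - 1)

def expand(name, arguments):
    """Expand one call: load ARGTAB, then substitute ?n from it."""
    argtab = list(arguments)                              # arguments by position
    first, last = NAMTAB[name]
    return [
        re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1)) - 1], line)
        for line in DEFTAB[first + 1:last + 1]            # skip the prototype
    ]

define("INCRMT", ["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])
expansion = expand("INCRMT", ["X", "Y"])
```

Storing ?1 and ?2 in DEFTAB means expansion never has to search for parameter names; it only indexes ARGTAB.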
(Fig: Macro processor data structures. MACRO definitions and macro CALLs with NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm

» Pass 1: Recognize macro definitions

» Pass 2: Recognize macro calls

» Nested macro definitions are not allowed

Write an algorithm and flowchart for the macro processor and explain.

(Fig: Macro processor procedures PROCESSLINE, DEFINE and EXPAND)

Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.

Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.

Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation, and loading:
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is then ready for execution. This activity is called loading.

Types of Loader schemes

1) "Compile and go" loader: In this type of loader, the source is read line by line, the machine code for each instruction is obtained, and it is put directly into main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines assembler and loader activities, the combined program occupies a large block of memory.

2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).

Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program and convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader for which already-relocated object files are created beforehand; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.

As the name itself says, an absolute loader does nothing other than loading.

Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking. For that, the programmer needs to know the memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation

Program Linking

Linking

The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.

For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.

Relocation

Another function commonly performed by a loader is program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while printf() occupies the area from 100 to 150.

If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.

It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.

The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called the relocating loader.
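Adding a constant to each relative address can be sketched as follows. This Python sketch is an illustration under stated assumptions: the program is modelled as a list of integer words, address_fields is a hypothetical list of the indexes of words that hold addresses (the relocation information), and the function name is made up.

```python
def relocate(words, address_fields, translated_start, load_start):
    """Return a copy of the program with every address field adjusted."""
    constant = load_start - translated_start   # same constant for every field
    relocated = list(words)
    for index in address_fields:
        relocated[index] += constant           # only address fields change
    return relocated

# Program A translated for start address 100 but loaded at 400:
# words 0 and 2 hold addresses, word 1 is plain data.
loaded = relocate([120, 5, 180], [0, 2], translated_start=100, load_start=400)
```

The data word is left untouched, which is why the loader needs to be told which locations are address-dependent.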
Relocating Loader
The general class of relocating loaders was introduced to avoid possible reassembly of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.

The assembler's output, which is the input to a relocating loader, is the object program together with information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.

Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader
bullLinkage editor-
Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution
• Dynamic Linking
-> The linking function is postponed until execution time.
» A subroutine is loaded and linked to the rest of the program when it is first called. This is also known as dynamic loading or load on call.
-> Allows several executing programs to share one copy of a subroutine or library.
• Linking loader
A linking loader:
» Performs all linking and relocation operations
» Performs automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls do not require any linker help, since the entire path from the root is already loaded.
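The behaviour of the overlay manager described above can be sketched in Python. This is only an illustration under stated assumptions: the tree is the ROOT/A/D/B/C/E/F example from the text, and the class OverlayManager and its call method are hypothetical names, not part of any real linker.

```python
# Parent of each segment in the example overlay tree; ROOT has no parent.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Return the path from the root down to segment, e.g. ROOT, D, E."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]

class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]  # the root is loaded when the program starts

    def call(self, target):
        """Make the whole path to target resident; return the segments loaded."""
        wanted = path_to(target)
        if self.loaded[:len(wanted)] == wanted:
            return []  # upward call, or target already loaded: nothing to do
        keep = 0  # length of the common prefix of the two paths
        while (keep < min(len(self.loaded), len(wanted))
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        newly_loaded = wanted[keep:]  # these overwrite their siblings' memory
        self.loaded = wanted
        return newly_loaded

manager = OverlayManager()
print(manager.call("B"))     # ['A', 'B']  root calls a routine in B
print(manager.call("ROOT"))  # []          upward call
print(manager.call("E"))     # ['D', 'E']  D replaces A, E replaces B
```

Because siblings share memory, loading D and E in the last call implicitly evicts A and B.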
(Fig: Overlay tree. ROOT is the top node; A and D are its children; B and C are the children of A; E and F are the children of D.)
The essential difference between a linkage editor and a linking loader
(Figure: a linking loader reads the object program(s) and library and places the linked program directly in memory; a linkage editor reads the object program(s) and library and produces a linked program, which a relocating loader later places in memory.)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. When placing the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution has to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at how our three loaders (absolute loader, relocating loader, and direct linking loader, respectively) handle the above four functions:

Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error unless B is declared to be an external symbol.
Design of Direct linking loader
pass1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation and Linkage Directory)
7. END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Directory (ESD): stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD IDs used on ESD and RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
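The two passes can be sketched in Python as follows. This is a simplified illustration, not the card-based algorithm itself: each object module is assumed to be a dictionary with hypothetical fields name, text (the words to load), defs (external symbols defined, as offsets within the segment) and rld (pairs of offset and symbol whose address must be added to the word at that offset).

```python
def pass1(modules, ipla):
    """Assign a load address to each segment and record external symbols."""
    symbols = {}
    pla = ipla  # Program Load Address counter, starts at the IPLA
    for mod in modules:
        symbols[mod["name"]] = pla           # the segment name is a symbol too
        for sym, offset in mod["defs"].items():
            symbols[sym] = pla + offset
        pla += len(mod["text"])              # next segment follows this one
    return symbols

def pass2(modules, symbols):
    """Load the text and patch every address constant listed in the RLD."""
    memory = {}
    for mod in modules:
        base = symbols[mod["name"]]
        for offset, word in enumerate(mod["text"]):
            memory[base + offset] = word
        for offset, sym in mod["rld"]:
            memory[base + offset] += symbols[sym]  # relocation and linking
    return memory

# Hypothetical object modules for illustration.
modules = [
    {"name": "MAIN", "text": [0, 5, 0], "defs": {},
     "rld": [(0, "SUB"), (2, "MAIN")]},
    {"name": "SUB", "text": [7, 1], "defs": {"ENTRY2": 1},
     "rld": [(1, "SUB")]},
]

symbols = pass1(modules, ipla=100)
memory = pass2(modules, symbols)
print(symbols)  # {'MAIN': 100, 'SUB': 103, 'ENTRY2': 104}
print(memory)   # {100: 103, 101: 5, 102: 100, 103: 7, 104: 104}
```

Pass 1 must finish before pass 2 starts because MAIN refers to SUB, whose address is not known until all earlier segments have been allocated.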
Binary Symbolic Subroutine Loader: The assembler assembles each program and provides the loader with:
-> The object program plus relocation information
-> A prefix with information about all other programs it references (the transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
– allocation, relocation, and linking
• It outputs the text to a file rather than to memory
– this file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> The role lies not only in designing and implementing a language but also in teaching and supporting it.
-> Language designers balance:
   -> making computing convenient for people, with
   -> making efficient use of computing machines.
High-level language(HLL)
A high-level language is an advanced computer programming language that is not limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java, and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search.
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort.
• To understand the idea of hashing as a search technique.
• To introduce the map abstract data type.
• To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods:
• Linear search
Linear search works on the brute-force design principle and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table with the key.
• Binary search
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
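The three search methods can be compared on a small symbol table. The sketch below is in Python and makes a few assumptions for illustration: table entries are (name, address) pairs, the hash function is a simple sum of character codes modulo the number of slots, and collisions are handled by chaining (the notes do not prescribe a collision strategy).

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _address) in enumerate(table):
        if name == key:
            return index
    return -1

def binary_search(sorted_table, key):
    """Search a table ordered by name; return the index of key or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1   # continue in the left sub-array
        else:
            low = mid + 1    # continue in the right sub-array
    return -1

class HashTable:
    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]  # slots are numbered from 0

    def _slot(self, key):
        # Hypothetical hash function: sum of character codes mod table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (existing, _value) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None

# Symbol table ordered by name, so that binary search applies.
table = [("ALPHA", 100), ("BETA", 104), ("LOOP", 112), ("START", 96)]
print(linear_search(table, "LOOP"))   # 2
print(binary_search(table, "LOOP"))   # 2

hashed = HashTable()
for name, address in table:
    hashed.put(name, address)
print(hashed.get("LOOP"))             # 112
```

Linear search needs no ordering, binary search needs an ordered table, and the hash table trades extra storage for lookups that do not depend on table order.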
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is a way of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*, where T* denotes the set of all finite strings over T.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbols: Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar, and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
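The definition of a single-step derivation can be checked mechanically. The Python sketch below assumes strings are tuples of symbols and productions are (α, β) pairs of tuples with a non-empty α; the small grammar S -> a S b | a b is a hypothetical example, not one from these notes.

```python
def immediate_derivations(mu, productions):
    """Return every gamma such that mu derives gamma in a single step."""
    results = []
    for alpha, beta in productions:
        width = len(alpha)
        for start in range(len(mu) - width + 1):
            if mu[start:start + width] == alpha:
                sigma = mu[:start]            # arbitrary string on the left
                tau = mu[start + width:]      # arbitrary string on the right
                results.append(sigma + beta + tau)
    return results

# Hypothetical grammar: S -> a S b | a b
productions = [
    (("S",), ("a", "S", "b")),
    (("S",), ("a", "b")),
]

for gamma in immediate_derivations(("a", "S", "b"), productions):
    print(" ".join(gamma))
# a a S b b
# a a b b
```

Repeating this step from the start symbol enumerates sentential forms; the forms that contain only terminals are the sentences of the language.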
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
bullChomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only k·n squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
Formal Grammar-
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. It would also be useful if a canonic system could be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators convert texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the source program is built.
In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable (roughly 37 components instead of about 210 compilers).
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3 ;
x3 =y+ 3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
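A scanner of this kind can be sketched in Python with regular expressions, which is consistent with the remark that no recursion is needed. The sketch assumes tokens are (token-name, attribute-value) pairs, that identifier attributes are 1-based symbol-table indices, and that numbers simply carry their text as the attribute; TOKEN_SPEC and scan are illustrative names.

```python
import re

# One named pattern per token class; blanks are matched so they can be dropped.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+*;]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Return (tokens, symbol_table) for the given source text."""
    symbol_table = []
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue                         # non-significant blanks
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))    # second component is ignored
    return tokens, symbol_table

tokens, symbol_table = scan("x3 =y+ 3 ;")
print(tokens)
# [('id', 1), ('=', None), ('id', 2), ('+', None), ('number', '3'), (';', None)]
print(symbol_table)  # ['x3', 'y']
```

Scanning "x 3 = y + 3;" instead yields an identifier x followed by a number 3, which is why that spelling is not equivalent to the others.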
Syntax Analysis (or Parsing)
Parsing involves a further grouping, in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
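A small parser for these two rules can be sketched in Python. Two choices here are assumptions, not something the notes specify: the syntax tree is represented as nested tuples (operator, left, right), and the left-recursive rule expr → expr + expr is handled with a loop that makes + left-associative, a common way to avoid infinite recursion in a hand-written parser.

```python
import re

def tokenize(text):
    """Split text into identifiers, numbers, and the symbols = + ; (blanks dropped).
    Characters outside these classes are silently ignored in this sketch."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", text)

def parse_assignment(tokens):
    """Parse asst-stmt -> id = expr ; and return ('=', id, expr_tree)."""
    pos = 0

    def expect(symbol):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != symbol:
            raise SyntaxError(f"expected {symbol!r}")
        pos += 1

    def operand():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in ("=", "+", ";"):
            raise SyntaxError("expected an identifier or a number")
        token = tokens[pos]
        pos += 1
        return token

    def expr():
        nonlocal pos
        tree = operand()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            tree = ("+", tree, operand())  # operator is the interior node
        return tree

    target = operand()
    if target[0].isdigit():
        raise SyntaxError("assignment target must be an identifier")
    expect("=")
    tree = ("=", target, expr())
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return tree

print(parse_assignment(tokenize("x3 = y + 3;")))
# ('=', 'x3', ('+', 'y', '3'))
```

The returned tuple has the operators = and + as interior nodes and the operands as their children.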
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal  temp1  3      --
add        temp2  id2    temp1
realtoint  temp3  temp2  --
assign     id1    temp3  --
Each "quad" has the form
operation  target  source1  source2
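Walking the semantic tree to produce quads can be sketched in Python. The tree shape used here, nested tuples with the conversion operators already inserted, is an assumption made for illustration, as is the function name generate_quads; quads with a missing source are simply shorter tuples.

```python
def generate_quads(tree):
    """Walk an assignment tree and return a list of (operation, target, sources...) quads."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):
            return node                      # a leaf: identifier or constant
        op, *children = node
        sources = [walk(child) for child in children]  # operands first
        target = new_temp()
        quads.append(("add" if op == "+" else op, target, *sources))
        return target

    op, target, rhs = tree
    if op != "=":
        raise ValueError("expected an assignment at the root")
    quads.append(("assign", target, walk(rhs)))
    return quads

# Semantic tree for the running example, with conversions inserted.
tree = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))

for quad in generate_quads(tree):
    print(*quad)
# inttoreal temp1 3
# add temp2 id2 temp1
# realtoint temp3 temp2
# assign id1 temp3
```

Each interior node of the tree yields exactly one quad and one fresh temporary, which matches the unlimited-register assumption of the idealized machine.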
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
   add        temp2  id2    3.0
2. The last two quads can be combined into
   realtoint  id1    temp2
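Both optimizations can be sketched as a single pass over the quads in Python. This is an illustration only: the function name optimize is hypothetical, and the second rewrite assumes that each temporary is used exactly once, which holds for this straight-line example but would need a liveness check in general.

```python
def optimize(quads):
    """Fold inttoreal of a constant and merge 'assign x tempN' into the quad that set tempN."""
    constants = {}   # temporary name -> folded constant text
    result = []
    for op, target, *sources in quads:
        sources = [constants.get(src, src) for src in sources]
        if op == "inttoreal" and sources[0].isdigit():
            constants[target] = sources[0] + ".0"   # conversion done at compile time
            continue
        if (op == "assign" and result and sources[0].startswith("temp")
                and result[-1][1] == sources[0]):
            previous = result.pop()
            # Let the previous quad write directly into the assignment target.
            result.append((previous[0], target, *previous[2:]))
            continue
        result.append((op, target, *sources))
    return result

quads = [
    ("inttoreal", "temp1", "3"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2"),
    ("assign", "id1", "temp3"),
]

for quad in optimize(quads):
    print(*quad)
# add temp2 id2 3.0
# realtoint id1 temp2
```

The four quads shrink to two, the same result as the two hand optimizations listed above.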
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2:
LD    R1  id2
ADDF  R1  R1  3.0    (add float)
RTOI  R2  R1         (real to int)
ST    id1 R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (Syntax Tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST)
• A pass can correspond to a "phase", but it does not have to
• Sometimes a single pass corresponds to several phases that are interleaved in time
• What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
(Fig: the Compiler Driver calls the Syntactic Analyzer, which in turn calls the Contextual Analyzer and the Code Generator)
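(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
(Fig: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer, and the Code Generator in turn. The Source Text is the input of the Syntactic Analyzer, whose output is the AST; the AST is the input of the Contextual Analyzer, whose output is the Decorated AST; the Decorated AST is the input of the Code Generator, whose output is the object code)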
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit, because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
MBR (Memory Buffer Register) = MBR contains a word to be stored in memory, or is used to receive a word from memory
MAR (Memory Address Register) = MAR specifies the address in memory of the word to be written from or read into the MBR
IR (Instruction Register) = IR contains the 8-bit op-code of the instruction being executed
IBR (Instruction Buffer Register) = IBR is employed to hold temporarily the right-hand instruction from a word in memory
PC (Program Counter) = PC contains the address of the next instruction-pair to be fetched from memory
AC and MQ (Accumulator and Multiplier Quotient) = AC and MQ are employed to hold temporarily operands and results of ALU operations
Fetch: The process of retrieving a value from a memory location in order to put it in a register or pass it to a processing unit.
I/O controller: A special-purpose processor that mediates between the main computer processor and a specific input or output device. The controller records the request made by the main processor, leaving it free to continue other tasks, and interrupts the main processor when the input or output task is complete.
Input/Output: The subsystem responsible for managing communications with external devices, whether external memory storage, other processors, or devices for human interaction.
Instruction set: The set of codes that describe all the legal operations a particular computer can execute.
Op code: An unsigned binary integer that is assigned to a specific task the hardware can perform.
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture.
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location.
Read-only memory (ROM): A memory that is usually very similar to RAM, except that it may only be accessed, not updated.
Store: The process of writing a value to a memory cell.
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system Information flow in microflowchart for ADD/SUB/MULT
Memory operations
FETCH(addr)
1. Put addr into MAR
2. Tell memory unit to "load"
3. Memory copies data into MDR
STORE(addr, new-value)
1. Put addr into MAR
2. Put new-value into MDR
3. Tell memory unit to "store"
4. Memory stores data from MDR into memory cell
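The two memory operations above can be mimicked with a small Python class. This is only a sketch: the class name MemoryUnit and its attribute names are hypothetical, and the "load" and "store" signals are modelled as plain assignments.

```python
class MemoryUnit:
    def __init__(self, size):
        self.cells = [0] * size   # the memory cells
        self.mar = 0              # memory address register
        self.mdr = 0              # memory data register

    def fetch(self, addr):
        self.mar = addr                      # 1. put addr into MAR
        self.mdr = self.cells[self.mar]      # 2-3. "load": memory copies data into MDR
        return self.mdr

    def store(self, addr, new_value):
        self.mar = addr                      # 1. put addr into MAR
        self.mdr = new_value                 # 2. put new-value into MDR
        self.cells[self.mar] = self.mdr      # 3-4. "store": MDR is written to the cell

memory = MemoryUnit(16)
memory.store(4, 99)
print(memory.fetch(4))
```

Note that every transfer goes through MAR and MDR; the processor never touches a memory cell directly.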
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R1..R7) to evaluate the following expression. Ignore the remainder part of the division process.
( (a + b) / (a + d) ) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+ --------> ADD Ri, Rj   ( Rj <- Ri + Rj )
* --------> MULT Ri, Rj  ( Rj <- Ri * Rj )
/ --------> DIV Ri, Rj   ( Rj <- Rj / Ri )
a = M[A] ------> content of memory location at address A; same for B, D, E
move A, D0     ; D0 = a
move B, D4     ; D4 = b
add D0, D4     ; D4 = a + b
move D, D1     ; D1 = d
add D0, D1     ; D1 = a + d
div D1, D4     ; D4 = (a + b) / (a + d)
move E, D2     ; D2 = e
mult D0, D2    ; D2 = (a * e)
mult D4, D2    ; D2 = ( (a + b) / (a + d) ) * (a * e)
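To check the register usage of the solution, the program can be traced with a tiny interpreter. This is a sketch under assumptions: the function run, the tuple encoding of instructions, and the sample values of a, b, d, e are all made up for illustration, and division is taken as integer division since the remainder is ignored.

```python
def run(program, memory):
    regs = {}
    for op, src, dst in program:
        if op == "move":
            regs[dst] = memory[src]               # every move in this program is memory -> register
        elif op == "add":
            regs[dst] = regs[src] + regs[dst]     # Rj <- Ri + Rj
        elif op == "mult":
            regs[dst] = regs[src] * regs[dst]     # Rj <- Ri * Rj
        elif op == "div":
            regs[dst] = regs[dst] // regs[src]    # Rj <- Rj / Ri, remainder ignored
    return regs

program = [
    ("move", "A", "D0"), ("move", "B", "D4"), ("add", "D0", "D4"),
    ("move", "D", "D1"), ("add", "D0", "D1"), ("div", "D1", "D4"),
    ("move", "E", "D2"), ("mult", "D0", "D2"), ("mult", "D4", "D2"),
]
memory = {"A": 6, "B": 14, "D": 4, "E": 3}        # sample values, chosen arbitrarily
regs = run(program, memory)
print(regs["D2"])
```

With these sample values, (6 + 14) / (6 + 4) = 2 and 6 * 3 = 18, so D2 ends up holding 36.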
Problem 2)
a) Provide micro-instructions, in the form of a detailed flowchart, for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0              move contents of memory location X to register D0
ii) move D0, X             move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1)  move contents from memory to memory
iv) bgtw label             branch if greater than to location "label"
Answer 2)
For each of the above, for part b), draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR <-- MBR
| decode |
PC <-- PC + 2       PC points to source address
MAR <-- PC
MBR <-- M[MAR]      address of X in MBR
MAR <-- MBR         move X to MAR
MBR <-- M[MAR]      read contents of location X
D0 <-- MBR
PC <-- PC + 2       PC points to next instruction
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR <-- MBR
| decode |
PC <-- PC + 2       PC points to destination address
MAR <-- PC
MBR <-- M[MAR]      read destination address X
MAR <-- MBR         MAR has X now
MBR <-- D0          source data in MBR
M[MAR] <-- MBR      data in effective address X (which is in MAR)
PC <-- PC + 2       PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source Effective address = contents of a3 + $16
temp <-- M[a3 + $16]
Destination Effective address = contents of a1 + $19
M[a1 + $19] <-- temp
($16 and $19 are the hexadecimal forms of the displacements 22 and 25.)
Micro-steps are
MAR <-- PC
MBR <-- M[MAR]        fetch
IR <-- MBR
| decode |
PC <-- PC + 2         PC points to source displacement
MAR <-- PC
MBR <-- M[MAR]        source displacement loaded
MAR <-- [A3 + MBR]    source effective address calculated
MBR <-- M[MAR]        source data in MBR
WR <-- MBR            saved temporarily since MBR is to be reused
PC <-- PC + 2         PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR]        destination displacement loaded
MAR <-- [A1 + MBR]    destination effective address calculated
MBR <-- WR            source data back to MBR
M[MAR] <-- MBR        source data moved to destination
PC <-- PC + 2         PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a)
MAR <-- PC
MBR <-- M[MAR]        fetch
IR <-- MBR
| interpret / decode |
If the condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR]        get displacement
PC <-- PC + MBR       execute
If the condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit Register Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword or quadword, low numbered bytes are more significant than high numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. In these notes, the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose register, floating-point register, control & status register, data register, address register).
A statement consists of four fields of information:
1) the name field: an optional field; it must begin with a character from the alphabet and can consist of up to 8 characters
2) opcode: a mandatory field; it must be followed by a blank
3) operands: usually represent data that could be in main storage, in registers, or part of the instruction itself
4) comments: used to clarify instructions
(iii) Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E, and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
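The packed decimal layout (decimal digits two per byte, with a 4-bit sign in the last nibble) can be illustrated with a short Python sketch. The function names pack_decimal and unpack_decimal are hypothetical; the sketch picks C for positive and D for negative out of the valid sign codes, and it does not enforce the 1-16 byte length limit.

```python
def pack_decimal(value: int) -> bytes:
    digits = str(abs(value))
    if len(digits) % 2 == 0:          # an odd digit count plus the sign nibble fills whole bytes
        digits = "0" + digits
    sign = 0xC if value >= 0 else 0xD
    nibbles = [int(d) for d in digits] + [sign]
    # combine the nibbles pairwise into bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

def unpack_decimal(data: bytes) -> int:
    nibbles = [n for b in data for n in (b >> 4, b & 0xF)]
    *digits, sign = nibbles           # the last nibble is the sign
    magnitude = int("".join(str(d) for d in digits))
    return -magnitude if sign in (0xB, 0xD) else magnitude

print(pack_decimal(123).hex())    # 123c
print(pack_decimal(-45).hex())    # 045d
```

Like the PACK and UNPK instructions described above, this sketch does not test the digit and sign nibbles for validity.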
Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent, biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
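As a worked illustration of the fullword format, the sketch below decodes a 32-bit value, assuming bit 0 is the sign, the next seven bits are the characteristic (exponent biased by 64), and bits 8-31 are a fraction interpreted as 0.F in base 16. The function name decode_hex_float is hypothetical.

```python
def decode_hex_float(word: int) -> float:
    sign = -1.0 if (word >> 31) & 1 else 1.0          # bit 0 (leftmost bit)
    characteristic = (word >> 24) & 0x7F              # seven bits after the sign
    fraction = (word & 0xFFFFFF) / float(1 << 24)     # bits 8-31 as a fraction below 1
    return sign * fraction * 16.0 ** (characteristic - 64)

print(decode_hex_float(0x41100000))   # 1.0
print(decode_hex_float(0xC1200000))   # -2.0
print(decode_hex_float(0x42640000))   # 100.0
```

For 0x41100000 the characteristic is 65, so the exponent is 1, and the fraction is 1/16; the value is therefore 16 * (1/16) = 1.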
Lecture_no9 Various instruction format of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R indicates that an operand resides in a register. In this type, both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second a storage location. If there are 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, this format specifies the storage address in 3 pieces (index, base and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I indicates that the operand consists of immediate data (meaning that the data is in the instruction itself). In this type, the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S indicates that there is data in storage. Here both operands have data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
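The RX layout can be made concrete by splitting a four-byte instruction into its fields. This is a sketch with assumptions beyond the text above: the base/displacement halfword is taken as a 4-bit base register number followed by a 12-bit displacement, a register number of 0 used as base or index is taken to contribute zero, and the function names are hypothetical.

```python
def decode_rx(instr: bytes):
    opcode = instr[0]                        # byte 0: opcode
    r1 = instr[1] >> 4                       # byte 1, bits 0-3: register operand
    x2 = instr[1] & 0xF                      # byte 1, bits 4-7: index register
    b2 = instr[2] >> 4                       # bytes 2-3: base register ...
    d2 = ((instr[2] & 0xF) << 8) | instr[3]  # ... and 12-bit displacement
    return opcode, r1, x2, b2, d2

def effective_address(x2, b2, d2, regs):
    index = regs[x2] if x2 else 0            # register 0 means "no index"
    base = regs[b2] if b2 else 0             # register 0 means "no base"
    return (base + index + d2) & 0xFFFFFF    # 24-bit storage address

fields = decode_rx(bytes.fromhex("5A10C004"))
regs = [0] * 16
regs[12] = 0x2000                            # sample base register contents
print(fields, hex(effective_address(fields[2], fields[3], fields[4], regs)))
```

The sample bytes decode to opcode 5A with R1 = 1, X2 = 0, B2 = 12 and D2 = 4, giving the effective address 0x2004 for the assumed register contents.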
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4. Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D), 3
5. Direct mode: Absolute address of operand is explicitly specified in the instruction. Example: ADD 1234, S
6. Indirect mode: Absolute address of operand is the content of a memory address. Example: ADD [1234], 10
7. Relative mode: Content of PC + OpAddr. Example: ADD D, $S
8. Indexed mode: Content of an index register + OpAddr. Example: ADD D, 500(S)
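The difference between the modes is easiest to see by computing the operand each one yields from the same machine state. The sketch below is illustrative only: the function operand, the mode names as strings, and the sample register and memory contents are assumptions, and implied mode is left out because its operand is fixed by the opcode.

```python
def operand(mode, field, regs, mem, pc=0):
    if mode == "immediate":
        return field                       # the constant itself
    if mode == "register":
        return regs[field]                 # content of the named register
    if mode == "register_indirect":
        return mem[regs[field]]            # register holds the address
    if mode == "direct":
        return mem[field]                  # field is the absolute address
    if mode == "indirect":
        return mem[mem[field]]             # memory cell holds the address
    if mode == "relative":
        return mem[pc + field]             # address is PC + OpAddr
    if mode == "indexed":
        addr, index_reg = field
        return mem[regs[index_reg] + addr] # address is index register + OpAddr
    raise ValueError(f"unknown mode: {mode}")

regs = {"S": 2, "D": 500}
mem = {2: 7, 500: 2, 1234: 500, 502: 9, 110: 11}
print(operand("direct", 1234, regs, mem))          # 500
print(operand("indirect", 1234, regs, mem))        # 2
print(operand("indexed", (500, "S"), regs, mem))   # 9
```

Direct and indirect mode use the same address field 1234 but differ by one extra memory access.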
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
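Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
      MOVE R1, x
      MOVE R2, y
      MOVE R3, 0
      SUB R1, R2
      BN Less
      BZ Zero
      MOVE R3, +1
      HLT
Less  MOVE R3, -1
      HLT
Zero  MOVE R3, 0
      HLT
Each case ends with its own HLT so that execution does not fall through into the next case and overwrite the result in R3.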
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 - R2
1010001000 00 01 10
Halt the program
1111111111111111
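The instruction formats above are simple enough to decode by matching the fixed bit prefix and slicing out the register and memory fields. The following Python sketch runs the five-instruction example program; the function name run and the sample value placed in memory location 5 are assumptions for illustration.

```python
def run(program, mem):
    regs = [0, 0, 0, 0]                                 # 4 registers
    for word in program:
        if word == "1111111111111111":                  # HALT
            break
        if word.startswith("100000010"):                # LOAD  RR MMMMM
            regs[int(word[9:11], 2)] = mem[int(word[11:], 2)]
        elif word.startswith("100000100"):              # STORE RR MMMMM
            mem[int(word[11:], 2)] = regs[int(word[9:11], 2)]
        elif word.startswith("100100010000"):           # MOVE  RR RR
            regs[int(word[12:14], 2)] = regs[int(word[14:], 2)]
        elif word.startswith("1010000100"):             # ADD   RR RR RR
            d, a, b = (int(word[i:i + 2], 2) for i in (10, 12, 14))
            regs[d] = regs[a] + regs[b]
        elif word.startswith("1010001000"):             # SUBTRACT RR RR RR
            d, a, b = (int(word[i:i + 2], 2) for i in (10, 12, 14))
            regs[d] = regs[a] - regs[b]
        else:
            raise ValueError(f"unknown instruction {word}")
    return regs

program = [
    "1000000100100101",   # LOAD R1 5
    "1000000101000101",   # LOAD R2 5
    "1010000100000110",   # ADD R0 R1 R2
    "1000001000000110",   # store R0 into memory location 6
    "1111111111111111",   # HALT
]
mem = [0] * 32            # 32 memory locations
mem[5] = 21               # sample input value
regs = run(program, mem)
print(mem[6])
```

The program loads location 5 twice, adds the two copies, and stores the sum in location 6, so it doubles the input.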
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example, the assembler would always replace START with 10 (base 16). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at the address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM, MASM, etc.
   LABEL   OPCODE  OPERANDS      COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6.
03 ; Before execution, an integer must be stored in NUMBER.
04 ;
05         .ORIG   x3050
06         LEA     R2, NUMBER
07         LDW     R2, R2, #0
08         LEA     R1, SIX
09         LDW     R1, R1, #0
0A         AND     R3, R3, #0    ; Clear R3. It will
0B                               ; contain the product.
0C ; The inner loop
0D ;
0E AGAIN   ADD     R3, R3, R2
0F         ADD     R1, R1, #-1   ; R1 keeps track of
10         BRp     AGAIN         ; the iterations
11 ;
12         HALT
13 ;
14 NUMBER  .BLKW   1
15 SIX     .FILL   x0006
16 ;
17         .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode. The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
LEA R2, NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh word of the resultant program (after the nine instructions and NUMBER) is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get by speaking English, but you will never feel at home, and life remains complicated. You can get by this way, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START - Specify name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command
An assembly language statement contains the following fields:
-> Label Field - an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode Field - this field contains a mnemonic, which stands for an operation code or a pseudo-op. A machine code instruction may require additional information, i.e. operands.
-> Operand Field - specifies either the address or the data that the opcode requires.
-> Comment Field - allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are 3 types of statements:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages amp disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structures and define the format of each data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in
- Single pass
- Two pass
(Fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code, and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless restrictions such as those described above are imposed
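One well-known way for a single pass to cope with forward references is to leave the address field empty and patch it once the label is finally defined (backpatching). The sketch below assumes that technique; the tuple-based statement format, the one-word-per-statement layout and the opcode numbers are hypothetical and chosen only to keep the example short.

```python
def assemble_one_pass(source, optab, start=0):
    """One-pass assembly with backpatching of forward references.

    source: list of (label, mnemonic, operand) tuples (assumed format).
    Each statement occupies one word, stored as [opcode, address].
    """
    symtab = {}    # label -> address, filled in as labels are met
    pending = {}   # undefined label -> indexes of words waiting for it
    memory = []
    for label, mnemonic, operand in source:
        locctr = start + len(memory)
        if label:
            symtab[label] = locctr
            for index in pending.pop(label, []):
                memory[index][1] = locctr  # patch earlier forward references
        if operand is None:
            address = 0
        elif operand in symtab:
            address = symtab[operand]
        else:
            address = None  # forward reference, filled in later
            pending.setdefault(operand, []).append(len(memory))
        memory.append([optab[mnemonic], address])
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return symtab, memory


OPTAB = {"JMP": 1, "LDA": 2, "HLT": 3}  # hypothetical opcodes
program = [(None, "JMP", "DONE"),       # DONE is not yet defined here
           ("LOOP", "LDA", "LOOP"),
           ("DONE", "HLT", None)]
symbols, words = assemble_one_pass(program, OPTAB, start=100)
```

The price of the single pass is the extra pending table and the need to keep the generated code where it can still be modified.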
(detailed pass1 assembler flowchart)
Internal data structures
- The Operation Code Table (OPTAB)
• OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents
- The Symbol Table (SYMTAB)
• SYMTAB is used to store values (addresses) assigned to labels
- Location Counter (LOCCTR)
• This is a variable that is used to help in the assignment of addresses
LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
- Assign addresses to all statements in the program
- Save the values assigned to all labels for use in Pass 2
- Perform some processing of assembler directives
Pass 2
- Assemble instructions
- Generate data values defined by BYTE, WORD
- Perform processing of assembler directives not done in Pass 1
- Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows. A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header
Col 1 H
Col 2~7 Program name
Col 8~13 Starting address (hex)
Col 14~19 Length of object program in bytes (hex)
Text
Col 1 T
Col 2~7 Starting address in this record (hex)
Col 8~9 Length of object code in this record in bytes (hex)
Col 10~69 Object code ((69-10+1)/6 = 10 instructions)
End
Col 1 E
Col 2~7 Address of first executable instruction (hex) (END program_name)
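The column layout above maps directly onto fixed-width string formatting. The following sketch builds the three record types; the function names and the sample program name, addresses and object codes are illustrative values, not taken from these notes.

```python
def header_record(name, start, length):
    """Col 1 'H', Col 2~7 name, Col 8~13 start (hex), Col 14~19 length (hex)."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    """Col 1 'T', Col 2~7 start (hex), Col 8~9 length in bytes (hex), then code."""
    body = "".join(object_codes)
    if len(body) > 60:  # Col 10~69 holds at most 60 hex digits (30 bytes)
        raise ValueError("object code does not fit in one text record")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_executable):
    """Col 1 'E', Col 2~7 address of first executable instruction (hex)."""
    return f"E{first_executable:06X}"


records = [
    header_record("COPY", 0x1000, 0x107A),
    text_record(0x1000, ["141033", "482039", "001036"]),
    end_record(0x1000),
]
```

Each object code here is 6 hex digits (3 bytes), so one text record can carry at most 10 such instructions, which matches the (69-10+1)/6 = 10 calculation above.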
Absolute Program
The program must be loaded at the specified (absolute) address
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory
How to solve
a) Modification record
b) Base relative
c) Program counter relative
Modification record
Col 1 M
Col 2-7 Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9 Length of the address field to be modified, in half-bytes
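A loader can act on a modification record by adding the actual load address to the named address field. The sketch below assumes the program image is held in a bytearray and that an odd half-byte count means the field starts in the low half of its first byte; the function name and the sample bytes are illustrative.

```python
def apply_modification(image, offset, half_bytes, load_address):
    """Add load_address to an address field of the loaded program image.

    offset:     starting location of the field, relative to program start
    half_bytes: length of the field in half-bytes (hex digits)
    """
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(image[offset:offset + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1          # selects the address digits
    value = ((field & mask) + load_address) & mask
    field = (field & ~mask) | value             # keep the digits outside the field
    image[offset:offset + nbytes] = field.to_bytes(nbytes, "big")


# A 4-byte instruction whose last 5 hex digits hold a relative address 01036.
image = bytearray.fromhex("4B101036")
apply_modification(image, offset=1, half_bytes=5, load_address=0x5000)
relocated = image.hex().upper()
```

With a load address of 5000 (hex), the address digits 01036 become 06036 while the leading digit of the same byte is left untouched.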
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding of macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight the salient aspects of a macro processor. The example is very similar to the assembly language instructions of Intel's 8-bit microprocessors.
Example (macro program)
Macro definition
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call
INCRMT X,Y
Macro expansion
LOAD X
ADD Y
STORE X
(fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler or Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example above, the call
INCRMT X,Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
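The expansion of a single call can be sketched as a plain text substitution. The helper names below are hypothetical, and the sketch assumes that operands are separated by commas and that formal parameters start with &, as in the INCRMT example.

```python
def split_operands(text):
    """Split a comma-separated operand list into a list of names."""
    return [operand.strip() for operand in text.split(",")]


def expand_call(prototype, model_statements, call):
    """Replace a macro call by the model statements of its definition."""
    name, formals = prototype.split(maxsplit=1)
    call_name, actuals = call.split(maxsplit=1)
    if call_name != name:
        raise ValueError(f"{call_name} is not a call on {name}")
    # Pair formal and actual parameters by position.
    binding = dict(zip(split_operands(formals), split_operands(actuals)))
    expanded = []
    for statement in model_statements:
        # Replace longer names first so &AB is not mistaken for &A.
        for formal in sorted(binding, key=len, reverse=True):
            statement = statement.replace(formal, binding[formal])
        expanded.append(statement)
    return expanded


expansion = expand_call("INCRMT &A,&B",
                        ["LOAD &A", "ADD &B", "STORE &A"],
                        "INCRMT X,Y")
```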
Defining a Macro
Let us take another look at the macro definition unit
Macro definition
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call
INCRMT X,Y
Macro expansion
LOAD X
ADD Y
STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used during usage of Macro
MACRO indicates the beginning of a macro definition. MEND indicates the end of a macro definition
-> Prototype for Macro
Gives the macro name and the parameter list. Each argument starts with &
-> BODY
The statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)
In the definition of a macro: parameter
In the macro invocation: argument
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is invoked
Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A,&B (prototype)
INCRMT X,Y (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements
LOAD X
ADD Y
STORE X
Example
STRG DATA1 DATA2 DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
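With keyword parameters the order of the arguments in the call no longer matters, because each value names the parameter it belongs to. A minimal sketch of that binding step follows; the function name and the idea of passing default values in a dictionary are assumptions for illustration.

```python
def bind_keyword_arguments(defaults, call_operands):
    """Pair each KEYWORD=value argument with the parameter it names.

    defaults: dict of parameter name -> default value (assumed structure).
    """
    binding = dict(defaults)
    for item in call_operands.split(","):
        keyword, value = item.strip().split("=", 1)
        if keyword not in binding:
            raise ValueError(f"unknown keyword parameter {keyword}")
        binding[keyword] = value
    return binding


defaults = {"TYPE": "", "CHANNEL": ""}
first = bind_keyword_arguments(defaults, "TYPE=DIRECT, CHANNEL=3")
second = bind_keyword_arguments(defaults, "CHANNEL=3, TYPE=DIRECT")
```

Both calls give the same binding, which is the practical difference from positional parameters.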
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro Expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in MDT
Step 2
Examine all statements in the assembly source program to detect macro calls For each macro call
i) locate the macro in MNT
ii) obtain information from MNT regarding position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered The conditional assembly statement AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked
Subroutine - the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer
The differences between macro invocation and subroutine call
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
- Macro prototype
- Statements that make up the macro body
- References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
- Role: an index to DEFTAB
- Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
- Used during the expansion of a macro invocation
- Arguments are stored in ARGTAB according to their position
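How the three tables cooperate can be sketched with a small one-pass processor. This is a simplified illustration under stated assumptions: MACRO stands on its own line and is followed by the prototype (as in the INCRMT example), definitions end with ENDM, the positional notation is written ?1, ?2, ..., and nested definitions, nested calls and conditional expansion are not handled.

```python
def process(source):
    """One-pass macro processing with DEFTAB, NAMTAB and ARGTAB."""
    deftab = []   # prototype and body statements of every macro
    namtab = {}   # macro name -> (first, last) index into deftab
    output = []
    lines = iter(source)
    for line in lines:
        opcode, _, operands = line.strip().partition(" ")
        if opcode == "MACRO":                         # DEFINE
            prototype = next(lines).strip()
            name, _, params = prototype.partition(" ")
            formals = [p.strip() for p in params.split(",") if p.strip()]
            # Longest names first so &AB is not mistaken for &A.
            numbered = sorted(enumerate(formals, start=1),
                              key=lambda pair: len(pair[1]), reverse=True)
            first = len(deftab)
            deftab.append(prototype)
            for body_line in lines:
                if body_line.strip() == "ENDM":
                    break
                for position, formal in numbered:     # &A -> ?1, &B -> ?2
                    body_line = body_line.replace(formal, f"?{position}")
                deftab.append(body_line)
            namtab[name] = (first, len(deftab) - 1)
        elif opcode in namtab:                        # EXPAND
            argtab = [a.strip() for a in operands.split(",") if a.strip()]
            first, last = namtab[opcode]
            for model in deftab[first + 1:last + 1]:
                for position in range(len(argtab), 0, -1):
                    model = model.replace(f"?{position}", argtab[position - 1])
                output.append(model)
        else:
            output.append(line)                       # ordinary statement
    return output, deftab, namtab


source = ["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
          "INCRMT X,Y", "INCRMT P,Q", "END"]
expanded, deftab, namtab = process(source)
```

Because the body is stored in positional notation, expansion only needs to look up argument numbers in ARGTAB and never has to compare parameter names again.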
(fig Macro definition and macro call using NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm amp flowchart for the Macro processor and explain
(fig Macro processor flowchart: PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into the memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates the space for the program in the memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program, such as address constants, which must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into the memory. Thus the program becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads the assembled machine instructions into the memory
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a wastage of memory. As this scheme is a combination of assembler and loader activities, this combination program occupies a large block of memory
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler)
Advantages
• The program need not be retranslated each time it is run. This is because an object program gets generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in the memory. This type of loader is called absolute because no relocation information is needed
Absolute loader: As the name itself says, it does nothing other than loading
Advantages
• It is simple to implement
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know about memory management
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution those standard functions must reside in the main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in the storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
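Adding a constant to each address field can be sketched as follows. The representation is an assumption made for the example: the segment is a list of words, and the positions of the words that hold addresses are given as a set (in a real loader this information comes from the translator).

```python
def relocate(words, address_fields, translated_origin, load_origin):
    """Adjust every address-holding word by the relocation constant.

    address_fields: set of indexes of words that contain addresses.
    Words that hold plain data or opcodes are left unchanged.
    """
    constant = load_origin - translated_origin
    return [word + constant if index in address_fields else word
            for index, word in enumerate(words)]


# Segment translated for origin 200 but loaded at 500: words 0 and 2 hold
# addresses inside the segment, word 1 is an ordinary data value.
loaded = relocate([205, 7, 250], {0, 2}, translated_origin=200, load_origin=500)
```

Only the address fields change; the data word 7 must not be touched, which is why relocation needs to know where the addresses are.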
Relocating Loader
The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The output of a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
As we turn on the computer there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage to main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
Dynamic Linking
The linking function is performed at execution time
-> Postpone the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called (dynamic linking, dynamic loading or load on call)
-> Allow several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
(fig Overlay)
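The behaviour of the overlay manager on the example tree can be sketched as below. The class and function names are hypothetical, and the sketch only tracks which segments are resident; it does not model the actual copying of code into memory.

```python
# Parent of each segment in the example overlay tree.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]  # currently resident path, root first

    def call(self, target):
        """Ensure the path to target is loaded; return the segments read in."""
        wanted = path_to(target)
        keep = 0
        while (keep < min(len(self.loaded), len(wanted))
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        newly_loaded = wanted[keep:]
        if newly_loaded:
            # Siblings share memory, so everything below the common part of
            # the old path is overwritten by the new segments.
            self.loaded = wanted
        return newly_loaded


manager = OverlayManager()
first = manager.call("B")    # root calls a routine in B
second = manager.call("A")   # upward call from B to A
third = manager.call("E")    # call into the other subtree
```

The upward call loads nothing, while the call into E replaces A and B with D and E because A and D share the same memory.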
The essential difference between a linkage editor and a linking loader
(fig Linking loader: Object program(s) and Library -> Linking loader -> Memory)
(fig Linkage editor: Object program(s) and Library -> Linkage editor -> Linked program -> Relocating loader -> Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in the memory there are two situations: either the address of the object code is absolute, in which case it can be directly placed at the specified location, or the address is relative. If the address is relative, then it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded in the main memory only at the time of execution
2 The system can be dynamically reconfigured
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions

Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
Pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
- Copy of the object program
- IPLA parameter (Initial Program Load Address): supplied by the OS or the programmer to specify the address at which to load the first segment
- PLA counter (Program Load Address): keeps track of each segment location
- External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
- Local External Symbol Array (LESA): establishes the correspondence between the ESD ID used in ESD and RLD cards
ESD (external symbol dictionary)
Data Structures
Algorithm
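As a rough illustration of the two passes listed above (not the full card-based algorithm), the sketch below models each segment as a Python dictionary. The field names "name", "entries", "text" and "rld", and the use of one memory word per text entry, are assumptions made for the example; ESD IDs and the LESA are left out.

```python
def pass1(segments, ipla):
    """Assign a load address to every segment and record external symbols."""
    pla = ipla                      # Program Load Address counter
    symbols = {}                    # external symbol -> assigned core address
    for segment in segments:
        segment["base"] = pla
        symbols[segment["name"]] = pla
        for name, offset in segment["entries"].items():
            symbols[name] = pla + offset
        pla += len(segment["text"])
    return symbols


def pass2(segments, symbols):
    """Load the text, then relocate and link the address constants."""
    memory = {}
    for segment in segments:
        base = segment["base"]
        for offset, word in enumerate(segment["text"]):
            memory[base + offset] = word
        # Each RLD entry names a word and the symbol whose address is added.
        for offset, symbol in segment["rld"]:
            memory[base + offset] += symbols[symbol]
    return memory


segments = [
    {"name": "A", "entries": {}, "text": [0, 2, 99],
     "rld": [(0, "B"), (1, "A")]},      # word 0 links to B, word 1 is relocated
    {"name": "B", "entries": {"BENTRY": 1}, "text": [7, 8], "rld": []},
]
symbols = pass1(segments, ipla=100)
memory = pass2(segments, symbols)
```

Segment A refers to B before B has been seen, which is why the symbol table must be complete (pass 1) before any address constant is adjusted (pass 2).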
Binary Symbolic Subroutine Loader
The assembler assembles each program and provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text in a file rather than in memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
- To be able to explain and implement sequential search and binary search
- To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
- To understand the idea of hashing as a search technique
- To introduce the map abstract data type
- To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
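The three search methods can be sketched for a symbol table keyed by symbol name. The table contents are made-up values, and the hash table below resolves collisions by chaining with a simple sum-of-character-codes hash function; both choices are assumptions for illustration, since the notes do not fix a particular scheme.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(table, key):
    """Search a table sorted by name by halving the range each step."""
    low, high = 0, len(table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1
        else:
            low = middle + 1
    return -1


class HashTable:
    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]  # slot numbers start at 0

    def _slot(self, key):
        """Hash function: map a key to the slot where it belongs."""
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value
                return
        bucket.append([key, value])

    def get(self, key):
        for name, value in self.slots[self._slot(key)]:
            if name == key:
                return value
        return None


symbol_table = [("ALPHA", 0x1000), ("BETA", 0x1003), ("GAMMA", 0x1006)]
hashed = HashTable()
for name, address in symbol_table:
    hashed.put(name, address)
```

Linear search needs no ordering, binary search requires the ordered table, and the hash table trades extra storage for lookups that do not depend on scanning the table.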
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
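Two of the listed algorithms are sketched below as examples, insertion sort and merge sort. Both return a new sorted list and leave the input unchanged; that is a choice made for this example rather than something the notes prescribe.

```python
def insertion_sort(items):
    """Insert each element into its place among the already sorted ones."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]   # shift larger elements to the right
            j -= 1
        result[j + 1] = current
    return result


def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


names = insertion_sort(["SUM", "ALPHA", "LOOP"])
numbers = merge_sort([5, 2, 9, 1, 5])
```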
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
- A finite alphabet of symbols. The alphabet must be finite because if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model
- A syntax that defines which strings of symbols are in the language of our formal system
- A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
- the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
- the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, i.e. of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises. It is interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory Boolean algebra Propositional and predicate calculus Backus Normal form etc
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
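The definition above can be sketched in a few lines of Python. This is only an illustration: the function name immediate_derivations and the sample grammar S -> aSb | ab are assumptions made for the example, and each symbol is taken to be a single character.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    mu is a string of single-character symbols; productions is a list of
    (alpha, beta) pairs standing for the rules alpha -> beta in P.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            # mu = sigma alpha tau, so gamma = sigma beta tau
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar used only for illustration: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", P)))  # ['aaSbb', 'aabb']
```

Here σ = a and τ = b, and the single occurrence of S is rewritten once by each rule. A string with no nonterminal, such as aabb, has no immediate derivations.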
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that consists only of terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (nonterminals N, terminal alphabet Σ, production rules P, start symbol S) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1215 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
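The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The following is a minimal sketch, assuming a grammar is given as a set of nonterminals, a list of (left-hand side, right-hand side) pairs of symbol tuples, and a start symbol. The function name grammar_type and the sample rules are hypothetical and are not the grammars of the examples above.

```python
EPSILON = ()  # the empty right-hand side


def grammar_type(nonterminals, productions, start):
    """Return the most restrictive type (3, 2, 1 or 0) the grammar satisfies."""

    def is_start_to_empty(lhs, rhs):
        return lhs == (start,) and rhs == EPSILON

    has_start_to_empty = any(is_start_to_empty(l, r) for l, r in productions)

    # Type 1, condition (a): |lhs| <= |rhs| unless the rule is S -> empty.
    cond_a = all(len(l) <= len(r) or is_start_to_empty(l, r)
                 for l, r in productions)
    # Type 1, condition (b): if S -> empty is in P, S is on no right-hand side.
    cond_b = (not has_start_to_empty
              or all(start not in r for _, r in productions))
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(l) == 1 and l[0] in nonterminals for l, _ in productions):
        return 1

    # Type 3: each rule other than S -> empty has a right-hand side that is
    # a terminal, or a terminal followed by a nonterminal.
    def is_type3_rhs(rhs):
        if len(rhs) == 1:
            return rhs[0] not in nonterminals
        return (len(rhs) == 2 and rhs[0] not in nonterminals
                and rhs[1] in nonterminals)

    if all(is_start_to_empty(l, r) or is_type3_rhs(r) for l, r in productions):
        return 3
    return 2


# Hypothetical grammar: S -> aA, A -> b
N = {"S", "A"}
rules = [(("S",), ("a", "A")), (("A",), ("b",))]
print(grammar_type(N, rules, "S"))                                    # 3
print(grammar_type(N, rules + [(("A",), ("S", "a"))], "S"))           # 2
print(grammar_type(N, rules + [(("a", "A"), ("A", "a", "A"))], "S"))  # 1
print(grammar_type(N, rules + [(("A", "b"), ("b",))], "S"))           # 0
```

The three added rules mirror the forms discussed in the text: a nonterminal followed by a terminal breaks Type 3, a two-symbol left-hand side breaks Type 2, and a rule that shortens the string breaks Type 1.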
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*, with v nonempty.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only k·n squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
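As a small illustration of the notation, the sketch below reads a BNF specification into a Python dictionary and then separates terminals from nonterminals using the rule stated above (a symbol that never appears on a left side is a terminal). The grammar for binary numbers and the helper names parse_bnf and terminals are assumptions for this example; the reader is deliberately naive and cannot express a literal | as a terminal.

```python
def parse_bnf(text):
    """Map each nonterminal to its list of alternatives (lists of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alt.split() for alt in right.split("|")]
        rules.setdefault(left.strip(), []).extend(alternatives)
    return rules


def terminals(rules):
    """Symbols used on a right side that never appear on a left side."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)


# Hypothetical BNF specification: binary numbers
BNF = """
<digit>  ::= 0 | 1
<number> ::= <digit> | <number> <digit>
"""

rules = parse_bnf(BNF)
print(rules["<number>"])         # [['<digit>'], ['<number>', '<digit>']]
print(sorted(terminals(rules)))  # ['0', '1']
```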
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as a computer language. If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, producing an intermediate form of the source program.
In the synthesis part, this intermediate form is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
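The mapping from lexemes to tokens described above can be sketched with a regular expression, which fits the remark that no recursion is needed. This is an illustrative scanner only: the function name scan, the choice to keep the number's text as its attribute, and the restriction to the symbols =, + and ; are all assumptions made for the example.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<sym>[=+;]))"
)


def scan(source):
    """Return (tokens, symbol_table) for one statement."""
    symbol_table = []  # entry i-1 holds the lexeme for the token <id,i>
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.group("id"):
            name = m.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
        elif m.group("number"):
            tokens.append(("number", m.group("number")))
        else:
            tokens.append((m.group("sym"), None))  # second component ignored
    return tokens, symbol_table


tokens, table = scan("x3   =y+ 3  ;")
print(tokens)
# [('id', 1), ('=', None), ('id', 2), ('+', None), ('number', '3'), (';', None)]
print(table)  # ['x3', 'y']
```

All three spacings of the statement give the same token stream, while x 3 = y + 3; does not, because x and 3 are then scanned as two separate lexemes.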
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
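A syntax tree of the kind described above can be built by a very small hand-written parser. The sketch below is an assumption-laden illustration, not the parser of any particular compiler: tokens are (kind, value) pairs, the tree is a nested tuple with the operator first, and the ambiguous rule expr → expr + expr is resolved by grouping to the left.

```python
def parse_assignment(tokens):
    """Parse  id = operand { + operand }  into a syntax tree (nested tuples)."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            return expect(tokens[pos][0])
        raise SyntaxError(f"expected id or number at token {pos}")

    target = expect("id")
    expect("=")
    tree = operand()
    while pos < len(tokens) and tokens[pos][0] == "+":
        expect("+")
        tree = ("+", tree, operand())  # operator is the interior node
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {pos}")
    return ("=", target, tree)


tokens = [("id", "x3"), ("=", None), ("id", "y"), ("+", None), ("number", "3")]
print(parse_assignment(tokens))
# ('=', ('id', 'x3'), ('+', ('id', 'y'), ('number', '3')))
```

The result is the assignment expression only; the trailing semicolon of the C statement is left out of the token list here, in line with the technical point above.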
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
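The walk over the tree, including the insertion of the conversion operators, might be sketched as follows. The tree shape (nested tuples), the types dictionary, and the function name gen_three_address are assumptions made for this illustration; only the + operator and the int and real types are handled.

```python
import itertools


def gen_three_address(tree, types):
    """Emit three-address code for ('=', target, expr) with int/real coercion.

    types maps each identifier to 'int' or 'real'; a number written without
    a decimal point is treated as 'int'.
    """
    code = []
    temps = (f"temp{i}" for i in itertools.count(1))

    def convert(place, have, want):
        if have == want:
            return place
        temp = next(temps)
        code.append(f"{temp} = {have}to{want}({place})")
        return temp

    def walk(node):
        kind = node[0]
        if kind == "id":
            return node[1], types[node[1]]
        if kind == "number":
            return node[1], "real" if "." in node[1] else "int"
        # kind == "+": evaluate both operands, widen to real if they differ
        (left, ltype), (right, rtype) = walk(node[1]), walk(node[2])
        result_type = "real" if "real" in (ltype, rtype) else "int"
        left = convert(left, ltype, result_type)
        right = convert(right, rtype, result_type)
        temp = next(temps)
        code.append(f"{temp} = {left} + {right}")
        return temp, result_type

    _, target, expr = tree
    place, expr_type = walk(expr)
    place = convert(place, expr_type, types[target[1]])
    code.append(f"{target[1]} = {place}")
    return code


# id1 stands for x3 (an integer) and id2 for y (a real), as in the example.
tree = ("=", ("id", "id1"), ("+", ("id", "id2"), ("number", "3")))
for line in gen_three_address(tree, {"id1": "int", "id2": "real"}):
    print(line)
```

For this input the sketch prints the same four instructions as the example above.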
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion itself and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
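The two optimizations above can be sketched as a pass over a list of quads. This is a toy illustration under stated assumptions: quads are (operation, target, source1, source2) tuples, the function name optimize is hypothetical, and a temporary is assumed to be used only by the assign that immediately follows its definition.

```python
def optimize(quads):
    """Fold int-to-real conversions of constants, then merge trailing assigns."""
    constants = {}  # temp name -> constant computed at compile time
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = str(float(int(src1)))  # "3" becomes "3.0"
            continue                                   # the quad disappears
        folded.append((op, target, src1, src2))

    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if (op == "assign" and result and result[-1][1] == src1
                and src1.startswith("temp")):
            # Let the previous quad write straight into the final target.
            prev_op, _, a, b = result.pop()
            result.append((prev_op, target, a, b))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
# ('add', 'temp2', 'id2', '3.0')
# ('realtoint', 'id1', 'temp2', None)
```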
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1 id2
ADDF R1 R1 3.0      add float
RTOI R2 R1          real to int
ST id1 R2
127 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
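The idea of the parser calling the scanner on demand can be illustrated with a Python generator. This is only a sketch: the scanner here simply splits on blanks, the parser handles a chain of binary operators on well-formed input, and the log list exists purely to show that scanning and parsing are interleaved rather than run as two separate passes.

```python
def scanner(source, log):
    """Yield one token at a time; no intermediate token file is produced."""
    for lexeme in source.split():
        log.append(f"scan {lexeme}")
        yield lexeme


def parser(tokens, log):
    """Build a left-nested tree, asking the scanner for tokens as needed."""
    tree = next(tokens)
    log.append(f"parse operand {tree}")
    for op in tokens:              # each iteration pulls the next token
        right = next(tokens)
        log.append(f"parse {op} {right}")
        tree = (op, tree, right)
    return tree


log = []
print(parser(scanner("a + b", log), log))  # ('+', 'a', 'b')
print(log)
# ['scan a', 'parse operand a', 'scan +', 'scan b', 'parse + b']
```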
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form or syntax of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                  calls
Contextual Analyzer        Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    calls                  calls                    calls
Syntactic Analyzer         Contextual Analyzer      Code Generator
Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit, because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Register: A special kind of very fast memory which is referred to by name rather than by a memory address number. Registers often hold data values that are currently in use by the ALU or other subsystems of the architecture.
Random access memory (RAM): The typical computer memory, where each cell is addressed individually and may be updated or accessed in time that is independent of the cell's location.
Read-only memory (ROM): A memory that is usually very similar to RAM, except that it may only be accessed, not updated.
Store: The process of writing a value to a memory cell.
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system. Information flow in microflowchart for ADD/SUB/MULT
Memory operations
FETCH(addr)
1. Put addr into MAR
2. Tell memory unit to "load"
3. Memory copies data into MDR
STORE(addr, new-value)
1. Put addr into MAR
2. Put new-value into MDR
3. Tell memory unit to "store"
4. Memory stores data from MDR into memory cell
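The FETCH and STORE steps above can be mimicked by a tiny simulation. The class name MemoryUnit and the use of a Python list for the memory cells are assumptions for this sketch; only the roles of MAR and MDR are taken from the steps themselves.

```python
class MemoryUnit:
    """A memory with an address register (MAR) and a data register (MDR)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0  # memory address register
        self.mdr = 0  # memory data register

    def load(self):
        # Memory copies the addressed cell into MDR.
        self.mdr = self.cells[self.mar]

    def store(self):
        # Memory stores the data from MDR into the addressed cell.
        self.cells[self.mar] = self.mdr


def fetch(memory, addr):
    memory.mar = addr   # 1. put addr into MAR
    memory.load()       # 2. tell memory unit to "load"
    return memory.mdr   # 3. data is now in MDR


def store(memory, addr, new_value):
    memory.mar = addr       # 1. put addr into MAR
    memory.mdr = new_value  # 2. put new-value into MDR
    memory.store()          # 3-4. memory stores MDR into the cell


memory = MemoryUnit(16)
store(memory, 5, 42)
print(fetch(memory, 5))  # 42
```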
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R1-R7) to evaluate the following expression. Ignore the remainder part of the division process.
( (a + b) / (a + d) ) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+ --------> ADD Ri Rj ( Rj <- Ri + Rj )
* --------> MULT Ri Rj ( Rj <- Ri * Rj )
/ --------> DIV Ri Rj ( Rj <- Rj / Ri )
a = M[A] ------> content of memory location at address A; same for B, D, E
move A D0     D0 = a
move B D4     D4 = b
add D0 D4     D4 = a + b
move D D1     D1 = d
add D0 D1     D1 = a + d
div D1 D4     D4 = (a + b) / (a + d)
move E D2     D2 = e
mult D0 D2    D2 = (a * e)
mult D4 D2    D2 = ( (a + b) / (a + d) ) * (a * e)
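One way to check a program like the one above is to run it on a small simulator. The sketch below is hypothetical: the function run, the tuple encoding of instructions, and the sample values a=6, b=10, d=2, e=3 are assumptions chosen for illustration, and move is modelled only in the memory-to-register direction used here.

```python
def run(program, memory):
    """Execute (op, source, destination) instructions; return the registers."""
    regs = {}
    for op, src, dst in program:
        if op == "move":
            regs[dst] = memory[src]             # register <- M[address]
        elif op == "add":
            regs[dst] = regs[src] + regs[dst]   # Rj <- Ri + Rj
        elif op == "mult":
            regs[dst] = regs[src] * regs[dst]   # Rj <- Ri * Rj
        elif op == "div":
            regs[dst] = regs[dst] // regs[src]  # Rj <- Rj / Ri, remainder ignored
        else:
            raise ValueError(f"unknown operation {op}")
    return regs


program = [
    ("move", "A", "D0"), ("move", "B", "D4"), ("add", "D0", "D4"),
    ("move", "D", "D1"), ("add", "D0", "D1"), ("div", "D1", "D4"),
    ("move", "E", "D2"), ("mult", "D0", "D2"), ("mult", "D4", "D2"),
]
memory = {"A": 6, "B": 10, "D": 2, "E": 3}  # sample values for a, b, d, e
print(run(program, memory)["D2"])  # ((6 + 10) // (6 + 2)) * (6 * 3) = 36
```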
Problem 2)
a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X D0            move contents of memory location X to register D0
ii) move D0 X           move contents of register D0 to memory location X
iii) movea 22(a3)25(a1) move contents from memory to memory
iv) bgtw label          branch if greater than to location "label"
Answer 2)
For each of the above, for part b) draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).
i) move X D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]     fetch
IR <-- MBR
| decode |
PC <-- PC + 2      PC points to source address
MAR <-- PC
MBR <-- M[MAR]     address of X in MBR
MAR <-- MBR        move X to MAR
MBR <-- M[MAR]     read contents of location X
D0 <-- MBR
PC <-- PC + 2      PC points to next instruction
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2 ; PC points to destination address
MAR <-- PC
MBR <-- M[MAR] ; read destination address X
MAR <-- MBR ; MAR has X now
MBR <-- D0 ; source data in MBR
M[MAR] <-- MBR ; data in effective address X (which is in MAR)
PC <-- PC + 2 ; PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
temp <-- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <-- temp
Micro-steps are
MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| decode |
PC <-- PC + 2 ; PC points to source displacement
MAR <-- PC
MBR <-- M[MAR] ; source displacement loaded
MAR <-- [A3 + MBR] ; source effective address calculated
MBR <-- M[MAR] ; source data in MBR
WR <-- MBR ; saved temporarily since MBR is to be reused
PC <-- PC + 2 ; PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR] ; destination displacement loaded
MAR <-- [A1 + MBR] ; destination effective address calculated
MBR <-- WR ; source data back to MBR
M[MAR] <-- MBR ; source data moved to destination
PC <-- PC + 2 ; PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a) MAR <-- PC
MBR <-- M[MAR] ; fetch
IR <-- MBR
| interpret / decode |
If the condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR] ; get displacement
PC <-- PC + MBR ; execute
If the condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit Register Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964 and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i)Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword or quadword, low numbered bytes are more significant than high numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. Within these notes the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii)Register-
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose register, floating-point register, control & status register, data register, address register).
An assembly statement has four fields of information:
1) the name field: optional; it must begin with a character from the alphabet and can consist of up to 8 characters
2) opcode: mandatory; it must be followed by a blank
3) operands: usually represent data that could be in main storage, in registers, or part of the instruction itself
4) comments: used to clarify instructions
(iii)Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
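The packed decimal layout described above (digit nibbles followed by one sign nibble) can be decoded with a few lines of Python. This is a minimal sketch for illustration; the function name unpack_decimal is an assumption of this example and, unlike PACK and UNPK, it does check validity.

```python
def unpack_decimal(data: bytes) -> int:
    """Decode a packed decimal field: digit nibbles followed by a 4-bit sign."""
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)      # high-order nibble
        nibbles.append(byte & 0x0F)    # low-order nibble
    *digits, sign = nibbles            # the last nibble is the sign
    if any(d > 9 for d in digits):
        raise ValueError("digit values A-F are invalid")
    if sign <= 9:
        raise ValueError("sign values 0-9 are invalid")
    value = int("".join(str(d) for d in digits))
    # B and D are negative; A, C, E and F are positive
    return -value if sign in (0xB, 0xD) else value


positive = unpack_decimal(bytes([0x12, 0x3C]))   # nibbles 1 2 3 C
negative = unpack_decimal(bytes([0x04, 0x5D]))   # nibbles 0 4 5 D
```

The first field decodes to +123 and the second to -45.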
Floating point numbers are stored only as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent biased by 64). Bits 8-31 (8-63 for a doubleword) are a hexadecimal fraction. For extended precision, the low order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
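As a worked illustration of the fullword format, the value is sign x fraction x 16^(characteristic - 64). The sketch below decodes a 32-bit word under that reading; the function name decode_hex_float is an assumption of this example.

```python
def decode_hex_float(word: int) -> float:
    """Decode a fullword hexadecimal floating point value."""
    sign = -1.0 if (word >> 31) & 1 else 1.0         # bit 0
    characteristic = (word >> 24) & 0x7F             # bits 1-7, biased by 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)  # bits 8-31 as a fraction
    return sign * fraction * 16.0 ** (characteristic - 64)


one = decode_hex_float(0x41100000)         # fraction 1/16, exponent 16^1
minus_half = decode_hex_float(0xC0800000)  # sign set, fraction 1/2, exponent 16^0
example = decode_hex_float(0xC276A000)     # fraction 0x76A000 / 2^24, exponent 16^2
```

The three words decode to 1.0, -0.5 and -118.625 respectively.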
Lecture_no9 Various instruction format of IBM 360370
( iv)Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R tells that there is data residing in a register, so in this type the data is in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases (e.g. SVC) byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second is storage. If there are 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, this format specifies the storage address in 3 pieces (index, base and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I tells that the operand consists of immediate data (meaning that the data is in the instruction itself). So in this type the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S tells that there is data in storage. Here both operands have data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
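The RX layout can be made concrete by splitting a four-byte instruction into its fields. The sketch below assumes the usual split of bytes 2-3 into a 4-bit base register number and a 12-bit displacement, which the notes above do not spell out; the function name decode_rx and the dictionary keys are hypothetical.

```python
def decode_rx(instr: bytes) -> dict:
    """Split a four-byte RX instruction into its fields."""
    if len(instr) != 4:
        raise ValueError("an RX instruction is four bytes long")
    return {
        "opcode": instr[0],                       # byte 0
        "r1": instr[1] >> 4,                      # byte 1, bits 0-3
        "x2": instr[1] & 0x0F,                    # byte 1, bits 4-7 (index register)
        "b2": instr[2] >> 4,                      # assumed: 4-bit base register
        "d2": ((instr[2] & 0x0F) << 8) | instr[3] # assumed: 12-bit displacement
    }


fields = decode_rx(bytes([0x5A, 0x10, 0xC0, 0x04]))
```

For the bytes 5A 10 C0 04 this gives register 1, no index register (0), base register 12 and displacement 4.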
Addressing Mode
1 Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2 Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R 10
3 Register mode: Operand is in a register specified in the instruction itself. Example: ADD S D
4 Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D) 3
5 Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234 S
6 Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234] 10
7 Relative mode: Content of PC + OpAddr. Example: ADD D $S
8 Indexed mode: Content of an index register + OpAddr. Example: ADD D 500(S)
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
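Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
MOVE R1, x
MOVE R2, y
MOVE R3, 0
SUB R1, R2
BN Less
BZ Zero
MOVE R3, +1
B Done
Less MOVE R3, -1
B Done
Zero MOVE R3, 0
Done HLT
(B is assumed here to be an unconditional branch. Without it, execution would fall through from one case into the next and R3 would always end up as 0.)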
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language        Equivalent assembly
1000000100100101        LOAD R1 5
1000000101000101        LOAD R2 5
1010000100000110        ADD R0 R1 R2
1000001000000110        SAVE R0 6
1111111111111111        HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an Assembly language -> Machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU For example G4 Macs have a different machine language than Intel PCs We will look at the machine language of a simple simulated computer
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex R0 = R1 - R2
1010001000 00 01 10
Halt the program
1111111111111111
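Because each mnemonic maps to a fixed bit pattern, the translator for this simulated computer can be sketched in a few lines of Python. The function names assemble, reg and mem are assumptions of this sketch; SAVE is treated as the store pattern with the register written first, as in the listing given earlier.

```python
LOAD = "100000010"
STORE = "100000100"
MOVE = "100100010000"
ADD = "1010000100"
SUB = "1010001000"


def reg(name):
    """Encode a register such as R2 as two bits."""
    return format(int(name.lstrip("R")), "02b")


def mem(address):
    """Encode a memory address (0-31) as five bits."""
    return format(int(address), "05b")


def assemble(line):
    """Translate one assembly statement into a 16-bit string."""
    op, *args = line.split()
    if op == "HALT":
        return "1" * 16
    if op == "LOAD":
        return LOAD + reg(args[0]) + mem(args[1])
    if op == "SAVE":
        return STORE + reg(args[0]) + mem(args[1])
    if op == "MOVE":
        return MOVE + reg(args[0]) + reg(args[1])
    if op in ("ADD", "SUB"):
        prefix = ADD if op == "ADD" else SUB
        return prefix + "".join(reg(a) for a in args)
    raise ValueError("unknown mnemonic: " + op)


program = ["LOAD R1 5", "LOAD R2 5", "ADD R0 R1 R2", "SAVE R0 6", "HALT"]
machine_code = [assemble(line) for line in program]
```

The five resulting strings are the same five 16-bit words as the machine language program shown in these notes.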
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example
START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000
But it will be placed at the address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
   LABEL   OPCODE  OPERANDS      COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6.
03 ; Before execution, an integer must be stored in NUMBER.
04 ;
05         .ORIG   x3050
06         LEA     R2, NUMBER
07         LDW     R2, R2, #0
08         LEA     R1, SIX
09         LDW     R1, R1, #0
0A         AND     R3, R3, #0    ; Clear R3. It will
0B                               ; contain the product.
0C ; The inner loop
0D ;
0E AGAIN   ADD     R3, R3, R2
0F         ADD     R1, R1, #-1   ; R1 keeps track of
10         BRp     AGAIN         ; the iterations
11 ;
12         HALT
13 ;
14 NUMBER  .BLKW   1
15 SIX     .FILL   x0006
16 ;
17         .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
LEA R2, NUMBER
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1 The location contains the target of a branch instruction (for example, AGAIN in line 0E)
2 The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15)
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
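To see what the multiply-by-6 program computes, its loop can be traced in Python. This is only a behavioural sketch: it ignores 16-bit wraparound, and the function name multiply_by_six is chosen for this illustration.

```python
def multiply_by_six(number: int) -> int:
    """Trace the registers of the multiply-by-6 program."""
    r2 = number        # R2 <- value stored at NUMBER
    r1 = 6             # R1 <- value stored at SIX (x0006)
    r3 = 0             # AND R3, R3, #0 clears R3
    while True:
        r3 += r2       # AGAIN: ADD R3, R3, R2
        r1 -= 1        # ADD R1, R1, #-1
        if r1 <= 0:    # BRp AGAIN branches only while R1 is positive
            break
    return r3          # HALT, with the product in R3


product = multiply_by_six(7)
```

The loop body runs six times, so for the input 7 the product in R3 is 42.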
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get by speaking English, but you will never feel at home, and life remains complicated. You can get by like this, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader. The assembler does not know the address of these symbols, and it is up to the loader to find the programs containing them, load them into memory and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler
-> Basic assembler directives:
• START: Specify name and starting address for the program
• END: Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE: Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD: Generate one-word integer constant
• RESB: Reserve the indicated number of bytes for a data area
• RESW: Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command
An assembly language statement contains the following fields:
-> Label field: an identifier and an optional field. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A ... Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H
-> Mnemonic/Opcode field: contains a mnemonic, which stands for an operation code or a pseudo-op. If it is a machine code instruction, it may require additional information, i.e. operands.
-> Operand field: specifies either the address or the data that the opcode requires.
-> Comment field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
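The four fields can be separated mechanically. The sketch below is one possible splitter, under assumptions the notes do not fix: labels end with a colon, operands are separated by commas, and a semicolon starts the comment. The function name parse_statement is hypothetical.

```python
def parse_statement(line):
    """Split '[label:] mnemonic [operands] [; comment]' into its fields."""
    code, _, comment = line.partition(";")      # assumed comment delimiter
    tokens = code.split()
    label = None
    if tokens and tokens[0].endswith(":"):      # assumed label suffix
        label = tokens.pop(0)[:-1]
    mnemonic = tokens.pop(0) if tokens else None
    # Whatever remains is the operand field; commas separate operands.
    operands = "".join(tokens).split(",") if tokens else []
    return {
        "label": label,
        "mnemonic": mnemonic,
        "operands": operands,
        "comment": comment.strip() or None,
    }


full = parse_statement("START: LDAA 24H ; load accumulator A")
bare = parse_statement("HALT")
```

The first statement yields all four fields; the second has only a mnemonic, which shows that the label, operand and comment fields are optional.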
Types of Assembly language statements
There are 3 types of statements:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler design can be done in
- Single pass
- Two pass
(Fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless restrictions are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels
• Location Counter (LOCCTR): a variable that is used to help in the assignment of addresses. After each statement, LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
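The division of work between the two passes can be sketched for a deliberately tiny, hypothetical machine in which every statement occupies one word, an instruction is an 8-bit opcode followed by an 8-bit address, and WORD is the only directive. The opcode values and function names are assumptions made for this sketch.

```python
OPTAB = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0x0F}  # hypothetical opcodes


def split_fields(line):
    """Return (label, opcode, operands); a first token not in OPTAB is a label."""
    tokens = line.split()
    if tokens[0] not in OPTAB and tokens[0] != "WORD":
        return tokens[0], tokens[1], tokens[2:]
    return None, tokens[0], tokens[1:]


def pass1(source, start=0):
    """Assign an address to every statement and build SYMTAB."""
    symtab, intermediate, locctr = {}, [], start
    for line in source:
        label, opcode, operands = split_fields(line)
        if label is not None:
            if label in symtab:
                raise ValueError("duplicate symbol: " + label)
            symtab[label] = locctr
        intermediate.append((locctr, opcode, operands))
        locctr += 1                       # every statement is one word long here
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement using OPTAB and the completed SYMTAB."""
    object_code = []
    for locctr, opcode, operands in intermediate:
        if opcode == "WORD":
            object_code.append((locctr, int(operands[0])))
        else:
            address = symtab[operands[0]] if operands else 0
            object_code.append((locctr, (OPTAB[opcode] << 8) | address))
    return object_code


source = ["LOAD X", "ADD Y", "STORE X", "HALT", "X WORD 5", "Y WORD 7"]
symtab, intermediate = pass1(source)
object_code = pass2(symtab, intermediate)
```

X and Y are used before they are defined, which is harmless here because Pass 2 only runs once SYMTAB is complete.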
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1 H
Col 2-7 Program name
Col 8-13 Starting address (hex)
Col 14-19 Length of object program in bytes (hex)
Text record
Col 1 T
Col 2-7 Starting address in this record (hex)
Col 8-9 Length of object code in this record in bytes (hex)
Col 10-69 Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1 E
Col 2-7 Address of first executable instruction (hex) (END program_name)
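The column layout above translates directly into string formatting. The sketch below builds the three record types; the function names and the sample program name, addresses and object code are illustrative assumptions, not output of a real assembler.

```python
def header_record(name, start, length):
    """H, program name (cols 2-7), start address (cols 8-13), length (cols 14-19)."""
    return "H" + name.ljust(6)[:6] + format(start, "06X") + format(length, "06X")


def text_record(start, object_code_hex):
    """T, start address (cols 2-7), length in bytes (cols 8-9), object code (cols 10-69)."""
    if len(object_code_hex) > 60:
        raise ValueError("a text record holds at most 60 hex digits of object code")
    length_in_bytes = len(object_code_hex) // 2
    return "T" + format(start, "06X") + format(length_in_bytes, "02X") + object_code_hex


def end_record(first_executable):
    """E, address of the first executable instruction (cols 2-7)."""
    return "E" + format(first_executable, "06X")


header = header_record("COPY", 0x1000, 0x107A)
text = text_record(0x1000, "141033482039001036")
end = end_record(0x1000)
```

Each field is padded to its fixed width, so a loader can pick the fields out by column position alone.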
Absolute ProgramThe program must be loaded at the specified (absolute) address
Relocatable Program
The actual starting address is not known until load time The program can be loaded at different address of memory
How to Solve
a Modification record
b Base relative
c Program counter relative
Modification record
Col 1 M
Col 2-7 Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9 Length of the address field to be modified, in half-bytes
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for a group of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X, Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Fig Macro program)
(Fig Source code (with macros) -> Macro processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion.
The call
INCRMT X, Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
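The substitution performed for the INCRMT call can be sketched in Python. This is a minimal illustration of positional expansion only; the function name expand and the way the macro is passed in as lists are assumptions of this example.

```python
import re


def expand(formal_params, body, actual_args):
    """Replace each formal parameter in the macro body with its actual argument."""
    binding = dict(zip(formal_params, actual_args))   # pair by position
    expanded = []
    for statement in body:
        # Substitute every &NAME token that has a binding; leave others unchanged.
        expanded.append(
            re.sub(r"&\w+", lambda m: binding.get(m.group(0), m.group(0)), statement)
        )
    return expanded


body = ["LOAD &A", "ADD &B", "STORE &A"]
expansion = expand(["&A", "&B"], body, ["X", "Y"])
```

The same body can be expanded again with different arguments, which is exactly what happens when the macro is called more than once in a program.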
Defining a Macro
Let us take another look at the macro definition unit
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X, Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Fig Macro program)
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro
MEND indicates the end of a macro
-> Prototype for the macro
Gives the macro name and the parameter list; each parameter starts with &
-> BODY
The statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)
In the definition of the macro they are called parameters; in the macro invocation they are called arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1 p2 hellip
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is invoked.
Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B prototype
INCRMT X, Y macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro Expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked
Subroutine - the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB, with pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position
(fig: NAMTAB, DEFTAB and ARGTAB as used for a MACRO definition and a macro CALL)
Two pass algorithm
» Pass1: Recognize macro definitions
» Pass2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm amp flowchart for the Macro processor and explain
(fig: macro processor flowchart with the procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is now ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader - In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme - In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader - An absolute loader is a kind of loader for which the object files are already relocated when they are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking. For that, the programmer needs to know the memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions like printf( ), scanf( ) etc. which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf( ). A and printf( ) would have to be linked with each other. But where in main storage shall we load A and printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf( ) function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf( ) may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf( ) extends from 100 to 150. But there is simply no way A and printf( ) can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to adjustment of address fields and not to movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
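As a small illustration of "adding a constant to each relative address", here is a hedged Python sketch. The function name relocate, the representation of a segment as a list of integer words, and the list of address-field positions are assumptions made for this example, not part of the notes.

```python
def relocate(words, address_fields, translated_origin, load_origin):
    """Return a copy of the segment with every address field adjusted.

    words            : the segment as a list of integers (hypothetical format)
    address_fields   : indexes of the words that hold addresses
    translated_origin: start address assumed by the translator
    load_origin      : start address actually chosen by the loader
    """
    offset = load_origin - translated_origin   # the constant added to each address
    relocated = list(words)                    # leave the caller's data untouched
    for index in address_fields:
        relocated[index] += offset
    return relocated


# A segment translated at address 100 but loaded at 500:
segment = [5, 104, 7, 100]          # words 1 and 3 are address constants
print(relocate(segment, [1, 3], translated_origin=100, load_origin=500))
```

Only the words listed in address_fields change, which mirrors the point that relocation adjusts address fields and is not the same thing as moving the program.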
Relocating Loader
The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The input to a relocating loader is the object program and information about all other programs it references. In addition there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic Linking-
The linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is also called dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader performs
» all linking and relocation operations
» automatic library search
» loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
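The behaviour of the overlay manager can be modelled with a short Python sketch. This is a toy model only: the PARENT table encodes the example tree from the text, while the class name OverlayManager and its call method are hypothetical, and "loading" is reduced to bookkeeping of which segments are resident.

```python
# Parent of each segment in the example overlay tree (ROOT has no parent).
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to `segment`, root first."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]          # the currently resident path

    def call(self, target):
        """Make sure the path to `target` is resident; return segments loaded."""
        wanted = path_to(target)
        if self.loaded[:len(wanted)] == wanted:
            return []                   # upward call: path is already loaded
        keep = 0                        # length of the shared prefix
        while (keep < min(len(self.loaded), len(wanted))
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        newly_loaded = wanted[keep:]    # these overwrite their siblings
        self.loaded = self.loaded[:keep] + newly_loaded
        return newly_loaded


manager = OverlayManager()
print(manager.call("B"))   # root calls a routine in B: A and B are loaded
print(manager.call("E"))   # D and E replace A and B in the shared memory
```

Keeping only the shared prefix of the old and new paths reflects the rule that sibling segments occupy the same memory, so loading one displaces the other.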
(fig: Overlay tree with ROOT at the top, A and D below ROOT, B and C below A, and E and F below D)
The essential difference between a linkage editor and a linking loader
(fig: Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case the code can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, execution has to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at who performs each of the four functions above for our three loaders: absolute loader, relocating loader and direct linking loader respectively

Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader

Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory card (ESD)
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Copy of object program
IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
PLA counter (Program Load Address): keeps track of each segment location
External Symbol Directory card (ESD): stores each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA): establishes the correspondence between the ESD ID used in ESD & RLD cards
ESD (external symbol dictionary)
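The two passes can be sketched in Python as follows. This is a deliberately simplified model, not the card-based design itself: each object module is assumed to be a dictionary with hypothetical keys "name", "defs" (symbols it defines, as relative addresses), "text" (a list of integer words) and "rld" (pairs of word index and symbol whose address must be added to that word).

```python
def link_and_load(modules, ipla):
    """Two-pass linking loader over simplified, hypothetical object modules."""
    # Pass 1: allocate each segment and build the external symbol table.
    symbol_table = {}
    origin = {}
    pla = ipla                                  # Program Load Address counter
    for module in modules:
        origin[module["name"]] = pla
        symbol_table[module["name"]] = pla
        for symbol, relative in module["defs"].items():
            symbol_table[symbol] = pla + relative
        pla += len(module["text"])

    # Pass 2: load the text, then relocate and link the address constants.
    memory = {}
    for module in modules:
        base = origin[module["name"]]
        for index, word in enumerate(module["text"]):
            memory[base + index] = word
        for index, symbol in module["rld"]:
            memory[base + index] += symbol_table[symbol]
    return symbol_table, memory


module_a = {"name": "A", "defs": {}, "text": [10, 0, 2],
            "rld": [(1, "B"), (2, "A")]}   # word 1 refers to B, word 2 to A+2
module_b = {"name": "B", "defs": {"BDATA": 1}, "text": [20, 30], "rld": []}
table, core = link_and_load([module_a, module_b], ipla=100)
print(table)
print(core)
```

An RLD entry naming the module itself gives relocation, while an entry naming another module's symbol gives linking, so one loop in pass 2 covers both adjustments in this sketch.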
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler assembles each program and provides the loader with
-> the object program + relocation information
-> prefixed with information about all other programs it references (transfer vector)
-> the length of the entire program
-> the length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
  - allocation, relocation and linking
• Outputs the text in a file rather than memory
  - called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
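The three search methods can be sketched in Python on a small symbol table. The table layout (a list of (name, value) pairs), the function and class names, and the simple character-sum hash function are assumptions chosen for illustration; collisions in the hash table are resolved by chaining.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Search a table that is sorted by name; return the index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1


class HashTable:
    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]    # slot numbers start at 0

    def _slot(self, key):
        # Hash function: sum of character codes modulo the number of slots.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (existing, _value) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)           # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None


symbols = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108)]   # sorted by name
print(linear_search(symbols, "GAMMA"), binary_search(symbols, "BETA"))
```

Linear search inspects up to n entries, binary search about log2(n) entries on an ordered table, and the hash table usually goes straight to one short bucket.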
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
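As an illustration, the first two algorithms in the list can be sketched in Python as below. Both functions return a new sorted list and leave their input untouched; the function names are chosen for this example only.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and move it into place."""
    items = list(items)
    for i in range(len(items)):
        smallest = i
        for j in range(i + 1, len(items)):
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items


def insertion_sort(items):
    """Insert each item into the already sorted prefix to its left."""
    items = list(items)
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]     # shift larger items one place right
            j -= 1
        items[j + 1] = current
    return items


print(selection_sort([5, 2, 9, 1]), insertion_sort(["LOOP", "END", "START"]))
```

Both take time proportional to n squared in the worst case, which is why merge sort and quick sort are preferred for large tables.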
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference, inference rule or transformation rule is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->_G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
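The definition of a single-step derivation can be turned into a short Python sketch. Here strings are ordinary Python strings with one character per symbol and productions are (α, β) pairs with non-empty α; the function name immediate_derivations and the sample grammar are assumptions for illustration.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each split mu = sigma alpha tau,
    the string gamma = sigma beta tau is one single-step derivation.
    """
    results = set()
    for alpha, beta in productions:
        width = len(alpha)
        for start in range(len(mu) - width + 1):
            if mu[start:start + width] == alpha:
                sigma, tau = mu[:start], mu[start + width:]
                results.add(sigma + beta + tau)
    return results


# Hypothetical grammar with productions S -> aSb and S -> ab
rules = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", rules)))
```

Applying the function repeatedly, starting from the start symbol, enumerates the sentential forms of the grammar one derivation step at a time.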
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (nonterminals N, terminal alphabet Σ, production rules P, start symbol S) that satisfies the following two conditions
a Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b If S → ε is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is of Type 3
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are non-terminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
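To make the notation concrete, consider a small hypothetical BNF specification (not taken from the notes): <number> ::= <digit> <number> | <digit> and <digit> ::= 0 | 1 | ... | 9. The Python sketch below stores these rules in a dictionary and checks by brute force whether a string can be derived from a symbol. It assumes that no alternative is empty and that the grammar is not left-recursive.

```python
# Each nonterminal maps to its list of alternatives (sequences of symbols).
GRAMMAR = {
    "<number>": [["<digit>", "<number>"], ["<digit>"]],
    "<digit>": [[d] for d in "0123456789"],
}


def derives(symbol, text):
    """True if `symbol` can be rewritten into exactly `text`."""
    if symbol not in GRAMMAR:                 # never on a left side: terminal
        return symbol == text
    return any(matches(alternative, text) for alternative in GRAMMAR[symbol])


def matches(sequence, text):
    """True if the sequence of symbols can be rewritten into exactly `text`."""
    if not sequence:
        return text == ""
    head, rest = sequence[0], sequence[1:]
    # Try every non-empty prefix of the text for the first symbol.
    return any(derives(head, text[:cut]) and matches(rest, text[cut:])
               for cut in range(1, len(text) + 1))


print(derives("<number>", "2024"), derives("<number>", "4a7"))
```

This exhaustive approach is only meant to show how the rules are read; practical parsers use far more efficient techniques.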
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators"
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program ---> Compiler ---> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and then broken into a stream of strings, giving an intermediate form of the source program.
In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts
- Lexical Analysis
  • In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning)
- Syntax Analysis
  • In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string
- Semantic Analysis
  • In this step the meaning of the source string is determined
Properties of a Compiler
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N*M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The phases are far from equal in terms of effort or lines of code. In practice, the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
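The scanning step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scanner of any real compiler: the pattern table TOKEN_SPEC, the function name tokenize, and the choice to store a number's value directly in the token are assumptions made for the example. It also assumes the statement ends with a semicolon.

```python
import re

# Token patterns, tried in order; the names are illustrative assumptions.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+;]"),
    ("skip",   r"\s+"),          # non-significant blanks
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    symtab = []      # identifier lexemes; the 1-based index is the attribute
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                         # blanks are dropped while scanning
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None))    # second component is ignored
    return tokens, symtab
```

Calling tokenize("x3   =y+ 3  ;") yields the same token list as tokenize("x3 = y + 3;"), while "x 3 = y + 3;" is scanned differently because x and 3 become separate lexemes.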
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a tree whose root is the assignment statement and whose leaves are the tokens.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition that the parse tree expresses.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. In the syntax tree for the example, = is the root with children x3 and +, and + has children y and 3.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
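A small recursive-descent parser makes the idea of a syntax tree concrete. The sketch below is an assumption-laden illustration: it takes a list of lexeme strings rather than real tokens, the function name parse_assignment is hypothetical, and it resolves the ambiguity of the rule expr → expr + expr by treating + as left-associative, which the notes do not specify.

```python
def parse_assignment(lexemes):
    """Parse  id = expr ;  and return a syntax tree as nested tuples."""
    pos = 0

    def peek():
        return lexemes[pos] if pos < len(lexemes) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expect(lexeme):
        if peek() != lexeme:
            raise SyntaxError(f"expected {lexeme!r}, found {peek()!r}")
        advance()

    def operand():
        # An operand is a number or an identifier (leaf of the syntax tree).
        tok = advance()
        if tok is None or not tok.isalnum():
            raise SyntaxError(f"expected id or number, found {tok!r}")
        return ("number", int(tok)) if tok.isdigit() else ("id", tok)

    def expr():
        # expr -> operand (+ operand)*, built left to right.
        tree = operand()
        while peek() == "+":
            advance()
            tree = ("+", tree, operand())
        return tree

    target = operand()
    if target[0] != "id":
        raise SyntaxError("left side of = must be an identifier")
    expect("=")
    tree = ("=", target, expr())
    expect(";")
    if peek() is not None:
        raise SyntaxError("trailing input after ';'")
    return tree
```

The operator nodes are interior nodes and the operands are their children, which is the shape of the syntax tree described above.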
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one operand) conversion operators "int to real" and "real to int": the constant 3 is converted to real before the addition, and the sum is converted to int before the assignment.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
    inttoreal  temp1  3      --
    add        temp2  id2    temp1
    realtoint  temp3  temp2  --
    assign     id1    temp3  --
Each "quad" has the form
    operation target source1 source2
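The walk over the semantic tree that produces these quads can be sketched as follows. This is a minimal illustration, not the notes' algorithm: the tuple encoding of the tree, the types dictionary, and the function name gen_quads are assumptions, and only addition with int/real conversion is handled.

```python
def gen_quads(tree, types):
    """Walk an assignment tree and return quads (op, target, src1, src2)."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def convert(place, have, want):
        # Emit a unary conversion quad only when the types differ.
        if have == want:
            return place
        temp = new_temp()
        quads.append((f"{have}to{want}", temp, place, None))
        return temp

    def walk(node):
        """Return (place, type) holding the value of an expression node."""
        kind = node[0]
        if kind == "number":
            return node[1], "int"
        if kind == "id":
            return node[1], types[node[1]]
        # kind == "+": binary addition, done in real if either side is real
        lplace, ltype = walk(node[1])
        rplace, rtype = walk(node[2])
        result_type = "real" if "real" in (ltype, rtype) else "int"
        lplace = convert(lplace, ltype, result_type)
        rplace = convert(rplace, rtype, result_type)
        temp = new_temp()
        quads.append(("add", temp, lplace, rplace))
        return temp, result_type

    _, (_, target), source = tree
    place, vtype = walk(source)
    place = convert(place, vtype, types[target])
    quads.append(("assign", target, place, None))
    return quads
```

With id1 declared int and id2 declared real, the tree for id1 = id2 + 3 produces the same four quads listed above; None plays the role of the unused "--" operand.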
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
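Both of these optimizations can be expressed as a short pass over the quad list. The sketch below is an illustration under stated assumptions: quads are (op, target, source1, source2) tuples with None for an unused operand, the function name optimize is hypothetical, and merging an assign into the preceding quad is assumed safe because each temporary is used exactly once.

```python
def optimize(quads):
    """Fold constant int-to-real conversions, then merge 'op tmp ...; assign dst tmp'."""
    constants = {}      # temp name -> constant computed at compile time
    folded = []
    for op, target, src1, src2 in quads:
        if op == "inttoreal" and isinstance(src1, int):
            constants[target] = float(src1)     # conversion done by the compiler
            continue
        # Replace uses of folded temporaries by their constant value.
        src1 = constants.get(src1, src1) if isinstance(src1, str) else src1
        src2 = constants.get(src2, src2) if isinstance(src2, str) else src2
        folded.append((op, target, src1, src2))

    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if op == "assign" and result and result[-1][1] == src1:
            # The previous quad computed the temp being assigned: retarget it.
            prev_op, _, prev_s1, prev_s2 = result.pop()
            result.append((prev_op, target, prev_s1, prev_s2))
        else:
            result.append(quad)
    return result
```

Applied to the four quads of the running example, the pass leaves two quads: an add with the constant 3.0 and a realtoint that writes id1 directly.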
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, #3.0    // add float
    RTOI R2, R1          // real to int
    ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code instruction, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver --calls--> Syntactic Analyzer
Syntactic Analyzer --calls--> Contextual Analyzer
Syntactic Analyzer --calls--> Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver --calls--> Syntactic Analyzer, Contextual Analyzer, Code Generator
Source Text --input--> Syntactic Analyzer --output--> AST
AST --input--> Contextual Analyzer --output--> Decorated AST
Decorated AST --input--> Code Generator --output--> Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no7 Instruction format with microflowchart for ADD instruction
Flowchart for instruction fetch & execution of general machine structure
Typical operating system
Information flow in microflowchart for ADD/SUB/MULT
Memory operations
FETCH(addr)
1. Put addr into MAR
2. Tell memory unit to "load"
3. Memory copies data into MDR
STORE(addr, new-value)
1. Put addr into MAR
2. Put new-value into MDR
3. Tell memory unit to "store"
4. Memory stores data from MDR into memory cell
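The two memory operations can be mimicked with a toy model of the memory unit. This is only a sketch: the class name MemoryUnit and its attribute names are assumptions made for illustration, and the "load"/"store" signals are modelled as plain assignments.

```python
class MemoryUnit:
    """Toy memory with a memory address register (MAR) and data register (MDR)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0
        self.mdr = 0

    def fetch(self, addr):
        self.mar = addr                     # 1. put addr into MAR
        self.mdr = self.cells[self.mar]     # 2-3. "load": memory copies data into MDR
        return self.mdr

    def store(self, addr, new_value):
        self.mar = addr                     # 1. put addr into MAR
        self.mdr = new_value                # 2. put new-value into MDR
        self.cells[self.mar] = self.mdr     # 3-4. "store": MDR is written to the cell
```

After store(5, 42), a fetch(5) leaves 5 in MAR and 42 in MDR, which is the value the CPU then reads.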
Problem 1)
a) Write an assembly program for a GPR machine having 8 general purpose registers (R0-R7) to evaluate the following expression. Ignore the remainder part of the division process.
((a + b) / (a + d)) * (a * e)
where a, b, d, e are variables with addresses A, B, D, E respectively, and each operation is defined as follows:
+ --------> ADD Ri, Rj (Rj <- Ri + Rj)
* --------> MULT Ri, Rj (Rj <- Ri * Rj)
/ --------> DIV Ri, Rj (Rj <- Rj / Ri)
a = M[A] ------> content of memory location at address A; same for B, D, E
    move A, D0      ; D0 = a
    move B, D4      ; D4 = b
    add  D0, D4     ; D4 = a + b
    move D, D1      ; D1 = d
    add  D0, D1     ; D1 = a + d
    div  D1, D4     ; D4 = (a + b) / (a + d)
    move E, D2      ; D2 = e
    mult D0, D2     ; D2 = (a * e)
    mult D4, D2     ; D2 = ((a + b) / (a + d)) * (a * e)
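To check a register-level solution like this one, it helps to execute it on a tiny interpreter. The sketch below is hypothetical: it assumes the operand order "source, destination" defined in the problem, reads the expression as ((a + b) / (a + d)) * (a * e) as the instruction sequence suggests, and uses integer division to ignore the remainder. The names run and PROGRAM are illustrative.

```python
def run(program, memory):
    """Execute (op, src, dst) triples; the second operand is the destination."""
    regs = {}
    for op, src, dst in program:
        if op == "move":
            regs[dst] = memory[src]              # load a variable from memory
        elif op == "add":
            regs[dst] = regs[src] + regs[dst]    # Rj <- Ri + Rj
        elif op == "mult":
            regs[dst] = regs[src] * regs[dst]    # Rj <- Ri * Rj
        elif op == "div":
            regs[dst] = regs[dst] // regs[src]   # Rj <- Rj / Ri, remainder ignored
        else:
            raise ValueError(f"unknown operation {op!r}")
    return regs

PROGRAM = [
    ("move", "A", "D0"), ("move", "B", "D4"), ("add", "D0", "D4"),
    ("move", "D", "D1"), ("add", "D0", "D1"), ("div", "D1", "D4"),
    ("move", "E", "D2"), ("mult", "D0", "D2"), ("mult", "D4", "D2"),
]
```

For a = 6, b = 14, d = 4, e = 3 the quotient is 20 / 10 = 2 and the product a * e is 18, so the final register D2 holds 36.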
Problem 2)
a) Provide micro-instructions in the form of a detailed flowchart for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0 ; move contents of memory location X to register D0
ii) move D0, X ; move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1) ; move contents from memory to memory
iv) bgt.w label ; branch if greater than to location "label"
Answer 2)
For each of the above, for part b) draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, address and data registers, ALU) and memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR <-- MBR
| decode |
PC <-- PC + 2       PC points to source address
MAR <-- PC
MBR <-- M[MAR]      address of X in MBR
MAR <-- MBR         move X to MAR
MBR <-- M[MAR]      read contents of location X
D0 <-- MBR
PC <-- PC + 2       PC points to next instruction
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR <-- MBR
| decode |
PC <-- PC + 2       PC points to destination address
MAR <-- PC
MBR <-- M[MAR]      read destination address X
MAR <-- MBR         MAR has X now
MBR <-- D0          source data in MBR
M[MAR] <-- MBR      data in effective address X (which is in MAR)
PC <-- PC + 2       PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
temp <-- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <-- temp
Micro-steps are
MAR <-- PC
MBR <-- M[MAR]      fetch
IR <-- MBR
| decode |
PC <-- PC + 2       PC points to source displacement
MAR <-- PC
MBR <-- M[MAR]      source displacement loaded
MAR <-- [A3 + MBR]  source effective address calculated
MBR <-- M[MAR]      source data in MBR
WR <-- MBR          saved temporarily since MBR is to be reused
PC <-- PC + 2       PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR]      destination displacement loaded
MAR <-- [A1 + MBR]  destination effective address calculated
MBR <-- WR          source data back to MBR
M[MAR] <-- MBR      source data moved to destination
PC <-- PC + 2       PC points to next instruction
iv) bgt.w label
Machine Instruction
0110 1110 0000 0000
---- displacement ----
a) MAR <-- PC
MBR <-- M[MAR]      fetch
IR <-- MBR
| interpret / decode |
If the condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR]      get displacement
PC <-- PC + MBR     execute
If the condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit Register Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. In these notes the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 general purpose registers in the 370 system. Register categories: general purpose registers, floating-point registers, control & status registers, data registers, address registers.
An assembler statement carries four fields of information:
1) Name field: optional; must begin with a character from the alphabet and can consist of up to 8 characters.
2) Opcode: mandatory; must be followed by a blank.
3) Operands: usually represent data that could be in main storage, in registers, or part of the instruction itself.
4) Comments: used to clarify instructions.
(iii)Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E, and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
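The packed decimal layout (two digits per byte, with the sign in the last 4 bits) can be illustrated with a small encoder and decoder. This is a sketch under assumptions: the function names are hypothetical, the encoder always emits C for positive and D for negative (two of the sign codes listed above), and unlike PACK and UNPK the decoder does check validity.

```python
def pack_decimal(value):
    """Encode an integer as System/360-style packed decimal bytes."""
    sign = 0xC if value >= 0 else 0xD        # C = positive, D = negative
    digits = str(abs(value))
    if len(digits) % 2 == 0:                 # an odd digit count is required
        digits = "0" + digits
    nibbles = [int(d) for d in digits] + [sign]
    if len(nibbles) > 32:
        raise OverflowError("packed decimal is limited to 16 bytes")
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

def unpack_decimal(data):
    """Decode packed decimal bytes back into an integer."""
    nibbles = [n for byte in data for n in (byte >> 4, byte & 0xF)]
    *digits, sign = nibbles
    if sign < 0xA or any(d > 9 for d in digits):
        raise ValueError("invalid packed decimal")
    value = int("".join(map(str, digits)))
    return -value if sign in (0xB, 0xD) else value
```

For example, 123 packs into the two bytes 12 3C, and -45 is padded to three digits and packs into 04 5D.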
Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent, biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
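As a worked illustration of the fullword format, the sketch below decodes a 32-bit hexadecimal floating-point value, taking bit 0 as the sign, bits 1-7 as the characteristic (exponent plus 64) and bits 8-31 as a fraction scaled by a power of 16. The function name decode_hfp_short is an assumption made for this example.

```python
def decode_hfp_short(word):
    """Decode a 32-bit hexadecimal floating-point fullword into a Python float."""
    sign = -1.0 if (word >> 31) & 1 else 1.0        # bit 0
    characteristic = (word >> 24) & 0x7F            # bits 1-7: exponent + 64
    fraction = (word & 0xFFFFFF) / float(1 << 24)   # bits 8-31 as a fraction below 1
    return sign * fraction * 16.0 ** (characteristic - 64)
```

The word 0x41100000 has characteristic 65 and fraction 1/16, so it represents 16^1 * 1/16 = 1.0; setting the top bit (0xC1100000) gives -1.0.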
Lecture_no9 Various instruction formats of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R tells that there is data residing in a register. So in this type both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g., SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on how many the opcode indicates. If it is 2, the first is a register and the second is a storage location. If it is 3, then the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, this format specifies the storage address in 3 pieces (index, base and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I tells that the operand consists of immediate data (meaning that the data is in the instruction itself). So in this type the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S tells that there is data in storage. Here both operands have data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
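Because the formats differ in length, a program that walks through machine code must know how long each instruction is. On System/360 the two high-order bits of the opcode give the length (00: two bytes, 01 and 10: four bytes, 11: six bytes); this rule is not stated in the notes above and is brought in here as background. The function names below are assumptions made for this sketch.

```python
def instruction_length(opcode):
    """Return the length in bytes of a System/360 instruction from its opcode byte."""
    top_bits = (opcode >> 6) & 0b11
    # 00 -> RR (2 bytes), 01 -> RX (4), 10 -> RS/SI (4), 11 -> SS (6)
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[top_bits]

def split_instructions(code):
    """Split a bytes object holding machine code into individual instructions."""
    pos, out = 0, []
    while pos < len(code):
        length = instruction_length(code[pos])
        out.append(code[pos:pos + length])
        pos += length
    return out
```

For instance, opcode 1A (AR, an RR instruction) is two bytes long, 5A (A, an RX instruction) is four bytes, and D2 (MVC, an SS instruction) is six bytes.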
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S D
4. Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D) 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234 S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234] 10
7. Relative mode: Address of the operand is the content of PC + OpAddr. Example: ADD D $S
8. Indexed mode: Address of the operand is the content of an index register + OpAddr. Example: ADD D 500(S)
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
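Example: Write a program that compares two positive numbers x and y. The output is -1 if x < y, 0 if x = y, and +1 if x > y.
      MOVE R1, x
      MOVE R2, y
      MOVE R3, 0
      SUB  R1, R2
      BN   Less
      BZ   Zero
      MOVE R3, +1
      BR   Done
Less  MOVE R3, -1
      BR   Done
Zero  MOVE R3, 0
Done  HLT
Here BR stands for an unconditional branch. Without the two BR instructions, execution would fall through from one case into the next and R3 would always end up 0.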
Lecture_no11 Address modification using instruction as data ampusing index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an Assembly language -> Machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 - R2
1010001000 00 01 10
Halt the program
1111111111111111
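The bit patterns above are enough to write a toy assembler and simulator for this machine. The sketch below is illustrative only: the names OPCODES, assemble and run are assumptions, SAVE is treated as another spelling of STORE because the earlier listing encodes SAVE R0 6 with the STORE pattern, and operands are expected in the order register first, then memory address.

```python
OPCODES = {            # fixed high-order bits, taken from the patterns above
    "LOAD":  "100000010",
    "STORE": "100000100",
    "MOVE":  "100100010000",
    "ADD":   "1010000100",
    "SUB":   "1010001000",
}

def assemble(line):
    """Translate one assembly line into a 16-character bit string."""
    name, *args = line.split()
    if name == "HALT":
        return "1" * 16
    if name == "SAVE":                     # assumed synonym for STORE
        name = "STORE"
    bits = OPCODES[name]
    for arg in args:
        if arg.startswith("R"):
            bits += format(int(arg[1:]), "02b")    # 2-bit register number
        else:
            bits += format(int(arg), "05b")        # 5-bit memory address
    assert len(bits) == 16, line
    return bits

def run(words, memory):
    """Execute assembled words on a machine with 4 registers and 32 memory cells."""
    regs = [0, 0, 0, 0]
    for word in words:
        if word == "1" * 16:                       # HALT
            break
        if word.startswith(OPCODES["LOAD"]):
            regs[int(word[9:11], 2)] = memory[int(word[11:], 2)]
        elif word.startswith(OPCODES["STORE"]):
            memory[int(word[11:], 2)] = regs[int(word[9:11], 2)]
        elif word.startswith(OPCODES["MOVE"]):
            regs[int(word[12:14], 2)] = regs[int(word[14:], 2)]
        else:                                      # ADD or SUB
            d, a, b = (int(word[i:i + 2], 2) for i in (10, 12, 14))
            if word.startswith(OPCODES["ADD"]):
                regs[d] = regs[a] + regs[b]
            else:
                regs[d] = regs[a] - regs[b]
    return regs
```

Assembling the five lines LOAD R1 5, LOAD R2 5, ADD R0 R1 R2, SAVE R0 6, HALT reproduces the five machine words of the example program, and running them doubles the value in memory location 5 into location 6.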
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; it sets the address at which the next instruction will be stored.
Example:
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, and it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
     LABEL   OPCODE  OPERANDS      COMMENTS
01   ;
02   ; Program to multiply an integer by the constant 6.
03   ; Before execution, an integer must be stored in NUMBER.
04   ;
05           .ORIG   x3050
06           LEA     R2, NUMBER
07           LDW     R2, R2, #0
08           LEA     R1, SIX
09           LDW     R1, R1, #0
0A           AND     R3, R3, #0    ; Clear R3. It will
0B                                 ; contain the product.
0C   ; The inner loop
0D   ;
0E   AGAIN   ADD     R3, R3, R2
0F           ADD     R1, R1, #-1   ; R1 keeps track of
10           BRp     AGAIN         ; the iterations
11   ;
12           HALT
13   ;
14   NUMBER  .BLKW   1
15   SIX     .FILL   x0006
16   ;
17           .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode. The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
LEA R2, NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. Their purpose is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, a pseudo-op is strictly a message to the assembler to help it in the assembly process.

.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.

.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program (after nine instructions and the word reserved for NUMBER) is initialized to the value x0006.

.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e., a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler

Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you can get by speaking English, but you will never feel at home, and life remains complicated. You can get through this way, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.

Assembler
Translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to the symbolic labels used by the programmer.

Disassembler
Translates machine code back to its assembly source.

Cross Assembler
The assembler may run on one machine and produce object code for another.

An assembler is a program that accepts an assembly language program as input and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, externally defined symbols (such as library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
- Translate mnemonic operation codes to their machine language equivalents
- Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine) representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
-> Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
   START - Specify the name and starting address of the program
   END   - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
   BYTE  - Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
   WORD  - Generate a one-word integer constant
   RESB  - Reserve the indicated number of bytes for a data area
   RESW  - Reserve the indicated number of words for a data area
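The storage effect of the four data directives can be sketched as a small function that returns how many bytes each one occupies. The sketch assumes a 3-byte word and the C'...' / X'...' operand notation for BYTE constants; neither is stated in these notes, so both are illustrative assumptions, and the function name is hypothetical.

```python
WORD_SIZE = 3  # assumption: a 3-byte machine word


def directive_length(opcode: str, operand: str) -> int:
    """Return the number of bytes a data directive occupies."""
    if opcode == "WORD":
        return WORD_SIZE
    if opcode == "RESW":
        return WORD_SIZE * int(operand)
    if opcode == "RESB":
        return int(operand)
    if opcode == "BYTE":
        kind, body = operand[0], operand[2:-1]  # e.g. C'EOF' or X'F1'
        if kind == "C":
            return len(body)             # one byte per character
        if kind == "X":
            return (len(body) + 1) // 2  # two hex digits per byte
    raise ValueError(f"unsupported directive: {opcode} {operand}")


print(directive_length("BYTE", "C'EOF'"), directive_length("RESW", "2"))
```

This is the amount by which an assembler would advance its location counter after each directive.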
Lecture_no18 General design procedure of 360-type Assembler
Syntax of assembly language statements

Assembly language statements are entered one statement per line. Each statement has the following format:

[label] mnemonic [operands] [comment]

The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands, or parameters, of the command.
An assembly language statement contains the following fields:

-> Label field - An identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).

Ex-

START LDAA 24 H

Here the label START is equal to the address of the instruction LDAA 24 H.

-> Mnemonic/Opcode field - This field contains a mnemonic, which stands for an operation code or a pseudo-op. A machine code instruction may require additional information, i.e. operands.
-> Operand field - Specifies either the address or the data that the opcode requires.
-> Comment field - Allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of assembly language statements

There are three types of statements:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in:
- Single pass
- Two pass

(Fig: Two-pass assembler. Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object code; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address, and an error indicator.

Pass I
- Separate the symbol, mnemonic op-code, and operand fields
- Determine the storage requirement of every assembly language statement and update the location counter
- Build the symbol table (the table that stores each label and its corresponding value)

Pass II
- Generate object code (perform the actual translation)
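The two passes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the assembler of any real machine: every instruction is assumed to be 3 bytes (a 1-byte opcode followed by a 2-byte address), the three-entry OPTAB and its opcode values are illustrative, and the source is already split into (label, opcode, operand) tuples.

```python
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}  # illustrative opcode values

SOURCE = [
    (None, "LDA", "FIVE"),
    (None, "ADD", "ONE"),
    (None, "STA", "RESULT"),
    ("FIVE", "WORD", "5"),
    ("ONE", "WORD", "1"),
    ("RESULT", "RESW", "1"),
]


def pass1(source, start=0):
    """Assign an address to every statement and build the symbol table."""
    symtab, intermediate, locctr = {}, [], start
    for label, opcode, operand in source:
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate label: {label}")
            symtab[label] = locctr
        intermediate.append((locctr, opcode, operand))
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement using the completed symbol table."""
    object_code = []
    for address, opcode, operand in intermediate:
        if opcode in OPTAB:
            object_code.append((address, f"{OPTAB[opcode]:02X}{symtab[operand]:04X}"))
        elif opcode == "WORD":
            object_code.append((address, f"{int(operand):06X}"))
        # RESW reserves space only, so it produces no object code
    return object_code


symtab, intermediate = pass1(SOURCE, start=0x1000)
for address, code in pass2(symtab, intermediate):
    print(f"{address:04X} {code}")
```

Note that LDA refers to FIVE before FIVE is defined; this works only because pass 2 runs after pass 1 has filled the symbol table.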
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbol, machine encoding of a number for its character representation, and so on. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction or symbol which has not yet been scanned by the assembler). The separate passes of a two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. One-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.

• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless restrictions are imposed
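One common way to cope with forward references in a single pass is to keep the generated code in memory and patch it once the missing label is defined. The sketch below illustrates that idea only; it assumes one-word instructions whose address is simply the instruction index, and the function name and the pending table are hypothetical, not taken from these notes.

```python
def assemble_one_pass(source):
    """source: list of (label, mnemonic, operand) tuples; operand may be None."""
    symtab = {}   # label -> address
    pending = {}  # undefined label -> indexes of instructions waiting for it
    code = []     # [mnemonic, resolved address or None]
    for label, mnemonic, operand in source:
        if label is not None:
            symtab[label] = len(code)
            # Patch every earlier instruction that referred to this label
            for index in pending.pop(label, []):
                code[index][1] = symtab[label]
        if operand is None:
            code.append([mnemonic, None])
        elif operand in symtab:
            code.append([mnemonic, symtab[operand]])
        else:
            # Forward reference: remember where the address must be filled in
            pending.setdefault(operand, []).append(len(code))
            code.append([mnemonic, None])
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code


program = [
    (None, "JMP", "DONE"),    # forward reference
    ("LOOP", "ADD", "LOOP"),  # backward reference
    ("DONE", "HALT", None),
]
print(assemble_one_pass(program))
```

The restriction here is that the whole object program must fit in memory until the end of the pass, which is one of the limitations such designs accept.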
(detailed pass1 assembler flowchart)
Internal data structures
- Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
- Symbol Table (SYMTAB): used to store the values (addresses) assigned to labels
- Location Counter (LOCCTR): a variable used to help in the assignment of addresses. After each statement is processed, LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
- Assign addresses to all statements in the program
- Save the values assigned to all labels for use in Pass 2
- Perform some processing of assembler directives

Pass 2
- Assemble instructions
- Generate data values defined by BYTE, WORD
- Perform processing of assembler directives not done in Pass 1
- Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler

The differences between one-pass and two-pass assemblers are:

A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references, and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.

A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design

Machine-dependent assembler features: instruction formats and addressing modes, program relocation

Machine-independent assembler features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
  Col 1      H
  Col 2-7    Program name
  Col 8-13   Starting address (hex)
  Col 14-19  Length of object program in bytes (hex)

Text record
  Col 1      T
  Col 2-7    Starting address of the object code in this record (hex)
  Col 8-9    Length of object code in this record in bytes (hex)
  Col 10-69  Object code; (69 - 10 + 1) / 6 = 10 instructions

End record
  Col 1      E
  Col 2-7    Address of first executable instruction (hex) (END program_name)
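The column layout above maps directly onto fixed-width string formatting. The following sketch builds the three record types; the function names are hypothetical, and the sample program name and addresses are made up for illustration.

```python
def header_record(name: str, start: int, length: int) -> str:
    # Col 1: H, Col 2-7: name padded or truncated to 6, Col 8-13: start, Col 14-19: length
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_code_hex: str) -> str:
    # Col 10-69 holds at most 60 hex digits (30 bytes of object code)
    if len(object_code_hex) % 2 or len(object_code_hex) > 60:
        raise ValueError("object code must be whole bytes and at most 30 bytes")
    byte_count = len(object_code_hex) // 2
    return f"T{start:06X}{byte_count:02X}{object_code_hex}"


def end_record(first_executable: int) -> str:
    return f"E{first_executable:06X}"


print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x1000, "00100918100C"))
print(end_record(0x1000))
```

Each hex digit occupies one column, so a 3-byte instruction takes six columns, which is where the figure of ten instructions per text record comes from.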
Absolute Program
The program must be loaded at the specified (absolute) address.

Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.

How to solve:
a) Modification record
b) Base relative addressing
c) Program-counter relative addressing

Modification record
  Col 1      M
  Col 2-7    Starting location of the address field to be modified, relative to the beginning of the program
  Col 8-9    Length of the address field to be modified, in half-bytes
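How a loader might apply one modification record can be sketched as follows. The sketch assumes that an address field with an odd number of half-bytes begins in the low-order half of its first byte, so the leading half-byte is preserved; that convention, the function name and the sample bytes are assumptions for illustration.

```python
def apply_modification(program: bytearray, offset: int, half_bytes: int,
                       load_address: int) -> None:
    """Add load_address to the address field described by one M record."""
    n_bytes = (half_bytes + 1) // 2
    field = int.from_bytes(program[offset:offset + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1
    # Keep any bits above the address field (for example a leading flag half-byte)
    updated = (field & ~mask) | ((field + load_address) & mask)
    program[offset:offset + n_bytes] = updated.to_bytes(n_bytes, "big")


# A 4-byte instruction whose last 5 half-bytes (0x01036) hold an address
code = bytearray.fromhex("4B101036")
apply_modification(code, offset=1, half_bytes=5, load_address=0x5000)
print(code.hex().upper())
```

Only the fields named by modification records are touched; everything else in the object program is loaded unchanged.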
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.

A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called macro expansion.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we present one more example to highlight the salient aspects of a macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example

MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
(macro definition)

INCRMT X, Y
(macro call)

LOAD X
ADD Y
STORE X
(macro expansion)

(Fig: Macro program. Source code (with macros) -> Macro processor -> Expanded code -> Compiler or assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module introduces a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the call

INCRMT X, Y

is shown to lead to insertion of the assembly statements

LOAD X

ADD Y

STORE X

in its place. All macro calls in a program are expanded in this fashion.
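The substitution performed for a single call can be sketched in Python. The function name expand_call is hypothetical, and the sketch assumes formal parameters are written as & followed by letters or digits, as in the INCRMT example.

```python
import re


def expand_call(formals, model_statements, actuals):
    """Replace each formal parameter in the model statements by its actual value."""
    binding = dict(zip(formals, actuals))  # paired by position
    return [
        re.sub(r"&\w+", lambda match: binding[match.group(0)], statement)
        for statement in model_statements
    ]


body = ["LOAD &A", "ADD &B", "STORE &A"]
for line in expand_call(["&A", "&B"], body, ["X", "Y"]):
    print(line)
```

Matching whole parameter names with a regular expression avoids accidentally rewriting part of a longer name such as &AB when substituting &A.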
Defining a Macro
Let us take another look at the macro definition unit
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
(macro definition)

INCRMT X, Y
(macro call)

LOAD X
ADD Y
STORE X
(macro expansion)
The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.

The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition:
   MACRO indicates the beginning of a macro definition
   MEND indicates the end of a macro definition (written ENDM in the example above)

-> Prototype for the macro: the macro name and its parameter list. Each parameter starts with &.

-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement is expanded into the statements that form the body of the macro.

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).

In the macro definition they are called parameters; in the macro invocation they are called arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro:

macro_name p1, p2, ...

Difference between macro call and procedure call

Macro call: the statements of the macro body are expanded each time the macro is invoked.

Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.

The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, and so on. Considering the prototype and macro call statements once again:

INCRMT &A, &B   (prototype)

INCRMT X, Y     (macro call)

We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example

STRG DATA1, DATA2, DATA3

GENER ,,DIRECT,,,,,,3

(ii) Keyword parameter

Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.

Example

STRG &a3=DATA1, &a2=DATA2, &a1=DATA3

GENER TYPE=DIRECT, CHANNEL=3
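The difference between the two styles lies only in how each argument is matched to a formal parameter. The sketch below binds both kinds; the function name bind_arguments is hypothetical, and it assumes the call operands have already been split at the commas, with an empty string standing for an omitted positional argument.

```python
def bind_arguments(formals, operands):
    """Pair call operands with formal parameters, by position or by keyword."""
    binding = {}
    position = 0
    for operand in operands:
        operand = operand.strip()
        if "=" in operand:
            keyword, value = operand.split("=", 1)
            name = "&" + keyword.lstrip("&")  # accept TYPE=... or &a3=...
            if name not in formals:
                raise ValueError(f"unknown keyword parameter: {keyword}")
            binding[name] = value
        else:
            if operand:  # an empty operand only advances the position
                binding[formals[position]] = operand
            position += 1
    return binding


formals = ["&a1", "&a2", "&a3"]
print(bind_arguments(formals, ["DATA1", "DATA2", "DATA3"]))
print(bind_arguments(formals, ["&a3=DATA1", "&a2=DATA2", "&a1=DATA3"]))
```

With keywords the order of the operands no longer matters, which is why a call with many optional parameters can name only the ones it needs.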
Features of macro facility
(1) Macro instruction arguments

(2) Macro calls within macros (nested macro calls)

(3) Macro instructions defining macros (macros within macros)

(4) Conditional macro expansion

Lecture_no24 Design of 1-pass macro processor

Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not
Step 1

Scan all macro definitions one by one. For each macro defined:

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT

Step 2

Examine all statements in the assembly source program to detect macro calls. For each macro call:

i) locate the macro in the MNT

ii) obtain information from the MNT regarding the position of the macro definition in the MDT

iii) process the macro call statement to establish the correspondence between all formal parameters and their values (i.e. actual parameters)

iv) expand the macro call by following the procedure given in step 3

Step 3

Process the statements in the macro definition as found in the MDT, in their expansion-time order, until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
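The three steps can be sketched with two small tables. This is an illustration under simplifying assumptions, not a complete macro processor: the MNT here maps each macro name to its starting index in the MDT together with its formal parameter list, AIF and AGO are not handled, and the CLEAR macro is invented purely to show a second table entry.

```python
import re


def build_tables(definition_lines):
    """Step 1: fill the MNT and MDT from the macro definition units."""
    mnt, mdt = {}, []
    i = 0
    while i < len(definition_lines):
        if definition_lines[i] == "MACRO":
            name, _, params = definition_lines[i + 1].partition(" ")
            formals = [p.strip() for p in params.split(",") if p.strip()]
            mnt[name] = (len(mdt), formals)  # where the body starts in the MDT
            i += 2
            while definition_lines[i] != "ENDM":
                mdt.append(definition_lines[i])
                i += 1
            mdt.append("ENDM")
        i += 1
    return mnt, mdt


def expand_program(program_lines, mnt, mdt):
    """Steps 2 and 3: detect macro calls and expand them from the MDT."""
    output = []
    for line in program_lines:
        name, _, args = line.partition(" ")
        if name not in mnt:
            output.append(line)  # ordinary statement, copied unchanged
            continue
        index, formals = mnt[name]
        binding = dict(zip(formals, [a.strip() for a in args.split(",")]))
        while mdt[index] != "ENDM":
            output.append(re.sub(r"&\w+", lambda m: binding[m.group(0)], mdt[index]))
            index += 1
    return output


definitions = [
    "MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
    "MACRO", "CLEAR &R", "LOAD ZERO", "STORE &R", "ENDM",
]
mnt, mdt = build_tables(definitions)
for statement in expand_program(["CLEAR T", "INCRMT X, Y", "HALT"], mnt, mdt):
    print(statement)
```

Storing only an index in the MNT keeps the name lookup cheap while the MDT holds every macro body back to back.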
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine

Macro - the statements of the expansion are generated each time the macro is invoked.

Subroutine - the statements in a subroutine appear only once.

Differences between Macro and Subroutine

After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.

The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called

One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures:
   macro definition: DEFINE
   macro invocation: EXPAND

What are the data structures used by the macro processor?

Data structures
o MACRO DEFINITION TABLE (DEFTAB)
  - Macro prototype
  - Statements that make up the macro body
  - References to parameters are converted to a positional notation

o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Pointers to the beginning and end of the definition in DEFTAB

o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of a macro invocation
  - Arguments are stored in ARGTAB according to their position
(Fig: Macro processor tables. A MACRO definition is entered in NAMTAB and DEFTAB; a macro CALL uses ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions

» Pass 2: Recognize macro calls

» Nested macro definitions are not allowed

Write an algorithm & flowchart for the macro processor and explain.

(Fig: Macro processor procedures PROCESSLINE, DEFINE and EXPAND)

Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.

Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.

Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading:
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory, so the program becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader - In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.

Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines the assembler and loader activities, the combined program occupies a large block of memory.
2) General loader scheme - In this loader scheme, the source program is converted to an object program by some translator (assembler).

Advantages
• The program need not be retranslated each time it is run. When the source program is first translated, an object program is generated. If the program is not modified, the loader can use this object program to convert it to executable form.

3) Absolute loader - An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.

Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without the presence of some other programs in the computer's memory.

For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300, while printf() occupies the area from 100 to 150.

If we were to load these programs at their translated addresses, a lot of the storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.

It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields, not to the movement of a program.

The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments from a single source file). The part of a loader which performs relocation is called the relocating loader.
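Adding a constant to every address field can be sketched as below. The sketch assumes the translator supplies one flag per word saying whether that word holds an address; this per-word flag list, the function name and the sample words are assumptions for illustration.

```python
def relocate(words, is_address, translated_origin, load_origin):
    """Add the relocation constant to every word flagged as an address."""
    constant = load_origin - translated_origin
    return [
        word + constant if flag else word
        for word, flag in zip(words, is_address)
    ]


# Program A was translated with start address 100 but is loaded at 150,
# directly after a routine that occupies 100 to 150.
words = [5, 120, 7, 180]
is_address = [False, True, False, True]
print(relocate(words, is_address, translated_origin=100, load_origin=150))
```

Words that hold plain data (5 and 7 here) are left untouched, which is why relocation needs information from the translator and cannot simply add the constant everywhere.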
Relocating Loader
The general class of relocating loaders was introduced to avoid possible reassembly of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.

The input to a relocating loader, as produced by the assembler, is the object program and information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.

Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer, there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor

Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution

• Dynamic linking

The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is known as dynamic linking, dynamic loading, or load on call
-> Allows several executing programs to share one copy of a subroutine or library

• Linking loader

A linking loader:
» Performs all linking and relocation operations
» Performs automatic library search
» Loads the linked program directly into memory for execution
bullOverlay
Overlays are another technique that dates back to before 1960, and they are still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.

Overlaid programs divide the code into a tree of segments.

A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.

The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.

When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
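The behaviour of the overlay manager on the example tree can be sketched as follows. The class and function names are hypothetical, and a segment is represented only by its name; real overlay managers also read the segment's code from secondary storage.

```python
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]  # the root segment is loaded at program start

    def call(self, target):
        """Ensure the path to target is loaded; return the segments just loaded."""
        path = path_to(target)
        keep = 0
        while (keep < min(len(path), len(self.loaded))
               and path[keep] == self.loaded[keep]):
            keep += 1
        newly_loaded = path[keep:]
        if newly_loaded:
            # Siblings share memory, so the new segments replace the old tail
            self.loaded = self.loaded[:keep] + newly_loaded
        return newly_loaded


manager = OverlayManager()
print(manager.call("B"))  # the root calls into B, so A and B are both loaded
print(manager.call("A"))  # upward call, nothing needs loading
print(manager.call("E"))  # D and E replace A and B
```

An upward call finds its whole path already present, which matches the statement that such calls need no help from the overlay manager.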
(Fig: Overlay tree. ROOT has children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(Fig: Linking loader vs linkage editor. Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)

Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.

Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, execution has to be suspended until the required subroutine is loaded into main memory.

-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader) with respect to the above four functions:

Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader

Subroutine Linkages

• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference, since B is not defined within program A, and will declare it as an error.
Design of Direct linking loader
pass1 -gtAllocate and assign each program location in core -gtCreate a symbol table filling in the values of the external symbols
1input object deck 2Initial Program Load Address(IPLA) 3Program Load Address(PLA) counter 4 External Symbol Directory card(ESD) 5A copy of the input (TXT card)6RLD ( Relocation amp Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Data Structures
• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment's location
• External Symbol Directory (ESD, also called the external symbol dictionary), to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish the correspondence between the ESD IDs used in ESD & RLD cards
Algorithm
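The two passes described above can be sketched as follows. This is only a minimal illustration, not the loader's actual card-level algorithm: the module layout (a dictionary with hypothetical keys name, length, defs, text and rld) is an assumption made for the example, memory is modelled as a Python dictionary, and every RLD entry is assumed to mean "add the address of this symbol to the word at this offset".

```python
def pass1(modules, ipla):
    """Pass 1: assign a load address to every segment and record external symbols."""
    pla = ipla                      # Program Load Address counter starts at IPLA
    symtab = {}                     # external symbol -> assigned core address
    for mod in modules:
        symtab[mod["name"]] = pla   # the segment name is itself an external symbol
        for symbol, offset in mod["defs"].items():
            symtab[symbol] = pla + offset
        pla += mod["length"]        # next segment is placed right after this one
    return symtab


def pass2(modules, symtab):
    """Pass 2: load the text and fix up address constants listed in the RLD."""
    memory = {}
    for mod in modules:
        base = symtab[mod["name"]]
        for offset, word in enumerate(mod["text"]):
            memory[base + offset] = word
        for offset, symbol in mod["rld"]:
            # relocation and linking: add the symbol's assigned address
            memory[base + offset] += symtab[symbol]
    return memory


# Hypothetical input: MAIN holds one external reference (to SUB) and one
# address constant relative to its own origin.
MAIN = {"name": "MAIN", "length": 3, "defs": {},
        "text": [10, 0, 2], "rld": [(1, "SUB"), (2, "MAIN")]}
SUBPROG = {"name": "SUBPROG", "length": 2, "defs": {"SUB": 1},
           "text": [7, 8], "rld": []}

modules = [MAIN, SUBPROG]
symtab = pass1(modules, ipla=100)
memory = pass2(modules, symtab)
```

Pass 1 must finish before pass 2 starts, because MAIN refers to SUB, whose address is only known once SUBPROG has been allocated.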
Binary Symbolic Subroutine Loader-
The assembler assembles each program and provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
– allocation, relocation and linking
• Outputs the text in a file rather than memory
– called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> The language designer's balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as its input. It compares the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
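The three search methods can be sketched on a small symbol table as follows. The table contents, the class name HashTable and the character-sum hash function are assumptions chosen for illustration, and collisions are handled by chaining, which is one common choice rather than the only one.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a table that is sorted by symbol name."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1      # continue in the left subarray
        else:
            low = mid + 1       # continue in the right subarray
    return -1


class HashTable:
    """Slots are numbered from 0; each slot keeps a list (chain) of entries."""

    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]

    def _slot(self, key):
        # Illustrative hash function: sum of character codes modulo table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None


# Hypothetical symbol table, already ordered by symbol name.
symbols = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108), ("LOOP", 112)]
hashed = HashTable()
for name, address in symbols:
    hashed.put(name, address)
```

Linear search needs no ordering, binary search needs the table sorted by key, and the hash table trades extra storage for lookups that usually inspect only one slot.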
Sorting
We will discuss four comparison-sort algorithms
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
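As a rough sketch, the four comparison sorts listed above might look like this. Each function returns a new sorted list and leaves its argument unchanged; the quick sort shown uses the first element as pivot and builds new lists, a simplification chosen for readability rather than the in-place partitioning usually taught.

```python
def selection_sort(items):
    a = list(items)
    for i in range(len(a)):
        # Find the smallest remaining element and swap it into position i.
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a


def insertion_sort(items):
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        # Shift larger elements right to open a gap for the current one.
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
    return a


def merge_sort(items):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    # Merge the two sorted halves.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def quick_sort(items):
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)


data = [29, 3, 72, 44, 3, 15]
```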
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ⇒_G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
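The definition of a single-step derivation can be tried out with a short sketch. The representation is an assumption made for illustration: strings are Python strings with one character per symbol, and P is a list of (α, β) pairs with α non-empty. The function name immediate_derivations is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each way of writing
    mu = sigma + alpha + tau, the result contains sigma + beta + tau.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)   # look for the next occurrence
    return results


# Example grammar (an assumption for this sketch): S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
step1 = immediate_derivations("S", P)
step2 = immediate_derivations("aSb", P)
```

Here step1 holds the strings reachable from S in one step and step2 those reachable from aSb; a string containing no left-hand side, such as aabb, derives nothing further.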
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions
a. Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε
b. If S -> ε is in P, then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol
An addition to the modified grammar of a production rule of the form Bb -> b will result in a non-Type 1 grammar because of a violation of condition (a)
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol
An addition of a production rule of the form aE -> EaE to the grammar will result in a non-Type 2 grammar
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β which is not of the form S -> ε satisfies one of the following conditions
a. β is a terminal symbol
b. β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar
An addition of a production rule of the form A -> Ba or of the form B -> bb to the grammar will result in a non-Type 3 grammar
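The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The sketch below follows the definitions exactly as stated in these notes (each type is a restriction of the previous one). The representation is an assumption: every symbol is a single character, a production is a pair of strings, and the empty string stands for ε. The function name grammar_type and the sample grammars are hypothetical.

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return the most restricted type (0 to 3) whose conditions all hold."""

    def is_start_to_empty(lhs, rhs):
        return lhs == start and rhs == ""

    has_start_to_empty = any(is_start_to_empty(l, r) for l, r in productions)
    start_on_rhs = any(start in r for _, r in productions)
    others = [(l, r) for l, r in productions if not is_start_to_empty(l, r)]

    # Type 1: (a) |alpha| <= |beta| unless the rule is S -> empty,
    #         (b) if S -> empty is present, S never occurs on a right-hand side.
    if not all(len(l) <= len(r) for l, r in others):
        return 0
    if has_start_to_empty and start_on_rhs:
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(l) == 1 and l in nonterminals for l, _ in productions):
        return 1

    # Type 3: beta is a terminal, or a terminal followed by a nonterminal.
    def type3_rhs(rhs):
        if len(rhs) == 1:
            return rhs not in nonterminals
        if len(rhs) == 2:
            return rhs[0] not in nonterminals and rhs[1] in nonterminals
        return False

    return 3 if all(type3_rhs(r) for _, r in others) else 2


N = {"S", "A", "B"}
regular = [("S", "aA"), ("A", "b"), ("S", "")]
context_free = [("S", "aSb"), ("S", "ab")]
context_sensitive = [("S", "aSBc"), ("S", "abc"), ("cB", "Bc"), ("bB", "bb")]
```

Adding B -> bb or A -> Ba to the first grammar drops it to Type 2, and adding Bb -> b to the third drops it to Type 0, in line with the remarks after the examples above.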
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ∈ V, x, v, y ∈ (V ∪ T)*, and v is non-empty
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while the x and y provide the context in which the rule may be applied
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>
The ::= means that the symbol on the left must be replaced with the expression on the right
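A small BNF specification and a sketch of how its rules might be read into a table are shown below. The grammar itself (binary numbers joined by +) and the function name parse_bnf are assumptions made for illustration. Terminals are written in double quotes, a common convention that these notes do not prescribe, and the simple split on | assumes that no terminal contains a vertical bar.

```python
import re

BNF = """
<expr> ::= <term> | <expr> "+" <term>
<term> ::= <digit> | <term> <digit>
<digit> ::= "0" | "1"
"""


def parse_bnf(text):
    """Map each left-hand symbol to its list of alternatives."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # Each alternative is a sequence of <nonterminals> and "terminals".
        alternatives = [re.findall(r'<[^>]+>|"[^"]*"', alt)
                        for alt in right.split("|")]
        rules[left.strip()] = alternatives
    return rules


rules = parse_bnf(BNF)
nonterminals = set(rules)       # symbols that appear on a left side
terminals = {symbol
             for alternatives in rules.values()
             for sequence in alternatives
             for symbol in sequence
             if symbol not in rules}
```

The last two assignments apply the rule stated above: whatever appears on a left side is a nonterminal, and every other symbol is a terminal.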
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no “cross-referencing” is used
Lecture_no 43 Compiler
Compilers are basically “language translators”
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
• In the analysis part, the source program is read and broken into a stream of strings, giving an intermediate form of the source program
• In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program
-> Analysis Part
• The analysis part is carried out in three sub-parts
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning)
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string
– Semantic Analysis
• In this step the meaning of the source string is determined
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
– Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
– Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called the parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens,
the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;
A token is a <token-name,attribute-value> pair. For example
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =
3. The lexeme y is mapped to the token <id,2>
4. The lexeme + is mapped to the token <+>
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table
6. The lexeme ; is mapped to the token <;>
Note that non-significant blanks are normally removed during scanning. In C most blanks are non-significant. Blanks inside strings are an exception
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below)
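The mapping from lexemes to tokens described above can be sketched with a regular expression, which also illustrates that no recursion is needed for scanning. The function name scan, the tuple form of a token and the choice to store a number's value directly in the token are assumptions made for this example; only identifiers, unsigned integers and the symbols =, + and ; are handled.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>[=+;]))")


def scan(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table = []                 # entry i is referred to as index i + 1
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        pos = match.end()
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("number"):
            tokens.append(("number", int(match.group("number"))))
        else:
            # Symbols need no attribute value, so the second component is None.
            tokens.append((match.group("symbol"), None))
    return tokens, symbol_table


tokens, table = scan("x3 = y + 3;")
```

Changing the spacing between lexemes leaves the token stream unchanged, whereas splitting x3 into x 3 produces a different stream, as the examples above suggest.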
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example
x3 = y + 3; would be parsed into the tree on the right
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr      → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator
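A recursive-descent parser for the assignment grammar can be sketched as follows. Since the rule expr → expr + expr is ambiguous and left-recursive, the sketch assumes the equivalent form "operand followed by any number of + operand" and groups to the left. The input is a list of lexemes, and the syntax tree is built from nested tuples with the operator first; both representations, and the name parse_assignment, are assumptions for illustration.

```python
def parse_assignment(lexemes):
    """Parse  id = expr ;  and return a syntax tree of nested tuples."""
    pos = 0

    def peek():
        return lexemes[pos] if pos < len(lexemes) else None

    def expect(expected=None):
        nonlocal pos
        lexeme = peek()
        if lexeme is None or (expected is not None and lexeme != expected):
            raise SyntaxError(f"expected {expected!r}, found {lexeme!r}")
        pos += 1
        return lexeme

    def operand():
        lexeme = expect()
        if lexeme.isdigit():
            return int(lexeme)          # number
        if lexeme.isidentifier():
            return lexeme               # id
        raise SyntaxError(f"unexpected {lexeme!r}")

    def expr():
        tree = operand()
        while peek() == "+":
            expect("+")
            tree = ("+", tree, operand())   # operator is the interior node
        return tree

    target = expect()
    if not target.isidentifier():
        raise SyntaxError(f"expected identifier, found {target!r}")
    expect("=")
    tree = ("=", target, expr())
    expect(";")                             # the statement terminator
    if peek() is not None:
        raise SyntaxError("unexpected input after ';'")
    return tree


tree = parse_assignment(["x3", "=", "y", "+", "3", ";"])
```

The resulting tree has = at the root, the identifier x3 as its left child and the + node as its right child, matching the shape of a syntax tree with operators as interior nodes.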
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators “int to real” and “real to int” as shown on the right
Intermediate code generation
Many compilers first generate code for an “idealized machine”. For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal  temp1  3      --
add        temp2  id2    temp1
realtoint  temp3  temp2  --
assign     id1    temp3  --
Each “quad” has the form
operation target source1 source2
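Producing the quads by walking the semantic tree can be sketched as below. The tree is written as nested tuples (operation, operand, ...), with the conversion operators already inserted by semantic analysis; this representation and the name generate_quads are assumptions for illustration. Unused source fields are simply omitted rather than written as --.

```python
def generate_quads(tree):
    """Walk an assignment tree bottom-up and emit (operation, target, sources...)."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if not isinstance(node, tuple):
            return node                      # leaf: an identifier or a constant
        operation, *children = node
        sources = [walk(child) for child in children]
        target = new_temp()                  # idealized machine: unlimited temps
        quads.append((operation, target, *sources))
        return target

    operation, target, value = tree          # the root is the assignment
    quads.append((operation, target, walk(value)))
    return quads


# Semantic tree for x3 = y + 3; with x3 as id1 (integer) and y as id2 (real).
semantic_tree = ("assign", "id1",
                 ("realtoint", ("add", "id2", ("inttoreal", 3))))
quads = generate_quads(semantic_tree)
```

Because a temporary is allocated only after the children of a node have been walked, the numbering comes out in the same order as in the listing above.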
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
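The two optimizations just described can be sketched on the quad list. This is a deliberately narrow illustration, not a general optimizer: it folds only inttoreal applied to an integer constant, and it merges an assign with the immediately preceding quad on the assumption that the temporary involved is not used anywhere else. The function name optimize and the tuple form of a quad are assumptions.

```python
def optimize(quads):
    # Step 1: perform int-to-real conversion of constants at compile time.
    constants = {}                           # temp name -> folded constant
    folded = []
    for operation, target, *sources in quads:
        sources = [constants.get(s, s) if isinstance(s, str) else s
                   for s in sources]
        if operation == "inttoreal" and isinstance(sources[0], int):
            constants[target] = float(sources[0])
            continue                         # the quad itself disappears
        folded.append((operation, target, *sources))

    # Step 2: combine "op tempN ..." followed by "assign x tempN".
    result = []
    for quad in folded:
        if (quad[0] == "assign" and result
                and result[-1][1] == quad[2]
                and str(quad[2]).startswith("temp")):
            operation, _, *sources = result.pop()
            result.append((operation, quad[1], *sources))
        else:
            result.append(quad)
    return result


original = [
    ("inttoreal", "temp1", 3),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2"),
    ("assign", "id1", "temp3"),
]
optimized = optimize(original)
```

Four quads shrink to two, the same pair of quads that the list above arrives at by hand.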
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2
LD   R1, id2
ADDF R1, R1, 3.0     // add float
RTOI R2, R1          // real to int
ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass
For example, one could have the entire front end as one pass
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the
scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value)
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree)
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A “pass” is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST)
• A pass can correspond to a “phase”, but it does not have to
• Sometimes a single pass corresponds to several phases that are interleaved in time
• What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once
Dependency diagram of a typical Single Pass Compiler
Compiler Driver
      | calls
Syntactic Analyzer
      | calls                  | calls
Contextual Analyzer      Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases
Dependency diagram of a typical Multi Pass Compiler
Compiler Driver
      | calls               | calls                  | calls
Syntactic Analyzer     Contextual Analyzer      Code Generator
input:  Source Text    input:  AST              input:  Decorated AST
output: AST            output: Decorated AST    output: object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing a lot of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translation.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
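The multi-pass arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy language (sums of integers separated by spaces), the tuple-based AST, the made-up stack-machine instructions PUSH and ADD, and all function names are hypothetical and do not come from these notes.

```python
# Hypothetical three-pass compiler for expressions such as "1 + 2 + 3".
# Each pass stores its result in a data structure that the next pass reads.

def syntactic_analyzer(source_text):
    """Pass 1: source text -> AST (nested tuples, left-associative)."""
    tokens = source_text.split()
    ast = ("num", int(tokens[0]))
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+":
            raise SyntaxError("expected '+', got " + tokens[i])
        ast = ("add", ast, ("num", int(tokens[i + 1])))
    return ast

def contextual_analyzer(ast):
    """Pass 2: AST -> decorated AST (each node gains a type annotation)."""
    if ast[0] == "num":
        return ("num", ast[1], "int")
    left = contextual_analyzer(ast[1])
    right = contextual_analyzer(ast[2])
    return ("add", left, right, "int")

def code_generator(decorated):
    """Pass 3: decorated AST -> object code for a made-up stack machine."""
    if decorated[0] == "num":
        return ["PUSH %d" % decorated[1]]
    return code_generator(decorated[1]) + code_generator(decorated[2]) + ["ADD"]

def compiler_driver(source_text):
    """The driver calls each pass in turn and hands over the stored result."""
    ast = syntactic_analyzer(source_text)
    decorated = contextual_analyzer(ast)
    return code_generator(decorated)

object_code = compiler_driver("1 + 2 + 3")
```

A single pass compiler would instead emit PUSH and ADD directly while scanning the tokens, without ever building the intermediate AST.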
Lecture_no47 Surprise Test
THANK YOU
Problem 2)
a) Provide micro-instructions, in the form of a detailed flowchart, for the execution of the following instructions.
b) Show the flow of the micro-instructions within the GPR architecture by highlighting the system buses and the registers used.
i) move X, D0               move contents of memory location X to register D0
ii) move D0, X              move contents of register D0 to memory location X
iii) movea 22(a3), 25(a1)   move contents from memory to memory
iv) bgtw label              branch if greater than to location "label"
Answer 2)
For part b) of each of the above, draw the CPU consisting of all the registers (for example MAR, MBR, PC, IR, Address and Data Registers, ALU) and Memory, and show each of the micro-instructions as in part a).
i) move X, D0
Machine Instruction
0011 0000 0011 1000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]        fetch
IR <-- MBR
| decode |
PC <-- PC + 2         PC points to source address
MAR <-- PC
MBR <-- M[MAR]        address of X in MBR
MAR <-- MBR           move X to MAR
MBR <-- M[MAR]        read contents of location X
D0 <-- MBR
PC <-- PC + 2         PC points to next instruction
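The micro-steps for move X, D0 can be traced with a small register-transfer simulation. This is a sketch only: the memory addresses 0x1000 and 0x2000, the data value 0x00AB and the dictionary-based model of memory and registers are assumptions made for illustration; only the instruction word and the order of the transfers come from the answer above.

```python
# Register-transfer sketch of "move X, D0" on a hypothetical 16-bit machine.
# Memory maps an address to a 16-bit word; addresses and data are invented.
X = 0x2000
memory = {
    0x1000: 0b0011000000111000,  # the instruction word given in the notes
    0x1002: X,                   # extension word: address of X
    X: 0x00AB,                   # contents of location X
}
reg = {"PC": 0x1000, "MAR": 0, "MBR": 0, "IR": 0, "D0": 0}

reg["MAR"] = reg["PC"]
reg["MBR"] = memory[reg["MAR"]]  # fetch
reg["IR"] = reg["MBR"]
# (decode happens here)
reg["PC"] += 2                   # PC points to source address
reg["MAR"] = reg["PC"]
reg["MBR"] = memory[reg["MAR"]]  # address of X in MBR
reg["MAR"] = reg["MBR"]          # move X to MAR
reg["MBR"] = memory[reg["MAR"]]  # read contents of location X
reg["D0"] = reg["MBR"]
reg["PC"] += 2                   # PC points to next instruction
```

After the last step D0 holds the contents of X and PC has advanced past both words of the instruction.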
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]        fetch
IR <-- MBR
| decode |
PC <-- PC + 2         PC points to destination address
MAR <-- PC
MBR <-- M[MAR]        read destination address X
MAR <-- MBR           MAR has X now
MBR <-- D0            source data in MBR
M[MAR] <-- MBR        data in effective address X (which is in MAR)
PC <-- PC + 2         PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
temp <-- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <-- temp
Micro-steps are
MAR <-- PC
MBR <-- M[MAR]        fetch
IR <-- MBR
| decode |
PC <-- PC + 2         PC points to source displacement
MAR <-- PC
MBR <-- M[MAR]        source displacement loaded
MAR <-- [A3 + MBR]    source effective address calculated
MBR <-- M[MAR]        source data in MBR
WR <-- MBR            saved temporarily since MBR is to be reused
PC <-- PC + 2         PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR]        destination displacement loaded
MAR <-- [A1 + MBR]    destination effective address calculated
MBR <-- WR            source data back to MBR
M[MAR] <-- MBR        source data moved to destination
PC <-- PC + 2         PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a) Micro-steps are
MAR <-- PC
MBR <-- M[MAR]        fetch
IR <-- MBR
| interpret, decode |
If the condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR]        get displacement
PC <-- PC + MBR       execute
If the condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit Register Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964 and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, each specified by the address of its leftmost (lowest-address) byte. Within a halfword, fullword, doubleword or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses of these units require aligning them on the corresponding boundaries. In these notes the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
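The big-endian byte order and the address-space sizes mentioned above can be checked with a short Python sketch. The sample value 0x12345678 is arbitrary and the variable names are illustrative only.

```python
# Big-endian layout of a fullword (4 bytes); the value is arbitrary.
value = 0x12345678
fullword = value.to_bytes(4, byteorder="big")

# The lowest-addressed byte (index 0) holds the most significant byte.
most_significant = fullword[0]
least_significant = fullword[3]

# Sizes of the address spaces quoted in the text.
real_bytes = 2 ** 24       # 24-bit addresses
virtual_bytes = 2 ** 32    # 32-bit virtual addresses on the Model 67
```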
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose registers, floating-point registers, control and status registers, data registers, address registers).
Four fields of information:
1) the name field: optional; it must begin with a character from the alphabet and can consist of up to 8 characters
2) opcode: mandatory field; it must be followed by a blank
3) operands: usually represent data that could be in main storage, in registers, or part of the instruction itself
4) comments: used to clarify instructions
(iii) Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
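The packed decimal layout described above (decimal digits followed by a 4-bit sign) can be illustrated with a minimal Python sketch. The function names pack_decimal and unpack_decimal are hypothetical, the encoder always emits the sign codes C and D, and the 16-byte length limit is not enforced here.

```python
def pack_decimal(n):
    """Encode an integer as packed decimal: one digit per nibble, sign nibble last."""
    sign = 0xC if n >= 0 else 0xD          # C = positive, D = negative
    digits = str(abs(n))
    if len(digits) % 2 == 0:               # digits plus sign must fill whole bytes
        digits = "0" + digits
    nibbles = [int(d) for d in digits] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

def unpack_decimal(data):
    """Decode packed decimal, rejecting invalid digit or sign nibbles."""
    nibbles = []
    for b in data:
        nibbles += [b >> 4, b & 0x0F]
    *digits, sign = nibbles
    if sign < 0xA or any(d > 9 for d in digits):
        raise ValueError("invalid packed decimal")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0xB, 0xD) else value
```

Unlike PACK and UNPK, which according to the text do not test for validity, this decoder raises an error on an invalid nibble.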
Floating point numbers are stored only as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent biased by 64). Bits 8-31 (8-63 for a doubleword) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
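As a sketch of how the fullword floating point format is interpreted (sign in bit 0, characteristic biased by 64, hexadecimal fraction in bits 8-31), the following Python function decodes a 32-bit value. The function name decode_short_hfp is hypothetical, and the result is returned as an ordinary Python float.

```python
def decode_short_hfp(word):
    """Decode a 32-bit hexadecimal floating point value given as an integer."""
    sign = -1.0 if (word >> 31) & 1 else 1.0
    characteristic = (word >> 24) & 0x7F          # exponent biased by 64
    fraction = (word & 0x00FFFFFF) / 16 ** 6      # six hex digits, value below 1
    return sign * fraction * 16.0 ** (characteristic - 64)
```

For example, the word 0x41100000 has characteristic 0x41 = 65 and fraction 1/16, so it represents (1/16) x 16^1 = 1.0.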
Lecture_no9 Various instruction format of IBM 360370
(iv) Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R indicates that the data resides in a register, so in this type the operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second is a storage location. If there are 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, the address is specified in 3 pieces (index, base and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I indicates that the operand consists of immediate data (meaning that the data is in the instruction itself). In this type the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S indicates that the data is in storage. Here both operands are in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
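The RX layout above can be made concrete with a small decoder. This is a sketch under stated assumptions: the function names are hypothetical, and two details not spelled out in these notes are taken from the System/360 architecture itself, namely that bytes 2-3 split into a 4-bit base register number and a 12-bit displacement, and that register number 0 in the index or base field means "no register".

```python
def decode_rx(instr):
    """Split a 4-byte RX instruction into its fields."""
    if len(instr) != 4:
        raise ValueError("RX instructions are four bytes long")
    opcode = instr[0]
    r1 = instr[1] >> 4                           # byte 1 bits 0-3
    x2 = instr[1] & 0x0F                         # byte 1 bits 4-7: index register
    b2 = instr[2] >> 4                           # first 4 bits of bytes 2-3: base
    d2 = ((instr[2] & 0x0F) << 8) | instr[3]     # remaining 12 bits: displacement
    return {"opcode": opcode, "r1": r1, "x2": x2, "b2": b2, "d2": d2}

def effective_address(fields, regs):
    """Index + base + displacement, truncated to a 24-bit address."""
    index = regs[fields["x2"]] if fields["x2"] else 0
    base = regs[fields["b2"]] if fields["b2"] else 0
    return (index + base + fields["d2"]) & 0xFFFFFF

# Example with invented register contents.
regs = [0] * 16
regs[2] = 0x100
regs[3] = 0x2000
fields = decode_rx(bytes([0x5A, 0x12, 0x30, 0x10]))
address = effective_address(fields, regs)
```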
Addressing Mode
1 Implied mode: Operand is in a register implied by the OpCode of the instruction. Example: ADD 31
2 Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3 Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4 Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D), 3
5 Direct mode: Absolute address of operand is explicitly specified in the instruction. Example: ADD 1234, S
6 Indirect mode: Absolute address of operand is the content of a memory address. Example: ADD [1234], 10
7 Relative mode: Address of operand is content of PC + OpAddr. Example: ADD D, $S
8 Indexed mode: Address of operand is content of an index register + OpAddr. Example: ADD D, 500(S)
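The addressing modes listed above differ only in how the operand is located. A minimal sketch, assuming a hypothetical machine whose registers and memory are modelled as Python dictionaries (the function name, mode names, and sample contents are all illustrative):

```python
def fetch_operand(mode, regs, memory, pc=0, reg=None, value=0):
    """Return the operand selected by an addressing mode."""
    if mode == "immediate":           # operand is the constant itself
        return value
    if mode == "register":            # operand is in the named register
        return regs[reg]
    if mode == "register_indirect":   # register holds the operand's address
        return memory[regs[reg]]
    if mode == "direct":              # instruction holds the operand's address
        return memory[value]
    if mode == "indirect":            # memory cell holds the operand's address
        return memory[memory[value]]
    if mode == "relative":            # address = PC + OpAddr
        return memory[pc + value]
    if mode == "indexed":             # address = index register + OpAddr
        return memory[regs[reg] + value]
    raise ValueError("unknown addressing mode: " + mode)

# Invented machine state for the examples.
regs = {"S": 4, "D": 100}
memory = {30: 5, 100: 7, 504: 9, 1234: 100}
```

Implied mode is omitted because the register is fixed by the opcode rather than computed from the operand field.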
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
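Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
      MOVE R1, x
      MOVE R2, y
      MOVE R3, 0
      SUB R1, R2
      BN Less
      BZ Zero
      MOVE R3, +1
      BR Done
Less  MOVE R3, -1
      BR Done
Zero  MOVE R3, 0
Done  HLT
The unconditional branches to Done keep execution from falling through into the Less and Zero cases, which would otherwise always leave 0 in R3. BR is assumed here to be the unconditional branch of this instruction set.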
Lecture_no11 Address modification using instruction as data ampusing index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an Assembly Language -> Machine Language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD: contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE: contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE: contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD: contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT: contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 - R2
1010001000 00 01 10
HALT: halt the program
1111111111111111
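The remark that an assembly-to-machine-language translator is easy to write can be checked against the encodings above. The following Python sketch is a minimal translator for the simulated computer; the function name assemble is hypothetical, the mnemonic SUB is assumed for SUBTRACT, and the register-first operand order for STORE/SAVE follows the listing "SAVE R0 6" rather than the exercise solution.

```python
def assemble(line):
    """Translate one assembly statement of the simulated computer into 16 bits."""
    parts = line.split()
    op, args = parts[0], parts[1:]

    def reg(r):                       # "R2" -> "10"
        return format(int(r[1:]), "02b")

    def mem(m):                       # "5" -> "00101"
        return format(int(m), "05b")

    if op == "LOAD":
        return "100000010" + reg(args[0]) + mem(args[1])
    if op in ("STORE", "SAVE"):
        return "100000100" + reg(args[0]) + mem(args[1])
    if op == "MOVE":
        return "100100010000" + reg(args[0]) + reg(args[1])
    if op == "ADD":
        return "1010000100" + "".join(reg(a) for a in args)
    if op == "SUB":
        return "1010001000" + "".join(reg(a) for a in args)
    if op == "HALT":
        return "1" * 16
    raise ValueError("unknown mnemonic: " + op)

program = ["LOAD R1 5", "LOAD R2 5", "ADD R0 R1 R2", "SAVE R0 6", "HALT"]
machine_code = [assemble(statement) for statement in program]
```

The resulting bit strings match the five-instruction machine language program shown earlier in this lecture.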
Lecture_no14 Meaning of Assembly language program types Literals
•USING
•BALR
•START
•END
•BR
•Literal
•LTORG
•BCT
•EQU
Example
START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write
START EQU 10
ORG START
GOTO 10
•ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
•pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
   LABEL   OPCODE  OPERANDS      COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6.
03 ; Before execution, an integer must be stored in NUMBER.
04 ;
05         .ORIG   x3050
06         LEA     R2, NUMBER
07         LDW     R2, R2, #0
08         LEA     R1, SIX
09         LDW     R1, R1, #0
0A         AND     R3, R3, #0    ; Clear R3. It will
0B                               ; contain the product.
0C ; The inner loop
0D ;
0E AGAIN   ADD     R3, R3, R2
0F         ADD     R1, R1, #-1   ; R1 keeps track of
10         BRp     AGAIN         ; the iterations
11 ;
12         HALT
13 ;
14 NUMBER  .BLKW   1
15 SIX     .FILL   x0006
16 ;
17         .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
LEA R2, NUMBER
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1 The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2 The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
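The control flow of the multiply-by-6 program can be mirrored in Python to see why the loop runs six times. This is a sketch of the loop logic only, not an emulator of the machine; the function name multiply_by_six and the use of Python integers in place of 16-bit registers are assumptions made for illustration.

```python
def multiply_by_six(number):
    """Repeated addition, following the register usage of the listing."""
    r2 = number        # value loaded from NUMBER
    r1 = 6             # value loaded from SIX
    r3 = 0             # AND R3, R3, #0 clears the product
    while True:
        r3 += r2       # AGAIN: ADD R3, R3, R2
        r1 -= 1        # ADD R1, R1, #-1
        if r1 <= 0:    # BRp AGAIN branches only while R1 is positive
            break
    return r3
```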
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get by speaking English, but you will never feel at home, and life remains complicated. You can get through like this, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions; they provide information to the assembler.
-> Basic assembler directives:
• START - Specify name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax: Assembly language statements
Assembly language statements are entered one statement per line. Each statement follows the following format:
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) which is to be executed, and the second is the operands or parameters of the command.
An assembly language statement contains the following fields:
-> Label Field - an identifier and an optional field. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode Field - This field contains a mnemonic, which stands for an operation code or pseudo-op, i.e. a machine code instruction. It may require additional information, i.e. operands.
-> Operand Field - specifies either the address or the data that the opcode requires.
-> Comment Field - allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
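Splitting a statement into the four fields above is the first job of any assembler. The following Python sketch shows one way to do it under two assumptions that these notes do not fix: a label is recognised by its trailing colon, and a comment is introduced by a semicolon. The function name parse_statement is hypothetical.

```python
def parse_statement(line):
    """Return (label, mnemonic, operands, comment) for one source line."""
    code, _, comment = line.partition(";")     # assumed comment delimiter
    tokens = code.split()
    label = None
    if tokens and tokens[0].endswith(":"):     # label is suffixed by a colon
        label = tokens.pop(0)[:-1]
    mnemonic = tokens[0] if tokens else None
    operands = " ".join(tokens[1:])
    return label, mnemonic, operands, comment.strip()
```

For instance, parse_statement("START: LDAA 24H ; load A") separates the label START, the mnemonic LDAA, the operand 24H and the comment text.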
Types of Assembly language statements
There are 3 types:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler design can be done in
– Single pass
– Two pass
(Fig Two pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; Pass 1 and Pass 2 both use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one pass assemblers. One pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
•Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references without restrictions
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels.
• Location Counter (LOCCTR): This is a variable that is used to help in the assignment of addresses. After each statement is processed, LOCCTR <-- LOCCTR + (instruction length).
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
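The two passes listed above can be sketched in Python using the three data structures OPTAB, SYMTAB and LOCCTR. This is a minimal illustration, not the flowchart from these notes: the opcode values, the fixed 3-byte instruction length, the tuple form of a source statement and the function names pass1 and pass2 are all assumptions.

```python
# Hypothetical mini-assembler: every instruction is 3 bytes
# (1-byte opcode followed by a 2-byte address). Opcode values are illustrative.
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}

def pass1(source, start=0):
    """Assign addresses to statements and build SYMTAB."""
    symtab, intermediate, locctr = {}, [], start
    for label, mnemonic, operand in source:
        if label:
            if label in symtab:
                raise ValueError("duplicate symbol: " + label)
            symtab[label] = locctr
        intermediate.append((locctr, mnemonic, operand))
        if mnemonic in OPTAB or mnemonic == "WORD":
            locctr += 3
        elif mnemonic == "RESW":
            locctr += 3 * int(operand)
        else:
            raise ValueError("invalid operation code: " + mnemonic)
    return symtab, intermediate

def pass2(symtab, intermediate):
    """Generate object code now that every label has an address."""
    object_code = []
    for locctr, mnemonic, operand in intermediate:
        if mnemonic in OPTAB:
            object_code.append("%02X%04X" % (OPTAB[mnemonic], symtab[operand]))
        elif mnemonic == "WORD":
            object_code.append("%06X" % int(operand))
        # RESW only reserves space and generates no object code
    return object_code

source = [
    ("FIRST", "LDA", "FIVE"),      # forward reference to FIVE
    (None, "ADD", "FIVE"),
    (None, "STA", "RESULT"),       # forward reference to RESULT
    ("FIVE", "WORD", "5"),
    ("RESULT", "RESW", "1"),
]
symtab, intermediate = pass1(source, start=0x1000)
object_code = pass2(symtab, intermediate)
```

Because SYMTAB is complete before pass2 starts, the forward references to FIVE and RESULT need no special treatment.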
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler
The differences between one pass and two pass assemblers are as follows. A one pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
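One common way for a one pass assembler to cope with future label references is to leave the address field empty and patch it when the label is finally defined (often called backpatching). The notes do not prescribe this technique; the sketch below is only an illustration, assuming that every statement occupies one location, that opcodes are kept symbolic, and that the function name one_pass is hypothetical.

```python
def one_pass(source, start=0):
    """Assemble in a single pass, patching forward references as labels appear."""
    symtab, fixups, code, locctr = {}, {}, [], start
    for label, opcode, operand in source:
        if label:
            symtab[label] = locctr
            # Fill in every earlier instruction that was waiting for this label.
            for index in fixups.pop(label, []):
                code[index][1] = locctr
        if operand in symtab:
            code.append([opcode, symtab[operand]])
        else:
            code.append([opcode, None])            # address not known yet
            fixups.setdefault(operand, []).append(len(code) - 1)
        locctr += 1
    if fixups:
        raise ValueError("undefined symbols: " + ", ".join(sorted(fixups)))
    return code

source = [
    (None, "JMP", "DONE"),     # forward reference
    ("LOOP", "JMP", "LOOP"),
    ("DONE", "JMP", "LOOP"),   # backward reference
]
code = one_pass(source)
```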
Assembler Design
• Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
• Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1 H
Col 2~7 Program name
Col 8~13 Starting address (hex)
Col 14~19 Length of object program in bytes (hex)
Text record
Col 1 T
Col 2~7 Starting address in this record (hex)
Col 8~9 Length of object code in this record in bytes (hex)
Col 10~69 Object code; (69-10+1)/6 = 10 instructions
End record
Col 1 E
Col 2~7 Address of first executable instruction (hex) (END program_name)
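The column layout of the three record types maps directly onto fixed-width string formatting. A minimal sketch in Python, assuming hypothetical function names and sample values (program name COPY, start address 1000 hex) chosen only for illustration:

```python
def header_record(name, start, length):
    """Col 1 H, col 2-7 name, col 8-13 start address, col 14-19 length."""
    return "H%-6s%06X%06X" % (name[:6], start, length)

def text_record(start, object_codes):
    """Col 1 T, col 2-7 start address, col 8-9 length in bytes, col 10-69 code."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 60 hex digits")
    return "T%06X%02X%s" % (start, len(body) // 2, body)

def end_record(first_instruction):
    """Col 1 E, col 2-7 address of the first executable instruction."""
    return "E%06X" % first_instruction

records = [
    header_record("COPY", 0x1000, 0x107A),
    text_record(0x1000, ["141033", "482039"]),
    end_record(0x1000),
]
```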
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a Modification record
b Base relative addressing
c Program counter relative addressing
Modification record
Col 1 M
Col 2~7 Starting location of the address field to be modified, relative to the beginning of the program
Col 8~9 Length of the address field to be modified, in half-bytes
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language.
Example
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X, Y
Macro expansion:
LOAD X
ADD Y
STORE X
ENDM Macro Program
(Fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler or Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module introduces a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X, Y
leads to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
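Macro definition and expansion as described above can be sketched in a few lines of Python. This is a simplified illustration, not a full macro processor: the function name expand_program is hypothetical, definitions are assumed to be delimited by lines reading exactly MACRO and ENDM, parameters are matched by position, and substitution is plain text replacement (so one parameter name must not be a prefix of another).

```python
def expand_program(lines):
    """Collect macro definitions, then expand every macro call in the program."""
    lines = [line.strip() for line in lines if line.strip()]
    macros, output, i = {}, [], 0
    while i < len(lines):
        line = lines[i]
        if line == "MACRO":
            # The prototype follows the header: macro name, then formal parameters.
            name, *formals = lines[i + 1].replace(",", " ").split()
            end = lines.index("ENDM", i)
            macros[name] = (formals, lines[i + 2:end])   # model statements
            i = end + 1
            continue
        name, *actuals = line.replace(",", " ").split()
        if name in macros:
            formals, body = macros[name]
            for statement in body:
                for formal, actual in zip(formals, actuals):
                    statement = statement.replace(formal, actual)
                output.append(statement)
        else:
            output.append(line)
        i += 1
    return output

program = [
    "MACRO",
    "INCRMT &A, &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X, Y",
    "HALT",
]
expanded = expand_program(program)
```

The definition itself produces no output; only the call INCRMT X, Y is replaced by the three model statements with X and Y substituted for &A and &B.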
Defining a Macro
Let us take another look at the macro definition unit
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X, Y
Macro expansion:
LOAD X
ADD Y
STORE X
ENDM Macro Program
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition
-> Prototype for the macro
The prototype gives the macro name and the parameter list; each parameter starts with &
-> Body
The statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again
INCRMT &A, &B      prototype
INCRMT X, Y        macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
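The pairing of formal and actual parameters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names bind_arguments and expand_model are hypothetical, operands are assumed to be already split into a list, and an operand written as NAME=VALUE is treated as a keyword parameter while anything else is positional.

```python
def bind_arguments(formals, call_operands):
    """Pair formal parameters with the actual parameters of one macro call.

    formals       -- names from the prototype, e.g. ["&A", "&B"]
    call_operands -- operands as written in the call; "NAME=VALUE" is taken
                     as a keyword parameter, anything else as positional
    """
    binding = {}
    position = 0
    for operand in call_operands:
        if "=" in operand:
            keyword, value = operand.split("=", 1)
            name = keyword if keyword.startswith("&") else "&" + keyword
            if name not in formals:
                raise ValueError(f"unknown keyword parameter {keyword}")
            binding[name] = value
        else:
            if position >= len(formals):
                raise ValueError("too many positional parameters")
            binding[formals[position]] = operand
            position += 1
    return binding


def expand_model(model_statements, binding):
    """Replace every formal parameter in the model statements by its actual."""
    expanded = []
    for statement in model_statements:
        # Longer names are replaced first so "&A1" is not damaged by "&A".
        for name in sorted(binding, key=len, reverse=True):
            statement = statement.replace(name, binding[name])
        expanded.append(statement)
    return expanded
```

With the prototype INCRMT &A, &B and the call INCRMT X, Y, bind_arguments pairs &A with X and &B with Y, and expand_model then yields LOAD X, ADD Y, STORE X. With keyword parameters the order of the operands in the call no longer matters.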
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in MDT
Step 2
Examine all statements in the assembly source program to detect macro calls For each macro call
i) locate the macro in MNT
ii) obtain information from MNT regarding position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in MDT, in their expansion time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion time relations between values of formal parameters and expansion time variables.
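The three steps above can be sketched as follows. This is a minimal illustration, not the processor described in any particular textbook: the helper names split_statement, build_tables and expand_program are hypothetical, the MNT is assumed to be a dictionary and the MDT a list, and the conditional statements AIF and AGO are deliberately not handled.

```python
def split_statement(line):
    """Split 'MNEMONIC op1, op2' into (mnemonic, [operands])."""
    parts = line.replace(",", " ").split()
    return parts[0], parts[1:]


def build_tables(source_lines):
    """Step 1: enter every macro definition into the MNT and the MDT.

    Returns (mnt, mdt, program) where mnt maps a macro name to
    (index of its first model statement in mdt, formal parameter list)
    and program holds the statements that are not part of a definition.
    """
    mnt, mdt, program = {}, [], []
    lines = iter(source_lines)
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text == "MACRO":
            name, formals = split_statement(next(lines))  # prototype
            mnt[name] = (len(mdt), formals)
            for body_line in lines:                       # model statements
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        else:
            program.append(text)
    return mnt, mdt, program


def expand_program(mnt, mdt, program):
    """Steps 2 and 3: replace each macro call by its model statements."""
    output = []
    for line in program:
        mnemonic, operands = split_statement(line)
        if mnemonic not in mnt:
            output.append(line)          # ordinary assembly statement
            continue
        index, formals = mnt[mnemonic]
        binding = dict(zip(formals, operands))
        while mdt[index] != "ENDM":
            statement = mdt[index]
            for formal in sorted(binding, key=len, reverse=True):
                statement = statement.replace(formal, binding[formal])
            output.append(statement)
            index += 1
    return output
```

Feeding in the INCRMT definition followed by the call INCRMT X, Y produces the LOAD, ADD, STORE sequence with X and Y substituted; any statement whose mnemonic is not in the MNT is copied through unchanged.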
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked
Subroutine - the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  - Macro prototype
  - Statements that make up the macro body
  - References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of a macro invocation
  - Arguments are stored in ARGTAB according to their position
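A small Python sketch may help to see how the three tables work together. It is an illustration only: the function names define and expand merely echo the sub-procedure names above, the notation ?1, ?2, ... for positional references is an assumption, and DEFTAB and NAMTAB are modelled as a plain list and dictionary.

```python
def define(deftab, namtab, prototype, body):
    """Enter one macro into DEFTAB and NAMTAB.

    Parameter references in the body are converted to positional
    notation (?1, ?2, ...) before the statements are stored.
    """
    name, *formals = prototype.replace(",", " ").split()
    start = len(deftab)
    deftab.append(prototype)
    # Longest names first, so "&AB" is not damaged when "&A" is replaced.
    numbered = sorted(enumerate(formals, start=1),
                      key=lambda pair: len(pair[1]), reverse=True)
    for statement in body:
        for position, formal in numbered:
            statement = statement.replace(formal, f"?{position}")
        deftab.append(statement)
    deftab.append("MEND")
    namtab[name] = (start, len(deftab) - 1)   # begin and end pointers


def expand(deftab, namtab, call):
    """Expand one macro call using ARGTAB for the actual arguments."""
    name, *argtab = call.replace(",", " ").split()
    start, end = namtab[name]
    expanded = []
    for statement in deftab[start + 1:end]:   # skip prototype and MEND
        # Highest position first, so "?10" is handled before "?1".
        for position in range(len(argtab), 0, -1):
            statement = statement.replace(f"?{position}", argtab[position - 1])
        expanded.append(statement)
    return expanded
```

Storing the body in positional notation means that expansion never has to look up parameter names; it only indexes ARGTAB by position.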
(fig MACRO and CALL statements with the tables NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: recognize macro definitions
» Pass 2: recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain
(fig Macro processor flowchart with the procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into the memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates the space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader - In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed at one part of the memory and the loader simply loads the assembled machine instructions into the memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a wastage of memory. As this scheme is a combination of assembler and loader activities, this combination program occupies a large block of memory.
2) General Loader Scheme - In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader - An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
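Because an absolute loader only copies object code to the addresses already fixed at translation time, it can be sketched very briefly. The record format used here, a list of (address, bytes) pairs, and the function name absolute_load are assumptions made for illustration; real object file formats differ.

```python
def absolute_load(memory, records):
    """Copy each record to the address that the record itself specifies.

    memory  -- a bytearray standing in for main memory
    records -- iterable of (load_address, data_bytes) pairs
    No relocation or linking is performed.
    """
    for address, data in records:
        if address < 0 or address + len(data) > len(memory):
            raise ValueError(f"record at address {address} does not fit in memory")
        memory[address:address + len(data)] = data
    return memory
```

For example, absolute_load(bytearray(16), [(4, b"\x01\x02")]) places the two bytes at addresses 4 and 5 and touches nothing else. All address adjustment is left to the programmer or the assembler, which is exactly the disadvantage noted above.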
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls on certain Input/Output functions like printf(), scanf() etc. which are not written by the programmer himself. During program execution those standard functions must reside in the main memory. Furthermore, every time an Input/Output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
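The adjustment of address fields can be illustrated with a short Python sketch. The representation is an assumption made for the example: the segment is a list of integer words, and relocation_offsets lists the positions of the words that hold addresses; the function name relocate is hypothetical.

```python
def relocate(code, relocation_offsets, translated_origin, load_origin):
    """Return a copy of `code` adjusted for loading at load_origin.

    code               -- list of integer words as produced by the translator
    relocation_offsets -- positions in `code` whose words hold addresses
    The same constant (load_origin - translated_origin) is added to every
    address word; all other words are left unchanged.
    """
    constant = load_origin - translated_origin
    relocated = list(code)
    for offset in relocation_offsets:
        relocated[offset] += constant
    return relocated
```

If program A was translated with origin 200 and is loaded at 151 instead, the constant is -49, so an address word holding 210 becomes 161 while non-address words keep their values.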
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of a relocating assembler, which the relocating loader takes as input, is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
As we turn on the computer there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage to main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic Linking-
The linking function is performed at execution time
-> Postpones the linking function until execution time
» a subroutine is loaded and linked to the rest of the program when it is first called
» also known as dynamic linking, dynamic loading, or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
(fig Overlay: ROOT at the top; A and D below ROOT; B and C below A; E and F below D)
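The behaviour of the overlay manager for the tree above can be sketched in Python. This is a simplified model under stated assumptions: only one path from the root is resident at a time, the class name OverlayManager and the helper path_to are hypothetical, and "loading" is represented by returning the names of the segments that had to be brought in.

```python
# Parent of each segment in the example overlay tree.
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from ROOT down to `segment`, e.g. ROOT, D, E."""
    path = [segment]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return list(reversed(path))


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]          # the path currently in memory

    def call(self, target_segment):
        """Ensure the path to target_segment is loaded.

        Returns the list of segments that had to be loaded for this call.
        """
        wanted = path_to(target_segment)
        if self.loaded[:len(wanted)] == wanted:
            return []                   # upward or same-path call: nothing to do
        keep = 0                        # length of the shared prefix
        while (keep < len(self.loaded) and keep < len(wanted)
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        newly_loaded = wanted[keep:]    # these overwrite their siblings
        self.loaded = wanted
        return newly_loaded
```

Starting from a fresh manager, a call into B loads both A and B, while a later call into E replaces that path by D and E, since A and D occupy the same memory.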
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded in the main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded in the main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader respectively) for the above four functions
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Copy of the object program
IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
PLA counter (Program Load Address), to keep track of each segment location
External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD & RLD cards
ESD (external symbol dictionary)
Data Structures
Algorithm
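The two passes can be sketched in Python as follows. This is a simplified illustration, not the card-oriented algorithm itself: each object module is assumed to be a dictionary with the hypothetical keys name, length, defs (symbols it defines, as offsets), text (a list of integer words) and rld (pairs of word offset and symbol whose address must be added to that word). Memory is modelled as a list of integers.

```python
def pass1(modules, ipla):
    """Assign a load address to every segment and build the symbol table."""
    symbol_table = {}
    pla = ipla                                   # program load address counter
    for module in modules:
        symbol_table[module["name"]] = pla       # segment name is a symbol too
        for symbol, offset in module["defs"].items():
            symbol_table[symbol] = pla + offset
        pla += module["length"]
    return symbol_table


def pass2(modules, symbol_table, memory):
    """Load the text, then relocate and link using the RLD entries."""
    for module in modules:
        base = symbol_table[module["name"]]
        if len(module["text"]) != module["length"]:
            raise ValueError("text size does not match declared length")
        memory[base:base + module["length"]] = module["text"]
        for offset, symbol in module["rld"]:
            # Adding the module's own address relocates an internal address;
            # adding another symbol's address resolves an external reference.
            memory[base + offset] += symbol_table[symbol]
    return memory
```

Pass 1 must see every module before pass 2 starts, because a reference in an early module may name a symbol that is only defined in a later one.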
Binary Symbolic Subroutine Loader-
The assembler provides the loader with
-> the object program plus relocation information
-> prefixed with information about all other programs it references (transfer vector)
-> the length of the entire program
-> the length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
  - allocation, relocation and linking
• It outputs the text to a file rather than to memory
  - called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> The language designer's balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
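The three search methods can be compared on a small symbol table. The sketch below is illustrative only: the table is assumed to be a list of (symbol name, value) pairs, the hash function (sum of character codes modulo the number of slots) is a deliberately simple assumption, and collisions are handled by chaining within a slot.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Search a table ordered by symbol name; return the index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = sorted_table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1        # continue in the left sub-array
        else:
            low = middle + 1         # continue in the right sub-array
    return -1


class HashTable:
    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]   # slot numbers start at 0

    def _slot(self, key):
        """Hash function: map a symbol name to a slot number."""
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (existing, _old) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)          # replace an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None
```

Linear search needs no ordering but may inspect every entry; binary search halves the remaining range at each step but requires an ordered table; hashing goes straight to one slot and only searches the few entries stored there.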
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
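The four algorithms named above can each be written in a few lines of Python. These are textbook-style sketches that return a new sorted list and favour clarity over speed; in particular, the quick sort shown here simply takes the first element as the pivot, which is an assumption of this example rather than a recommendation.

```python
def selection_sort(items):
    """Repeatedly move the smallest remaining item to the front."""
    items = list(items)
    for i in range(len(items)):
        smallest = min(range(i, len(items)), key=items.__getitem__)
        items[i], items[smallest] = items[smallest], items[i]
    return items


def insertion_sort(items):
    """Insert each item into the already sorted part to its left."""
    items = list(items)
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current
    return items


def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left, right = merge_sort(items[:middle]), merge_sort(items[middle:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def quick_sort(items):
    """Partition around a pivot, then sort the two partitions."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)
```

Selection sort and insertion sort take time proportional to the square of the table size in the worst case, whereas merge sort always divides the work in half; quick sort usually behaves like merge sort but degrades when the pivots are poorly chosen.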
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a Production Rule
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ ->G γ (a single step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
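The definition can be turned directly into a short Python function. This is a sketch under simplifying assumptions: every symbol is a single character, so strings of symbols are ordinary Python strings, each production is a pair (α, β) with a non-empty α, and the function name immediate_derivations is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string that can be derived from mu in a single step.

    For each production alpha -> beta and each position where alpha
    occurs in mu, split mu as sigma + alpha + tau and emit
    sigma + beta + tau.
    """
    results = []
    for alpha, beta in productions:
        width = len(alpha)
        for start in range(len(mu) - width + 1):
            if mu[start:start + width] == alpha:
                sigma, tau = mu[:start], mu[start + width:]
                results.append(sigma + beta + tau)
    return results
```

With the two productions S -> aSb and S -> ab, the string aSb immediately derives aaSbb (taking σ = a, τ = b and the first production) and aabb (using the second production).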
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions
a Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b If S → ε is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a)
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language
Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
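The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The sketch below assumes the usual reading of those conditions (|α| ≤ |β| except for S → ε, a single nonterminal on the left for Type 2, and a right-hand side that is a terminal optionally followed by a nonterminal for Type 3). The helper names rule and grammar_type are hypothetical, symbols are single characters, and any symbol not listed as a nonterminal is treated as a terminal.

```python
def rule(lhs, rhs):
    """Build a production from two strings of one-character symbols."""
    return tuple(lhs), tuple(rhs)       # "" becomes (), i.e. epsilon


def grammar_type(nonterminals, productions, start):
    """Return the most restrictive type (3, 2, 1 or 0) the grammar satisfies."""
    def start_to_epsilon(alpha, beta):
        return alpha == (start,) and beta == ()

    # Type 1, condition (a): no rule shrinks the string, except S -> epsilon.
    type1 = all(len(a) <= len(b) or start_to_epsilon(a, b)
                for a, b in productions)
    # Type 1, condition (b): if S -> epsilon exists, S is on no right side.
    if any(start_to_epsilon(a, b) for a, b in productions):
        if any(start in b for _a, b in productions):
            type1 = False
    if not type1:
        return 0
    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a[0] in nonterminals for a, _b in productions):
        return 1

    # Type 3: right side is a terminal, or a terminal then a nonterminal.
    def type3_right_side(beta):
        if len(beta) == 1:
            return beta[0] not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    if all(start_to_epsilon(a, b) or type3_right_side(b)
           for a, b in productions):
        return 3
    return 2
```

For instance, the rules S -> aSb and S -> ab form a Type 2 grammar, while adding a rule such as Bb -> b to a grammar drops it to Type 0 because the right-hand side is shorter than the left-hand side.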
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied
A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages, provided a canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---> Compiler ---> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form is produced. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
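The token list above can be reproduced with a small scanner. The following is only a sketch, assuming a tiny language with identifiers, integer constants and the three symbols used in the example; the names TOKEN_SPEC and tokenize are made up for this illustration and do not come from any real compiler.

```python
import re

# Token kinds and the patterns that recognise them (illustrative assumption).
TOKEN_SPEC = [
    ("constant", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assignment symbol", r"="),
    ("plus sign", r"\+"),
    ("multiplication sign", r"\*"),
    ("skip", r"\s+"),  # blanks separate tokens but are not tokens themselves
]

# One combined pattern; group g0, g1, ... tells us which kind matched.
MASTER = re.compile(
    "|".join(f"(?P<g{i}>{rx})" for i, (_, rx) in enumerate(TOKEN_SPEC))
)


def tokenize(source):
    """Break a source string into a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind = TOKEN_SPEC[int(match.lastgroup[1:])][0]
        if kind != "skip":
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, lexeme in tokenize("total = count + rate * 10"):
        print(kind, lexeme)
```

A real scanner would normally be generated from such a specification or written as a hand-coded state machine; the regular-expression loop is used here only because it is short.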
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so instead of about 210 compilers only about 37 components are needed and the savings are considerable.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
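The mapping from lexemes to <token-name, attribute-value> pairs can be sketched as follows. This is a minimal illustration, not the scanner of any particular compiler: the function name scan, the use of a plain list as the symbol table, and the choice to store the constant's text as its attribute are all assumptions made for the example. Characters outside the tiny alphabet are silently ignored by re.findall, which a real scanner would report as errors.

```python
import re


def scan(source):
    """Return (tokens, symbol_table) for a statement such as 'x3 = y + 3;'."""
    symbol_table = []  # position i-1 holds the identifier with index i
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            # identifiers carry their symbol-table index as the attribute
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))
        else:
            # operators and punctuation need no attribute value
            tokens.append((lexeme, None))
    return tokens, symbol_table


if __name__ == "__main__":
    print(scan("x3 = y + 3;"))
    print(scan("x3   =y+ 3  ;") == scan("x3 = y + 3;"))
    print(scan("x 3 = y + 3;") == scan("x3 = y + 3;"))
```

The last two calls illustrate the point about blanks: extra blanks between lexemes do not change the token stream, but a blank inside x3 does.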
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example
x3 = y + 3
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
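A syntax tree for the example can be built with a small recursive-descent parser. This is a sketch under stated assumptions: the rule expr → expr + expr is ambiguous and left-recursive as written, so the code parses the equivalent form "operand followed by any number of + operand" and treats + as left-associative; the tuple representation of tree nodes and the name parse_assignment are choices made only for this illustration.

```python
import re


def parse_assignment(source):
    """Parse 'id = expr' and return a syntax tree of nested tuples."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+]", source)
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def is_identifier(tok):
        return tok[0].isalpha() or tok[0] == "_"

    def operand():
        tok = next_token()
        if not (tok.isdigit() or is_identifier(tok)):
            raise SyntaxError(f"expected id or number, got {tok!r}")
        return tok

    def expr():
        node = operand()
        # each further '+ operand' becomes the new root: left-associative
        while pos < len(tokens) and tokens[pos] == "+":
            next_token()
            node = ("+", node, operand())
        return node

    target = next_token()
    if not is_identifier(target):
        raise SyntaxError("assignment target must be an identifier")
    if next_token() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


if __name__ == "__main__":
    print(parse_assignment("x3 = y + 3"))
```

In the result the operators = and + are interior nodes and the operands are leaves, which is the shape of the syntax tree described above.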
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
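The walk over the tree that produces these quads can be sketched as below. The tree encoding (nested tuples, integer constants as Python ints), the types dictionary and the function name generate_quads are assumptions made for this illustration; only the + operator and the two types int and real are handled, and None stands for the unused "--" field.

```python
def generate_quads(tree, types):
    """Emit (operation, target, source1, source2) quads for an assignment tree."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def convert(operand, have, want):
        """Insert an inttoreal / realtoint quad when the types differ."""
        if have == want:
            return operand
        temp = new_temp()
        quads.append((f"{have}to{want}", temp, operand, None))
        return temp

    def walk(node):
        """Return (operand, type) for an expression node."""
        if isinstance(node, int):
            return node, "int"
        if isinstance(node, str):
            return node, types[node]
        _, left, right = node  # only '+' nodes are handled in this sketch
        l_op, l_ty = walk(left)
        r_op, r_ty = walk(right)
        ty = "real" if "real" in (l_ty, r_ty) else "int"
        l_op = convert(l_op, l_ty, ty)
        r_op = convert(r_op, r_ty, ty)
        temp = new_temp()
        quads.append(("add", temp, l_op, r_op))
        return temp, ty

    _, target, expr = tree
    value, ty = walk(expr)
    value = convert(value, ty, types[target])
    quads.append(("assign", target, value, None))
    return quads


if __name__ == "__main__":
    tree = ("=", "id1", ("+", "id2", 3))  # x3 = y + 3 with x3 -> id1, y -> id2
    for quad in generate_quads(tree, {"id1": "int", "id2": "real"}):
        print(quad)
```

With id1 declared int and id2 declared real, the four quads come out in the same order as in the listing above.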
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion itself and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
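These two optimizations can be sketched as two small passes over a list of quads. The quad encoding (tuples with None for an unused field) and the name optimize are assumptions for this example, and the second pass relies on a further assumption that each temporary is used only once, which holds for the example but would have to be checked in a real optimizer.

```python
def optimize(quads):
    """Fold constant int-to-real conversions, then merge a final copy."""
    # Pass 1: perform inttoreal on integer constants at compile time and
    # substitute the resulting real constant wherever the temporary was used.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and isinstance(src1, int):
            constants[target] = float(src1)
            continue
        folded.append((op, target, src1, src2))

    # Pass 2: "op tempN ...; assign dest tempN" becomes "op dest ...".
    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if (op == "assign" and result and result[-1][1] == src1
                and str(src1).startswith("temp")):
            prev = result.pop()
            result.append((prev[0], target, prev[2], prev[3]))
        else:
            result.append(quad)
    return result


if __name__ == "__main__":
    example = [
        ("inttoreal", "temp1", 3, None),
        ("add", "temp2", "id2", "temp1"),
        ("realtoint", "temp3", "temp2", None),
        ("assign", "id1", "temp3", None),
    ]
    for quad in optimize(example):
        print(quad)
```

Applied to the four quads of the running example, the result is the two quads named in the list above.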
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, #3.0    add float
RTOI R2, R1          real to int
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                    calls
Contextual Analyzer          Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    calls                    calls                    calls
Syntactic Analyzer           Contextual Analyzer      Code Generator
Input: Source Text           Input: AST               Input: Decorated AST
Output: AST                  Output: Decorated AST    Output: object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
ii) move D0, X
Machine Instruction
0011 1110 0000 0000
Address of X
a) Micro-steps are:
MAR <-- PC
MBR <-- M[MAR]       fetch
IR <-- MBR           decode
PC <-- PC + 2        PC points to destination address
MAR <-- PC
MBR <-- M[MAR]       read destination address X
MAR <-- MBR          MAR has X now
MBR <-- D0           source data in MBR
M[MAR] <-- MBR       data in effective address X (which is in MAR)
PC <-- PC + 2        PC points to next instruction
iii) movea 22(a3), 25(a1)
Machine Instruction
0011 0011 0110 1011
0000 0000 0001 0110
0000 0000 0001 1001
a) Source effective address = contents of a3 + $16
temp <-- M[a3 + $16]
Destination effective address = contents of a1 + $19
M[a1 + $19] <-- temp
Micro-steps are:
MAR <-- PC
MBR <-- M[MAR]       fetch
IR <-- MBR           decode
PC <-- PC + 2        PC points to source displacement
MAR <-- PC
MBR <-- M[MAR]       source displacement loaded
MAR <-- [A3 + MBR]   source effective address calculated
MBR <-- M[MAR]       source data in MBR
WR <-- MBR           saved temporarily, since MBR is to be reused
PC <-- PC + 2        PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR]       destination displacement loaded
MAR <-- [A1 + MBR]   destination effective address calculated
MBR <-- WR           source data back to MBR
M[MAR] <-- MBR       source data moved to destination
PC <-- PC + 2        PC points to next instruction
iv) bgt.w label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a) Micro-steps are:
MAR <-- PC
MBR <-- M[MAR]       fetch
IR <-- MBR           interpret, decode
If the condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR]       get displacement
PC <-- PC + MBR      execute
If the condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit Register Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes), and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword, or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. Within these notes, the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register-
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general-purpose registers, floating-point registers, control & status registers, data registers, address registers).
Four fields of information-
1) the name field: an optional field; it must begin with a character from the alphabet and can consist of up to 8 characters
2) opcode: a mandatory field; it must be followed by a blank
3) operands: usually represent data that could be in main storage, in registers, or part of the instruction itself
4) comments: used to clarify instructions
(iii)Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E, and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
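The packed decimal layout described above can be decoded with a few lines of code. This is an illustrative sketch only; the function name unpack_decimal is made up, and unlike the PACK and UNPK instructions it does reject invalid digit and sign values.

```python
def unpack_decimal(data):
    """Convert 1-16 bytes of packed decimal into a Python int."""
    if not 1 <= len(data) <= 16:
        raise ValueError("packed decimal fields are 1-16 bytes long")
    digits = []
    for byte in data[:-1]:
        digits += [byte >> 4, byte & 0x0F]  # two digits per byte
    digits.append(data[-1] >> 4)            # last byte: one digit ...
    sign = data[-1] & 0x0F                  # ... followed by the 4-bit sign
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError("invalid packed decimal digit or sign")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0xB, 0xD) else value


if __name__ == "__main__":
    print(unpack_decimal(bytes([0x12, 0x3C])))  # digits 1 2 3, sign C
    print(unpack_decimal(bytes([0x12, 0x3D])))  # digits 1 2 3, sign D
```

Because the last byte holds one digit and the sign while every other byte holds two digits, the number of digits is always odd, as stated above.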
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent, biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
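The fullword and doubleword floating point formats can be decoded as in the following sketch. The function name hfp_to_float and its parameter are assumptions for the example; the value represented is taken to be sign × fraction × 16^(characteristic − 64), with the fraction read as a number between 0 and 1.

```python
def hfp_to_float(raw, fraction_bits=24):
    """Decode a hexadecimal floating point value given as an integer.

    fraction_bits is 24 for a fullword and 56 for a doubleword.
    """
    total_bits = 8 + fraction_bits
    sign = -1.0 if (raw >> (total_bits - 1)) & 1 else 1.0   # bit 0
    characteristic = (raw >> fraction_bits) & 0x7F          # bits 1-7
    fraction = (raw & ((1 << fraction_bits) - 1)) / (1 << fraction_bits)
    return sign * fraction * 16.0 ** (characteristic - 64)


if __name__ == "__main__":
    # characteristic 0x42 = 66 gives 16**2; fraction 0x640000 is 0.390625
    print(hfp_to_float(0x42640000))
    print(hfp_to_float(0xC2640000))
    print(hfp_to_float(0x4264000000000000, fraction_bits=56))
```

For 0x42640000 the arithmetic is 0.390625 × 256 = 100, and setting bit 0 gives the same magnitude with a negative sign.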
Lecture_no9 Various instruction formats of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four, or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R indicates that the data resides in a register, so in this format both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If it has 2, the first is a register and the second is a storage location. If it has 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; here, however, the storage address is specified in three pieces (index register, base register, and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I indicates that the operand consists of immediate data (meaning that the data is in the instruction itself). So in this format the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S indicates that there is data in storage. Here both operands have their data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
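The RX layout described above can be taken apart field by field, as in the sketch below. Two details used here are not stated in the notes and are added from general knowledge of the S/360: the two high-order bits of the opcode determine the instruction length, and a base or index field of 0 means "no register". The function names and the dictionary used for the registers are assumptions for the example.

```python
def instruction_length(opcode):
    """Length in bytes, from the two high-order bits of the opcode."""
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[opcode >> 6]


def decode_rx(instr, regs):
    """Split a 4-byte RX instruction into fields and compute its address.

    instr is a sequence of four byte values; regs maps register numbers
    to their current contents.
    """
    opcode, byte1, byte2, byte3 = instr
    r1 = byte1 >> 4                       # byte 1, bits 0-3: register operand
    x2 = byte1 & 0x0F                     # byte 1, bits 4-7: index register
    b2 = byte2 >> 4                       # bytes 2-3: base register ...
    d2 = ((byte2 & 0x0F) << 8) | byte3    # ... and 12-bit displacement
    address = d2
    if x2:                                # field value 0 means "no index"
        address += regs[x2]
    if b2:                                # field value 0 means "no base"
        address += regs[b2]
    return {"opcode": opcode, "r1": r1, "x2": x2, "b2": b2, "d2": d2,
            "address": address & 0xFFFFFF}  # 24-bit addressing


if __name__ == "__main__":
    registers = {12: 0x1000}
    print(decode_rx(bytes([0x5A, 0x10, 0xC0, 0x04]), registers))
```

Here 5A 10 C0 04 has register operand 1, no index register, base register 12 and displacement 4, so with register 12 holding hexadecimal 1000 the storage operand is at hexadecimal 1004.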
Addressing Mode
1. Implied mode: The operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: The operand is a constant value specified in the instruction itself. Example: ADD R 10
3. Register mode: The operand is in a register specified in the instruction itself. Example: ADD S D
4. Register indirect mode: The operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D) 3
5. Direct mode: The absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234 S
6. Indirect mode: The absolute address of the operand is the content of a memory address. Example: ADD [1234] 10
7. Relative mode: The operand address is the content of PC + OpAddr. Example: ADD D $S
8. Indexed mode: The operand address is the content of an index register + OpAddr. Example: ADD D 500(S)
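How each mode locates its operand can be sketched with a small lookup function. Everything here is hypothetical: the mode names, the dictionaries standing in for registers and memory, and the accumulator register ACC used for the implied mode are assumptions made for the illustration, not features of a particular machine.

```python
def fetch_operand(mode, field, regs, mem, pc=0):
    """Return the operand value selected by an addressing mode.

    field is the operand field of the instruction: a constant, a register
    name, an address, or a (register, offset) pair for indexed mode.
    """
    if mode == "implied":
        return regs["ACC"]                 # register fixed by the opcode
    if mode == "immediate":
        return field                       # the constant itself
    if mode == "register":
        return regs[field]
    if mode == "register_indirect":
        return mem[regs[field]]            # register holds the address
    if mode == "direct":
        return mem[field]                  # field is the absolute address
    if mode == "indirect":
        return mem[mem[field]]             # memory holds the address
    if mode == "relative":
        return mem[pc + field]             # address = PC + OpAddr
    if mode == "indexed":
        register, offset = field
        return mem[regs[register] + offset]  # address = index + OpAddr
    raise ValueError(f"unknown addressing mode {mode!r}")


if __name__ == "__main__":
    regs = {"ACC": 1, "S": 4, "D": 8}
    mem = {4: 40, 8: 80, 110: 7, 504: 99, 1234: 8}
    print(fetch_operand("indirect", 1234, regs, mem))
    print(fetch_operand("indexed", ("S", 500), regs, mem))
```

With these sample contents, direct mode on address 1234 yields the value stored there (8), while indirect mode treats that 8 as a further address and yields the contents of location 8.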
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
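Example: Write a program that compares two positive numbers x and y. The output is -1 if x < y, 0 if x = y, and +1 if x > y.
        MOVE R1 x
        MOVE R2 y
        MOVE R3 0
        SUB R1 R2
        BN Less
        BZ Zero
        MOVE R3 +1
        BR Done
Less    MOVE R3 -1
        BR Done
Zero    MOVE R3 0
Done    HLT
(BR is assumed here to be the unconditional branch instruction. Without these two branches, execution would fall through from MOVE R3 +1 into the Less and Zero cases and the result would always be 0.)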
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language        Equivalent assembly
1000000100100101        LOAD R1 5
1000000101000101        LOAD R2 5
1010000100000110        ADD R0 R1 R2
1000001000000110        SAVE R0 6
1111111111111111        HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 - R2
1010001000 00 01 10
Halt the program
1111111111111111
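The instruction formats above are enough to write a small simulator for this computer. The sketch below is one possible reading of them: instructions are kept as 16-character bit strings, the operand order for MOVE, ADD and SUBTRACT follows the "first argument is the destination" convention, and the names run, PROGRAM and field are made up for the illustration.

```python
HALT = "1" * 16

# The five-instruction example program from the notes.
PROGRAM = [
    "1000000100100101",  # LOAD  R1 5
    "1000000101000101",  # LOAD  R2 5
    "1010000100000110",  # ADD   R0 R1 R2
    "1000001000000110",  # STORE R0 6
    "1111111111111111",  # HALT
]


def field(bits):
    """Interpret a string of 0s and 1s as an unsigned number."""
    return int(bits, 2)


def run(program, memory):
    """Execute instructions in order until HALT; return the 4 registers."""
    regs = [0, 0, 0, 0]
    for word in program:
        if len(word) != 16:
            raise ValueError("every instruction is 16 bits")
        if word == HALT:
            break
        if word.startswith("100000010"):       # LOAD  RR MMMMM
            regs[field(word[9:11])] = memory[field(word[11:16])]
        elif word.startswith("100000100"):     # STORE RR MMMMM
            memory[field(word[11:16])] = regs[field(word[9:11])]
        elif word.startswith("100100010000"):  # MOVE  RR RR
            regs[field(word[12:14])] = regs[field(word[14:16])]
        elif word.startswith("1010000100"):    # ADD   RR RR RR
            regs[field(word[10:12])] = (regs[field(word[12:14])]
                                        + regs[field(word[14:16])])
        elif word.startswith("1010001000"):    # SUB   RR RR RR
            regs[field(word[10:12])] = (regs[field(word[12:14])]
                                        - regs[field(word[14:16])])
        else:
            raise ValueError(f"unknown instruction {word}")
    return regs


if __name__ == "__main__":
    memory = [0] * 32
    memory[5] = 21
    print(run(PROGRAM, memory), memory[6])
```

The example program loads memory location 5 into two registers, adds them, and stores the sum in location 6, so it doubles whatever value was placed in location 5 before the run.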
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (base 16). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
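The effect of EQU and ORG on an assembler's symbol table and location counter can be sketched as follows. This is a toy assembler written for the illustration: it knows only the two directives and the GOTO instruction, reads all numbers as hexadecimal, and encodes GOTO as the bit pattern 10 1000 0000 0000 combined with the target address, which is inferred from the single example above rather than taken from a processor manual.

```python
def assemble(lines):
    """Return {address: machine word} for a tiny EQU / ORG / GOTO program."""
    symbols = {}    # label -> value, filled in by EQU
    location = 0    # address at which the next instruction will be stored
    image = {}

    def value(text):
        # a known label is replaced by its value, otherwise parse as hex
        return symbols[text] if text in symbols else int(text, 16)

    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 3 and fields[1] == "EQU":
            symbols[fields[0]] = value(fields[2])   # generates no code
        elif fields[0] == "ORG":
            location = value(fields[1])             # generates no code
        elif fields[0] == "GOTO":
            image[location] = 0x2800 | value(fields[1])
            location += 1
        else:
            raise ValueError(f"cannot assemble {line!r}")
    return image


if __name__ == "__main__":
    for address, word in assemble(["ORG 0", "GOTO 10"]).items():
        print(address, format(word, "014b"))
    print(assemble(["START EQU 10", "ORG START", "GOTO 10"]))
```

Neither directive adds an entry to the output; they only change where the GOTO word ends up, which is address 0 in the first program and the value of START in the second.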
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
   LABEL   OPCODE  OPERANDS       COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6.
03 ; Before execution, an integer must be stored in NUMBER.
04 ;
05         .ORIG   x3050
06         LEA     R2, NUMBER
07         LDW     R2, R2, #0
08         LEA     R1, SIX
09         LDW     R1, R1, #0
0A         AND     R3, R3, #0     ; Clear R3. It will
0B                                ; contain the product.
0C ; The inner loop
0D ;
0E AGAIN   ADD     R3, R3, R2
0F         ADD     R1, R1, #-1    ; R1 keeps track of
10         BRp     AGAIN          ; the iterations
11 ;
12         HALT
13 ;
14 NUMBER  .BLKW   1
15 SIX     .FILL   x0006
16 ;
17         .END
Figure 7.1 An assembly language program

Each line of the program has up to four parts: LABEL, OPCODE, OPERANDS and COMMENTS. Two of the parts (LABEL and COMMENTS) are optional.

Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode of the corresponding machine instruction.

The number of operands depends on the operation being performed. For example, in
AGAIN ADD R3, R3, R2
the operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.

In the instruction
LEA R2, NUMBER
the location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.

A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.

Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:

BRp AGAIN

If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.

Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.

Pseudo-ops (Assembler Directives)

A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.

.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.

.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.

.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e., a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
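To see what the program in the figure computes, its loop can be traced in a high-level language. The sketch below mirrors the register usage of the listing (R1 as the iteration counter, R2 holding NUMBER, R3 accumulating the product). It is an illustration only: it ignores 16-bit overflow, and the function name multiply_by_six is an assumption.

```python
def multiply_by_six(number):
    """Trace of the assembly program: multiply by repeated addition."""
    r2 = number  # LEA/LDW: R2 <- value stored at NUMBER
    r1 = 6       # LEA/LDW: R1 <- value stored at SIX
    r3 = 0       # AND R3, R3, #0 clears R3
    while True:
        r3 += r2         # AGAIN: ADD R3, R3, R2
        r1 -= 1          # ADD R1, R1, #-1
        if r1 <= 0:      # BRp AGAIN branches only while R1 is positive
            break
    return r3            # HALT: the product is left in R3


print(multiply_by_six(7))
```

The loop body runs six times, once for each value of R1 from 6 down to 1, so R3 ends up holding six copies of NUMBER added together.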
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler

Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get by this way, but it is rather inappropriate. If things need to happen in a hurry, you should use the country's language.

Assembler
A program that translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to the symbolic labels used by the programmer.

Disassembler
A program that translates machine code back to its assembly source.

Cross Assembler
An assembler that runs on one machine and produces object code for another.

An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader.
(Fig Assembler)
For example, externally defined symbols (such as library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START - Specify the name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate a one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement has the following format:

[label] mnemonic [operands] [comment]

The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) which is to be executed, and the second is the operands, or parameters, of the command.

An assembly language statement contains the following fields:
-> Label field - an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).

Ex-

START LDAA 24 H

Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field - contains a mnemonic, which stands for an operation code (i.e., a machine code instruction) or a pseudo-op. It may require additional information, i.e., operands.
-> Operand field - specifies either the address or the data that the opcode requires.
-> Comment field - allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
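The statement format above can be illustrated with a small parser. The sketch below makes several assumptions that the notes do not fix: a label is recognised by its trailing colon, a comment starts at a semicolon, and operands are separated by commas. The names Statement and parse_statement are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    mnemonic: str
    operands: List[str]
    comment: Optional[str]


def parse_statement(line):
    """Split one source line into [label] mnemonic [operands] [comment]."""
    # Assumption: everything after ';' is the comment field.
    code, found, comment_text = line.partition(";")
    comment = comment_text.strip() if found else None

    fields = code.split(None, 1)
    if not fields:
        raise ValueError("statement has no mnemonic")

    label = None
    # Assumption: a declared label is suffixed by a colon.
    if fields[0].endswith(":"):
        label = fields[0][:-1]
        fields = fields[1].split(None, 1) if len(fields) > 1 else []
        if not fields:
            raise ValueError("label without a mnemonic")

    mnemonic = fields[0]
    operands = []
    if len(fields) > 1:
        operands = [op.strip() for op in fields[1].split(",") if op.strip()]
    return Statement(label, mnemonic, operands, comment)


print(parse_statement("START: LDAA 24H ; load the accumulator"))
```

Real assemblers differ in how they mark labels and comments (by column position, for instance), so the delimiter choices here would need to be adapted to the assembler at hand.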
Types of Assembly language statements
There are 3 types of assembly language statements:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task

Assembler design can be done in

- Single pass

- Two pass
(Fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address and an error indicator.

Pass I

• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)

Pass II

• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.

• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless restrictions are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB): used to store the values (addresses) assigned to labels
• The Location Counter (LOCCTR): a variable that is used to help in the assignment of addresses. After each statement is processed: LOCCTR <- LOCCTR + (instruction length)
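How the three data structures cooperate in Pass 1 can be sketched as follows. This is a simplified illustration, not the flowchart from the notes: it assumes a SIC-like machine where every instruction and every word is 3 bytes, uses a three-entry OPTAB, and takes already-separated (label, opcode, operand) triples as input. The names pass1 and constant_length are hypothetical.

```python
# Assumed subset of an operation code table (SIC-style opcodes).
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}
WORD_SIZE = 3  # assumption: instructions and words are 3 bytes long


def constant_length(operand):
    """Length in bytes of a BYTE constant such as C'EOF' or X'F1'."""
    kind, text = operand[0], operand[2:-1]
    if kind == "C":
        return len(text)
    if kind == "X":
        return len(text) // 2
    raise ValueError(f"bad BYTE constant: {operand}")


def pass1(source):
    """Assign addresses and build SYMTAB.

    Returns (symtab, intermediate, program_length).
    """
    symtab = {}
    intermediate = []  # (address, label, opcode, operand) for Pass 2
    start = locctr = 0
    for label, opcode, operand in source:
        if opcode == "START":
            start = locctr = int(operand, 16)
            intermediate.append((locctr, label, opcode, operand))
            continue
        intermediate.append((locctr, label, opcode, operand))
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr
        # LOCCTR <- LOCCTR + (length of this statement)
        if opcode in OPTAB or opcode == "WORD":
            locctr += WORD_SIZE
        elif opcode == "RESW":
            locctr += WORD_SIZE * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            locctr += constant_length(operand)
        else:
            raise ValueError(f"invalid operation code: {opcode}")
    return symtab, intermediate, locctr - start


SOURCE = [
    ("SUM", "START", "1000"),
    ("FIRST", "LDA", "ALPHA"),
    (None, "ADD", "BETA"),
    (None, "STA", "GAMMA"),
    ("ALPHA", "WORD", "5"),
    ("BETA", "WORD", "7"),
    ("GAMMA", "RESW", "1"),
    (None, "END", "FIRST"),
]

symtab, intermediate, length = pass1(SOURCE)
for name, address in symtab.items():
    print(f"{name:<6} {address:04X}")
```

Note that ALPHA, BETA and GAMMA are used before they are defined; Pass 1 does not need their values, it only records them in SYMTAB for Pass 2.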
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows.

A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.

A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
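One common way for a one-pass assembler to cope with future label references is to leave the address field empty and patch it once the label is defined. The sketch below illustrates that idea under simplifying assumptions: every statement occupies 3 bytes, operands are either absent or a single label, and the "object code" is just a list of [mnemonic, address] pairs. It is one possible technique, not necessarily the scheme used in the flowchart referred to in these notes.

```python
def one_pass(source, start=0, length=3):
    """Assemble in a single pass, backpatching forward references.

    source is a list of (label, mnemonic, operand) triples.
    Returns (code, symtab), where code is a list of [mnemonic, address].
    """
    symtab = {}    # label -> address
    pending = {}   # undefined label -> indexes in `code` waiting for it
    code = []
    locctr = start
    for label, mnemonic, operand in source:
        if label:
            symtab[label] = locctr
            # The label is now known: fill in every earlier reference.
            for index in pending.pop(label, []):
                code[index][1] = locctr
        if operand is None:
            code.append([mnemonic, None])          # no address field
        elif operand in symtab:
            code.append([mnemonic, symtab[operand]])
        else:
            # Forward reference: remember where the address must go.
            pending.setdefault(operand, []).append(len(code))
            code.append([mnemonic, None])
        locctr += length
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code, symtab


program = [
    (None, "JMP", "DONE"),    # DONE is not yet defined here
    ("LOOP", "ADD", "LOOP"),
    ("DONE", "HLT", None),
]
code, symtab = one_pass(program)
print(code)
```

The price of the single pass is the extra bookkeeping in pending; a two-pass assembler avoids it because SYMTAB is complete before any code is generated.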
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation

Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1      H
Col 2-7    Program name
Col 8-13   Starting address (hex)
Col 14-19  Length of object program in bytes (hex)
Text record
Col 1      T
Col 2-7    Starting address in this record (hex)
Col 8-9    Length of object code in this record in bytes (hex)
Col 10-69  Object code; (69-10+1)/6 = 10 instructions
End record
Col 1      E
Col 2-7    Address of first executable instruction (hex) (END program_name)
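The record layout above translates directly into string formatting. The sketch below builds Header, Text and End records from a list of already-assembled object codes given as hexadecimal strings. The function names and the sample object codes are illustrative assumptions; the column widths follow the layout above, with at most 30 bytes (60 hex digits, columns 10-69) of object code per Text record.

```python
def header_record(name, start, length):
    """H + program name (6 columns) + start address + length, in hex."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_records(start, object_codes, max_bytes=30):
    """Pack object codes (hex strings) into Text records."""
    records = []
    address = start                  # address of the next object code
    current, current_start = "", start
    for code in object_codes:
        nbytes = len(code) // 2
        if current and len(current) // 2 + nbytes > max_bytes:
            # Columns 10-69 are full: emit the record and start a new one.
            records.append(
                f"T{current_start:06X}{len(current) // 2:02X}{current}")
            current, current_start = "", address
        current += code
        address += nbytes
    if current:
        records.append(
            f"T{current_start:06X}{len(current) // 2:02X}{current}")
    return records


def end_record(first_executable):
    """E + address of the first executable instruction, in hex."""
    return f"E{first_executable:06X}"


# Hypothetical object codes for a five-word program starting at 1000 (hex).
codes = ["001009", "18100C", "0C100F", "000005", "000007"]
print(header_record("SUM", 0x1000, 0x12))
for record in text_records(0x1000, codes):
    print(record)
print(end_record(0x1000))
```

A production assembler would also start a new Text record whenever storage is reserved (RESB, RESW), since reserved areas generate no object code; that case is left out here.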
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.

How to solve:
a. Modification record
b. Base relative
c. Program counter relative

Modification record
Col 1    M
Col 2-7  Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9  Length of the address field to be modified, in half-bytes
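A loader can use Modification records as follows: for each record, it adds the actual load address to the address field that the record points at. The sketch below assumes the convention that a field with an odd number of half-bytes begins in the low-order half of its first byte, so the high-order half-byte is left untouched. The function names and the sample bytes are hypothetical.

```python
def parse_modification(record):
    """Return (offset, length_in_half_bytes) from a record like M00000105."""
    if len(record) != 9 or record[0] != "M":
        raise ValueError(f"not a modification record: {record}")
    return int(record[1:7], 16), int(record[7:9], 16)


def relocate(image, records, load_address):
    """Add load_address to every address field named by a record.

    image is a bytearray holding the program as assembled for address 0.
    """
    for record in records:
        offset, half_bytes = parse_modification(record)
        nbytes = (half_bytes + 1) // 2
        field = int.from_bytes(image[offset:offset + nbytes], "big")
        # Only the low-order `half_bytes` half-bytes are the address field.
        mask = (1 << (4 * half_bytes)) - 1
        value = ((field & mask) + load_address) & mask
        field = (field & ~mask) | value
        image[offset:offset + nbytes] = field.to_bytes(nbytes, "big")
    return image


# Hypothetical 4-byte instruction whose last 5 half-bytes hold address 01036.
image = bytearray.fromhex("4B101036")
relocate(image, ["M00000105"], 0x5000)
print(image.hex().upper())
```

Instructions that use base relative or program counter relative addressing need no Modification record, because their address fields hold displacements that do not change when the program is moved.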
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.

A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example

MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
(macro definition)

INCRMT X, Y
(macro call)

LOAD X
ADD Y
STORE X
(macro expansion)

(Fig Macro program)

(Fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.

The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the macro call

INCRMT X, Y

is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place All macro calls in a program are expanded in this fashion
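The substitution just described can be sketched in a few lines. This is a minimal illustration, assuming the macro definition has already been collected into a table (here the hypothetical dictionary MACROS) and that arguments in a call are separated by commas.

```python
import re

# name -> (formal parameters, model statements); assumed to have been
# collected from the MACRO ... ENDM definition unit.
MACROS = {
    "INCRMT": (["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"]),
}


def expand(statement, macros=MACROS):
    """Return the statements that replace `statement`.

    A statement that is not a macro call is returned unchanged.
    """
    name, _, rest = statement.strip().partition(" ")
    if name not in macros:
        return [statement]
    formals, model = macros[name]
    actuals = [a.strip() for a in rest.split(",")] if rest.strip() else []
    if len(actuals) != len(formals):
        raise ValueError(f"{name} expects {len(formals)} arguments")
    binding = dict(zip(formals, actuals))
    # Replace every &NAME in the model statements with its actual value.
    return [
        re.sub(r"&\w+", lambda m: binding.get(m.group(0), m.group(0)), line)
        for line in model
    ]


for line in expand("INCRMT X, Y"):
    print(line)
```

Because the macro body is copied at every call, a program that calls INCRMT ten times contains ten copies of the LOAD-ADD-STORE sequence.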
Defining a Macro
Let us take another look at the macro definition unit
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
(macro definition)

INCRMT X, Y
(macro call)

LOAD X
ADD Y
STORE X
(macro expansion)

(Fig Macro program)
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.

The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition

MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition

-> Prototype for the macro

name MACRO parameter list

MEND

Each parameter starts with &

-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement is expanded into the statements that form the body of the macro.

Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).

In the definition of a macro they are called parameters; in the macro invocation they are called arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro:

macro_name p1, p2, ...

Difference between macro call and procedure call

Macro call: the statements of the macro body are expanded each time the macro is invoked.

Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.

The lists of formal and actual parameters (also called formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:

INCRMT &A, &B   (prototype)

INCRMT X, Y     (macro call)

We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.

Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3

GENER TYPE=DIRECT, CHANNEL=3
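The two ways of pairing arguments with parameters can be compared side by side. The sketch below binds a call's arguments to the formal parameters either by position or by keyword. The formal parameter names &a1, &a2, &a3 are taken from the STRG keyword call above; the function name bind_arguments and the optional defaults table are assumptions.

```python
def bind_arguments(formals, call_args, defaults=None):
    """Pair each formal parameter with an actual value.

    Arguments of the form KEY=VALUE are bound by keyword; all others are
    bound by their position in the call.
    """
    binding = dict(defaults or {})
    position = 0
    for arg in call_args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            name = "&" + key.lstrip("&")   # accept both a3=... and &a3=...
            if name not in formals:
                raise ValueError(f"unknown keyword parameter: {key}")
            binding[name] = value
        else:
            if position >= len(formals):
                raise ValueError("too many positional arguments")
            binding[formals[position]] = arg
            position += 1
    return binding


STRG_FORMALS = ["&a1", "&a2", "&a3"]

# Positional call: STRG DATA1, DATA2, DATA3
print(bind_arguments(STRG_FORMALS, ["DATA1", "DATA2", "DATA3"]))

# Keyword call: STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
print(bind_arguments(STRG_FORMALS, ["&a3=DATA1", "&a2=DATA2", "&a1=DATA3"]))
```

With keyword parameters the order of the arguments no longer matters, which is convenient for macros with many parameters of which only a few are given in a typical call.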
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not

Step 1

Scan all macro definitions one by one. For each macro defined:

i) enter its name in the Macro Name Table (MNT)

ii) store the entire macro definition in the Macro Definition Table (MDT)

iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT

Step 2

Examine all statements in the assembly source program to detect macro calls. For each macro call:

i) locate the macro in the MNT

ii) obtain information from the MNT regarding the position of the macro definition in the MDT

iii) process the macro call statement to establish the correspondence between all formal parameters and their values (i.e., actual parameters)

iv) expand the macro call by following the procedure given in Step 3

Step 3

Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked.

Subroutine - the statements in a subroutine appear only once.

Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.

The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.

One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)

What are the data structures used by the macro processor?
Data structures:
o MACRO DEFINITION TABLE (DEFTAB)
  - holds the macro prototype and the statements that make up the macro body
  - references to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - role: an index to DEFTAB
  - holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - used during the expansion of a macro invocation
  - arguments are stored in ARGTAB according to their position
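The interplay of DEFTAB, NAMTAB and ARGTAB can be sketched as a small one-pass macro processor. The code below is an illustration under stated assumptions, not the algorithm of the flowchart in these notes: definitions are delimited by MACRO and MEND (ENDM is accepted as well), parameters are written ?1, ?2, ... in DEFTAB as the positional notation, and nested definitions or calls are not handled. The function names process, define and expand_call are hypothetical.

```python
import re


def define(lines, deftab, namtab):
    """DEFINE: enter one macro definition into DEFTAB and NAMTAB."""
    prototype = next(lines).strip()
    name, _, params = prototype.partition(" ")
    formals = [p.strip() for p in params.split(",") if p.strip()]
    begin = len(deftab)
    deftab.append(prototype)
    for line in lines:
        line = line.strip()
        if line in ("MEND", "ENDM"):
            deftab.append("MEND")
            break
        # Convert parameter references to positional notation (?1, ?2, ...).
        for position, formal in enumerate(formals, start=1):
            line = re.sub(re.escape(formal) + r"\b", f"?{position}", line)
        deftab.append(line)
    namtab[name] = (begin, len(deftab) - 1)   # pointers to begin and end


def expand_call(line, deftab, namtab):
    """EXPAND: generate the body with arguments taken from ARGTAB."""
    name, _, args = line.strip().partition(" ")
    argtab = [a.strip() for a in args.split(",") if a.strip()]
    begin, end = namtab[name]
    return [
        re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1)) - 1], statement)
        for statement in deftab[begin + 1:end]   # skip prototype and MEND
    ]


def process(source):
    """Read the source once, alternating between DEFINE and EXPAND."""
    deftab, namtab, output = [], {}, []
    lines = iter(source)
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "MACRO":
            define(lines, deftab, namtab)
        elif fields and fields[0] in namtab:
            output.extend(expand_call(line, deftab, namtab))
        else:
            output.append(line)
    return output


SOURCE = [
    "MACRO",
    "INCRMT &A, &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "MEND",
    "INCRMT X, Y",
    "INCRMT P, Q",
]
print(process(SOURCE))
```

Storing the body in positional notation means that expansion needs no name lookup: argument i of the call is simply element i of ARGTAB.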
(Fig Macro processor data structures: NAMTAB, DEFTAB, ARGTAB)
Two-pass algorithm

» Pass 1: recognize macro definitions

» Pass 2: recognize macro calls

» nested macro definitions are not allowed

Write an algorithm & flowchart for the macro processor and explain.
(Fig Macro processor flowchart: PROCESSLINE, DEFINE, EXPAND)

Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.

Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.

Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading:
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. Thus the program becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader - In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.

Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme - In this loader scheme, the source program is converted to an object program by some translator (assembler).

Advantages
• The program need not be retranslated each time it is run. This is because when the source program is first translated, an object program is generated. If the program is not modified, then the loader can make use of this object program to convert it to executable form.

3) Absolute Loader - An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.

Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know the memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.

For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while printf() occupies the area from 100 to 150.

If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.

It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.

The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
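The example of A and printf() both translated at address 100 can be worked through numerically. The sketch below packs the segments one after another starting at a chosen base address and computes, for each, the constant that must be added to its addresses. The sizes assume the address ranges in the text are inclusive (100 to 200 is 101 locations); the function names and the base address are illustrative assumptions.

```python
def plan_load(segments, base):
    """Choose a load address and relocation constant for each segment.

    segments is a list of (name, translated_start, size) tuples.
    Returns {name: (load_start, relocation_constant)}.
    """
    plan = {}
    next_free = base
    for name, translated_start, size in segments:
        # Relocation constant = where it goes - where it was translated for.
        plan[name] = (next_free, next_free - translated_start)
        next_free += size
    return plan


def relocate_address(address, constant):
    """Adjust one address field by the segment's relocation constant."""
    return address + constant


# Both segments were translated with start address 100.
segments = [("A", 100, 101), ("printf", 100, 51)]
plan = plan_load(segments, base=100)
print(plan)

# An address field inside printf that referred to location 120:
print(relocate_address(120, plan["printf"][1]))
```

Here A stays where it was translated (constant 0), while every address field belonging to printf() is increased by 101, so the two no longer overlap and no storage is wasted between them.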
Relocating Loader
The general class of relocating loaders was introduced to avoid possible reassembly of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.

The output of a relocating assembler (the input to the relocating loader) is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.

Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer, there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor

A linkage editor performs linking prior to load time.
» It produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution.

• Dynamic linking

The linking function is performed at execution time.
-> The linking function is postponed until execution time.
» A subroutine is loaded and linked to the rest of the program when it is first called. This is called dynamic linking, dynamic loading or load on call.
-> It allows several executing programs to share one copy of a subroutine or library.

• Linking loader

A linking loader

» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay

Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.

Overlaid programs divide the code into a tree of segments.

A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.

The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.

When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls do not require any linker help, since the entire path from the root is already loaded.
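The behaviour of the overlay manager on the example tree can be sketched as follows. The tree is the one described above (ROOT calls A and D; A calls B and C; D calls E and F). The class and function names are hypothetical, and "loading" is modelled only as bookkeeping of which path currently occupies memory.

```python
# Parent of each segment in the example overlay tree.
PARENT = {
    "ROOT": None,
    "A": "ROOT", "D": "ROOT",
    "B": "A", "C": "A",
    "E": "D", "F": "D",
}


def path_to(segment):
    """Sequence of segments leading from the root to `segment`."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]   # the path currently in memory

    def call(self, target):
        """Ensure the path to `target` is loaded.

        Returns the list of segments that had to be brought into memory.
        """
        wanted = path_to(target)
        # Segments shared by the current path and the wanted path stay put.
        common = 0
        while (common < len(self.loaded) and common < len(wanted)
               and self.loaded[common] == wanted[common]):
            common += 1
        newly_loaded = wanted[common:]
        if newly_loaded:
            # Siblings share memory, so the rest of the old path is overwritten.
            self.loaded = wanted
        return newly_loaded


manager = OverlayManager()
print(manager.call("B"))   # root calls a routine in B
print(manager.call("A"))   # upward call from B to A
print(manager.call("E"))   # call into the other branch of the tree
```

An upward call loads nothing because the whole path from the root is already resident, while a call into a sibling branch replaces everything below the common ancestor.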
(fig: Overlay tree, with ROOT at the top, A and D below ROOT, B and C below A, and E and F below D)
The essential difference between a linkage editor and a linking loader
(fig: Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions:
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error (an undefined symbol) unless B is declared as an external symbol.
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD IDs used in ESD & RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
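The two passes listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the card-based algorithm itself: each object segment is modelled as a dictionary with hypothetical keys `name`, `length`, `defs` (external symbols defined in the segment, as relative addresses), `text` (the words to load) and `rld` (offsets of address constants and the symbol whose address must be added).

```python
def pass1(segments, ipla):
    """Assign a load address to every segment and build the symbol table."""
    symtab = {}
    pla = ipla                            # Program Load Address counter
    for seg in segments:
        symtab[seg["name"]] = pla         # the segment name is an external symbol
        for symbol, relative in seg["defs"].items():
            symtab[symbol] = pla + relative
        pla += seg["length"]              # next segment goes right after this one
    return symtab


def pass2(segments, symtab):
    """Load the text, then relocate and link the address constants."""
    memory = {}
    for seg in segments:
        base = symtab[seg["name"]]
        for offset, word in enumerate(seg["text"]):
            memory[base + offset] = word
        for offset, symbol in seg["rld"]:
            # Relocation and linking are the same operation here:
            # add the absolute address of the named symbol.
            memory[base + offset] += symtab[symbol]
    return memory


# Hypothetical two-segment program.
MAIN = {"name": "MAIN", "length": 3, "defs": {},
        "text": [0, 5, 1], "rld": [(0, "SUB"), (2, "MAIN")]}
SUB = {"name": "SUB", "length": 2, "defs": {"ENTRY": 1},
       "text": [7, 8], "rld": []}

symbols = pass1([MAIN, SUB], ipla=100)
core = pass2([MAIN, SUB], symbols)
print(symbols)
print(core)
```

With an IPLA of 100, MAIN is placed at 100 and SUB at 103; the word at offset 0 of MAIN becomes the address of SUB, and the word at offset 2 is relocated relative to MAIN itself.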
Binary Symbolic Subroutine Loader-
The assembler assembles and provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text in a file rather than memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are equal; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
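The three search methods described above can be sketched in Python for a small symbol table. The table contents, the character-sum hash function and the use of chaining (a list per slot) to handle collisions are assumptions made for illustration; the notes do not prescribe any of them.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(table, key):
    """Search a table that is sorted by symbol name; return its index or -1."""
    low, high = 0, len(table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1      # continue in the left sub-array
        else:
            low = middle + 1       # continue in the right sub-array
    return -1


class HashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]

    def _hash(self, key):
        # Assumed hash function: sum of character codes modulo the table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._hash(key)]
        for i, (existing, _old) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)   # replace an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._hash(key)]:
            if existing == key:
                return value
        return None


# Hypothetical symbol table, sorted by symbol name.
SYMBOLS = [("ALPHA", 100), ("BETA", 104), ("LOOP", 112), ("START", 0)]

hashed = HashTable()
for name, address in SYMBOLS:
    hashed.put(name, address)

print(linear_search(SYMBOLS, "LOOP"), binary_search(SYMBOLS, "LOOP"), hashed.get("LOOP"))
```

Linear search needs no ordering, binary search relies on the table being sorted, and the hash table trades extra storage for lookups that go straight to one slot.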
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that contains only terminal symbols.
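The definition of a single-step derivation can be turned into a short Python sketch. The grammar below (start symbol written as `S`, productions `S -> aSb` and `S -> ab`) is a hypothetical example, every symbol is assumed to be one character, and the length bound in `sentential_forms` is only a device to keep the enumeration finite for grammars whose rules never shrink a string.

```python
def derive_step(form, productions):
    """Return every string obtained from `form` by one production.

    Writing form = sigma + alpha + tau, each result is sigma + beta + tau
    for some production (alpha, beta).
    """
    results = set()
    for alpha, beta in productions:
        width = len(alpha)
        for i in range(len(form) - width + 1):
            if form[i:i + width] == alpha:
                results.add(form[:i] + beta + form[i + width:])
    return results


def sentential_forms(start, productions, max_len):
    """Collect all sentential forms of length <= max_len reachable from start."""
    seen = {start}
    pending = [start]
    while pending:
        form = pending.pop()
        for derived in derive_step(form, productions):
            if len(derived) <= max_len and derived not in seen:
                seen.add(derived)
                pending.append(derived)
    return seen


# Hypothetical grammar: S -> aSb | ab, with S the only nonterminal.
PRODUCTIONS = [("S", "aSb"), ("S", "ab")]

forms = sentential_forms("S", PRODUCTIONS, max_len=4)
sentences = {form for form in forms if "S" not in form}
print(sorted(forms), sorted(sentences))
```

Here `aSb` is a sentential form but not a sentence, because it still contains the nonterminal `S`, whereas `ab` and `aabb` contain only terminal symbols.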
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β that is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
(fig: Hierarchy of grammars)
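The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The following Python sketch is one possible reading of those conditions; the function name `grammar_type`, the representation of productions as pairs of strings, the use of `""` for the empty string and the sample grammars are all assumptions for illustration, and every symbol is assumed to be a single character.

```python
def grammar_type(nonterminals, productions, start):
    """Return the most restrictive type (0 to 3) whose conditions hold.

    productions is a list of (alpha, beta) pairs of strings.
    """
    def is_start_to_empty(alpha, beta):
        return alpha == start and beta == ""

    has_start_to_empty = any(is_start_to_empty(a, b) for a, b in productions)

    # Type 1, condition (a): no rule shrinks the string, except S -> empty.
    for alpha, beta in productions:
        if not is_start_to_empty(alpha, beta) and len(alpha) > len(beta):
            return 0
    # Type 1, condition (b): if S -> empty exists, S is on no right-hand side.
    if has_start_to_empty and any(start in beta for _a, beta in productions):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    for alpha, _beta in productions:
        if len(alpha) != 1 or alpha not in nonterminals:
            return 1

    # Type 3: right-hand side is a terminal, or a terminal then a nonterminal.
    for alpha, beta in productions:
        if is_start_to_empty(alpha, beta):
            continue
        terminal_only = len(beta) == 1 and beta not in nonterminals
        terminal_then_nonterminal = (len(beta) == 2
                                     and beta[0] not in nonterminals
                                     and beta[1] in nonterminals)
        if not (terminal_only or terminal_then_nonterminal):
            return 2
    return 3


N = {"S", "A", "B", "E"}
REGULAR = [("S", "aA"), ("A", "b")]           # hypothetical Type 3 grammar

print(grammar_type(N, REGULAR, "S"))
print(grammar_type(N, REGULAR + [("A", "Ba")], "S"))
print(grammar_type(N, [("S", "aE"), ("aE", "EaE"), ("E", "a")], "S"))
print(grammar_type(N, [("S", "Bb"), ("Bb", "b")], "S"))
```

The last three calls correspond to the kinds of added rules mentioned in the examples: A → Ba breaks the Type 3 conditions, aE → EaE breaks the Type 2 condition, and Bb → b breaks condition (a) of Type 1.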
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
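As a small illustration of the notation, the following Python sketch reads rules written in BNF and separates terminals from nonterminals using the criterion stated above (a symbol that never appears on a left side is a terminal). The function names and the sample grammar are hypothetical, and the sketch assumes one rule per line, symbols separated by spaces, and no terminal that contains the characters `|`, `<` or `>`.

```python
import re


def parse_bnf(text):
    """Map each nonterminal to its list of alternatives (lists of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = []
        for alternative in right.split("|"):
            # A symbol is either <name> or a run of other non-blank characters.
            alternatives.append(re.findall(r"<[^<>]+>|[^\s<>]+", alternative))
        rules.setdefault(left.strip(), []).extend(alternatives)
    return rules


def terminals(rules):
    """Symbols used on a right side that never appear on a left side."""
    used = {symbol
            for alternatives in rules.values()
            for alternative in alternatives
            for symbol in alternative}
    return used - set(rules)


# Hypothetical grammar for sums of binary digits.
GRAMMAR = """
<expr> ::= <term> | <expr> + <term>
<term> ::= 0 | 1
"""

rules = parse_bnf(GRAMMAR)
print(rules)
print(sorted(terminals(rules)))
```

For the sample grammar the nonterminals are `<expr>` and `<term>`, and the terminals are `+`, `0` and `1`.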
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages.
If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---> Compiler ---> Target program
Compiler Analysis-Synthesis Model -
• In the analysis part, the source program is read and then broken into a stream of strings.
• In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name,attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
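The mapping from lexemes to tokens can be sketched with a regular expression in Python, which also illustrates that no recursion is needed for scanning. This is an assumed, minimal scanner: the names `TOKEN_RE` and `scan` are hypothetical, the symbol table is just a list of identifier names, numbers keep their text as the attribute value, and a `;` statement terminator is accepted as in C.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>[=+*;]))")


def scan(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table = []      # entry i (counting from 1) holds the i-th identifier
    tokens = []
    source = source.rstrip()
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character at position {position}")
        position = match.end()
        if match.lastgroup == "id":
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.lastgroup == "number":
            tokens.append(("number", match.group("number")))
        else:
            # Operators and punctuation need no attribute value.
            tokens.append((match.group("symbol"), None))
    return tokens, symbol_table


tokens, table = scan("x3   =y+ 3  ;")
print(tokens)
print(table)
```

Differently spaced versions of the statement give the same token stream, while `x 3 = y + 3;` does not, because `x` and `3` are then scanned as two separate lexemes.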
Syntax Analysis (or Parsing)
Parsing involves a further grouping, in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
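A syntax tree for an assignment such as x3 = y + 3 can be built by a small hand-written parser. The Python sketch below is an assumption-laden illustration rather than the method of any particular compiler: the tree is represented as nested tuples `(operator, left, right)`, the `+` operator is treated as left-associative (the grammar above leaves this open), a trailing `;` is required, and the simple tokenizer silently ignores characters it does not recognise.

```python
import re

SYMBOLS = {"=", "+", ";"}


def tokenize(source):
    """Split the source into identifiers, numbers and the symbols = + ;"""
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source)


def parse_assignment(source):
    """Parse 'id = expr ;' and return the syntax tree as nested tuples."""
    tokens = tokenize(source)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        token = peek()
        position += 1
        return token

    def expect(symbol):
        if advance() != symbol:
            raise SyntaxError(f"expected {symbol!r}")

    def operand():
        token = advance()
        if token is None or token in SYMBOLS:
            raise SyntaxError("expected an identifier or a number")
        return token

    def expr():
        # expr -> operand { + operand }, grouping to the left.
        tree = operand()
        while peek() == "+":
            advance()
            tree = ("+", tree, operand())
        return tree

    target = operand()
    if target[0].isdigit():
        raise SyntaxError("the left side of = must be an identifier")
    expect("=")
    tree = ("=", target, expr())
    expect(";")
    return tree


print(parse_assignment("x3 = y + 3;"))
```

The operators end up as interior nodes and the operands as their children, which is the shape of the syntax tree described above.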
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversion where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
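Walking the semantic tree to produce quads can be sketched in Python as follows. The tuple representation of the tree, the function name `generate_quads` and the `tempN` naming scheme are assumptions chosen to reproduce the sequence shown above; unused source fields are filled with `--`.

```python
def generate_quads(tree):
    """Walk a semantic tree and return quads (operation, target, source1, source2)."""
    quads = []
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp_count += 1
        return f"temp{temp_count}"

    def walk(node):
        if isinstance(node, str):
            return node                           # leaf: identifier or constant
        operation, *children = node
        sources = [walk(child) for child in children]   # operands first
        target = new_temp()
        sources += ["--"] * (2 - len(sources))    # pad unary operations
        quads.append((operation, target, *sources))
        return target

    _assign, target, value = tree                 # top level: ("=", target, expr)
    quads.append(("assign", target, walk(value), "--"))
    return quads


# Semantic tree for x3 = y + 3 with conversions inserted
# (id1 is x3, an integer; id2 is y, a real).
TREE = ("=", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))

for quad in generate_quads(TREE):
    print(*quad)
```

Because operands are generated before the operation that uses them, each temporary is defined before it appears as a source.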
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
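These two optimizations can be applied mechanically to a list of quads. The Python sketch below is a simplified illustration with a hypothetical `optimize` function; it assumes that each temporary is used only once and that names of temporaries start with `temp`, which holds for the example but would need proper data-flow analysis in general.

```python
def optimize(quads):
    """Apply constant conversion folding and final-assignment merging."""
    # 1. Perform int-to-real conversions of constants at compile time.
    constants = {}
    folded = []
    for operation, target, source1, source2 in quads:
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        if operation == "inttoreal" and source1.isdigit():
            constants[target] = str(float(int(source1)))   # "3" becomes "3.0"
        else:
            folded.append((operation, target, source1, source2))

    # 2. Merge "op tempN ..." followed by "assign x tempN" into "op x ...".
    result = []
    for quad in folded:
        operation, target, source1, _source2 = quad
        if (operation == "assign" and result
                and source1.startswith("temp")
                and result[-1][1] == source1):
            previous = result.pop()
            result.append((previous[0], target) + previous[2:])
        else:
            result.append(quad)
    return result


QUADS = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]

for quad in optimize(QUADS):
    print(*quad)
```

The four quads of the example shrink to two, matching the hand optimization described above.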
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0     // add float
RTOI R2, R1          // real to int
ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What passes, and how many, a compiler makes over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
(Fig: Dependency diagram of a typical single pass compiler. The Compiler Driver calls the Syntactic Analyzer, which in turn calls the Contextual Analyzer and the Code Generator.)
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
(Fig: Dependency diagram of a typical multi pass compiler. The Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer, and the Code Generator in turn. Source text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> decorated AST -> Code Generator -> object code.)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Micro-steps are:
MAR <-- PC
MBR <-- M[MAR]        fetch
IR <-- MBR            decode
PC <-- PC + 2         PC points to source displacement
MAR <-- PC
MBR <-- M[MAR]        source displacement loaded
MAR <-- [A3 + MBR]    source effective address calculated
MBR <-- M[MAR]        source data in MBR
WR <-- MBR            saved temporarily since MBR is to be reused
PC <-- PC + 2         PC points to destination displacement
MAR <-- PC
MBR <-- M[MAR]        destination displacement loaded
MAR <-- [A1 + MBR]    destination effective address calculated
MBR <-- WR            source data back to MBR
M[MAR] <-- MBR        source data moved to destination
PC <-- PC + 2         PC points to next instruction
iv) bgtw label
Machine Instruction
0110 1110 0000 0000 ---- displacement ----
a) Fetch and decode:
MAR <-- PC
MBR <-- M[MAR]        fetch
IR <-- MBR            interpret, decode
If the condition is TRUE:
PC <-- PC + 2
MAR <-- PC
MBR <-- M[MAR]        get displacement
PC <-- PC + MBR       execute
If the condition is FALSE:
PC <-- PC + 4
Lecture_no8 Memory unit, Register, Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i)Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes), and 2048-byte storage block, each specified by the address of its leftmost (lowest-address) byte. Within a halfword, fullword, doubleword, or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. In these notes, the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register
Registers are areas of storage that are set aside in the processing unit.
There are 16 registers in the 370 system (general purpose register, floating-point register, control & status register, data register, address register).
An assembly language statement has four fields of information:
1) Name field: optional; must begin with an alphabetic character and can consist of up to 8 characters.
2) Opcode: mandatory; must be followed by a blank.
3) Operands: usually represent data that could be in main storage, in registers, or part of the instruction itself.
4) Comments: used to clarify instructions.
(iii)Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E, and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
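The packed decimal layout described above can be illustrated with a short Python sketch. The function names pack_decimal and unpack_decimal are made up for this example, the choice of C and D as the generated signs is an assumption (any of the valid sign codes listed above would do), and the 1-16 byte length limit is not enforced here.

```python
def pack_decimal(value: int) -> bytes:
    """Encode an integer as packed decimal: two digits per byte, sign nibble last."""
    digits = str(abs(value))
    if len(digits) % 2 == 0:
        digits = "0" + digits          # pad so the digit count is odd
    sign = "D" if value < 0 else "C"   # assumed preferred sign codes
    return bytes.fromhex(digits + sign)


def unpack_decimal(data: bytes) -> int:
    """Decode packed decimal, accepting every sign code named in the text."""
    text = data.hex().upper()
    digits, sign = text[:-1], text[-1]
    if not digits.isdigit() or sign not in "ABCDEF":
        raise ValueError("invalid packed decimal field")
    magnitude = int(digits)
    return -magnitude if sign in "BD" else magnitude
```

For instance, 123 occupies two bytes (hex 12 3C), and -45 is padded with a leading zero digit to give hex 04 5D.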
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent biased by 64). Bits 8-31 (8-63 for a doubleword) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
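As a worked illustration of the fullword format, the following Python sketch decodes a 32-bit hexadecimal floating point value as sign x fraction x 16^(characteristic - 64). The function name decode_hex_float32 is hypothetical, and the sketch covers only the fullword (short) format.

```python
def decode_hex_float32(word: int) -> float:
    """Decode a 32-bit hexadecimal floating point fullword into a Python float."""
    sign = -1.0 if (word >> 31) & 1 else 1.0      # bit 0: sign
    characteristic = (word >> 24) & 0x7F          # bits 1-7: exponent + 64
    fraction = (word & 0xFFFFFF) / float(1 << 24) # bits 8-31: 0.xxxxxx in hex
    return sign * fraction * 16.0 ** (characteristic - 64)
```

For example, the word 42640000 (hex) has characteristic 42 hex = 66, so the exponent is 2, and fraction 0.64 hex = 100/256; the value is (100/256) x 16^2 = 100.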
Lecture_no9 Various instruction formats of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four, or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R indicates that an operand resides in a register, so in this format both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second is a storage location. If there are 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, the storage address is specified in 3 pieces (index, base, and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I indicates that the operand consists of immediate data (meaning that the data is in the instruction itself). In this format the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S indicates that there is data in storage. Here both operands have data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
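To make the RX layout concrete, here is a minimal Python sketch that splits a four-byte RX instruction into its fields and computes the effective address. The function names are invented for this example. Two details are assumptions not spelled out in the notes above: the base-and-displacement halfword is taken to be a 4-bit base register number followed by a 12-bit displacement, and register number 0 in the index or base position is taken to mean "no register". Addresses are truncated to 24 bits, matching the 2^24-byte memory mentioned earlier.

```python
def decode_rx(instruction: bytes) -> dict:
    """Split a four-byte RX instruction into opcode, R1, X2, B2 and D2."""
    if len(instruction) != 4:
        raise ValueError("RX instructions are four bytes long")
    opcode, byte1, byte2, byte3 = instruction
    return {
        "opcode": opcode,                     # byte 0
        "r1": byte1 >> 4,                     # byte 1 bits 0-3
        "x2": byte1 & 0x0F,                   # byte 1 bits 4-7 (index register)
        "b2": byte2 >> 4,                     # base register number
        "d2": ((byte2 & 0x0F) << 8) | byte3,  # 12-bit displacement
    }


def effective_address(fields: dict, registers: list) -> int:
    """Effective address = base + index + displacement (24-bit)."""
    index = registers[fields["x2"]] if fields["x2"] else 0  # 0 means no index
    base = registers[fields["b2"]] if fields["b2"] else 0   # 0 means no base
    return (base + index + fields["d2"]) & 0xFFFFFF
```

With the bytes 58 34 50 64 (hex), the fields are R1 = 3, X2 = 4, B2 = 5, and displacement 064 hex; if register 4 holds 10 hex and register 5 holds 2000 hex, the effective address is 2074 hex.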
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 3
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S D
4. Register indirect mode: Operand is at a memory address that is the content of a register specified by the instruction itself. Example: ADD (D) 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234 S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234] 10
7. Relative mode: Content of PC + OpAddr. Example: ADD D $S
8. Indexed mode: Content of an index register + OpAddr. Example: ADD D 500(S)
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
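Example: Write a program that compares two positive numbers x and y. The output is -1 if x < y, 0 if x = y, and +1 if x > y.
      MOVE R1, x
      MOVE R2, y
      MOVE R3, 0
      SUB R1, R2
      BN Less
      BZ Zero
      MOVE R3, +1
      BR Done
Less  MOVE R3, -1
      BR Done
Zero  MOVE R3, 0
Done  HLT
(BR is assumed here to be an unconditional branch; without it, execution would fall through from one case into the next and overwrite R3.)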
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language, machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an Assembly Language -> Machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
Format: 100000010 RR MMMMM
ex: R0 = Mem[3] is 100000010 00 00011
STORE contents of register into memory location
Format: 100000100 RR MMMMM
ex: Mem[4] = R0 is 100000100 00 00100
MOVE contents of one register into another register
Format: 100100010000 RR RR
ex: R0 = R1 is 100100010000 00 01
ADD contents of 2 registers, store result in third
Format: 1010000100 RR RR RR
ex: R0 = R1 + R2 is 1010000100 00 01 10
SUBTRACT contents of 2 registers, store result in third
Format: 1010001000 RR RR RR
ex: R0 = R1 - R2 is 1010001000 00 01 10
Halt the program
Format: 1111111111111111
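The instruction formats of the simulated computer lend themselves to a very small translator. The Python sketch below encodes one instruction at a time using the bit patterns listed above; the function name encode and its calling convention (register numbers and memory addresses passed as integers, register before address for LOAD and STORE) are choices made for this illustration only.

```python
# Fixed opcode prefixes taken from the formats listed in the notes.
PREFIXES = {
    "LOAD": "100000010",      # + RR MMMMM
    "STORE": "100000100",     # + RR MMMMM
    "MOVE": "100100010000",   # + RR RR
    "ADD": "1010000100",      # + RR RR RR
    "SUB": "1010001000",      # + RR RR RR
}


def _register(number: int) -> str:
    """Two-bit register field; the machine has registers 0-3."""
    if not 0 <= number <= 3:
        raise ValueError(f"no such register: R{number}")
    return format(number, "02b")


def encode(mnemonic: str, *operands: int) -> str:
    """Return the 16-bit machine word (as a bit string) for one instruction."""
    if mnemonic == "HALT":
        return "1" * 16
    prefix = PREFIXES[mnemonic]
    if mnemonic in ("LOAD", "STORE"):
        register, address = operands
        if not 0 <= address <= 31:
            raise ValueError("the machine has only 32 memory locations")
        word = prefix + _register(register) + format(address, "05b")
    else:
        word = prefix + "".join(_register(r) for r in operands)
    if len(word) != 16:
        raise ValueError(f"wrong number of operands for {mnemonic}")
    return word
```

Encoding LOAD R1 5, LOAD R2 5, ADD R0 R1 R2, STORE R0 6, HALT in turn reproduces the five machine words of the sample program.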
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for equals; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10₁₆ (10 hexadecimal). Hence you could write:
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example:
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device, specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM, MASM, etc.
   LABEL   OPCODE  OPERANDS      COMMENTS
01         ;
02         ; Program to multiply an integer by the constant 6.
03         ; Before execution, an integer must be stored in NUMBER.
04         ;
05         ORIG    x3050
06         LEA     R2, NUMBER
07         LDW     R2, R2, #0
08         LEA     R1, SIX
09         LDW     R1, R1, #0
0A         AND     R3, R3, #0    ; Clear R3. It will
0B                               ; contain the product.
0C         ; The inner loop
0D         ;
0E AGAIN   ADD     R3, R3, R2
0F         ADD     R1, R1, #-1   ; R1 keeps track of
10         BRp     AGAIN         ; the iterations
11         ;
12         HALT
13         ;
14 NUMBER  BLKW    1
15 SIX     FILL    x0006
16         ;
17         END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed. For example, in
AGAIN ADD R3, R3, R2
the operands to be added are obtained from register 2 and from register 3, and the result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
In
LEA R2, NUMBER
the location whose address is to be read is given the label NUMBER, and the destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
ORIG
ORIG tells the assembler where in memory to place the program. In line 05, ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
FILL
FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the location labeled SIX is initialized to the value x0006.
BLKW
BLKW tells the assembler to set aside some number of sequential memory locations (i.e., a BLocK of Words) in the program. The actual number is the operand of the BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France, you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get by this way, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler: translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to symbolic labels used by the programmer.
Disassembler: translates machine code back to its assembly source.
Cross Assembler: the assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader; the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler.
-> Basic assembler directives:
• START: Specify the name and starting address for the program.
• END: Indicate the end of the source program and (optionally) specify the first executable instruction in the program.
• BYTE: Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant.
• WORD: Generate a one-word integer constant.
• RESB: Reserve the indicated number of bytes for a data area.
• RESW: Reserve the indicated number of words for a data area.
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement has the following format:
[label] mnemonic [operands] [comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) which is to be executed, and the second is the operands, or parameters, of the command.
An assembly language statement contains the following fields:
-> Label field: an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex:
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/opcode field: contains a mnemonic, which stands for an operation code (i.e., a machine code instruction) or a pseudo-op. It may require additional information, i.e., operands.
-> Operand field: specifies either the address or the data that the opcode requires.
-> Comment field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are 3 types of statement:
(i) Imperative statements
(ii) Declarative statements
(iii) Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task.
2) Specify the data structures and define the format of each data structure.
3) Determine the processing necessary to obtain and maintain the information.
4) Determine the processing necessary to perform a task.
Assembler Design can be done in
ndashSingle pass
ndashTwo pass
(Fig: Two-pass assembler. Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object code. Pass 1 and Pass 2 both consult OPTAB and SYMTAB.)
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code, and operand fields.
• Determine the storage requirement for every assembly language statement and update the location counter.
• Build the symbol table (the table that is used to store each label and its corresponding value).
Pass II
• Generate object code (perform the actual translation).
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
  - Does everything in a single pass
  - Cannot resolve forward references directly; they must be restricted or patched once the symbol is defined
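One common way for a one-pass assembler to cope with forward references is to leave the address field empty and fill it in ("backpatch" it) when the label is finally defined. The Python sketch below illustrates that idea only; it is not taken from these notes. The statement format (label, mnemonic, operand), the one-word instruction size, and the function name assemble_one_pass are all assumptions made for the example.

```python
def assemble_one_pass(statements):
    """Assemble (label, mnemonic, operand) tuples in a single pass.

    Every instruction is assumed to occupy one word, so its address is
    simply its position in the output list.
    """
    symtab = {}    # label -> address
    pending = {}   # undefined symbol -> positions waiting for its address
    code = []      # one [mnemonic, address-or-None] entry per instruction

    for label, mnemonic, operand in statements:
        address = len(code)
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate label {label}")
            symtab[label] = address
            # Backpatch every earlier instruction that referred to this label.
            for position in pending.pop(label, []):
                code[position][1] = address
        if operand is None:
            code.append([mnemonic, None])
        elif operand in symtab:
            code.append([mnemonic, symtab[operand]])   # backward reference
        else:
            pending.setdefault(operand, []).append(address)
            code.append([mnemonic, None])              # forward reference

    if pending:
        raise ValueError("undefined symbols: " + ", ".join(sorted(pending)))
    return code, symtab
```

The price of the single pass is that the generated code must stay in memory (or be otherwise patchable) until all labels are known, which is one of the restrictions the text alludes to.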
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents.
• The Symbol Table (SYMTAB): used to store values (addresses) assigned to labels.
• Location Counter (LOCCTR): a variable used to help in the assignment of addresses. After each statement is processed, LOCCTR <-- LOCCTR + (instruction length).
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program.
• Save the values assigned to all labels for use in Pass 2.
• Perform some processing of assembler directives.
Pass 2
• Assemble instructions.
• Generate data values defined by BYTE, WORD.
• Perform processing of assembler directives not done in Pass 1.
• Write the object program and the assembly listing.
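The division of labour between the two passes can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not in the notes: statements are already split into (label, operation, operand) tuples, every machine instruction and every word is 3 bytes long, object code is one opcode byte followed by a 16-bit address, and only the directives START, END, WORD, and RESW are handled. The three-entry OPTAB is illustrative.

```python
# Illustrative operation table: mnemonic -> (opcode, length in bytes).
OPTAB = {"LDA": (0x00, 3), "ADD": (0x18, 3), "STA": (0x0C, 3)}
WORD_SIZE = 3  # assumed word length in bytes


def pass1(statements):
    """Assign an address to every statement and build SYMTAB."""
    symtab, intermediate, locctr = {}, [], 0
    for label, op, operand in statements:
        if op == "START":
            locctr = int(operand, 16)     # starting address given in hex
            continue
        if op == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, op, operand))
        if op in OPTAB:
            locctr += OPTAB[op][1]
        elif op == "WORD":
            locctr += WORD_SIZE
        elif op == "RESW":
            locctr += WORD_SIZE * int(operand)
        else:
            raise ValueError(f"unknown operation {op}")
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement, now that every label has an address."""
    object_code = []
    for locctr, op, operand in intermediate:
        if op in OPTAB:
            if operand not in symtab:
                raise ValueError(f"undefined symbol {operand}")
            opcode = OPTAB[op][0]
            object_code.append((locctr, f"{opcode:02X}{symtab[operand]:04X}"))
        elif op == "WORD":
            object_code.append((locctr, f"{int(operand) & 0xFFFFFF:06X}"))
        # RESW only reserves space, so it generates no object code.
    return object_code
```

Note that pass2 never needs to look ahead: the forward references to data labels were all resolved when pass1 filled SYMTAB.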
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows.
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references, and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation.
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections, and program linking.
Object Program
Header record
Col 1: H
Col 2-7: Program name
Col 8-13: Starting address (hex)
Col 14-19: Length of object program in bytes (hex)
Text record
Col 1: T
Col 2-7: Starting address in this record (hex)
Col 8-9: Length of object code in this record in bytes (hex)
Col 10-69: Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1: E
Col 2-7: Address of first executable instruction (hex) (END program_name)
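The column layout above maps directly onto fixed-width string formatting. The following Python sketch builds the three record types; the function names are invented for this example, and the object code is assumed to be supplied already as a string of hexadecimal digits.

```python
def header_record(name: str, start: int, length: int) -> str:
    """H, program name in cols 2-7, start address and length as 6 hex digits."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_code_hex: str) -> str:
    """T, start address, length in bytes (2 hex digits), then the object code."""
    if len(object_code_hex) % 2 or len(object_code_hex) > 60:
        raise ValueError("object code must be whole bytes and fit cols 10-69")
    return f"T{start:06X}{len(object_code_hex) // 2:02X}{object_code_hex}"


def end_record(first_executable: int) -> str:
    """E followed by the address of the first executable instruction."""
    return f"E{first_executable:06X}"
```

For a program named COPY that starts at 1000 hex and is 107A hex bytes long, header_record produces HCOPY  00100000107A (the name is padded with blanks to six columns).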
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1: M
Col 2-7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9: Length of the address field to be modified, in half-bytes
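What a loader does with a modification record can be sketched as below. This is an illustration under stated assumptions rather than a definitive loader: the program image is held in a Python bytearray, the function names are hypothetical, and when the field length is an odd number of half-bytes the field is assumed to occupy the low-order half-bytes of the bytes starting at the given location.

```python
def parse_modification_record(record: str):
    """Return (start location, length in half-bytes) from an M record."""
    if not record.startswith("M"):
        raise ValueError("not a modification record")
    return int(record[1:7], 16), int(record[7:9], 16)


def apply_modification(memory: bytearray, start: int, half_bytes: int,
                       load_address: int) -> None:
    """Add the actual load address to one address field, in place."""
    nbytes = (half_bytes + 1) // 2                # bytes touched by the field
    value = int.from_bytes(memory[start:start + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1            # low-order half-bytes only
    relocated = ((value & mask) + load_address) & mask
    value = (value & ~mask) | relocated           # keep bits outside the field
    memory[start:start + nbytes] = value.to_bytes(nbytes, "big")
```

For example, if a four-byte instruction 4B101036 (hex) sits at the start of the program and the record M00000105 marks a 5-half-byte address field beginning at byte 1, loading the program at 5000 hex changes the instruction to 4B106036.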
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Obj
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
Macro definition:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call in the program:
INCRMT X,Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Fig: Source code (with macro) -> Macro Processor -> Expanded code -> Compiler/Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. The appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X,Y
leads to the insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
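The definition-then-expansion process can be sketched in a few lines of Python. This is a deliberately minimal illustration, not a full macro processor: the function name expand_macros is hypothetical, each source line is assumed to be a plain string, parameters are positional and comma-separated, and nested definitions, labels on macro calls, and conditional expansion are not handled.

```python
def expand_macros(lines):
    """Collect MACRO ... ENDM definitions and expand every later call."""
    macros = {}    # macro name -> (formal parameters, model statements)
    output = []
    source = iter(lines)
    for line in source:
        fields = line.split()
        if fields == ["MACRO"]:
            # The line after MACRO is the prototype: name and parameter list.
            prototype = next(source).split(None, 1)
            name = prototype[0]
            formals = ([p.strip() for p in prototype[1].split(",")]
                       if len(prototype) > 1 else [])
            body = []
            for model in source:
                if model.split() == ["ENDM"]:
                    break
                body.append(model)
            macros[name] = (formals, body)
        elif fields and fields[0] in macros:
            formals, body = macros[fields[0]]
            actuals = ([a.strip() for a in "".join(fields[1:]).split(",")]
                       if len(fields) > 1 else [])
            if len(actuals) != len(formals):
                raise ValueError(f"wrong number of arguments for {fields[0]}")
            # Substitute longer names first so &AB is not mistaken for &A.
            pairs = sorted(zip(formals, actuals), key=lambda p: -len(p[0]))
            for model in body:
                for formal, actual in pairs:
                    model = model.replace(formal, actual)
                output.append(model)
        else:
            output.append(line)    # ordinary statement: copy unchanged
    return output
```

Running it on the INCRMT definition followed by the call INCRMT X,Y yields exactly the three statements LOAD X, ADD Y, STORE X; the definition itself produces no output.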
Defining a Macro
Let us take another look at the macro definition unit
MACROINCRMT ampAampBLOAD ampA MacroADD ampB definitionSTORE ampAENDM
INCRMT XY LOAD X macroADD Y expansionSTORE X
ENDM Macro Program
The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion
Basic Macro Processor Functions
-> Directives used during usage of a macro
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition
-> Prototype for a macro
The prototype gives the macro name and the macro parameter list; each parameter starts with &
-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions)
In the definition of a macro these names are called parameters; in the macro invocation they are called arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Consider the prototype and macro call statements once again:
INCRMT &A,&B    (prototype)
INCRMT X,Y      (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example:
STRG DATA1, DATA2, DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example:
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
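The two parameter styles can be compared with a small sketch. This is only an illustration, not the notes' own algorithm: the function name bind_arguments and the convention that a keyword argument is written as NAME=value are assumptions made for the example.

```python
def bind_arguments(formals, call_args):
    """Pair formal parameters with actual arguments.

    formals   -- formal parameter names from the prototype, e.g. ["&A", "&B"]
    call_args -- arguments from the macro call; "NAME=value" is treated as a
                 keyword argument, anything else as a positional argument
    (Both conventions are assumptions for this sketch.)
    """
    binding = {}
    position = 0
    for arg in call_args:
        if "=" in arg:
            # Keyword parameter: the keyword names the formal parameter.
            keyword, value = arg.split("=", 1)
            binding["&" + keyword.lstrip("&")] = value
        else:
            # Positional parameter: paired by relative position in the list.
            binding[formals[position]] = arg
            position += 1
    return binding


positional = bind_arguments(["&A", "&B"], ["X", "Y"])
keyword = bind_arguments(["&TYPE", "&CHANNEL"], ["CHANNEL=3", "TYPE=DIRECT"])
```

With keyword parameters the order of the arguments in the call no longer matters, which is why the second call can name CHANNEL before TYPE.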
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in MDT
Step 2
Examine all statements in the assembly source program to detect macro calls For each macro call
i) locate the macro in MNT
ii) obtain information from MNT regarding position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in MDT in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
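The three steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a complete macro processor: it handles only positional parameters, ignores AIF and AGO, and the function name process and the HALT statement in the sample program are hypothetical.

```python
def process(source_lines):
    """Expand macros using a Macro Name Table (MNT) and a Macro Definition Table (MDT)."""
    mnt = {}      # macro name -> (start index in MDT, list of formal parameters)
    mdt = []      # model statements of all macros, each definition closed by ENDM
    output = []   # expanded program
    lines = iter(source_lines)
    for line in lines:
        tokens = line.split(None, 1)
        if not tokens:
            continue
        op = tokens[0]
        if op == "MACRO":
            # Step 1: the line after MACRO is the prototype, e.g. "INCRMT &A,&B".
            proto = next(lines).split(None, 1)
            formals = [p.strip() for p in proto[1].split(",")] if len(proto) > 1 else []
            mnt[proto[0]] = (len(mdt), formals)
            for body_line in lines:          # copy model statements up to ENDM
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        elif op in mnt:
            # Step 2: macro call; pair formal and actual parameters by position.
            start, formals = mnt[op]
            actuals = [a.strip() for a in tokens[1].split(",")] if len(tokens) > 1 else []
            # Longest names first, so "&AB" is not clobbered by "&A".
            pairs = sorted(zip(formals, actuals), key=lambda fa: -len(fa[0]))
            # Step 3: emit model statements until ENDM, substituting parameters.
            i = start
            while mdt[i] != "ENDM":
                stmt = mdt[i]
                for formal, actual in pairs:
                    stmt = stmt.replace(formal, actual)
                output.append(stmt)
                i += 1
        else:
            output.append(line.strip())      # ordinary assembly statement
    return output


program = ["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
           "INCRMT X,Y", "HALT"]
expanded = process(program)
```

Because definition and expansion alternate in a single scan, this sketch also reflects the one-pass restriction that every macro must be defined before it is called.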
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once, regardless of how many times the subroutine is called.
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  - Macro prototype
  - Statements that make up the macro body
  - References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of macro invocations
  - Arguments are stored in ARGTAB according to their position
(fig: MACRO and CALL statements with the tables NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
(fig: flowchart with the procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates the space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. Thus the program becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf(), scanf() etc., which are not written by the programmer himself. During program execution those standard functions must reside in the main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
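The idea of adding a constant to each relative address can be shown with a short sketch. It assumes a hypothetical object format in which the segment is a list of words translated relative to address 0 and the translator supplies the list of word positions that hold addresses; neither detail comes from these notes.

```python
def relocate(words, address_positions, load_address):
    """Return a copy of the segment with every address field adjusted.

    words             -- the segment as translated relative to address 0
    address_positions -- indexes of the words that contain relative addresses
    load_address      -- where the loader decided to place the segment
    """
    relocated = list(words)
    for position in address_positions:
        # Only address fields change; instructions and constants stay as they are.
        relocated[position] += load_address
    return relocated


segment = [10, 3, 99, 0]          # word 1 holds the relative address 3
loaded = relocate(segment, [1], 500)
```

Note that the words themselves are not moved by this function; only the address field is adjusted, which is the distinction the text draws between relocation and movement.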
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader is the object program together with information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is also called dynamic loading or load on call.
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs:
» All linking and relocation operations
» Automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls do not require any linker help, since the entire path from the root is already loaded.
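The behaviour of the overlay manager on downward calls can be sketched as follows. This is a simplified model under stated assumptions: it tracks only the currently loaded path, treats loading a segment as evicting whatever occupied that level and everything below it, and the names PARENT, path_to and OverlayManager are hypothetical.

```python
# Overlay tree from the example: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root to the given segment, e.g. ROOT, D, E."""
    path = [segment]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]   # the root segment is loaded at program start
        self.loads = []          # log of segments brought into memory

    def ensure_loaded(self, target):
        """Make sure every segment on the path to the call target is in memory."""
        for depth, segment in enumerate(path_to(target)):
            if depth >= len(self.loaded) or self.loaded[depth] != segment:
                # Siblings share memory, so loading this segment replaces
                # whatever was loaded at this level and below.
                self.loaded = self.loaded[:depth] + [segment]
                self.loads.append(segment)


manager = OverlayManager()
manager.ensure_loaded("B")   # root calls a routine in B: A and B are loaded
manager.ensure_loaded("A")   # upward call: nothing to load
manager.ensure_loaded("E")   # D replaces A, then E is loaded
```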
(fig: Overlay tree with ROOT at the top, A and D below ROOT, B and C below A, E and F below D)
The essential difference between a linkage editor and a linking loader
(fig: Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into the main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into the main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions:
Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation and Linkage Directory) card
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
- Copy of the object program
- IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
- PLA counter (Program Load Address), to keep track of each segment location
- External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
- Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD and RLD cards
- ESD (external symbol dictionary)
Data Structures
Algorithm
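The two passes described above can be sketched in Python. This is a sketch under simplifying assumptions and not the card-based design itself: each module is a hypothetical dictionary with a name, its text (a list of words), defs (external symbols with relative addresses), reloc (positions of address constants) and refs (positions that must receive the address of an external symbol). The function name link_and_load is also an assumption.

```python
def link_and_load(modules, ipla):
    """Two-pass direct linking loader sketch; returns (symbol table, memory)."""
    # Pass 1: assign a load address to each module and record external symbols.
    symtab = {}
    pla = ipla                                  # Program Load Address counter
    for module in modules:
        symtab[module["name"]] = pla
        for symbol, relative in module["defs"].items():
            symtab[symbol] = pla + relative
        pla += len(module["text"])

    # Pass 2: load the text, relocate address constants, resolve references.
    memory = {}
    pla = ipla
    for module in modules:
        for offset, word in enumerate(module["text"]):
            memory[pla + offset] = word         # loading
        for offset in module["reloc"]:
            memory[pla + offset] += pla         # relocation
        for offset, symbol in module["refs"].items():
            memory[pla + offset] = symtab[symbol]   # linking
        pla += len(module["text"])
    return symtab, memory


module_a = {"name": "A", "text": [1, 2, 0, 0], "defs": {},
            "reloc": [1], "refs": {3: "B"}}
module_b = {"name": "B", "text": [7, 8], "defs": {"BDATA": 1},
            "reloc": [], "refs": {}}
symbols, core = link_and_load([module_a, module_b], 100)
```

Two passes are needed because module A refers to B before B's load address is known; pass 1 fixes every address so that pass 2 can fill in the references.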
Binary Symbolic Subroutine Loader
The assembler assembles each program and provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
  - allocation, relocation and linking
• Outputs the text in a file rather than memory
  - called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
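The three table-search methods above can be sketched as follows. The hash function (sum of character codes modulo the table size) and the use of linear probing to resolve collisions are assumptions chosen for the illustration; the notes do not prescribe either.

```python
def linear_search(table, key):
    """Compare every entry in turn; return its index or -1."""
    for index, item in enumerate(table):
        if item == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table; return the index of key or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_table[mid] == key:
            return mid
        if key < sorted_table[mid]:
            high = mid - 1        # continue in the left subarray
        else:
            low = mid + 1         # continue in the right subarray
    return -1


class HashTable:
    """Fixed-size table of slots numbered from 0, with linear probing."""

    def __init__(self, size=11):
        self.slots = [None] * size

    def _hash(self, key):
        # Assumed hash function: sum of character codes modulo the table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        i = self._hash(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % len(self.slots)   # collision: try the next slot
        raise OverflowError("hash table is full")

    def get(self, key):
        i = self._hash(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None:
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)
        return None


symbols = ["ADD", "LOAD", "STORE"]          # already in sorted order
symtab = HashTable()
symtab.put("LOOP", 100)
symtab.put("POOL", 200)                     # same letters, so the same slot is hashed
```

In a symbol table the key is the symbol name, so "LOOP" and "POOL" illustrate a collision: both hash to the same slot and the second is placed in the next free one.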
Sorting
We will discuss four comparison-sort algorithms
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises. It can be interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this $L \subseteq T^*$.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form $s \to \alpha$, where $s$ is a symbol and $\alpha$ is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string $\gamma$ is immediately derived from a string $\mu$ in a grammar $G$, written $\mu \Rightarrow_G \gamma$ (a single-step derivation), if and only if
(1) $\mu = \sigma \alpha \tau$
(2) $\gamma = \sigma \beta \tau$
(3) $\alpha \to \beta \in P$ of $G$
where $\sigma$ and $\tau$ represent arbitrary strings.
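The definition of a single-step derivation can be tried out with a short sketch. It assumes that every grammar symbol is a single character and that productions are given as (left side, right side) pairs of strings; the function name derive_once and the sample grammar are hypothetical.

```python
def derive_once(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each split mu = sigma + alpha + tau,
    the string sigma + beta + tau is one possible result.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)   # try the next occurrence of alpha
    return results


# Sample grammar (an assumption for the example): S -> aSb and S -> ab
P = [("S", "aSb"), ("S", "ab")]
step1 = derive_once("S", P)
step2 = derive_once("aSb", P)
```

Repeating the step from the start symbol yields ab, aabb, aaabbb and so on, which illustrates how a finite grammar can describe an infinite language.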
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol $\Sigma$.
• Chomsky Hierarchy
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar $\langle N, \Sigma, P, S \rangle$ that satisfies the following two conditions:
a. Each production rule $\alpha \to \beta$ in $P$ satisfies $|\alpha| \le |\beta|$ if it is not of the form $S \to \epsilon$
b. If $S \to \epsilon$ is in $P$, then $S$ does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1.2.15: The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form $Bb \to b$ will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule $\alpha \to \beta$ satisfies $|\alpha| = 1$, that is, $\alpha$ is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16: The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form $aE \to EaE$ to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar $\langle N, \Sigma, P, S \rangle$ in which each of the production rules $\alpha \to \beta$ which is not of the form $S \to \epsilon$ satisfies one of the following conditions:
a. $\beta$ is a terminal symbol
b. $\beta$ is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1.2.17: The grammar $\langle N, \Sigma, P, S \rangle$ which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form $A \to Ba$ or of the form $B \to bb$ to the grammar will result in a non-Type 3 grammar.
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar $G = (V, T, S, P)$ is a context-free grammar (CFG) if all productions in $P$ have the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form $xAy \to xvy$, where $A \in V$ and $x, v, y \in (V \cup T)^*$.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by $A \to v$, while the $x$ and $y$ provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
$V \to w$
where $V$ is a single nonterminal symbol and $w$ is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
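The rule format above can be sketched in a few lines of Python. The grammar below is a made-up illustration (it does not come from these notes), and the helper names `parse_bnf` and `terminals` are hypothetical. The sketch reads rules of the form `<symbol> ::= alternative | alternative` and then applies the stated criterion: symbols that never appear on a left side are terminals.

```python
def parse_bnf(text):
    """Parse lines of the form '<symbol> ::= alt1 | alt2' into a dict.

    Each alternative becomes a list of symbols (split on whitespace).
    Assumption: '|' is only used as the choice separator, never as a terminal.
    """
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


def terminals(rules):
    """Symbols used on some right side that never appear on a left side."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)


# Illustrative grammar for sums of binary digits with parentheses.
GRAMMAR = """
<expr> ::= <term> + <expr> | <term>
<term> ::= <digit> | ( <expr> )
<digit> ::= 0 | 1
"""

rules = parse_bnf(GRAMMAR)
```

With this grammar, `rules["<expr>"]` holds the two alternatives for `<expr>`, and `terminals(rules)` contains exactly the symbols `+`, `(`, `)`, `0` and `1`.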
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages. They would be more useful still if a canonic system could be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---> Compiler ---> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read, broken into a stream of strings and converted into an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division greatly reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need only N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N = 7 and M ≈ 30, so the savings are considerable: about 37 components instead of about 210 compilers.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The usual phase diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis (or scanning), syntax analysis (or parsing) and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3 and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers and the various symbols and punctuation without using recursion (compare with parsing below).
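The mapping from lexemes to tokens can be sketched with a single regular expression, which fits the remark that no recursion is needed at this stage. This is only an illustration under stated assumptions: the function name `scan`, the choice to keep a number's text as its attribute, and the restriction to the symbols `=`, `+` and `;` are not from the notes.

```python
import re

# One alternative per token class; leading blanks are skipped by \s*.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[=+;]))"
)


def scan(source):
    """Return (tokens, symbol_table) for a one-line statement."""
    symbol_table = []          # entry i (1-based) holds the i-th identifier
    tokens = []
    source = source.rstrip()   # trailing blanks are not significant
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.group("id"):
            name = m.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
        elif m.group("number"):
            # assumption: keep the lexeme text as the attribute value
            tokens.append(("number", m.group("number")))
        else:
            tokens.append((m.group("op"), None))   # attribute is ignored
    return tokens, symbol_table
```

Calling `scan("x3 = y + 3;")` and `scan("x3   =y+ 3  ;")` gives the same token list, while `scan("x 3 = y + 3;")` does not, because `x` and `3` become separate lexemes.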
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a parse tree rooted at asst-stmt.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the tree.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. For the example, the syntax tree has = at the root with children x3 and +, and + in turn has children y and 3.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
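A small recursive-descent sketch shows how tokens can be grouped into a syntax tree for the assignment grammar. It is an illustration, not the method of any particular compiler. Two assumptions are made: the rule expr → expr + expr is replaced by the equivalent iterative form "operand { + operand }" (the original rule is left-recursive and ambiguous), and tokens are plain (kind, value) tuples. All function names are hypothetical.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree of nested tuples.

    tokens: list of (kind, value) pairs, kind in {"id", "number", "=", "+", ";"}.
    """
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind!r} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            return expect(tokens[pos][0])
        raise SyntaxError(f"expected an operand at token {pos}")

    def expr():
        nonlocal pos
        node = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            pos += 1
            node = ("+", node, operand())   # operator is the interior node
        return node

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")                             # statement terminator, as in C
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", target, value)


example = [("id", "x3"), ("=", None), ("id", "y"),
           ("+", None), ("number", 3), (";", None)]
tree = parse_assignment(example)
```

Here `tree` is `("=", ("id", "x3"), ("+", ("id", "y"), ("number", 3)))`, which is the syntax tree described above with operators as interior nodes.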
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int": the constant 3 is converted to real before the addition, and the sum is converted back to int before the assignment.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
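Walking the semantic tree to emit quads can be sketched as follows. The tuple-based tree layout and the function name `gen_quads` are assumptions made for illustration; `None` plays the role of the unused "--" field. The sketch handles only the node kinds that occur in the example.

```python
def gen_quads(tree):
    """Emit (operation, target, source1, source2) quads for an assignment tree."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        op = node[0]
        if op in ("id", "number"):
            return str(node[1])               # leaves need no code
        if op == "+":
            left = walk(node[1])
            right = walk(node[2])
            t = new_temp()
            quads.append(("add", t, left, right))
            return t
        if op in ("inttoreal", "realtoint"):  # unary conversion operators
            src = walk(node[1])
            t = new_temp()
            quads.append((op, t, src, None))
            return t
        raise ValueError(f"unknown node kind {op!r}")

    op, target, value = tree
    if op != "=":
        raise ValueError("expected an assignment at the root")
    quads.append(("assign", walk(target), walk(value), None))
    return quads


# Semantic tree for x3 = y + 3 with y real and x3 integer (id1 = x3, id2 = y).
semantic_tree = ("=", ("id", "id1"),
                 ("realtoint", ("+", ("id", "id2"),
                                ("inttoreal", ("number", 3)))))
quads = gen_quads(semantic_tree)
```

For this tree the result is the same four quads as listed above: inttoreal, add, realtoint and assign, in that order.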
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion at compile time and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
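Both optimizations can be sketched as one pass over the quad list. This is a deliberately narrow illustration: the function name `optimize` is hypothetical, and the second rewrite assumes that a temporary which is immediately assigned to a variable is not used anywhere else, which a real optimizer would have to check.

```python
def optimize(quads):
    """Fold inttoreal of a constant and merge 'compute temp; assign x temp'."""
    constants = {}   # temporary name -> real constant computed at compile time
    out = []
    for op, target, src1, src2 in quads:
        # substitute temporaries whose value is already known
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{int(src1)}.0"   # do the conversion now
            continue                               # no quad is emitted
        if (op == "assign" and out and src1.startswith("temp")
                and out[-1][1] == src1):
            # assumption: src1 is not used again, so write straight to target
            prev_op, _, a, b = out.pop()
            out.append((prev_op, target, a, b))
            continue
        out.append((op, target, src1, src2))
    return out


before = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
after = optimize(before)
```

The four quads shrink to two: `add temp2 id2 3.0` followed by `realtoint id1 temp2`.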
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0    // add float
RTOI R2, R1         // real to int
ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                  calls
Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    calls                calls                  calls
Syntactic Analyzer   Contextual Analyzer    Code Generator
Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit, because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no8 Memory unit Register Data of IBM 360
General machine structure of IBM 360/370
The IBM System/360 (S/360) was a mainframe computer system family announced by IBM on April 7, 1964 and delivered between 1965 and 1978. It was the first family of computers designed to cover the complete range of applications, from small to large, both commercial and scientific.
(i) Memory unit
Memory (storage) in System/360 is addressed in terms of 8-bit bytes. Various instructions operate on larger units called halfword (2 bytes), fullword (4 bytes), doubleword (8 bytes), quadword (16 bytes) and 2048-byte storage block, specifying the leftmost (lowest address) byte of the unit. Within a halfword, fullword, doubleword or quadword, low-numbered bytes are more significant than high-numbered bytes; this is sometimes referred to as big-endian. Many uses for these units require aligning them on the corresponding boundaries. In these notes the unqualified term word refers to a fullword.
The architecture of System/360 provided for up to 2^24 = 16,777,216 bytes of memory; however, the Model 67 extended the architecture and allowed 2^32 = 4,294,967,296 bytes of virtual memory.
(ii) Register-
Registers are areas of storage that are set aside in the processing unit.
There are 16 general-purpose registers in the 370 system. Kinds of register: general-purpose register, floating-point register, control & status register, data register, address register.
An assembler statement has four fields of information-
1) name: an optional field; it must begin with a character from the alphabet and can consist of up to 8 characters
2) opcode: a mandatory field; it must be followed by a blank
3) operands: usually represent data that could be in main storage, in registers or part of the instruction itself
4) comments: used to clarify instructions
(iii) Data
Characters are stored as 8-bit bytes. Integers are stored as two's complement binary halfword or fullword values. Packed decimal numbers are stored as 1-16 8-bit bytes containing an odd number of decimal digits followed by a 4-bit sign. Sign values of hexadecimal A, C, E and F are positive, and sign values of hexadecimal B and D are negative. Digit values of hexadecimal A-F and sign values of 0-9 are invalid, but the PACK and UNPK instructions do not test for validity.
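The packed decimal layout can be sketched in Python. The function names are hypothetical, and the choice of C for positive and D for negative when packing is an assumption of this sketch (the text above lists several valid sign codes). Unlike PACK and UNPK, the unpacking function here does check validity.

```python
def pack_decimal(value):
    """Encode an integer as packed decimal: two digits per byte, sign last."""
    sign = 0xC if value >= 0 else 0xD        # assumed choice of sign codes
    digits = str(abs(value))
    if len(digits) % 2 == 0:
        digits = "0" + digits                # odd digit count + sign = whole bytes
    nibbles = [int(d) for d in digits] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def unpack_decimal(data):
    """Decode packed decimal bytes back into an integer."""
    nibbles = [n for b in data for n in (b >> 4, b & 0xF)]
    *digits, sign = nibbles
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError("invalid digit or sign nibble")
    magnitude = int("".join(str(d) for d in digits))
    return -magnitude if sign in (0xB, 0xD) else magnitude
```

For example, `pack_decimal(123)` gives the two bytes 12 3C (hexadecimal) and `pack_decimal(-45)` gives 04 5D, with a leading zero digit added to keep the digit count odd.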
Zoned decimal numbers are stored as 1-16 8-bit bytes, each containing a zone in bits 0-3 and a digit in bits 4-7. The zone of the rightmost byte is interpreted as a sign.
Floating point numbers are only stored as fullword or doubleword values on older models. On the 360/85 and 360/195 there are also extended precision floating point numbers stored as quadwords. For all three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
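The fullword floating point format can be decoded with a few shifts and masks. This is a sketch with a hypothetical function name. It relies on two points beyond the text above: bit 0 in IBM numbering is the most significant bit, and the value is the fraction multiplied by 16 raised to (characteristic - 64).

```python
def decode_hex_float(word):
    """Decode a 32-bit System/360 short floating point value.

    word is an int holding the fullword, most significant bit = bit 0.
    """
    sign = -1.0 if (word >> 31) & 1 else 1.0   # bit 0
    characteristic = (word >> 24) & 0x7F       # bits 1-7, exponent + 64
    fraction = (word & 0xFFFFFF) / 16 ** 6     # bits 8-31 as six hex digits
    return sign * fraction * 16.0 ** (characteristic - 64)
```

For instance, the word 41100000 (hexadecimal) has characteristic 65 and fraction 1/16, so it decodes to 1.0, and C2760000 decodes to -118.0.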
Lecture_no9 Various instruction format of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R indicates that the data resides in a register, so in this format both operands are in registers.
Generally byte 1 specifies two 4-bit register numbers, but in some cases, e.g. SVC, byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second is a storage location. If there are 3, the first and second are registers and the third is a storage location.
Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address; however, the storage address is specified in 3 pieces (index, base and displacement).
Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I indicates that the operand consists of immediate data (meaning that the data is in the instruction itself). So in this format the first operand is a storage location and the second is immediate data.
Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S indicates that there is data in storage. Here both operands have their data in storage.
Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
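As an illustration of the RX layout, the sketch below splits a four-byte instruction into its fields and forms the storage address. The function name and the register contents are made up. The sketch also assumes two details not stated above: the base occupies the high 4 bits of byte 2 with a 12-bit displacement following it, and register number 0 in the index or base position means "no register", with addresses truncated to 24 bits.

```python
def decode_rx(instr, regs):
    """Decode an RX-format instruction.

    instr: 4 bytes; regs: list of 16 general register values.
    Returns (opcode, r1, effective_address).
    """
    opcode = instr[0]
    r1 = instr[1] >> 4                        # byte 1 bits 0-3
    x2 = instr[1] & 0xF                       # byte 1 bits 4-7: index register
    b2 = instr[2] >> 4                        # base register
    d2 = ((instr[2] & 0xF) << 8) | instr[3]   # 12-bit displacement
    address = d2
    if x2:                                    # register 0 contributes nothing
        address += regs[x2]
    if b2:
        address += regs[b2]
    return opcode, r1, address & 0xFFFFFF     # 24-bit addressing


regs = [0] * 16
regs[3] = 8          # hypothetical index value
regs[12] = 0x1000    # hypothetical base address
fields = decode_rx(bytes.fromhex("5A13C010"), regs)
```

With these register values the address is 0x1000 + 8 + 0x10 = 0x1018, so `fields` is `(0x5A, 1, 0x1018)`.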
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R, 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S, D
4. Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D), 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234, S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234], 10
7. Relative mode: Content of PC + OpAddr. Example: ADD D, $S
8. Indexed mode: Content of an index register + OpAddr. Example: ADD D, 500(S)
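The modes above differ only in how the operand is found. A minimal sketch, assuming memory is a dictionary from address to value and registers are a dictionary from name to value (both hypothetical simplifications, as is the function name `fetch_operand`):

```python
def fetch_operand(mode, field, regs, mem, pc=0):
    """Return the operand value selected by an addressing mode.

    field is the operand field of the instruction; for indexed mode it is
    a pair (OpAddr, index register name).
    """
    if mode == "immediate":
        return field                      # the constant itself
    if mode == "register":
        return regs[field]                # value held in the register
    if mode == "register_indirect":
        return mem[regs[field]]           # register holds the address
    if mode == "direct":
        return mem[field]                 # field is the absolute address
    if mode == "indirect":
        return mem[mem[field]]            # memory cell holds the address
    if mode == "relative":
        return mem[pc + field]            # PC + OpAddr
    if mode == "indexed":
        op_addr, index_reg = field
        return mem[op_addr + regs[index_reg]]   # index register + OpAddr
    raise ValueError(f"unknown addressing mode {mode!r}")


regs = {"S": 100, "D": 200}
mem = {100: 7, 600: 9, 1234: 100}
```

With these values, direct mode on 1234 yields 100, while indirect mode on 1234 follows that 100 as an address and yields 7.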
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
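Example: Write a program that compares two positive numbers x and y. The output is -1 if x < y, 0 if x = y and +1 if x > y.
        MOVE R1, x
        MOVE R2, y
        MOVE R3, 0
        SUB  R1, R2      ; R1 = x - y, sets the condition flags
        BN   Less        ; negative result: x < y
        BZ   Zero        ; zero result: x = y
        MOVE R3, +1      ; otherwise x > y
        BR   Done        ; BR = unconditional branch (assumed mnemonic)
Less    MOVE R3, -1
        BR   Done
Zero    MOVE R3, 0
Done    HLT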
Lecture_no11 Address modification using instruction as data ampusing index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]          100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0          100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1              100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2         1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 - R2         1010001000 00 01 10
Halt the program
1111111111111111
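Since each instruction is a fixed bit prefix followed by register and memory fields, a translator for this simulated machine fits in a few lines. The sketch below is one possible way to write it, not a reference implementation. The names `PREFIX` and `assemble` are hypothetical, the mnemonic SUB for SUBTRACT is an assumption, and SAVE is accepted as a synonym for STORE.

```python
# Fixed opcode bits for each mnemonic, taken from the formats above.
PREFIX = {
    "LOAD":  "100000010",
    "STORE": "100000100",
    "SAVE":  "100000100",      # same encoding as STORE
    "MOVE":  "100100010000",
    "ADD":   "1010000100",
    "SUB":   "1010001000",     # assumed mnemonic for SUBTRACT
}


def assemble(line):
    """Translate one assembly line into a 16-bit machine word (as a string)."""
    mnemonic, *operands = line.split()
    if mnemonic == "HALT":
        return "1" * 16
    # registers are written R0..R3 (2 bits), memory addresses 0..31 (5 bits)
    regs = "".join(format(int(op[1:]), "02b")
                   for op in operands if op.startswith("R"))
    mems = "".join(format(int(op), "05b")
                   for op in operands if not op.startswith("R"))
    word = PREFIX[mnemonic] + regs + mems
    if len(word) != 16:
        raise ValueError(f"bad operands in {line!r}")
    return word


program = ["LOAD R1 5", "LOAD R2 5", "ADD R0 R1 R2", "SAVE R0 6", "HALT"]
machine_code = [assemble(line) for line in program]
```

Assembling the five-line program reproduces the five 16-bit words of the machine language program shown above.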
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device, specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
   LABEL   OPCODE OPERANDS      COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6.
03 ; Before execution, an integer must be stored in NUMBER.
04 ;
05         .ORIG  x3050
06         LEA    R2,NUMBER
07         LDW    R2,R2,#0
08         LEA    R1,SIX
09         LDW    R1,R1,#0
0A         AND    R3,R3,#0      ; Clear R3. It will
0B                              ; contain the product.
0C ; The inner loop
0D ;
0E AGAIN   ADD    R3,R3,R2
0F         ADD    R1,R1,#-1     ; R1 keeps track of
10         BRp    AGAIN         ; the iterations
11 ;
12         HALT
13 ;
14 NUMBER  .BLKW  1
15 SIX     .FILL  x0006
16 ;
17         .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode. The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent the registers 0 through 7 as R0, R1, ..., R7.
LEA R2, NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, there is no need to give it a label.
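The behaviour of the loop around AGAIN can be traced with a short Python sketch. The function name multiply_by_six is hypothetical, the variables r1, r2 and r3 stand in for the registers, and 16-bit wraparound is ignored for simplicity.

```python
def multiply_by_six(number):
    r2 = number   # LEA/LDW: R2 <- value stored at NUMBER
    r1 = 6        # LEA/LDW: R1 <- value stored at SIX
    r3 = 0        # AND R3, R3, #0 clears R3
    while True:
        r3 += r2  # AGAIN: ADD R3, R3, R2
        r1 -= 1   # ADD R1, R1, #-1
        if not r1 > 0:
            break # BRp AGAIN branches only while the result is positive
    return r3     # HALT: the product is in R3
```

The body runs six times because R1 counts down from 6 and the branch is taken only while the decremented value is still positive.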
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, a pseudo-op is strictly a message to the assembler to help it in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you can get through by speaking English, but you will never feel at home and life remains complicated. You can get by this way, but it is rather inappropriate. If things need to happen in a hurry, you should use the country's language.
Assembler
Translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to the symbolic labels used by the programmer.
Disassembler
Translates machine code back to its assembly source.
Cross Assembler
An assembler that runs on one machine and produces object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, externally defined symbols (library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols; it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine) representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START - Specify the name and starting address of the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate a one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement has the following format:
[label] mnemonic [operands] [comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands or parameters of the command.
An assembly language statement contains the following fields:
-> Label field: an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field: contains a mnemonic, which stands for an operation code or a pseudo-op. A machine code instruction may require additional information, i.e. operands.
-> Operand field: specifies either the address or the data that the opcode requires.
-> Comment field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are three types of assembly language statements:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structures and define the format of each data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in:
- Single pass
- Two pass
(Fig: Two-pass assembler. Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, and so on. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to a symbol that has not yet been scanned by the assembler). The separate passes of a two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. One-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references in general; restrictions are required
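One common way for a one-pass assembler to cope with forward references is to remember which words are waiting for an undefined label and patch them when the label is finally defined. The Python sketch below illustrates that idea only; it is not taken from these notes. The opcode values, the 4-bit opcode / 12-bit address word layout, and the rule that every statement occupies one word are all assumptions made for the example.

```python
def one_pass(source):
    """source: list of (label, mnemonic, operand) tuples; label may be None."""
    optab = {"LOAD": 0x1, "ADD": 0x2, "STORE": 0x3, "JUMP": 0x4}  # hypothetical opcodes
    symtab = {}    # label -> address, filled in as labels are defined
    pending = {}   # undefined label -> addresses of words waiting for it
    code = []      # code[i] is the word at address i

    for label, mnemonic, operand in source:
        here = len(code)
        if label:
            symtab[label] = here
            # Patch every earlier word that referred to this label.
            for addr in pending.pop(label, []):
                code[addr] |= here
        if mnemonic == "WORD":
            code.append(int(operand))
            continue
        word = optab[mnemonic] << 12
        if operand in symtab:
            word |= symtab[operand]                       # backward reference
        else:
            pending.setdefault(operand, []).append(here)  # forward reference
        code.append(word)

    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code, symtab
```

The price of the single pass is that the generated words must stay accessible (here, in the list code) until the end of the source, so that they can still be patched.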
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB): used to store the values (addresses) assigned to labels
• The Location Counter (LOCCTR): a variable used to help in the assignment of addresses. After each statement, LOCCTR <- LOCCTR + (instruction length)
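How the three structures work together in Pass 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the flowchart from the notes: every machine instruction and every WORD is assumed to occupy 3 bytes, the START operand is assumed to be hexadecimal, and the function name pass1 and the tuple layout of the statements are hypothetical.

```python
def pass1(source, optab):
    """source: list of (label, opcode, operand); returns (SYMTAB, intermediate file, end LOCCTR)."""
    symtab = {}
    locctr = 0
    intermediate = []   # (address, label, opcode, operand) for Pass 2

    for label, opcode, operand in source:
        if opcode == "START":
            locctr = int(operand, 16)      # starting address of the program
            intermediate.append((locctr, label, opcode, operand))
            continue
        if opcode == "END":
            intermediate.append((locctr, label, opcode, operand))
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr         # label gets the current LOCCTR value
        intermediate.append((locctr, label, opcode, operand))
        # LOCCTR <- LOCCTR + (length of this statement)
        if opcode in optab or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        else:
            raise ValueError(f"invalid operation code: {opcode}")
    return symtab, intermediate, locctr
```

Note that the forward references to ALPHA, ONE and BETA in a program such as the one in the test cause no difficulty here, because Pass 1 only records label definitions and never needs operand values.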
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
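The translation step of Pass 2 can be sketched as below, assuming Pass 1 has already produced a symbol table and an address for every statement. The instruction layout (an 8-bit opcode followed by a 16-bit address, written as six hexadecimal digits) and the name pass2 are assumptions for illustration; BYTE constants and the listing file are left out.

```python
def pass2(statements, symtab, optab):
    """statements: list of (address, opcode, operand) tuples produced by a first pass."""
    object_code = []
    for address, opcode, operand in statements:
        if opcode in optab:
            if operand not in symtab:
                raise ValueError(f"undefined symbol: {operand}")
            # 8-bit opcode followed by a 16-bit address field (3 bytes in all).
            object_code.append((address, f"{optab[opcode]:02X}{symtab[operand]:04X}"))
        elif opcode == "WORD":
            # One-word (3-byte) integer constant.
            object_code.append((address, f"{int(operand) & 0xFFFFFF:06X}"))
        # RESW and RESB reserve storage but generate no object code.
    return object_code
```

Because the symbol table is complete before this pass starts, every operand can be looked up directly, whether it was defined before or after the statement that uses it.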
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows.
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1       H
Col 2-7     Program name
Col 8-13    Starting address (hex)
Col 14-19   Length of object program in bytes (hex)
Text record
Col 1       T
Col 2-7     Starting address in this record (hex)
Col 8-9     Length of object code in this record in bytes (hex)
Col 10-69   Object code; (69-10+1)/6 = 10 instructions
End record
Col 1       E
Col 2-7     Address of first executable instruction (hex) (END program_name)
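The column layout above maps directly onto string formatting. The following Python sketch builds the three record types; the function names are hypothetical, and the object code is assumed to be supplied as strings of hexadecimal digits.

```python
def header_record(name, start, length):
    # Col 1: H, Col 2-7: name padded to 6 characters, Col 8-13 and 14-19: hex fields.
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 60 hex digits (columns 10-69)")
    # Col 8-9 is the length in bytes, i.e. half the number of hex digits.
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_instruction):
    return f"E{first_instruction:06X}"
```

For a program named COPY that starts at address 1000 (hex) and is 12 (hex) bytes long, header_record("COPY", 0x1000, 0x12) yields HCOPY followed by two blanks, 001000 and 000012.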
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1     M
Col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9   Length of the address field to be modified, in half-bytes
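A loader uses a modification record by adding the actual load address to the address field it describes. The Python sketch below shows one way this could be done; the function names are hypothetical, the program image is assumed to have been assembled for starting address 0, and when the field length is an odd number of half-bytes the field is assumed to occupy the low-order half-bytes of the bytes involved.

```python
def modification_record(offset, half_bytes):
    # Col 1: M, Col 2-7: offset of the field, Col 8-9: field length in half-bytes.
    return f"M{offset:06X}{half_bytes:02X}"


def apply_modification(memory, record, load_address):
    """memory: bytearray holding the program as assembled for address 0."""
    offset = int(record[1:7], 16)
    half_bytes = int(record[7:9], 16)
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[offset:offset + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1
    # Only the low-order half-bytes belong to the address field; keep the rest.
    new_field = (field & ~mask) | ((field + load_address) & mask)
    memory[offset:offset + nbytes] = new_field.to_bytes(nbytes, "big")
```

For instance, if a 4-byte instruction 4B101036 has a 5-half-byte address field starting at offset 1 and the program is loaded at 5000 (hex), the field 01036 becomes 06036 while the leading half-byte is left unchanged.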
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we present one more example to highlight salient aspects of the macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example:
Macro definition:
    MACRO
    INCRMT &A,&B
    LOAD &A
    ADD &B
    STORE &A
    ENDM
Macro call and its expansion:
    INCRMT X,Y    expands to
    LOAD X
    ADD Y
    STORE X
(Fig: Macro program)
(Fig: Source code (with macro) -> Macro processor -> Expanded code -> Compiler or assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. The call
INCRMT X,Y
leads to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
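Macro expansion as described here is plain text substitution, which the following Python sketch illustrates. It is a simplified model, not a full macro processor: the function name expand and the dictionary layout for macro definitions are hypothetical, and operands in a call are assumed to be separated by commas.

```python
def expand(program, macros):
    """macros: name -> (list of formal parameters, list of model statements)."""
    output = []
    for line in program:
        line = line.strip()
        name, _, rest = line.partition(" ")
        if name not in macros:
            output.append(line)          # ordinary statement: copy unchanged
            continue
        formals, body = macros[name]
        actuals = [a.strip() for a in rest.split(",")]
        if len(actuals) != len(formals):
            raise ValueError(f"wrong number of operands in call: {line}")
        # Longest names first, so that &AB is never mistaken for &A.
        pairs = sorted(zip(formals, actuals), key=lambda p: len(p[0]), reverse=True)
        for statement in body:
            for formal, actual in pairs:
                statement = statement.replace(formal, actual)
            output.append(statement)
    return output
```

With INCRMT defined as the LOAD-ADD-STORE sequence, the call INCRMT X,Y is replaced by LOAD X, ADD Y, STORE X, and any line that does not start with a macro name passes through untouched.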
Defining a Macro
Let us take another look at the macro definition unit
Macro definition:
    MACRO
    INCRMT &A,&B
    LOAD &A
    ADD &B
    STORE &A
    ENDM
Macro call and its expansion:
    INCRMT X,Y    expands to
    LOAD X
    ADD Y
    STORE X
(Fig: Macro program)
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates its end.
-> Prototype for the macro
Gives the macro name and the parameter list. Each parameter name starts with &.
-> Body
The statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro:
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, and so on. Considering the prototype and macro call statements once again:
INCRMT &A,&B    prototype
INCRMT X,Y      macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example:
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
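The two ways of pairing arguments with parameters can be compared in a small Python sketch. The function name bind is hypothetical, parameter names are given without the leading &, and a call is assumed to have been split into its operands already.

```python
def bind(formals, operands):
    """Return {formal parameter: actual parameter} for one macro call."""
    binding = {}
    position = 0
    for operand in operands:
        if "=" in operand:
            # Keyword parameter: the keyword names the formal parameter.
            keyword, value = operand.split("=", 1)
            keyword = keyword.lstrip("&")
            if keyword not in formals:
                raise ValueError(f"unknown keyword: {keyword}")
            binding[keyword] = value
        else:
            # Positional parameter: paired by its position in the list.
            if position >= len(formals):
                raise ValueError("too many positional operands")
            binding[formals[position]] = operand
            position += 1
    return binding
```

With keyword parameters the order of the operands no longer matters: GENER CHANNEL=3, TYPE=DIRECT produces the same binding as GENER DIRECT, 3.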
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in the MDT, in their expansion-time order, until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
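The three steps can be sketched in Python with the MNT as a dictionary and the MDT as a list. This is a simplified illustration under stated assumptions: each MNT entry is assumed to hold the MDT index of the first model statement together with the formal parameter list, AIF and AGO are not handled, and the function names are hypothetical.

```python
def define_macros(lines):
    """Step 1: return (MNT, MDT, remaining program statements)."""
    mnt, mdt, program = {}, [], []
    i = 0
    while i < len(lines):
        if lines[i].strip() != "MACRO":
            program.append(lines[i].strip())
            i += 1
            continue
        # The prototype follows the MACRO header: name, then parameter list.
        name, _, params = lines[i + 1].strip().partition(" ")
        formals = [p.strip() for p in params.split(",") if p.strip()]
        mnt[name] = (len(mdt), formals)   # where the definition starts in the MDT
        i += 2
        while lines[i].strip() != "ENDM":
            mdt.append(lines[i].strip())
            i += 1
        mdt.append("ENDM")
        i += 1
    return mnt, mdt, program


def expand_calls(program, mnt, mdt):
    """Steps 2 and 3: replace each macro call by its model statements."""
    output = []
    for line in program:
        name, _, rest = line.partition(" ")
        if name not in mnt:
            output.append(line)
            continue
        index, formals = mnt[name]
        binding = dict(zip(formals, (a.strip() for a in rest.split(","))))
        while mdt[index] != "ENDM":
            statement = mdt[index]
            for formal in sorted(binding, key=len, reverse=True):
                statement = statement.replace(formal, binding[formal])
            output.append(statement)
            index += 1
    return output
```

Storing only an index in the MNT keeps that table small, while the MDT holds the model statements of all macros one after another, each terminated by ENDM.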
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition: DEFINE; macro invocation: EXPAND
What are the data structures used by the macro processor?
Data structures:
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB, with pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
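The conversion to positional notation and the later use of ARGTAB can be sketched as follows. The notation ?1, ?2, ... for the n-th parameter is an assumed convention chosen for this example, and the function names are hypothetical.

```python
def to_positional(formals, body):
    """Rewrite each parameter reference in the macro body as ?1, ?2, ..."""
    # Replace longer names first so that &AB is not mistaken for &A.
    order = sorted(range(len(formals)), key=lambda i: len(formals[i]), reverse=True)
    converted = []
    for statement in body:
        for i in order:
            statement = statement.replace(formals[i], f"?{i + 1}")
        converted.append(statement)
    return converted


def substitute(deftab_body, argtab):
    """Expand a body stored in DEFTAB using the arguments stored in ARGTAB."""
    result = []
    for statement in deftab_body:
        # Highest position first so that ?10 is handled before ?1.
        for i in range(len(argtab), 0, -1):
            statement = statement.replace(f"?{i}", argtab[i - 1])
        result.append(statement)
    return result
```

Converting once at definition time means that expansion does not need to search for parameter names at all: the position number is used directly as an index into ARGTAB.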
(Fig: Macro processor tables NAMTAB, DEFTAB and ARGTAB for a macro definition and a macro call)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain.
(Fig: Macro processor procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity, done by the loader, is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instructions are read line by line, the machine code for each is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program and convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader for which already-relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other, so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields, not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called the relocating loader.
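The adjustment of address fields can be sketched in Python as below. The segment is modelled as a list of words, and the list of positions that hold addresses is assumed to be known (in practice this is the relocation information supplied by the translator). The function name and the sample values are hypothetical.

```python
def relocate(segment, address_fields, translated_origin, load_origin):
    """Return a copy of the segment with every address field adjusted.

    segment: list of words as produced by the translator
    address_fields: indexes of the words that contain addresses
    """
    delta = load_origin - translated_origin   # the constant added to each address
    relocated = list(segment)
    for i in address_fields:
        relocated[i] += delta
    return relocated
```

For example, if A was translated with start address 100 but is loaded at 151 (just after printf()), the constant is 51: an address field holding 120 becomes 171, while words that hold plain data are left unchanged.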
Relocating Loader
The general class of relocating loaders was introduced to avoid possible reassembly of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The input to a relocating loader, produced by the assembler, is the object program and information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer, there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is known as dynamic linking, dynamic loading, or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs:
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
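The bookkeeping done by an overlay manager for the example tree can be sketched in Python. This models only which segments are resident, not the actual memory layout; the class name OverlayManager, the PARENT table and the loads log are hypothetical names introduced for the illustration.

```python
# The example overlay tree: each segment maps to its parent.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path(segment):
    """Sequence of segments from the root down to the given segment."""
    result = []
    while segment is not None:
        result.append(segment)
        segment = PARENT[segment]
    return result[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]   # currently resident path, root first
        self.loads = []          # log of segments brought into memory

    def call(self, target):
        wanted = path(target)
        if self.loaded[:len(wanted)] == wanted:
            return               # target already resident (includes upward calls)
        # Keep the common part of the two paths; the rest is overwritten,
        # because sibling segments share the same memory.
        keep = 0
        while (keep < min(len(wanted), len(self.loaded))
               and wanted[keep] == self.loaded[keep]):
            keep += 1
        self.loads.extend(wanted[keep:])
        self.loaded = self.loaded[:keep] + wanted[keep:]
```

Calling a routine in B from a freshly started program loads A and then B; a later call into E replaces both with D and E, since A and D occupy the same memory.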
(fig Overlay)
The essential difference between a linkage editor and a linking loader
(Fig: Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. If a subroutine is needed during execution, execution has to be suspended until the required subroutine is loaded into main memory.
->Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions:
Allocation - Programmer, Programmer, Programmer
Linking - Programmer, Programmer, Loader
Relocation - Assembler, Loader, Assembler
Loading - Loader, Loader, Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
pass1
->Allocate and assign each program location in core
->Create a symbol table, filling in the values of the external symbols
1 input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory card (ESD)
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
pass2
->Load the actual program text
->Perform the relocation modification of any address constants needing to be altered
->Resolve external references (linking)
copy of object program
IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
PLA counter (Program Load Address), to keep track of each segment location
External Symbol Directory card (ESD), to store each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD & RLD cards
ESD (external symbol dictionary)
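The two passes can be sketched as follows. This is a minimal illustration, not the card-level algorithm: the object-module layout (a dictionary with name, length, defs, text and rld fields) is an assumption made for the example, and the symbol table built in pass 1 simply maps each segment name and external symbol to its assigned core address.

```python
def pass1(modules, ipla):
    """Pass 1: assign a load address to every segment and build the
    table of external symbols (symbol -> assigned core address)."""
    symtab = {}
    pla = ipla                          # Program Load Address counter
    for mod in modules:
        symtab[mod["name"]] = pla       # segment name -> its load address
        for sym, rel in mod["defs"].items():
            symtab[sym] = pla + rel     # relative address becomes absolute
        pla += mod["length"]
    return symtab


def pass2(modules, symtab):
    """Pass 2: load the text, then relocate and link every address
    constant listed in the (hypothetical) rld field."""
    memory = {}
    for mod in modules:
        base = symtab[mod["name"]]
        for offset, word in enumerate(mod["text"]):
            memory[base + offset] = word
        for offset, sym in mod["rld"]:
            if sym not in symtab:
                raise KeyError(f"undefined external symbol: {sym}")
            # Add the symbol's address to the constant already in the text.
            memory[base + offset] += symtab[sym]
    return memory


# Hypothetical input: A refers to B (linking) and to its own word 1
# (relocation); B defines the external symbol BDATA at relative address 1.
MODULES = [
    {"name": "A", "length": 3, "defs": {},
     "text": [10, 0, 1], "rld": [(1, "B"), (2, "A")]},
    {"name": "B", "length": 2, "defs": {"BDATA": 1},
     "text": [20, 30], "rld": []},
]

SYMTAB = pass1(MODULES, ipla=100)
MEMORY = pass2(MODULES, SYMTAB)
```

With an IPLA of 100, segment A is placed at 100 and B at 103; the word at 101 is patched with B's address and the word at 102 with A's own base plus 1.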
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler assembles each program and provides the loader with:
->Object program + relocation information
->Prefixed with information about all other programs it references (transfer vector)
->The length of the entire program
->The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
– allocation, relocation and linking
• Outputs the text in a file rather than memory
– called a load module
Lecture_no30 Programming Language
The role of Programming Language
->Not only in designing and implementing but in teaching and supporting it
->Language designers balance
->making computing convenient for people with
->making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
-Readability
-Machine independent (portable)
-Availability of program libraries
-Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search.
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort.
To understand the idea of hashing as a search technique.
To introduce the map abstract data type.
To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element.
• ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
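The three search methods can be compared on a small symbol table. The sketch below is illustrative only: the table contents, the function names and the hash function (sum of character codes modulo the table size, with colliding entries chained in a list per slot) are assumptions chosen for the example.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Search a table ordered by name; return the index of key or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1      # continue in the left subarray
        else:
            low = mid + 1       # continue in the right subarray
    return -1


class HashTable:
    """Hash table with numbered slots; collisions are chained per slot."""

    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]

    def _slot(self, key):
        # Hypothetical hash function: sum of character codes mod size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (existing, _old) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)    # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None


# Symbol table ordered by symbol name (required for binary search).
SYMBOLS = [("ALPHA", 100), ("BETA", 104), ("LOOP", 112), ("ZERO", 120)]

HASHED = HashTable()
for symbol, address in SYMBOLS:
    HASHED.put(symbol, address)
```

Linear search needs no ordering, binary search needs the ordered table, and the hash table trades extra memory for lookups that usually inspect a single slot.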
Sorting
We will discuss four comparison-sort algorithms:
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
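As an illustration, two of the listed algorithms are sketched below: selection sort (quadratic, repeatedly selects the smallest remaining item) and merge sort (divide and conquer). The function names are chosen for this example; both return a new sorted list and leave the input unchanged.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and swap it forward."""
    result = list(items)
    for i in range(len(result) - 1):
        smallest = i
        for j in range(i + 1, len(result)):
            if result[j] < result[smallest]:
                smallest = j
        result[i], result[smallest] = result[smallest], result[i]
    return result


def merge_sort(items):
    """Split the list in half, sort each half, then merge the halves."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One of the two halves is exhausted; append what is left of the other.
    return merged + left[i:] + right[j:]
```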
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference, inference rule or transformation rule is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
->An alphabet T is a finite set of symbols (terminal symbols)
->A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
•The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written
μ ->G γ (a single step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
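The definition translates almost directly into code. The sketch below assumes, for illustration only, that every symbol is a single character and that the productions P are given as (α, β) pairs of strings; the grammar used as an example is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each occurrence of alpha in mu,
    mu is split as sigma + alpha + tau and gamma = sigma + beta + tau.
    """
    results = []
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]
            tau = mu[start + len(alpha):]
            results.append(sigma + beta + tau)
            start = mu.find(alpha, start + 1)   # next occurrence of alpha
    return results


# Hypothetical grammar: S -> aSb | ab, which generates a^n b^n for n >= 1.
P = [("S", "aSb"), ("S", "ab")]

step1 = immediate_derivations("S", P)
step2 = immediate_derivations("aSb", P)
```

Here the string aSb is split with σ = a, α = S and τ = b, so the two productions yield aaSbb and aabb in a single step.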
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that contains only terminal symbols.
•Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (N the nonterminals, Σ the terminal alphabet, P the production rules, S the start symbol) that satisfies the following two conditions:
a Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε
b If S -> ε is in P, then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb -> b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE -> EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β which is not of the form S -> ε satisfies one of the following conditions:
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A -> Ba or of the form B -> bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure Hierarchy of grammars
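The conditions for each type can be checked mechanically. The sketch below is an illustration under assumptions: symbols are single characters, a production α -> β is a pair of strings, ε is the empty string, condition (a) for Type 1 is read as |α| ≤ |β|, and the sample grammar G3 is hypothetical (the production rules of the examples above are not reproduced in these notes).

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return the most restrictive type (3, 2, 1 or 0) that the grammar
    satisfies under the definitions of the hierarchy above."""
    epsilon_rule = (start, "")
    others = [rule for rule in productions if rule != epsilon_rule]

    def is_type1():
        # (a) |alpha| <= |beta| for every rule other than S -> epsilon
        cond_a = all(len(alpha) <= len(beta) for alpha, beta in others)
        # (b) if S -> epsilon is present, S never occurs on a right side
        cond_b = (epsilon_rule not in productions
                  or all(start not in beta for _alpha, beta in productions))
        return cond_a and cond_b

    def is_type2():
        # every left-hand side is a single nonterminal symbol
        return is_type1() and all(alpha in nonterminals
                                  for alpha, _beta in productions)

    def is_type3():
        def allowed(beta):
            if len(beta) == 1:                  # a terminal symbol
                return beta not in nonterminals
            return (len(beta) == 2              # terminal then nonterminal
                    and beta[0] not in nonterminals
                    and beta[1] in nonterminals)
        return is_type2() and all(allowed(beta) for _alpha, beta in others)

    if is_type3():
        return 3
    if is_type2():
        return 2
    if is_type1():
        return 1
    return 0


# Hypothetical Type 3 grammar over nonterminals S, A, B (E is spare).
N = {"S", "A", "B", "E"}
G3 = [("S", "aA"), ("A", "bB"), ("B", "b")]
```

Adding any one of the rules mentioned in the examples (A -> Ba, B -> bb, aE -> EaE, Bb -> b) to G3 lowers the type exactly as the text describes.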
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V -> w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
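A small reader for such rules shows how the notation separates nonterminals from terminals. This is a sketch under assumptions: each rule sits on one line, the usual ::= separator is used, no terminal contains the | character, and the two-rule grammar is a made-up example.

```python
import re


def parse_bnf(text):
    """Parse BNF rules into {nonterminal: [alternative, ...]}, where each
    alternative is a list of symbols."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [re.findall(r"<[^>]+>|\S+", alternative)
                        for alternative in right.split("|")]
        rules.setdefault(left.strip(), []).extend(alternatives)
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {symbol
            for alternatives in rules.values()
            for alternative in alternatives
            for symbol in alternative}
    return used - set(rules)


# Hypothetical grammar for binary numerals.
BNF_TEXT = """
<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1
"""

RULES = parse_bnf(BNF_TEXT)
```

The vertical bar becomes a list of alternatives, and the set difference in terminals mirrors the rule that anything never defined on a left side is a terminal.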
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
This is the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read, broken into a stream of strings, and turned into an intermediate form.
In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
->Analysis Part
• The analysis part is carried out in three sub-parts
– Lexical Analysis
• In this part the source program is read and then it is broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
– Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• Then in the lexical analysis phase this statement is broken up into a series of tokens as follows
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
– Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;
A token is a <token-name, attribute-value> pair. For example:
1 The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2 The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3 The lexeme y is mapped to the token <id,2>.
4 The lexeme + is mapped to the token <+>.
5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6 The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
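A scanner for this tiny language can be sketched with one regular expression, which is possible precisely because no recursion is needed. The token shapes below are assumptions for illustration: identifiers become ("id", index) pairs with a 1-based index into a symbol table kept as a list, numbers simply carry their value, and the symbols =, +, * and ; (the statement terminator) carry None as their ignored second component.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>[=+*;]))")


def scan(source):
    """Group characters into lexemes and map each lexeme to a token.

    Returns (tokens, symbol_table); identifier tokens hold a 1-based
    index into symbol_table.
    """
    symbol_table = []
    tokens = []
    source = source.rstrip()
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character at {position}")
        position = match.end()
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("number"):
            tokens.append(("number", int(match.group("number"))))
        else:
            tokens.append((match.group("symbol"), None))
    return tokens, symbol_table


TOKENS, TABLE = scan("x3 = y + 3;")
```

Different spacing of the same statement gives the same token stream, while splitting x3 into x 3 changes it, since x and 3 are then separate lexemes.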
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt -> id = expr ;
expr -> number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
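A hand-written parser for these two rules might look like the sketch below. It is an illustration under assumptions: tokens are (name, attribute) pairs, the statement ends with a ; token, the syntax tree is built from nested tuples, and the recursive rule expr + expr is handled with a loop that treats + as left-associative (the grammar itself does not fix the grouping).

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree of nested tuples."""
    position = 0

    def peek():
        return tokens[position][0] if position < len(tokens) else None

    def expect(kind):
        nonlocal position
        if peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {peek()!r}")
        token = tokens[position]
        position += 1
        return token

    def operand():
        # expr -> number | id
        if peek() in ("id", "number"):
            return expect(peek())
        raise SyntaxError(f"expected an operand, found {peek()!r}")

    def expr():
        # expr -> expr + expr, grouped to the left
        tree = operand()
        while peek() == "+":
            expect("+")
            tree = ("+", tree, operand())
        return tree

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")
    if position != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    # Operator as interior node, operands as its children.
    return ("=", target, value)


# Token stream for  x3 = y + 3;  with x3 and y as symbol-table entries 1, 2.
EXAMPLE_TOKENS = [("id", 1), ("=", None), ("id", 2),
                  ("+", None), ("number", 3), (";", None)]
TREE = parse_assignment(EXAMPLE_TOKENS)
```

The result has = at the root with id 1 on the left and a + node on the right, which is the shape of the syntax tree described above.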
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
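Walking the tree to produce quads can be sketched as a post-order traversal. The tree encoding (nested tuples with the conversion operators already inserted by semantic analysis) and the helper names are assumptions made for this illustration.

```python
def generate_quads(tree):
    """Walk an assignment tree bottom-up and emit quads of the form
    (operation, target, source1, source2), with "--" for unused fields."""
    quads = []
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp_count += 1
        return f"temp{temp_count}"

    def walk(node):
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"       # leaf: symbol-table entry
        if kind == "number":
            return str(node[1])         # leaf: constant
        if kind in ("inttoreal", "realtoint"):
            source = walk(node[1])
            target = new_temp()
            quads.append((kind, target, source, "--"))
            return target
        if kind == "+":
            left = walk(node[1])
            right = walk(node[2])
            target = new_temp()
            quads.append(("add", target, left, right))
            return target
        raise ValueError(f"unknown node kind: {kind}")

    _assign, target, value = tree
    destination = walk(target)
    result = walk(value)                # emits the quads for the right side
    quads.append(("assign", destination, result, "--"))
    return quads


# Hypothetical semantic tree for x3 = y + 3 with y real and x3 integer.
SEMANTIC_TREE = ("=", ("id", 1),
                 ("realtoint", ("+", ("id", 2),
                                ("inttoreal", ("number", 3)))))
QUADS = generate_quads(SEMANTIC_TREE)
```

Each interior node gets a fresh temporary, so the four quads come out in the same order as the three-address code listed above.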
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1 Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2 The last two quads can be combined into
realtoint id1 temp2
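Both optimizations can be written as a small pass over the quads. The sketch below is only an illustration: quads are (operation, target, source1, source2) tuples, and the merge step assumes that a temporary copied by an assign quad is not used anywhere else, which holds in the example but would need checking in general.

```python
def optimize(quads):
    """Apply the two optimizations described above to a list of quads."""
    # 1. Perform inttoreal on integer constants at compile time.
    constants = {}
    folded = []
    for operation, target, source1, source2 in quads:
        if operation == "inttoreal" and source1.isdigit():
            constants[target] = str(float(int(source1)))    # "3" -> "3.0"
            continue                                        # quad disappears
        folded.append((operation, target,
                       constants.get(source1, source1),
                       constants.get(source2, source2)))

    # 2. Combine "op tempN ..." followed by "assign dest tempN".
    result = []
    for quad in folded:
        operation, target, source1, _source2 = quad
        if (operation == "assign" and result
                and source1.startswith("temp")
                and result[-1][1] == source1):
            previous = result.pop()
            # Let the previous quad write straight into the destination.
            result.append((previous[0], target, previous[2], previous[3]))
        else:
            result.append(quad)
    return result


UNOPTIMIZED = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
OPTIMIZED = optimize(UNOPTIMIZED)
```

Four quads shrink to two: the conversion of the constant is done once by the compiler, and the copy through temp3 is removed.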
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1 Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value)
2 Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form or syntax of legal statements in the language
3 Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space
4 Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible
5 Grammar
A Backus Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language
6 Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (Syntax Tree)
7 Ambiguous Grammar
There is more than one possible parse tree for a given statement
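Ambiguity can be demonstrated by counting parse trees. The sketch below assumes the small grammar expr -> expr + expr | id (chosen for illustration) and counts, for a list of tokens, how many distinct parse trees derive it; any count above one shows that the grammar is ambiguous for that statement.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under  expr -> expr + expr | id."""
    tokens = tuple(tokens)
    if not tokens:
        return 0

    @lru_cache(maxsize=None)
    def count(start, end):
        """Number of trees deriving tokens[start:end] from expr."""
        if end - start == 1:
            return 1 if tokens[start] == "id" else 0
        total = 0
        # Try every + as the operator at the root of the tree.
        for split in range(start + 1, end - 1):
            if tokens[split] == "+":
                total += count(start, split) * count(split + 1, end)
        return total

    return count(0, len(tokens))


ONE_PLUS = count_parse_trees(["id", "+", "id"])
TWO_PLUS = count_parse_trees(["id", "+", "id", "+", "id"])
```

With a single + there is exactly one tree, but id + id + id already has two: one grouping the first two operands and one grouping the last two.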
Lecture_no 46 Passes of a compiler
Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler: the Compiler Driver calls the Syntactic Analyzer, which in turn calls the Contextual Analyzer and the Code Generator.
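(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer and the Code Generator in turn. The Syntactic Analyzer takes the source text as input and outputs an AST; the Contextual Analyzer takes the AST as input and outputs a decorated AST; the Code Generator takes the decorated AST as input and outputs object code.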
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
three formats, bit 0 is a sign and bits 1-7 are a characteristic (exponent biased by 64). Bits 8-31 (8-63) are a hexadecimal fraction. For extended precision, the low-order doubleword has its own sign and characteristic, which are ignored on input and generated on output.
Lecture_no9 Various instruction formats of IBM 360/370
(iv) Instruction
Instructions in the S/360 are two, four or six bytes in length, with the opcode in byte 0. Instructions have one of the following formats:
RR (two bytes): R indicates that the data resides in a register, so in this format both operands are in registers. Generally byte 1 specifies two 4-bit register numbers, but in some cases (e.g. SVC) byte 1 is a single 8-bit immediate field.
RS (four bytes): This instruction can have 2 or 3 operands, depending on the opcode. If there are 2, the first is a register and the second is a storage location. If there are 3, the first and second are registers and the third is a storage location. Byte 1 specifies two register numbers; bytes 2-3 specify a base and displacement.
RX (four bytes): The X also refers to a storage address, but here the address is specified in three pieces: index, base and displacement. Byte 1 bits 0-3 specify either a register number or a modifier; byte 1 bits 4-7 specify the number of the general register to be used as an index; bytes 2-3 specify a base and displacement.
SI (four bytes): I indicates that the operand consists of immediate data (meaning that the data is in the instruction itself). In this format the first operand is a storage location and the second is immediate data. Byte 1 specifies an immediate field; bytes 2-3 specify a base and displacement.
SS (six bytes): S indicates that the data is in storage. Here both operands are in storage. Byte 1 specifies two 4-bit length fields or one 8-bit length field; bytes 2-3 and 4-5 each specify a base and displacement. The encoding of the length fields is length-1.
Instructions must be on a two-byte boundary in memory; hence the low-order bit of the instruction address is always 0.
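As an illustration of the RX layout described above, the following Python sketch splits a four-byte RX instruction into its fields. The function name decode_rx is made up for this example. The split of bytes 2-3 into a 4-bit base register number followed by a 12-bit displacement is not stated in these notes; it is taken from the S/360 architecture and should be read as an assumption of the sketch.

```python
def decode_rx(instr: bytes):
    """Split a 4-byte RX instruction into (opcode, r1, x2, b2, d2).

    Bit 0 is the most significant bit of each byte, as in the notes.
    """
    if len(instr) != 4:
        raise ValueError("RX instructions are four bytes long")
    opcode = instr[0]                          # byte 0: opcode
    r1 = instr[1] >> 4                         # byte 1, bits 0-3: register or modifier
    x2 = instr[1] & 0x0F                       # byte 1, bits 4-7: index register
    b2 = instr[2] >> 4                         # bytes 2-3: base register (assumed 4 bits)
    d2 = ((instr[2] & 0x0F) << 8) | instr[3]   # bytes 2-3: displacement (assumed 12 bits)
    return opcode, r1, x2, b2, d2
```

For the illustrative byte string 5A10C064 (hex), the sketch yields opcode 5A, register 1, index register 0, base register 12 and displacement 100.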
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S D
4. Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D) 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234 S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234] 10
7. Relative mode: Operand address is the content of PC + OpAddr. Example: ADD D $S
8. Indexed mode: Operand address is the content of an index register + OpAddr. Example: ADD D 500(S)
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
      MOVE R1 x
      MOVE R2 y
      MOVE R3 0
      SUB R1 R2
      BN Less
      BZ Zero
      MOVE R3 +1
      BR Done
Less  MOVE R3 -1
      BR Done
Zero  MOVE R3 0
Done  HLT
(BR is assumed here to be the unconditional branch of this instruction set. Without the two BR instructions every path would fall through to Zero and leave 0 in R3.)
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language, machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3] is 100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0 is 100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1 is 100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2 is 1010000100 00 01 10
SUBTRACT contents of 2 registers, store result in third
1010001000 RR RR RR
ex: R0 = R1 - R2 is 1010001000 00 01 10
Halt the program
1111111111111111
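The instruction formats above are simple enough to be interpreted by a short program. The following Python sketch is one possible simulator for the straight-line programs shown in this lecture; the function name run and the choice to represent each instruction as a string of 16 '0'/'1' characters are assumptions made for the illustration.

```python
def run(program, memory=None):
    """Execute 16-bit instruction strings on the simulated computer.

    Returns the final (registers, memory). There are no branch
    instructions in this instruction set, so execution is sequential.
    """
    regs = [0] * 4                                        # 4 registers
    mem = list(memory) if memory is not None else [0] * 32  # 32 locations
    for word in program:
        if word == "1" * 16:                              # HALT
            break
        if word.startswith("100000010"):                  # LOAD  RR MMMMM
            regs[int(word[9:11], 2)] = mem[int(word[11:16], 2)]
        elif word.startswith("100000100"):                # STORE RR MMMMM
            mem[int(word[11:16], 2)] = regs[int(word[9:11], 2)]
        elif word.startswith("100100010000"):             # MOVE  RR RR
            regs[int(word[12:14], 2)] = regs[int(word[14:16], 2)]
        elif word.startswith("1010000100"):               # ADD   RR RR RR
            d, a, b = (int(word[i:i + 2], 2) for i in (10, 12, 14))
            regs[d] = regs[a] + regs[b]
        elif word.startswith("1010001000"):               # SUB   RR RR RR
            d, a, b = (int(word[i:i + 2], 2) for i in (10, 12, 14))
            regs[d] = regs[a] - regs[b]
        else:
            raise ValueError("unknown instruction " + word)
    return regs, mem
```

Running the five-instruction program above with the value 7 stored in memory location 5 loads 7 into R1 and R2, adds them into R0 and stores 14 in memory location 6.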
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (base 16). Hence you could write:
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example:
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device, specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
   LABEL   OPCODE OPERANDS      COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6.
03 ; Before execution, an integer must be stored in NUMBER.
04 ;
05         .ORIG x3050
06         LEA R2, NUMBER
07         LDW R2, R2, #0
08         LEA R1, SIX
09         LDW R1, R1, #0
0A         AND R3, R3, #0       ; Clear R3. It will
0B                              ; contain the product.
0C ; The inner loop
0D ;
0E AGAIN   ADD R3, R3, R2
0F         ADD R1, R1, #-1      ; R1 keeps track of
10         BRp AGAIN            ; the iterations
11 ;
12         HALT
13 ;
14 NUMBER  .BLKW 1
15 SIX     .FILL x0006
16 ;
17         .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
LEA R2, NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program (after the nine instructions and the word reserved for NUMBER) is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get through with this, but it is rather inappropriate. If things need a hurry, you should use the country's language.
Assembler: To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler: To translate machine code to its assembly source.
Cross Assembler: The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader; the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START: Specify name and starting address for the program
• END: Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE: Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD: Generate one-word integer constant
• RESB: Reserve the indicated number of bytes for a data area
• RESW: Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement has the following format:
[label] mnemonic [operands] [comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands, or parameters, of the command.
An assembly language statement contains the following fields:
-> Label field: an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex:
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/opcode field: contains a mnemonic, which stands for an operation code or pseudo-op, i.e. a machine code instruction. It may require additional information, i.e. operands.
-> Operand field: specifies either the address or the data that the opcode requires.
-> Comment field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are three types:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages amp disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler Design can be done in
– Single pass
– Two pass
(Fig Two-pass assembler: the source program is read by Pass 1, which uses OPTAB, builds SYMTAB and writes an intermediate file; Pass 2 reads the intermediate file, uses OPTAB and SYMTAB, and produces the object code.)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references without restrictions or extra bookkeeping
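One common way of handling forward references in a single pass is to leave a hole in the generated code and fill it in when the label is finally defined (often called backpatching). The notes do not say which technique is intended, so the following Python sketch is only one possible illustration. It assumes that every statement occupies exactly one word and that an operand is always a label name; the function name assemble_one_pass and the mnemonics in the usage example are hypothetical.

```python
def assemble_one_pass(lines):
    """lines: list of (label or None, mnemonic, operand label or None).

    Returns one [mnemonic, address-or-None] entry per word of code.
    """
    symtab = {}     # label -> address
    pending = {}    # undefined label -> addresses of words waiting for it
    code = []
    for label, mnemonic, operand in lines:
        locctr = len(code)                       # one word per statement
        if label is not None:
            symtab[label] = locctr
            for addr in pending.pop(label, []):  # fill in earlier holes
                code[addr][1] = locctr
        if operand is None or operand in symtab:
            code.append([mnemonic, symtab.get(operand)])
        else:
            # forward reference: remember where the address is needed
            pending.setdefault(operand, []).append(locctr)
            code.append([mnemonic, None])
    if pending:
        raise ValueError("undefined symbols: " + ", ".join(sorted(pending)))
    return code
```

For the four statements JMP END / LOOP ADD / JMP LOOP / END HLT, the first JMP is emitted with an empty address field and patched to address 3 once END is seen, while JMP LOOP is a backward reference and is resolved immediately.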
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB): used to store values (addresses) assigned to labels
• Location Counter (LOCCTR): a variable that is used to help in the assignment of addresses. After each statement is processed, LOCCTR is advanced: LOCCTR <- LOCCTR + (instruction length)
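How the three structures cooperate in Pass 1 can be sketched as follows. The sketch assumes a SIC-like machine in which every instruction and every WORD is 3 bytes long; the OPTAB contents, the tuple layout of a source line and the function name pass1 are illustrative choices, not something fixed by these notes.

```python
# Hypothetical OPTAB: mnemonic -> instruction length in bytes.
OPTAB = {"LDA": 3, "ADD": 3, "STA": 3}


def pass1(lines, start=0):
    """lines: list of (label or None, mnemonic, operand).

    Returns (SYMTAB, intermediate file, program length in bytes).
    """
    symtab = {}
    locctr = start
    intermediate = []
    for label, mnemonic, operand in lines:
        if label:
            if label in symtab:
                raise ValueError("duplicate symbol " + label)
            symtab[label] = locctr           # label gets the current LOCCTR
        intermediate.append((locctr, mnemonic, operand))
        if mnemonic in OPTAB:
            locctr += OPTAB[mnemonic]        # LOCCTR <- LOCCTR + length
        elif mnemonic == "WORD":
            locctr += 3
        elif mnemonic == "RESW":
            locctr += 3 * int(operand)
        elif mnemonic == "RESB":
            locctr += int(operand)
        else:
            raise ValueError("invalid operation code " + mnemonic)
    return symtab, intermediate, locctr - start
```

Note that Pass 1 never needs the value of an operand label; it only records where each label is defined, which is why forward references cause no difficulty here.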
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
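A minimal sketch of Pass 2 is given below. It assumes that Pass 1 has already produced an intermediate list of (address, mnemonic, operand) entries and a completed symbol table, and that each instruction is encoded as a 2-hex-digit opcode followed by a 4-hex-digit address. The opcode values in OPCODES, the encoding and the function name pass2 are assumptions for the illustration.

```python
# Hypothetical opcode table: mnemonic -> machine opcode.
OPCODES = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}


def pass2(intermediate, symtab):
    """Translate the intermediate file into a list of hex object-code strings."""
    object_code = []
    for locctr, mnemonic, operand in intermediate:
        if mnemonic in OPCODES:
            if operand not in symtab:
                raise ValueError("undefined symbol " + str(operand))
            # opcode (2 hex digits) + operand address (4 hex digits)
            object_code.append(f"{OPCODES[mnemonic]:02X}{symtab[operand]:04X}")
        elif mnemonic == "WORD":
            # data value defined by WORD: one 3-byte integer constant
            object_code.append(f"{int(operand) & 0xFFFFFF:06X}")
        # RESW and RESB reserve storage but generate no object code
    return object_code
```

Because the symbol table is complete before Pass 2 starts, every operand can be looked up directly, whether it was defined before or after its use.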
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes; program relocation
Machine Independent Assembler Features: literals; symbol-defining statements; expressions; program blocks; control sections and program linking
Object Program
Header record
Col 1: H
Col 2~7: Program name
Col 8~13: Starting address (hex)
Col 14~19: Length of object program in bytes (hex)
Text record
Col 1: T
Col 2~7: Starting address in this record (hex)
Col 8~9: Length of object code in this record in bytes (hex)
Col 10~69: Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1: E
Col 2~7: Address of first executable instruction (hex) (END program_name)
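The record layout above can be turned into a small formatting routine. The following Python sketch assumes that every piece of object code is a 6-hex-digit (3-byte) string and that the code is contiguous from the starting address; the function name object_program and the sample program name are made up for the example.

```python
def object_program(name, start, first_exec, codes):
    """Build Header, Text and End records from a list of 6-hex-digit codes."""
    length = sum(len(c) // 2 for c in codes)              # program length in bytes
    records = [f"H{name[:6]:<6}{start:06X}{length:06X}"]  # cols 1, 2-7, 8-13, 14-19
    addr = start
    for i in range(0, len(codes), 10):                    # at most 10 instructions per record
        chunk = "".join(codes[i:i + 10])
        nbytes = len(chunk) // 2
        records.append(f"T{addr:06X}{nbytes:02X}{chunk}") # cols 1, 2-7, 8-9, 10-69
        addr += nbytes
    records.append(f"E{first_exec:06X}")                  # cols 1, 2-7
    return records
```

For a program named COPY that starts at 1000 (hex) and consists of the four object codes 001009, 181009, 0C100C and 000005, this produces one Header record, one Text record of length 0C and one End record.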
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1: M
Col 2~7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8~9: Length of the address field to be modified, in half-bytes
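What a loader does with a modification record can be sketched as follows. The notes do not specify how a field with an odd number of half-bytes is aligned; this sketch assumes that the field occupies the low-order half-bytes of the bytes starting at the given location, and that relocation simply adds the load address to the field. The function name relocate and the sample bytes are illustrative.

```python
def relocate(image: bytearray, mod_records, load_address: int) -> bytearray:
    """Apply modification records to a program image.

    mod_records: list of (offset from start of program, length in half-bytes).
    """
    for offset, half_bytes in mod_records:
        nbytes = (half_bytes + 1) // 2                    # bytes that hold the field
        raw = int.from_bytes(image[offset:offset + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1                # selects the address field
        field = ((raw & mask) + load_address) & mask      # add the load address
        raw = (raw & ~mask) | field                       # keep the other bits unchanged
        image[offset:offset + nbytes] = raw.to_bytes(nbytes, "big")
    return image
```

For example, with the four bytes 4B101036 (hex), a modification record for a 5-half-byte field starting at location 1 and a load address of 5000 (hex), the address field 01036 becomes 06036 and the bytes become 4B106036.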
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Obj
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to an assembly language instruction sequence of Intel's 8-bit microprocessor.
Example
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call and its expansion:
INCRMT X, Y
LOAD X
ADD Y
STORE X
(Fig Macro program)
(Fig Source code (with macros) -> Macro Processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X, Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
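The definition and expansion mechanism described above can be sketched in Python. This is a simplified illustration, not a complete macro processor: it assumes one statement per list element, MACRO and ENDM on lines of their own, comma-separated parameters starting with &, and no nested definitions or calls. The function name expand is made up for the example.

```python
import re


def expand(source):
    """Expand positional-parameter macros in a list of source lines."""
    macros = {}     # macro name -> (formal parameters, model statements)
    output = []
    lines = iter(source)
    for line in lines:
        fields = line.split(None, 1)
        if fields and fields[0] == "MACRO":
            # the statement after MACRO is the prototype
            proto = next(lines).split(None, 1)
            formals = [p.strip() for p in proto[1].split(",")] if len(proto) > 1 else []
            body = []
            for body_line in lines:                # model statements up to ENDM
                if body_line.strip() == "ENDM":
                    break
                body.append(body_line)
            macros[proto[0]] = (formals, body)
        elif fields and fields[0] in macros:
            # macro call: pair formal and actual parameters by position
            formals, body = macros[fields[0]]
            actuals = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
            binding = dict(zip(formals, actuals))
            for body_line in body:
                output.append(re.sub(r"&\w+",
                                     lambda m: binding.get(m.group(0), m.group(0)),
                                     body_line))
        else:
            output.append(line)                    # ordinary statement: copy
    return output
```

Applied to the INCRMT definition followed by the call INCRMT X,Y, the sketch produces the three statements LOAD X, ADD Y and STORE X; the definition itself does not appear in the output.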
Defining a Macro
Let us take another look at the macro definition unit
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used when defining a macro
MACRO: indicates the beginning of a macro definition
MEND: indicates the end of a macro definition
-> Prototype for the macro
The prototype gives the macro name and the macro parameter list. Each parameter starts with &.
-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of the macro we speak of parameters; in the macro invocation we speak of arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, are specified in the prototype and macro call statements respectively, and establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B (prototype)
INCRMT X, Y (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1 DATA2 DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
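The difference between positional and keyword parameters lies only in how actual arguments are paired with formal parameters. The following Python sketch shows one way of doing the pairing; the function name bind_arguments and the rule that an argument containing "=" is a keyword argument are assumptions of the illustration.

```python
def bind_arguments(formals, call_args):
    """Pair formal parameters (names starting with &) with actual arguments.

    Plain arguments are matched by position; arguments written as
    NAME=value are matched by the name of the formal parameter.
    """
    binding = {}
    position = 0
    for arg in call_args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            key = "&" + key.strip().lstrip("&")      # accept a3=... or &a3=...
            if key not in formals:
                raise ValueError("unknown parameter " + key)
            binding[key] = value.strip()
        else:
            if position >= len(formals):
                raise ValueError("too many positional arguments")
            binding[formals[position]] = arg.strip()
            position += 1
    return binding
```

With keyword parameters the order in which arguments are written no longer matters, so STRG &a3=DATA1, &a2=DATA2, &a1=DATA3 binds DATA3 to the first formal parameter even though it is written last.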
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish the correspondence between all formal parameters and their values (i.e. the actual parameters)
iv) expand the macro call by following the procedure given in Step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
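The three steps can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the algorithm of any particular assembler: a definition is assumed to start with a line MACRO followed by the prototype and to end with ENDM, fields are separated by blanks, and the function names define_macros and expand are hypothetical. AIF and AGO are not handled.

```python
def define_macros(lines):
    """Step 1: fill the MNT and MDT; return them with the non-definition lines."""
    mnt = {}    # macro name -> (index of first body line in MDT, formal parameters)
    mdt = []    # macro bodies, each terminated by "ENDM"
    rest = []   # source statements that are not part of a definition
    it = iter(lines)
    for line in it:
        if line.strip() == "MACRO":          # assumed start-of-definition marker
            proto = next(it).split()         # prototype: name, then formals
            mnt[proto[0]] = (len(mdt), proto[1:])
            for body in it:                  # copy the body up to ENDM
                mdt.append(body.strip())
                if body.strip() == "ENDM":
                    break
        else:
            rest.append(line)
    return mnt, mdt, rest


def expand(mnt, mdt, rest):
    """Steps 2 and 3: detect macro calls and expand them from the MDT."""
    out = []
    for line in rest:
        fields = line.split()
        if fields and fields[0] in mnt:
            start, formals = mnt[fields[0]]
            actuals = dict(zip(formals, fields[1:]))   # positional pairing
            i = start
            while mdt[i] != "ENDM":
                out.append(" ".join(actuals.get(w, w) for w in mdt[i].split()))
                i += 1
        else:
            out.append(line.strip())
    return out


source = [
    "MACRO",
    "INCRMT &A &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X Y",
    "HALT",
]
mnt, mdt, rest = define_macros(source)
expanded = expand(mnt, mdt, rest)
```

Here the call INCRMT X Y is replaced by the three model statements with X and Y substituted, and HALT is copied through unchanged.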
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro- the statements of the expansion are generated each time the macro is invoked
Subroutine- the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
   Holds the macro prototype and the statements that make up the macro body
   References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
   Role: an index to DEFTAB
   Holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
   Used during the expansion of a macro invocation
   Arguments are stored in ARGTAB according to their position
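The positional notation used between DEFTAB and ARGTAB can be illustrated with a short Python sketch. The ?n marker for the n-th parameter and the helper names to_positional and substitute are assumptions made for this example; the notes do not fix a particular notation.

```python
def to_positional(line, formals):
    """Rewrite formal parameters in a body line as ?1, ?2, ... for DEFTAB."""
    index = {name: i for i, name in enumerate(formals, start=1)}
    return " ".join(f"?{index[w]}" if w in index else w for w in line.split())


def substitute(line, argtab):
    """Replace each ?n in a DEFTAB line by the n-th entry of ARGTAB."""
    out = []
    for w in line.split():
        if w.startswith("?") and w[1:].isdigit():
            out.append(argtab[int(w[1:]) - 1])
        else:
            out.append(w)
    return " ".join(out)


formals = ["&A", "&B"]
deftab = [to_positional(l, formals) for l in ["LOAD &A", "ADD &B", "STORE &A"]]
argtab = ["X", "Y"]                      # arguments of the call INCRMT X, Y
expanded = [substitute(l, argtab) for l in deftab]
```

Storing positions instead of names means the expansion step never has to search for parameter names: it simply indexes ARGTAB.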
(fig Macro processor tables: a macro call with NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm amp flowchart for the Macro processor and explain
(fig Macro processor procedures: PROCESSLINE, DEFINE, EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is now ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader- In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads the assembled machine instructions into the memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme- In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader- An absolute loader is a kind of loader for which already relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages: It is simple to implement.
Disadvantages: In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain Input/Output functions like printf(), scanf() etc. which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an Input/Output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300, while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in the storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called a relocating loader.
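The idea that relocation adds a constant to each relative address can be sketched in Python. The representation is an assumption made for illustration: the segment is a list of words, and a parallel list of flags marks which words hold relative addresses. Real object formats carry this information in relocation records, and the name relocate is hypothetical.

```python
def relocate(words, is_address, load_address):
    """Add the load address to every word flagged as a relative address."""
    return [w + load_address if flag else w
            for w, flag in zip(words, is_address)]


# Segment translated as if it started at address 0.
words      = [10, 2, 99, 0]
is_address = [True, False, False, True]   # only these words are address fields

relocated = relocate(words, is_address, 400)
```

Only the flagged address fields change; plain data words such as 2 and 99 are left as they are, which is why the loader needs relocation information and cannot simply add the constant everywhere.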
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of the assembler for a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking-
The linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is known as dynamic linking, dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
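The behaviour of the overlay manager on the example tree can be sketched in Python. The PARENT table, the function path_to and the class OverlayManager are hypothetical names for this illustration, and "loading" is modelled simply by recording which segments are on the currently loaded path.

```python
# Overlay tree from the example: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]            # currently loaded path, root first

    def call(self, target):
        """Ensure the path to target is loaded; return the segments newly loaded."""
        wanted = path_to(target)
        keep = 0                          # length of the common prefix
        while (keep < min(len(self.loaded), len(wanted))
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        if keep == len(wanted):           # upward call: target is already loaded
            return []
        self.loaded = wanted              # siblings are overwritten in shared memory
        return wanted[keep:]


manager = OverlayManager()
first = manager.call("B")     # root calls a routine in B
second = manager.call("A")    # upward call from B to A
third = manager.call("E")     # root calls a routine in E
```

In this model the call to B loads both A and B, the upward call to A loads nothing, and the call to E replaces A and B by D and E because sibling segments share the same memory.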
(fig Overlay tree: ROOT at the top, A and D below ROOT, B and C below A, E and F below D)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution has to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at who performs each of the above four functions for our three loaders: absolute loader, relocating loader and direct linking loader respectively
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation and Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
o Copy of the object program
o IPLA parameter (Initial Program Load Address), supplied by the OS or the programmer to specify the address at which to load the first segment
o PLA counter (Program Load Address), to keep track of each segment location
o External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
o Local External Symbol Array (LESA), to establish the correspondence between the ESD ID used in ESD and RLD cards
o ESD (external symbol dictionary)
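The two passes can be sketched in Python. This is a simplified model under stated assumptions, not the card-level algorithm: each object module is a dictionary whose keys (name, length, defs, text, relocate, refs) are hypothetical stand-ins for the ESD, TXT and RLD information, and memory is modelled as a dictionary from address to word.

```python
def pass1(modules, ipla):
    """Assign each module a load address and build the external symbol table."""
    pla = ipla                                  # program load address counter
    bases, symtab = {}, {}
    for m in modules:
        bases[m["name"]] = pla
        for sym, offset in m["defs"].items():   # ESD: symbols defined here
            symtab[sym] = pla + offset
        pla += m["length"]
    return bases, symtab


def pass2(modules, bases, symtab):
    """Load the text, relocate address constants and resolve external references."""
    memory = {}
    for m in modules:
        base = bases[m["name"]]
        text = list(m["text"])
        for offset in m["relocate"]:            # RLD: local address constants
            text[offset] += base
        for offset, sym in m["refs"].items():   # RLD: external references
            text[offset] += symtab[sym]
        for offset, word in enumerate(text):    # TXT: place words in memory
            memory[base + offset] = word
    return memory


modules = [
    {"name": "A", "length": 3, "defs": {"A": 0},
     "text": [1, 0, 2], "relocate": [2], "refs": {1: "B"}},
    {"name": "B", "length": 2, "defs": {"B": 0},
     "text": [7, 1], "relocate": [1], "refs": {}},
]
bases, symtab = pass1(modules, ipla=100)
memory = pass2(modules, bases, symtab)
```

Two passes are needed because module A refers to B before the address of B is known; pass 1 fixes every address, so pass 2 can patch all references in a single sweep.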
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler assembles the program and provides the loader with
-> The object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
– allocation, relocation and linking
• It outputs the text to a file rather than to memory
– this file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> The language designer's balance between
   -> making computing convenient for people, and
   -> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
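The three search methods can be sketched in Python on a small symbol table. The table contents, the character-sum hash function and the use of linear probing for collisions are assumptions chosen for illustration; the notes do not prescribe a particular hash function or collision strategy.

```python
def linear_search(table, key):
    """Compare every entry in turn; return its index or -1."""
    for i, (name, _) in enumerate(table):
        if name == key:
            return i
    return -1


def binary_search(table, key):
    """Search a table sorted by name; return the index or -1."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        if table[mid][0] == key:
            return mid
        if key < table[mid][0]:
            high = mid - 1
        else:
            low = mid + 1
    return -1


def hash_slot(name, size):
    """Assumed hash function: sum of character codes modulo the table size."""
    return sum(ord(c) for c in name) % size


def hash_insert(slots, name, value):
    start = hash_slot(name, len(slots))
    for step in range(len(slots)):          # linear probing on collision
        i = (start + step) % len(slots)
        if slots[i] is None or slots[i][0] == name:
            slots[i] = (name, value)
            return i
    raise RuntimeError("hash table is full")


def hash_lookup(slots, name):
    start = hash_slot(name, len(slots))
    for step in range(len(slots)):
        i = (start + step) % len(slots)
        if slots[i] is None:
            return None                     # empty slot: name is not present
        if slots[i][0] == name:
            return slots[i][1]
    return None


# Symbol table sorted by symbol name (the key), with addresses as values.
table = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108), ("LOOP", 112)]
slots = [None] * 11
for name, value in table:
    hash_insert(slots, name, value)
```

Linear search needs up to n comparisons, binary search about log2(n) on an ordered table, and a hash lookup usually touches only one or two slots as long as the table is not nearly full.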
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this as L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ => γ in G (a single-step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
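The single-step derivation can be tried out with a short Python sketch. It assumes that every grammar symbol is a single character, that left-hand sides are non-empty, and that productions are given as (α, β) pairs; the function name derive_once and the sample grammar are hypothetical.

```python
def derive_once(mu, productions):
    """Return every string gamma that is immediately derived from mu."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                  # part before the rewritten alpha
            tau = mu[start + len(alpha):]       # part after it
            results.add(sigma + beta + tau)     # gamma = sigma beta tau
            start = mu.find(alpha, start + 1)
    return results


# Sample grammar: S -> aSb and S -> ab, with start symbol S.
P = [("S", "aSb"), ("S", "ab")]

step1 = derive_once("S", P)
step2 = derive_once("aSb", P)
```

Each result corresponds to one choice of a production α -> β and one occurrence of α in µ, with σ and τ being whatever surrounds that occurrence.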
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
bullChomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε
b. If S -> ε is in P, then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol
An addition to the modified grammar of a production rule of the form Bb -> b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE -> EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β which is not of the form S -> ε satisfies one of the following conditions:
a. β is a terminal symbol
b. β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar
An addition of a production rule of the form A -> Ba or of the form B -> bb to the grammar will result in a non-Type 3 grammar.
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; more sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages
If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "Language Translators"
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
– A compiler is a program which takes a program in one language (source program) as input and converts it into an equivalent program in another language (target program)
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read, broken into its constituent pieces, and converted into an intermediate form. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a sequence of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e., modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol =
– The identifier count
– The plus sign +
– The identifier rate
– The multiplication sign *
– The constant number 10
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3 ;
x3 =y+ 3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
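The lexeme-to-token mapping described above can be sketched in a few lines of Python. This is only an illustrative scanner, not the one used by any particular compiler: the pattern table TOKEN_SPEC, the function name scan, and the choice to keep the number's attribute as its text are all assumptions made for this example.

```python
import re

# Hypothetical token patterns for this sketch; a real scanner has many more.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+;]"),
    ("skip",   r"\s+"),          # non-significant blanks
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Return (tokens, symbol_table) for the given source text."""
    symbols = []   # symbol table; an identifier's index is its position + 1
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                      # blanks are dropped during scanning
        if kind == "id":
            if lexeme not in symbols:
                symbols.append(lexeme)    # first sighting: enter in symbol table
            tokens.append(("id", symbols.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))  # e.g. <=>, <+>, <;>
    return tokens, symbols


if __name__ == "__main__":
    toks, table = scan("x3 = y + 3 ;")
    print(toks)
    print(table)
```

Note that none of the patterns is recursive, which matches the remark that identifiers, numbers, and punctuation can be defined without recursion.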
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a parse tree. This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the parse tree.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. For the example above, the syntax tree has = at the root with children x3 and +, and + has children y and 3.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
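As a rough illustration of how a parser might build the syntax tree for the example, here is a minimal recursive-descent sketch in Python. It assumes the input has already been split into lexeme strings and that the statement ends with a semicolon as in C. The left-recursive rule for expr is replaced by a loop (an operand followed by any number of "+ operand"), which is a common workaround and not necessarily the method these notes intend; the function name parse_assignment and the tuple representation of tree nodes are hypothetical.

```python
def parse_assignment(tokens):
    """Parse `id = expr ;` from a list of lexeme strings; return a syntax tree.

    Tree nodes are tuples: ("id", name), ("number", text),
    ("+", left, right) and ("=", target, value).
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return ("number", advance())
        if tok.isidentifier():
            return ("id", advance())
        raise SyntaxError(f"expected an operand, found {tok!r}")

    def expr():
        # expr -> operand ('+' operand)*   (left-associative)
        node = operand()
        while peek() == "+":
            advance()
            node = ("+", node, operand())
        return node

    target = operand()
    if target[0] != "id":
        raise SyntaxError("left side of an assignment must be an identifier")
    if peek() != "=":
        raise SyntaxError("expected '='")
    advance()
    tree = ("=", target, expr())
    if peek() != ";":
        raise SyntaxError("expected ';' at end of statement")
    advance()
    if peek() is not None:
        raise SyntaxError(f"unexpected input after ';': {peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse_assignment(["x3", "=", "y", "+", "3", ";"]))
```

The semicolon is checked but does not appear in the returned tree, in line with the technical point that the syntax tree represents the assignment expression rather than the statement.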
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators: "int to real" applied to the constant 3 and "real to int" applied to the sum before it is assigned to x3.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
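The tree walk that produces these quadruples can be sketched as follows. This is a minimal Python illustration under stated assumptions: the semantic tree is represented as nested tuples with the conversion operators already inserted, unused source fields are stored as None (shown as -- above), and the names gen_quads and new_temp are invented for the example.

```python
def gen_quads(tree):
    """Walk a semantic tree for `id = expr` and return a list of quads.

    Each quad is (operation, target, source1, source2).
    """
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        kind = node[0]
        if kind in ("id", "number"):
            return node[1]                    # leaves need no code
        if kind in ("inttoreal", "realtoint"):
            src = walk(node[1])
            target = new_temp()
            quads.append((kind, target, src, None))
            return target
        if kind == "+":
            left = walk(node[1])
            right = walk(node[2])
            target = new_temp()
            quads.append(("add", target, left, right))
            return target
        raise ValueError(f"unknown node kind {kind!r}")

    if tree[0] != "=":
        raise ValueError("expected an assignment at the root")
    value = walk(tree[2])
    quads.append(("assign", tree[1][1], value, None))
    return quads


# Semantic tree for the running example, with id1 = x3 and id2 = y.
EXAMPLE = ("=", ("id", "id1"),
           ("realtoint", ("+", ("id", "id2"), ("inttoreal", ("number", "3")))))

if __name__ == "__main__":
    for quad in gen_quads(EXAMPLE):
        print(quad)
```

Each interior node gets a fresh temporary, which reflects the idealized-machine assumption of an unlimited supply of registers.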
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion at compile time and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
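The two optimizations just described can be sketched as a small pass over the quads. This Python example is only a toy: it assumes quads are (operation, target, source1, source2) tuples, that every temporary is used exactly once (so merging a quad with the assign that follows it is safe), and that temporaries are recognizable by the prefix "temp". The function name optimize is hypothetical.

```python
def optimize(quads):
    """Apply two simple optimizations to a list of quads."""
    # Step 1: fold inttoreal applied to an integer literal at compile time.
    consts = {}     # temporary name -> folded real literal
    folded = []
    for op, target, s1, s2 in quads:
        s1 = consts.get(s1, s1)
        s2 = consts.get(s2, s2)
        if op == "inttoreal" and isinstance(s1, str) and s1.isdigit():
            consts[target] = s1 + ".0"   # e.g. "3" becomes "3.0"
            continue                      # the conversion quad disappears
        folded.append((op, target, s1, s2))

    # Step 2: merge "op tempN ..." followed directly by "assign x tempN".
    result = []
    for quad in folded:
        op, target, s1, s2 = quad
        if (op == "assign" and result
                and isinstance(s1, str) and s1.startswith("temp")
                and result[-1][1] == s1):
            prev_op, _, p1, p2 = result.pop()
            result.append((prev_op, target, p1, p2))
        else:
            result.append(quad)
    return result


QUADS = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]

if __name__ == "__main__":
    for quad in optimize(QUADS):
        print(quad)
```

A production optimizer would check that a temporary is not used again before merging; the sketch skips that check to stay short.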
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 (add float)
RTOI R2, R1 (real to int)
ST id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one pass. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1 Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2 Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3 Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4 Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5 Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6 Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7 Ambiguous Grammar
A grammar is ambiguous if there is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                calls
Contextual Analyzer      Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    calls                calls                 calls
Syntactic Analyzer       Contextual Analyzer   Code Generator
Syntactic Analyzer: input Source Text, output AST
Contextual Analyzer: input AST, output Decorated AST
Code Generator: input Decorated AST, output Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Addressing Mode
1. Implied mode: Operand is in a register implied by the opcode of the instruction. Example: ADD 31
2. Immediate mode: Operand is a constant value specified in the instruction itself. Example: ADD R 10
3. Register mode: Operand is in a register specified in the instruction itself. Example: ADD S D
4. Register indirect mode: Operand is in a memory address that is the content of a register specified by the instruction itself. Example: ADD (D) 3
5. Direct mode: Absolute address of the operand is explicitly specified in the instruction. Example: ADD 1234 S
6. Indirect mode: Absolute address of the operand is the content of a memory address. Example: ADD [1234] 10
7. Relative mode: Address of the operand is the content of PC + OpAddr. Example: ADD D $S
8. Indexed mode: Address of the operand is the content of an index register + OpAddr. Example: ADD D 500(S)
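To make the difference between the modes concrete, the following Python sketch shows where the operand comes from in each case. It is a simplified model, not a description of any real processor: registers and memory are plain dictionaries, the mode names and the function fetch_operand are invented for this example, and the implied mode is left out because it carries no address field.

```python
def fetch_operand(mode, field, regs, memory, pc=0):
    """Return the operand value selected by an addressing mode.

    mode   -- name of the addressing mode (hypothetical strings)
    field  -- the address field of the instruction (constant, register
              name, address, or (index_register, offset) for indexed)
    regs   -- dict mapping register names to their contents
    memory -- dict mapping addresses to their contents
    pc     -- program counter, used only by the relative mode
    """
    if mode == "immediate":
        return field                         # the constant itself
    if mode == "register":
        return regs[field]                   # operand is in the register
    if mode == "register_indirect":
        return memory[regs[field]]           # register holds the address
    if mode == "direct":
        return memory[field]                 # field is the absolute address
    if mode == "indirect":
        return memory[memory[field]]         # field points to the address
    if mode == "relative":
        return memory[pc + field]            # address = PC + OpAddr
    if mode == "indexed":
        index_reg, offset = field
        return memory[regs[index_reg] + offset]   # address = index + OpAddr
    raise ValueError(f"unknown addressing mode {mode!r}")


# Small made-up machine state for trying the modes out.
REGS = {"S": 4, "D": 2}
MEMORY = {2: 20, 4: 40, 20: 200, 104: 7, 504: 9, 1234: 20}

if __name__ == "__main__":
    print(fetch_operand("indirect", 1234, REGS, MEMORY))
    print(fetch_operand("indexed", ("S", 500), REGS, MEMORY))
```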
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
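Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
      MOVE R1, x
      MOVE R2, y
      MOVE R3, 0
      SUB  R1, R2
      BN   Less
      BZ   Zero
      MOVE R3, +1
      BR   Done
Less  MOVE R3, -1
      BR   Done
Zero  MOVE R3, 0
Done  HLT
Here BR is assumed to denote an unconditional branch. Without the two BR instructions, execution would fall through the labels Less and Zero, and R3 would always end up as 0.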
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language     Equivalent assembly
1000000100100101     LOAD R1 5
1000000101000101     LOAD R2 5
1010000100000110     ADD R0 R1 R2
1000001000000110     SAVE R0 6
1111111111111111     HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 – R2
1010001000 00 01 10
Halt the program
1111111111111111
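Because every instruction of the simulated computer is a fixed bit pattern followed by 2-bit register fields and a 5-bit memory field, translating assembly into machine language is mostly string concatenation. The Python sketch below illustrates this for the formats listed above. The function names (encode, reg, mem) are invented, and the operand order for STORE/SAVE (register first, then memory address, as in the SAVE R0 6 listing) is an assumption of this example.

```python
def reg(name):
    """Encode register R0..R3 as a 2-bit field."""
    number = int(name[1:])
    if name[0].upper() != "R" or not 0 <= number <= 3:
        raise ValueError(f"bad register {name!r}")
    return format(number, "02b")


def mem(address):
    """Encode a memory location 0..31 as a 5-bit field."""
    number = int(address)
    if not 0 <= number <= 31:
        raise ValueError(f"bad memory location {address!r}")
    return format(number, "05b")


def encode(line):
    """Translate one assembly instruction into a 16-bit string."""
    parts = line.replace(",", " ").split()
    op, args = parts[0].upper(), parts[1:]
    if op == "LOAD" and len(args) == 2:            # LOAD Rd addr
        return "100000010" + reg(args[0]) + mem(args[1])
    if op in ("STORE", "SAVE") and len(args) == 2:  # STORE Rs addr (assumed order)
        return "100000100" + reg(args[0]) + mem(args[1])
    if op == "MOVE" and len(args) == 2:            # MOVE Rd Rs
        return "100100010000" + reg(args[0]) + reg(args[1])
    if op == "ADD" and len(args) == 3:             # ADD Rd Rs1 Rs2
        return "1010000100" + "".join(reg(a) for a in args)
    if op in ("SUB", "SUBTRACT") and len(args) == 3:
        return "1010001000" + "".join(reg(a) for a in args)
    if op == "HALT" and not args:
        return "1" * 16
    raise ValueError(f"cannot encode {line!r}")


PROGRAM = ["LOAD R1 5", "LOAD R2 5", "ADD R0 R1 R2", "SAVE R0 6", "HALT"]

if __name__ == "__main__":
    for instruction in PROGRAM:
        print(encode(instruction), instruction)
```

Encoding the five-instruction program this way reproduces the machine language listing given earlier in this lecture.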
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device, specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM, MASM, etc.
    LABEL   OPCODE  OPERANDS      COMMENTS
01          ;
02          ; Program to multiply an integer by the constant 6
03          ; Before execution an integer must be stored in NUMBER
04          ;
05          ORIG    x3050
06          LEA     R2, NUMBER
07          LDW     R2, R2, #0
08          LEA     R1, SIX
09          LDW     R1, R1, #0
0A          AND     R3, R3, #0    ; Clear R3. It will
0B                                ; contain the product
0C          ; The inner loop
0D          ;
0E  AGAIN   ADD     R3, R3, R2
0F          ADD     R1, R1, #-1   ; R1 keeps track of
10          BRp     AGAIN         ; the iterations
11          ;
12          HALT
13          ;
14  NUMBER  BLKW    1
15  SIX     FILL    x0006
16          ;
17          END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
LEA R2, NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
ORIG
ORIG tells the assembler where in memory to place the program. In line 05, ORIG x3050 says to start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
FILL
FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
BLKW
BLKW tells the assembler to set aside some number of sequential memory locations (i.e., a BLocK of Words) in the program. The actual number is the operand of the BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get by speaking English, but you will never feel at home, and life remains complicated. You can get by like this, but it is rather inappropriate. If things need to go quickly, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, externally defined symbols (library programs) must be indicated to the loader: the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine) representations
Provide error checking
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START - Specify the name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate a character or hexadecimal constant occupying as many bytes as needed to represent the constant
• WORD - Generate a one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands or parameters of the command.
An assembly language statement contains the following fields:
-> Label field - An identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START: LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field - This field contains a mnemonic, which stands for an operation code (a machine instruction) or a pseudo-op. It may require additional information, i.e., operands.
-> Operand field - Specifies either the address or the data that the opcode requires.
-> Comment field - Allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
Assembly language statements are of 3 types:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in
– Single pass
– Two pass
(Fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code, and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references unless such restrictions are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB)
OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB)
SYMTAB is used to store values (addresses) assigned to labels
• Location Counter (LOCCTR)
This is a variable that is used to help in the assignment of addresses. After each statement is processed it is advanced by the length of that statement:
LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
- Assign addresses to all statements in the program
- Save the values assigned to all labels for use in Pass 2
- Perform some processing of assembler directives
Pass 2
- Assemble instructions
- Generate data values defined by BYTE, WORD
- Perform processing of assembler directives not done in Pass 1
- Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
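The division of work between the two passes can be sketched as follows. This is a simplified illustration, assuming a small SIC-style subset (3-byte instructions, WORD and RESW directives, START giving a hexadecimal origin); the tuple format `(label, opcode, operand)` and the function names `pass1` and `pass2` are hypothetical choices made for the example.

```python
# Minimal two-pass assembler sketch for a small SIC-style subset.
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}

def pass1(program):
    """Assign addresses and build SYMTAB; return (symtab, intermediate file)."""
    symtab, intermediate = {}, []
    locctr = 0
    for label, op, operand in program:
        if op == "START":
            locctr = int(operand, 16)
            intermediate.append((locctr, label, op, operand))
            continue
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, label, op, operand))
        if op in OPTAB or op == "WORD":
            locctr += 3                      # LOCCTR <- LOCCTR + length
        elif op == "RESW":
            locctr += 3 * int(operand)
        elif op != "END":
            raise ValueError(f"invalid operation code {op}")
    return symtab, intermediate

def pass2(symtab, intermediate):
    """Translate using the completed SYMTAB; return (address, object code) pairs."""
    object_code = []
    for loc, _label, op, operand in intermediate:
        if op in OPTAB:
            object_code.append((loc, f"{OPTAB[op]:02X}{symtab[operand]:04X}"))
        elif op == "WORD":
            object_code.append((loc, f"{int(operand):06X}"))
    return object_code

PROGRAM = [
    ("SUM", "START", "1000"),
    (None, "LDA", "ONE"),      # forward reference, resolved in pass 2
    (None, "ADD", "TWO"),
    (None, "STA", "RES"),
    ("ONE", "WORD", "1"),
    ("TWO", "WORD", "2"),
    ("RES", "RESW", "1"),
    (None, "END", None),
]
symtab, intermediate = pass1(PROGRAM)
object_code = pass2(symtab, intermediate)
```

Because SYMTAB is complete before pass 2 starts, the forward references to ONE, TWO and RES need no special treatment.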
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1       H
Col 2-7     Program name
Col 8-13    Starting address (hex)
Col 14-19   Length of object program in bytes (hex)
Text record
Col 1       T
Col 2-7     Starting address in this record (hex)
Col 8-9     Length of object code in this record in bytes (hex)
Col 10-69   Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1       E
Col 2-7     Address of first executable instruction (hex) (END program_name)
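The column layout above can be made concrete with a short sketch that formats the three record types. The helper names `header_record`, `text_record` and `end_record` are hypothetical, and the sample program name, addresses and object codes are made up for illustration.

```python
# Formatting Header, Text and End records according to the column layout above.
def header_record(name, start, length):
    # Col 1 'H', Col 2-7 name (padded/truncated to 6), Col 8-13 start, Col 14-19 length
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_record(start, object_codes):
    body = "".join(object_codes)
    if len(body) > 60:                      # Col 10-69 holds at most 60 hex digits
        raise ValueError("too much object code for one text record")
    # Col 8-9 is the length in bytes, i.e. half the number of hex digits
    return f"T{start:06X}{len(body) // 2:02X}{body}"

def end_record(first_executable):
    return f"E{first_executable:06X}"

records = [
    header_record("SUM", 0x1000, 0x12),
    text_record(0x1000, ["001009", "18100C", "0C100F", "000001", "000002"]),
    end_record(0x1000),
]
```

Each text record carries its own starting address, so reserved storage (which generates no object code) simply causes a gap between records.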
Absolute Program
The program must be loaded at the specified (absolute) address
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve
a Modification record
b Base relative
c Program counter relative
Modification record
Col 1     M
Col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9   Length of the address field to be modified, in half-bytes
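How a loader might act on one modification record can be sketched as below. The sketch assumes the program image is held in a bytearray whose index 0 is the beginning of the program, and that when the field length is an odd number of half-bytes the field occupies the low-order half-bytes of the bytes read. The function name `apply_modification` and the sample instruction are illustrative assumptions.

```python
# Applying one modification record to a program image at load time.
def apply_modification(image, m_record, load_address):
    offset = int(m_record[1:7], 16)         # Col 2-7: start of the address field
    half_bytes = int(m_record[7:9], 16)     # Col 8-9: field length in half-bytes
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(image[offset:offset + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1      # selects only the address field
    relocated = ((field & mask) + load_address) & mask
    field = (field & ~mask) | relocated     # keep the bits outside the field
    image[offset:offset + nbytes] = field.to_bytes(nbytes, "big")

# A 4-byte instruction whose last 5 half-bytes hold the relative address 01036.
image = bytearray.fromhex("4B101036")
apply_modification(image, "M00000105", 0x5000)
```

After the call the address field holds 06036, the original relative address plus the load address 5000, while the opcode and flag bits are unchanged.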
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example (macro program)
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
(macro definition)
INCRMT X, Y
(macro call, which expands to)
LOAD X
ADD Y
STORE X
(fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler or Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit names a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X, Y
leads to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place All macro calls in a program are expanded in this fashion
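The definition-then-expansion behaviour described above can be sketched in a few lines. This is a deliberately naive illustration: it assumes the MACRO / prototype / model statements / ENDM layout used in the example, substitutes parameters by plain text replacement (so one formal name must not be a prefix of another), and does not handle nested macros. The function name `expand_macros` and the trailing `HALT` statement are hypothetical.

```python
# Naive macro processor: collect definitions, then expand calls in place.
def expand_macros(source):
    macros = {}     # macro name -> (formal parameters, model statements)
    output = []
    lines = iter(source)
    for line in lines:
        fields = line.split()
        if fields == ["MACRO"]:
            prototype = next(lines).replace(",", " ").split()
            name, formals = prototype[0], prototype[1:]
            body = []
            for statement in lines:          # consume up to the matching ENDM
                if statement.strip() == "ENDM":
                    break
                body.append(statement.strip())
            macros[name] = (formals, body)
        elif fields and fields[0] in macros:
            formals, body = macros[fields[0]]
            actuals = " ".join(fields[1:]).replace(",", " ").split()
            for statement in body:
                for formal, actual in zip(formals, actuals):
                    statement = statement.replace(formal, actual)
                output.append(statement)
        else:
            output.append(line.strip())
    return output

expanded = expand_macros([
    "MACRO",
    "INCRMT &A,&B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X,Y",
    "HALT",
])
```

The definition itself produces no output; only the call and the ordinary statement appear in the expanded program.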
Defining a Macro
Let us take another look at the macro definition unit
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
(macro definition)
INCRMT X, Y
(macro call, which expands to)
LOAD X
ADD Y
STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so called model statements These are assembly statements which will replace the macro call as a result of macro expansion
Basic Macro Processor Functions
-> Directives used during usage of Macro
MACRO indicates the beginning of a macro definition, MEND indicates the end of a macro definition (MEND plays the role of ENDM in the earlier example)
-> Prototype for Macro
Each parameter starts with &
Name MACRO Parameter list
...
MEND
-> BODY
The statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of the macro: parameter
In the macro invocation: argument
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is invoked
Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again
INCRMT &A, &B (prototype)
INCRMT X, Y (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER ,,DIRECT,,,,,,3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
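The difference between the two binding rules can be sketched as a single function that pairs formal parameters with the operands of a call. This is an illustrative sketch only: the function name `bind_arguments`, the use of an empty string for an omitted argument, and the parameter names `&a1` to `&a3` for STRG are assumptions.

```python
# Binding actual parameters to formal parameters, by position or by keyword.
def bind_arguments(formals, operands):
    binding = dict.fromkeys(formals, "")    # omitted arguments stay empty
    position = 0
    for operand in operands:
        if "=" in operand:
            # Keyword parameter: the keyword names the formal parameter.
            keyword, value = operand.split("=", 1)
            formal = "&" + keyword.lstrip("&")
            if formal not in binding:
                raise KeyError(f"unknown keyword parameter {keyword}")
            binding[formal] = value
        else:
            # Positional parameter: paired by relative position in the list.
            binding[formals[position]] = operand
            position += 1
    return binding

formals = ["&a1", "&a2", "&a3"]
positional = bind_arguments(formals, ["DATA1", "DATA2", "DATA3"])
keyword = bind_arguments(formals, ["&a3=DATA1", "&a2=DATA2", "&a1=DATA3"])
```

With keywords the order of the operands no longer matters, which is why the second call binds DATA3 to &a1 even though it is written last.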
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in the MDT in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order based on certain expansion-time relations between values of formal parameters and expansion-time variables.
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked
Subroutine - the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
Holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
Role: an index to DEFTAB. Holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
Used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position
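How the three tables cooperate can be sketched as follows. The sketch assumes a positional notation of the form ?1, ?2, ... for parameters stored in DEFTAB (so at most nine parameters are handled by the naive text replacement used here), and the function names `define` and `expand_call` are hypothetical stand-ins for the DEFINE and EXPAND sub-procedures.

```python
# DEFTAB, NAMTAB and ARGTAB for a simple one-pass macro processor.
DEFTAB = []     # prototype line, body lines in positional notation, MEND
NAMTAB = {}     # macro name -> (index of first line, index of last line) in DEFTAB

def define(name, formals, body):
    start = len(DEFTAB)
    DEFTAB.append(f"{name} {','.join(formals)}")      # macro prototype
    for statement in body:
        for number, formal in enumerate(formals, start=1):
            statement = statement.replace(formal, f"?{number}")
        DEFTAB.append(statement)
    DEFTAB.append("MEND")
    NAMTAB[name] = (start, len(DEFTAB) - 1)

def expand_call(name, arguments):
    argtab = list(arguments)                # ARGTAB: arguments stored by position
    start, end = NAMTAB[name]
    expanded = []
    for statement in DEFTAB[start + 1:end]: # skip the prototype and MEND lines
        for number, argument in enumerate(argtab, start=1):
            statement = statement.replace(f"?{number}", argument)
        expanded.append(statement)
    return expanded

define("INCRMT", ["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])
result = expand_call("INCRMT", ["X", "Y"])
```

Converting parameter names to positions at definition time means expansion only has to index ARGTAB and never needs to search for parameter names.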
(fig Macro processor tables: a MACRO definition fills NAMTAB and DEFTAB, a CALL uses DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the Macro processor and explain
(fig Macro processor procedures: PROCESSLINE, DEFINE, EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking
3) There are some address-dependent locations in the program. Such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program thus becomes ready for execution. This activity is called loading
Types of Loader schemes
1) "Compile and go" loader - In this type of loader the instructions are read line by line, their machine code is obtained, and it is directly put in main memory at some known address
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory
2) General Loader Scheme - In this loader scheme the source program is converted to an object program by some translator (assembler)
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form
3) Absolute Loader - An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed
Absolute loader: as the name itself says, it does nothing other than loading
Advantages
It is simple to implement
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management
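Because every address is already fixed, an absolute loader reduces to copying bytes. A minimal sketch is given below, assuming object records in a Header/Text/End layout with hexadecimal fields (T: 6-digit address, 2-digit byte count, then the object code; E: 6-digit start address). The function name `absolute_load`, the memory size and the sample records are illustrative assumptions.

```python
# Absolute loader sketch: place each text record at the address it names.
def absolute_load(records, memory):
    start_address = None
    for record in records:
        kind = record[0]
        if kind == "T":
            address = int(record[1:7], 16)
            length = int(record[7:9], 16)               # length in bytes
            data = bytes.fromhex(record[9:9 + 2 * length])
            memory[address:address + length] = data     # no relocation, no linking
        elif kind == "E":
            start_address = int(record[1:7], 16)        # where execution begins
    return start_address

memory = bytearray(0x2000)
start = absolute_load(
    [
        "HSUM   001000000012",
        "T0010000F00100918100C0C100F000001000002",
        "E001000",
    ],
    memory,
)
```

The loader never inspects the instructions themselves, which is why any address adjustment must already have been done by the programmer or the assembler.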
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls on certain input/output functions like printf(), scanf() etc., which are not written by the programmer. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader, produced by the assembler, is the object program and information about all other programs it references. In addition there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving the programmer complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader
• Linkage editor
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time
-> Postpones the linking function until execution time
» a subroutine is loaded and linked to the rest of the program when it is first called (also called dynamic linking, dynamic loading or load on call)
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs
» all linking and relocation operations
» automatic library search
» loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments Sibling segments in the overlay tree share the same memory In the example segments A and D share the same memory B and C share the same memory and E and F share the same memory The sequence of segments that lead to a specific segment is called a path so the path for E includes the root D and E
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls do not require any linker help, since the entire path from the root is already loaded.
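The bookkeeping done by an overlay manager for the tree above can be sketched as follows. This is a simulation only: it tracks which segments are resident and in what order they are brought in, and does not move any code. The class name `OverlayManager`, the `PARENT` table and the `load_log` list are hypothetical names introduced for the example.

```python
# Overlay manager sketch for the tree: ROOT -> A, D; A -> B, C; D -> E, F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]

class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]      # the currently resident path
        self.load_log = []          # every segment load, in order

    def call(self, target):
        wanted = path_to(target)
        if self.loaded[:len(wanted)] == wanted:
            return                  # upward call or already resident: nothing to do
        keep = 0                    # length of the common prefix of both paths
        while (keep < min(len(wanted), len(self.loaded))
               and wanted[keep] == self.loaded[keep]):
            keep += 1
        self.load_log.extend(wanted[keep:])
        # Siblings share memory, so the new segments replace the old ones.
        self.loaded = self.loaded[:keep] + wanted[keep:]

manager = OverlayManager()
manager.call("B")       # root calls into B: A and B are loaded
manager.call("ROOT")    # upward call: no loading needed
manager.call("E")       # D replaces A, E replaces B
```

After the last call the resident path is ROOT, D, E, which matches the path for E given above.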
(fig Overlay)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: Object program(s) and Library -> Linking loader -> Memory. Linkage editor: Object program(s) and Library -> Linkage editor -> Linked program -> Relocating loader -> Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded in main memory only at the time of execution
2 The system can be dynamically reconfigured
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded in main memory
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader respectively) for the above four functions
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): to keep track of each segment location
• External Symbol Directory (ESD): to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): to establish correspondence between the ESD ID used in ESD & RLD cards
• ESD (external symbol dictionary)
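The two passes can be sketched with a heavily simplified object format. In this sketch each module is a Python dict rather than a card deck: `defs` stands in for the ESD entries a module defines, `text` for the TXT cards (one integer per word), and `rld` for the RLD entries, each saying "add the address of this symbol to the word at this offset". These field names, the function names `load_pass1` and `load_pass2`, and the sample modules are all assumptions made for illustration.

```python
# Two-pass direct linking loader sketch over a simplified module format.
def load_pass1(modules, ipla):
    """Allocate each module and build the table of external symbol addresses."""
    symbols = {}
    pla = ipla                              # Program Load Address counter
    for module in modules:
        symbols[module["name"]] = pla
        for symbol, relative in module["defs"].items():
            symbols[symbol] = pla + relative
        pla += module["length"]
    return symbols

def load_pass2(modules, ipla, symbols, memory):
    """Load the text, then relocate and link using the RLD entries."""
    pla = ipla
    for module in modules:
        for offset, word in enumerate(module["text"]):
            memory[pla + offset] = word
        for offset, symbol in module["rld"]:
            # Own module name -> relocation; another module's symbol -> linking.
            memory[pla + offset] += symbols[symbol]
        pla += module["length"]

MODULES = [
    {"name": "MAIN", "length": 3, "defs": {},
     "text": [0, 0, 2], "rld": [(1, "SUB"), (2, "MAIN")]},
    {"name": "SUB", "length": 2, "defs": {},
     "text": [7, 0], "rld": [(1, "MAIN")]},
]
memory = [0] * 128
symbols = load_pass1(MODULES, 100)
load_pass2(MODULES, 100, symbols, memory)
```

Pass 1 must finish before any text is patched, because MAIN refers to SUB, whose address is only known once MAIN's length has been added to the PLA counter.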
Data Structures
Algorithm
Binary Symbolic Subroutine Loader
The assembler assembles each program and provides the loader with
-> the object program + relocation information
-> prefixed with information about all other programs it references (transfer vector)
-> the length of the entire program
-> the length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text in a file rather than memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing a language but also in teaching and supporting it
-> Language designers balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
- To be able to explain and implement sequential search and binary search
- To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
- To understand the idea of hashing as a search technique
- To introduce the map abstract data type
- To implement the map abstract data type using hashing
• Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same. Otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
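The three search methods can be sketched on a small symbol table keyed by symbol name. The table contents are made up, and the hash table below resolves collisions by chaining with a character-sum hash function; both are assumptions chosen to keep the example short, not the only possible design.

```python
# Linear search, binary search and hashing over (symbol name, address) entries.
def linear_search(table, key):
    for index, (name, _value) in enumerate(table):
        if name == key:                 # compare with every entry in turn
            return index
    return -1

def binary_search(sorted_table, key):
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = sorted_table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1           # continue in the left sub-array
        else:
            low = middle + 1            # continue in the right sub-array
    return -1

class HashTable:
    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]

    def _slot(self, key):
        # Hash function: maps a key to a slot number starting at 0.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for index, (existing, _old) in enumerate(bucket):
            if existing == key:
                bucket[index] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None

table = [("ALPHA", 0x1000), ("BETA", 0x1003), ("GAMMA", 0x1006)]  # sorted by name
hashed = HashTable()
for name, address in table:
    hashed.put(name, address)
```

Binary search needs the table to be ordered by key, while the hash table places no requirement on insertion order.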
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (ie the alphabet) that can be used for constructing formulas (ie finite strings of symbols)
2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
- A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
- A syntax that defines which strings of symbols are in the language of our formal system.
- A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
- the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
- the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*, where T* denotes the set of all finite strings over T
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar, and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ⇒ γ (a single-step derivation in G), if and only if
(1) μ = σ α τ,
(2) γ = σ β τ, and
(3) α -> β ∈ P of G,
where σ and τ represent arbitrary strings.
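The definition above can be tried out with a small Python sketch. The representation is an assumption made for illustration and does not come from the notes: strings are ordinary Python strings, a production α -> β is a pair (alpha, beta), and the names derive_once and P are hypothetical.

```python
def derive_once(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production (alpha, beta) and each way of writing
    mu = sigma + alpha + tau, the string sigma + beta + tau is one
    single-step derivation.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar used only for illustration: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
print(sorted(derive_once("S", P)))
print(sorted(derive_once("aSb", P)))
```

Applying derive_once repeatedly, starting from the start symbol, enumerates sentential forms; a string on which it returns the empty set admits no further derivation step.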
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that consists only of terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of the earlier example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with new ones, where E is assumed to be a new nonterminal symbol.
[Production rules not reproduced in these notes.]
Adding a production rule of the form Bb → b to the modified grammar will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of the earlier example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with new ones, where E is assumed to be a new nonterminal symbol.
[Production rules not reproduced in these notes.]
Adding a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
[Production rules not reproduced in these notes.]
Adding a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*, with v non-empty.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only k·n squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
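As a rough illustration of how a context-free grammar generates a language, the following Python sketch enumerates short sentences by always rewriting the leftmost nonterminal. Everything in it is an assumption made for illustration: the function name sentences, the dictionary encoding of the rules, and the example grammar S → aSb | ab. It also assumes that no rule has an empty right-hand side, so that sentential forms never shrink and can be pruned by length.

```python
from collections import deque


def sentences(grammar, start, max_len):
    """Enumerate sentences (terminal-only strings) of length <= max_len.

    grammar maps a single nonterminal character to a list of right-hand
    sides, so every rule has the context-free form V -> w.
    Assumption: no right-hand side is empty, so forms never get shorter.
    """
    found, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if there is none.
        pos = next((i for i, c in enumerate(form) if c in grammar), None)
        if pos is None:
            found.add(form)  # only terminals remain: this is a sentence
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


# Hypothetical grammar for the language a^n b^n with n >= 1.
grammar = {"S": ["aSb", "ab"]}
print(sorted(sentences(grammar, "S", 6), key=len))
```

Note that the rewrite step looks only at the nonterminal itself, never at its neighbours; that is exactly what "context-free" means here.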
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear-bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; more sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. They are more useful still if a canonic system can be used as a basis for recognizing strings from the defined sets.
The recognition algorithm is capable of recognizing strings produced by a Backus-Naur system, which corresponds to the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read and broken into its constituent parts, yielding an intermediate form. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
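The token list above can be reproduced with a short scanner. This is a minimal Python sketch, not the scanner of any particular compiler: the token names (ID, NUMBER, ASSIGN, PLUS, TIMES) and the function name tokenize are assumptions chosen for illustration, and the statement is taken to be total = count + rate * 10.

```python
import re

# Each token class is described by a regular expression (no recursion needed).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),   # blanks are recognized but not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Break source into a list of (token_name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

The scanner yields seven tokens, in the same order as the list above.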
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
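To make the idea of a syntax tree concrete, here is a small hand-written parser in Python for statements such as x3 = y + 3; with the trailing semicolon required, as in C. It is a sketch under stated assumptions: the rule expr → expr + expr is left-recursive and ambiguous, so the sketch swaps in the equivalent iterative form "operand followed by zero or more + operand" and builds the tree left-associatively. The function names and the nested-tuple tree encoding are hypothetical. The simple regular-expression tokenizer silently skips characters it does not know.

```python
import re


def tokenize(text):
    # Identifiers, numbers, and the single-character symbols = + ;
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", text)


def is_id(tok):
    return tok[0].isalpha() or tok[0] == "_"


def parse_operand(tokens, pos):
    """operand -> number | id"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return ("number", int(tok)), pos + 1
    if is_id(tok):
        return ("id", tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse_expr(tokens, pos):
    """expr -> operand { + operand }, operators become interior nodes."""
    node, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_operand(tokens, pos + 1)
        node = ("+", node, right)
    return node, pos


def parse_assignment(text):
    """asst-stmt -> id = expr ;"""
    tokens = tokenize(text)
    if len(tokens) < 2 or not is_id(tokens[0]) or tokens[1] != "=":
        raise SyntaxError("expected: id = expr ;")
    tree, pos = parse_expr(tokens, 2)
    if tokens[pos:] != [";"]:
        raise SyntaxError("expected ';' at end of statement")
    return ("=", ("id", tokens[0]), tree)


print(parse_assignment("x3 = y + 3;"))
```

The result is a syntax tree rather than a parse tree: the = and + operators are interior nodes and their operands are the children.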
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
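The walk over the semantic tree can be sketched in Python as follows. The tree encoding (nested tuples with the operator first), the function name gen_quads, and the use of None for an unused source are assumptions made for illustration; the temporaries are numbered in the order in which the walk finishes each subtree.

```python
def gen_quads(tree):
    """Walk a semantic tree and emit (operation, target, source1, source2)."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        # Leaves (identifiers and constants) can be used as operands directly.
        if not isinstance(node, tuple):
            return str(node)
        op, *children = node
        sources = [walk(child) for child in children]
        sources += [None] * (2 - len(sources))   # pad unary operations
        target = new_temp()
        quads.append(("add" if op == "+" else op, target, sources[0], sources[1]))
        return target

    _, target, value = tree      # the root must be ("=", target, expression)
    quads.append(("assign", target, walk(value), None))
    return quads


# x3 = y + 3 with x3 an integer (id1) and y a real (id2)
tree = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", 3))))
for quad in gen_quads(tree):
    print(quad)
```

For this tree the sketch emits the same four quads, in the same order, as the sequence shown above.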
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion at compile time and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
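Both rewrites can be expressed as a small pass over the list of quads. The Python sketch below is illustrative only: the function name optimize and the tuple encoding of quads are assumptions, and the second rewrite assumes that each temporary is used exactly once, which holds for this example but would need to be checked in a real optimizer.

```python
def optimize(quads):
    """Apply constant conversion and assign-merging to a list of quads."""
    # 1. Perform inttoreal on literal constants at compile time.
    folded = {}      # temporary name -> real constant written as text
    out = []
    for op, target, s1, s2 in quads:
        if op == "inttoreal" and s1.isdigit():
            folded[target] = f"{int(s1)}.0"
            continue                     # the conversion quad disappears
        out.append((op, target, folded.get(s1, s1), folded.get(s2, s2)))

    # 2. Merge "op temp ...; assign dest temp" into "op dest ...".
    #    Assumption: temp is not used anywhere else.
    result = []
    for quad in out:
        op, target, s1, s2 = quad
        if op == "assign" and result and result[-1][1] == s1:
            prev_op, _, p1, p2 = result.pop()
            result.append((prev_op, target, p1, p2))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```

The four original quads shrink to two: an add with the constant 3.0 and a realtoint that writes directly into id1.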
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0    (add float)
RTOI R2, R1         (real to int)
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver calls Syntactic Analyzer
Syntactic Analyzer calls Contextual Analyzer and Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver calls Syntactic Analyzer, Contextual Analyzer, and Code Generator in turn
Syntactic Analyzer: input Source Text, output AST
Contextual Analyzer: input AST, output Decorated AST
Code Generator: input Decorated AST, output object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no10 Machine language Long way no looping method
Elementary Instruction Set
Typical Data Transfer Instructions
Typical Arithmetic Instructions
Typical Logical and Bit Manipulation Instr
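Example: Write a program that compares two positive numbers x and y. Output is -1 if x < y, 0 if x = y, +1 if x > y.
        MOVE R1, x
        MOVE R2, y
        SUB  R1, R2      ; R1 = x - y, sets the condition flags
        BN   Less        ; negative result: x < y
        BZ   Zero        ; zero result: x = y
        MOVE R3, +1      ; otherwise x > y
        BR   Done        ; BR (unconditional branch) is assumed to exist
Less    MOVE R3, -1
        BR   Done        ; without this branch, execution falls through to Zero
Zero    MOVE R3, 0
Done    HLT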
Lecture_no11 Address modification using instruction as data ampusing index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
Machine language      Equivalent assembly
1000000100100101      LOAD R1 5
1000000101000101      LOAD R2 5
1010000100000110      ADD R0 R1 R2
1000001000000110      SAVE R0 6
1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
STORE 1 R1
MOVE R1 R2
LOAD R2 1
HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
1000000100100101
1000000101000101
1010000100000110
1000001000000110
1111111111111111
LOAD contents of memory location into register
100000010 RR MMMMM
ex: R0 = Mem[3]
100000010 00 00011
STORE contents of register into memory location
100000100 RR MMMMM
ex: Mem[4] = R0
100000100 00 00100
MOVE contents of one register into another register
100100010000 RR RR
ex: R0 = R1
100100010000 00 01
ADD contents of 2 registers, store result in third
1010000100 RR RR RR
ex: R0 = R1 + R2
1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
1010001000 RR RR RR
ex: R0 = R1 – R2
1010001000 00 01 10
Halt the program
1111111111111111
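The instruction formats listed above are regular enough that a decoder fits in a few lines. The Python sketch below turns 16-bit words back into assembly text for the simulated computer; the function name decode and the printed operand order (register first, then memory address for STORE) are assumptions made for illustration.

```python
def reg(bits):
    """Two register bits -> register name, e.g. '01' -> 'R1'."""
    return f"R{int(bits, 2)}"


def decode(word):
    """Decode one 16-bit instruction given as a string of '0' and '1'."""
    if len(word) != 16 or set(word) - {"0", "1"}:
        raise ValueError("expected 16 binary digits")
    if word == "1" * 16:
        return "HALT"
    if word.startswith("100000010"):          # LOAD: 9-bit opcode, RR, MMMMM
        return f"LOAD {reg(word[9:11])} {int(word[11:], 2)}"
    if word.startswith("100000100"):          # STORE: 9-bit opcode, RR, MMMMM
        return f"STORE {reg(word[9:11])} {int(word[11:], 2)}"
    if word.startswith("100100010000"):       # MOVE: 12-bit opcode, RR, RR
        return f"MOVE {reg(word[12:14])} {reg(word[14:])}"
    if word.startswith("1010000100"):         # ADD: 10-bit opcode, RR RR RR
        return f"ADD {reg(word[10:12])} {reg(word[12:14])} {reg(word[14:])}"
    if word.startswith("1010001000"):         # SUBTRACT: 10-bit opcode, RR RR RR
        return f"SUB {reg(word[10:12])} {reg(word[12:14])} {reg(word[14:])}"
    raise ValueError(f"unknown instruction {word}")


# The five-instruction example program, as one run of 80 bits.
program = ("1000000100100101" "1000000101000101" "1010000100000110"
           "1000001000000110" "1111111111111111")
words = [program[i:i + 16] for i in range(0, len(program), 16)]
for word in words:
    print(word, decode(word))
```

Running the decoder over the example program recovers the load, load, add, store, halt sequence from the assembly column of the earlier table.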
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example, the assembler would always replace START with 10 (base 16). Hence you could write
START EQU 10
ORG START
GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example:
ORG 0
GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000.
But it will be placed at the address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device, specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM, MASM, etc.
   LABEL   OPCODE  OPERANDS     COMMENTS
01         ;
02         ; Program to multiply an integer by the constant 6.
03         ; Before execution, an integer must be stored in NUMBER.
04         ;
05         .ORIG   x3050
06         LEA     R2, NUMBER
07         LDW     R2, R2, #0
08         LEA     R1, SIX
09         LDW     R1, R1, #0
0A         AND     R3, R3, #0   ; Clear R3. It will
0B                              ; contain the product.
0C         ; The inner loop
0D         ;
0E AGAIN   ADD     R3, R3, R2
0F         ADD     R1, R1, #-1  ; R1 keeps track of
10         BRp     AGAIN        ; the iterations
11         ;
12         HALT
13         ;
14 NUMBER  .BLKW   1
15 SIX     .FILL   x0006
16         ;
17         .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode. The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
LEA R2, NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
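The control flow of the program can be traced with a short Python sketch. It mirrors the registers of the listing as ordinary variables; the function name `multiply_by_six` is hypothetical, and the sketch ignores word size and overflow.

```python
def multiply_by_six(number):
    """Mirror the loop of the listing: add NUMBER to R3 six times."""
    r2 = number   # LEA/LDW pair: R2 <- value stored at NUMBER
    r1 = 6        # LEA/LDW pair: R1 <- value stored at SIX
    r3 = 0        # AND R3, R3, #0 clears R3
    while True:
        r3 = r3 + r2      # AGAIN  ADD R3, R3, R2
        r1 = r1 - 1       #        ADD R1, R1, #-1
        if not r1 > 0:    #        BRp AGAIN branches only while the result is positive
            break
    return r3             # HALT: R3 holds the product


print(multiply_by_six(7))
```

The loop body always runs at least once, because the branch test comes after the two ADD instructions, exactly as in the assembly listing.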
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
ORIG
ORIG tells the assembler where in memory to place the program. In line 05, ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
FILL
FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the location labeled SIX is initialized to the value x0006.
BLKW
BLKW tells the assembler to set aside some number of sequential memory locations (i.e., a BLocK of Words) in the program. The actual number is the operand of the BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
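How these three pseudo-ops drive the assignment of locations can be sketched as follows. The sketch assumes, purely for illustration, that every instruction and every FILL occupies exactly one location; a real machine may count in bytes instead. The function name `assign_locations` and the tuple layout are hypothetical.

```python
def assign_locations(statements):
    """Return {label: address} for a list of (label, opcode, operand) tuples."""
    symbols = {}
    loc = None
    for label, opcode, operand in statements:
        if opcode == "ORIG":
            loc = int(operand[1:], 16)   # operand looks like "x3050"
            continue
        if opcode == "END":
            break
        if label:
            symbols[label] = loc
        # BLKW reserves as many locations as its operand says; assumed size 1 otherwise.
        loc += int(operand) if opcode == "BLKW" else 1
    return symbols


program = [
    ("", "ORIG", "x3050"),
    ("", "LEA", "R2, NUMBER"),
    ("", "LDW", "R2, R2, #0"),
    ("", "LEA", "R1, SIX"),
    ("", "LDW", "R1, R1, #0"),
    ("", "AND", "R3, R3, #0"),
    ("AGAIN", "ADD", "R3, R3, R2"),
    ("", "ADD", "R1, R1, #-1"),
    ("", "BRp", "AGAIN"),
    ("", "HALT", ""),
    ("NUMBER", "BLKW", "1"),
    ("SIX", "FILL", "x0006"),
    ("", "END", ""),
]
for name, address in assign_locations(program).items():
    print(f"{name:6} x{address:04X}")
```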
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get through with this, but it is rather inappropriate. If things need a hurry, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler
-> Basic assembler directives:
• START - Specify name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands, or parameters, of the command.
An assembly language statement contains the following fields:
-> Label field: an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/opcode field: this field contains a mnemonic. It stands for an operation code or a pseudo-op, i.e. a machine code instruction; it may require additional information, i.e. operands.
-> Operand field: specifies either the address or the data that the opcode requires.
-> Comment field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are 3 types of assembly language statements:
(i) Imperative statements
(ii) Declarative statements
(iii) Assembler directive statements
Advantages amp disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler design can be done in:
- Single pass
- Two pass
(Fig: Two-pass assembler. Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of a two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. One-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless restrictions are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
• Symbol Table (SYMTAB): used to store values (addresses) assigned to labels
• Location Counter (LOCCTR): a variable used to help in the assignment of addresses; LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
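The division of work between the two passes can be sketched in Python. Everything about the target machine here is a made-up assumption for illustration: the three opcodes in `OPTAB`, the one-location instruction length, and the word layout (opcode shifted left 12 bits, OR-ed with the operand address). Only the roles of OPTAB, SYMTAB and LOCCTR follow the notes.

```python
OPTAB = {"LOAD": 0x1, "ADD": 0x2, "STORE": 0x3}   # hypothetical opcodes
DIRECTIVES = ("START", "END", "WORD")


def parse(line):
    """Split '[label] mnemonic [operand]' into exactly three fields."""
    fields = line.split()
    if fields[0] in OPTAB or fields[0] in DIRECTIVES:
        fields.insert(0, "")                 # statement has no label
    fields += [""] * (3 - len(fields))
    return fields[:3]


def pass1(source):
    """Assign addresses and build SYMTAB; return it with the intermediate file."""
    symtab, locctr, intermediate = {}, 0, []
    for line in source:
        label, mnemonic, operand = parse(line)
        if mnemonic == "START":
            locctr = int(operand, 16)
            continue
        if mnemonic == "END":
            break
        if label:
            symtab[label] = locctr
        intermediate.append((locctr, mnemonic, operand))
        locctr += 1                          # LOCCTR <- LOCCTR + (instruction length)
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement now that every label has a value."""
    obj = []
    for locctr, mnemonic, operand in intermediate:
        if mnemonic == "WORD":
            word = int(operand)
        else:
            word = (OPTAB[mnemonic] << 12) | symtab[operand]
        obj.append((locctr, word))
    return obj


source = ["PROG START 100", "LOAD X", "ADD Y", "STORE X", "X WORD 5", "Y WORD 7", "END"]
symtab, intermediate = pass1(source)
for address, word in pass2(symtab, intermediate):
    print(f"{address:04X} {word:04X}")
```

X and Y are used before they are defined, so they are forward references; pass 2 can translate them only because pass 1 has already filled in the symbol table.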
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1       H
Col 2~7     Program name
Col 8~13    Starting address (hex)
Col 14~19   Length of object program in bytes (hex)
Text record
Col 1       T
Col 2~7     Starting address in this record (hex)
Col 8~9     Length of object code in this record in bytes (hex)
Col 10~69   Object code; (69-10+1)/6 = 10 instructions
End record
Col 1       E
Col 2~7     Address of first executable instruction (hex) (END program_name)
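The column layout above maps directly onto fixed-width string formatting. The following Python sketch builds one record of each kind; the function names are hypothetical, and the program name, addresses and object codes in the usage lines are sample values chosen only to show the layout.

```python
def header_record(name, start, length):
    # Col 1 'H', col 2-7 name padded to 6, col 8-13 start, col 14-19 length (hex)
    return f"H{name:<6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    body = "".join(object_codes)            # hex digits, two per byte
    n_bytes = len(body) // 2
    if n_bytes > 30:                        # columns 10-69 hold at most 60 hex digits
        raise ValueError("text record too long")
    return f"T{start:06X}{n_bytes:02X}{body}"


def end_record(first_executable):
    return f"E{first_executable:06X}"


print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x1000, ["141033", "482039", "001036"]))
print(end_record(0x1000))
```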
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1     M
Col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9   Length of the address field to be modified, in half-bytes
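What a loader does with one modification record can be sketched as below. The sketch assumes the program image is held in a `bytearray` indexed from the beginning of the program, that the address field is right-aligned in the bytes it occupies, and that relocation means adding the actual load address to that field; the function name `apply_modification` and the sample bytes are illustrative.

```python
def apply_modification(image, load_address, start, half_bytes):
    """Add load_address to the address field described by one M record."""
    n_bytes = (half_bytes + 1) // 2                 # bytes that contain the field
    field = int.from_bytes(image[start:start + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1              # selects the low half_bytes nibbles
    relocated = ((field & mask) + load_address) & mask
    field = (field & ~mask) | relocated             # keep any bits outside the field
    image[start:start + n_bytes] = field.to_bytes(n_bytes, "big")


# Six filler bytes, then a 4-byte instruction whose last 5 half-bytes are an address.
image = bytearray(6) + bytearray.fromhex("4B101036")
apply_modification(image, 0x5000, start=7, half_bytes=5)   # like record M00000705
print(image[6:10].hex().upper())
```

With an odd half-byte count, the field starts in the low half of its first byte, which is why the mask is applied rather than overwriting all three bytes.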
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
MACRO
INCRMT &A,&B
LOAD &A          (macro definition)
ADD &B
STORE &A
ENDM
INCRMT X,Y       (macro call)
LOAD X           (macro expansion)
ADD Y
STORE X
ENDM Macro Program
(Fig: Source code (with macro) -> Macro Processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the call
INCRMT X,Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
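Macro expansion of this kind is plain text substitution, as the following Python sketch shows for the INCRMT example. The function name `expand` and the way the prototype and body are passed in are assumptions made for the illustration.

```python
def expand(prototype, model_statements, call):
    """Replace a macro call by the model statements of the macro body."""
    name, formals = prototype.split(None, 1)
    call_name, actuals = call.split(None, 1)
    if call_name != name:
        raise ValueError("statement is not a call on this macro")
    formals = [p.strip() for p in formals.split(",")]
    actuals = [a.strip() for a in actuals.split(",")]
    binding = dict(zip(formals, actuals))       # pair parameters by position
    expanded = []
    for statement in model_statements:
        # Substitute longer names first so that &AB is not mistaken for &A.
        for formal in sorted(binding, key=len, reverse=True):
            statement = statement.replace(formal, binding[formal])
        expanded.append(statement)
    return expanded


body = ["LOAD &A", "ADD &B", "STORE &A"]
for line in expand("INCRMT &A,&B", body, "INCRMT X,Y"):
    print(line)
```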
Defining a Macro
Let us take another look at the macro definition unit
MACRO
INCRMT &A,&B
LOAD &A          (macro definition)
ADD &B
STORE &A
ENDM
INCRMT X,Y       (macro call)
LOAD X           (macro expansion)
ADD Y
STORE X
ENDM Macro Program
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates its end
-> Prototype for the macro
Gives the macro name and the parameter list; each parameter name starts with &
-> Body
The statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro. Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is invoked.
Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A,&B     prototype
INCRMT X,Y       macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
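Because each argument names its parameter, keyword arguments can be written in any order. A minimal Python sketch of the binding step follows; the function name `bind_keyword_arguments` is hypothetical, and commas are assumed as the separator between arguments.

```python
def bind_keyword_arguments(formals, operand_field):
    """Pair each 'keyword=value' argument with the formal parameter it names."""
    binding = dict.fromkeys(formals)            # every parameter starts unbound
    for argument in operand_field.split(","):
        keyword, value = argument.strip().split("=", 1)
        keyword = keyword.lstrip("&")
        if keyword not in binding:
            raise KeyError(f"unknown parameter {keyword}")
        binding[keyword] = value
    return binding


print(bind_keyword_arguments(["TYPE", "CHANNEL"], "TYPE=DIRECT, CHANNEL=3"))
print(bind_keyword_arguments(["TYPE", "CHANNEL"], "CHANNEL=3, TYPE=DIRECT"))
```

Both calls produce the same binding, which is the point of keyword parameters: the position of an argument in the call no longer matters.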
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
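The three steps can be sketched in Python as follows. The table layouts are assumptions for illustration: the MNT is a dictionary from macro name to (index into MDT, formal parameter list), and the MDT is a flat list of model statements terminated by ENDM. Conditional assembly (AIF, AGO) and nested calls are left out.

```python
def build_tables(source):
    """Step 1: enter macro names in the MNT and definitions in the MDT."""
    mnt, mdt, program = {}, [], []
    lines = iter(source)
    for line in lines:
        if line.strip() == "MACRO":
            prototype = next(lines).split(None, 1)
            formals = [p.strip() for p in prototype[1].split(",")] if len(prototype) > 1 else []
            mnt[prototype[0]] = (len(mdt), formals)   # where the body starts in the MDT
            for body_line in lines:                   # copy the body up to and including ENDM
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        else:
            program.append(line)
    return mnt, mdt, program


def expand_calls(mnt, mdt, program):
    """Steps 2 and 3: detect macro calls and replace them by their expansion."""
    output = []
    for line in program:
        fields = line.split(None, 1)
        if fields and fields[0] in mnt:
            index, formals = mnt[fields[0]]
            actuals = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
            binding = dict(zip(formals, actuals))
            while mdt[index] != "ENDM":
                statement = mdt[index]
                for formal, actual in binding.items():
                    statement = statement.replace(formal, actual)
                output.append(statement)
                index += 1
        else:
            output.append(line)
    return output


source = ["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM", "INCRMT X,Y", "END"]
mnt, mdt, program = build_tables(source)
print(expand_calls(mnt, mdt, program))
```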
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition: DEFINE; macro invocation: EXPAND
What are the data structures used by the macro processor?
Data structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body; references to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB; holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation; arguments are stored in ARGTAB according to their position
(Fig: Macro processor tables NAMTAB, DEFTAB and ARGTAB, used for MACRO definitions and macro CALLs)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory, so that the program becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking. For that, the programmer needs to know about memory management.
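An absolute loader can be sketched in a few lines of Python. The sketch assumes the object program uses the header, text and end record layout described earlier in these notes (columns counted from 1), and that memory is modelled as a `bytearray`; the function name `absolute_load` and the sample records are illustrative.

```python
def absolute_load(records, memory):
    """Copy the object code of each text record to the address it names."""
    start = None
    for record in records:
        kind = record[0]
        if kind == "T":
            address = int(record[1:7], 16)          # col 2-7: starting address
            length = int(record[7:9], 16)           # col 8-9: length in bytes
            code = bytes.fromhex(record[9:9 + 2 * length])
            memory[address:address + length] = code # no relocation, no linking
        elif kind == "E":
            start = int(record[1:7], 16)            # address of first executable instruction
    return start


memory = bytearray(0x3000)
records = ["HCOPY  00100000107A", "T00100009141033482039001036", "E001000"]
entry = absolute_load(records, memory)
print(hex(entry), memory[0x1000:0x1009].hex().upper())
```

The loader never changes a byte of the object code, which is why all address adjustment and linking falls to the programmer.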
Lecture_no27 Machine-Dependent Loader Features ndash Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf( ), scanf( ), etc., which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf( ). A and printf( ) would have to be linked with each other. But where in main storage shall we load A and printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf( ) function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf( ) may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf( ) extends from 100 to 150. But there is simply no way A and printf( ) can co-exist at the same storage location. Therefore, the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
Relocating Loader
The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The input to a relocating loader (the output of the assembler) is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called; this is known as dynamic linking, dynamic loading, or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs:
» All linking and relocation operations
» Automatic library search
» Loads the linked program directly into memory for execution
bullOverlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
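The behaviour of the overlay manager on the example tree can be sketched as below. This is a simplified model, not an actual overlay implementation: it assumes exactly one segment can be resident at each level of the tree, so loading a segment evicts its sibling and everything below that sibling. The names `PARENT`, `path_to` and `OverlayManager` are hypothetical.

```python
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the sequence of segments from the root down to `segment`."""
    path = [segment]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]          # resident segment at each tree level

    def call(self, target):
        """Ensure the path to `target` is loaded; return the segments brought in."""
        loads = []
        for depth, segment in enumerate(path_to(target)):
            if depth < len(self.loaded) and self.loaded[depth] == segment:
                continue                # already resident at this level
            del self.loaded[depth:]     # the sibling and its descendants are overwritten
            self.loaded.append(segment)
            loads.append(segment)
        return loads


manager = OverlayManager()
print(manager.call("B"))   # root calls a routine in B: both A and B are loaded
print(manager.call("A"))   # upward call: nothing to load
print(manager.call("E"))   # D replaces A, then E is loaded
```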
(fig Overlay)
The essential difference between a linkage editor and a linking loader
(Fig: Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. It is a relocating loader. The loader does not have direct access to the source code. To place the object code in memory there are two cases: either the address of the object code is absolute, in which case the code can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
Linking and loading must be postponed until execution. If a subroutine is needed during execution, execution must be suspended until the required subroutine has been loaded into main memory.
-> Let's take a quick look at who performs each of the four functions (allocation, linking, relocation and loading) under our three loaders: absolute loader, relocating loader and direct linking loader respectively

Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error (undefined symbol) unless the symbol is declared as external.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
Data used in pass 1:
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Data used in pass 2:
• Copy of the object program
• IPLA (Initial Program Load Address) parameter, supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA (Program Load Address) counter, to keep track of each segment's location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish the correspondence between the ESD IDs used on ESD and RLD cards (ESD = external symbol dictionary)
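The two passes can be sketched as follows. This is a deliberately simplified model, not the card-level algorithm: each object module is assumed to be a Python dict with hypothetical fields `name`, `length`, `defs` (entry points and their relative offsets, standing in for ESD information), `text` (the words to load) and `rld` (pairs of offset and symbol whose address must be added to the word at that offset).

```python
def pass1(modules, ipla):
    """Assign a load address to each segment and build the external symbol table."""
    symtab, pla = {}, ipla                        # PLA counter starts at the IPLA
    for m in modules:
        symtab[m["name"]] = pla                   # segment start address
        for sym, offset in m["defs"].items():     # entry points defined in this segment
            symtab[sym] = pla + offset
        pla += m["length"]
    return symtab

def pass2(modules, symtab):
    """Load the text and adjust every address constant named in the RLD entries."""
    memory = {}
    for m in modules:
        base = symtab[m["name"]]
        for offset, word in enumerate(m["text"]):
            memory[base + offset] = word          # loading
        for offset, sym in m["rld"]:
            memory[base + offset] += symtab[sym]  # relocation and linking
    return memory

modules = [
    {"name": "MAIN", "length": 3, "defs": {}, "text": [10, 0, 20],
     "rld": [(1, "SUB")]},                        # word 1 holds the address of SUB
    {"name": "SUB", "length": 2, "defs": {"ENTRY1": 1}, "text": [30, 40],
     "rld": [(0, "SUB")]},                        # word 0 is relative to SUB itself
]
symtab = pass1(modules, ipla=100)
memory = pass2(modules, symtab)
print(symtab)
print(memory)
```

Pass 1 must finish before pass 2 starts because MAIN refers to SUB, whose address is not known until every segment length has been added to the PLA counter.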
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler assembles each program and provides the loader with:
-> Object program + relocation information
-> A prefix with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• It outputs the text to a file rather than to memory
- this file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but also in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods:
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search
Binary search takes a sorted array (an ordered table) as its input. It compares the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
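The three search methods can be compared on a small symbol table. The sketch below assumes the table is a list of (name, value) pairs; `hash_slot` is a toy hash function chosen only for illustration (sum of character codes modulo the table size), not a recommended one.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; works on an unordered table."""
    for index, (name, _) in enumerate(table):
        if name == key:
            return index
    return -1

def binary_search(table, key):
    """Halve the search range each step; the table must be sorted by name."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1

def hash_slot(key, size):
    """Toy hash function: map a symbol name to a slot number in 0..size-1."""
    return sum(ord(ch) for ch in key) % size

symbols = [("ALPHA", 100), ("BETA", 104), ("LOOP", 112), ("START", 96)]
print(linear_search(symbols, "LOOP"), binary_search(symbols, "LOOP"))
print(hash_slot("AB", 11))
```

Linear search needs up to n comparisons, binary search about log2(n), and a hash function computes the slot directly from the key, at the cost of having to handle two keys that map to the same slot.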
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
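As one instance of a comparison sort, here is a minimal sketch of insertion sort. It returns a new sorted list rather than sorting in place, which is a choice made for this example only.

```python
def insertion_sort(items):
    """Grow a sorted prefix by inserting each next element at its proper place."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]     # shift larger elements one place right
            j -= 1
        result[j + 1] = current
    return result

print(insertion_sort(["LOOP", "START", "ALPHA", "BETA"]))
```

A table sorted this way can then be searched with binary search.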
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is a way of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ ->G γ (a single-step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
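The definition can be checked mechanically. In the sketch below a grammar's production set P is assumed to be a list of (α, β) string pairs with non-empty α; the function name and the sample grammar S -> aSb | ab are illustrative only.

```python
def immediate_derivations(mu, productions):
    """All strings γ such that µ = σ α τ, γ = σ β τ and (α -> β) is in P."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:                       # try every occurrence of α in µ
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results

P = [("S", "aSb"), ("S", "ab")]
print(immediate_derivations("aSb", P))
```

Here σ = "a" and τ = "b" in both cases: replacing S by aSb gives aaSbb, and replacing S by ab gives aabb.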
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol
b. β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
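The conditions for Types 1, 2 and 3 can be turned into a small checker. This is a sketch under stated assumptions: every symbol is a single character, productions are (α, β) string pairs with ε written as the empty string, and the function name `grammar_type` and the sample grammars are hypothetical (the production rules of the examples above are not reproduced here).

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return the most restrictive type (3, 2, 1 or 0) whose conditions hold."""
    eps_rule = (start, "")
    rules = [rule for rule in productions if rule != eps_rule]
    # Condition (b): if S -> ε is present, S must not occur on any right-hand side.
    if eps_rule in productions and any(start in beta for _, beta in productions):
        return 0
    # Type 1, condition (a): |α| <= |β| for every rule other than S -> ε.
    if not all(len(alpha) <= len(beta) for alpha, beta in rules):
        return 0
    # Type 2: α is a single nonterminal symbol.
    if not all(len(alpha) == 1 and alpha in nonterminals for alpha, _ in rules):
        return 1

    def right_linear(beta):
        """β is a terminal, or a terminal followed by a nonterminal."""
        if len(beta) == 1:
            return beta not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    return 3 if all(right_linear(beta) for _, beta in rules) else 2

print(grammar_type([("S", "aA"), ("A", "b"), ("A", "bA")], {"S", "A"}))
print(grammar_type([("S", "aSb"), ("S", "ab")], {"S"}))
print(grammar_type([("S", "aE"), ("aE", "EaE")], {"S", "E"}))
print(grammar_type([("S", "ab"), ("Bb", "b")], {"S", "B"}))
```

The last two calls mirror the remarks in the text: a rule of the form aE → EaE keeps a grammar at Type 1 but not Type 2, and a rule of the form Bb → b violates condition (a).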
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
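A BNF specification is easy to hold in a program as a table of alternatives. The sketch below assumes one rule per line, symbols separated by blanks, and a small made-up grammar for binary integers; `parse_bnf` and `terminals` are hypothetical helper names.

```python
GRAMMAR = """
<integer> ::= <digit> | <digit> <integer>
<digit> ::= 0 | 1
"""

def parse_bnf(text):
    """Turn lines of the form '<symbol> ::= alt | alt' into a dict of alternatives."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # Each alternative is a sequence of symbols; '|' separates the choices.
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules

def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {symbol for alts in rules.values() for alt in alts for symbol in alt}
    return used - set(rules)

rules = parse_bnf(GRAMMAR)
print(rules["<integer>"])
print(sorted(terminals(rules)))
```

The second rule for <integer> refers to <integer> itself, which is how a finite set of BNF rules describes integers of any length.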
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of this class. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages. A canonic system could also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used
Lecture_no 43 Compiler
Compilers are basically "language translators"
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program ---> Compiler ---> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the source program is built.
In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
- Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
- Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable: about 37 components instead of about 210 complete compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis (or scanning), syntax analysis (or parsing), and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following

x3 = y + 3;
x3  =  y  +  3  ;
x3   =y+ 3  ;

but not

x 3 = y + 3;

would be grouped into the lexemes x3, =, y, +, 3 and ;

A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
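Because lexemes need no recursion, a scanner can be driven by a single regular expression. The following is a minimal sketch, assuming only identifiers, unsigned integers and the symbols = + * ; occur; the function name `scan` and the choice to keep the number's text as its attribute value are assumptions of this example, not a fixed design.

```python
import re

# One alternative per token class; leading blanks are skipped as non-significant.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<sym>[=+*;]))")

def scan(source):
    """Return (tokens, symbol_table); identifiers become ('id', index) pairs."""
    tokens, symtab, pos = [], [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = match.end()
        if match.group("id"):
            name = match.group("id")
            if name not in symtab:
                symtab.append(name)                        # first occurrence: new entry
            tokens.append(("id", symtab.index(name) + 1))  # 1-based index, as in <id,1>
        elif match.group("number"):
            tokens.append(("number", match.group("number")))
        else:
            tokens.append((match.group("sym"), None))      # second component ignored
    return tokens, symtab

print(scan("x3   =y+ 3  ;"))
print(scan("x 3 = y + 3;")[0][:2])
```

The second call shows why "x 3" is not the same input: the blank separates the identifier x from the number 3, so two tokens are produced instead of one.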
Syntax Analysis (or Parsing)
Parsing involves a further grouping, in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example

x3 = y + 3;

would be parsed into a parse tree. This parsing would result from a grammar containing rules such as

asst-stmt → id = expr ;
expr → number | id | expr + expr

Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the parse tree.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. For the example, the syntax tree has = at the root with children x3 and +, and + has children y and 3.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
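A syntax tree for such an assignment can be built by a very small hand-written parser. The sketch below is an assumption-laden simplification: tokens are (kind, value) pairs, the only operator is +, and the ambiguous rule expr → expr + expr is resolved by grouping to the left. The function name and tuple representation of the tree are hypothetical.

```python
def parse_assignment(tokens):
    """Parse `id = operand (+ operand)*` into a nested-tuple syntax tree."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        token = tokens[pos]
        pos += 1
        return token

    def operand():
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            return expect(tokens[pos][0])
        raise SyntaxError(f"expected operand at token {pos}")

    target = expect("id")
    expect("=")
    tree = operand()
    while pos < len(tokens) and tokens[pos][0] == "+":
        expect("+")
        tree = ("+", tree, operand())      # operator as interior node, operands as children
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at {pos}")
    return ("=", target, tree)

tokens = [("id", "x3"), ("=", None), ("id", "y"), ("+", None), ("number", "3")]
print(parse_assignment(tokens))
```

The result has = at the root with the target identifier and the + node as its children, which is the shape of the syntax tree for the assignment expression (the trailing semicolon is not part of it).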
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int".
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce

    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3

We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as

    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --

Each "quad" has the form

    operation target source1 source2
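Walking the tree to produce quads can be sketched as a short recursive function. The tree encoding below (nested tuples with the conversion operators already inserted by semantic analysis) and the name `to_quads` are assumptions made for this illustration; empty source slots, written -- above, are represented by None.

```python
def to_quads(tree):
    """Emit (operation, target, source1, source2) quads by a post-order tree walk."""
    quads, counter = [], 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        op = node[0]
        if op in ("id", "number"):
            return str(node[1])                     # leaves are used directly as operands
        if op == "=":
            quads.append(("assign", walk(node[1]), walk(node[2]), None))
            return None
        sources = [walk(child) for child in node[1:]]
        target = new_temp()
        padded = (sources + [None, None])[:2]       # unused source slots stay empty
        quads.append(("add" if op == "+" else op, target, *padded))
        return target

    walk(tree)
    return quads

semantic_tree = ("=", ("id", "id1"),
                 ("realtoint", ("+", ("id", "id2"), ("inttoreal", ("number", 3)))))
for quad in to_quads(semantic_tree):
    print(quad)
```

Each interior node gets a fresh temporary, mirroring the idealized machine's assumption of an unlimited number of registers.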
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see:
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
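These two easy optimizations can be written as one pass over the quads. The sketch assumes quads are (operation, target, source1, source2) tuples with None for an empty slot, and it assumes a temporary is never used again after being copied by `assign`; a real optimizer would have to check that.

```python
def optimize(quads):
    """Fold int-to-real conversions of constants, then merge a trailing copy."""
    constants, out = {}, []
    for op, target, s1, s2 in quads:
        # Substitute any temporary already known to hold a constant.
        s1, s2 = constants.get(s1, s1), constants.get(s2, s2)
        if op == "inttoreal" and s1.isdigit():
            constants[target] = f"{int(s1)}.0"        # conversion done at compile time
            continue
        if op == "assign" and out and out[-1][1] == s1:
            prev = out.pop()                          # previous quad computed the value copied
            out.append((prev[0], target, prev[2], prev[3]))
            continue
        out.append((op, target, s1, s2))
    return out

quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```

Four quads shrink to two, the same result as the hand optimization described above.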
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2:

    LD   R1, id2
    ADDF R1, R1, #3.0    // add float
    RTOI R2, R1          // real to int
    ST   id1, R2

Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
A grammar is ambiguous if there is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver
        | calls
    Syntactic Analyzer
        | calls                  | calls
    Contextual Analyzer      Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver
        | calls                  | calls                    | calls
    Syntactic Analyzer       Contextual Analyzer        Code Generator
    input:  Source Text      input:  AST                input:  Decorated AST
    output: AST              output: Decorated AST      output: object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
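Example: Write a program that compares two positive numbers x and y. The output is -1 if x < y, 0 if x = y, and +1 if x > y.
         MOVE R1 x
         MOVE R2 y
         MOVE R3 0
         SUB R1 R2        ; R1 = x - y
         BN Less          ; result negative: x < y
         BZ Zero          ; result zero: x = y
         MOVE R3 +1       ; otherwise x > y
         BR Done          ; BR is assumed here to be an unconditional branch; without it
                          ; execution would fall through into Less and Zero
    Less MOVE R3 -1
         BR Done
    Zero MOVE R3 0
    Done HLT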
Lecture_no11 Address modification using instruction as data & using index register
Lecture_no12 Looping method with example
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
    Machine language      Equivalent assembly
    1000000100100101      LOAD R1 5
    1000000101000101      LOAD R2 5
    1010000100000110      ADD R0 R1 R2
    1000001000000110      SAVE R0 6
    1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
    STORE 1 R1
    MOVE R1 R2
    LOAD R2 1
    HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
    1000000100100101
    1000000101000101
    1010000100000110
    1000001000000110
    1111111111111111
Instruction formats (RR = 2-bit register number, MMMMM = 5-bit memory address):
LOAD: load the contents of a memory location into a register
    100000010 RR MMMMM
    ex: R0 = Mem[3]        100000010 00 00011
STORE: store the contents of a register into a memory location
    100000100 RR MMMMM
    ex: Mem[4] = R0        100000100 00 00100
MOVE: move the contents of one register into another register
    100100010000 RR RR
    ex: R0 = R1            100100010000 00 01
ADD: add the contents of 2 registers, store the result in a third
    1010000100 RR RR RR
    ex: R0 = R1 + R2       1010000100 00 01 10
SUBTRACT: subtract the contents of 2 registers, store the result in a third
    1010001000 RR RR RR
    ex: R0 = R1 - R2       1010001000 00 01 10
HALT: halt the program
    1111111111111111
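The instruction formats above can be checked with a small decoder. The following Python sketch is only an illustration for the simulated computer of these notes; the function name decode is made up for the example, and the operand order of STORE follows the earlier listing (SAVE R0 6), which is an assumption since the notes use both orders.

```python
def decode(word: str) -> str:
    """Translate one 16-bit machine word (as a bit string) into assembly text."""
    if len(word) != 16 or set(word) - {"0", "1"}:
        raise ValueError("expected a string of 16 bits")

    def reg(bits: str) -> str:
        return f"R{int(bits, 2)}"

    if word == "1" * 16:
        return "HALT"
    if word.startswith("100000010"):        # LOAD  RR MMMMM
        return f"LOAD {reg(word[9:11])} {int(word[11:], 2)}"
    if word.startswith("100000100"):        # STORE RR MMMMM
        return f"STORE {reg(word[9:11])} {int(word[11:], 2)}"
    if word.startswith("100100010000"):     # MOVE  RR RR
        return f"MOVE {reg(word[12:14])} {reg(word[14:16])}"
    if word.startswith("1010000100"):       # ADD   RR RR RR
        return f"ADD {reg(word[10:12])} {reg(word[12:14])} {reg(word[14:16])}"
    if word.startswith("1010001000"):       # SUB   RR RR RR
        return f"SUB {reg(word[10:12])} {reg(word[12:14])} {reg(word[14:16])}"
    raise ValueError(f"unknown instruction {word}")


# The machine language program given in the notes, one word per entry.
program = [
    "1000000100100101",
    "1000000101000101",
    "1010000100000110",
    "1000001000000110",
    "1111111111111111",
]
for w in program:
    print(w, decode(w))
```

Reversing the same table (mnemonic to bit pattern) gives the assembly language -> machine language translator mentioned earlier.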
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal). Hence you could write
    START EQU 10
    ORG START
    GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
    ORG 0
    GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, and it will be placed at address 0. ORG 0 does not get translated into any code, but it determines where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
       LABEL   OPCODE  OPERANDS      COMMENTS
    01 ;
    02 ; Program to multiply an integer by the constant 6.
    03 ; Before execution, an integer must be stored in NUMBER.
    04 ;
    05         ORIG    x3050
    06         LEA     R2, NUMBER
    07         LDW     R2, R2, #0
    08         LEA     R1, SIX
    09         LDW     R1, R1, #0
    0A         AND     R3, R3, #0    ; Clear R3. It will
    0B                               ; contain the product.
    0C ; The inner loop
    0D ;
    0E AGAIN   ADD     R3, R3, R2
    0F         ADD     R1, R1, #-1   ; R1 keeps track of
    10         BRp     AGAIN         ; the iterations
    11 ;
    12         HALT
    13 ;
    14 NUMBER  BLKW    1
    15 SIX     FILL    x0006
    16 ;
    17         END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
    AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
    LEA R2, NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
    BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
ORIG
ORIG tells the assembler where in memory to place the program. In line 05, ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
FILL
FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the location labeled SIX is initialized to the value x0006.
BLKW
BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get through with this, but it is rather inappropriate. If things need a hurry, you should use the country's language.
Assembler: translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to symbolic labels used by the programmer.
Disassembler: translates machine code back to its assembly source.
Cross Assembler: the assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader: the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives
• START - Specify the name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate a one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement follows the following format:
[label] mnemonic [operands] [comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) which is to be executed, and the second is the operands, or parameters, of the command.
An assembly language statement contains the following fields:
-> Label Field - an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
    START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode Field - contains a mnemonic, which stands for an operation code (i.e. a machine code instruction) or a pseudo-op. It may require additional information, i.e. operands.
-> Operand Field - specifies either the address or the data that the opcode requires.
-> Comment Field - allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
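Splitting a statement into these four fields can be sketched in a few lines of Python. The conventions used here are assumptions for illustration only, since they differ between assemblers: a label ends with a colon, operands are separated by commas, and a comment starts with a semicolon. The names Statement and parse_statement are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Statement(NamedTuple):
    label: Optional[str]
    mnemonic: str
    operands: List[str]
    comment: Optional[str]


def parse_statement(line: str) -> Statement:
    """Split '[label:] mnemonic [operands] [; comment]' into its fields."""
    code, _, comment = line.partition(";")      # assumed comment delimiter
    words = code.split()
    label = None
    if words and words[0].endswith(":"):        # assumed label convention
        label = words.pop(0)[:-1]
    if not words:
        raise ValueError("statement has no mnemonic")
    mnemonic = words[0]
    operand_text = " ".join(words[1:])
    operands = [op.strip() for op in operand_text.split(",") if op.strip()]
    return Statement(label, mnemonic, operands, comment.strip() or None)


print(parse_statement("START: LDAA 24H ; load accumulator"))
print(parse_statement("ADD R1, R2"))
```

Only the mnemonic is mandatory, so the optional fields come back as None or an empty list when they are absent.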
Types of Assembly language statements
There are 3 types of assembly language statements:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structures and define the format of each data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in
– Single pass
– Two pass
(Fig: Two-pass assembler) Source program -> Pass 1 (uses OPTAB, builds SYMTAB) -> Intermediate file -> Pass 2 (uses OPTAB and SYMTAB) -> Object code
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. One-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references unless restrictions are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
• The Symbol Table (SYMTAB): SYMTAB is used to store the values (addresses) assigned to labels.
• The Location Counter (LOCCTR): a variable that is used to help in the assignment of addresses. After each statement is processed: LOCCTR <- LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
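The division of work between the two passes can be sketched as follows. This is a minimal illustration for a made-up machine, not the 360-type assembler discussed in these notes: the opcode values in OPTAB, the one-word instruction length and the 16-bit instruction layout (opcode in the high byte, address in the low byte) are all assumptions.

```python
# Hypothetical operation code table: mnemonic -> opcode
OPTAB = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04, "HLT": 0x0F}


def pass1(program, start=0):
    """Assign an address to every statement and build SYMTAB."""
    symtab = {}
    intermediate = []
    locctr = start
    for label, mnemonic, operand in program:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, mnemonic, operand))
        locctr += 1  # LOCCTR <- LOCCTR + instruction length (one word here)
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement, now that every label has a value."""
    obj = []
    for addr, mnemonic, operand in intermediate:
        if mnemonic == "WORD":                  # data-defining directive
            obj.append((addr, int(operand)))
            continue
        if operand is not None and operand not in symtab:
            raise ValueError(f"undefined symbol {operand}")
        target = symtab[operand] if operand is not None else 0
        obj.append((addr, (OPTAB[mnemonic] << 8) | target))
    return obj


# (label, mnemonic, operand); X and Y are forward references in the first lines
program = [
    (None, "LOAD", "X"),
    (None, "ADD", "Y"),
    (None, "STORE", "X"),
    (None, "HLT", None),
    ("X", "WORD", "5"),
    ("Y", "WORD", "7"),
]
symtab, intermediate = pass1(program)
for addr, word in pass2(symtab, intermediate):
    print(f"{addr:02X}: {word:04X}")
```

The first statement refers to X before X has been defined. Pass 1 does not need its value, and by the time pass 2 runs the symbol table is complete, which is why forward references cause no difficulty in the two-pass scheme.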
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
    Col 1        H
    Col 2~7      Program name
    Col 8~13     Starting address (hex)
    Col 14~19    Length of object program in bytes (hex)
Text record
    Col 1        T
    Col 2~7      Starting address in this record (hex)
    Col 8~9      Length of object code in this record in bytes (hex)
    Col 10~69    Object code ((69-10+1)/6 = 10 instructions)
End record
    Col 1        E
    Col 2~7      Address of first executable instruction (hex) (END program_name)
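The column layout above can be turned into a short record writer. The Python sketch below is an illustration under stated assumptions: the function name object_program and the sample program name, address and object code bytes are made up, and each text record carries at most 30 bytes, which is what columns 10~69 (60 hex digits) allow.

```python
def object_program(name, start, code, first_exec=None):
    """Build Header, Text and End records for a block of object code bytes."""
    if first_exec is None:
        first_exec = start
    # Header: H, program name (6 columns), start address, program length
    records = [f"H{name:<6.6}{start:06X}{len(code):06X}"]
    # Text: T, start address of this record, length in bytes, object code
    for offset in range(0, len(code), 30):
        chunk = code[offset:offset + 30]
        records.append(f"T{start + offset:06X}{len(chunk):02X}{chunk.hex().upper()}")
    # End: E, address of the first executable instruction
    records.append(f"E{first_exec:06X}")
    return records


# Three made-up 3-byte instructions loaded at address 1000 (hex)
for record in object_program("COPY", 0x1000, bytes.fromhex("141033482039001036")):
    print(record)
```

With 3-byte instructions a full text record holds 10 instructions, which matches the (69-10+1)/6 = 10 calculation in the layout.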
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative addressing
c. Program counter relative addressing
Modification record
    Col 1      M
    Col 2~7    Starting location of the address field to be modified, relative to the beginning of the program
    Col 8~9    Length of the address field to be modified, in half-bytes
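How a loader might act on a modification record is sketched below in Python. It assumes the usual convention that a field with an odd number of half-bytes begins in the middle of its first byte, and that relocation means adding the actual load address to the field; the function name and the sample instruction bytes are made up for the example.

```python
def apply_modification(memory: bytearray, load_addr: int, record: str) -> None:
    """Add load_addr to the address field described by one M record."""
    offset = int(record[1:7], 16)       # Col 2~7: start of field, relative to program
    half_bytes = int(record[7:9], 16)   # Col 8~9: field length in half-bytes
    nbytes = (half_bytes + 1) // 2
    field = int.from_bytes(memory[offset:offset + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1  # the low-order half-bytes hold the address
    relocated = (field & ~mask) | ((field + load_addr) & mask)
    memory[offset:offset + nbytes] = relocated.to_bytes(nbytes, "big")


# A 4-byte instruction whose last 5 half-bytes hold the address 01036
memory = bytearray.fromhex("4B101036")
apply_modification(memory, 0x5000, "M00000105")
print(memory.hex().upper())
```

Only the address field changes; the opcode byte and the upper half of the second byte are left untouched by the mask.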
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to the assembly language instructions of Intel's 8-bit microprocessors.
Example
    MACRO
    INCRMT &A, &B
    LOAD &A            (macro definition)
    ADD &B
    STORE &A
    ENDM
    INCRMT X, Y        (macro call)
    LOAD X             (macro expansion)
    ADD Y
    STORE X
(Fig: Source code (with macro) -> Macro Processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit names a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
    INCRMT X, Y
leads to insertion of the assembly statements
    LOAD X
    ADD Y
    STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit
    MACRO
    INCRMT &A, &B
    LOAD &A
    ADD &B
    STORE &A
    ENDM
The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition.
-> Prototype for the macro
Each parameter begins with &.
    name MACRO parameters
        (body)
    MEND
-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
    macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first formal parameter, etc. Considering the prototype and macro call statements once again,
    INCRMT &A, &B      (prototype)
    INCRMT X, Y        (macro call)
we see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
    STRG DATA1, DATA2, DATA3
    GENER DIRECT, 3
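Positional pairing of formal and actual parameters can be sketched in Python as follows, using the INCRMT macro from these notes. The function names are made up for the illustration, and the sketch assumes that operands are separated by commas.

```python
import re


def split_statement(statement: str):
    """Return (name, [operands]) for a statement such as 'INCRMT X, Y'."""
    name, _, rest = statement.strip().partition(" ")
    return name, [arg.strip() for arg in rest.split(",") if arg.strip()]


def expand_positional(prototype: str, body, call: str):
    """Expand one macro call by pairing parameters according to position."""
    macro, formals = split_statement(prototype)
    called, actuals = split_statement(call)
    if called != macro or len(actuals) != len(formals):
        raise ValueError("call does not match prototype")
    binding = dict(zip(formals, actuals))       # first with first, and so on
    return [
        re.sub(r"&\w+", lambda m: binding.get(m.group(0), m.group(0)), line)
        for line in body
    ]


body = ["LOAD &A", "ADD &B", "STORE &A"]
print(expand_positional("INCRMT &A, &B", body, "INCRMT X, Y"))
```

Each model statement is copied with every &-name replaced by the actual parameter in the same position.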
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
    STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
    GENER TYPE=DIRECT, CHANNEL=3
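With keyword parameters the pairing no longer depends on position. A minimal Python sketch, with a made-up function name and assuming the formal parameters of GENER are &TYPE and &CHANNEL:

```python
def bind_keyword_args(formals, call_args):
    """Pair each 'KEYWORD=value' argument with the formal parameter it names."""
    binding = {}
    for arg in call_args:
        keyword, sep, value = arg.partition("=")
        formal = "&" + keyword.strip().lstrip("&")   # accept TYPE= or &TYPE=
        if not sep or formal not in formals:
            raise ValueError(f"unknown keyword argument {arg!r}")
        binding[formal] = value.strip()
    return binding


# The arguments may be written in any order
print(bind_keyword_args(["&TYPE", "&CHANNEL"], ["CHANNEL=3", "TYPE=DIRECT"]))
```

The resulting table of formal-to-actual pairs is then used during expansion exactly as in the positional case.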
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
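The three steps can be sketched in Python as below. This is a simplified illustration rather than a complete macro processor: it handles definitions and calls as they are met in the source (so every macro must be defined before it is called), it uses positional parameters only, and it leaves out AIF and AGO. The layout chosen for the MNT and MDT is an assumption made for the example.

```python
import re


def macro_process(lines):
    mnt = {}     # macro name -> (index of first model statement in MDT, formals)
    mdt = []     # model statements of all macros, each definition ended by ENDM
    output = []
    source = iter(lines)
    for line in source:
        words = line.split()
        if words == ["MACRO"]:
            # Step 1: enter the name in the MNT, store the definition in the MDT
            name, *rest = next(source).split(None, 1)
            formals = [p.strip() for p in rest[0].split(",")] if rest else []
            mnt[name] = (len(mdt), formals)
            for model in source:
                mdt.append(model.strip())
                if model.strip() == "ENDM":
                    break
        elif words and words[0] in mnt:
            # Step 2: locate the macro and pair formal with actual parameters
            start, formals = mnt[words[0]]
            arg_text = line.split(None, 1)[1] if len(words) > 1 else ""
            actuals = [a.strip() for a in arg_text.split(",") if a.strip()]
            binding = dict(zip(formals, actuals))
            # Step 3: copy model statements from the MDT until ENDM
            i = start
            while mdt[i] != "ENDM":
                output.append(re.sub(
                    r"&\w+", lambda m: binding.get(m.group(0), m.group(0)), mdt[i]))
                i += 1
        else:
            output.append(line.strip())
    return output, mnt, mdt


source_lines = [
    "MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
    "INCRMT X,Y", "INCRMT P,Q", "HLT",
]
expanded, mnt, mdt = macro_process(source_lines)
print("\n".join(expanded))
```

The MNT entry for INCRMT records position 0 in the MDT, so both calls are expanded from the same stored definition.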
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked.
Subroutine - the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition: DEFINE; macro invocation: EXPAND
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  - Holds the macro prototype and the statements that make up the macro body
  - References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of a macro invocation
  - Arguments are stored in ARGTAB according to their position
(Fig: NAMTAB, DEFTAB and ARGTAB during macro definition and macro call)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain.
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity, done by the loader, is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader
In this type of loader the source is read line by line, the machine code for each instruction is obtained and it is placed directly in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• Some portion of memory is occupied by the assembler, which is a waste of memory. As this scheme combines the assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme
In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. When the source program is first translated, an object program is generated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader
An absolute loader is a kind of loader in which already relocated object files are created. The loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• It is the programmer's duty to adjust all the inter-segment addresses and do the linking manually. For that the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls to certain input/output functions like printf() and scanf() which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
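The adjustment of address fields can be illustrated with a short Python sketch. The representation is an assumption made for the example only: a segment is a list of integer words translated relative to address 0, and address_fields lists the positions of the words that hold relative addresses. The name relocate is hypothetical.

```python
def relocate(words, address_fields, load_address):
    """Return a copy of the segment with every marked address field adjusted.

    words          -- the segment as translated relative to address 0
    address_fields -- positions of the words that contain relative addresses
    load_address   -- the constant added to each relative address
    """
    relocated = list(words)
    for position in address_fields:
        relocated[position] += load_address
    return relocated


# Words 1 and 3 hold relative addresses; words 0 and 2 are plain data.
segment = [5, 3, 0, 2]
loaded = relocate(segment, address_fields=[1, 3], load_address=500)
```

Only the words named in address_fields change. The other words are copied as they are, which is why relocation needs information from the translator about where the address fields lie.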
Relocating Loader
The general class of relocating loaders was introduced to avoid reassembling all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The input to a relocating loader, produced by the assembler, is the object program and information about all other programs it references. In addition there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time.
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution.
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time.
» A subroutine is loaded and linked to the rest of the program when it is first called. This is also known as dynamic loading or load on call.
-> Allows several executing programs to share one copy of a subroutine or library.
• Linking loader
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
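The path rule can be sketched as follows in Python, using the example tree from the text. The class OverlayManager and the helper path_to are hypothetical names chosen for this illustration, and real overlay managers also copy code into memory, which is left out here.

```python
# Parent of each segment in the example overlay tree.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]          # the root is loaded at program start

    def call(self, target):
        """Ensure the path to target is loaded; return the segments brought in."""
        wanted = path_to(target)
        common = 0
        while (common < len(wanted) and common < len(self.loaded)
               and wanted[common] == self.loaded[common]):
            common += 1
        newly_loaded = wanted[common:]
        if newly_loaded:
            # Siblings share memory, so the new segments replace
            # whatever was loaded below the common part of the path.
            self.loaded = wanted
        return newly_loaded


manager = OverlayManager()
first = manager.call("B")    # root calls a routine in B
second = manager.call("A")   # upward call, nothing to load
third = manager.call("E")    # E lies on a different branch
```

The second call returns an empty list because A is already on the loaded path, which matches the remark that upward calls need no help.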
(fig Overlay tree: ROOT at the top, A and D below ROOT, B and C below A, E and F below D)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library are processed by the linking loader and placed in memory. Linkage editor: object program(s) and library are processed by the linkage editor into a linked program, which a relocating loader later places in memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into main memory.
-> Let's take a quick look at who performs the above four functions for our three loaders: absolute loader, relocating loader and direct linking loader respectively.
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory card (ESD)
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
o Copy of the object program
o IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address to load the first segment
o PLA counter (Program Load Address): to keep track of each segment location
o External Symbol Directory card (ESD): to store each external symbol and its corresponding assigned core address
o Local External Symbol Array (LESA): to establish correspondence between the ESD ID used in ESD & RLD cards
o ESD (external symbol dictionary)
Data Structures
Algorithm
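The two passes can be sketched in Python under a simplified, assumed object format that is not the card format of these notes: each module is a dictionary with a name, a length, the symbols it defines (relative offsets), its text as a list of words, and RLD entries of the form (offset, symbol) meaning "add the address of symbol to the word at offset". The function names pass1 and pass2 are only labels for the illustration.

```python
def pass1(modules, ipla):
    """Assign a load address to every segment and build the symbol table."""
    symbol_table = {}
    pla = ipla                                  # Program Load Address counter
    for module in modules:
        symbol_table[module["name"]] = pla      # segment name is a symbol too
        for symbol, offset in module["defs"].items():
            symbol_table[symbol] = pla + offset
        pla += module["length"]
    return symbol_table


def pass2(modules, symbol_table):
    """Load the text, then relocate and link using the RLD entries."""
    memory = {}
    for module in modules:
        base = symbol_table[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word
        for offset, symbol in module["rld"]:
            memory[base + offset] += symbol_table[symbol]
    return memory


modules = [
    {"name": "MAIN", "length": 3, "defs": {},
     "text": [10, 0, 2], "rld": [(1, "SUB"), (2, "MAIN")]},
    {"name": "SUB", "length": 2, "defs": {"ENTRY": 1},
     "text": [7, 8], "rld": []},
]
symbols = pass1(modules, ipla=100)
memory = pass2(modules, symbols)
```

In this sketch an RLD entry that names the module itself performs relocation, and an entry that names another module's symbol performs linking, so both adjustments use the same table built in pass 1.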
Binary Symbolic Subroutine Loader-
The assembler assembles each program and provides the loader with
-> The object program and relocation information
-> A prefix with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• It outputs the text to a file rather than memory
- the file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job and is easier to understand. Today there are dozens of high-level languages. Some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same. Otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
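The three methods can be compared on a small symbol table. The sketch below is in Python and assumes, for illustration only, that the table is a list of (symbol name, value) pairs and that the hash function simply adds the character codes of the name. The function names are hypothetical.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(table, key):
    """Search a table that is sorted by symbol name; return index or -1."""
    low, high = 0, len(table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1      # continue in the left sub-array
        else:
            low = middle + 1       # continue in the right sub-array
    return -1


def hash_slot(key, size):
    """A simple hash function: map a symbol name to a slot 0..size-1."""
    return sum(ord(character) for character in key) % size


table = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108)]   # sorted by name
```

Linear search may examine every entry, binary search halves the remaining range at each step but needs an ordered table, and hashing computes the slot directly from the key.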
Sorting
We will discuss four comparison-sort algorithms
1 selection sort 2 insertion sort 3 merge sort 4 quick sort
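As an illustration, three of these algorithms are sketched below in Python. Each function returns a new sorted list and leaves its argument unchanged, which is a choice made for this example rather than a requirement of the algorithms.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and move it forward."""
    a = list(items)
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a


def insertion_sort(items):
    """Insert each item into its place among the already sorted items."""
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]        # shift larger items one place right
            j -= 1
        a[j + 1] = current
    return a


def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left, right = merge_sort(items[:middle]), merge_sort(items[middle:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


data = [31, 4, 15, 9, 26, 5]
```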
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata. Each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises. It can be interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
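The three conditions translate directly into a check on strings. The Python sketch below assumes, for illustration only, that every symbol is a single character and that the productions P are given as (alpha, beta) pairs with a non-empty left side. The name immediately_derives is hypothetical.

```python
def immediately_derives(mu, gamma, productions):
    """Return True if gamma follows from mu by one application of a production.

    For each production alpha -> beta, every way of splitting
    mu = sigma + alpha + tau is tried, and gamma must equal sigma + beta + tau.
    """
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            if sigma + beta + tau == gamma:
                return True
            start = mu.find(alpha, start + 1)
    return False


# Example grammar with productions S -> aSb and S -> ab.
P = [("S", "aSb"), ("S", "ab")]
```

With this grammar aSb immediately derives aabb (take σ = a, α = S, β = ab, τ = b), while S does not immediately derive aabb because two steps are needed.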
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (N the nonterminals, Σ the terminal alphabet, P the production rules and S the start symbol) that satisfies the following two conditions
a Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b If S → ε is in P, then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar
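The conditions for the grammar types can be checked mechanically. The Python sketch below relies on conventions assumed only for this illustration: every symbol is one character, uppercase letters are nonterminals, lowercase letters are terminals, the empty string stands for the empty right-hand side, and productions are (left, right) pairs. The function name grammar_type is hypothetical.

```python
def grammar_type(productions, start="S"):
    """Return the highest type (0, 1, 2 or 3) whose conditions are all met."""
    start_to_empty = (start, "")
    others = [rule for rule in productions if rule != start_to_empty]

    # Type 1: the right side is never shorter than the left side,
    # except for start -> empty, and then start is on no right side.
    if not all(len(left) <= len(right) for left, right in others):
        return 0
    if start_to_empty in productions and any(
            start in right for _left, right in productions):
        return 0

    # Type 2: every left side is a single nonterminal.
    if not all(len(left) == 1 and left.isupper()
               for left, _right in productions):
        return 1

    # Type 3: right side is a terminal, or a terminal then a nonterminal.
    def type3_right_side(right):
        if len(right) == 1:
            return right.islower()
        return len(right) == 2 and right[0].islower() and right[1].isupper()

    if all(type3_right_side(right) for _left, right in others):
        return 3
    return 2
```

For example, grammar_type([("S", "aA"), ("A", "b"), ("A", "bA")]) gives 3, and adding a rule such as ("A", "Ba") lowers the result to 2, in line with the remark above that a rule of the form A → Ba breaks the Type 3 conditions.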
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied
A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>
The ::= means that the symbol on the left must be replaced with the expression on the right
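A BNF specification is easy to hold in a program. The Python sketch below assumes the usual ::= separator, one rule per line and symbols separated by spaces. These conventions, the small number grammar and the function names parse_bnf and terminals are chosen only for this illustration.

```python
def parse_bnf(text):
    """Turn BNF rules into a dict: nonterminal -> list of alternatives.

    Each alternative is a list of symbols. Alternatives on the right
    side are separated by the vertical bar.
    """
    grammar = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alternative.split() for alternative in right.split("|")]
        grammar[left.strip()] = alternatives
    return grammar


def terminals(grammar):
    """Symbols that never appear on a left side are the terminals."""
    return {symbol
            for alternatives in grammar.values()
            for alternative in alternatives
            for symbol in alternative
            if symbol not in grammar}


rules = """
<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1
"""
grammar = parse_bnf(rules)
```

Here <number> and <digit> appear on a left side and are therefore nonterminals, while 0 and 1 never do and are reported as terminals.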
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators"
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part the source program is read and broken into a stream of strings, producing an intermediate form. In the synthesis part this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning)
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string
- Semantic Analysis
• In this step the meaning of the source string is determined
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
- Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
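The token list above can be reproduced with a small scanner. The following Python sketch uses the standard re module. The token names identifier, number and operator, and the restriction to the three operators in the example, are assumptions made for this illustration.

```python
import re

# One alternative per kind of token; leading blanks are skipped.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<identifier>[A-Za-z_]\w*)"
    r"|(?P<number>\d+)"
    r"|(?P<operator>[=+*]))"
)


def tokenize(source):
    """Break a statement into (kind, text) tokens, scanning left to right."""
    source = source.rstrip()
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise ValueError("unexpected character at position %d" % position)
        kind = match.lastgroup               # name of the group that matched
        tokens.append((kind, match.group(kind)))
        position = match.end()
    return tokens


tokens = tokenize("total = count + rate * 10")
```

The result is the same series of seven tokens as listed above: three identifiers, the assignment symbol, the plus sign, the multiplication sign and the constant 10.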
- Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
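How grouping tokens yields a tree can be shown with a tiny parser for assignment statements. The Python sketch below assumes a toy grammar (identifier = sum of products), takes tokens as plain strings and represents each tree node as a tuple (operator, left, right). All of these choices and the function names are assumptions for this illustration only.

```python
def parse_assignment(tokens):
    """Parse: identifier = expression. Returns a nested tuple as the tree."""
    if len(tokens) < 3 or tokens[1] != "=":
        raise SyntaxError("expected: identifier = expression")
    tree, remaining = parse_sum(tokens[2:])
    if remaining:
        raise SyntaxError("unexpected token " + remaining[0])
    return ("=", tokens[0], tree)


def parse_sum(tokens):
    """sum: product { + product }"""
    left, tokens = parse_product(tokens)
    while tokens and tokens[0] == "+":
        right, tokens = parse_product(tokens[1:])
        left = ("+", left, right)
    return left, tokens


def parse_product(tokens):
    """product: operand { * operand }, so * groups before +"""
    if not tokens:
        raise SyntaxError("missing operand")
    left, tokens = tokens[0], tokens[1:]
    while tokens and tokens[0] == "*":
        if len(tokens) < 2:
            raise SyntaxError("missing operand after *")
        left = ("*", left, tokens[1])
        tokens = tokens[2:]
    return left, tokens


tree = parse_assignment("total = count + rate * 10".split())
```

The resulting tree has the assignment at the top, the addition below it and the multiplication of rate and 10 at the lowest level, which reflects that the multiplication is grouped first.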
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler but now standing for GNU Compiler Collection) N=7 and M~30, so the savings are considerable: about 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =   y   +   3  ;
    x3   =y+ 3  ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
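The scanning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name tokenize, the TOKEN_SPEC table and the choice to keep the number attribute as its printed text are all hypothetical, and only the handful of lexemes used in the example statement are recognized.

```python
import re

# Token classes tried in order; the names are illustrative, not from the notes.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+;]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return (tokens, symbol_table) for a tiny subset of C."""
    tokens, symtab = [], []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue  # non-significant blanks are removed during scanning
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)
            # attribute value = 1-based index of the symbol-table entry
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))  # assumption: keep the printed form
        else:
            tokens.append((lexeme, None))  # second component is ignored
    return tokens, symtab


tokens, symtab = tokenize("x3 = y + 3;")
print(tokens)
print(symtab)
```

All three spacings of the statement yield the same token stream, while "x 3 = y + 3;" scans as the identifier x followed by the number 3, which the parser would later reject.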
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
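To make the grouping concrete, here is a minimal recursive-descent sketch in Python that builds a syntax tree (operators as interior nodes) for statements of the form id = expr ;. The function name parse_assignment and the nested-tuple tree shape are assumptions made for illustration. The rule expr → expr + expr is ambiguous as written, so this sketch simply treats + as left-associative; characters outside the tiny token set are silently skipped by the throwaway scanner.

```python
import re


def parse_assignment(text):
    """Parse 'id = expr ;' and return the syntax tree as nested tuples."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def operand():
        # expr -> number | id
        nonlocal pos
        tok = peek()
        if tok is None or tok in ("=", "+", ";"):
            raise SyntaxError(f"expected an operand, found {tok!r}")
        pos += 1
        return tok

    def expr():
        # expr -> expr + expr, handled as a left-associative loop
        node = operand()
        while peek() == "+":
            expect("+")
            node = ("+", node, operand())
        return node

    target = operand()
    if target[0].isdigit():
        raise SyntaxError("left side of '=' must be an identifier")
    expect("=")
    tree = ("=", target, expr())
    expect(";")  # in C the semicolon terminates the statement
    if peek() is not None:
        raise SyntaxError("unexpected input after ';'")
    return tree


print(parse_assignment("x3 = y + 3;"))
```

The recursion in the grammar shows up as the loop in expr; nothing comparable was needed in the scanner.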
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators “int to real” and “real to int”, as shown on the right.
Intermediate code generation
Many compilers first generate code for an “idealized machine”. For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
    inttoreal  temp1  3      --
    add        temp2  id2    temp1
    realtoint  temp3  temp2  --
    assign     id1    temp3  --
Each “quad” has the form
    operation target source1 source2
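The tree walk that yields these quads can be sketched as follows. This is a hedged illustration, not the notes' algorithm: gen_quads, the types dictionary and the tuple layout (operation, target, source1, source2) with None standing for "--" are assumptions, and only + and = are handled.

```python
def gen_quads(tree, types):
    """Walk an assignment tree and emit (operation, target, source1, source2)."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def convert(name, have, want):
        # Insert a unary conversion quad when the types differ.
        if have == want:
            return name
        op = "inttoreal" if want == "real" else "realtoint"
        temp = new_temp()
        quads.append((op, temp, name, None))
        return temp

    def walk(node):
        # Returns (name holding the value, type of the value).
        if isinstance(node, int):
            return str(node), "int"
        if isinstance(node, str):
            return node, types[node]
        _, left, right = node  # only '+' is supported in this sketch
        lname, ltype = walk(left)
        rname, rtype = walk(right)
        result_type = "real" if "real" in (ltype, rtype) else "int"
        lname = convert(lname, ltype, result_type)
        rname = convert(rname, rtype, result_type)
        temp = new_temp()
        quads.append(("add", temp, lname, rname))
        return temp, result_type

    _, target, value = tree
    vname, vtype = walk(value)
    vname = convert(vname, vtype, types[target])
    quads.append(("assign", target, vname, None))
    return quads


# id1 is x3 (an integer) and id2 is y (a real), as in the running example.
for quad in gen_quads(("=", "id1", ("+", "id2", 3)), {"id1": "int", "id2": "real"}):
    print(quad)
```

An unlimited supply of temporaries (temp1, temp2, ...) mirrors the idealized-machine assumption of unlimited registers.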
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
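Both simplifications can be expressed as one pass over the quad list. The sketch below is an assumption-laden illustration: the function name optimize is hypothetical, quads are tuples (operation, target, source1, source2) with None for an unused field, and merging a quad with the following assign is only safe because each temporary is assumed to be used exactly once.

```python
def optimize(quads):
    """Fold constant int-to-real conversions and merge trailing assigns."""
    consts = {}  # temporary name -> constant folded at compile time
    out = []
    for op, target, src1, src2 in quads:
        src1 = consts.get(src1, src1)
        src2 = consts.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            consts[target] = f"{src1}.0"  # do the conversion now, emit nothing
            continue
        if op == "assign" and out and out[-1][1] == src1 and src1.startswith("temp"):
            prev_op, _, prev1, prev2 = out.pop()
            out.append((prev_op, target, prev1, prev2))  # write straight to target
            continue
        out.append((op, target, src1, src2))
    return out


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```

A production optimizer would first establish, by data-flow analysis, that the temporary is not used elsewhere before dropping it.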
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, 3.0    // add float
    RTOI R2, R1         // real to int
    ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form or syntax of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar under which there is more than one possible parse tree for a given statement.
Lecture_no46 Passes of a compiler
Compiler Passes
• A “pass” is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a “phase”, but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What passes, and how many, a compiler makes over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical single pass compiler:
    Compiler Driver
      calls
    Syntactic Analyzer
      calls                  calls
    Contextual Analyzer      Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi pass compiler:
    Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer and the Code Generator.
    Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no13 Assembly language machine language
Assembly Language
Assembly instructions are just shorthand for machine instructions.
    Machine language      Equivalent assembly
    1000000100100101      LOAD R1 5
    1000000101000101      LOAD R2 5
    1010000100000110      ADD R0 R1 R2
    1000001000000110      SAVE R0 6
    1111111111111111      HALT
(For all assembly instructions that compute a result, the first argument is the destination.)
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
    STORE 1 R1
    MOVE R1 R2
    LOAD R2 1
    HALT
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
    1000000100100101
    1000000101000101
    1010000100000110
    1000001000000110
    1111111111111111
LOAD contents of memory location into register
    100000010 RR MMMMM
    ex: R0 = Mem[3]
    100000010 00 00011
STORE contents of register into memory location
    100000100 RR MMMMM
    ex: Mem[4] = R0
    100000100 00 00100
MOVE contents of one register into another register
    100100010000 RR RR
    ex: R0 = R1
    100100010000 00 01
ADD contents of 2 registers, store result in third
    1010000100 RR RR RR
    ex: R0 = R1 + R2
    1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
    1010001000 RR RR RR
    ex: R0 = R1 - R2
    1010001000 00 01 10
Halt the program
    1111111111111111
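Because each format above is a fixed prefix followed by 2-bit register fields and a 5-bit memory field, the translator is indeed short. The Python sketch below encodes the five-instruction program and then interprets the resulting bit strings. Several details are assumptions rather than facts from the notes: the function names encode and run, the mnemonic SUB for SUBTRACT, accepting both SAVE and STORE for the store format, and the register-first operand order of SAVE R0 6.

```python
def reg(name):
    return format(int(name[1:]), "02b")  # "R2" -> "10"


def mem(addr):
    return format(int(addr), "05b")  # "5" -> "00101"


def encode(line):
    """Translate one assembly line into a 16-bit string."""
    op, *args = line.split()
    if op == "LOAD":
        return "100000010" + reg(args[0]) + mem(args[1])
    if op in ("STORE", "SAVE"):  # the notes use both names for this format
        return "100000100" + reg(args[0]) + mem(args[1])
    if op == "MOVE":
        return "100100010000" + reg(args[0]) + reg(args[1])
    if op == "ADD":
        return "1010000100" + "".join(reg(a) for a in args)
    if op == "SUB":  # hypothetical mnemonic for SUBTRACT
        return "1010001000" + "".join(reg(a) for a in args)
    if op == "HALT":
        return "1" * 16
    raise ValueError(f"unknown mnemonic {op!r}")


def run(program, memory):
    """Interpret 16-bit instruction strings; returns the 4 registers."""
    regs = [0, 0, 0, 0]
    for word in program:
        if word == "1" * 16:
            break
        if word.startswith("100000010"):
            regs[int(word[9:11], 2)] = memory[int(word[11:], 2)]
        elif word.startswith("100000100"):
            memory[int(word[11:], 2)] = regs[int(word[9:11], 2)]
        elif word.startswith("100100010000"):
            regs[int(word[12:14], 2)] = regs[int(word[14:], 2)]
        elif word.startswith(("1010000100", "1010001000")):
            d, a, b = (int(word[i:i + 2], 2) for i in (10, 12, 14))
            regs[d] = regs[a] + regs[b] if word[6] == "0" else regs[a] - regs[b]
        else:
            raise ValueError(f"cannot decode {word}")
    return regs


source = ["LOAD R1 5", "LOAD R2 5", "ADD R0 R1 R2", "SAVE R0 6", "HALT"]
program = [encode(line) for line in source]
memory = [0] * 32
memory[5] = 21
print(program)
print(run(program, memory), memory[6])
```

The encoded words match the machine language column of the table given earlier, and running them leaves twice the value of memory location 5 in location 6.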
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (base 16). Hence you could write
    START EQU 10
    ORG START
    GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example
    ORG 0
    GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device, specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM, MASM, etc.
       LABEL   OPCODE  OPERANDS        COMMENTS
    01 ;
    02 ; Program to multiply an integer by the constant 6.
    03 ; Before execution, an integer must be stored in NUMBER.
    04 ;
    05         .ORIG   x3050
    06         LEA     R2, NUMBER
    07         LDW     R2, R2, #0
    08         LEA     R1, SIX
    09         LDW     R1, R1, #0
    0A         AND     R3, R3, #0      ; Clear R3. It will
    0B                                 ; contain the product.
    0C ; The inner loop
    0D ;
    0E AGAIN   ADD     R3, R3, R2
    0F         ADD     R1, R1, #-1     ; R1 keeps track of
    10         BRp     AGAIN           ; the iterations
    11 ;
    12         HALT
    13 ;
    14 NUMBER  .BLKW   1
    15 SIX     .FILL   x0006
    16 ;
    17         .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
    AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
    LEA R2, NUMBER
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program.
There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER, line 14, and SIX, line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
    BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
Actually, a more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says: start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get by speaking English, but you will never feel at home, and life remains complicated. You can get by this way, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader.
(Fig Assembler)
For example, externally defined symbols (library programs) must be indicated to the loader: the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
• Convert mnemonic operation codes to machine language equivalents
• Convert symbolic operands to machine addresses (pass 1)
• Build machine instructions in the proper format
• Convert data constants to internal (machine) representations
• Error checking is provided
• Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler.
-> Basic assembler directives:
• START: Specify the name and starting address for the program
• END: Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE: Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD: Generate a one-word integer constant
• RESB: Reserve the indicated number of bytes for a data area
• RESW: Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement follows the following format:
    [label] mnemonic [operands] [comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) which is to be executed, and the second is the operands or parameters of the command.
An assembly language statement contains the following fields:
-> Label field: an identifier; it is an optional field. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex:
    START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field: This field contains a mnemonic, which stands for an operation code or pseudo-op, i.e. a machine code instruction. It may require additional information, i.e. operands.
-> Operand field: specifies either the address or the data that the opcode requires.
-> Comment field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
Assembly language statements are of three types:
(i) Imperative statements
(ii) Declarative statements
(iii) Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task.
2) Specify the data structure and define the format of the data structure.
3) Determine the processing necessary to obtain and maintain the information.
4) Determine the processing necessary to perform a task.
Assembler design can be done in
- Single pass
- Two pass
(Fig: Two-pass assembler. Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; the passes consult OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields.
• Determine the storage requirement for every assembly language statement and update the location counter.
• Build the symbol table (the table that is used to store each label and its corresponding value).
Pass II
• Generate object code (perform the actual translation).
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless restrictions such as those above are imposed
(detailed pass 1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels.
• Location Counter (LOCCTR): This is a variable that is used to help in the assignment of addresses: LOCCTR ← LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program.
• Save the values assigned to all labels for use in Pass 2.
• Perform some processing of assembler directives.
Pass 2
• Assemble instructions.
• Generate data values defined by BYTE, WORD.
• Perform processing of assembler directives not done in Pass 1.
• Write the object program and the assembly listing.
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows.
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
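A minimal two-pass assembler can be sketched in Python to show how the forward-reference problem disappears once SYMTAB is complete. Everything specific here is an assumption for illustration only: the opcode values in OPTAB, the single WORD directive, the rule that every statement occupies exactly one word, and the encoding (opcode in the high byte, address in the low byte).

```python
OPTAB = {"LOAD": 0x1, "ADD": 0x2, "STORE": 0x3, "JUMP": 0x4, "HALT": 0xF}  # hypothetical
DIRECTIVES = {"WORD"}


def pass1(source, start=0):
    """Assign addresses, build SYMTAB, and return the intermediate file."""
    symtab, intermediate = {}, []
    locctr = start
    for line in source:
        fields = line.split()
        if not fields:
            continue
        if fields[0] not in OPTAB and fields[0] not in DIRECTIVES:
            label = fields.pop(0)  # first field is a label definition
            if label in symtab:
                raise ValueError(f"duplicate label {label}")
            symtab[label] = locctr
        if not fields or (fields[0] not in OPTAB and fields[0] not in DIRECTIVES):
            raise ValueError(f"bad statement: {line!r}")
        operand = fields[1] if len(fields) > 1 else None
        intermediate.append((locctr, fields[0], operand))
        locctr += 1  # LOCCTR <- LOCCTR + (instruction length), always 1 here
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement now that every label has an address."""
    obj = []
    for _, mnemonic, operand in intermediate:
        if mnemonic == "WORD":
            obj.append(int(operand))
            continue
        if operand is None:
            address = 0
        elif operand in symtab:
            address = symtab[operand]
        else:
            raise ValueError(f"undefined symbol {operand}")
        obj.append((OPTAB[mnemonic] << 8) | address)
    return obj


source = [
    "      LOAD  X",   # X, Y and Z are forward references
    "      ADD   Y",
    "      STORE Z",
    "      HALT",
    "X     WORD  5",
    "Y     WORD  7",
    "Z     WORD  0",
]
symtab, intermediate = pass1(source)
print(symtab)
print([format(word, "04X") for word in pass2(symtab, intermediate)])
```

A one-pass design would have to leave the address field of LOAD X empty and patch it when X is finally defined, which is the difficulty described above.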
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
    Col. 1       H
    Col. 2~7     Program name
    Col. 8~13    Starting address (hex)
    Col. 14~19   Length of object program in bytes (hex)
Text record
    Col. 1       T
    Col. 2~7     Starting address in this record (hex)
    Col. 8~9     Length of object code in this record in bytes (hex)
    Col. 10~69   Object code ((69-10+1)/6 = 10 instructions)
End record
    Col. 1       E
    Col. 2~7     Address of first executable instruction (hex) (END program_name)
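The column layout above maps directly onto fixed-width string formatting. The Python sketch below is an illustration under assumptions: the function names are hypothetical, object code is passed in as hex strings (one per instruction), and the sample program name and addresses are made up for the demonstration.

```python
def header_record(name, start, length):
    # Col 1 'H', cols 2-7 name (padded or cut to 6), cols 8-13 start, cols 14-19 length
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_records(start, object_codes):
    """Pack hex strings into T records of at most 60 hex digits (cols 10-69)."""
    records = []
    rec_start, current, addr = start, "", start
    for code in object_codes:
        if current and len(current) + len(code) > 60:
            records.append(f"T{rec_start:06X}{len(current) // 2:02X}{current}")
            rec_start, current = addr, ""
        current += code
        addr += len(code) // 2  # two hex digits per byte
    if current:
        records.append(f"T{rec_start:06X}{len(current) // 2:02X}{current}")
    return records


def end_record(first_instruction):
    return f"E{first_instruction:06X}"


print(header_record("COPY", 0x1000, 0x107A))
for record in text_records(0x1000, ["141033", "482039", "001036"]):
    print(record)
print(end_record(0x1000))
```

With 3-byte instructions, a full text record carries 10 instructions (30 bytes, length field 1E), which is the (69-10+1)/6 = 10 computed above.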
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve
a. Modification record
b. Base relative
c. Program counter relative
Modification record
    Col. 1     M
    Col. 2-7   Starting location of the address field to be modified, relative to the beginning of the program
    Col. 8-9   Length of the address field to be modified, in half-bytes
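How a loader might use such a record can be sketched as follows. This is an assumption-heavy illustration rather than a definitive loader: the function name apply_modification is hypothetical, the program image is a Python bytearray, and when the field length is an odd number of half-bytes the field is assumed to occupy the low-order half-bytes of the bytes it spans. The sample bytes and load address are made up.

```python
def apply_modification(image, record, load_address):
    """Add load_address to the address field named by an M record, in place."""
    offset = int(record[1:7], 16)       # cols 2-7: start of the field
    half_bytes = int(record[7:9], 16)   # cols 8-9: field length in half-bytes
    nbytes = (half_bytes + 1) // 2      # bytes that contain the field
    field = int.from_bytes(image[offset:offset + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1  # selects only the address half-bytes
    relocated = ((field & mask) + load_address) & mask
    field = (field & ~mask) | relocated  # keep any leading half-byte unchanged
    image[offset:offset + nbytes] = field.to_bytes(nbytes, "big")


# A 4-byte instruction whose last 5 half-bytes hold the address 01036.
image = bytearray.fromhex("4B101036")
apply_modification(image, "M00000105", 0x5000)
print(image.hex().upper())
```

Only the fields listed in modification records are touched, so instructions that use base-relative or program-counter-relative addressing need no change when the program is moved.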
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro-processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to an assembly language instruction of Intel's 8-bit microprocessors.
Example
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
(macro definition)
INCRMT X, Y
(macro call in the program, which expands to)
LOAD X
ADD Y
STORE X
(macro expansion)
(Figure: Source code (with macros) -> Macro processor -> Expanded code -> Compiler or assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X, Y
leads to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place All macro calls in a program are expanded in this fashion
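The substitution described above can be sketched in a few lines of Python. The function name expand and the tuple used to hold the definition are hypothetical; only the INCRMT macro itself comes from the notes.

```python
def expand(formals, model_statements, actuals):
    """Return the model statements with each formal parameter replaced
    by the actual parameter in the same position of the call."""
    binding = dict(zip(formals, actuals))
    expanded = []
    for statement in model_statements:
        opcode, *operands = statement.split()
        operands = [binding.get(operand, operand) for operand in operands]
        expanded.append(" ".join([opcode] + operands))
    return expanded


# Definition of INCRMT: prototype parameters and model statements.
INCRMT = (["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])

# The call INCRMT X, Y
result = expand(*INCRMT, ["X", "Y"])
```

Here result holds the three statements LOAD X, ADD Y and STORE X, which take the place of the call.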
Defining a Macro
Let us take another look at the macro definition unit
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
(macro definition)
INCRMT X, Y
(macro call in the program, which expands to)
LOAD X
ADD Y
STORE X
(macro expansion)
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro; MEND indicates the end of a macro
-> Prototype for the macro
Each parameter starts with &
name MACRO parameter list
...
MEND
-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter
In the macro invocation: argument
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called the formal and actual parameter lists, are specified in the prototype and macro call statements respectively, and establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, and so on. Consider the prototype and macro call statements once again:
INCRMT &A, &B (prototype)
INCRMT X, Y (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
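The two ways of pairing arguments with parameters can be compared in a short sketch. The function name bind_arguments is hypothetical, and the formal parameter list assumed for STRG (&a1, &a2, &a3) is inferred from the keyword example rather than given in the notes.

```python
def bind_arguments(formals, call_operands):
    """Pair each operand of a macro call with a formal parameter.

    Operands written as keyword=value are matched by name; all other
    operands are matched by their position in the call.
    """
    binding = {}
    position = 0
    for operand in call_operands:
        if "=" in operand:
            keyword, value = operand.split("=", 1)
            # Accept the keyword with or without the leading &.
            name = keyword if keyword.startswith("&") else "&" + keyword
            if name not in formals:
                raise ValueError(f"unknown keyword parameter: {keyword}")
            binding[name] = value
        else:
            binding[formals[position]] = operand
            position += 1
    return binding


formals = ["&a1", "&a2", "&a3"]   # assumed prototype of STRG
positional = bind_arguments(formals, ["DATA1", "DATA2", "DATA3"])
keyword = bind_arguments(formals, ["&a3=DATA1", "&a2=DATA2", "&a1=DATA3"])
```

With positional parameters DATA1 is bound to &a1; with the keyword form shown in the notes DATA1 is bound to &a3, because the order of the operands no longer matters.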
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in MDT
Step 2
Examine all statements in the assembly source program to detect macro calls For each macro call
i) locate the macro in MNT
ii) obtain information from MNT regarding position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in MDT, in their expansion time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion time relations between values of formal parameters and expansion time variables.
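Steps 1 to 3 can be sketched as a single scan over the source lines. This is only an illustration: the function name process and the in-memory layout of MNT and MDT are assumptions, and the conditional statements AIF and AGO are deliberately not handled.

```python
def process(source):
    """Expand macro calls in a list of source lines (no AIF/AGO support)."""
    mnt = {}      # macro name -> (index of first model statement in MDT, formals)
    mdt = []      # model statements of all macros, each unit ended by ENDM
    output = []
    lines = iter(source)
    for line in lines:
        words = line.split()
        if words == ["MACRO"]:
            # Step 1: enter the name in MNT and store the definition in MDT.
            name, *formals = next(lines).replace(",", " ").split()
            mnt[name] = (len(mdt), formals)
            for body_line in lines:
                mdt.append(body_line)
                if body_line.strip() == "ENDM":
                    break
        elif words and words[0] in mnt:
            # Step 2: locate the macro and pair formals with actuals.
            start, formals = mnt[words[0]]
            actuals = line.replace(",", " ").split()[1:]
            binding = dict(zip(formals, actuals))
            # Step 3: copy model statements from MDT until ENDM.
            index = start
            while mdt[index].strip() != "ENDM":
                opcode, *operands = mdt[index].replace(",", " ").split()
                operands = [binding.get(o, o) for o in operands]
                output.append(f"{opcode} {','.join(operands)}".strip())
                index += 1
        else:
            output.append(line)
    return output


program = ["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
           "INCRMT X,Y", "INCRMT P,Q", "END"]
expanded = process(program)
```

Because every definition precedes its first call, definitions and calls can be handled in the same loop; a processor that first collects all definitions and then scans the program would produce the same output for this input.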
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition: DEFINE; macro invocation: EXPAND
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB, with pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
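The three tables might interact as in the sketch below. The function names define and expand follow the sub-procedure names mentioned above, but their signatures are assumptions, and so is the ?n form of the positional notation (?1 for the first parameter, and so on), which is a common textbook convention that these notes do not spell out.

```python
def define(name, formals, body, deftab, namtab):
    """Store a macro in DEFTAB and index it in NAMTAB."""
    start = len(deftab)
    deftab.append(f"{name} {','.join(formals)}")          # prototype
    # Replace longer names first so &AB is not mistaken for &A.
    order = sorted(enumerate(formals, start=1),
                   key=lambda pair: len(pair[1]), reverse=True)
    for statement in body:
        for index, formal in order:
            statement = statement.replace(formal, f"?{index}")
        deftab.append(statement)
    deftab.append("MEND")
    namtab[name] = (start, len(deftab) - 1)               # begin and end pointers


def expand(name, arguments, deftab, namtab):
    """Expand one invocation using ARGTAB for the positional arguments."""
    argtab = list(arguments)
    start, end = namtab[name]
    expanded = []
    for statement in deftab[start + 1:end]:
        # Highest index first so ?10 is not mistaken for ?1.
        for index in range(len(argtab), 0, -1):
            statement = statement.replace(f"?{index}", argtab[index - 1])
        expanded.append(statement)
    return expanded


deftab, namtab = [], {}
define("INCRMT", ["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"], deftab, namtab)
statements = expand("INCRMT", ["X", "Y"], deftab, namtab)
```

Converting names to positions at definition time means that expansion never has to search for parameter names; it only indexes ARGTAB.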
(Figure: tables used for a MACRO definition and a macro CALL: NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
(Figure: flowchart of the macro processor procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program and convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader for which already relocated object files are created by the assembler; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of the assembler for a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is called dynamic linking, dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader
» Performs all linking and relocation operations
» Performs automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
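The behaviour of the overlay manager on this tree can be modelled in a few lines. This is a toy sketch: the class name OverlayManager, the PARENT table and the loads log are hypothetical, and loading a segment is reduced to recording its name.

```python
# Overlay tree from the notes: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]   # the path currently resident in memory
        self.loads = []          # log of segments brought in from disk

    def call(self, target):
        """Make sure the whole path to the target segment is loaded."""
        if target in self.loaded:
            return               # upward call or same segment: nothing to do
        wanted = path_to(target)
        keep = 0
        while (keep < min(len(self.loaded), len(wanted))
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        # Segments below the common prefix overwrite their siblings.
        self.loads.extend(wanted[keep:])
        self.loaded = wanted


manager = OverlayManager()
manager.call("B")    # root calls a routine in B: A and B are both loaded
manager.call("A")    # upward call from B to A: no load needed
manager.call("E")    # D and E replace A and B, which share their memory
```

After these three calls the log contains A, B, D, E in that order and the resident path is ROOT, D, E.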
(Figure: overlay tree with ROOT at the top, A and D below ROOT, B and C below A, and E and F below D)
The essential difference between a linkage editor and a linking loader
(Figure: a linking loader takes the object program(s) and a library and places the linked program directly in memory; a linkage editor takes the object program(s) and a library and produces a linked program, which a relocating loader later places in memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded in main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine is loaded in main memory.
-> Let's take a quick look at who performs the above four functions for our three loaders: absolute loader, relocating loader and direct linking loader respectively
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
pass1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation and Linkage Directory)
7. END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Directory (ESD, also expanded as external symbol dictionary): stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD ID used in ESD and RLD cards
Data Structures
Algorithm
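The two passes listed earlier can be sketched on a simplified, word-addressed model. Everything in this example is an assumption made for illustration: each object module is a dictionary rather than a card deck, "esd_defs" stands in for the ESD cards, "txt" for the TXT cards and "rld" for the RLD cards, and the LESA is not modelled because symbols are referred to directly by name.

```python
def pass1(modules, ipla):
    """Assign a load address to every segment and build the symbol table."""
    symtab = {}
    pla = ipla                                  # Program Load Address counter
    for module in modules:
        symtab[module["name"]] = pla            # segment name is itself a symbol
        for symbol, relative in module["esd_defs"].items():
            symtab[symbol] = pla + relative     # entry points defined here
        pla += module["length"]
    return symtab


def pass2(modules, symtab):
    """Load the text, then relocate and link every address constant."""
    memory = {}
    for module in modules:
        base = symtab[module["name"]]
        for offset, word in enumerate(module["txt"]):
            memory[base + offset] = word
        for offset, symbol in module["rld"]:
            # Adding the segment's own address relocates a local constant;
            # adding another symbol's address resolves an external reference.
            memory[base + offset] += symtab[symbol]
    return memory


modules = [
    {"name": "MAIN", "length": 3, "esd_defs": {},
     "txt": [0, 0, 2],                  # word 0: address of SUB, word 2: local address
     "rld": [(0, "SUB"), (2, "MAIN")]},
    {"name": "SUB", "length": 2, "esd_defs": {"SUBX": 1},
     "txt": [7, 8], "rld": []},
]
symtab = pass1(modules, ipla=100)
memory = pass2(modules, symtab)
```

With an IPLA of 100, MAIN is placed at 100 and SUB at 103, so the external reference in word 100 becomes 103 and the local address constant in word 102 becomes 102.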
Binary Symbolic Subroutine Loader
The assembler assembles each program and provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader: allocation, relocation and linking
• Outputs the text to a file rather than to memory; this file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
• Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table with the key.
• Binary search
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are equal; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Ordered table
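Both search methods can be tried on a small symbol table. This is a minimal sketch: the table contents are made up, each entry is assumed to be a (symbol name, address) pair, and the table is kept ordered by name so that binary search applies.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _address) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(table, key):
    """Search a table ordered by symbol name; return its index or -1."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1       # continue in the left sub-array
        else:
            low = mid + 1        # continue in the right sub-array
    return -1


# Hypothetical symbol table, ordered by name.
symbols = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108), ("LOOP", 112)]
```

Linear search may inspect all n entries, while binary search halves the remaining range at each step and so needs about log2(n) comparisons.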
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
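A small hash table for symbol names might look like the sketch below. The class name HashTable, the hash function (sum of character codes modulo the table size) and the use of linear probing to handle two keys that map to the same slot are all assumptions chosen for illustration; the notes do not prescribe any of them.

```python
class HashTable:
    def __init__(self, size=11):
        self.slots = [None] * size       # slots are numbered from 0

    def _hash(self, key):
        """Hash function: map a symbol name to a slot number."""
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        slot = self._hash(key)
        for _ in range(len(self.slots)):
            if self.slots[slot] is None or self.slots[slot][0] == key:
                self.slots[slot] = (key, value)
                return
            slot = (slot + 1) % len(self.slots)   # linear probing
        raise RuntimeError("hash table is full")

    def get(self, key):
        slot = self._hash(key)
        for _ in range(len(self.slots)):
            if self.slots[slot] is None:
                return None
            if self.slots[slot][0] == key:
                return self.slots[slot][1]
            slot = (slot + 1) % len(self.slots)
        return None


table = HashTable()
table.put("LOOP", 112)
table.put("POOL", 200)    # same letters as LOOP, so the same slot number
```

LOOP and POOL collide under this hash function, so POOL is stored in the next free slot and is still found by get.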
Sorting
We will discuss four comparison-sort algorithms
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
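The definition can be checked mechanically. The sketch below assumes that every grammar symbol is a single character, that productions are given as (α, β) pairs with a non-empty left-hand side, and it uses a small made-up grammar (S -> aSb, S -> ab); the function name immediate_derivations is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each occurrence of alpha in mu,
    mu is split as sigma + alpha + tau and gamma = sigma + beta + tau.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar: S -> aSb, S -> ab
P = [("S", "aSb"), ("S", "ab")]
one_step = immediate_derivations("aSb", P)
```

For μ = aSb the only choice is σ = a, α = S, τ = b, so the two productions give aaSbb and aabb.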
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
bullChomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b. If S → ε is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol)
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a)
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol)
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions
a. β is a terminal symbol
b. β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar
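The conditions for Type 1, Type 2 and Type 3 grammars can be tested on a list of production rules. The sketch below assumes single-character symbols, represents the empty string ε as "", and uses a made-up Type 3 grammar because the production rules of the examples did not survive in these notes; the function names are hypothetical.

```python
EPSILON = ""


def is_type1(productions, start="S"):
    has_epsilon_rule = (start, EPSILON) in productions
    for alpha, beta in productions:
        if (alpha, beta) == (start, EPSILON):
            continue
        if len(alpha) > len(beta):               # condition (a)
            return False
        if has_epsilon_rule and start in beta:   # condition (b)
            return False
    return True


def is_type2(productions, nonterminals, start="S"):
    return (is_type1(productions, start)
            and all(len(alpha) == 1 and alpha in nonterminals
                    for alpha, _beta in productions))


def is_type3(productions, nonterminals, start="S"):
    if not is_type2(productions, nonterminals, start):
        return False
    for alpha, beta in productions:
        if (alpha, beta) == (start, EPSILON):
            continue
        terminal = len(beta) == 1 and beta not in nonterminals
        terminal_then_nonterminal = (len(beta) == 2
                                     and beta[0] not in nonterminals
                                     and beta[1] in nonterminals)
        if not (terminal or terminal_then_nonterminal):
            return False
    return True


# Hypothetical Type 3 grammar over the terminals a and b.
N = {"S", "A", "B"}
P = [("S", "aA"), ("A", "aA"), ("A", "bB"), ("B", "b")]
```

Adding A → Ba or B → bb to P leaves the grammar Type 2 but no longer Type 3, and adding Bb → b makes it fail condition (a) of Type 1.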
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only k·n squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear-bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
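As an illustration of how BNF rules drive substitution, the sketch below stores a small, hypothetical grammar in a dictionary and expands a nonterminal into every string it can derive. The grammar, the name GRAMMAR and the function expand are assumptions made for this example; the approach only terminates for grammars without recursion.

```python
from itertools import product

# Hypothetical BNF rules, stored as: nonterminal -> list of alternatives,
# where each alternative is a sequence of symbols.
#   <greeting>   ::= <salutation> " " <name>
#   <salutation> ::= "hello" | "hi"
#   <name>       ::= "world" | "there"
GRAMMAR = {
    "<greeting>": [["<salutation>", " ", "<name>"]],
    "<salutation>": [["hello"], ["hi"]],
    "<name>": [["world"], ["there"]],
}


def expand(symbol, grammar=GRAMMAR):
    """Return every string derivable from `symbol` (non-recursive grammars only)."""
    if symbol not in grammar:
        return [symbol]  # never appears on a left side, so it is a terminal
    results = []
    for alternative in grammar[symbol]:
        # Expand each symbol of the sequence, then combine all the choices.
        parts = [expand(s, grammar) for s in alternative]
        results.extend("".join(combo) for combo in product(*parts))
    return results


print(expand("<greeting>"))
```

Each use of | doubles the number of choices here, so the start symbol derives four strings in total.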
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful for explicitly and concisely defining sets of strings, such as computer languages. They are more useful still if a canonic system can be used as a basis for recognizing strings from the defined sets.
The algorithm considered here is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system in which all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read, broken into its constituent pieces and turned into an intermediate form. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
  - The identifier total
  - The assignment symbol =
  - The identifier count
  - The plus sign +
  - The identifier rate
  - The multiplication sign *
  - The constant number 10
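The token list above can be produced by a very small scanner. The sketch below is one possible way to do it with Python's standard re module; the token names (IDENT, ASSIGN and so on) and the function tokenize are assumptions made for this illustration, not part of any particular compiler.

```python
import re

# Each entry pairs a hypothetical token name with the pattern that matches it.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),  # blanks separate tokens but are not tokens themselves
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break `source` into a list of (token type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

The scanner reads the statement once from left to right and emits seven tokens, in the same order as the list above.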
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable: roughly 37 components instead of about 210 complete compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice, the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =   y   +   3  ;
    x3   =y+ 3  ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree shown in the accompanying figure.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure.
The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree in the figure corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
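A syntax tree for an assignment expression such as x3 = y + 3 can be built by a short recursive-descent parser. The sketch below is only an illustration: the function names are hypothetical, tokens are plain strings, the tree is a nested tuple, and + is treated as left-associative, which is an assumption because the rule expr → expr + expr does not fix the grouping on its own.

```python
def parse_assignment(tokens):
    """Parse `id = expr` and return the syntax tree as nested tuples."""
    if len(tokens) < 3 or not tokens[0].isidentifier() or tokens[1] != "=":
        raise SyntaxError("expected: id = expr")
    tree, pos = parse_expr(tokens, 2)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("=", tokens[0], tree)


def parse_expr(tokens, pos):
    """Parse operand (+ operand)*, grouping to the left."""
    left, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_operand(tokens, pos + 1)
        left = ("+", left, right)  # the operator becomes an interior node
    return left, pos


def parse_operand(tokens, pos):
    """An operand is a single identifier or number token (a leaf)."""
    if pos < len(tokens) and (tokens[pos].isidentifier() or tokens[pos].isdigit()):
        return tokens[pos], pos + 1
    raise SyntaxError("expected an identifier or a number")


print(parse_assignment(["x3", "=", "y", "+", "3"]))
```

The operators = and + end up as interior nodes and the operands x3, y and 3 as leaves, matching the description of a syntax tree given above.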
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown in the accompanying figure.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than three operands.
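The walk over the semantic tree can be sketched in a few lines. The example below is an illustration under stated assumptions: the tree is written as nested tuples of the form (operator, operand, ...), the conversion operators appear as one-operand nodes, and the function name gen_three_address is hypothetical.

```python
from itertools import count


def gen_three_address(tree):
    """Walk a tuple-based tree and return a list of three-address instructions."""
    code = []
    temps = count(1)  # supplies temp1, temp2, ... (unlimited "registers")

    def walk(node):
        if isinstance(node, str):
            return node  # leaf: an identifier or a constant
        op, *args = node
        operands = [walk(arg) for arg in args]  # children first
        temp = f"temp{next(temps)}"
        if len(operands) == 1:
            code.append(f"{temp} = {op}({operands[0]})")  # unary conversion
        else:
            code.append(f"{temp} = {operands[0]} {op} {operands[1]}")
        return temp

    _, target, value = tree  # the root is ("=", target, expression)
    code.append(f"{target} = {walk(value)}")
    return code


semantic_tree = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))
for line in gen_three_address(semantic_tree):
    print(line)
```

Because children are visited before their parent, each temporary is defined before it is used, and the four instructions come out in the order shown above.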
Sometimes three-address code is called quadruples because one can view the previous code sequence as
    inttoreal  temp1  3      --
    add        temp2  id2    temp1
    realtoint  temp3  temp2  --
    assign     id1    temp3  --
Each "quad" has the form
    operation  target  source1  source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see:
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
    add  temp2  id2  3.0
2. The last two quads can be combined into
    realtoint  id1  temp2
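Both of these simple optimizations can be expressed as one pass over the list of quads. The following sketch is illustrative only: quads are written as (operation, target, source1, source2) tuples with None for an unused source, the function name optimize is hypothetical, and it assumes each temporary is used exactly once, which is what makes the second rewrite safe.

```python
def optimize(quads):
    """Fold int-to-real conversions of constants and merge trailing assigns."""
    constants = {}  # temporary name -> constant folded at compile time
    result = []
    for op, target, src1, src2 in quads:
        # Substitute any operand that is already known to be a constant.
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{int(src1)}.0"  # do the conversion now
            continue  # the quad itself disappears
        if op == "assign" and result and result[-1][1] == src1:
            prev_op, _, a, b = result[-1]
            result[-1] = (prev_op, target, a, b)  # write directly to the target
            continue
        result.append((op, target, src1, src2))
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```

The four quads shrink to two, an add with the constant 3.0 and a realtoint that writes straight into id1.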
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2:
    LD    R1  id2
    ADDF  R1  R1  3.0    (add float)
    RTOI  R2  R1         (real to int)
    ST    id1 R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
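A symbol table of this kind can be sketched as a small class. This is a minimal illustration, not the structure used by any particular compiler: the class name, the method names and the attribute names type and location are all assumptions. Indices start at 1 so that they line up with tokens such as <id,1>.

```python
class SymbolTable:
    """Maps each identifier to a 1-based index and a record of attributes."""

    def __init__(self):
        self._index = {}    # name -> index
        self._entries = []  # position index-1 -> attribute record

    def intern(self, name):
        """Return the index for `name`, creating an entry on first sight."""
        if name not in self._index:
            self._entries.append({"name": name, "type": None, "location": None})
            self._index[name] = len(self._entries)
        return self._index[name]

    def set_attributes(self, name, **attrs):
        """Record information found by later phases, e.g. type or location."""
        self._entries[self.intern(name) - 1].update(attrs)

    def lookup(self, index):
        """Return the attribute record stored under `index`."""
        return self._entries[index - 1]


table = SymbolTable()
print(table.intern("x3"), table.intern("y"), table.intern("x3"))
# The location is where the compiled program will keep the variable.
table.set_attributes("x3", type="int", location=0)
print(table.lookup(1))
```

The scanner would call intern when it meets an identifier, while the semantic analyzer and code generator would fill in and read the attributes.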
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improve the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocate memory locations, select machine code for each intermediate code, and utilize registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver
        | calls
    Syntactic Analyzer
        | calls                    | calls
    Contextual Analyzer        Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver
        | calls                    | calls                     | calls
    Syntactic Analyzer         Contextual Analyzer         Code Generator
    input:  Source Text        input:  AST                 input:  Decorated AST
    output: AST                output: Decorated AST       output: Object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
It is very easy to write an assembly language -> machine language translator.
Exercise
What would be the assembly instructions to swap the contents of registers 1 & 2?
Exercise Solution
    STORE 1 R1
    MOVE R1 R2
    LOAD R2 1
    HALT
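The exercise solution can be checked by simulating it. The sketch below is a toy interpreter written for this purpose. The operand order is an assumption read off the solution above (STORE address register, MOVE destination source, LOAD register address), and the function name run is hypothetical.

```python
def run(program, registers, memory):
    """Execute a list of assembly statements on the simple simulated machine."""
    for line in program:
        op, *args = line.split()
        if op == "STORE":    # STORE addr reg : Mem[addr] = reg
            memory[int(args[0])] = registers[args[1]]
        elif op == "LOAD":   # LOAD reg addr  : reg = Mem[addr]
            registers[args[0]] = memory[int(args[1])]
        elif op == "MOVE":   # MOVE dst src   : dst = src
            registers[args[0]] = registers[args[1]]
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction: {line}")
    return registers


swap = ["STORE 1 R1", "MOVE R1 R2", "LOAD R2 1", "HALT"]
registers = {"R1": 7, "R2": 9}
memory = [0] * 32  # the simulated machine has 32 memory locations
print(run(swap, registers, memory))
```

Memory location 1 serves as the temporary that holds the old value of R1 while R2 is copied over it.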
Machine Language
The machine language for a particular computer is tied to the architecture of the CPU. For example, G4 Macs have a different machine language than Intel PCs. We will look at the machine language of a simple, simulated computer.
Machine Language Example
Our computer has 4 registers and 32 memory locations. Each instruction is 16 bits. Here is a machine language program for our simulated computer:
    1000000100100101
    1000000101000101
    1010000100000110
    1000001000000110
    1111111111111111
LOAD contents of memory location into register
    100000010 RR MMMMM
    ex: R0 = Mem[3]
    100000010 00 00011
STORE contents of register into memory location
    100000100 RR MMMMM
    ex: Mem[4] = R0
    100000100 00 00100
MOVE contents of one register into another register
    100100010000 RR RR
    ex: R0 = R1
    100100010000 00 01
ADD contents of 2 registers, store result in third
    1010000100 RR RR RR
    ex: R0 = R1 + R2
    1010000100 00 01 10
SUBTRACT contents of 2 registers, store result into third
    1010001000 RR RR RR
    ex: R0 = R1 - R2
    1010001000 00 01 10
Halt the program
    1111111111111111
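Using the instruction formats above, the 80-bit sample program can be decoded by hand or by a few lines of code. The following disassembler is a sketch: the bit patterns come from the formats listed above, while the textual form of each decoded instruction (for instance ADD R0 R1 R2) is an assumed notation.

```python
PROGRAM = (
    "1000000100100101"
    "1000000101000101"
    "1010000100000110"
    "1000001000000110"
    "1111111111111111"
)


def reg(bits):
    """Turn a 2-bit register field into a name such as R2."""
    return f"R{int(bits, 2)}"


def disassemble(bits):
    """Decode a string of 16-bit instructions into assembly-like text."""
    lines = []
    for i in range(0, len(bits), 16):
        word = bits[i:i + 16]
        if word == "1" * 16:
            lines.append("HALT")
        elif word.startswith("100000010"):     # LOAD: opcode, RR, MMMMM
            lines.append(f"LOAD {reg(word[9:11])} {int(word[11:], 2)}")
        elif word.startswith("100000100"):     # STORE: opcode, RR, MMMMM
            lines.append(f"STORE {int(word[11:], 2)} {reg(word[9:11])}")
        elif word.startswith("100100010000"):  # MOVE: opcode, RR, RR
            lines.append(f"MOVE {reg(word[12:14])} {reg(word[14:])}")
        elif word.startswith("1010000100"):    # ADD: opcode, RR, RR, RR
            lines.append(f"ADD {reg(word[10:12])} {reg(word[12:14])} {reg(word[14:])}")
        elif word.startswith("1010001000"):    # SUBTRACT: opcode, RR, RR, RR
            lines.append(f"SUB {reg(word[10:12])} {reg(word[12:14])} {reg(word[14:])}")
        else:
            raise ValueError(f"unknown instruction word: {word}")
    return lines


for line in disassemble(PROGRAM):
    print(line)
```

Read this way, the sample program loads memory location 5 into R1 and R2, adds them into R0, stores R0 into memory location 6 and halts, so it doubles the value held in location 5.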
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example:
    START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (base 16). Hence you could write
    START EQU 10
    ORG START
    GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example:
    ORG 0
    GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
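The effect of EQU and ORG can be pictured as bookkeeping on a symbol table and a location counter. The sketch below is a simplified illustration, assuming that all numbers are hexadecimal, that every instruction occupies exactly one location, and that EQU lines have the form name EQU value. The function name assemble_layout is hypothetical, and no real machine code is generated.

```python
def assemble_layout(lines):
    """Return (address, instruction) pairs, honouring EQU and ORG directives."""
    symbols = {}   # label -> numeric value defined by EQU
    location = 0   # address at which the next instruction will be stored
    placed = []

    def value(token):
        # A token is either a previously defined symbol or a hex literal.
        return symbols[token] if token in symbols else int(token, 16)

    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[1] == "EQU":
            symbols[fields[0]] = value(fields[2])  # define a symbol, emit nothing
        elif fields[0] == "ORG":
            location = value(fields[1])            # move the counter, emit nothing
        else:
            # Replace symbols in the operands by their (hexadecimal) values.
            operands = [format(symbols[f], "X") if f in symbols else f for f in fields[1:]]
            placed.append((location, " ".join([fields[0], *operands])))
            location += 1
    return placed


print(assemble_layout(["START EQU 10", "ORG START", "GOTO START"]))
print(assemble_layout(["ORG 0", "GOTO 10"]))
```

Neither directive produces an entry of its own in the output; they only change what the following instruction looks like and where it is placed.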
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
   LABEL   OPCODE  OPERANDS    COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6.
03 ; Before execution, an integer must be stored in NUMBER.
04 ;
05         .ORIG   x3050
06         LEA     R2,NUMBER
07         LDW     R2,R2,#0
08         LEA     R1,SIX
09         LDW     R1,R1,#0
0A         AND     R3,R3,#0    ; Clear R3. It will
0B                             ; contain the product.
0C ; The inner loop
0D ;
0E AGAIN   ADD     R3,R3,R2
0F         ADD     R1,R1,#-1   ; R1 keeps track of
10         BRp     AGAIN       ; the iterations
11 ;
12         HALT
13 ;
14 NUMBER  .BLKW   1
15 SIX     .FILL   x0006
16 ;
17         .END
Figure 7.1: An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
    AGAIN ADD R3,R3,R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
    LEA R2,NUMBER
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program.
There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER, line 14, and SIX, line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
    BRp AGAIN
If the result of ADD R1,R1,#-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
Actually, a more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says: start with location x3050. As a result, the LEA R2,NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get through this way, but it is rather inappropriate. If things need to be done in a hurry, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader.
(Fig: Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader; the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig: Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
• Convert mnemonic operation codes to machine language equivalents
• Convert symbolic operands to machine addresses (pass 1)
• Build machine instructions in the proper format
• Convert data constants to internal (machine) representations
• Provide error checking
• Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START - Specify the name and starting address of the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate a one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
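The storage effect of the data directives above can be sketched in a few lines of Python. This is only an illustration: the 3-byte word size and the operand notation C'...' / X'...' for BYTE constants are assumptions made for the example and are not fixed by these notes.

```python
WORD_SIZE = 3  # assumption: one machine word occupies 3 bytes


def directive_size(directive, operand):
    """Return how many bytes a data directive adds to the location counter."""
    if directive == "WORD":
        return WORD_SIZE                      # one-word integer constant
    if directive == "RESW":
        return WORD_SIZE * int(operand)       # reserve N words
    if directive == "RESB":
        return int(operand)                   # reserve N bytes
    if directive == "BYTE":
        kind, body = operand[0], operand[2:-1]  # e.g. C'EOF' -> 'C', 'EOF'
        if kind == "C":
            return len(body)                  # one byte per character
        if kind == "X":
            return (len(body) + 1) // 2       # two hex digits per byte
    raise ValueError(f"not a data directive: {directive} {operand}")
```

For instance, under these assumptions BYTE C'EOF' occupies three bytes and RESW 10 reserves thirty.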
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement has the following format:
[label] mnemonic [operands] [comment]
The fields in square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands, or parameters, of the command.
An assembly language statement contains the following fields:
-> Label field - an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A…Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/opcode field - contains a mnemonic, which stands for an operation code or pseudo-op, i.e. a machine code instruction. It may require additional information, i.e. operands.
-> Operand field - specifies either the address or the data that the opcode requires.
-> Comment field - allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
Assembly language statements are of 3 types:
(i) Imperative statements
(ii) Declarative statements
(iii) Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in:
- Single pass
- Two pass
(Fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction or symbol which has not yet been scanned by the assembler). The separate passes of a two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. One-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless restrictions are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
• Symbol Table (SYMTAB): used to store the values (addresses) assigned to labels
• Location Counter (LOCCTR): a variable used to help in the assignment of addresses. After each statement is processed, LOCCTR ← LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
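The division of work between the two passes can be illustrated with a toy assembler in Python. This is a minimal sketch under stated assumptions, not the flowchart from these notes: every instruction is assumed to be 3 bytes (one opcode byte plus a 16-bit address), a word is assumed to be 3 bytes, the three-entry OPTAB is an illustrative subset, and a two-field line is always read as "mnemonic operand".

```python
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}  # illustrative subset
WORD_SIZE = 3   # assumption: 3-byte words
INSTR_SIZE = 3  # assumption: fixed-length instructions


def parse(line):
    """Split a statement into (label, mnemonic, operand)."""
    parts = line.split()
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    if len(parts) == 2:
        return None, parts[0], parts[1]
    return None, parts[0], None


def pass1(source):
    """Assign addresses and build SYMTAB; return it with the intermediate file."""
    symtab, intermediate, locctr = {}, [], 0
    for line in source:
        label, op, operand = parse(line)
        if op == "START":
            locctr = int(operand, 16)          # starting address in hex
            intermediate.append((locctr, op, operand))
            continue
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, op, operand))
        if op in OPTAB:
            locctr += INSTR_SIZE
        elif op == "WORD":
            locctr += WORD_SIZE
        elif op == "RESW":
            locctr += WORD_SIZE * int(operand)
        elif op == "RESB":
            locctr += int(operand)
        elif op == "END":
            break
        else:
            raise ValueError(f"unknown operation {op}")
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate each statement, now that every label has a value."""
    object_code = []
    for loc, op, operand in intermediate:
        if op in OPTAB:
            object_code.append((loc, f"{OPTAB[op]:02X}{symtab[operand]:04X}"))
        elif op == "WORD":
            object_code.append((loc, f"{int(operand):06X}"))
    return object_code


SOURCE = [
    "COPY START 1000",
    "FIRST LDA ALPHA",   # ALPHA is a forward reference
    "ADD BETA",
    "STA GAMMA",
    "ALPHA WORD 5",
    "BETA WORD 7",
    "GAMMA RESW 1",
    "END FIRST",
]
```

In this sketch the first instruction refers to ALPHA before ALPHA is defined; Pass 1 records its address in SYMTAB so that Pass 2 can assemble the instruction without difficulty.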
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine-Dependent Assembler Features
• instruction formats and addressing modes
• program relocation
Machine-Independent Assembler Features
• literals
• symbol-defining statements
• expressions
• program blocks
• control sections and program linking
Object Program
Header record
  Col 1      H
  Col 2~7    Program name
  Col 8~13   Starting address (hex)
  Col 14~19  Length of object program in bytes (hex)
Text record
  Col 1      T
  Col 2~7    Starting address in this record (hex)
  Col 8~9    Length of object code in this record in bytes (hex)
  Col 10~69  Object code ((69-10+1)/6 = 10 instructions)
End record
  Col 1      E
  Col 2~7    Address of first executable instruction (hex) (END program_name)
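The record layouts above can be sketched as three small formatting functions in Python. The function names are hypothetical and the example values are made up for illustration; only the column positions come from the layout described above.

```python
def header_record(name, start, length):
    """Col 1 'H', col 2-7 program name, col 8-13 start, col 14-19 length."""
    return "H" + name.ljust(6)[:6] + f"{start:06X}" + f"{length:06X}"


def text_record(start, object_code_hex):
    """Col 1 'T', col 2-7 start address, col 8-9 byte count, col 10-69 code."""
    if len(object_code_hex) > 60:       # columns 10-69 hold 60 hex digits
        raise ValueError("too much object code for one text record")
    n_bytes = len(object_code_hex) // 2
    return "T" + f"{start:06X}" + f"{n_bytes:02X}" + object_code_hex


def end_record(first_executable):
    """Col 1 'E', col 2-7 address of the first executable instruction."""
    return "E" + f"{first_executable:06X}"
```

For example, header_record("COPY", 0x1000, 0x107A) yields the 19-column string HCOPY  00100000107A, with the program name padded to six columns.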
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
  Col 1      M
  Col 2~7    Starting location of the address field to be modified, relative to the beginning of the program
  Col 8~9    Length of the address field to be modified, in half-bytes
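How a loader might act on one modification record can be sketched as follows. This is an illustration under assumptions that go beyond these notes: the program image is held in a Python bytearray, addresses are stored big-endian, and when the field length is an odd number of half-bytes the field is taken to occupy the low-order half-bytes of the bytes it spans.

```python
def apply_modification(image, offset, half_bytes, load_address):
    """Add load_address to the address field named by one modification record.

    image        -- bytearray holding the program as assembled from address 0
    offset       -- col 2-7: start of the field, relative to the program start
    half_bytes   -- col 8-9: field length in half-bytes (4 bits each)
    load_address -- actual starting address chosen at load time
    """
    n_bytes = (half_bytes + 1) // 2
    field = int.from_bytes(image[offset:offset + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1          # bits belonging to the address
    relocated = ((field & mask) + load_address) & mask
    field = (field & ~mask) | relocated         # keep the non-address bits
    image[offset:offset + n_bytes] = field.to_bytes(n_bytes, "big")
```

With a 4-byte instruction 4B 10 10 36 whose last 5 half-bytes hold the address 01036, a record with offset 1 and length 05 and a load address of 5000 (hex) changes the instruction to 4B 10 60 36.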
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro-processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macro.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight the salient aspects of a macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language.
Example
MACRO
INCRMT &A,&B
LOAD &A          (macro definition)
ADD &B
STORE &A
ENDM

INCRMT X,Y       (macro call)
LOAD X           (macro expansion)
ADD Y
STORE X
(Fig Macro Program)
(Fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition units will exist at the start of the program. Each definition unit names a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the call
INCRMT X,Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit:
MACRO
INCRMT &A,&B
LOAD &A          (macro definition)
ADD &B
STORE &A
ENDM

INCRMT X,Y       (macro call)
LOAD X           (macro expansion)
ADD Y
STORE X
(Fig Macro Program)
The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition
MEND indicates the end of a macro definition (written ENDM elsewhere in these notes)
-> Prototype for the macro
Each parameter name starts with &
name MACRO parameter list
(body)
MEND
-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter
In the macro invocation: argument
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro:
macro_name p1, p2, …
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A,&B     prototype
INCRMT X,Y       macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter. Arguments may then appear in any order.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
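The two ways of pairing actual parameters with formal parameters can be sketched with one small Python function. The function name bind_arguments is hypothetical, and the sketch assumes that arguments are separated by commas and that a keyword may be written with or without the leading &.

```python
def bind_arguments(formals, call_args):
    """Pair each formal parameter with an actual parameter.

    formals   -- formal parameter names from the prototype, e.g. ['&A', '&B']
    call_args -- operand field of the call, e.g. 'X,Y' or 'B=Y, A=X'
    """
    binding = {}
    position = 0
    for arg in (a.strip() for a in call_args.split(",")):
        if "=" in arg:
            # Keyword parameter: the keyword names the formal parameter.
            keyword, value = arg.split("=", 1)
            keyword = keyword.strip()
            if not keyword.startswith("&"):
                keyword = "&" + keyword
            if keyword not in formals:
                raise ValueError(f"unknown keyword {keyword}")
            binding[keyword] = value.strip()
        else:
            # Positional parameter: paired by its place in the list.
            binding[formals[position]] = arg
            position += 1
    missing = [f for f in formals if f not in binding]
    if missing:
        raise ValueError(f"no value for {missing}")
    return binding
```

With positional parameters the order of the call decides the pairing; with keyword parameters the same binding results whatever order the arguments are written in.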
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish the correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
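Steps 1 to 3 can be sketched in Python as follows. This is a simplified illustration, not the full design: the MNT is assumed to be a dictionary holding the MDT index and the formal parameter list (so the prototype is kept in the MNT rather than the MDT), parameters are positional and comma-separated, and the conditional statements AIF and AGO are not handled.

```python
def build_tables(source):
    """Step 1: fill the MNT and MDT; return them with the remaining program."""
    mnt, mdt, program = {}, [], []
    lines = iter(source)
    for line in lines:
        if line.strip() == "MACRO":
            prototype = next(lines).split(None, 1)
            name = prototype[0]
            formals = ([p.strip() for p in prototype[1].split(",")]
                       if len(prototype) > 1 else [])
            mnt[name] = (len(mdt), formals)   # where the body starts in MDT
            for body_line in lines:           # copy model statements up to ENDM
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        else:
            program.append(line.strip())
    return mnt, mdt, program


def expand_calls(mnt, mdt, program):
    """Steps 2 and 3: replace every macro call by its model statements."""
    expanded = []
    for stmt in program:
        parts = stmt.split(None, 1)
        if parts and parts[0] in mnt:
            start, formals = mnt[parts[0]]
            actuals = ([a.strip() for a in parts[1].split(",")]
                       if len(parts) > 1 else [])
            binding = dict(zip(formals, actuals))
            index = start
            while mdt[index] != "ENDM":
                model = mdt[index]
                # Longest names first, so &AB is not mistaken for &A.
                for formal in sorted(binding, key=len, reverse=True):
                    model = model.replace(formal, binding[formal])
                expanded.append(model)
                index += 1
        else:
            expanded.append(stmt)
    return expanded
```

Run on the INCRMT example (with the call written as INCRMT X,Y), the sketch produces LOAD X, ADD Y, STORE X in place of the call.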
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures:
   macro definition: DEFINE
   macro invocation: EXPAND
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  - Macro prototype
  - Statements that make up the macro body
  - References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of a macro invocation
  - Arguments are stored in ARGTAB according to their position
(Fig Macro processor tables: a MACRO definition uses NAMTAB and DEFTAB; a macro CALL uses ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm & flowchart for the Macro processor and explain
(Fig Macro processor procedures: PROCESSLINE, DEFINE, EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading:
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program thus becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader - In this type of loader, the instructions are read line by line, their machine code is obtained, and it is placed directly in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme - In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader - An absolute loader is a kind of loader in which object files with already-assigned (absolute) addresses are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions like printf(), scanf() etc., which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
Relocating Loader
The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The input to a relocating loader is the object program together with information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer, there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
-> Performs linking prior to load time
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
-> The linking function is performed at execution time, i.e. linking is postponed until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called
» Also called dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader:
» Performs all linking and relocation operations
» Performs automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
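The behaviour of the overlay manager on the example tree can be sketched in Python. The class and function names are hypothetical, and the sketch only tracks which segments are resident; it assumes that loading a segment displaces its sibling together with everything below that sibling.

```python
# Parent of each segment in the example overlay tree.
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root to the given segment."""
    path = [segment]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]      # the root segment is loaded at start-up

    def call(self, target):
        """Make sure the path to target is loaded; return segments read in."""
        wanted = path_to(target)
        if self.loaded[:len(wanted)] == wanted:
            return []               # upward call or already resident
        common = 0
        while (common < len(self.loaded) and common < len(wanted)
               and self.loaded[common] == wanted[common]):
            common += 1
        # Segments past the shared prefix are overwritten by the new path.
        self.loaded = wanted
        return wanted[common:]
```

A call from the root into B loads A and B; a later call into E loads D and E over them, because D shares memory with A.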
(Fig Overlay: tree with ROOT at the top; A and D below ROOT; B and C below A; E and F below D)
The essential difference between a linkage editor and a linking loader
(Fig Linking loader: Object program(s) and Library -> Linking loader -> Memory. Linkage editor: Object program(s) and Library -> Linkage editor -> Linked program -> Relocating loader -> Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at how our three loaders (absolute loader, relocating loader and direct linking loader, respectively) handle the above four functions:
Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer, in program A, could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error, unless a special mechanism (such as declaring B to be an external symbol) is provided
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
Data used:
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD IDs used in ESD & RLD cards
• ESD (external symbol dictionary)
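The two passes of a direct linking loader can be sketched in Python on a deliberately simplified model. Everything about the model is an assumption made for illustration: each object module is a dictionary rather than a card deck, memory is word-addressed and represented as a dictionary, "defs" plays the role of the ESD entries a module defines, "relocs" lists words holding module-relative addresses, and "refs" lists words that refer to external symbols.

```python
def pass1(modules, ipla):
    """Allocate each module and build the external symbol table."""
    symbols = {}
    pla = ipla                              # Program Load Address counter
    for module in modules:
        symbols[module["name"]] = pla       # the segment's own load address
        for symbol, offset in module["defs"].items():
            if symbol in symbols:
                raise ValueError(f"duplicate external symbol {symbol}")
            symbols[symbol] = pla + offset
        pla += module["length"]
    return symbols


def pass2(modules, symbols, memory):
    """Load the text, then relocate and link it in place."""
    for module in modules:
        base = symbols[module["name"]]
        for i, word in enumerate(module["text"]):
            memory[base + i] = word                    # loading
        for offset in module["relocs"]:
            memory[base + offset] += base              # relocation
        for offset, symbol in module["refs"]:
            memory[base + offset] += symbols[symbol]   # linking
    return memory


# Two hypothetical modules: A refers to its own word 2 and to module B.
MODULE_A = {"name": "A", "length": 3, "defs": {},
            "text": [10, 2, 0], "relocs": [1], "refs": [(2, "B")]}
MODULE_B = {"name": "B", "length": 2, "defs": {"BENTRY": 1},
            "text": [7, 8], "relocs": [], "refs": []}
```

With an initial program load address of 100, pass 1 places A at 100 and B at 103, and pass 2 turns A's relative address 2 into 102 and its reference to B into 103.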
Data Structures
Algorithm
Binary Symbolic Subroutine Loader - The assembler assembles each program and provides the loader with:
-> The object program + relocation information
-> A prefix with information about all other programs it references (the transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
  - allocation, relocation and linking
• It outputs the text to a file rather than to memory
  - this file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people with
   -> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort
and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing
• Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as its input. It compares the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
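The three search methods above can be sketched in Python as follows. This is only an illustration: the function names, the sample symbol table and the character-sum hash function are assumptions, not part of the notes.

```python
def linear_search(table, key):
    """Compare every entry in turn; return its index, or -1 if absent."""
    for index, entry in enumerate(table):
        if entry == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table; return the index of key, or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_table[middle] == key:
            return middle
        if key < sorted_table[middle]:
            high = middle - 1      # continue in the left sub-array
        else:
            low = middle + 1       # continue in the right sub-array
    return -1


def hash_slot(symbol, table_size):
    """Hypothetical hash function: sum of character codes modulo table size."""
    return sum(ord(ch) for ch in symbol) % table_size


# Assumed sample table of symbol names, already in sorted order.
symbols = ["ALPHA", "BETA", "COUNT", "LOOP", "TOTAL"]
```

Linear search needs no ordering but may inspect every entry, binary search needs the table to be sorted, and the hash function computes the slot directly from the key.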
Sorting
We will discuss four comparison-sort algorithms:
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
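As a minimal sketch of the first two algorithms in the list (merge sort and quick sort are omitted here), the following Python functions sort a copy of their input. The function names and the sample data in the usage note are assumptions made for illustration.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and swap it into place."""
    result = list(items)
    for i in range(len(result) - 1):
        smallest = i
        for j in range(i + 1, len(result)):
            if result[j] < result[smallest]:
                smallest = j
        result[i], result[smallest] = result[smallest], result[i]
    return result


def insertion_sort(items):
    """Insert each item into the already sorted prefix to its left."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]   # shift larger items one place right
            j -= 1
        result[j + 1] = current
    return result
```

For example, both `selection_sort([5, 2, 9, 1])` and `insertion_sort([5, 2, 9, 1])` return a new list in ascending order and leave the argument unchanged.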
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is a way of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s → α, where s is a symbol and α is a sequence of symbols, is called a production rule
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ ⇒G γ (a single-step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α → β ∈ P of G
where σ and τ represent arbitrary strings
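The definition of a single-step derivation can be illustrated with a short Python sketch. Strings are represented as tuples of symbols, and the grammar used in the example (S → a S b, S → a b) is an assumed toy grammar, not one taken from the notes.

```python
def immediate_derivations(string, productions):
    """Return every string derivable from `string` in exactly one step.

    `string` is a tuple of symbols; `productions` is a list of
    (alpha, beta) pairs, each of which is also a tuple of symbols.
    """
    results = set()
    for alpha, beta in productions:
        width = len(alpha)
        for start in range(len(string) - width + 1):
            if string[start:start + width] == alpha:
                sigma = string[:start]            # arbitrary left context
                tau = string[start + width:]      # arbitrary right context
                results.add(sigma + beta + tau)   # gamma = sigma beta tau
    return results


# Assumed toy grammar: S -> a S b | a b
productions = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
one_step = immediate_derivations(("a", "S", "b"), productions)
```

Here `one_step` contains the two strings a a S b b and a a b b, each obtained by rewriting the single occurrence of S with σ = a and τ = b.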
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
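A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (here N is the set of nonterminals, Σ the terminal alphabet, P the set of production rules and S the start symbol) that satisfies the following two conditions
a Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b If S → ε is in P then S does not appear in the right-hand side of any production rule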
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar because of a violation of condition (a)
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language
Example 1.2.16 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β that is not of the form S → ε satisfies one of the following conditions
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
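The conditions that separate the four types can be checked mechanically. The sketch below is an illustration under stated assumptions: a production is an (alpha, beta) pair of symbol tuples, the empty tuple stands for the empty string, and the function name and sample grammars are hypothetical.

```python
def grammar_type(nonterminals, productions, start):
    """Return the most restrictive type (0 to 3) that the grammar satisfies."""
    def is_start_to_empty(alpha, beta):
        return alpha == (start,) and beta == ()

    has_start_to_empty = any(is_start_to_empty(a, b) for a, b in productions)
    start_on_rhs = any(start in b for _, b in productions)
    others = [(a, b) for a, b in productions if not is_start_to_empty(a, b)]

    # Type 1, condition (a): left side no longer than right side.
    if any(len(a) > len(b) for a, b in others):
        return 0
    # Type 1, condition (b): start symbol must not reappear if it can vanish.
    if has_start_to_empty and start_on_rhs:
        return 0
    # Type 2: every left-hand side is a single nonterminal.
    if any(len(a) != 1 or a[0] not in nonterminals for a, _ in productions):
        return 1

    # Type 3: right side is a terminal, or a terminal then a nonterminal.
    def is_type3_rhs(beta):
        if len(beta) == 1:
            return beta[0] not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    return 3 if all(is_type3_rhs(b) for _, b in others) else 2


# Assumed sample grammars over the terminals a and b.
type3_rules = [(("S",), ("a", "A")), (("A",), ("b",))]
type2_rules = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
type1_rules = type2_rules + [(("a", "E"), ("E", "a", "E"))]
type0_rules = [(("B", "b"), ("b",))]
```

The last two samples reuse the rule shapes mentioned in the examples above: adding aE → EaE leaves a grammar that is Type 1 but not Type 2, and Bb → b violates condition (a).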
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are non-terminals and are always enclosed between the pair <>
The ::= means that the symbol on the left must be replaced with the expression on the right
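A small Python sketch can make the rule format concrete. It assumes the usual BNF convention of writing the rule separator as `::=` and separating symbols with spaces; the two-rule grammar for binary numbers and the function name `parse_bnf` are assumptions made for illustration.

```python
def parse_bnf(text):
    """Parse lines of the form '<symbol> ::= alt1 | alt2' into a dictionary.

    Each right-hand side becomes a list of alternatives, and each
    alternative is a list of symbols.
    """
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alternative.split() for alternative in right.split("|")]
        rules[left.strip()] = alternatives
    return rules


# Assumed example grammar: binary numbers of one or more digits.
grammar_text = """
<digit> ::= 0 | 1
<number> ::= <digit> | <digit> <number>
"""

rules = parse_bnf(grammar_text)
nonterminals = set(rules)   # symbols that appear on a left side
terminals = {symbol
             for alternatives in rules.values()
             for alternative in alternatives
             for symbol in alternative} - nonterminals
```

Symbols that appear on some left side end up in `nonterminals`, and the remaining symbols (here 0 and 1) are the terminals, matching the description above.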
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages. A canonic system could also be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
in the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators"
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into its constituent pieces, from which an intermediate form is built. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program
-> Analysis Part
• The analysis part is carried out in three sub-parts
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning)
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string
– Semantic Analysis
• In this step the meaning of the source string is determined
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
– Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows
– The identifier total
– The assignment symbol =
– The identifier count
– The plus sign +
– The identifier rate
– The multiplication sign *
– The constant number 10
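The scanning step can be sketched with a regular expression in Python. This is a minimal illustration that assumes only identifiers, unsigned integer constants and the three operators =, + and * occur; the token kind names and the function name `tokenize` are hypothetical.

```python
import re

# One alternative per token kind; leading blanks are skipped.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[=+*]))"
)


def tokenize(source):
    """Break a source string into a list of (kind, text) tokens."""
    tokens = []
    position = 0
    source = source.rstrip()
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character at position {position}")
        kind = match.lastgroup                 # name of the group that matched
        tokens.append((kind, match.group(kind)))
        position = match.end()
    return tokens


tokens = tokenize("total = count + rate * 10")
```

The resulting list holds seven tokens in source order: three identifiers, the assignment symbol, the plus sign, the multiplication sign and the constant 10.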
– Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink
The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language
The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language
This front/back division greatly reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code
The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate
Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;
A token is a <token-name, attribute-value> pair. For example
1 The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases
2 The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =
3 The lexeme y is mapped to the token <id,2>
4 The lexeme + is mapped to the token <+>
5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table
6 The lexeme ; is mapped to the token <;>
Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception
Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example
x3 = y + 3;
would be parsed into the tree on the right
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right
The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning
Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it
(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator
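The grouping of tokens into a syntax tree can be sketched as a small recursive-descent parser in Python. This is an illustration under assumptions: tokens are (kind, text) pairs, the tree is built from nested tuples with the operator first, and the left-recursive rule for expr is replaced by a loop, which accepts the same strings and groups + to the left.

```python
def parse_assignment(tokens):
    """Parse `id = expr ;` and return a syntax tree of nested tuples."""
    position = 0

    def expect(kind):
        # Consume the next token if it has the required kind.
        nonlocal position
        if position >= len(tokens) or tokens[position][0] != kind:
            raise SyntaxError(f"expected {kind} at token {position}")
        text = tokens[position][1]
        position += 1
        return text

    def operand():
        if position < len(tokens) and tokens[position][0] == "number":
            return ("number", expect("number"))
        return ("id", expect("id"))

    def expr():
        tree = operand()
        while position < len(tokens) and tokens[position][0] == "+":
            expect("+")
            tree = ("+", tree, operand())   # operator node, operands as children
        return tree

    target = ("id", expect("id"))
    expect("=")
    value = expr()
    expect(";")                             # the semicolon terminates the statement
    if position != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", target, value)


# Token stream for: x3 = y + 3;
tokens = [("id", "x3"), ("=", "="), ("id", "y"), ("+", "+"),
          ("number", "3"), (";", ";")]
tree = parse_assignment(tokens)
```

The result has the assignment operator at the root, the identifier x3 as its left child and the + node (with children y and 3) as its right child, which is the shape of the syntax tree described above.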
Semantic Analysis
There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators "int to real" and "real to int" as shown on the right
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target
With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
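Generating quads by walking the tree can be sketched as follows. The tuple representation of the semantic tree and the function name `generate_quads` are assumptions for illustration; the operation names and temporary names follow the code sequence shown above.

```python
def generate_quads(tree):
    """Walk a semantic tree and emit (operation, target, source1, source2) quads.

    Interior nodes are tuples (operation, child, ...); leaves are strings.
    The root must be ("assign", target_name, value_subtree).
    """
    quads = []
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp_count += 1
        return f"temp{temp_count}"

    def walk(node):
        if isinstance(node, str):            # leaf: identifier or constant
            return node
        operation, *children = node
        sources = [walk(child) for child in children]
        padding = ["--"] * (2 - len(sources))   # unused operand positions
        target = new_temp()
        quads.append((operation, target, *sources, *padding))
        return target

    _, target, value = tree
    result = walk(value)
    quads.append(("assign", target, result, "--"))
    return quads


# Semantic tree for x3 = y + 3 with the type conversions inserted.
semantic_tree = ("assign", "id1",
                 ("realtoint", ("add", "id2", ("inttoreal", "3"))))
quads = generate_quads(semantic_tree)
```

Because children are visited before their parent, the quads come out in the order inttoreal, add, realtoint, assign, the same order as the sequence in the notes.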
Code optimization
This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see
1 Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2 The last two quads can be combined into
realtoint id1 temp2
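These two simplifications can be sketched in Python as a single pass over a list of quads. The function name `optimize` is hypothetical, and the second rule assumes that a temporary is used only by the assignment that immediately follows its definition, which holds for this example but would need checking in a real optimizer.

```python
def optimize(quads):
    """Fold constant int-to-real conversions and merge trailing assignments."""
    constants = {}     # temporary name -> real constant computed at compile time
    result = []
    for operation, target, source1, source2 in quads:
        # Replace any temporary already known to be a constant.
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        if operation == "inttoreal" and source1.isdigit():
            constants[target] = f"{source1}.0"   # conversion done by the compiler
            continue
        if operation == "assign" and result and result[-1][1] == source1:
            # Assumption: the temporary has no other use, so the previous
            # quad can store directly into the assignment target.
            previous = result.pop()
            result.append((previous[0], target, previous[2], previous[3]))
            continue
        result.append((operation, target, source1, source2))
    return result


original_quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
optimized_quads = optimize(original_quads)
```

The four original quads reduce to two: an add of id2 and the constant 3.0 into temp2, followed by a realtoint that writes straight into id1.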
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2
LD R1, id2
ADDF R1, R1, #3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
127 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable
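A minimal symbol table along these lines might look as follows in Python. The class name, the method names, the byte sizes and the use of 1-based indices (chosen to match token attributes such as the 1 in the token for x3) are all assumptions made for this sketch.

```python
class SymbolTable:
    """Stores type and storage location for each declared variable."""

    def __init__(self):
        self._entries = []
        self._positions = {}
        self._next_offset = 0    # next free location in the compiled program's data area

    def declare(self, name, type_name, size):
        """Add a variable and return its 1-based index; reuse it if already present."""
        if name in self._positions:
            return self._positions[name]
        entry = {"name": name, "type": type_name, "offset": self._next_offset}
        self._next_offset += size
        self._entries.append(entry)
        self._positions[name] = len(self._entries)
        return self._positions[name]

    def lookup(self, index):
        """Return the entry for a token attribute value such as the 1 in <id,1>."""
        return self._entries[index - 1]


table = SymbolTable()
x3_index = table.declare("x3", "int", 4)    # sizes in bytes are assumed
y_index = table.declare("y", "real", 8)
```

The offset recorded for each entry is where the compiled program will keep the variable at run time, not where the compiler itself keeps the entry.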
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass
For example one could have the entire front end as one pass
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code
Compilers
1 Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value)
2 Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language
3 Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space
4 Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible
5 Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language
6 Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree)
7 Ambiguous Grammar
There is more than one possible parse tree for a given statement
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST)
• A pass can correspond to a "phase" but it does not have to
• Sometimes a single pass corresponds to several phases that are interleaved in time
• What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver calls Syntactic Analyzer
Syntactic Analyzer calls Contextual Analyzer and Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver calls Syntactic Analyzer, Contextual Analyzer and Code Generator
Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal)
Lecture_no47 Surprise Test
THANK YOU
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80
- To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
- PREREQUISITES
- One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
- In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
- AIM
- To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
- OBJECTIVES
- To understand the relationship between system software and machine architecture
- To know the design and implementation of Assemblers
- To know the design and implementation of Linkers and Loaders
- To have an understanding of macroprocessors
- To know the function of Compiler
- EXPECTED OUTCOME
- This course provides knowledge to design various system programs
- Assembler
- Compiler
- Interpreter
- One-Pass Assembler
- The translation performed by an assembler is essentially a collection of substitutions machine operation code
- for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
- MACRO PROCESSOR
-
- Macro Definition and Usage
-
- Example
- Defining a Macro
- Linking
- Relocation
-
- High-level language(HLL)
- Sorting and Searching
-
- Objectives
-
- Sorting
-
- Context-free grammars and languages
- Context-sensitive grammars and languages
- Context-free grammar Vs Context-sensitive grammar
-
- Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
-
- Lexical Analysis: In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis: In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis: In this step the meaning of the source string is determined.
-
- The Structure of a Compiler
-
- Multiple Phases
- Lexical Analysis (or Scanning)
- Syntax Analysis (or Parsing)
- Semantic Analysis
- Intermediate code generation
- Code optimization
- Code generation
- Symbol-Table Management
- The Grouping of Phases into Passes
-
1010001000 00 01 10
Halt the program 1111111111111111
Lecture_no14 Meaning of Assembly language program types Literals
• USING
• BALR
• START
• END
• BR
• Literal
• LTORG
• BCT
• EQU
Example: START EQU 10
The directive EQU stands for "equals"; it allows you to give a numerical value to a label. In this example the assembler would always replace START with 10 (hexadecimal, i.e. 10₁₆). Hence you could write:
    START EQU 10
    ORG START
    GOTO 10
• ORG
The directive ORG does not get translated into any code; what it does is set the address at which the next instruction will be stored.
Example:
    ORG 0
    GOTO 10
The instruction GOTO 10 is translated into the machine code 10 1000 0001 0000, but it will be placed at address 0. ORG 0 does not get translated into any code, but it affects where the code for GOTO 10 is placed.
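The effect of EQU and ORG can be sketched in a few lines of Python. This is only an illustration, not a real assembler: it assumes operands are hexadecimal and that GOTO is encoded as the bits 101 followed by an 11-bit address, which is inferred from the bit pattern 10 1000 0001 0000 quoted above. The function name assemble is hypothetical.

```python
def assemble(lines):
    """Tiny illustration of EQU and ORG; only GOTO generates code."""
    symbols = {}   # label -> value, filled in by EQU
    memory = {}    # address -> 14-bit instruction word
    address = 0    # where the next instruction will be stored

    def value(token):
        # Assumption: numeric operands are hexadecimal, as in the example.
        return symbols[token] if token in symbols else int(token, 16)

    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[1] == "EQU":
            symbols[fields[0]] = value(fields[2])      # no code generated
        elif fields[0] == "ORG":
            address = value(fields[1])                 # no code generated
        elif fields[0] == "GOTO":
            # Assumed encoding: opcode bits 101, then an 11-bit address.
            memory[address] = (0b101 << 11) | (value(fields[1]) & 0x7FF)
            address += 1
        else:
            raise ValueError(f"unsupported statement: {line!r}")
    return symbols, memory


symbols, memory = assemble(["START EQU 10", "ORG START", "GOTO 10"])
for addr, word in memory.items():
    print(f"{addr:03X}: {word:014b}")
```

Only GOTO produces a word in memory; EQU changes the symbol table and ORG changes the address, which is exactly the distinction between directives and instructions.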
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
    LABEL   OPCODE  OPERANDS     COMMENTS
01  ;
02  ; Program to multiply an integer by the constant 6.
03  ; Before execution, an integer must be stored in NUMBER.
04  ;
05          .ORIG   x3050
06          LEA     R2,NUMBER
07          LDW     R2,R2,#0
08          LEA     R1,SIX
09          LDW     R1,R1,#0
0A          AND     R3,R3,#0     ; Clear R3. It will
0B                               ; contain the product.
0C  ; The inner loop
0D  ;
0E  AGAIN   ADD     R3,R3,R2
0F          ADD     R1,R1,#-1    ; R1 keeps track of
10          BRp     AGAIN        ; the iterations
11  ;
12          HALT
13  ;
14  NUMBER  .BLKW   1
15  SIX     .FILL   x0006
16  ;
17          .END
Figure 7.1 An assembly language program

Two of the parts (LABEL and COMMENTS) are optional.

Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode. The number of operands depends on the operation being performed.
    AGAIN ADD R3,R3,R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
    LEA R2,NUMBER
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.

Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER in line 14 and SIX in line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
    BRp AGAIN
If the result of ADD R1,R1,#-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration. The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
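The control flow of the program can be traced with a small Python sketch. It mirrors the registers and the AGAIN loop one statement at a time; it is only an illustration and ignores the 16-bit width of the real registers. The function name multiply_by_six is hypothetical.

```python
def multiply_by_six(number):
    """Trace of the loop in the listing, one Python line per instruction."""
    r2 = number            # LEA R2,NUMBER then LDW R2,R2,#0
    r1 = 6                 # LEA R1,SIX then LDW R1,R1,#0
    r3 = 0                 # AND R3,R3,#0 clears R3
    while True:
        r3 = r3 + r2       # AGAIN ADD R3,R3,R2
        r1 = r1 - 1        # ADD R1,R1,#-1
        if not r1 > 0:     # BRp AGAIN branches only while R1 is positive
            break
    return r3              # HALT; R3 holds the product


print(multiply_by_six(7))
```

The loop body runs six times because R1 starts at 6 and the branch is taken only while the result of the decrement is positive.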
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
A more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.

.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says start with location x3050. As a result, the LEA R2,NUMBER instruction will be put in location x3050.

.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.

.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e., a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home, and life remains complicated. You can get through this way, but it is rather inappropriate. If things need a hurry, you should use the country's language.

Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.

Disassembler
To translate machine code back to its assembly source.

Cross Assembler
The assembler may run on one machine and produce object code for another.

An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine) representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler.
-> Basic assembler directives:
• START: Specify name and starting address for the program
• END: Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE: Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD: Generate one-word integer constant
• RESB: Reserve the indicated number of bytes for a data area
• RESW: Reserve the indicated number of words for a data area
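The storage taken by these data directives can be computed mechanically. The sketch below assumes a 3-byte word and operands written as C'...' for characters and X'...' for hexadecimal digits; both are assumptions borrowed from the usual textbook machine and are not stated in these notes. The name directive_length is hypothetical.

```python
WORD_SIZE = 3  # assumption: 3-byte words


def directive_length(mnemonic, operand):
    """Return how many bytes a storage directive occupies."""
    if mnemonic == "WORD":
        return WORD_SIZE
    if mnemonic == "RESW":
        return WORD_SIZE * int(operand)
    if mnemonic == "RESB":
        return int(operand)
    if mnemonic == "BYTE":
        kind, text = operand[0], operand[2:-1]   # C'EOF' -> 'C', 'EOF'
        if kind == "C":
            return len(text)                     # one byte per character
        if kind == "X":
            return (len(text) + 1) // 2          # two hex digits per byte
    raise ValueError(f"not a storage directive: {mnemonic} {operand}")


for statement in [("BYTE", "C'EOF'"), ("BYTE", "X'F1'"), ("RESW", "2")]:
    print(statement, directive_length(*statement))
```

The assembler uses exactly this kind of length calculation to advance its location counter past data areas.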
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement has the following format:
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional. A basic instruction has two parts: the first is the name of the instruction (the mnemonic) to be executed, and the second is the operands, or parameters, of the command.

An assembly language statement contains the following fields:

-> Label field: an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).

Ex:

START LDAA 24 H

Here the label START is equal to the address of the instruction LDAA 24 H.

-> Mnemonic/opcode field: contains a mnemonic, which stands for an operation code or a pseudo-op. A machine code instruction may require additional information, i.e. operands.
-> Operand field: specifies either the address or the data that the opcode requires.
-> Comment field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
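Splitting a statement into these four fields is the first thing an assembler does with each line. The sketch below is one possible way to do it. It assumes that a label ends with a colon (as stated above), that operands are separated by commas, and that a semicolon starts the comment; the last two conventions are assumptions, since different assemblers use different rules. The name parse_statement is hypothetical.

```python
def parse_statement(line):
    """Split one statement into (label, mnemonic, operands, comment)."""
    code, _, comment = line.partition(";")    # assumption: ';' starts a comment
    fields = code.split()
    label = None
    if fields and fields[0].endswith(":"):    # a label is suffixed by a colon
        label = fields.pop(0)[:-1]
    if not fields:
        raise ValueError("the mnemonic field is mandatory")
    mnemonic = fields[0]
    operands = " ".join(fields[1:]).split(",") if len(fields) > 1 else []
    operands = [operand.strip() for operand in operands]
    return label, mnemonic, operands, comment.strip() or None


print(parse_statement("LOOP: ADD R1, R2 ; running sum"))
print(parse_statement("HALT"))
```

Optional fields that are absent come back as None (label, comment) or as an empty list (operands), matching the square brackets in the format line.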
Types of Assembly language statements
There are three types of assembly language statements:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task.
2) Specify the data structures and define the format of each data structure.
3) Determine the processing necessary to obtain and maintain the information.
4) Determine the processing necessary to perform the task.
Assembler Design can be done in
– Single pass
– Two pass
(Fig: Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; Pass 1 and Pass 2 both use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address, and an error indicator.

Pass I
• Separate the symbol, mnemonic op-code, and operand fields.
• Determine the storage requirement for every assembly language statement and update the location counter.
• Build the symbol table (the table that is used to store each label and its corresponding value).

Pass II
• Generate object code (perform the actual translation).
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of a two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. One-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references unless restrictions are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents.
• Symbol Table (SYMTAB): used to store the values (addresses) assigned to labels.
• Location Counter (LOCCTR): a variable that is used to help in the assignment of addresses. After each statement, LOCCTR ← LOCCTR + (instruction length).
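How the three structures work together in Pass 1 can be sketched as follows. The opcode values, the 3-byte instruction length and the sample program are illustrative assumptions, and the function name pass_one is hypothetical; the sketch handles only the mnemonics in OPTAB plus WORD and RESW.

```python
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}  # illustrative opcode values
INSTRUCTION_LENGTH = 3                            # assumption: 3-byte instructions and words


def pass_one(statements, start):
    """Build SYMTAB from (label, mnemonic, operand) triples; return it with the program length."""
    symtab = {}
    locctr = start
    for label, mnemonic, operand in statements:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol: {label}")
            symtab[label] = locctr                # label gets the current LOCCTR
        if mnemonic in OPTAB or mnemonic == "WORD":
            locctr += INSTRUCTION_LENGTH          # LOCCTR <- LOCCTR + length
        elif mnemonic == "RESW":
            locctr += INSTRUCTION_LENGTH * int(operand)
        else:
            raise ValueError(f"invalid operation code: {mnemonic}")
    return symtab, locctr - start


program = [
    ("FIRST", "LDA", "FIVE"),
    (None, "ADD", "FIVE"),
    (None, "STA", "SUM"),
    ("FIVE", "WORD", "5"),
    ("SUM", "RESW", "1"),
]
symtab, length = pass_one(program, start=0x1000)
print({name: hex(address) for name, address in symtab.items()}, length)
```

Note that FIVE and SUM are used before they are defined; Pass 1 does not need their values, it only records them for Pass 2.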
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows.

A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.

A two-pass assembler makes two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
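One common way for a one-pass assembler to cope with future references is backpatching: emit the instruction with an empty address field, remember where it is, and fill the field in when the label is finally defined. The notes do not prescribe this technique, so the sketch below is only one possible approach. The opcode values and the 24-bit word layout (opcode in the top 8 bits, address in the low 16) are assumptions, and one_pass is a hypothetical name.

```python
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}  # illustrative opcode values


def one_pass(statements, start):
    """Assemble (label, mnemonic, operand) triples in a single pass."""
    symtab = {}    # label -> address
    pending = {}   # undefined label -> indexes of words waiting for it
    words = []     # one 3-byte word per statement
    for label, mnemonic, operand in statements:
        address = start + 3 * len(words)
        if label:
            symtab[label] = address
            for index in pending.pop(label, []):
                words[index] |= address           # backpatch earlier uses
        if mnemonic == "WORD":
            words.append(int(operand))
        else:
            target = symtab.get(operand)
            if target is None:                    # forward reference
                pending.setdefault(operand, []).append(len(words))
                target = 0                        # leave the address field empty
            words.append(OPTAB[mnemonic] << 16 | target)
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return words, symtab


words, symtab = one_pass(
    [(None, "LDA", "FIVE"), (None, "ADD", "FIVE"), (None, "STA", "SUM"),
     ("FIVE", "WORD", "5"), ("SUM", "WORD", "0")],
    start=0x1000,
)
print([f"{word:06X}" for word in words])
```

The price of the single pass is that the generated words must stay in memory (or be patched later by the loader) until every pending label has been seen.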
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation

Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col. 1: H
Col. 2-7: Program name
Col. 8-13: Starting address (hex)
Col. 14-19: Length of object program in bytes (hex)

Text record
Col. 1: T
Col. 2-7: Starting address in this record (hex)
Col. 8-9: Length of object code in this record in bytes (hex)
Col. 10-69: Object code ((69-10+1)/6 = 10 instructions)

End record
Col. 1: E
Col. 2-7: Address of first executable instruction (hex) (END program_name)
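The column layout above translates directly into string formatting. The sketch below assumes 3-byte instruction words written as 6 hex digits and assumes that execution begins at the starting address; object_program is a hypothetical name.

```python
def object_program(name, start, code_words):
    """Format Header, Text and End records from a list of 3-byte words."""
    length = 3 * len(code_words)
    # Col. 2-7 name (padded or cut to 6), col. 8-13 start, col. 14-19 length
    records = [f"H{name:<6.6}{start:06X}{length:06X}"]
    for i in range(0, len(code_words), 10):       # at most 10 words per Text record
        chunk = code_words[i:i + 10]
        body = "".join(f"{word:06X}" for word in chunk)
        records.append(f"T{start + 3 * i:06X}{3 * len(chunk):02X}{body}")
    # Assumption: the first executable instruction is at the start address.
    records.append(f"E{start:06X}")
    return records


records = object_program("COPY", 0x1000, [0x001009, 0x181009, 0x0C100C])
for record in records:
    print(record)
```

Ten words of six hex digits fill columns 10 to 69 exactly, which is why a longer program is split over several Text records.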
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.

How to solve:
a. Modification record
b. Base relative
c. Program counter relative

Modification record
Col. 1: M
Col. 2-7: Starting location of the address field to be modified, relative to the beginning of the program
Col. 8-9: Length of the address field to be modified, in half-bytes
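At load time, each Modification record tells the loader which field to adjust. The sketch below adds the actual load address to every such field. It assumes the program was assembled as if it started at address 0, that fields are stored most significant byte first, and that a field with an odd number of half-bytes occupies the low-order half-bytes of its bytes. These are assumptions, and apply_modification_records is a hypothetical name.

```python
def apply_modification_records(image, records, load_address):
    """Add load_address to each field named by (offset, half_bytes) pairs."""
    for offset, half_bytes in records:
        n_bytes = (half_bytes + 1) // 2
        field = int.from_bytes(image[offset:offset + n_bytes], "big")
        mask = (1 << (4 * half_bytes)) - 1        # bits belonging to the address field
        updated = (field & ~mask) | ((field + load_address) & mask)
        image[offset:offset + n_bytes] = updated.to_bytes(n_bytes, "big")
    return image


# A 4-byte instruction whose last 5 half-bytes hold the address 01036.
image = bytearray.fromhex("4B101036")
apply_modification_records(image, [(1, 5)], load_address=0x5000)
print(image.hex().upper())
```

Only the five half-bytes of the address change; the leading half-byte of the second byte, which belongs to the rest of the instruction, is preserved by the mask.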
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we present one more example to highlight the salient aspects of a macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example

    MACRO
    INCRMT &A,&B
    LOAD &A          (macro definition)
    ADD &B
    STORE &A
    ENDM

    INCRMT X,Y       (macro call)

    LOAD X           (macro expansion)
    ADD Y
    STORE X

(Fig: Macro program)
(Fig: Source code (with macro) -> Macro Processor -> Expanded code -> Compiler/Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the call

    INCRMT X,Y

leads to insertion of the assembly statements

    LOAD X
    ADD Y
    STORE X

in its place. All macro calls in a program are expanded in this fashion.
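The substitution itself is plain text replacement. A minimal sketch, assuming that parameters start with & and that operands are separated by commas; expand is a hypothetical name.

```python
import re


def expand(prototype, model_statements, call):
    """Replace each formal parameter in the model statements by its actual parameter."""
    name, formals = prototype.split(None, 1)
    call_name, actuals = call.split(None, 1)
    if call_name != name:
        raise ValueError(f"{call_name} is not a call on {name}")
    binding = dict(zip(
        (formal.strip() for formal in formals.split(",")),
        (actual.strip() for actual in actuals.split(",")),
    ))
    return [re.sub(r"&\w+", lambda match: binding[match.group()], statement)
            for statement in model_statements]


expansion = expand("INCRMT &A,&B", ["LOAD &A", "ADD &B", "STORE &A"], "INCRMT X,Y")
print("\n".join(expansion))
```

No instruction is executed during expansion; the macro processor only rewrites source text, and the result is what the assembler sees.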
Defining a Macro
Let us take another look at the macro definition unit
    MACRO
    INCRMT &A,&B
    LOAD &A          (macro definition)
    ADD &B
    STORE &A
    ENDM

    INCRMT X,Y       (macro call)

    LOAD X           (macro expansion)
    ADD Y
    STORE X

(Fig: Macro program)
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition.
-> Prototype for the macro
The prototype gives the macro name and the macro parameter list. Each parameter starts with &.
-> Body
The statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
    macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is invoked.
Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called the formal and actual parameter lists, are specified in the prototype and macro call statements respectively, and establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
    INCRMT &A,&B     prototype
    INCRMT X,Y       macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
    LOAD X
    ADD Y
    STORE X
Example
    STRG DATA1, DATA2, DATA3
    GENER ,,DIRECT,,,,,,3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
    STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
    GENER TYPE=DIRECT, CHANNEL=3
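The two styles differ only in how an argument finds its parameter: by position in the list, or by name. A minimal sketch, assuming the arguments have already been split into a list; bind_arguments is a hypothetical name.

```python
def bind_arguments(formals, arguments):
    """Pair each argument with a formal parameter, by position or by keyword."""
    binding = {}
    for position, argument in enumerate(arguments):
        if "=" in argument:                       # keyword form: name=value
            name, value = argument.split("=", 1)
            name = name.lstrip("&")
            if name not in formals:
                raise ValueError(f"unknown parameter: {name}")
        else:                                     # positional form
            name, value = formals[position], argument
        binding[name] = value
    return binding


formals = ["a1", "a2", "a3"]
print(bind_arguments(formals, ["DATA1", "DATA2", "DATA3"]))
print(bind_arguments(formals, ["&a3=DATA1", "&a2=DATA2", "&a1=DATA3"]))
```

With keywords the order of the arguments no longer matters, which is why the second call can list a3 first.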
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» Every macro must be defined before it is called
» A one-pass processor can alternate between macro definition and macro expansion
» Nested macro definitions may be allowed but nested calls are not

Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT

Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in Step 3

Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
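The three steps can be sketched with a dictionary for the MNT and a list for the MDT. This is a simplified illustration under stated assumptions: definitions are delimited by MACRO and ENDM lines, parameters start with &, and AIF and AGO are not handled. The function names are hypothetical.

```python
import re


def define_macros(source):
    """Step 1: fill the MNT and MDT; return them with the remaining program lines."""
    mnt, mdt, program = {}, [], []
    lines = iter(source)
    for line in lines:
        if line.strip() != "MACRO":
            program.append(line)
            continue
        prototype = next(lines)
        mnt[prototype.split()[0]] = len(mdt)      # where the definition starts in MDT
        mdt.append(prototype)
        for body_line in lines:                   # copy the body up to ENDM
            mdt.append(body_line)
            if body_line.strip() == "ENDM":
                break
    return mnt, mdt, program


def expand_calls(mnt, mdt, program):
    """Steps 2 and 3: replace every macro call by its model statements."""
    output = []
    for line in program:
        fields = line.split(None, 1)
        if not fields or fields[0] not in mnt:
            output.append(line)                   # ordinary statement
            continue
        index = mnt[fields[0]]
        prototype = mdt[index].split(None, 1)
        formals = prototype[1].split(",") if len(prototype) > 1 else []
        actuals = fields[1].split(",") if len(fields) > 1 else []
        binding = dict(zip((f.strip() for f in formals), (a.strip() for a in actuals)))
        index += 1
        while mdt[index].strip() != "ENDM":
            output.append(re.sub(r"&\w+", lambda m: binding[m.group()], mdt[index]))
            index += 1
    return output


source = ["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
          "INCRMT X,Y", "HALT"]
mnt, mdt, program = define_macros(source)
print(expand_calls(mnt, mdt, program))
```

The MNT entry is just an index into the MDT, which is the auxiliary information mentioned in Step 1 iii).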
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.

Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.

The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)

What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  – Holds the macro prototype and the statements that make up the macro body
  – References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  – Role: an index to DEFTAB
  – Holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  – Used during the expansion of an invocation
  – Arguments are stored in ARGTAB according to their position
(Fig: Macro processor data structures: a MACRO definition fills NAMTAB and DEFTAB; a CALL uses DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed

Write an algorithm and flowchart for the macro processor and explain.
(Fig: Macro processor procedures: PROCESSLINE, DEFINE, EXPAND)

Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.

Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.

Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. Thus the program becomes ready for execution. This activity is called loading.
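The four activities can be seen together in a toy loader. The module format used here (a list of words plus three small tables) is entirely hypothetical and chosen only to keep the sketch short; real object formats are record based.

```python
def load(modules, base):
    """Allocate, link, relocate and load a list of object modules."""
    # 1) Allocation: give each module its own block of memory.
    origin, next_free = {}, base
    for module in modules:
        origin[module["name"]] = next_free
        next_free += len(module["code"])
    # 2) Linking: compute the absolute address of every defined symbol.
    symbols = {symbol: origin[module["name"]] + offset
               for module in modules
               for symbol, offset in module["defines"].items()}
    # 3) Relocation and 4) loading: adjust words and place them in memory.
    memory = {}
    for module in modules:
        start = origin[module["name"]]
        for offset, word in enumerate(module["code"]):
            if offset in module["relocatable"]:   # module-relative address
                word += start
            if offset in module["externals"]:     # reference to another module
                word = symbols[module["externals"][offset]]
            memory[start + offset] = word
    return memory, symbols


main = {"name": "MAIN", "code": [10, 2, 0], "defines": {"MAIN": 0},
        "relocatable": [1], "externals": {2: "SQRT"}}
library = {"name": "LIB", "code": [99], "defines": {"SQRT": 0},
           "relocatable": [], "externals": {}}
memory, symbols = load([main, library], base=500)
print(memory, symbols)
```

Word 1 of MAIN held the module-relative address 2, so relocation turns it into 502; word 2 referred to SQRT, so linking fills in 503.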
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.

Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).

Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed. As the name itself says, it does nothing other than loading.

Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know about memory management.
Lecture_no27 Machine-Dependent Loader Features – Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (The segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called the relocating loader.
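The constant that is added is simply the difference between where the segment is actually loaded and where the translator assumed it would be. A short sketch with the numbers from the example; the address fields inside A are made up for illustration, and relocate is a hypothetical name.

```python
def relocate(address_fields, translated_origin, load_origin):
    """Add the relocation constant to every relative address in a segment."""
    constant = load_origin - translated_origin
    return [address + constant for address in address_fields]


# Both A and printf() were translated to start at 100. printf() keeps
# 100 to 150, so A is loaded just after it, at 151.
fields_in_a = [100, 130, 200]          # hypothetical address fields inside A
print(relocate(fields_in_a, translated_origin=100, load_origin=151))
```

Every address field in A moves by the same amount, 51, which is what makes relocation an adjustment of fields rather than a rewrite of the program.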
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of a relocating assembler is the object program and information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.

Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
As we turn on the computer, there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
bullLinkage editor-
Perform linking prior to load timeA linkage editorraquo Produces a linked version of program (often called a load module or an executable image) which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time:
-> Linking is postponed until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called; this is known as dynamic linking, dynamic loading, or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader:
» Performs all linking and relocation operations
» Performs automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlaying can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls do not require any linker help, since the entire path from the root is already loaded.
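The behaviour of the overlay manager described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `PARENT`, `path_to` and `OverlayManager` are hypothetical, and "loading" a segment is modelled simply by recording its name.

```python
# Hypothetical overlay tree from the example: child segment -> parent segment.
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to `segment`, e.g. ROOT, D, E."""
    path = [segment]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return list(reversed(path))


class OverlayManager:
    def __init__(self):
        # Only the root segment is resident when the program starts.
        self.loaded = ["ROOT"]
        self.load_log = []  # every segment that had to be brought in, in order

    def call(self, target_segment):
        """Ensure the whole path to the call target is resident.

        Returns the list of segments that had to be loaded for this call.
        """
        wanted = path_to(target_segment)
        # Keep the common prefix of the resident path and the wanted path.
        # Siblings share memory, so anything below the prefix is overwritten.
        keep = 0
        while (keep < len(self.loaded) and keep < len(wanted)
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        missing = wanted[keep:]
        if missing:
            self.load_log.extend(missing)
            self.loaded = wanted
        # An upward call finds nothing missing: the path is already resident.
        return missing


manager = OverlayManager()
print(manager.call("B"))   # root calls into B: both A and B are loaded
print(manager.call("A"))   # upward call from B to A: nothing to load
print(manager.call("E"))   # D and E replace A and B in the shared memory
```

The design choice here is to track only the currently resident path; a real overlay manager would also have to intercept the inter-segment calls themselves.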
(Fig: Overlay tree. ROOT is at the top with children A and D; A has children B and C; D has children E and F.)
The essential difference between a linkage editor and a linking loader
(Fig: Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory.)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. It is a relocatable loader. The loader does not have direct access to the source code. When placing the object code in memory there are two situations: either the address of the object code is absolute, in which case the code can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. A required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
Linking and loading are postponed until execution. If a subroutine is needed during execution, execution must be suspended until the required subroutine has been loaded into main memory.
-> Let us take a quick look at how the four functions above are handled by our three loaders: the absolute loader, the relocating loader, and the direct linking loader.
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Loader              Loader
Linking      Programmer        Loader              Loader
Relocation   Assembler         Loader              Loader
Loading      Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error (undefined symbol), unless the symbol is declared to be external so that its value can be filled in later by the loader.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
Pass 1 works with:
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) cards
5. A copy of the input (TXT cards)
6. Relocation and Linkage Directory (RLD) cards
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Pass 2 works with:
• A copy of the object program
• The IPLA (Initial Program Load Address) parameter, supplied by the OS or the programmer to specify the address at which to load the first segment
• The PLA (Program Load Address) counter, to keep track of each segment's location
• The External Symbol Directory (ESD, also called the external symbol dictionary), to store each external symbol and its corresponding assigned core address
• The Local External Symbol Array (LESA), to establish correspondence between the ESD ID numbers used on ESD and RLD cards
Data Structures
Algorithm
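The two passes can be sketched roughly as follows. This is a simplified illustration, not the card-level algorithm: each object module is modelled as a hypothetical Python dictionary with `name`, `length`, `defs` (external symbols defined, as relative offsets), `text` (the words to load) and `relocs` (pairs of offset and symbol whose address must be added to the word at that offset). These field names are assumptions made for the example.

```python
def pass1(modules, ipla):
    """Allocate each segment and build the table of external symbols."""
    symtab = {}
    pla = ipla                         # the PLA counter starts at the IPLA
    for mod in modules:
        symtab[mod["name"]] = pla      # the segment name is itself a symbol
        for sym, offset in mod["defs"].items():
            symtab[sym] = pla + offset
        pla += mod["length"]           # next segment goes right after this one
    return symtab


def pass2(modules, ipla, symtab):
    """Load the text, then relocate and link the address constants."""
    memory = {}
    pla = ipla
    for mod in modules:
        for offset, word in enumerate(mod["text"]):
            memory[pla + offset] = word
        for offset, sym in mod["relocs"]:
            # Relocation (own segment name) and linking (another segment's
            # symbol) are the same operation: add the symbol's address.
            memory[pla + offset] += symtab[sym]
        pla += mod["length"]
    return memory


# Two hypothetical segments: MAIN refers to SUB, and both hold addresses
# relative to MAIN that must be relocated.
MAIN = {"name": "MAIN", "length": 3, "defs": {},
        "text": [0, 2, 7], "relocs": [(0, "SUB"), (1, "MAIN")]}
SUB = {"name": "SUB", "length": 2, "defs": {"ENTRY2": 1},
       "text": [5, 0], "relocs": [(1, "MAIN")]}

symbols = pass1([MAIN, SUB], ipla=100)
image = pass2([MAIN, SUB], 100, symbols)
print(symbols)
print(image)
```

Pass 1 must finish before pass 2 starts because MAIN refers to SUB, whose address is not known until every earlier segment has been allocated.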
Binary Symbolic Subroutine Loader
The assembler assembles each program and provides the loader with:
-> The object program plus relocation information
-> A prefix with information about all other programs it references (the transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
  - allocation, relocation, and linking
• It outputs the text to a file rather than to memory
  - the output file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but also in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that is not limited by the computer it runs on, is designed for a specific kind of job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java, and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are equal; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
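The three search methods can be compared on a small symbol table. The sketch below is illustrative only: the table entries are made up, the hash function (sum of character codes modulo the table size) is a toy choice, and collisions are handled by chaining, which is an assumption since the notes do not say how collisions are resolved.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a table that is ordered by symbol name."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1      # continue in the left subarray
        else:
            low = mid + 1       # continue in the right subarray
    return -1


class HashTable:
    def __init__(self, slots=11):
        # Slots are numbered from 0; each slot holds a chain of (key, value).
        self.slots = [[] for _ in range(slots)]

    def _slot(self, key):
        # Toy hash function: character codes summed, modulo the table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (existing, _old) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None


symbols = [("ALPHA", 10), ("BETA", 20), ("GAMMA", 30)]   # ordered by name
hashed = HashTable()
hashed.put("LOOP", 4)
hashed.put("POOL", 8)    # same letters as LOOP, so it lands in the same slot
```

Linear search needs no ordering, binary search needs an ordered table, and the hash table trades extra memory for lookups that usually inspect only one short chain.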
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite: if it were not, each symbol could stand for any thought, there would be no need to process anything, and it would not be much of a model. Furthermore, we are finite beings, and we should keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is a way of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar, and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written
µ =>G γ (a single-step derivation),
if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
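The definition of a single-step derivation can be checked mechanically. The sketch below is an illustration under simplifying assumptions: every symbol is a single character, a grammar is just a list of (α, β) pairs, and the function name `derives_in_one_step` is made up for this example.

```python
def derives_in_one_step(mu, gamma, productions):
    """True if gamma is immediately derived from mu.

    Tries every production alpha -> beta and every split mu = sigma alpha tau,
    and checks whether sigma beta tau equals gamma.
    """
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]
            tau = mu[start + len(alpha):]
            if sigma + beta + tau == gamma:
                return True
            start = mu.find(alpha, start + 1)
    return False


# Hypothetical grammar: S -> aSb and S -> ab.
P = [("S", "aSb"), ("S", "ab")]
print(derives_in_one_step("aSb", "aabb", P))   # uses S -> ab with sigma = a, tau = b
print(derives_in_one_step("aSb", "ab", P))     # no single production gives this
```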
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that consists only of terminal symbols.
bullChomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (here N is the set of nonterminals, Σ the terminal alphabet, P the set of production rules, and S the start symbol) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of the earlier example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones; E is assumed to be a new nonterminal symbol.
(Production rules: figure not available.)
Adding to the modified grammar a production rule of the form Bb → b would result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of the earlier example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones; E is assumed to be a new nonterminal symbol.
(Production rules: figure not available.)
Adding a production rule of the form aE → EaE to the grammar would result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each production rule α → β that is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
(Production rules: figure not available.)
Adding a production rule of the form A → Ba or of the form B → bb to the grammar would result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
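The Type 1, Type 2 and Type 3 conditions stated above can be turned into a small classifier. This is a sketch under assumptions that are not part of the notes: nonterminals are single uppercase letters, terminals are single lowercase letters, ε is written as the empty string, and the grammars used below are made-up examples rather than the ones from the missing figures.

```python
def grammar_type(productions, start="S"):
    """Return the most restrictive type (0 to 3) whose conditions hold."""

    def is_start_to_empty(alpha, beta):
        return alpha == start and beta == ""

    has_start_to_empty = any(is_start_to_empty(a, b) for a, b in productions)
    start_on_rhs = any(start in b for _a, b in productions)

    # Type 1, condition (a): |alpha| <= |beta| unless the rule is S -> empty.
    cond_a = all(len(a) <= len(b) or is_start_to_empty(a, b)
                 for a, b in productions)
    # Type 1, condition (b): if S -> empty is present, S is on no right side.
    cond_b = not (has_start_to_empty and start_on_rhs)
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a.isupper() for a, _b in productions):
        return 1

    # Type 3: right side is a terminal, or a terminal then a nonterminal.
    def type3_rhs(beta):
        return (len(beta) == 1 and beta.islower()) or (
            len(beta) == 2 and beta[0].islower() and beta[1].isupper())

    if all(is_start_to_empty(a, b) or type3_rhs(b) for a, b in productions):
        return 3
    return 2


regular = [("S", "aA"), ("A", "b"), ("A", "bA")]
print(grammar_type(regular))                    # all rules satisfy Type 3
print(grammar_type(regular + [("B", "bb")]))    # B -> bb breaks Type 3 only
```

Note that under these definitions a grammar containing both S → ε and S on some right-hand side is classified as Type 0, which is why the examples in the notes first rewrite such grammars with a new nonterminal.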
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
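As a small illustration of these conventions, the sketch below reads a BNF specification into a Python dictionary, works out which symbols are terminals (those that never appear on a left side), and tests whether a string can be derived. The grammar for binary numbers and the function names `parse_bnf`, `terminals` and `derives` are made up for this example, and the recognizer is a naive one that would loop forever on a left-recursive rule.

```python
GRAMMAR = """
<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1
"""


def parse_bnf(text):
    """Map each nonterminal to its list of alternatives (lists of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


def terminals(rules):
    """Symbols that are used but never appear on a left side."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)


def derives(rules, symbols, text):
    """True if the sequence of symbols can be rewritten into exactly `text`."""
    if not symbols:
        return text == ""
    first, rest = symbols[0], symbols[1:]
    if first not in rules:
        # Terminal: it must match the input literally.
        return text.startswith(first) and derives(rules, rest, text[len(first):])
    # Nonterminal: replace it by each alternative in turn (the | choice).
    return any(derives(rules, alt + rest, text) for alt in rules[first])


rules = parse_bnf(GRAMMAR)
print(sorted(terminals(rules)))
print(derives(rules, ["<number>"], "1011"))
```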
Formal Grammar-
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages. If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators convert text from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model
• In the analysis part, the source program is read, broken into its constituent pieces, and converted into an intermediate form.
• In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis part
• The analysis part is carried out in three sub-parts:
- Lexical analysis
  • In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax analysis
  • In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic analysis
  • In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
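The token list above can be reproduced with a few lines of Python. This is a minimal sketch of a scanner built on regular expressions: the token names and the `TOKEN_SPEC` table are assumptions chosen to mirror the list, not the design of any particular compiler.

```python
import re

# Each entry is (token type, regular expression); "skip" marks blanks.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"[0-9]+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into (token type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "skip":       # blanks are not passed on
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```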
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N = 7 and M ≈ 30, so the savings are considerable: about 37 front and back ends instead of roughly 210 separate compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
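A syntax tree of this kind can be built by a very small hand-written parser. The sketch below is an illustration only: tokens are given as hypothetical (kind, text) pairs, the tree is represented with nested tuples, and the left-recursive rule for expr is handled with a loop, which is one common way to implement it rather than the method of any specific compiler.

```python
def parse_assignment(tokens):
    """Parse: id = expr ;   where expr is operands joined by +.

    Returns the syntax tree ('=', target, expression_tree).
    """
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        pos += 1
        return tokens[pos - 1][1]

    def operand():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            pos += 1
            return tokens[pos - 1]          # a leaf: ('id', name) or ('number', text)
        raise SyntaxError(f"expected operand at token {pos}")

    def expr():
        nonlocal pos
        tree = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            pos += 1
            tree = ("+", tree, operand())   # operator as interior node
        return tree

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")                             # the statement terminator
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", ("id", target), value)


tokens = [("id", "x3"), ("=", "="), ("id", "y"),
          ("+", "+"), ("number", "3"), (";", ";")]
print(parse_assignment(tokens))
```

The semicolon is consumed by the parser but does not appear in the tree, which matches the remark that the syntax tree represents the assignment expression rather than the statement.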
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two sources and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --
Each "quad" has the form
    operation target source1 source2
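Walking the semantic tree to produce these quads can be sketched as a short recursive function. The tuple representation of the tree and the name `generate_quads` are assumptions made for this illustration.

```python
def generate_quads(tree):
    """Emit (operation, target, source1, source2) quads for an assignment."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):           # leaf: identifier or constant
            return node
        op, *children = node
        sources = [walk(child) for child in children]   # operands first
        target = new_temp()
        sources += ["--"] * (2 - len(sources))          # unused source slots
        quads.append((op, target, sources[0], sources[1]))
        return target

    _assign, target, value = tree           # root: ("assign", target, value)
    quads.append(("assign", target, walk(value), "--"))
    return quads


# Semantic tree for the example: id1 = realtoint(id2 + inttoreal(3))
semantic_tree = ("assign", "id1",
                 ("realtoint", ("add", "id2", ("inttoreal", "3"))))
for quad in generate_quads(semantic_tree):
    print(quad)
```

Each interior node gets a fresh temporary only after its children have been visited, which is why the temporaries come out numbered in the same order as in the listing above.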
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion itself and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
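Both optimizations can be expressed as simple passes over the list of quads. The sketch below is illustrative and deliberately narrow: it folds only inttoreal applied to an integer constant, and it merges a quad with a following assign only when the assign immediately follows it, assuming the temporary is not used anywhere else.

```python
def optimize(quads):
    # Step 1: perform inttoreal on integer constants at compile time.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{src1}.0"     # remember temp1 -> 3.0
            continue                            # the quad itself disappears
        folded.append((op, target, src1, src2))

    # Step 2: merge "op tempN ..." followed directly by "assign x tempN".
    result = []
    for quad in folded:
        op, target, src1, _src2 = quad
        if op == "assign" and result and result[-1][1] == src1:
            prev_op, _prev_target, a, b = result.pop()
            result.append((prev_op, target, a, b))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
for quad in optimize(quads):
    print(quad)
```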
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, 3.0     // add float
    RTOI R2, R1          // real to int
    ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one pass. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
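The idea of a parser that pulls tokens from the scanner on demand, and emits intermediate code while it parses, can be sketched as follows. The tiny grammar (an assignment whose right-hand side is a sum) and the names scanner and parse_assignment are assumptions chosen for the illustration.

```python
import re

# One token per match: a number, an identifier, or any single other character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def scanner(text):
    """Yield (type, value) tokens lazily; nothing is written to a file."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        pos = match.end()
        if match.group(1):
            yield ("num", match.group(1))
        elif match.group(2):
            yield ("id", match.group(2))
        else:
            yield ("op", match.group(3))

def parse_assignment(text):
    """Parse 'target = a + b + ...' and emit three-address code in the same pass."""
    tokens = scanner(text)
    code = []
    target = next(tokens)[1]            # the parser asks the scanner for a token
    if next(tokens) != ("op", "="):
        raise SyntaxError("expected '='")
    left = next(tokens)[1]
    temp_count = 0
    for token in tokens:                # each step requests one more token
        if token != ("op", "+"):
            raise SyntaxError(f"unexpected token {token}")
        right = next(tokens)[1]
        temp_count += 1
        temp = f"temp{temp_count}"
        code.append(f"{temp} = {left} + {right}")   # code generated mid-parse
        left = temp
    code.append(f"{target} = {left}")
    return code

result = parse_assignment("a = b + c + 4")
print("\n".join(result))
```

Because the scanner is a generator, no intermediate token file exists: scanning, parsing and code generation are interleaved in a single pass over the input.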
Compilers
1 Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2 Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form or syntax of legal statements in the language.
3 Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4 Code Generation: Allocates memory locations, selects machine code for each intermediate code and utilizes registers as efficiently as possible.
5 Grammar: A Backus Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6 Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (Syntax Tree).
7 Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
  calls Syntactic Analyzer
    calls Contextual Analyzer
    calls Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
  calls Syntactic Analyzer (input: Source Text, output: AST)
  calls Contextual Analyzer (input: AST, output: Decorated AST)
  calls Code Generator (input: Decorated AST, output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
• pseudo-ops vs machine-ops
Lecture_no15 Example on Assembly language program
Assembly language is a low-level programming language for a computer or other programmable device. It is specific to a particular computer architecture, in contrast to most high-level programming languages, which are generally portable across multiple systems. Assembly language is converted into executable machine code by a utility program referred to as an assembler, such as NASM or MASM.
   LABEL   OPCODE  OPERANDS        COMMENTS
01 ;
02 ; Program to multiply an integer by the constant 6.
03 ; Before execution, an integer must be stored in NUMBER.
04 ;
05         .ORIG   x3050
06         LEA     R2, NUMBER
07         LDW     R2, R2, #0
08         LEA     R1, SIX
09         LDW     R1, R1, #0
0A         AND     R3, R3, #0      ; Clear R3. It will
0B                                 ; contain the product.
0C ; The inner loop
0D ;
0E AGAIN   ADD     R3, R3, R2
0F         ADD     R1, R1, #-1     ; R1 keeps track of
10         BRp     AGAIN           ; the iterations
11 ;
12         HALT
13 ;
14 NUMBER  .BLKW   1
15 SIX     .FILL   x0006
16 ;
17         .END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode.
The number of operands depends on the operation being performed.
AGAIN ADD R3, R3, R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, ..., R7.
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
LEA R2, NUMBER
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1 The location contains the target of a branch instruction (for example AGAIN in line 0E).
2 The location contains a value that is loaded or stored (for example NUMBER, line 14, and SIX, line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1, R1, #-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2.
If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
Actually, a more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
.ORIG
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says, start with location x3050. As a result, the LEA R2, NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the eleventh location in the resultant program is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 14, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home then, and life remains complicated. You can get through with this, but it is rather inappropriate. If things need a hurry, you should use the country's language.
Assembler
Translates mnemonic operation codes to their machine language equivalents and assigns machine addresses to symbolic labels used by the programmer.
Disassembler
Translates machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader. The assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler.
-> Basic assembler directives:
• START - Specify name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement follows the following format:
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional. A basic instruction has two parts: the first one is the name of the instruction (or the mnemonic) which is to be executed, and the second is the operands or the parameters of the command.
An assembly language statement contains the following fields:
-> Label Field - It is an identifier and an optional field. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode Field - This field contains a mnemonic. It stands for an operation code or a pseudo-op. If it is a machine code instruction, it requires additional information, i.e. operands.
-> Operand Field - It specifies either the address or the data that the opcode requires.
-> Comment Field - It allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are 3 types of assembly language statements:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler design can be done in
- Single pass
- Two pass
(Fig Two-pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object code. Both passes use OPTAB and SYMTAB.)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
Generate object code(perform actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references in general; they must be restricted or patched once the symbol is defined
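One common way for a one-pass assembler to cope with forward references is to leave a hole and patch it when the label is finally defined. The sketch below assumes a toy word-addressed machine in which every statement occupies one location; the mnemonics and the function name assemble_one_pass are hypothetical and are not taken from these notes.

```python
def assemble_one_pass(lines, start=0):
    """Assemble (label, opcode, operand) triples in a single pass.

    Forward references are remembered in `pending` and patched as soon as
    the missing label is defined.
    """
    symtab = {}    # label -> address
    pending = {}   # undefined label -> addresses waiting for it
    memory = {}    # address -> (opcode, resolved operand address)
    locctr = start
    for label, opcode, operand in lines:
        if label is not None:
            symtab[label] = locctr
            # Patch every earlier instruction that referred to this label.
            for address in pending.pop(label, []):
                memory[address] = (memory[address][0], locctr)
        if operand is None:
            target = None
        elif operand in symtab:
            target = symtab[operand]        # backward reference: already known
        else:
            target = None                   # forward reference: leave a hole
            pending.setdefault(operand, []).append(locctr)
        memory[locctr] = (opcode, target)
        locctr += 1                         # toy machine: one location per statement
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return memory

program = [
    (None, "JMP", "DONE"),     # forward reference to DONE
    ("LOOP", "ADD", "LOOP"),   # reference to a label defined on this line
    ("DONE", "HALT", None),
]
image = assemble_one_pass(program)
print(image)
```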
(detailed pass1 assembler flowchart)
Internal data structures
• Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
• Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels.
• Location Counter (LOCCTR): This is a variable that is used to help in the assignment of addresses: LOCCTR ← LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler
The differences between one pass and two pass assemblers are as follows.
A one pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
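The two passes can be sketched in a few lines of Python. The sketch assumes a SIC-like machine with 3-byte instructions and words and a one-byte opcode followed by a 16-bit address field; the opcode values and the names pass1 and pass2 are illustrative assumptions, and only the WORD and RESW directives are handled.

```python
# Assumed opcode table for a SIC-like machine (illustrative values).
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}

def pass1(lines, start):
    """Assign addresses and build SYMTAB; return SYMTAB and an intermediate form."""
    symtab, intermediate = {}, []
    locctr = start
    for label, opcode, operand in lines:
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, opcode, operand))
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3                     # LOCCTR <- LOCCTR + instruction length
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        else:
            raise ValueError(f"invalid operation code {opcode}")
    return symtab, intermediate

def pass2(symtab, intermediate):
    """Translate each statement now that every label has an address."""
    object_code = []
    for address, opcode, operand in intermediate:
        if opcode in OPTAB:
            object_code.append((address, f"{OPTAB[opcode]:02X}{symtab[operand]:04X}"))
        elif opcode == "WORD":
            object_code.append((address, f"{int(operand):06X}"))
        # RESW reserves space only and generates no object code.
    return object_code

source = [
    (None, "LDA", "FIVE"),       # forward reference, resolved in pass 2
    (None, "ADD", "FIVE"),
    (None, "STA", "RESULT"),
    ("FIVE", "WORD", "5"),
    ("RESULT", "RESW", "1"),
]
symtab, intermediate = pass1(source, start=0x1000)
object_code = pass2(symtab, intermediate)
for address, code in object_code:
    print(f"{address:04X} {code}")
```

Pass 2 never fails on the forward reference to FIVE because SYMTAB is already complete when translation begins.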
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1 H
Col 2~7 Program name
Col 8~13 Starting address (hex)
Col 14~19 Length of object program in bytes (hex)
Text record
Col 1 T
Col 2~7 Starting address in this record (hex)
Col 8~9 Length of object code in this record in bytes (hex)
Col 10~69 Object code; (69-10+1)/6 = 10 instructions
End record
Col 1 E
Col 2~7 Address of first executable instruction (hex) (END program_name)
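The record layout above can be checked with a short formatting sketch. The helper names header_record, text_record and end_record are hypothetical, and the program name, addresses and object codes in the usage lines are sample values chosen for the illustration.

```python
def header_record(name, start, length):
    """Col 1 'H', cols 2-7 name, cols 8-13 start address, cols 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_record(start, object_codes):
    """Col 1 'T', cols 2-7 start address, cols 8-9 byte count, cols 10-69 code."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 60 hex digits (30 bytes)")
    return f"T{start:06X}{len(body) // 2:02X}{body}"

def end_record(first_instruction):
    """Col 1 'E', cols 2-7 address of the first executable instruction."""
    return f"E{first_instruction:06X}"

header = header_record("COPY", 0x1000, 0x107A)
text = text_record(0x1000, ["141033", "482039", "001036"])
end = end_record(0x1000)
print(header)
print(text)
print(end)
```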
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a Modification record
b Base relative
c Program counter relative
Modification record
Col 1 M
Col 2-7 Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9 Length of the address field to be modified, in half-bytes
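How a loader might apply modification records can be sketched as follows. The sketch assumes that an odd half-byte length means the address field starts in the low-order half of its first byte (the convention used for 20-bit address fields on SIC/XE-style machines); the function name relocate and the sample instruction are assumptions for illustration.

```python
def relocate(image, modification_records, load_address):
    """Add load_address to every address field named by a modification record.

    Each record is (offset from program start, field length in half-bytes).
    """
    image = bytearray(image)
    for offset, half_bytes in modification_records:
        nbytes = (half_bytes + 1) // 2
        field = int.from_bytes(image[offset:offset + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1
        # Only the low `half_bytes` hex digits are the address; keep the rest.
        patched = (field & ~mask) | (((field & mask) + load_address) & mask)
        image[offset:offset + nbytes] = patched.to_bytes(nbytes, "big")
    return bytes(image)

# Sample program: six filler bytes, then a 4-byte instruction 4B101036 whose
# 20-bit address field (01036) starts at byte offset 7.
program = bytes(6) + bytes.fromhex("4B101036")
loaded = relocate(program, [(7, 5)], load_address=0x5000)
print(loaded[6:10].hex().upper())
```

With a load address of x5000 the address field 01036 becomes 06036, while the opcode and flag bits in front of it are left untouched.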
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
Macro definition:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call and its expansion:
INCRMT X,Y
expands to
LOAD X
ADD Y
STORE X
(Fig Source code (with macros) -> Macro Processor -> Expanded code -> Compiler or Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the call
INCRMT X,Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
The call INCRMT X,Y expands to
LOAD X
ADD Y
STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition.
-> Prototype for the macro
The prototype gives the macro name and the parameter list. Each parameter starts with &.
-> Body
The statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro they are called parameters; in the macro invocation they are called arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is invoked.
Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A,&B prototype
INCRMT X,Y macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
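Positional substitution amounts to pairing the two lists by index and rewriting the model statements. A minimal sketch, assuming model statements are stored as strings with space-separated fields and using a hypothetical function name expand:

```python
def expand(prototype, model_statements, call):
    """Expand one macro call by positional parameter substitution."""
    name, formals = prototype
    call_name, actuals = call
    if call_name != name:
        raise ValueError(f"{call_name} is not a call on {name}")
    if len(actuals) != len(formals):
        raise ValueError("actual and formal parameter lists differ in length")
    # The i-th actual parameter is paired with the i-th formal parameter.
    binding = dict(zip(formals, actuals))
    expansion = []
    for statement in model_statements:
        fields = [binding.get(field, field) for field in statement.split()]
        expansion.append(" ".join(fields))
    return expansion

prototype = ("INCRMT", ["&A", "&B"])
body = ["LOAD &A", "ADD &B", "STORE &A"]
expanded = expand(prototype, body, ("INCRMT", ["X", "Y"]))
print("\n".join(expanded))
```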
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
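With keyword parameters the pairing is done by name instead of by position, so the arguments may be written in any order. A small sketch, assuming each argument has the form keyword=value and that the leading & on the keyword is optional in the call; the function name bind_keyword_arguments is hypothetical.

```python
def bind_keyword_arguments(formals, call_arguments):
    """Pair each 'keyword=value' argument with the formal parameter it names."""
    binding = {}
    for argument in call_arguments:
        keyword, separator, value = argument.partition("=")
        if not separator:
            raise ValueError(f"not a keyword argument: {argument}")
        formal = "&" + keyword.lstrip("&")
        if formal not in formals:
            raise ValueError(f"unknown parameter {keyword}")
        binding[formal] = value
    return binding

# The order of the arguments in the call does not matter.
binding = bind_keyword_arguments(["&TYPE", "&CHANNEL"], ["CHANNEL=3", "TYPE=DIRECT"])
print(binding)
```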
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro Expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be
found in MDT
Step 2
Examine all statements in the assembly source program to detect macro calls For each macro call
i) locate the macro in MNT
ii) obtain information from MNT regarding position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal
parameters and their values (ie actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in MDT, in their expansion time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion time relations between values of formal parameters and expansion time variables.
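The three steps can be sketched with an MNT that maps a macro name to its position in the MDT. This is a simplified illustration: fields are separated by spaces rather than commas, every macro is defined before it is called, and AIF and AGO are not handled. The function name process is hypothetical.

```python
def process(lines):
    """Expand macros in a single pass using an MNT and an MDT."""
    mnt = {}        # macro name -> (index of first body line in MDT, formals)
    mdt = []        # stored model statements, each definition closed by ENDM
    output = []
    source = iter(lines)
    for line in source:
        fields = line.split()
        if fields == ["MACRO"]:
            # Step 1: enter the name in MNT and store the definition in MDT.
            prototype = next(source).split()
            mnt[prototype[0]] = (len(mdt), prototype[1:])
            for body_line in source:
                mdt.append(body_line)
                if body_line.strip() == "ENDM":
                    break
        elif fields and fields[0] in mnt:
            # Step 2: locate the macro and pair formal with actual parameters.
            index, formals = mnt[fields[0]]
            binding = dict(zip(formals, fields[1:]))
            # Step 3: process model statements until ENDM is encountered.
            while mdt[index].strip() != "ENDM":
                words = [binding.get(word, word) for word in mdt[index].split()]
                output.append(" ".join(words))
                index += 1
        else:
            output.append(line)     # ordinary statement: copy unchanged
    return output

program = [
    "MACRO",
    "INCRMT &A &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X Y",
    "HALT",
    "INCRMT P Q",
]
expanded_program = process(program)
print("\n".join(expanded_program))
```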
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked.
Subroutine - the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called.
-> Sub-procedures: macro definition: DEFINE; macro invocation: EXPAND.
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB. It holds pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
(Fig Macro processor data structures: NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm & flowchart for the Macro processor and explain.
(Fig Macro processor algorithm with procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution and loads this executable code of the source into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into the memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates the space for the program in the memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into the memory. Thus the program now becomes ready for execution. This activity is called loading.
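The four activities can be sketched for a toy word-addressed machine. The module format used here (code as a list of words, a defs table of exported symbols, relocs listing address constants, refs mapping locations to external symbols) is an assumption invented for the illustration, as is the function name load.

```python
def load(modules, base):
    """Allocate, link, relocate and load toy object modules; return memory."""
    # 1) Allocation: give each module a load origin, one after another.
    origin, address = {}, base
    for module in modules:
        origin[module["name"]] = address
        address += len(module["code"])
    # 2) Linking: build a global symbol table of absolute addresses.
    symbols = {}
    for module in modules:
        for symbol, offset in module["defs"].items():
            symbols[symbol] = origin[module["name"]] + offset
    memory = {}
    for module in modules:
        start = origin[module["name"]]
        code = list(module["code"])
        # 3) Relocation: adjust address constants by the module's origin.
        for offset in module["relocs"]:
            code[offset] += start
        # Linking, continued: fill in references to external symbols.
        for offset, symbol in module["refs"].items():
            code[offset] = symbols[symbol]
        # 4) Loading: place the words into memory.
        for index, word in enumerate(code):
            memory[start + index] = word
    return memory

modules = [
    {"name": "MAIN", "code": [10, 0, 2], "defs": {}, "relocs": [2], "refs": {1: "SUB"}},
    {"name": "LIB", "code": [7, 8], "defs": {"SUB": 1}, "relocs": [], "refs": {}},
]
memory = load(modules, base=100)
print(memory)
```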
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know the memory management.
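A minimal sketch of what "nothing other than loading" means in practice is shown below. The record format (a list of `(address, data)` pairs), the memory size and the function name `absolute_load` are assumptions made for illustration; real object formats differ.

```python
def absolute_load(records, memory_size=64):
    """Place each (address, data) record at exactly the address it names."""
    memory = [0] * memory_size
    for address, data in records:
        for offset, word in enumerate(data):
            # No relocation and no linking: the address in the record is final.
            memory[address + offset] = word
    return memory
```

Because the addresses are fixed at translation time, two records that name overlapping addresses simply overwrite each other, which is why the programmer has to manage memory by hand in this scheme.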
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls to certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called the relocating loader.
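The "add a constant to each relative address" step can be illustrated with a short sketch. The segment layout, the list of address-field positions and the name `relocate` are hypothetical; the only idea taken from the text is that the constant is the difference between the load origin and the translated origin.

```python
def relocate(segment, address_fields, translated_origin, load_origin):
    """Adjust every address field by the distance the segment was moved."""
    constant = load_origin - translated_origin
    relocated = list(segment)            # leave the input segment untouched
    for index in address_fields:         # only words that hold addresses change
        relocated[index] += constant
    return relocated
```

With a segment translated at origin 100 and loaded at 400, the constant is 300, so an address field holding 120 becomes 420 while non-address words are left alone.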
Relocating Loader: The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
For a relocating loader, the assembler outputs the object program together with information about all other programs it references. In addition there is relocation information: the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
A linkage editor performs linking prior to load time.
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic Linking-
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is known as dynamic linking, dynamic loading, or load on call.
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
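The path rule described above can be sketched as a small model. This is a simplification under stated assumptions: the tree is the ROOT/A/D/B/C/E/F example from the text, the resident set is modeled as a single path, and the names `PARENT`, `path_to` and `ensure_loaded` are invented for the illustration.

```python
# Parent of each segment in the example overlay tree.
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Return the sequence of segments from the root down to `segment`."""
    path = [segment]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return path[::-1]

def ensure_loaded(resident, target):
    """Return (new resident path, segments that had to be loaded) for a call into `target`."""
    if target in resident:
        return resident, []              # target already resident: nothing to load
    path = path_to(target)
    loaded_now = [s for s in path if s not in resident]
    # Segments off the new path share memory with it and are treated as overwritten.
    return path, loaded_now
```

Starting with only the root resident, a call into B loads A and B; a later call into E loads D and E, which replace A and B because siblings share memory.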
(fig Overlay: ROOT at the top with children A and D; B and C under A; E and F under D)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: Object program(s) and Library -> Linking loader -> Memory. Linkage editor: Object program(s) and Library -> Linkage editor -> Linked program -> Relocating loader -> Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution has to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at who performs the above four functions for our three loaders (absolute loader, relocating loader and direct linking loader, respectively):
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Dictionary (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation and Linkage Directory) card
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Dictionary (ESD): stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD ID used in ESD and RLD cards
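The two passes can be sketched on a toy object format. Everything about the format here is an assumption for illustration: each module is a dict with `name`, `defs` (external symbols defined, as offsets), `text` (a list of words) and `rld` (pairs of word index and symbol whose address must be added). It is not the card layout used in the notes.

```python
def direct_linking_load(modules, ipla):
    """Pass 1 assigns addresses to segments and external symbols; pass 2 loads and patches."""
    # Pass 1: allocate each segment and build the external symbol table.
    symtab = {}
    bases = []
    pla = ipla                                   # PLA counter starts at the IPLA
    for module in modules:
        bases.append(pla)
        symtab[module["name"]] = pla             # the segment name is itself a symbol
        for symbol, offset in module["defs"].items():
            symtab[symbol] = pla + offset
        pla += len(module["text"])

    # Pass 2: load the text, then relocate and link using the RLD entries.
    memory = {}
    for module, base in zip(modules, bases):
        for index, word in enumerate(module["text"]):
            memory[base + index] = word
        for index, symbol in module["rld"]:
            # A local address names its own segment; an external one names another symbol.
            memory[base + index] += symtab[symbol]
    return symtab, memory
```

In this sketch relocation and linking are the same operation: an RLD entry that names the module's own segment relocates a local address, and one that names another module's symbol resolves an external reference.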
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text to a file rather than to memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but also in teaching and supporting it
-> Language designers balance
-> making computing convenient for people with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array (an ordered table) as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
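The three search methods above can be sketched on a small symbol table whose key is the symbol name. The table contents, the function names and the hash function (sum of character codes modulo the number of slots) are assumptions for the example; the hash table below also resolves collisions by chaining, which is one common choice and not something the notes specify.

```python
def linear_search(table, key):
    """Compare the key against every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1

def binary_search(table, key):
    """Search a table that is sorted by name; return the index or -1."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1       # continue in the left sub-array
        else:
            low = mid + 1        # continue in the right sub-array
    return -1

class HashTable:
    """Slots numbered from 0; each slot holds a short list of (key, value) pairs."""
    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]

    def _slot(self, key):
        # Hash function (an assumption): sum of character codes modulo the slot count.
        return sum(ord(c) for c in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None
```

Linear search works on any table, binary search only on an ordered one, and the hash table trades extra storage for lookups that usually inspect a single slot.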
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
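As a brief illustration, the first two algorithms in the list can be sketched as follows. Both versions return a new sorted list rather than sorting in place, which is a choice made for the example and not a requirement of the algorithms.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and swap it into place."""
    result = list(items)
    for i in range(len(result)):
        smallest = i
        for j in range(i + 1, len(result)):
            if result[j] < result[smallest]:
                smallest = j
        result[i], result[smallest] = result[smallest], result[i]
    return result

def insertion_sort(items):
    """Grow a sorted prefix by inserting each new item at its proper position."""
    result = list(items)
    for i in range(1, len(result)):
        value = result[i]
        j = i - 1
        while j >= 0 and result[j] > value:
            result[j + 1] = result[j]    # shift larger items one place right
            j -= 1
        result[j + 1] = value
    return result
```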
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference, inference rule, or transformation rule is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this $L \subseteq T^*$.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string $\gamma$ is immediately derived from a string $\mu$ in a grammar G, written $\mu \rightarrow_G \gamma$ (a single-step derivation), if and only if
(1) $\mu = \sigma \alpha \tau$
(2) $\gamma = \sigma \beta \tau$
(3) $\alpha \rightarrow \beta \in P$ of G
where $\sigma$ and $\tau$ represent arbitrary strings.
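The definition can be checked mechanically. The sketch below represents strings as ordinary Python strings of single-character symbols and productions as `(alpha, beta)` pairs; this representation and the name `immediately_derives` are assumptions for the example.

```python
def immediately_derives(mu, gamma, productions):
    """True if gamma follows from mu by one application of some production alpha -> beta."""
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                   # arbitrary left context
            tau = mu[start + len(alpha):]        # arbitrary right context
            if sigma + beta + tau == gamma:      # mu = sigma alpha tau, gamma = sigma beta tau
                return True
            start = mu.find(alpha, start + 1)    # try the next occurrence of alpha
    return False
```

With the productions S -> aSb and S -> ab, the string aSb immediately derives aabb (taking sigma = a, tau = b), while S does not immediately derive aabb because that takes two steps.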
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar $\langle N, \Sigma, P, S \rangle$ that satisfies the following two conditions:
a. Each production rule $\alpha \to \beta$ in P satisfies $|\alpha| \le |\beta|$ if it is not of the form $S \to \varepsilon$.
b. If $S \to \varepsilon$ is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15: The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
Adding a production rule of the form $Bb \to b$ to the modified grammar will result in a non-Type 1 grammar, because it violates condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule $\alpha \to \beta$ satisfies $|\alpha| = 1$, that is, $\alpha$ is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16: The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
Adding a production rule of the form $aE \to EaE$ to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar $\langle N, \Sigma, P, S \rangle$ in which each production rule $\alpha \to \beta$ that is not of the form $S \to \varepsilon$ satisfies one of the following conditions:
a. $\beta$ is a terminal symbol.
b. $\beta$ is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17: The grammar $\langle N, \Sigma, P, S \rangle$ which has the following production rules is a Type 3 grammar.
Adding a production rule of the form $A \to Ba$ or of the form $B \to bb$ to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
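The conditions for Types 1, 2 and 3 can be turned into a small classifier. This is a sketch under assumptions: each side of a production is a tuple of symbol names, the empty tuple stands for the empty string, and the function name `grammar_type` and the example grammars in the note below are made up for illustration (they are not the grammars of the examples referred to above).

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return the highest type (0 to 3) whose conditions the productions satisfy."""
    def is_start_to_empty(alpha, beta):
        return alpha == (start,) and beta == ()

    has_empty_rule = any(is_start_to_empty(a, b) for a, b in productions)
    start_on_right = any(start in b for _, b in productions)

    # Type 1, condition (a): no rule other than S -> empty may shrink the string.
    if any(len(a) > len(b) for a, b in productions if not is_start_to_empty(a, b)):
        return 0
    # Type 1, condition (b): if S -> empty is present, S must not occur on a right side.
    if has_empty_rule and start_on_right:
        return 0
    # Type 2: every left side is a single nonterminal.
    if any(len(a) != 1 or a[0] not in nonterminals for a, _ in productions):
        return 1
    # Type 3: right side is a terminal, or a terminal followed by a nonterminal.
    for a, b in productions:
        if is_start_to_empty(a, b):
            continue
        terminal_only = len(b) == 1 and b[0] not in nonterminals
        terminal_then_nonterminal = (
            len(b) == 2 and b[0] not in nonterminals and b[1] in nonterminals
        )
        if not (terminal_only or terminal_then_nonterminal):
            return 2
    return 3
```

For example, the rules S -> aA, A -> b classify as Type 3; S -> aSb, S -> ab as Type 2; adding a rule aE -> EaE drops a grammar to Type 1; and a rule Bb -> b drops it to Type 0.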
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form $xAy \to xvy$, where $A \in V$ and $x, v, y \in (V \cup T)^*$.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by $A \to v$, while x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
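To make the notation concrete, the sketch below encodes a tiny two-rule BNF specification as a Python dictionary and checks whether a string can be derived from a symbol. The grammar (binary numbers), the dictionary layout and the function name `derives` are all assumptions chosen for the example. The simple recursive strategy works here because no rule is left-recursive.

```python
# <digit>  ::= "0" | "1"
# <number> ::= <digit> | <digit> <number>
GRAMMAR = {
    "<digit>": [["0"], ["1"]],
    "<number>": [["<digit>"], ["<digit>", "<number>"]],
}

def derives(symbols, text):
    """True if the sequence of symbols can be rewritten into exactly `text`."""
    if not symbols:
        return text == ""
    head, rest = symbols[0], symbols[1:]
    if head in GRAMMAR:
        # Nonterminal: replace it with each alternative on the right of ::= in turn.
        return any(derives(alternative + rest, text) for alternative in GRAMMAR[head])
    # Terminal: it must match the text literally.
    return text.startswith(head) and derives(rest, text[len(head):])
```

Calling `derives(["<number>"], "101")` succeeds, while the empty string and a string such as `12` are rejected because no sequence of substitutions produces them.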
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages.
If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
• In the analysis part, the source program is read and broken into a stream of strings, giving an intermediate form of the source program.
• In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
- Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
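The token list above can be reproduced with a small scanner built on Python's standard `re` module. The token names and the function name `tokenize` are assumptions for the sketch, and the patterns cover only the symbols that occur in this one statement.

```python
import re

# One named pattern per kind of token; whitespace is matched but discarded.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenize(source):
    """Break the source text into (token kind, text) pairs."""
    tokens = []
    position = 0
    while position < len(source):
        match = PATTERN.match(source, position)
        if match is None:
            raise ValueError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

Running `tokenize("total = count + rate * 10")` yields seven tokens in the same order as the list above: three identifiers, the assignment symbol, the plus sign, the multiplication sign and the number 10.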
- Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
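The scanning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names TOKEN_SPEC and tokenize are hypothetical, the token set covers just the running example, and a number token simply carries its lexeme as the attribute value, which is one of the options discussed above. Note that every token class is defined by a regular expression, with no recursion.

```python
import re

# Token specification: (token name, regular expression). No recursion is needed.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+;]"),
    ("skip",   r"\s+"),          # non-significant blanks
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table = []            # position + 1 is the attribute value of an id token
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue             # blanks are dropped during scanning
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))   # second component is ignored
    return tokens, symbol_table

tokens, symbol_table = tokenize("x3 = y + 3;")
```

Here tokens holds one pair per lexeme and symbol_table holds x3 and y in order of first appearance. Changing the spacing of the input leaves the token stream unchanged, while splitting x3 into "x 3" produces a different stream.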
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
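As a rough illustration of how a parser turns the token stream into a syntax tree, here is a minimal Python sketch. It is an assumption-laden toy: the function name parse_assignment and the tuple-based tree shape are hypothetical, tokens carry the lexeme instead of a symbol-table index, and the recursive rule for expr is handled with a loop (operand followed by any number of "+ operand"), which is one common way to implement such a rule and not necessarily the method these notes have in mind.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  where expr is: operand { + operand }."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        token = tokens[pos]
        pos += 1
        return token

    def operand():
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            return expect(tokens[pos][0])
        raise SyntaxError(f"expected operand at token {pos}")

    def expr():
        nonlocal pos
        node = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            pos += 1
            node = ("+", node, operand())   # operator is the interior node
        return node

    target = expect("id")
    expect("=")
    tree = ("=", target, expr())
    expect(";")                              # statement terminator
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree

example = [("id", "x3"), ("=", None), ("id", "y"), ("+", None), ("number", "3"), (";", None)]
syntax_tree = parse_assignment(example)
```

The resulting tree has = at the root, the identifier x3 as its left child, and a + node with children y and 3 as its right child.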
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
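The tree walk that produces these quads can be sketched as follows. This is a hypothetical illustration: gen_quads and the nested-tuple representation of the semantic tree are assumptions made for the example, and an unused source slot is stored as None instead of "--".

```python
from itertools import count

def gen_quads(tree):
    """Walk a semantic tree and return a list of (operation, target, source1, source2)."""
    quads = []
    temps = count(1)

    def walk(node):
        if isinstance(node, str):            # leaf: identifier or constant
            return node
        op, *children = node
        sources = [walk(child) for child in children]   # children first
        target = f"temp{next(temps)}"
        quads.append((op, target, sources[0], sources[1] if len(sources) > 1 else None))
        return target

    _, target, value = tree                  # root is assumed to be ("assign", id, expr)
    quads.append(("assign", target, walk(value), None))
    return quads

# x3 = y + 3 with x3 an integer and y a real, after semantic analysis
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
quads = gen_quads(tree)
```

Because each node is emitted only after its children, the conversion of 3 gets temp1, the addition temp2, and the conversion back to integer temp3, matching the sequence shown above.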
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion itself and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
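Both optimizations can be expressed as small rewrites over the list of quads. The sketch below is a hypothetical illustration (the function name optimize and the tuple layout are assumptions), and it relies on a simplifying assumption that a temporary is used only by the quad that immediately follows its definition, which holds for the example but not in general.

```python
def optimize(quads):
    """Apply constant conversion and final-assignment merging to a list of quads."""
    constants = {}                            # temp name -> real constant computed at compile time
    folded = []
    for op, target, src1, src2 in quads:
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{int(src1)}.0"    # do the conversion now
            continue
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        folded.append((op, target, src1, src2))

    result = []
    for quad in folded:
        op, target, src1, _ = quad
        # Merge "X tempN ..." followed by "assign id tempN" into "X id ...".
        if op == "assign" and result and result[-1][1] == src1 and src1.startswith("temp"):
            previous = result.pop()
            result.append((previous[0], target) + previous[2:])
        else:
            result.append(quad)
    return result

original = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
optimized = optimize(original)
```

For the running example the four quads shrink to two: an add with the constant 3.0 and a realtoint that writes directly to id1.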
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2:
LD R1 id2
ADDF R1 R1 3.0 (add float)
RTOI R2 R1 (real to int)
ST id1 R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1)Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
  calls
Syntactic Analyzer
  calls                 calls
Contextual Analyzer     Code Generator
(2)Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
  calls                 calls                  calls
Syntactic Analyzer      Contextual Analyzer    Code Generator
Syntactic Analyzer: input Source Text, output AST
Contextual Analyzer: input AST, output Decorated AST
Code Generator: input Decorated AST, output object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
14 NUMBER BLKW 1
15 SIX FILL x0006
16
17 END
Figure 7.1 An assembly language program
Two of the parts (LABEL and COMMENTS) are optional.
Opcodes and Operands
Two of the parts (OPCODE and OPERANDS) are mandatory. The OPCODE is a symbolic name for the opcode. The number of operands depends on the operation being performed.
AGAIN ADD R3,R3,R2
The operands to be added are obtained from register 2 and from register 3. The result is to be placed in register 3. We represent each of the registers 0 through 7 as R0, R1, …, R7.
The location whose address is to be read is given the label NUMBER. The destination into which that address is to be loaded is register 2.
LEA R2,NUMBER
A literal value must contain a symbol identifying the representation base of the number. We use # for decimal, x for hexadecimal, and b for binary.
Labels
Labels are symbolic names which are used to identify memory locations that are referred to explicitly in the program. There are two reasons for explicitly referring to a memory location:
1. The location contains the target of a branch instruction (for example, AGAIN in line 0E).
2. The location contains a value that is loaded or stored (for example, NUMBER, line 14, and SIX, line 15).
The location AGAIN is specifically referenced by the branch instruction in line 10:
BRp AGAIN
If the result of ADD R1,R1,#-1 is positive (as evidenced by the P condition code being set), then the program branches to the location explicitly referenced as AGAIN to perform another iteration.
The value stored in the memory location explicitly referenced as NUMBER is loaded into R2. If a location in the program is not explicitly referenced, then there is no need to give it a label.
Comments
Comments are messages intended only for human consumption. The purpose of comments is to make the program more comprehensible to the human reader.
Pseudo-ops (Assembler Directives)
Actually, a more formal name for a pseudo-op is assembler directive. They are called pseudo-ops because they do not refer to operations that will be performed by the program during execution. Rather, the pseudo-op is strictly a message to the assembler to help the assembler in the assembly process.
ORIG
ORIG tells the assembler where in memory to place the program. In line 05, ORIG x3050 says start with location x3050. As a result, the LEA R2,NUMBER instruction will be put in location x3050.
FILL
FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the ninth location in the resultant program is initialized to the value x0006.
BLKW
BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the BLKW pseudo-op. In line 11, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn Assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home then, and life remains complicated. You can get through with this, but it is rather inappropriate. If things need a hurry, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader; the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives.
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Basic assembler directives:
• START: Specify the name and starting address for the program
• END: Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE: Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD: Generate a one-word integer constant
• RESB: Reserve the indicated number of bytes for a data area
• RESW: Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional A basic instruction has two parts the first one is the name of the instruction (or the mnemonic) which is to be executed and the second are the operands or the parameters of the command
An assembly language statement contains the following fields:
-> Label field: an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A…Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field: this field contains a mnemonic, which stands for an operation code or a pseudo-op, i.e. a machine code instruction. It may require additional information, i.e. operands.
-> Operand field: specifies either the address or the data that the opcode requires.
-> Comment field: allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
Assembly language statements are of 3 types:
(i)Imperative statement
(ii)Declarative statements
(iii)Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler Design can be done in
– Single pass
– Two pass
(Fig Two-pass assembler: Source program → Pass 1 → Intermediate file → Pass 2 → Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code, and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
Generate object code(perform actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references unless restrictions are imposed
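One possible restriction is to keep the generated code in memory until the end of the pass, so that earlier words can be patched once a label is finally defined. The Python sketch below illustrates that idea only; the function assemble_one_pass, the (label, mnemonic, operand) tuple format, the one-word-per-statement layout, and the opcode values are all assumptions made for the example and do not describe any particular assembler.

```python
def assemble_one_pass(statements, optab):
    """statements: list of (label, mnemonic, operand); each occupies one word."""
    symtab = {}        # label -> address
    pending = {}       # undefined label -> addresses of words waiting for it
    memory = []        # memory[i] = [opcode, operand address or value]
    for locctr, (label, mnemonic, operand) in enumerate(statements):
        if label is not None:
            symtab[label] = locctr
            for address in pending.pop(label, []):    # back-patch earlier references
                memory[address][1] = locctr
        if mnemonic == "WORD":
            memory.append([0, int(operand)])
        elif operand in symtab:
            memory.append([optab[mnemonic], symtab[operand]])
        else:                                         # forward reference
            pending.setdefault(operand, []).append(locctr)
            memory.append([optab[mnemonic], None])
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return memory

hypothetical_optab = {"LOAD": 1, "ADD": 2}
program = [
    (None, "LOAD", "X"),     # X and Y are forward references
    (None, "ADD", "Y"),
    ("X", "WORD", "5"),
    ("Y", "WORD", "7"),
]
memory = assemble_one_pass(program, hypothetical_optab)
```

The first two words are emitted with an empty address field and filled in when X and Y are defined at addresses 2 and 3.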
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB): used to store values (addresses) assigned to labels
• The Location Counter (LOCCTR): a variable that is used to help in the assignment of addresses: LOCCTR ← LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references, and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
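The two-pass scheme can be sketched for a toy machine as follows. Everything specific here is an assumption made for illustration: the opcode values in OPTAB, the one-word instruction size, the statement syntax "[label] mnemonic operand", and the encoding opcode × 256 + address are hypothetical and are not taken from any real instruction set.

```python
OPTAB = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}   # hypothetical opcodes
WORD_SIZE = 1                                         # every statement occupies one word

def parse(line):
    """Split a statement into (label, mnemonic, operand)."""
    fields = line.split()
    if fields[0] in OPTAB or fields[0] == "WORD":
        fields.insert(0, None)                        # no label present
    operand = fields[2] if len(fields) > 2 else None
    return fields[0], fields[1], operand

def pass1(lines, start=0):
    """Assign an address to every statement and build SYMTAB."""
    symtab, intermediate, locctr = {}, [], start
    for line in lines:
        label, mnemonic, operand = parse(line)
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, mnemonic, operand))
        locctr += WORD_SIZE                           # LOCCTR <- LOCCTR + length
    return symtab, intermediate

def pass2(symtab, intermediate):
    """Translate each statement using OPTAB and the completed SYMTAB."""
    code = []
    for address, mnemonic, operand in intermediate:
        if mnemonic == "WORD":
            code.append((address, int(operand)))
        else:
            code.append((address, OPTAB[mnemonic] * 256 + symtab[operand]))
    return code

source = ["LOAD X", "ADD Y", "STORE X", "X WORD 5", "Y WORD 7"]
symtab, intermediate = pass1(source)
object_code = pass2(symtab, intermediate)
```

The forward references to X and Y cause no difficulty, because pass 2 starts only after pass 1 has entered both labels in SYMTAB.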
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1 H
Col 2-7 Program name
Col 8-13 Starting address (hex)
Col 14-19 Length of object program in bytes (hex)
Text record
Col 1 T
Col 2-7 Starting address in this record (hex)
Col 8-9 Length of object code in this record in bytes (hex)
Col 10-69 Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1 E
Col 2-7 Address of first executable instruction (hex) (END program_name)
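The column layout above maps directly onto fixed-width string formatting. The following Python sketch is illustrative only: the function names are hypothetical, object code is assumed to be passed as a string of hex digits (two per byte), and the sample program name and addresses are made up for the example.

```python
def header_record(name, start, length):
    """Col 1 'H', cols 2-7 name, cols 8-13 start address, cols 14-19 length (hex)."""
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_record(start, object_code):
    """Col 1 'T', cols 2-7 start address, cols 8-9 byte length, cols 10-69 object code."""
    if len(object_code) > 60:
        raise ValueError("at most 60 hex digits fit in columns 10-69")
    return f"T{start:06X}{len(object_code) // 2:02X}{object_code}"

def end_record(first_instruction):
    """Col 1 'E', cols 2-7 address of the first executable instruction (hex)."""
    return f"E{first_instruction:06X}"

header = header_record("COPY", 0x1000, 0x107A)
text = text_record(0x1000, "141033482039001036")
end = end_record(0x1000)
```

The name is padded or truncated to six columns, and the length field of the text record counts bytes, so 18 hex digits are recorded as 09.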
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1 M
Col 2-7 Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9 Length of the address field to be modified, in half-bytes
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro-processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called macro expansion.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example
Macro definition:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X,Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Fig Macro program)
(Fig Source code (with macro) → Macro Processor → Expanded code → Compiler or Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the ENDM statement indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module names a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X,Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
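The definition and expansion mechanism described above can be sketched in Python. This is a simplified, hypothetical illustration: the function name expand is an assumption, parameters are substituted by plain text replacement (which would misbehave if one formal name were a prefix of another), and nested calls and conditional expansion are not handled.

```python
def expand(source_lines):
    """Expand macros delimited by MACRO ... ENDM using positional parameters."""
    macros = {}                  # macro name -> (formal parameters, model statements)
    output = []
    lines = iter(source_lines)
    for line in lines:
        fields = line.split(None, 1)
        if fields and fields[0] == "MACRO":
            # The statement after MACRO is the prototype: name and formal parameters.
            name, _, params = next(lines).strip().partition(" ")
            formals = [p.strip() for p in params.split(",") if p.strip()]
            body = []
            for model in lines:
                if model.strip() == "ENDM":
                    break
                body.append(model.strip())
            macros[name] = (formals, body)
        elif fields and fields[0] in macros:
            # A macro name in the mnemonic field is a call: substitute and copy the body.
            formals, body = macros[fields[0]]
            actuals = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
            for model in body:
                for formal, actual in zip(formals, actuals):
                    model = model.replace(formal, actual)
                output.append(model)
        else:
            output.append(line)
    return output

program = ["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
           "INCRMT X,Y", "HALT"]
expanded = expand(program)
```

The definition unit itself produces no output; only the call INCRMT X,Y is replaced by the three model statements with X and Y substituted. The statement HALT is a made-up ordinary instruction that passes through unchanged.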
Defining a Macro
Let us take another look at the macro definition unit
Macro definition:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X,Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Fig Macro program)
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND (written ENDM in the example above) indicates the end of a macro definition.
-> Prototype for a macro
Each parameter starts with &.
name MACRO parameter list
MEND
-> Body: the statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, …
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called the formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A,&B prototype
INCRMT X,Y macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
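The positional pairing described above can be sketched in a few lines of Python. This is only an illustration: the model statements LOAD &A, ADD &B, STORE &A are inferred from the expansion shown in the notes, and the function name expand_positional is hypothetical.

```python
def expand_positional(formals, body, actuals):
    """Replace each formal parameter in the body with the actual at the same position."""
    if len(formals) != len(actuals):
        raise ValueError("argument count does not match the prototype")
    binding = dict(zip(formals, actuals))
    expanded = []
    for statement in body:
        # Substitute token by token so that &A never matches inside a longer name.
        tokens = [binding.get(tok, tok) for tok in statement.split()]
        expanded.append(" ".join(tokens))
    return expanded


# Assumed model statements for INCRMT, inferred from the expansion in the notes.
INCRMT_FORMALS = ["&A", "&B"]
INCRMT_BODY = ["LOAD &A", "ADD &B", "STORE &A"]

for line in expand_positional(INCRMT_FORMALS, INCRMT_BODY, ["X", "Y"]):
    print(line)
```

Swapping the actual parameters in the call (Y, X) would swap every occurrence in the expansion, which is exactly why the order of positional arguments matters.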
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters each argument value is written with a keyword that names the corresponding parameter
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
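A rough sketch of how keyword arguments could be bound follows. It assumes the prototype of GENER declares the parameters TYPE and CHANNEL in that order and that STRG declares a1, a2, a3. The helper name bind_keyword_arguments is hypothetical.

```python
def bind_keyword_arguments(formals, call_operands):
    """Map each formal parameter to the value written next to its keyword."""
    binding = {}
    for operand in call_operands:
        keyword, _, value = operand.partition("=")
        name = keyword.lstrip("&")
        if name not in formals:
            raise ValueError(f"unknown keyword parameter: {keyword}")
        binding[name] = value
    missing = [f for f in formals if f not in binding]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    # Return the values in prototype order, whatever order the call used.
    return [binding[f] for f in formals]


# The order of operands in the call no longer matters.
print(bind_keyword_arguments(["TYPE", "CHANNEL"], ["CHANNEL=3", "TYPE=DIRECT"]))
print(bind_keyword_arguments(["a1", "a2", "a3"], ["&a3=DATA1", "&a2=DATA2", "&a1=DATA3"]))
```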
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro Expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in MDT
Step 2
Examine all statements in the assembly source program to detect macro calls For each macro call
i) locate the macro in MNT
ii) obtain information from MNT regarding position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables
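The three steps can be sketched as a small Python routine. This is a simplified illustration, not the full design: it assumes definitions are bracketed by MACRO and MEND lines, that operands are separated by blanks, and it ignores AIF and AGO entirely. The function name process is hypothetical.

```python
def process(source_lines):
    """One pass over the source: store definitions in MNT/MDT, expand calls."""
    mnt = {}   # Macro Name Table: name -> (index into MDT, formal parameters)
    mdt = []   # Macro Definition Table: body statements, each body closed by MEND
    output = []
    lines = iter(source_lines)
    for line in lines:
        fields = line.split()
        if fields == ["MACRO"]:
            # Step 1: the prototype follows the MACRO line.
            name, *formals = next(lines).split()
            mnt[name] = (len(mdt), formals)
            for body_line in lines:
                mdt.append(body_line.strip())
                if body_line.strip() == "MEND":
                    break
        elif fields and fields[0] in mnt:
            # Step 2: a macro call; locate it in the MNT and bind actuals to formals.
            start, formals = mnt[fields[0]]
            binding = dict(zip(formals, fields[1:]))
            # Step 3: copy statements from the MDT until the end marker.
            index = start
            while mdt[index] != "MEND":
                tokens = [binding.get(t, t) for t in mdt[index].split()]
                output.append(" ".join(tokens))
                index += 1
        else:
            output.append(line.strip())
    return output


program = ["MACRO", "INCRMT &A &B", "LOAD &A", "ADD &B", "STORE &A", "MEND",
           "START", "INCRMT X Y", "END"]
for statement in process(program):
    print(statement)
```

Because the MNT is consulted as each line is read, a call can only be expanded if its definition appeared earlier in the source, which is the single-pass restriction stated above.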
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro- the statements of the expansion are generated each time the macro is invoked
Subroutine- the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB. Holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position
(fig MACRO definition and CALL with the tables NAMTAB DEFTAB and ARGTAB)
Two pass algorithm
» Pass1 Recognize macro definitions
» Pass2 Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the Macro processor and explain
(fig Macro processor flowchart with procedures PROCESSLINE DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading As discussed earlier the source program is converted to object program by assembler The loader is a program which takes this object program prepares it for execution and loads this executable code of the source into memory for execution
Definition of Loader
Loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into the memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates the space for the program in the memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address dependent locations in the program. Such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into the memory. Thus the program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader- In this type of loader the instruction is read line by line, its machine code is obtained and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed at one part of the memory and the loader simply loads assembled machine instructions into the memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a wastage of memory. As this scheme is a combination of assembler and loader activities, this combination program occupies a large block of memory.
2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)
Advantages
• The program need not be retranslated each time it is run. This is because when the source program is first translated, an object program is generated. If the program is not modified, then the loader can make use of this object program to convert it to executable form.
3) Absolute Loader- An absolute loader is a kind of loader in which relocated object files are created, and the loader accepts these files and places them at specified locations in the memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader As the name itself says, it does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know the memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls on certain Input/Output functions like printf(), scanf() etc. which are not written by the programmer himself. During program execution those standard functions must reside in the main memory. Furthermore, every time an Input/Output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during the execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in the storage It refers to adjustment of address fields and not to movement of a program
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called a relocating loader.
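As a small illustration of adding a constant to each relative address, the sketch below assumes the translator supplies one flag per word saying whether that word holds an address. The flag list, the sample words and the function name relocate are all hypothetical.

```python
def relocate(words, relocation_bits, translated_origin, load_origin):
    """Add the relocation constant to every word flagged as an address."""
    constant = load_origin - translated_origin
    return [
        word + constant if is_address else word
        for word, is_address in zip(words, relocation_bits)
    ]


# Program translated with start address 100 but loaded at 500.
# Words 1 and 3 hold addresses; words 0 and 2 are plain data and stay unchanged.
words = [7, 120, 42, 180]
flags = [False, True, False, True]
print(relocate(words, flags, translated_origin=100, load_origin=500))
```

Note that only the address fields change. The words themselves are copied, not "moved", which is the distinction the notes draw between relocation and simple movement.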
Relocating Loader To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of the assembler for a relocating loader is the object program and information about all other programs it references. In addition there is information (relocation information) as to locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability while at the same time allowing independent translations of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader As we turn on the computer there is nothing meaningful in the main memory (RAM) A small program is written and stored in the ROM This program initially loads the operating system from secondary storage to main memory The operating system then takes the overall control This program which is responsible for booting up the system is called bootstrap loader
• Linkage editor-
Performs linking prior to load time. A linkage editor
» Produces a linked version of the program (often called a load module or an executable image) which is written to a file or library for later execution
Dynamic Linking-
Linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is called dynamic linking, dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments
A typical overlay tree ROOT calls A and D A calls B and C D calls E and F
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help since the entire path from the root is already loaded.
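The behaviour of the overlay manager on the example tree can be sketched as follows. This is a simplified model: it only tracks which path is currently in memory and does not copy any code. The names PARENT, path_to and OverlayManager are hypothetical.

```python
# Overlay tree from the example: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the segments from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]  # the root segment is loaded at program start

    def call(self, target):
        """Ensure the whole path to the target is loaded; return the segments read in."""
        wanted = path_to(target)
        # Count how much of the wanted path is already in memory.
        keep = 0
        while (keep < len(self.loaded) and keep < len(wanted)
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        newly_loaded = wanted[keep:]
        if newly_loaded:
            # Siblings share memory, so the old tail of the path is overwritten.
            self.loaded = wanted
        return newly_loaded


manager = OverlayManager()
print(manager.call("B"))   # the root calls into B: A and B are both loaded
print(manager.call("A"))   # upward call from B to A: nothing to load
print(manager.call("E"))   # D and E replace A and B
```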
(fig Overlay tree: ROOT at the top, A and D below ROOT, B and C below A, E and F below D)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in the memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, then it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded in the main memory only at the time of execution
2 The system can be dynamically reconfigured
Disadvantages
The linking and loading need to be postponed until the execution. During the execution, if any subroutine is needed, then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory
-> Let's take a quick look at who performs each of the above four functions for our three loaders: absolute loader, relocating loader and direct linking loader respectively
Function - Absolute loader / Relocating loader / Direct linking loader
Allocation - Programmer / Programmer / Programmer
Linking - Programmer / Programmer / Loader
Relocation - Assembler / Loader / Assembler
Loading - Loader / Loader / Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory card (ESD)
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Copy of object program
IPLA parameter (Initial Program Load Address) supplied by the OS or programmer to specify the address to load the first segment
PLA counter (Program Load Address) to keep track of each segment location
External Symbol Directory card (ESD) to store each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA) to establish correspondence between ESD ID used in ESD & RLD cards
ESD (external symbol dictionary)
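The two passes can be sketched on a toy object format. This is only a sketch under simplifying assumptions: each segment is a Python dictionary rather than a card deck, "defines" plays the role of the ESD entries, "text" of the TXT cards, and "rld" lists (offset, symbol) pairs whose words need a symbol address added. All field and function names are hypothetical.

```python
def pass1(segments, ipla):
    """Assign a load address to each segment and build the external symbol table."""
    symtab, pla = {}, ipla
    for seg in segments:
        symtab[seg["name"]] = pla                 # segment starts at the current PLA
        for symbol, offset in seg["defines"].items():
            symtab[symbol] = pla + offset         # externally visible symbols
        pla += len(seg["text"])                   # advance the PLA counter
    return symtab


def pass2(segments, symtab):
    """Load the text, then add symbol addresses to the words listed for relocation."""
    memory = {}
    for seg in segments:
        base = symtab[seg["name"]]
        for offset, word in enumerate(seg["text"]):
            memory[base + offset] = word
        for offset, symbol in seg["rld"]:
            # Relocation (own segment name) and linking (external symbol) look the same.
            memory[base + offset] += symtab[symbol]
    return memory


seg_a = {"name": "A", "defines": {}, "text": [10, 0, 4], "rld": [(1, "B"), (2, "A")]}
seg_b = {"name": "B", "defines": {"BDATA": 1}, "text": [20, 30], "rld": []}

table = pass1([seg_a, seg_b], ipla=500)
print(table)
print(pass2([seg_a, seg_b], table))
```

Two passes are needed because segment A refers to B before the loader has seen B. Only after pass 1 has assigned every address can pass 2 patch the references.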
Data Structures
Algorithm
Binary Symbolic Subroutine Loader- The assembler assembles the program and provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text in a file rather than memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages. Some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
• Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table with the search key.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same. Otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• ordered table
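Both search methods can be sketched on a small symbol table keyed by symbol name. The table contents and the function names are made up for illustration. Binary search assumes the table is already ordered by name.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Halve a table ordered by name until the key is found; return its index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = sorted_table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1   # continue in the left sub-array
        else:
            low = middle + 1    # continue in the right sub-array
    return -1


# Hypothetical symbol table: (symbol name, address), ordered by name.
symbols = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108), ("LOOP", 112)]
print(linear_search(symbols, "GAMMA"), binary_search(symbols, "GAMMA"))
```

Linear search works on any table but may inspect every entry. Binary search inspects about log2(n) entries but requires the ordered table.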
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
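A minimal hash table sketch follows. Two choices here are assumptions, not taken from the notes: the hash function is the sum of character codes modulo the number of slots, and collisions are resolved by keeping a short list (chain) in each slot.

```python
class HashTable:
    """Fixed number of slots; collisions resolved by chaining (an assumption)."""

    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]

    def _slot(self, key):
        # Hash function: sum of character codes modulo the number of slots.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        chain = self.slots[self._slot(key)]
        for i, (existing, _) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)   # replace the value of an existing key
                return
        chain.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None


table = HashTable()
table.put("LOOP", 112)
table.put("POOL", 200)   # same letters as LOOP, so it lands in the same slot
print(table.get("LOOP"), table.get("POOL"), table.get("MISSING"))
```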
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine processable formulas Like languages in linguistics formal languages generally have two aspects
the syntax of a language is what the language looks like (more formally the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language a set of strings The two main categories of formal grammar are that of generative grammars which are sets of rules for how strings in a language can be generated and that of analytic grammars which are sets of rules for how a string can be analyzed to determine whether it is a member of the language
Rule of Inference
A rule of inference inference rule or transformation rule is the act of drawing a conclusion based on the form of premises interpreted as a function which takes premises analyzes their syntax and returns a conclusion
Examples of Formal system-
Set theory, Boolean algebra, Propositional and predicate calculus, Backus Normal Form etc
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this as L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s → α, where s is a symbol and α is a sequence of symbols, is called a Production Rule
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ →G γ (a single step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α → β ∈ P of G
where σ and τ represent arbitrary strings
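The single step derivation can be tried out directly. The sketch below represents strings as Python strings with one character per symbol and productions as (α, β) pairs. The sample grammar S → aSb, S → ab is an assumption chosen for illustration, and the function name is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            # mu = sigma + alpha + tau, so gamma = sigma + beta + tau.
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("S", P)))
print(sorted(immediate_derivations("aSb", P)))
```

In the second call σ is "a" and τ is "b". Each production is applied to the single occurrence of S between them.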
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (nonterminals N, terminal alphabet Σ, production rules P, start symbol S) that satisfies the following two conditions
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b. If S → ε is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar because of a violation of condition (a)
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language
Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions
a. β is a terminal symbol
b. β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
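The conditions of the hierarchy can be checked mechanically. The sketch below assumes the usual reading of the conditions: for Type 1, every rule α → β other than S → ε has |α| ≤ |β|, and S does not occur on a right-hand side when S → ε is present; for Type 2, every α is a single nonterminal; for Type 3, every β is a terminal optionally followed by one nonterminal. Symbols are single characters, ε is the empty string, and the sample grammars are made up for illustration.

```python
def is_type1(productions, start="S"):
    has_empty = (start, "") in productions
    for alpha, beta in productions:
        if (alpha, beta) == (start, ""):
            continue
        if len(alpha) > len(beta):            # violates condition (a)
            return False
        if has_empty and start in beta:       # violates condition (b)
            return False
    return True


def is_type2(productions, nonterminals, start="S"):
    return is_type1(productions, start) and all(
        len(alpha) == 1 and alpha in nonterminals for alpha, _ in productions)


def is_type3(productions, nonterminals, start="S"):
    def right_side_ok(beta):
        if len(beta) == 1:
            return beta not in nonterminals   # a single terminal
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)  # terminal followed by nonterminal
    return is_type2(productions, nonterminals, start) and all(
        right_side_ok(beta) for alpha, beta in productions
        if (alpha, beta) != (start, ""))


regular = [("S", "aA"), ("A", "b"), ("A", "bA")]
context_free = [("S", "aSb"), ("S", "ab")]
print(is_type3(regular, {"S", "A"}), is_type3(context_free, {"S"}))
```

Each class is defined as a restriction of the previous one, so the checks call each other in the same nested way as the hierarchy.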
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy where A ∈ V and x, v, y ∈ (V ∪ T)*
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A → v while the x and y provide the context in which the rule may be applied
A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals and in textbooks on programming language theory
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are non-terminals and are always enclosed between the pair <>
The ::= means that the symbol on the left must be replaced with the expression on the right
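A small sketch can make the notation concrete. It assumes the usual BNF separator ::= between the left and right sides, one rule per line, and symbols separated by blanks. The sample grammar for binary numbers and both function names are made up for illustration.

```python
def parse_bnf(text):
    """Turn lines of the form '<symbol> ::= alt1 | alt2' into a dict of alternatives."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # Each alternative is a sequence of symbols; | separates the choices.
        alternatives = [alt.split() for alt in right.split("|")]
        rules[left.strip()] = alternatives
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)


GRAMMAR = """
<digit> ::= 0 | 1
<number> ::= <digit> | <digit> <number>
"""

rules = parse_bnf(GRAMMAR)
print(rules)
print(sorted(terminals(rules)))
```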
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form
Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax directed translators, proof checking systems and linguistical analysis. Some mathematical properties of the classes of canonic systems are also presented
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems are very useful for explicitly and concisely defining sets of strings, such as computer languages. The question is whether a canonic system can also be used as a basis for recognizing strings from the defined sets.
The recognition algorithm is capable of recognizing strings produced by a Backus-Naur system.
This is the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators convert texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read and broken into its constituent parts, producing an intermediate form. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
- Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
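The token list above can be reproduced with a small scanner. The following is only a sketch: the token class names (IDENTIFIER, NUMBER and so on) and the function name scan are assumptions made for illustration, and a real scanner would cover the whole language.

```python
import re

# Hypothetical token classes for this sketch only.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"\d+"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),          # blanks separate tokens but are not tokens themselves
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Break the source text into (kind, lexeme) pairs, dropping blanks."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for token in scan("total = count + rate * 10"):
        print(token)
```

Each printed pair corresponds to one entry of the list above, in the same order.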
- Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N*M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1 The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2 The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3 The lexeme y is mapped to the token <id,2>.
4 The lexeme + is mapped to the token <+>.
5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6 The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
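The mapping from lexemes to <token-name, attribute-value> pairs can be sketched as follows. This is an illustration under assumptions: the function name is made up, the symbol table is a plain list, and the number token simply keeps the lexeme text, which sidesteps the storage question raised in item 5.

```python
def tokenize_with_symbol_table(lexemes):
    """Map lexemes to (token-name, attribute-value) pairs.

    Identifiers get the 1-based index of their symbol-table entry as attribute;
    operators and punctuation carry no attribute (None).
    """
    symbol_table = []   # entry i-1 holds the name of identifier number i
    tokens = []
    for lexeme in lexemes:
        if lexeme[0].isalpha():
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))   # kept as text in this sketch
        else:
            tokens.append((lexeme, None))
    return tokens, symbol_table


if __name__ == "__main__":
    tokens, table = tokenize_with_symbol_table(["x3", "=", "y", "+", "3", ";"])
    print(tokens)
    print(table)
```

Running it on the lexemes of the example statement gives id tokens with indexes 1 and 2, matching items 1 and 3 above.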
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
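A small hand-written parser shows how the token stream is turned into a syntax tree with operators as interior nodes. This is a sketch under assumptions: tokens are (kind, value) pairs, the tree is built from nested tuples, and the recursive rule expr → expr + expr is handled with a loop that groups + to the left, which is one common way to parse it and not necessarily the method intended in these notes.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  where expr is operand { + operand }.

    Returns a syntax tree of nested tuples: ("=", target, expression).
    """
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind!r} at token {pos}")
        value = tokens[pos][1]
        pos += 1
        return value

    def operand():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            node = tokens[pos]
            pos += 1
            return node
        raise SyntaxError(f"expected an operand at token {pos}")

    def expr():
        nonlocal pos
        node = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            pos += 1
            node = ("+", node, operand())   # operator becomes an interior node
        return node

    target = ("id", expect("id"))
    expect("=")
    tree = ("=", target, expr())
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after the statement")
    return tree


if __name__ == "__main__":
    statement = [("id", "x3"), ("=", None), ("id", "y"), ("+", None), ("number", "3"), (";", None)]
    print(parse_assignment(statement))
```

The same parser rejects the token stream for x 3 = y + 3; because after the identifier x it expects = and finds a number.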
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
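The quads above can be produced by a post-order walk of the semantic tree. The sketch below assumes a hypothetical tree encoding: nested tuples of the form (operation, child, ...) with strings as leaves, and an ("assign", name, subtree) node at the root. Unused source fields are simply left out of the tuple instead of being written as --.

```python
def generate_quads(tree):
    """Walk the tree bottom-up and emit (operation, target, source1[, source2]) quads."""
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):          # leaf: identifier or constant
            return node
        operation, *children = node
        sources = [walk(child) for child in children]   # children first
        counter += 1
        target = f"temp{counter}"          # assumes an unlimited supply of temporaries
        quads.append((operation, target, *sources))
        return target

    operation, target, value = tree        # root is ("assign", name, subtree)
    quads.append((operation, target, walk(value)))
    return quads


if __name__ == "__main__":
    semantic_tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
    for quad in generate_quads(semantic_tree):
        print(*quad)
```

For the example tree this prints the same four quads, in the same order, as the listing above.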
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1 Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2 The last two quads can be combined into
realtoint id1 temp2
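Both optimizations can be written as one pass over the quad list. This is a sketch, not a general optimizer: it assumes every temporary is used exactly once, directly after it is defined, which holds for the example but would have to be checked in real code. The function name optimize is an assumption.

```python
def optimize(quads):
    """Fold int-to-real conversions of constants and merge trailing copies."""
    constants = {}      # temporary name -> constant computed at compile time
    result = []
    for operation, target, *sources in quads:
        # Substitute any temporaries that were folded into constants.
        sources = [constants.get(source, source) for source in sources]
        if operation == "inttoreal" and sources[0].isdigit():
            constants[target] = sources[0] + ".0"     # optimization 1
            continue
        if operation == "assign" and result and result[-1][1] == sources[0]:
            # Optimization 2: let the previous quad write the final target directly.
            previous_operation, _, *previous_sources = result.pop()
            result.append((previous_operation, target, *previous_sources))
            continue
        result.append((operation, target, *sources))
    return result


if __name__ == "__main__":
    quads = [
        ("inttoreal", "temp1", "3"),
        ("add", "temp2", "id2", "temp1"),
        ("realtoint", "temp3", "temp2"),
        ("assign", "id1", "temp3"),
    ]
    for quad in optimize(quads):
        print(*quad)
```

On the four example quads the pass leaves the two quads given in items 1 and 2.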
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1 id2
ADDF R1 R1 3.0 (add float)
RTOI R2 R1 (real to int)
ST id1 R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
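The idea that the parser calls the scanner whenever it needs another token can be sketched with a Python generator. Both functions are stand-ins with made-up names: the "scanner" only splits on blanks and the "parser" only collects tokens. The shared log shows that the two phases are interleaved in a single pass, with no intermediate token file.

```python
def scanner(text, log):
    """Produce one lexeme each time the consumer asks for it."""
    for lexeme in text.split():
        log.append(f"scan {lexeme}")
        yield lexeme


def parser(tokens, log):
    """Stand-in parser: pulls tokens from the scanner one at a time."""
    statement = []
    for token in tokens:                 # each iteration resumes the scanner
        log.append(f"parse {token}")
        statement.append(token)
    return statement


if __name__ == "__main__":
    log = []
    parser(scanner("x3 = y + 3 ;", log), log)
    print(log)
```

The log alternates between scan and parse entries, which is what combining two phases into one pass means in practice.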
Compilers
1 Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2 Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form or syntax of legal statements in the language.
3 Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4 Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5 Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6 Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7 Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST)
• A pass can correspond to a "phase" but it does not have to
• Sometimes a single pass corresponds to several phases that are interleaved in time
• What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
(Fig Dependency diagram of a typical single pass compiler: the Compiler Driver calls the Syntactic Analyzer, which calls the Contextual Analyzer and the Code Generator)
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
(Fig Dependency diagram of a typical multi pass compiler: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer and the Code Generator; Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
.ORIG tells the assembler where in memory to place the program. In line 05, .ORIG x3050 says, start with location x3050. As a result, the LEA R2,NUMBER instruction will be put in location x3050.
.FILL
.FILL tells the assembler to set aside the next location in the program and initialize it with the value of the operand. In line 15, the ninth location in the resultant program is initialized to the value x0006.
.BLKW
.BLKW tells the assembler to set aside some number of sequential memory locations (i.e. a BLocK of Words) in the program. The actual number is the operand of the .BLKW pseudo-op. In line 11, the pseudo-op instructs the assembler to set aside one location in memory.
Lecture_no16 surprise test
Lecture_no17 MODULE-2
Assembler
Why learn assembler?
Assembler or other languages, that is the question. Why should I learn another language if I have already learned other programming languages? The best argument: while you live in France you are able to get through by speaking English, but you will never feel at home then, and life remains complicated. You can get through with this, but it is rather inappropriate. If things need a hurry, you should use the country's language.
Assembler
To translate mnemonic operation codes to their machine language equivalents and assign machine addresses to symbolic labels used by the programmer.
Disassembler
To translate machine code back to its assembly source.
Cross Assembler
The assembler may run on one machine and produce object code for another.
An assembler is a program that accepts as input an assembly language program and produces its machine language equivalent, along with information for the loader.
(Fig Assembler)
For example, the externally defined symbols (library programs) must be indicated to the loader. The assembler does not know the address of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine) representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
-> The assembler can also process assembler directives
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
-> Pseudo-instructions: not translated into machine instructions; they provide information to the assembler
-> Basic assembler directives-
• START - Specify the name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate a one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line Each statement follows the following format
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional. A basic instruction has two parts: the first one is the name of the instruction (or the mnemonic) which is to be executed, and the second is the operands or the parameters of the command.
An assembly language statement contains the following fields:
-> Label field - an identifier; this field is optional. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A…Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field - this field contains a mnemonic, which stands for an operation code or pseudo-op. A machine code instruction may require additional information, i.e. operands.
-> Operand field - specifies either the address or the data that the opcode requires.
-> Comment field - allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
Assembly language statements are of 3 types:
(i) Imperative statements
(ii) Declarative statements
(iii) Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler-
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler design can be done in
- Single pass
- Two pass
(Fig Two pass assembler: Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement, its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references without extra restrictions
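One common way to deal with forward references in a single pass is back-patching: the address field is left empty when the label is still unknown and is filled in once the label is defined. The sketch below illustrates that technique under stated assumptions; it is not taken from these notes. The opcode values in OPTAB are invented, every instruction occupies one location, each source line is a (label, mnemonic, operand) tuple, and the whole object code is kept in memory.

```python
OPTAB = {"LOAD": 1, "ADD": 2, "STORE": 3, "JUMP": 4}   # hypothetical opcodes


def assemble_one_pass(lines):
    """Assemble (label, mnemonic, operand) lines in one pass with back-patching."""
    symtab = {}     # label -> address
    pending = {}    # undefined label -> indexes of instructions waiting for it
    code = []       # one [opcode, address] pair per instruction
    for label, mnemonic, operand in lines:
        address = len(code)
        if label:
            symtab[label] = address
            for index in pending.pop(label, []):
                code[index][1] = address          # fill in earlier forward references
        if operand in symtab:
            code.append([OPTAB[mnemonic], symtab[operand]])
        else:
            pending.setdefault(operand, []).append(address)
            code.append([OPTAB[mnemonic], None])  # address not known yet
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code


if __name__ == "__main__":
    program = [
        (None, "JUMP", "END"),       # forward reference to END
        ("LOOP", "LOAD", "LOOP"),
        ("END", "STORE", "LOOP"),
    ]
    print(assemble_one_pass(program))
```

The restriction here is that the object code must stay in memory until the end of the pass, which is the kind of limitation the paragraph above refers to.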
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB)
OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB)
SYMTAB is used to store values (addresses) assigned to labels
• The Location Counter (LOCCTR)
This is a variable that is used to help in the assignment of addresses: LOCCTR ← LOCCTR + (instruction length)
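How OPTAB, SYMTAB and LOCCTR work together in Pass 1 can be sketched as below. The sketch assumes a machine with 3-byte instructions and 3-byte words, so the lengths stored in this hypothetical OPTAB are illustrative, and each source line is given as a (label, mnemonic, operand) tuple that has already been split into fields.

```python
OPTAB = {"LDA": 3, "ADD": 3, "STA": 3}   # hypothetical: mnemonic -> instruction length in bytes
WORD_SIZE = 3                            # assumed word size in bytes


def pass_one(lines, start=0):
    """Assign an address to every statement and collect the labels in SYMTAB."""
    symtab = {}
    locctr = start
    intermediate = []                    # (address, label, mnemonic, operand) for Pass 2
    for label, mnemonic, operand in lines:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, label, mnemonic, operand))
        # LOCCTR <- LOCCTR + (length of this statement)
        if mnemonic in OPTAB:
            locctr += OPTAB[mnemonic]
        elif mnemonic == "WORD":
            locctr += WORD_SIZE
        elif mnemonic == "RESW":
            locctr += WORD_SIZE * int(operand)
        elif mnemonic == "RESB":
            locctr += int(operand)
        else:
            raise ValueError(f"invalid operation code {mnemonic}")
    return symtab, intermediate, locctr - start


if __name__ == "__main__":
    source = [
        ("FIRST", "LDA", "ALPHA"),
        (None, "ADD", "ONE"),
        (None, "STA", "BETA"),
        ("ALPHA", "RESW", "1"),
        ("ONE", "WORD", "1"),
        ("BETA", "RESW", "1"),
    ]
    symtab, _, length = pass_one(source, start=0x1000)
    print({name: hex(address) for name, address in symtab.items()}, length)
```

The third return value is the program length, which Pass 2 needs when it writes the object program.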
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are-
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1: H
Col 2~7: Program name
Col 8~13: Starting address (hex)
Col 14~19: Length of object program in bytes (hex)
Text record
Col 1: T
Col 2~7: Starting address in this record (hex)
Col 8~9: Length of object code in this record in bytes (hex)
Col 10~69: Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1: E
Col 2~7: Address of first executable instruction (hex) (END program_name)
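The column layout above maps directly onto fixed-width string formatting. The functions below are a sketch with assumed names; the object code is passed in as a string of hex digits, and the sample values in the usage lines are made up for illustration.

```python
def header_record(name, start, length):
    """Col 1 'H', col 2~7 program name, col 8~13 start address, col 14~19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start, object_code_hex):
    """Col 1 'T', col 2~7 start address, col 8~9 length in bytes, col 10~69 object code."""
    byte_count = len(object_code_hex) // 2
    if byte_count > 30:                   # columns 10~69 hold 60 hex digits = 30 bytes
        raise ValueError("a text record holds at most 30 bytes of object code")
    return f"T{start:06X}{byte_count:02X}{object_code_hex}"


def end_record(first_instruction):
    """Col 1 'E', col 2~7 address of the first executable instruction."""
    return f"E{first_instruction:06X}"


if __name__ == "__main__":
    print(header_record("COPY", 0x1000, 0x107A))
    print(text_record(0x1000, "141033482039001036"))
    print(end_record(0x1000))
```

The program name is padded or truncated to six columns so that every field starts in the column given in the table.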
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1: M
Col 2-7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9: Length of the address field to be modified, in half-bytes
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to an assembly language instruction of Intel's 8-bit microprocessor.
Example
Macro definition:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X,Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X,Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
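Macro definition and expansion can be sketched in a few lines. The function below is an illustration under assumptions: the source is a list of text lines, every definition is closed by ENDM, parameters are separated by commas, and substitution is plain text replacement (so a parameter name that is a prefix of another, such as &A and &AB, would need more care).

```python
def expand_macros(lines):
    """Collect MACRO ... ENDM definitions and expand every call to them."""
    macros = {}      # macro name -> (formal parameters, model statements)
    output = []
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        if line == "MACRO":
            # The prototype follows the header: name and formal parameters.
            name, _, params = lines[i + 1].strip().partition(" ")
            formals = [p.strip() for p in params.split(",") if p.strip()]
            body = []
            i += 2
            while lines[i].strip() != "ENDM":
                body.append(lines[i].strip())
                i += 1
            macros[name] = (formals, body)
        else:
            name, _, args = line.partition(" ")
            if name in macros:
                formals, body = macros[name]
                actuals = [a.strip() for a in args.split(",")]
                for model in body:
                    # Positional substitution of actual for formal parameters.
                    for formal, actual in zip(formals, actuals):
                        model = model.replace(formal, actual)
                    output.append(model)
            else:
                output.append(line)      # ordinary statement: copy unchanged
        i += 1
    return output


if __name__ == "__main__":
    source = ["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM", "INCRMT X,Y"]
    print(expand_macros(source))
```

For the INCRMT example the call is replaced by the three statements LOAD X, ADD Y and STORE X, and the definition itself does not appear in the expanded code.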
Defining a Macro
Let us take another look at the macro definition unit.
Macro definition:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X,Y
Macro expansion:
LOAD X
ADD Y
STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used to define a macro
MACRO indicates the beginning of a macro definition. MEND indicates the end of a macro definition.
-> Prototype for the macro
The prototype gives the name of the macro and its parameter list. Each parameter begins with &.
-> Body
The statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement is expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro the term used is parameter. In the macro invocation the term used is argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
    macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called the formal and actual parameter lists, are specified in the prototype and macro call statements respectively. They establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first formal parameter, and so on. Consider the prototype and macro call statements once again:
    INCRMT &A, &B    prototype
    INCRMT X, Y      macro call
We see that X is paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
    LOAD X
    ADD Y
    STORE X
Example
    STRG DATA1, DATA2, DATA3
    GENER ,,DIRECT,,,,,,3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
    STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
    GENER TYPE=DIRECT, CHANNEL=3
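The binding of keyword arguments to formal parameters can be illustrated with a minimal Python sketch. The function name bind_keyword_arguments and the parameter list given for GENER are assumptions made for illustration; the notes do not give GENER's full prototype.

```python
def bind_keyword_arguments(formals, call_arguments):
    """Pair keyword arguments with formal parameters by name, not by position."""
    binding = {formal: "" for formal in formals}  # omitted parameters stay null
    for argument in call_arguments:
        keyword, separator, value = argument.partition("=")
        formal = "&" + keyword.lstrip("&")
        if not separator or formal not in binding:
            raise ValueError(f"unknown keyword argument: {argument}")
        binding[formal] = value
    return binding


# Hypothetical prototype: GENER &MODE, &TYPE, &CHANNEL
GENER_FORMALS = ["&MODE", "&TYPE", "&CHANNEL"]

# The order of the arguments in the call does not matter.
binding = bind_keyword_arguments(GENER_FORMALS, ["CHANNEL=3", "TYPE=DIRECT"])
print(binding)
```

Because each value is named, arguments may appear in any order and omitted parameters need no placeholder.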
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
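The three steps can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions: the names tokens, process and expand are hypothetical, AIF and AGO are not handled, and every definition is assumed to end with ENDM.

```python
def tokens(line):
    """Split a statement into fields, treating commas as separators."""
    return line.replace(",", " ").split()


def process(source_lines):
    mnt = {}     # Macro Name Table: macro name -> index of its prototype in MDT
    mdt = []     # Macro Definition Table: prototype, model statements, ENDM
    output = []
    lines = iter(source_lines)
    for line in lines:
        fields = tokens(line)
        if not fields:
            continue
        if fields[0] == "MACRO":
            # Step 1: enter the name in MNT and store the definition in MDT.
            prototype = next(lines)
            mnt[tokens(prototype)[0]] = len(mdt)
            mdt.append(prototype)
            for body_line in lines:
                mdt.append(body_line)
                if body_line.strip() == "ENDM":
                    break
        elif fields[0] in mnt:
            # Step 2: a macro call was detected; expand it.
            output.extend(expand(fields, mnt, mdt))
        else:
            output.append(line)
    return output


def expand(call_fields, mnt, mdt):
    # Step 3: copy model statements from MDT, substituting actual parameters.
    index = mnt[call_fields[0]]
    formals = tokens(mdt[index])[1:]
    binding = dict(zip(formals, call_fields[1:]))
    expanded = []
    index += 1
    while mdt[index].strip() != "ENDM":
        expanded.append(" ".join(binding.get(f, f) for f in tokens(mdt[index])))
        index += 1
    return expanded


program = [
    "MACRO",
    "INCRMT &A, &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X, Y",
    "HALT",
]
expanded_program = process(program)
print("\n".join(expanded_program))
```

The sketch handles definition and expansion in one scan, which works only because each macro is defined before it is called.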
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro- the statements of the expansion are generated each time the macro is invoked.
Subroutine- the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions are assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  Holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB)
  Role: an index to DEFTAB. Holds pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB)
  Used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
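The conversion to positional notation and the later substitution from ARGTAB can be sketched in Python as follows. The ?n marker for the n-th parameter and the function names to_positional and substitute are assumptions made for this illustration.

```python
import re


def to_positional(prototype_params, body_lines):
    """Rewrite each parameter reference as ?n, where n is its 1-based position."""
    # Replace longer names first so that &AB is not mistaken for &A.
    numbered = sorted(enumerate(prototype_params, start=1),
                      key=lambda pair: -len(pair[1]))
    converted = []
    for line in body_lines:
        for position, name in numbered:
            line = line.replace(name, f"?{position}")
        converted.append(line)
    return converted


def substitute(deftab_lines, argtab):
    """Replace each ?n marker by the n-th argument stored in ARGTAB."""
    def lookup(match):
        return argtab[int(match.group(1)) - 1]
    return [re.sub(r"\?(\d+)", lookup, line) for line in deftab_lines]


deftab_body = to_positional(["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])
print(deftab_body)

argtab = ["X", "Y"]  # arguments of the call INCRMT X, Y, stored by position
print(substitute(deftab_body, argtab))
```

Storing positions instead of names means expansion needs only an index into ARGTAB, not a search through parameter names.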
(fig NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass1: Recognize macro definitions
» Pass2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
(fig Macro processor procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program. Such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is now ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader- In this type of loader the instruction is read line by line, its machine code is obtained, and it is placed directly in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme- In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader- An absolute loader is a kind of loader in which relocated object files are created. The loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, the absolute loader does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer must know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called a relocating loader.
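As a minimal Python sketch of this adjustment, assume a segment is a list of words and that the translator has supplied the offsets of the words holding addresses. The function name relocate and the sample values are hypothetical.

```python
def relocate(words, address_offsets, translated_origin, load_origin):
    """Add the relocation constant to every address-bearing word of a segment."""
    relocation_constant = load_origin - translated_origin
    relocated = list(words)
    for offset in address_offsets:
        relocated[offset] += relocation_constant
    return relocated


# A segment translated with start address 100: words 1 and 3 hold addresses.
segment = [5, 103, 7, 100]
relocated_segment = relocate(segment, address_offsets=[1, 3],
                             translated_origin=100, load_origin=400)
print(relocated_segment)
```

Only the address fields change. The other words are copied unmodified, which is why the loader needs relocation information telling it which words to adjust.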
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of the assembler for a relocating loader is the object program and information about all other programs it references. In addition there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking-
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is known as dynamic linking, dynamic loading, or load on call.
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader performs
» all linking and relocation operations
» automatic library search
» loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
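The behaviour of the overlay manager on the example tree can be sketched in Python. The class OverlayManager, the PARENT table and the function path_to are hypothetical names for this illustration, and "loading" is modelled simply as recording which segments are resident.

```python
# Overlay tree from the example: ROOT -> A, D; A -> B, C; D -> E, F.
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path of segments from ROOT down to the given segment."""
    path = [segment]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return list(reversed(path))


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]  # the root is loaded when the program starts

    def call(self, target_segment):
        """Ensure the path to the target is loaded; return the segments read in."""
        wanted = path_to(target_segment)
        keep = 0
        while (keep < min(len(self.loaded), len(wanted))
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        if keep == len(wanted):
            return []  # upward call or already resident: nothing to load
        newly_loaded = wanted[keep:]
        self.loaded = wanted  # siblings are overwritten, sharing the same memory
        return newly_loaded


manager = OverlayManager()
print(manager.call("B"))  # root calls a routine in B
print(manager.call("A"))  # upward call from B to A
print(manager.call("C"))  # C overlays its sibling B
print(manager.call("E"))  # D and E overlay A and C
```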
(fig Overlay tree: ROOT at the top with children A and D, A with children B and C, D with children E and F)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. If a subroutine is needed during execution, execution must be suspended until the required subroutine has been loaded into main memory.
-> Let's take a quick look at who performs each of the above four functions for our three loaders: absolute loader, relocating loader and direct linking loader.

Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error unless a special mechanism, such as declaring B as an external symbol, is provided.
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation and Linkage Directory)
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Copy of the object program
IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
PLA counter (Program Load Address), to keep track of each segment location
External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD and RLD cards
ESD (external symbol dictionary)
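The division of work between the two passes can be sketched in Python. This is a simplified illustration and not the card-based design itself: segments are plain dictionaries, the field names (length, defines, text, relocate, external_refs) are assumptions, and memory is modelled as a dictionary from address to word.

```python
def pass1(segments, ipla):
    """Assign a load address to each segment and build the external symbol table."""
    symbol_table = {}
    load_address = {}
    pla = ipla  # Program Load Address counter starts at IPLA
    for segment in segments:
        load_address[segment["name"]] = pla
        symbol_table[segment["name"]] = pla
        for symbol, relative_address in segment["defines"].items():
            symbol_table[symbol] = pla + relative_address
        pla += segment["length"]
    return symbol_table, load_address


def pass2(segments, symbol_table, load_address):
    """Load the text, relocate address constants and resolve external references."""
    memory = {}
    for segment in segments:
        base = load_address[segment["name"]]
        for offset, word in enumerate(segment["text"]):
            memory[base + offset] = word
        for offset in segment["relocate"]:
            memory[base + offset] += base  # relative address -> absolute address
        for offset, symbol in segment["external_refs"].items():
            memory[base + offset] = symbol_table[symbol]
    return memory


segments = [
    {"name": "A", "length": 3, "defines": {}, "text": [0, 2, 0],
     "relocate": [1], "external_refs": {0: "B"}},
    {"name": "B", "length": 2, "defines": {}, "text": [7, 8],
     "relocate": [], "external_refs": {}},
]
symbols, addresses = pass1(segments, ipla=100)
memory = pass2(segments, symbols, addresses)
print(symbols)
print(memory)
```

Two passes are needed because segment A refers to B before the loader has seen B. Pass 1 fixes every address first, so pass 2 can fill in all references.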
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler assembles each program and provides the loader with
-> the object program + relocation information
-> prefixed with information about all other programs it references (transfer vector)
-> the length of the entire program
-> the length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
  - allocation, relocation and linking
• It outputs the text to a file rather than to memory
  - the output is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people with
   -> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages. Some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search.
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort.
To understand the idea of hashing as a search technique.
To introduce the map abstract data type.
To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table with the key.
• Binary search
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are the same. Otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
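The three search methods can be sketched in Python on a small symbol table keyed by symbol name. The table contents, the class name HashTable and the simple character-sum hash function are assumptions chosen for illustration. Collisions are handled here by chaining, which is one of several possible choices.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a table sorted by name; return the index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = sorted_table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1
        else:
            low = middle + 1
    return -1


class HashTable:
    def __init__(self, slot_count=11):
        self.slots = [[] for _ in range(slot_count)]  # slots numbered from 0

    def _slot(self, key):
        # Hash function: sum of character codes modulo the number of slots.
        return sum(ord(character) for character in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for index, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[index] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self.slots[self._slot(key)]:
            if existing_key == key:
                return value
        return None


# Symbol table sorted by name: (symbol name, address)
symbol_table = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108), ("LOOP", 112)]
print(linear_search(symbol_table, "GAMMA"))
print(binary_search(symbol_table, "GAMMA"))

hashed = HashTable()
for name, address in symbol_table:
    hashed.put(name, address)
print(hashed.get("LOOP"))
```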
Sorting
We will discuss four comparison-sort algorithms:
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3 A set of axioms or axiom schemata. Each axiom must be a wff.
4 A set of inference rules.
Components of a Formal System-
A formal system has the following components:
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and we should keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1-
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
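The definition can be turned into a short Python sketch that lists every string immediately derivable from a given string. The function name immediate_derivations and the two-rule grammar used below are hypothetical, and each symbol is assumed to be a single character.

```python
def immediate_derivations(mu, productions):
    """Return every gamma with mu = sigma alpha tau, gamma = sigma beta tau, alpha -> beta in P."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                # arbitrary string to the left
            tau = mu[start + len(alpha):]     # arbitrary string to the right
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar with P = { S -> aSb, S -> ab }
P = [("S", "aSb"), ("S", "ab")]
derived = immediate_derivations("aSb", P)
print(sorted(derived))
```

Applying the function repeatedly, starting from the start symbol, enumerates the sentential forms of the grammar.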
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b If S → ε is in P, then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1215 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
Adding a production rule of the form Bb → b to the modified grammar will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
Adding a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
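The conditions for Types 1, 2 and 3 can be checked mechanically. The Python sketch below assumes the usual reading of the conditions (length of the left side at most the length of the right side, the special rule from the start symbol to the empty string, a single nonterminal on the left, and right-hand sides of the form terminal or terminal followed by nonterminal). The function name grammar_type and the sample grammars are hypothetical, and every symbol is assumed to be a single character.

```python
def grammar_type(nonterminals, productions, start):
    """Return the most restrictive type (0 to 3) whose conditions the grammar meets."""
    def is_start_to_empty(alpha, beta):
        return alpha == start and beta == ""

    has_start_to_empty = any(is_start_to_empty(a, b) for a, b in productions)
    start_on_right = any(start in b for _, b in productions)

    # Type 1, condition (a): the right side is never shorter than the left side.
    condition_a = all(len(a) <= len(b) or is_start_to_empty(a, b)
                      for a, b in productions)
    # Type 1, condition (b): with S -> empty present, S never appears on a right side.
    condition_b = not (has_start_to_empty and start_on_right)
    if not (condition_a and condition_b):
        return 0

    # Type 2: every left side is a single nonterminal symbol.
    if not all(len(a) == 1 and a in nonterminals for a, _ in productions):
        return 1

    # Type 3: right side is a terminal, or a terminal followed by a nonterminal.
    def right_side_ok(beta):
        if len(beta) == 1:
            return beta not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    if all(is_start_to_empty(a, b) or right_side_ok(b) for a, b in productions):
        return 3
    return 2


print(grammar_type({"S"}, [("S", "aS"), ("S", "b")], "S"))
print(grammar_type({"S"}, [("S", "aSb"), ("S", "ab")], "S"))
print(grammar_type({"S", "E"}, [("S", "aE"), ("aE", "EaE"), ("E", "b")], "S"))
print(grammar_type({"S", "B"}, [("S", "aB"), ("B", "b"), ("Bb", "b")], "S"))
```

The last two calls mirror the remarks in the text: a rule of the form aE → EaE breaks the Type 2 condition, and a rule of the form Bb → b breaks condition (a) of Type 1.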
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
Adding a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
(fig Hierarchy of grammars)
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
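The rule format above can be illustrated with a minimal Python sketch. The function names parse_bnf and terminals and the tiny two-rule grammar are assumptions made for this illustration only, and the sketch assumes that | never occurs as a terminal.

```python
def parse_bnf(text):
    """Parse BNF rules into {nonterminal: [alternative, ...]}.

    Each alternative is a list of symbols. Assumes one rule per line and
    that '|' is only used as the choice separator.
    """
    grammar = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        left, right = line.split("::=", 1)
        grammar[left.strip()] = [alt.split() for alt in right.split("|")]
    return grammar


def terminals(grammar):
    """Symbols that never appear on a left side are terminals."""
    symbols = {s for alts in grammar.values() for alt in alts for s in alt}
    return symbols - set(grammar)


# Hypothetical example grammar (not taken from the notes).
RULES = """
<expr> ::= <term> | <expr> + <term>
<term> ::= x | y | 3
"""

grammar = parse_bnf(RULES)
print(grammar["<expr>"])           # the two alternatives for <expr>
print(sorted(terminals(grammar)))  # symbols with no defining rule
```

Storing each alternative as a list of symbols keeps the choice (|) and the sequence structure separate, which is what a recognizer for the grammar would need.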
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. They would be even more useful if a canonic system could also be used as a basis for recognizing strings from the defined sets.
The recognition algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no “cross-referencing” is used.
Lecture_no 43 Compiler
Compilers are basically “language translators”.
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---> Compiler ---> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings. In the synthesis part, the intermediate form of the source program produced by analysis is taken and converted into an equivalent target program.
->Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol =
– The identifier count
– The plus sign +
– The identifier rate
– The multiplication sign *
– The constant number 10
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end (shown in green and pink, respectively, in the accompanying figure).
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
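The mapping from lexemes to <token-name, attribute-value> pairs can be sketched in Python as follows. This is a minimal illustration, not the scanner of any particular compiler: the function name scan, the regular expression, and the choice to keep the number's text as its attribute are all assumptions.

```python
import re

# One alternative per token class; blanks before a lexeme are skipped.
TOKEN_RE = re.compile(r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<op>[=+;]))")


def scan(source):
    """Return (tokens, symbol_table) for a one-line statement.

    Identifiers become ("id", index), where index is the 1-based position
    of the name in the symbol table; numbers keep their text; the symbols
    = + ; carry no attribute.
    """
    symtab = []
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.group("id"):
            name = m.group("id")
            if name not in symtab:
                symtab.append(name)
            tokens.append(("id", symtab.index(name) + 1))
        elif m.group("number"):
            tokens.append(("number", m.group("number")))
        else:
            tokens.append((m.group("op"), None))
    return tokens, symtab


tokens, symtab = scan("x3 = y + 3;")
print(tokens)
print(symtab)
```

Scanning "x3   =y+ 3  ;" gives the same token list, while "x 3 = y + 3;" does not, because the blank separates x and 3 into two lexemes.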
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree shown in the accompanying figure.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree in the figure corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
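As a rough illustration of how a syntax tree with operators as interior nodes could be built, here is a small Python sketch. It is not the parser from the notes: the function name parse_assignment is hypothetical, tokens are plain strings, and the sketch assumes + is left-associative, which the grammar above does not itself specify.

```python
def parse_assignment(tokens):
    """Parse a token list for: asst-stmt -> id = expr ;

    The expression is read as operand { + operand }, so + groups to the
    left. Returns a nested tuple (operator, left, right).
    """
    pos = 0

    def expect(value):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != value:
            raise SyntaxError(f"expected {value!r} at token {pos}")
        pos += 1

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if not tok.isalnum():
            raise SyntaxError(f"expected id or number, got {tok!r}")
        pos += 1
        return tok

    def expr():
        nonlocal pos
        node = operand()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, operand())
        return node

    target = operand()
    if target[0].isdigit():
        raise SyntaxError("left side of = must be an identifier")
    expect("=")
    tree = ("=", target, expr())
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after ;")
    return tree


print(parse_assignment(["x3", "=", "y", "+", "3", ";"]))
```

The nested tuple plays the role of the syntax tree: the root is the assignment, its right child is the + node, and the leaves are the operands.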
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators “int to real” and “real to int”, as shown in the figure.
Intermediate code generation
Many compilers first generate code for an “idealized machine”. For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each “quad” has the form
operation target source1 source2
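The walk over the tree that produces these quads can be sketched in Python. This is only an illustration under stated assumptions: the function name gen_quads, the tuple encoding of the tree, the types dictionary, and the rule that unlisted operands are integer constants are all hypothetical, and only the + operator is handled.

```python
def gen_quads(tree, types):
    """Emit quadruples (operation, target, source1, source2) for a tree
    such as ('=', 'id1', ('+', 'id2', '3')).

    types maps variable names to 'int' or 'real'; any operand not listed
    is treated as an integer constant.
    """
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def convert(name, have, want):
        # Insert a unary conversion quad only when the types differ.
        if have == want:
            return name
        temp = new_temp()
        op = "inttoreal" if want == "real" else "realtoint"
        quads.append((op, temp, name, None))
        return temp

    def walk(node):
        if isinstance(node, str):
            return node, types.get(node, "int")
        _, left, right = node
        lname, ltype = walk(left)
        rname, rtype = walk(right)
        result_type = "real" if "real" in (ltype, rtype) else "int"
        lname = convert(lname, ltype, result_type)
        rname = convert(rname, rtype, result_type)
        temp = new_temp()
        quads.append(("add", temp, lname, rname))
        return temp, result_type

    _, target, expr = tree
    name, typ = walk(expr)
    name = convert(name, typ, types[target])
    quads.append(("assign", target, name, None))
    return quads


for quad in gen_quads(("=", "id1", ("+", "id2", "3")), {"id1": "int", "id2": "real"}):
    print(quad)
```

With id1 declared integer and id2 declared real, the sketch emits the same four quads as listed above, with None standing for the unused "--" operand.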
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see:
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
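The two optimizations just described can be sketched as a small pass over a list of quads. This is an illustrative Python sketch, not a real optimizer: the function name optimize is hypothetical, and the merge step assumes a temporary is used only by the assign that immediately follows its definition.

```python
def optimize(quads):
    """Apply two simple rewrites to (operation, target, source1, source2) quads."""
    # Step 1: perform inttoreal on integer literals at compile time.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{src1}.0"
            continue
        folded.append((op, target, constants.get(src1, src1), constants.get(src2, src2)))

    # Step 2: merge "op temp ..." followed directly by "assign x temp".
    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if op == "assign" and result and result[-1][1] == src1 and src1.startswith("temp"):
            prev = result.pop()
            result.append((prev[0], target, prev[2], prev[3]))
        else:
            result.append(quad)
    return result


QUADS = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(QUADS):
    print(quad)
```

On the four quads of the running example, only the add with the constant 3.0 and the realtoint into id1 remain.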
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2:
LD R1, id2
ADDF R1, R1, #3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
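Ambiguity can be made concrete with the rule expr → number | id | expr + expr used earlier in these notes: a sum of three operands can be grouped in two ways. The following Python sketch counts the parse trees for a sum of n operands under that rule; the function name count_parse_trees is an assumption for this illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(operands):
    """Number of parse trees for 'id + id + ... + id' with the given
    number of operands under expr -> id | expr + expr."""
    if operands == 1:
        return 1
    # The top-level + splits the operands into a left part and a right part.
    return sum(
        count_parse_trees(k) * count_parse_trees(operands - k)
        for k in range(1, operands)
    )


for n in range(1, 6):
    print(n, count_parse_trees(n))
```

Any count greater than 1 means the statement has more than one parse tree, so the grammar is ambiguous; a grammar that fixes associativity (for example, by only allowing the recursion on the left) would give exactly one tree.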
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A “pass” is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a “phase”, but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver calls the Syntactic Analyzer; the Syntactic Analyzer calls the Contextual Analyzer and the Code Generator.
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer and the Code Generator in turn.
Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit, because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
For example, the externally defined symbols (library programs) must be indicated to the loader; the assembler does not know the addresses of these symbols, and it is up to the loader to find the programs containing them, load them into memory, and place the values of these symbols in the calling program.
(Fig Program Translation Steps)
Fundamental assembler functions
• Translate mnemonic operation codes to their machine language equivalents
• Assign machine addresses to symbolic labels used by the programmer
Basic Assembler Functions
Convert mnemonic operation codes to machine language equivalents
Convert symbolic operands to machine addresses (pass 1)
Build machine instructions in the proper format
Convert data constants to internal (machine)representations
Error checking is provided
Write the object program and assembly listing files
Assembler directives
->The assembler can also process assembler directives
• Assembler directives (or pseudo-instructions) provide instructions to the assembler itself. They are not translated into machine instructions.
->Pseudo-instructions: not translated into machine instructions; they provide information to the assembler
->Basic assembler directives-
• START - Specify name and starting address for the program
• END - Indicate the end of the source program and (optionally) specify the first executable instruction in the program
• BYTE - Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant
• WORD - Generate one-word integer constant
• RESB - Reserve the indicated number of bytes for a data area
• RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax Assembly language statements
Assembly language statements are entered one statement per line. Each statement follows the following format:
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional. A basic instruction has two parts: the first one is the name of the instruction (or the mnemonic) which is to be executed, and the second is the operands or the parameters of the command.
An assembly language statement contains the following fields:
-> Label Field - It is an identifier and an optional field. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with valid characters (A…Z).
Ex-
START: LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H
-> Mnemonic/Opcode - This field contains a mnemonic, which stands for an operation code (i.e. a machine code instruction) or a pseudo-op. It may require additional information, i.e. operands.
-> Operand Field - It specifies either the address or the data that the opcode requires.
-> Comment Field - It allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are 3 types:
(i) Imperative statements
(ii) Declarative statements
(iii) Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler-
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler design can be done in
– Single pass
– Two pass
(Fig: Two-pass assembler. Source program -> Pass 1 -> Intermediate file -> Pass 2 -> Object codes; both passes use OPTAB and SYMTAB)
The intermediate file includes each source statement together with its assigned address and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references unless restrictions such as those described above are imposed
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels
• Location Counter (LOCCTR): This is a variable that is used to help in the assignment of addresses: LOCCTR ← LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
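The division of work between the two passes can be sketched in Python. This is a simplified illustration, not the assembler described in the flowcharts: it assumes a hypothetical machine where every instruction is 3 bytes (1-byte opcode plus 2-byte address) and a word is 3 bytes, a three-entry OPTAB, source statements already split into (label, opcode, operand) tuples, and no BYTE handling or error indicators.

```python
# Illustrative opcode table; mnemonics and values are assumptions for this sketch.
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}

PROGRAM = [
    ("COPY",  "START", "1000"),
    (None,    "LDA",   "ALPHA"),   # forward reference: ALPHA is defined later
    (None,    "ADD",   "ONE"),
    (None,    "STA",   "ALPHA"),
    ("ALPHA", "RESW",  "1"),
    ("ONE",   "WORD",  "1"),
    (None,    "END",   "COPY"),
]


def pass1(program):
    """Assign an address to every statement and build SYMTAB."""
    symtab, intermediate, locctr = {}, [], 0
    for label, op, operand in program:
        if op == "START":
            locctr = int(operand, 16)
        if op == "END":
            break
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, op, operand))
        # LOCCTR <- LOCCTR + (length of this statement)
        if op in OPTAB or op == "WORD":
            locctr += 3
        elif op == "RESW":
            locctr += 3 * int(operand)
        elif op == "RESB":
            locctr += int(operand)
    return symtab, intermediate


def pass2(symtab, intermediate):
    """Translate instructions and WORD constants using the complete SYMTAB."""
    object_code = []
    for loc, op, operand in intermediate:
        if op in OPTAB:
            object_code.append((loc, f"{OPTAB[op]:02X}{symtab[operand]:04X}"))
        elif op == "WORD":
            object_code.append((loc, f"{int(operand):06X}"))
    return object_code


symtab, intermediate = pass1(PROGRAM)
for loc, code in pass2(symtab, intermediate):
    print(f"{loc:04X} {code}")
```

The forward reference to ALPHA in the first instruction causes no difficulty here, because Pass 2 only starts after Pass 1 has recorded the address of every label.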
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are-
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1: H
Col 2-7: Program name
Col 8-13: Starting address (hex)
Col 14-19: Length of object program in bytes (hex)
Text record
Col 1: T
Col 2-7: Starting address in this record (hex)
Col 8-9: Length of object code in this record in bytes (hex)
Col 10-69: Object code; (69-10+1)/6 = 10 instructions
End record
Col 1: E
Col 2-7: Address of first executable instruction (hex) (END program_name)
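The column layout of the three record types can be illustrated with a short Python sketch. The function names and the sample values (program name COPY, start address 1000 hex, a few 3-byte object codes) are assumptions for this illustration only.

```python
def header_record(name, start, length):
    """H, program name (cols 2-7), start address (cols 8-13), length (cols 14-19)."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    """T, start address (cols 2-7), length in bytes (cols 8-9), object code (cols 10-69)."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 60 hex digits (columns 10-69)")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_exec):
    """E, address of the first executable instruction (cols 2-7)."""
    return f"E{first_exec:06X}"


print(header_record("COPY", 0x1000, 0x0F))
print(text_record(0x1000, ["001009", "18100C", "0C1009"]))
print(end_record(0x1000))
```

Each object code here is 6 hex digits (3 bytes), so the 60 available columns of one text record hold at most 10 such instructions, matching the (69-10+1)/6 = 10 calculation above.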
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses of memory.
How to solve:
a) Modification record
b) Base relative
c) Program counter relative
Modification record
Col 1: M
Col 2-7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9: Length of the address field to be modified, in half-bytes
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding of macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
Macro definition:
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X,Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Fig: Source code (with macro) -> Macro Processor -> Expanded code -> Compiler/Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the call
    INCRMT X, Y
leads to insertion of the assembly statements
    LOAD X
    ADD Y
    STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit
    MACRO
    INCRMT &A, &B
    LOAD   &A          (macro definition)
    ADD    &B
    STORE  &A
    ENDM

    INCRMT X, Y        (macro call in the program)

    LOAD   X           (macro expansion)
    ADD    Y
    STORE  X
The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates its end (written ENDM in the example above)
-> Prototype for the macro
Gives the macro name and the parameter list; each parameter starts with &
-> Body
The statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro we speak of a parameter; in the macro invocation we speak of an argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
    macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again
    INCRMT &A, &B      prototype
    INCRMT X, Y        macro call
we see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements
    LOAD X
    ADD Y
    STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT3
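The pairing of formal and actual parameters by position can be sketched in a few lines. This is an illustrative sketch, not the notes' algorithm: the function name is hypothetical, and each model statement is assumed to consist of exactly one opcode and one operand.

```python
def expand_positional(formals, body, actuals):
    """Replace each formal parameter in the model statements by the
    actual parameter found at the same position in the call."""
    if len(formals) != len(actuals):
        raise ValueError("parameter count mismatch")
    binding = dict(zip(formals, actuals))      # &A -> X, &B -> Y
    expanded = []
    for statement in body:
        opcode, operand = statement.split()    # assumes "OPCODE OPERAND"
        expanded.append(f"{opcode} {binding.get(operand, operand)}")
    return expanded


model_statements = ["LOAD &A", "ADD &B", "STORE &A"]
print(expand_positional(["&A", "&B"], model_statements, ["X", "Y"]))
```

Calling it with the INCRMT prototype and the call INCRMT X, Y reproduces the LOAD-ADD-STORE expansion shown above.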
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
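With keyword parameters the binding is made by name rather than by position, so the order of the operands in the call no longer matters. A minimal sketch, assuming each operand has the form name=value (the function name is hypothetical):

```python
def bind_keyword_args(formals, call_operands):
    """Pair 'name=value' operands with formal parameters by name."""
    binding = {}
    for operand in call_operands:
        name, value = operand.split("=", 1)
        name = name.lstrip("&")                 # accept both &a1=... and a1=...
        if name not in formals:
            raise ValueError(f"unknown parameter {name}")
        binding[name] = value
    return binding


print(bind_keyword_args(["a1", "a2", "a3"],
                        ["&a3=DATA1", "&a2=DATA2", "&a1=DATA3"]))
```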
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
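The three steps can be sketched as a small single-pass processor. This is a simplified illustration under stated assumptions: every macro is defined before it is called, definitions are not nested, AIF and AGO are not handled, and the function name and the table layout (a dictionary for the MNT, a list for the MDT) are choices made for this sketch only.

```python
def process(source):
    """Collect definitions into the MDT/MNT, then expand calls."""
    mnt = {}     # macro name -> (index of first model statement in MDT, formals)
    mdt = []     # model statements of all macros, each unit closed by ENDM
    output = []
    lines = iter(source)
    for line in lines:
        tokens = line.replace(",", " ").split()
        if tokens == ["MACRO"]:                       # Step 1: store a definition
            name, *formals = next(lines).replace(",", " ").split()
            mnt[name] = (len(mdt), formals)
            for body_line in lines:
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        elif tokens and tokens[0] in mnt:             # Step 2: a macro call
            start, formals = mnt[tokens[0]]
            binding = dict(zip(formals, tokens[1:]))  # formal -> actual
            index = start
            while mdt[index] != "ENDM":               # Step 3: copy model statements
                words = [binding.get(w, w) for w in mdt[index].split()]
                output.append(" ".join(words))
                index += 1
        else:
            output.append(line.strip())               # ordinary statement
    return output


program = ["MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
           "INCRMT X, Y", "HALT"]
print(process(program))
```

The HALT statement is only a placeholder for an ordinary assembly statement that passes through unchanged.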
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  Macro prototype
  Statements that make up the macro body
  References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  Role: an index to DEFTAB
  Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  Used during the expansion of macro invocations
  Arguments are stored in ARGTAB according to their position
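The following sketch shows how the three tables might cooperate. It is an assumption-laden illustration: the positional notation is written here as ?1, ?2, ..., the tables are plain Python lists and dictionaries, and the names define and expand merely echo the DEFINE and EXPAND sub-procedures.

```python
def define(prototype, body, deftab, namtab):
    """Enter a macro into DEFTAB and NAMTAB, rewriting &names as ?1, ?2, ..."""
    name, *formals = prototype.replace(",", " ").split()
    start = len(deftab)
    deftab.append(prototype)
    for line in body:
        words = [f"?{formals.index(w) + 1}" if w in formals else w
                 for w in line.split()]
        deftab.append(" ".join(words))
    namtab[name] = (start, len(deftab) - 1)   # beginning and end in DEFTAB


def expand(call, deftab, namtab):
    """Fill ARGTAB from the call, then replace ?n by ARGTAB[n - 1]."""
    name, *argtab = call.replace(",", " ").split()
    start, end = namtab[name]
    result = []
    for line in deftab[start + 1:end + 1]:    # skip the prototype line
        words = [argtab[int(w[1:]) - 1] if w.startswith("?") else w
                 for w in line.split()]
        result.append(" ".join(words))
    return result


deftab, namtab = [], {}
define("INCRMT &A, &B", ["LOAD &A", "ADD &B", "STORE &A"], deftab, namtab)
print(deftab)
print(expand("INCRMT X, Y", deftab, namtab))
```

Converting parameter references to positions at definition time means that expansion only needs an index into ARGTAB, not a search by parameter name.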
(fig MACRO and CALL statements processed using NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: recognize macro definitions
» Pass 2: recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain
(fig Macro processor procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program thus becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed. As the name itself says, it does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf(), scanf() etc., which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to adjustment of address fields and not to movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
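Adding a constant to every address field can be shown directly. In this sketch the program is modelled as a list of integer words, and the loader is assumed to be told which words hold addresses; the function name, the parameter names and the sample numbers are all illustrative.

```python
def relocate(words, address_fields, load_address, translated_origin=0):
    """Add the relocation constant to every address-dependent word."""
    constant = load_address - translated_origin
    relocated = list(words)                 # leave the object program untouched
    for index in address_fields:
        relocated[index] += constant
    return relocated


# A program translated at origin 100 whose words 1 and 3 hold addresses,
# now loaded at address 500: the relocation constant is 400.
print(relocate([10, 120, 30, 150], [1, 3], load_address=500, translated_origin=100))
```

Words that are not address fields (here 10 and 30) are copied unchanged, which is why relocation needs to know where the address fields are.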
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The assembler output supplied to a relocating loader is the object program together with information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
As we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called
» Also called dynamic linking, dynamic loading, or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
bullOverlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
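The overlay manager's job on a downward call can be sketched for the example tree. This is a simplified model, not a real overlay manager: it assumes one memory slot per tree depth, and all names (PARENT, path_to, ensure_loaded) are invented for the illustration.

```python
# Overlay tree from the example: ROOT -> A, D; A -> B, C; D -> E, F
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Sequence of segments from the root down to `segment`."""
    path = [segment]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return path[::-1]


def ensure_loaded(target, loaded):
    """Load every segment on the path to `target` that is not resident.
    `loaded` maps tree depth -> resident segment (siblings share a slot)."""
    fetched = []
    for depth, segment in enumerate(path_to(target)):
        if loaded.get(depth) != segment:
            loaded[depth] = segment
            # segments deeper in the tree belonged to the evicted sibling
            for deeper in [d for d in loaded if d > depth]:
                del loaded[deeper]
            fetched.append(segment)
    return fetched


memory = {}
print(ensure_loaded("ROOT", memory))   # program start
print(ensure_loaded("B", memory))      # root calls a routine in B
print(ensure_loaded("E", memory))      # later call into E evicts A and B
```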
(fig Overlay: ROOT at the top with children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded in main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded in main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
-> Copy of the object program
-> IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
-> PLA counter (Program Load Address): keeps track of each segment location
-> External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
-> Local External Symbol Array (LESA): establishes correspondence between the ESD ID used in ESD & RLD cards
ESD (external symbol dictionary)
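The two passes can be sketched on a toy object format. Everything about the format is an assumption made for illustration: each module is a dictionary with a name, a length, the symbols it defines (as relative addresses), its text as a list of integer words, and RLD entries of the form (offset, symbol) meaning "add the address of symbol to the word at this offset".

```python
def link_and_load(modules, ipla):
    """Toy direct linking loader; returns (symbol table, memory image)."""
    # Pass 1: assign a load address to each segment (PLA counter) and
    # build the global symbol table from the definitions of every module.
    symtab, pla = {}, ipla
    for module in modules:
        symtab[module["name"]] = pla
        for symbol, relative in module["defs"].items():
            symtab[symbol] = pla + relative
        pla += module["length"]

    # Pass 2: load the text, then relocate and link using the RLD entries.
    memory = {}
    for module in modules:
        base = symtab[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word
        for offset, symbol in module["rld"]:
            memory[base + offset] += symtab[symbol]
    return symtab, memory


modules = [
    # word 0 refers to external symbol B, word 1 is a local address constant
    {"name": "A", "length": 3, "defs": {}, "text": [0, 2, 7],
     "rld": [(0, "B"), (1, "A")]},
    {"name": "B", "length": 2, "defs": {"BDATA": 1}, "text": [5, 9], "rld": []},
]
symbols, image = link_and_load(modules, ipla=100)
print(symbols)
print(image)
```

In this model a local address constant is relocated by an RLD entry naming the module's own segment, while an external reference is linked by an entry naming a symbol defined elsewhere.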
Data Structures
Algorithm
Binary Symbolic Subroutine Loader
The assembler provides the loader with
-> the object program + relocation information
-> prefixed with information about all other programs it references (transfer vector)
-> the length of the entire program
-> the length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader: allocation, relocation and linking
• It outputs the text to a file rather than to memory; this file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designer's balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
• Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
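Binary search on a sorted table of symbol names, and a toy hash function mapping a symbol to a slot, can be sketched as follows. The function names are illustrative, and the hash function (sum of character codes modulo the table size) is just one simple choice, not one prescribed by the notes.

```python
def binary_search(sorted_keys, target):
    """Return the index of target in sorted_keys, or -1 if it is absent."""
    low, high = 0, len(sorted_keys) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_keys[mid] == target:
            return mid
        if target < sorted_keys[mid]:
            high = mid - 1          # continue in the left sub-array
        else:
            low = mid + 1           # continue in the right sub-array
    return -1


def slot_for(symbol, table_size):
    """Toy hash function: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in symbol) % table_size


mnemonics = ["ADD", "LOAD", "STORE", "SUB"]     # must already be sorted
print(binary_search(mnemonics, "STORE"))
print(binary_search(mnemonics, "MUL"))
print(slot_for("AB", 11))
```

Binary search halves the remaining table at every comparison, whereas linear search may have to inspect every entry; hashing aims to reach the right slot in a single step.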
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite, because if it were not, each symbol could stand for any thought, and what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced They may also be called simply syntactic variables A formal grammar includes a start symbol a designated member of the set of non-terminals from which all the strings in the language may be derived by successive applications of the production rules
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ϵ P of G
where σ and τ represent arbitrary strings
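The definition of a single-step derivation translates almost literally into code. In this sketch every symbol is assumed to be a single character, a production α -> β is a pair of Python strings, and the function name is made up for illustration.

```python
def immediate_derivations(mu, productions):
    """All strings gamma obtainable from mu in one step:
    mu = sigma + alpha + tau, gamma = sigma + beta + tau, (alpha, beta) in P."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:                      # try every occurrence of alpha
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


P = [("S", "aSb"), ("S", "ab")]                # hypothetical example grammar
print(immediate_derivations("aSb", P))
```

With μ = aSb the only choice is σ = a, α = S, τ = b, so the two productions give the two strings aaSbb and aabb.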
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (nonterminals N, terminal alphabet Σ, production rules P, start symbol S) that satisfies the following two conditions
a Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε
b If S -> ε is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1.2.15 The grammar of the earlier example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol
An addition to the modified grammar of a production rule of the form Bb -> b will result in a non-Type 1 grammar because of a violation of condition (a)
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language
Example 1.2.16 The grammar of the earlier example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol
An addition of a production rule of the form aE -> EaE to the grammar will result in a non-Type 2 grammar
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β which is not of the form S -> ε satisfies one of the following conditions
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar
An addition of a production rule of the form A -> Ba or of the form B -> bb to the grammar will result in a non-Type 3 grammar
The figure below illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
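The hierarchy can be turned into a small classifier. The sketch below assumes the usual reading of the conditions: Type 1 requires |α| ≤ |β| for every rule α -> β other than S -> ε, and S must not occur on a right-hand side if S -> ε is present; Type 2 additionally requires α to be a single nonterminal; Type 3 additionally requires β to be a terminal, or a terminal followed by a nonterminal. Symbols are single characters, the empty string stands for ε, and the function name is hypothetical.

```python
def grammar_type(nonterminals, productions, start):
    """Return the most restrictive type (0 to 3) that the grammar satisfies."""
    def is_start_eps(alpha, beta):
        return alpha == start and beta == ""

    has_start_eps = any(is_start_eps(a, b) for a, b in productions)
    rules = [(a, b) for a, b in productions if not is_start_eps(a, b)]

    # Type 1: conditions (a) and (b)
    cond_a = all(len(a) <= len(b) for a, b in rules)
    cond_b = not (has_start_eps and any(start in b for _, b in productions))
    if not (cond_a and cond_b):
        return 0
    # Type 2: every left-hand side is a single nonterminal
    if not all(len(a) == 1 and a in nonterminals for a, _ in rules):
        return 1

    # Type 3: right-hand side is "t" or "tN"
    def right_linear(beta):
        if len(beta) == 1:
            return beta not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    return 3 if all(right_linear(b) for _, b in rules) else 2


print(grammar_type({"S", "A"}, [("S", "aA"), ("A", "b"), ("A", "aA")], "S"))
print(grammar_type({"S"}, [("S", "aSb"), ("S", "ab")], "S"))
print(grammar_type({"S", "B"}, [("S", "aBb"), ("Bb", "bb")], "S"))
print(grammar_type({"S", "B"}, [("S", "aBb"), ("Bb", "b")], "S"))
```

The last call adds a rule of the form Bb -> b, whose right-hand side is shorter than its left-hand side, so the grammar drops out of Type 1.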
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while the x and y provide the context in which the rule may be applied
A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear-bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; alternative sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
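The rule that "symbols that never appear on a left side are terminals" can be checked mechanically. The following is a minimal sketch, assuming a tiny hypothetical grammar (the rules in `BNF` are made up for illustration) written one rule per line with symbols separated by blanks.

```python
BNF = """
<expr> ::= <term> | <expr> + <term>
<term> ::= <digit> | ( <expr> )
<digit> ::= 0 | 1
"""


def parse_bnf(text):
    """Return {nonterminal: [alternative, ...]}; each alternative is a list of symbols."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # Alternatives are separated by '|'. This simple split assumes that
        # '|' itself is never used as a terminal symbol of the language.
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


def classify(rules):
    """Split the symbols of the grammar into nonterminals and terminals."""
    nonterminals = set(rules)                      # everything that has a left side
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    terminals = used - nonterminals                # never appears on a left side
    return nonterminals, terminals


if __name__ == "__main__":
    nts, ts = classify(parse_bnf(BNF))
    print("nonterminals:", sorted(nts))
    print("terminals:   ", sorted(ts))
```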
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, as in the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read, broken into its constituent pieces, and turned into an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase, this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
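The token list above can be reproduced with a few regular expressions. This is a minimal sketch, not a full scanner: the token names (`IDENTIFIER`, `ASSIGN`, and so on) and the function `tokenize` are illustrative assumptions, and only the symbols needed for the example statement are handled.

```python
import re

# One named pattern per kind of token; SKIP matches blanks, which are discarded.
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",     r"\d+"),
    ("ASSIGN",     r"="),
    ("PLUS",       r"\+"),
    ("TIMES",      r"\*"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into (token kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, lexeme in tokenize("total = count + rate * 10"):
        print(kind, lexeme)
```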
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M≈30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =   y   +   3   ;
    x3   =y+ 3  ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
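The mapping from lexemes to <token-name, attribute-value> pairs can be sketched with a hand-written scanner that also fills a symbol table. This is only an illustration under simple assumptions: the function name `scan` is hypothetical, numbers keep their lexeme as the attribute (one of the options discussed above), and symbols carry `None` as an ignored second component.

```python
def scan(source):
    """Return (tokens, symbol_table) for a tiny language of ids, numbers, '=', '+', ';'."""
    symbol_table = []      # entry i (1-based) is the i-th distinct identifier seen
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                       # non-significant blanks are dropped
            i += 1
        elif ch.isalpha():                     # identifier: letter, then letters/digits
            j = i
            while j < len(source) and source[j].isalnum():
                j += 1
            lexeme = source[i:j]
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
            i = j
        elif ch.isdigit():                     # number: a run of digits
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("number", source[i:j]))
            i = j
        elif ch in "=+;":                      # single-character symbols
            tokens.append((ch, None))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at position {i}")
    return tokens, symbol_table


if __name__ == "__main__":
    print(scan("x3 = y + 3;"))
    print(scan("x 3 = y + 3;"))   # the blank splits x3 into an id and a number
```

Note that the scanner uses only loops, no recursion, which matches the remark that lexical structure can be defined without recursive rules.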
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
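A syntax tree for an assignment such as x3 = y + 3; can be built by a small recursive-descent parser. The sketch below is an assumption-laden illustration: it takes a list of lexemes as input, represents tree nodes as nested tuples (operator, left, right), and replaces the left-recursive rule expr → expr + expr by a loop, which makes + left-associative. The function name `parse_assignment` is hypothetical.

```python
def parse_assignment(lexemes):
    """Parse  id = operand { + operand } ;  and return a syntax tree as nested tuples."""
    pos = 0

    def peek():
        return lexemes[pos] if pos < len(lexemes) else None

    def expect(predicate, what):
        nonlocal pos
        tok = peek()
        if tok is None or not predicate(tok):
            raise SyntaxError(f"expected {what}, found {tok!r}")
        pos += 1
        return tok

    def is_id(tok):
        return tok[0].isalpha() and tok.isalnum()

    def is_operand(tok):
        return tok.isdigit() or is_id(tok)

    def expr():
        node = expect(is_operand, "identifier or number")
        while peek() == "+":
            expect(lambda t: t == "+", "'+'")
            right = expect(is_operand, "identifier or number")
            node = ("+", node, right)          # operator as interior node
        return node

    target = expect(is_id, "identifier")
    expect(lambda t: t == "=", "'='")
    value = expr()
    expect(lambda t: t == ";", "';'")          # statement terminator
    if peek() is not None:
        raise SyntaxError(f"unexpected input after ';': {peek()!r}")
    return ("=", target, value)


if __name__ == "__main__":
    print(parse_assignment(["x3", "=", "y", "+", "3", ";"]))
```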
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
    inttoreal  temp1  3      --
    add        temp2  id2    temp1
    realtoint  temp3  temp2  --
    assign     id1    temp3  --
Each "quad" has the form
    operation target source1 source2
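The quads above can be produced by a post-order walk of the semantic tree. The following is a minimal sketch under stated assumptions: the tree is written as nested tuples with the conversion operators already inserted, `None` stands for an unused source ("--"), and the names `generate_quads` and `emit` are illustrative only.

```python
from itertools import count


def generate_quads(tree):
    """Walk a semantic tree and return quads (operation, target, source1, source2)."""
    quads = []
    temps = count(1)                       # temp1, temp2, ... as in the notes

    def emit(node):
        """Emit quads for node and return the name that holds its value."""
        if isinstance(node, str):
            return node                    # leaf: identifier or constant
        op, *args = node
        if op == "=":
            target, value = args
            quads.append(("assign", target, emit(value), None))
            return target
        sources = [emit(arg) for arg in args]          # children first (post-order)
        padded = sources + [None] * (2 - len(sources)) # unary operators leave source2 unused
        temp = f"temp{next(temps)}"
        quads.append(("add" if op == "+" else op, temp, *padded))
        return temp

    emit(tree)
    return quads


if __name__ == "__main__":
    tree = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))
    for quad in generate_quads(tree):
        print(quad)
```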
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
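Both optimizations can be expressed as simple rewrites over the list of quads. This is a rough sketch, not a real optimizer: the function name `optimize` is hypothetical, and the second rewrite assumes that a temporary copied by `assign` is not used anywhere else, which holds for this example but would need checking in general.

```python
def optimize(quads):
    """Apply two simple rewrites to quads of the form (op, target, source1, source2)."""
    constants = {}     # temporary name -> real constant computed at compile time
    out = []
    for op, target, s1, s2 in quads:
        # Rewrite 1: convert an integer constant to real at compile time.
        if op == "inttoreal" and s1.isdigit():
            constants[target] = f"{s1}.0"
            continue
        s1 = constants.get(s1, s1)
        s2 = constants.get(s2, s2)
        # Rewrite 2: if a temporary is computed and then immediately copied,
        # let the previous quad write straight into the final target.
        if op == "assign" and out and out[-1][1] == s1 and s1.startswith("temp"):
            prev_op, _, p1, p2 = out.pop()
            out.append((prev_op, target, p1, p2))
            continue
        out.append((op, target, s1, s2))
    return out


if __name__ == "__main__":
    quads = [
        ("inttoreal", "temp1", "3", None),
        ("add", "temp2", "id2", "temp1"),
        ("realtoint", "temp3", "temp2", None),
        ("assign", "id1", "temp3", None),
    ]
    for quad in optimize(quads):
        print(quad)
```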
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD    R1 id2
    ADDF  R1 R1 3.0     add float
    RTOI  R2 R1         real to int
    ST    id1 R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): Checks the token sequence against the grammar, which specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar for which there is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver
        calls
    Syntactic Analyzer
        calls                   calls
    Contextual Analyzer     Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver calls: Syntactic Analyzer, Contextual Analyzer, Code Generator
    Syntactic Analyzer:   input Source Text, output AST
    Contextual Analyzer:  input AST, output Decorated AST
    Code Generator:       input Decorated AST, output Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit, because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
BYTE - Generate a character or hexadecimal constant, occupying as many bytes as needed to represent the constant
WORD - Generate a one-word integer constant
RESB - Reserve the indicated number of bytes for a data area
RESW - Reserve the indicated number of words for a data area
Lecture_no18 General design procedure of 360-type Assembler
Syntax of assembly language statements
Assembly language statements are entered one statement per line. Each statement follows the following format:
[label] mnemonic [operands] [comment]
The fields in the square brackets are optional. A basic instruction has two parts: the first one is the name of the instruction (or the mnemonic) which is to be executed, and the second is the operands or the parameters of the command.
An assembly language statement contains the following fields:
-> Label field - It is an identifier and an optional field. It can be used to define a symbol. The maximum length of a label differs between assemblers: some accept labels up to 32 characters long, others only 4 characters. A label, when declared, is suffixed by a colon and begins with a valid character (A...Z).
Ex-
START LDAA 24 H
Here the label START is equal to the address of the instruction LDAA 24 H.
-> Mnemonic/Opcode field - This field contains a mnemonic, which stands for an operation code (i.e. a machine code instruction) or a pseudo-op. It may require additional information, i.e. operands.
-> Operand field - It specifies either the address or the data that the opcode requires.
-> Comment field - It allows the programmer to document the software. This field is optional and is only printed on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
There are three types of assembly language statements:
(i) Imperative statements
(ii) Declarative statements
(iii) Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler:
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform the task
Assembler design can be done in
- Single pass
- Two pass
Figure: Source program → Pass 1 (OPTAB, SYMTAB) → Intermediate file → Pass 2 (OPTAB, SYMTAB) → Object codes
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
- Separate the symbol, mnemonic op-code and operand fields
- Determine the storage requirement for every assembly language statement and update the location counter
- Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
- Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic address, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
- Does everything in a single pass
- Cannot resolve forward references unless restrictions or extra bookkeeping are introduced
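One common way for a one-pass assembler to cope with forward references is to leave the address field empty and patch it as soon as the label is defined. The sketch below illustrates that idea only; it is not the scheme of any particular assembler. It assumes a toy machine where every instruction occupies one word, and the opcode values in `OPTAB`, like the function name `assemble_one_pass`, are illustrative assumptions.

```python
# Illustrative opcode values for a toy machine (an assumption of this sketch).
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C, "J": 0x3C}


def assemble_one_pass(lines, start=0):
    """Assemble (label, mnemonic, operand) triples in one pass with backpatching."""
    symtab = {}     # label -> address
    pending = {}    # undefined label -> indices of words waiting for its address
    code = []       # list of [opcode, address] words
    locctr = start
    for label, mnemonic, operand in lines:
        if label:
            symtab[label] = locctr
            for index in pending.pop(label, []):
                code[index][1] = locctr        # patch earlier forward references
        if operand in symtab:
            address = symtab[operand]          # backward reference: already known
        else:
            address = None                     # forward reference: fill in later
            pending.setdefault(operand, []).append(len(code))
        code.append([OPTAB[mnemonic], address])
        locctr += 1                            # one word per instruction (assumed)
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return code, symtab


if __name__ == "__main__":
    program = [
        (None, "J", "END"),        # forward reference to END
        ("LOOP", "LDA", "LOOP"),
        ("END", "J", "LOOP"),
    ]
    print(assemble_one_pass(program))
```

The price of the single pass is that the object code must stay in memory (or be patchable) until all labels have been seen, which is one of the restrictions mentioned above.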
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels.
• Location Counter (LOCCTR): This is a variable that is used to help in the assignment of addresses. After each statement is processed, LOCCTR ← LOCCTR + (instruction length).
Lecture_no20 Multi-pass assembler with example
Pass 1
- Assign addresses to all statements in the program
- Save the values assigned to all labels for use in Pass 2
- Perform some processing of assembler directives
Pass 2
- Assemble instructions
- Generate data values defined by BYTE, WORD
- Perform processing of assembler directives not done in Pass 1
- Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows. A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
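The division of work between the two passes can be sketched in a few lines. This is a simplified illustration under stated assumptions: every instruction and every WORD occupies 3 bytes, only the directives WORD, RESW and RESB are handled, object code is an opcode byte followed by a 4-hex-digit address, and the opcode values in `OPTAB` are illustrative. The names `pass1` and `pass2` are hypothetical.

```python
# Illustrative opcode values (an assumption of this sketch).
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}


def pass1(lines, start):
    """Assign addresses with LOCCTR and build SYMTAB; return the intermediate file too."""
    symtab, locctr, intermediate = {}, start, []
    for label, op, operand in lines:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, op, operand))
        if op in OPTAB or op == "WORD":
            locctr += 3                        # instruction or one-word constant
        elif op == "RESW":
            locctr += 3 * int(operand)         # reserve words
        elif op == "RESB":
            locctr += int(operand)             # reserve bytes
        else:
            raise ValueError(f"invalid operation code {op}")
    return symtab, intermediate, locctr - start


def pass2(intermediate, symtab):
    """Translate each statement using the completed symbol table."""
    object_code = []
    for address, op, operand in intermediate:
        if op in OPTAB:
            object_code.append((address, f"{OPTAB[op]:02X}{symtab[operand]:04X}"))
        elif op == "WORD":
            object_code.append((address, f"{int(operand):06X}"))
        # RESW and RESB only reserve storage; they generate no object code.
    return object_code


if __name__ == "__main__":
    program = [
        ("FIRST", "LDA", "FIVE"),      # FIVE is a forward reference
        (None, "ADD", "FIVE"),
        (None, "STA", "RESULT"),
        ("FIVE", "WORD", "5"),
        ("RESULT", "RESW", "1"),
    ]
    symtab, intermediate, length = pass1(program, 0x1000)
    print(symtab, length)
    print(pass2(intermediate, symtab))
```

Because SYMTAB is complete before translation starts, the forward references to FIVE and RESULT need no special treatment in the second pass.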
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
    Col 1       H
    Col 2-7     Program name
    Col 8-13    Starting address (hex)
    Col 14-19   Length of object program in bytes (hex)
Text record
    Col 1       T
    Col 2-7     Starting address in this record (hex)
    Col 8-9     Length of object code in this record in bytes (hex)
    Col 10-69   Object code ((69-10+1)/6 = 10 instructions)
End record
    Col 1       E
    Col 2-7     Address of first executable instruction (hex) (END program_name)
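The column layout of the three record types maps directly onto fixed-width string formatting. The helper names below (`header_record`, `text_record`, `end_record`) and the sample values are assumptions made for illustration; the column widths follow the layout given above.

```python
def header_record(name, start, length):
    """H, program name in 6 columns, start address and length as 6 hex digits each."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start, object_codes):
    """T, start address (6 hex digits), length in bytes (2 hex digits), object code."""
    body = "".join(object_codes)
    if len(body) > 60:                         # columns 10-69 hold at most 60 hex digits
        raise ValueError("too much object code for one text record")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_executable):
    """E followed by the address of the first executable instruction."""
    return f"E{first_executable:06X}"


if __name__ == "__main__":
    print(header_record("COPY", 0x1000, 0x00000F))
    print(text_record(0x1000, ["001009", "181009", "0C100C", "000005"]))
    print(end_record(0x1000))
```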
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
    Col 1     M
    Col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
    Col 8-9   Length of the address field to be modified, in half-bytes
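At load time, each modification record tells the loader which address field to adjust by the actual starting address. The sketch below shows one way this could work; it assumes the program image is held in a `bytearray`, that a field with an odd number of half-bytes starts in the low-order half of its first byte, and that the function name `apply_modification` and the sample bytes are purely illustrative.

```python
def apply_modification(memory, offset, half_bytes, load_address):
    """Add load_address to the address field described by one modification record."""
    nbytes = (half_bytes + 1) // 2                       # bytes that contain the field
    field = int.from_bytes(memory[offset:offset + nbytes], "big")
    mask = (1 << (4 * half_bytes)) - 1                   # low-order half_bytes * 4 bits
    relocated = ((field & mask) + load_address) & mask   # adjust only the address bits
    field = (field & ~mask) | relocated                  # keep any leading flag bits
    memory[offset:offset + nbytes] = field.to_bytes(nbytes, "big")


if __name__ == "__main__":
    # A 4-byte instruction whose last 5 half-bytes (01036) are an address.
    image = bytearray.fromhex("4B101036")
    # Modification record: field starts at byte 1 and is 5 half-bytes long.
    apply_modification(image, offset=1, half_bytes=5, load_address=0x5000)
    print(image.hex().upper())
```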
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example
    MACRO
    INCRMT &A,&B
    LOAD   &A        (macro definition)
    ADD    &B
    STORE  &A
    ENDM

    INCRMT X,Y       (macro call)
    LOAD   X         (macro expansion)
    ADD    Y
    STORE  X
Figure: Source code (with macro) → Macro Processor → Expanded code → Compiler / Assembler
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. The call
INCRMT X, Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit:

MACRO
INCRMT &A, &B
LOAD &A              (macro definition)
ADD &B
STORE &A
ENDM

INCRMT X, Y          (macro call in the program)

LOAD X               (macro expansion)
ADD Y
STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used during usage of Macro
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition
-> Prototype for Macro
Each parameter starts with &
Name MACRO Parameter list
MEND
-> BODY: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, and so on. Considering the prototype and macro call statements once again:
INCRMT &A, &B        prototype
INCRMT X, Y          macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
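The positional pairing described above can be sketched in a few lines. This is only an illustration: the function name expand_call is hypothetical, and it assumes every model statement has the form "opcode operand".

```python
def expand_call(prototype, model_statements, call):
    """Expand one macro call by pairing formal and actual parameters by position."""
    name, formals = prototype        # e.g. ("INCRMT", ["&A", "&B"])
    call_name, actuals = call        # e.g. ("INCRMT", ["X", "Y"])
    if call_name != name or len(actuals) != len(formals):
        raise ValueError("call does not match prototype")
    binding = dict(zip(formals, actuals))   # first formal <-> first actual, etc.
    expanded = []
    for statement in model_statements:
        opcode, operand = statement.split()
        # Replace a formal parameter by its actual parameter; keep other operands.
        expanded.append(f"{opcode} {binding.get(operand, operand)}")
    return expanded


model = ["LOAD &A", "ADD &B", "STORE &A"]
print(expand_call(("INCRMT", ["&A", "&B"]), model, ("INCRMT", ["X", "Y"])))
# ['LOAD X', 'ADD Y', 'STORE X']
```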
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter. The arguments may therefore appear in any order.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
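A minimal sketch of keyword binding follows. The formal parameter names &TYPE and &CHANNEL are assumed for the GENER macro (the notes show only the call), and bind_keyword_arguments is a hypothetical helper.

```python
def bind_keyword_arguments(formals, call_operands):
    """Pair each 'keyword=value' operand with the formal parameter it names."""
    binding = {}
    for operand in call_operands:
        keyword, value = operand.split("=", 1)
        formal = "&" + keyword.lstrip("&")   # accept TYPE=... or &TYPE=...
        if formal not in formals:
            raise KeyError(f"unknown parameter {keyword}")
        binding[formal] = value
    return binding


# The order of the operands does not matter, unlike positional parameters.
print(bind_keyword_arguments(["&TYPE", "&CHANNEL"], ["CHANNEL=3", "TYPE=DIRECT"]))
```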
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
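The three steps can be sketched as follows. This is a simplified illustration, not the algorithm from any particular text: the layout of the MNT and MDT entries is an assumption, AIF and AGO are not handled, and nested calls are not supported.

```python
def process(source_lines):
    """One scan over the source: record definitions, expand calls."""
    mnt = {}     # Macro Name Table: name -> (formal parameters, index into MDT)
    mdt = []     # Macro Definition Table: model statements, each unit ends with ENDM
    output = []
    lines = iter(source_lines)
    for line in lines:
        fields = line.replace(",", " ").split()
        if not fields:
            continue
        if fields[0] == "MACRO":
            # Step 1: the next line is the prototype; store the body in the MDT.
            name, *formals = next(lines).replace(",", " ").split()
            mnt[name] = (formals, len(mdt))
            for body_line in lines:
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        elif fields[0] in mnt:
            # Step 2: locate the macro and pair formal with actual parameters.
            formals, index = mnt[fields[0]]
            binding = dict(zip(formals, fields[1:]))
            # Step 3: copy model statements until ENDM, substituting parameters.
            while mdt[index] != "ENDM":
                words = [binding.get(word, word) for word in mdt[index].split()]
                output.append(" ".join(words))
                index += 1
        else:
            output.append(line.strip())
    return output


program = ["MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
           "INCRMT X, Y", "HALT"]
print(process(program))
# ['LOAD X', 'ADD Y', 'STORE X', 'HALT']
```

Because definitions are recorded as they are met, a call is recognized only if its definition appeared earlier in the source, which is the single-pass restriction stated above.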
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked.
Subroutine - the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
Holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB)
Role: an index to DEFTAB. Holds pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB)
Used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
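The interplay of DEFTAB and ARGTAB can be sketched as below. The "?n" marker for the n-th parameter is an assumed positional notation (the notes do not show the exact form), and the function names are hypothetical.

```python
def to_deftab_form(formals, body):
    """Rewrite each formal parameter in the body as ?n, where n is its position."""
    positions = {param: f"?{n}" for n, param in enumerate(formals, start=1)}
    stored = []
    for statement in body:
        words = [positions.get(word, word) for word in statement.split()]
        stored.append(" ".join(words))
    return stored


def expand_from_deftab(stored_body, argtab):
    """Replace each ?n by the n-th entry of ARGTAB (arguments kept by position)."""
    expanded = []
    for statement in stored_body:
        words = []
        for word in statement.split():
            if word.startswith("?") and word[1:].isdigit():
                word = argtab[int(word[1:]) - 1]
            words.append(word)
        expanded.append(" ".join(words))
    return expanded


deftab_body = to_deftab_form(["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])
print(deftab_body)                                 # ['LOAD ?1', 'ADD ?2', 'STORE ?1']
print(expand_from_deftab(deftab_body, ["X", "Y"]))  # ['LOAD X', 'ADD Y', 'STORE X']
```

Storing positions instead of names means the expansion step never has to search the parameter list by name; it indexes ARGTAB directly.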
(fig Macro processor tables: NAMTAB, DEFTAB, ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain.
(fig Macro processor procedures: PROCESSLINE, DEFINE, EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning addresses to all the user subroutines and library subroutines. This activity is called linking.
3) There are some address-dependent locations in the program. Such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader
In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme
In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader
An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf( ), scanf( ) etc. which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf( ). A and printf( ) would have to be linked with each other. But where in main storage shall we load A and printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf( ) function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf( ) may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf( ) extends from 100 to 150. But there is simply no way A and printf( ) can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called a relocating loader.
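The idea of adding a constant to each relative address can be shown with a toy segment. The word values, the list of address-holding positions and the load address below are invented for illustration; a real loader would take them from the object program and its relocation information.

```python
def relocate(words, address_positions, load_address):
    """Add the load address to every word that holds a relative address."""
    relocated = list(words)                 # leave the translated copy untouched
    for position in address_positions:
        relocated[position] += load_address
    return relocated


# A segment translated as if it started at address 0.
# Words 1 and 3 hold relative addresses; words 0 and 2 are plain data.
segment = [10, 3, 20, 0]
print(relocate(segment, [1, 3], 500))
# [10, 503, 20, 500]
```

Only the marked words change, which is why relocation is described above as an adjustment of address fields rather than a movement of the program.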
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader (the output of the assembler) is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is also called dynamic loading or load on call.
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
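The behaviour of the overlay manager on the example tree can be sketched as follows. The class and method names are hypothetical, and the sketch only tracks which segments are resident; it does not model memory addresses.

```python
# Parent of each segment in the example tree (ROOT has no parent).
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]          # the root is loaded at program start

    def ensure_loaded(self, target):
        """Make the whole path to the call target resident."""
        if target in self.loaded:       # upward call or already resident
            return self.loaded
        wanted = path_to(target)
        keep = 0                        # length of the shared part of the path
        while (keep < min(len(self.loaded), len(wanted))
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        # Siblings share memory, so segments off the new path are overwritten.
        self.loaded = self.loaded[:keep] + wanted[keep:]
        return self.loaded


manager = OverlayManager()
print(manager.ensure_loaded("B"))   # ['ROOT', 'A', 'B']
print(manager.ensure_loaded("E"))   # ['ROOT', 'D', 'E']
```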
(fig Overlay: ROOT at the top; A and D below ROOT; B and C below A; E and F below D)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. A required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the execution process has to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the four loader functions:

Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader

Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory card (ESD)
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment's location
• External Symbol Directory card (ESD), to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD & RLD cards
• ESD (external symbol dictionary)
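The two passes can be sketched on a toy object format. Everything about the format is an assumption made for illustration: each segment is a dictionary with a name, a length, the external symbols it defines (as offsets), its text words, and RLD entries of the form (offset, symbol) meaning "add the address of symbol to the word at offset". Real ESD, TXT and RLD cards carry more detail.

```python
def pass1(segments, ipla):
    """Assign a load address to each segment and build the external symbol table."""
    symbol_table = {}
    pla = ipla                                   # Program Load Address counter
    for segment in segments:
        symbol_table[segment["name"]] = pla
        for symbol, offset in segment["defines"].items():
            symbol_table[symbol] = pla + offset
        pla += segment["length"]
    return symbol_table


def pass2(segments, symbol_table):
    """Load the text, then relocate and link using the RLD entries."""
    memory = {}
    for segment in segments:
        base = symbol_table[segment["name"]]
        for offset, word in enumerate(segment["text"]):
            memory[base + offset] = word
        for offset, symbol in segment["rld"]:
            memory[base + offset] += symbol_table[symbol]
    return memory


segments = [
    # Word 0 is an address constant for SUB; word 1 is a relative address in MAIN.
    {"name": "MAIN", "length": 3, "defines": {}, "text": [0, 2, 7],
     "rld": [(0, "SUB"), (1, "MAIN")]},
    {"name": "SUB", "length": 2, "defines": {}, "text": [5, 6], "rld": []},
]
table = pass1(segments, ipla=100)
print(table)                    # {'MAIN': 100, 'SUB': 103}
print(pass2(segments, table))   # {100: 103, 101: 102, 102: 7, 103: 5, 104: 6}
```

Two passes are needed because MAIN refers to SUB before SUB's address is known; pass 1 fixes every address, so pass 2 can patch all references in one sweep.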
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler assembles the program and provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader: allocation, relocation and linking
• It outputs the text to a file rather than to memory; this file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search
Binary search takes a sorted array as its input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
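The three methods can be sketched on a small symbol table. The table contents are invented, and the hash function shown (sum of character codes modulo the table size) is just one simple choice, not one prescribed by the notes.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Halve the search range of a table sorted by name; return index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1


def hash_slot(name, table_size):
    """Map a symbol name to a slot number in the range 0 .. table_size - 1."""
    return sum(ord(ch) for ch in name) % table_size


# Symbol table sorted by symbol name: (name, address).
symbols = [("ALPHA", 100), ("BETA", 104), ("LOOP", 112), ("SUM", 120)]
print(linear_search(symbols, "LOOP"))   # 2
print(binary_search(symbols, "LOOP"))   # 2
print(hash_slot("AB", 11))              # 10
```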
Sorting
We will discuss four comparison-sort algorithms
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language);
the semantics of a language is what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) describes the act of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar, and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
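A single-step derivation can be computed mechanically. The sketch below assumes that strings are ordinary Python strings with one character per symbol, and that a production α -> β is written as the pair (alpha, beta); the toy grammar S -> aSb | ab is invented for illustration.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that can be derived from mu in a single step.

    For each production alpha -> beta and each occurrence of alpha in mu,
    mu = sigma + alpha + tau yields gamma = sigma + beta + tau.
    """
    results = []
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.append(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Toy grammar with start symbol S: S -> aSb and S -> ab.
P = [("S", "aSb"), ("S", "ab")]
print(immediate_derivations("S", P))     # ['aSb', 'ab']
print(immediate_derivations("aSb", P))   # ['aaSbb', 'aabb']
```

Here 'aSb' and 'aaSbb' are sentential forms, while 'ab' and 'aabb' contain only terminals and are therefore sentences of the language.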
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε.
b. If S -> ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1215 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb -> b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE -> EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β that is not of the form S -> ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A -> Ba or of the form B -> bb to the grammar will result in a non-Type 3 grammar.
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ∈ V and x, v, y ∈ (V ∪ T)* (v is normally required to be non-empty).
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
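A short sketch can make the terminal/nonterminal distinction concrete. It assumes rules written one per line with the conventional ::= separator and symbols separated by spaces; the function name parse_bnf and the tiny grammar are hypothetical illustrations.

```python
def parse_bnf(text: str) -> dict:
    """Map each nonterminal to its list of alternatives (each a list of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        lhs, rhs = line.split("::=", 1)
        # The vertical bar separates the alternative substitutions.
        rules[lhs.strip()] = [alt.split() for alt in rhs.split("|")]
    return rules


grammar = parse_bnf("""
<expr>  ::= <term> | <expr> + <term>
<term>  ::= <digit> | <term> <digit>
<digit> ::= 0 | 1
""")

# Symbols that appear on a left side are nonterminals; all others are terminals.
nonterminals = set(grammar)
symbols = {s for alts in grammar.values() for alt in alts for s in alt}
terminals = symbols - nonterminals
print(sorted(terminals))   # ['+', '0', '1']
```

The sketch treats | purely as a separator, so a grammar that needs | as a terminal would require an escaping convention that is not shown here.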
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages. If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators"
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form is built. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program
-> Analysis Part
• The analysis part is carried out in three sub-parts
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning)
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string
– Semantic Analysis
• In this step the meaning of the source string is determined
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
– Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
– Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need only N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings is considerable: 7+30 = 37 components instead of 7×30 = 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
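The mapping from lexemes to tokens can be sketched with regular expressions, which need no recursion. This is an illustrative sketch only: the token names, the choice to store the number's text as its attribute, and the function name scan are assumptions, not the scheme of any particular compiler.

```python
import re

# One regular expression per token class; none of them is recursive.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+*;]"),
    ("skip",   r"\s+"),          # non-significant blanks
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source: str):
    """Return (tokens, symbol table) for a source string."""
    symtab = []      # entry number k is stored at list position k-1
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                          # blanks are dropped
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))     # second component is ignored
    return tokens, symtab


tokens, symtab = scan("x3 = y + 3;")
print(tokens)
print(symtab)   # ['x3', 'y']
```

Scanning "x3   =y+ 3  ;" yields the same token list, while "x 3 = y + 3;" does not, because x and 3 become separate lexemes.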
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
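A small parser can show how a syntax tree with operators as interior nodes is built. This sketch makes two assumptions that are not in the notes: the ambiguous, left-recursive rule for expr is replaced by a loop that groups additions to the left, and the tree is represented as nested tuples. The function names are hypothetical.

```python
import re


def tokenize(src):
    """Split the source into identifiers, numbers and the symbols = + ;"""
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", src)


def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree as nested tuples."""
    pos = 0

    def expect(tok):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r}")
        pos += 1

    def operand():
        nonlocal pos
        if pos >= len(tokens) or not (tokens[pos][0].isalnum() or tokens[pos][0] == "_"):
            raise SyntaxError("expected an identifier or a number")
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # expr -> operand { + operand }, grouped to the left
        tree = operand()
        while pos < len(tokens) and tokens[pos] == "+":
            expect("+")
            tree = ("+", tree, operand())
        return tree

    target = operand()
    if target[0].isdigit():
        raise SyntaxError("the left side of = must be an identifier")
    expect("=")
    value = expr()
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected input after ;")
    return ("=", target, value)


print(parse_assignment(tokenize("x3 = y + 3;")))
# ('=', 'x3', ('+', 'y', '3'))
```

The operator = is the root, with x3 and the subtree for y + 3 as its children, which is the shape of the syntax tree described above.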
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators "int to real" and "real to int" as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each "quad" has the form
operation target source1 source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion at compile time and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
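The two optimizations can be sketched as passes over a list of quads. The tuple layout (operation, target, source1, source2) follows the quad form above; the function names are hypothetical, and the second pass assumes that the temporary being eliminated is not used anywhere else.

```python
quads = [
    ("inttoreal", "temp1", 3, None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]


def fold_constant_conversions(code):
    """Perform inttoreal on constants at compile time and substitute the result."""
    known = {}    # temporary name -> constant value computed by the compiler
    out = []
    for op, target, src1, src2 in code:
        if isinstance(src1, str):
            src1 = known.get(src1, src1)
        if isinstance(src2, str):
            src2 = known.get(src2, src2)
        if op == "inttoreal" and isinstance(src1, int):
            known[target] = float(src1)   # no quad is emitted for the conversion
            continue
        out.append((op, target, src1, src2))
    return out


def merge_final_assign(code):
    """Turn 'op t ...' followed by 'assign x t' into 'op x ...'.

    Assumes the temporary t has no other uses.
    """
    out = []
    for quad in code:
        op, target, src1, _ = quad
        if op == "assign" and out and out[-1][1] == src1:
            prev = out.pop()
            out.append((prev[0], target, prev[2], prev[3]))
        else:
            out.append(quad)
    return out


optimized = merge_final_assign(fold_constant_conversions(quads))
for quad in optimized:
    print(quad)
# ('add', 'temp2', 'id2', 3.0)
# ('realtoint', 'id1', 'temp2', None)
```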
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0    ; add float
RTOI R2, R1         ; real to int
ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
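The idea of the parser calling the scanner on demand can be sketched with a Python generator. This is a toy illustration under stated assumptions: the input is restricted to sums of operands, the names scanner and compile_sum are hypothetical, and intermediate code is represented as plain strings.

```python
import re


def scanner(source):
    """Yield one token at a time; nothing is scanned until the parser asks."""
    for m in re.finditer(r"\s*([A-Za-z_]\w*|\d+|\S)", source):
        yield m.group(1)


def compile_sum(source):
    """Parse  operand { + operand }  and emit three-address code in the same pass."""
    tokens = scanner(source)        # no token file is written or re-read
    code = []
    temp_count = 0
    left = next(tokens)             # the parser calls the scanner
    for tok in tokens:
        if tok != "+":
            raise SyntaxError(f"expected '+', got {tok!r}")
        right = next(tokens, None)
        if right is None:
            raise SyntaxError("missing operand after '+'")
        temp_count += 1
        temp = f"temp{temp_count}"
        # The parser calls the code generator at this point in the parse.
        code.append(f"{temp} = {left} + {right}")
        left = temp
    return code


print(compile_sum("a + b + 3"))
# ['temp1 = a + b', 'temp2 = temp1 + 3']
```

Scanning, parsing and code generation are interleaved here, so the source text is read only once.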
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value)
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form or syntax of legal statements in the language
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree)
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST)
• A pass can correspond to a "phase" but it does not have to
• Sometimes a single pass corresponds to several phases that are interleaved in time
• What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once
Dependency diagram of a typical Single Pass Compiler: the Compiler Driver calls the Syntactic Analyzer, which in turn calls the Contextual Analyzer and the Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases
Dependency diagram of a typical Multi Pass Compiler: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer and the Code Generator in turn
Syntactic Analyzer: input Source Text, output AST
Contextual Analyzer: input AST, output Decorated AST
Code Generator: input Decorated AST, output Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
on the source listing for documentation purposes. The comment field is separated from the operand field (or from the mnemonic field if no operand is required) by at least one white-space character.
Types of Assembly language statements
Assembly language statements are of three types:
(i) Imperative statements
(ii) Declarative statements
(iii) Assembler directive statements
Advantages & disadvantages of Assembly language
Lecture_no19 Design Specification of 1-pass assembler with example
The following steps are used for the design specification of an assembler-
1) Identify the information necessary to perform a task
2) Specify the data structure and define the format of the data structure
3) Determine the processing necessary to obtain and maintain the information
4) Determine the processing necessary to perform a task
Assembler design can be done in
– Single pass
– Two pass
Figure: Two-pass assembler. Source program → Pass 1 (OPTAB, SYMTAB) → Intermediate file → Pass 2 (OPTAB, SYMTAB) → Object codes
The intermediate file includes each source statement, its assigned address, and an error indicator
Pass I
• Separate the symbol, mnemonic op-code and operand fields
• Determine the storage requirement for every assembly language statement and update the location counter
• Build the symbol table (the table that is used to store each label and its corresponding value)
Pass II
• Generate object code (perform the actual translation)
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of a two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references without additional restrictions or techniques
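One well-known way to cope with forward references in a single pass is to keep the object code in memory and patch address fields once a label is finally defined. The sketch below illustrates that technique; it is not necessarily the restriction these notes have in mind. The toy statement format (label, mnemonic, operand), the one-word instruction length and the function name are all assumptions.

```python
def assemble_one_pass(source):
    """Assemble (label, mnemonic, operand) statements in one pass.

    Every statement is assumed to occupy exactly one word, so the
    address of a statement equals its position in the object code.
    """
    symtab = {}     # label -> address
    pending = {}    # undefined label -> positions whose address field needs patching
    obj = []
    for label, mnemonic, operand in source:
        addr = len(obj)
        if label is not None:
            symtab[label] = addr
            # Resolve every earlier forward reference to this label.
            for pos in pending.pop(label, []):
                obj[pos][1] = addr
        if mnemonic == "WORD":
            obj.append(["WORD", int(operand)])
        elif operand in symtab:
            obj.append([mnemonic, symtab[operand]])
        else:
            # Forward reference: leave the address field empty for now.
            pending.setdefault(operand, []).append(addr)
            obj.append([mnemonic, None])
    if pending:
        raise ValueError(f"undefined symbols: {sorted(pending)}")
    return obj


program = [
    (None,   "JUMP", "DONE"),    # DONE has not been scanned yet
    ("X",    "WORD", "5"),
    ("DONE", "LOAD", "X"),
]
print(assemble_one_pass(program))
# [['JUMP', 2], ['WORD', 5], ['LOAD', 1]]
```

The cost of this approach is that the object code must stay in memory until all references are resolved.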
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels
• Location Counter (LOCCTR): this is a variable that is used to help in the assignment of addresses. After each statement, LOCCTR ← LOCCTR + (instruction length)
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program
• Save the values assigned to all labels for use in Pass 2
• Perform some processing of assembler directives
Pass 2
• Assemble instructions
• Generate data values defined by BYTE, WORD
• Perform processing of assembler directives not done in Pass 1
• Write the object program and the assembly listing
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows.
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
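The division of labour between the two passes can be sketched as follows. The opcode values in OPTAB, the statement format (label, mnemonic, operand) and the assumption that every statement occupies one word are hypothetical choices made only to keep the example short.

```python
OPTAB = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JUMP": 0x04}   # hypothetical opcodes

program = [
    (None,   "LOAD",  "X"),
    (None,   "ADD",   "Y"),
    (None,   "JUMP",  "DONE"),    # forward reference to DONE
    ("X",    "WORD",  "5"),
    ("Y",    "WORD",  "7"),
    ("DONE", "STORE", "X"),
]


def pass1(source, start=0):
    """Assign addresses and build SYMTAB; no code is generated yet."""
    symtab = {}
    locctr = start
    for label, mnemonic, operand in source:
        if label is not None:
            if label in symtab:
                raise ValueError(f"duplicate label {label}")
            symtab[label] = locctr
        locctr += 1     # LOCCTR <- LOCCTR + instruction length (one word here)
    return symtab


def pass2(source, symtab):
    """Translate each statement using OPTAB and the completed SYMTAB."""
    obj = []
    for label, mnemonic, operand in source:
        if mnemonic == "WORD":
            obj.append(int(operand))                         # data value
        else:
            obj.append((OPTAB[mnemonic], symtab[operand]))   # (opcode, address)
    return obj


symtab = pass1(program)
print(symtab)                    # {'X': 3, 'Y': 4, 'DONE': 5}
print(pass2(program, symtab))
```

Because SYMTAB is complete before Pass 2 starts, the forward reference to DONE needs no special treatment.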
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1: H
Col 2~7: Program name
Col 8~13: Starting address (hex)
Col 14~19: Length of object program in bytes (hex)
Text record
Col 1: T
Col 2~7: Starting address in this record (hex)
Col 8~9: Length of object code in this record in bytes (hex)
Col 10~69: Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1: E
Col 2~7: Address of first executable instruction (hex) (END program_name)
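The column layout above can be turned into fixed-width strings directly. This is a minimal sketch; the function names and the sample program name, addresses and object code are made up for illustration.

```python
def header_record(name: str, start: int, length: int) -> str:
    """Col 1 'H', col 2-7 program name, col 8-13 start address, col 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_code_hex: str) -> str:
    """Col 1 'T', col 2-7 start address, col 8-9 length in bytes, col 10-69 object code."""
    if len(object_code_hex) > 60 or len(object_code_hex) % 2:
        raise ValueError("object code must be at most 60 hex digits, a whole number of bytes")
    nbytes = len(object_code_hex) // 2
    return f"T{start:06X}{nbytes:02X}{object_code_hex}"


def end_record(first_exec: int) -> str:
    """Col 1 'E', col 2-7 address of the first executable instruction."""
    return f"E{first_exec:06X}"


# Illustrative values only.
print(header_record("COPY", 0x1000, 0x107A))        # HCOPY  00100000107A
print(text_record(0x1000, "141033482039001036"))    # T00100009141033482039001036
print(end_record(0x1000))                           # E001000
```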
Absolute Program
The program must be loaded at the specified (absolute) address
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory
How to solve
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1: M
Col 2-7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9: Length of the address field to be modified, in half-bytes
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions
Example
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call and its expansion:
INCRMT X, Y
expands to
LOAD X
ADD Y
STORE X
Figure: Source code (with macro) → Macro Processor → Expanded code → Compiler or Assembler
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT XY
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place All macro calls in a program are expanded in this fashion
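The expansion described above can be sketched as a single pass over the source lines. This is an illustration under stated assumptions: definitions are delimited by MACRO and ENDM as in the example, parameters are substituted by plain text replacement, nested definitions are not handled, and the function name expand_macros is hypothetical.

```python
def expand_macros(lines):
    """Collect MACRO ... ENDM definitions and expand every call to them."""
    macros = {}     # macro name -> (formal parameters, model statements)
    output = []
    i = 0
    while i < len(lines):
        fields = lines[i].split(None, 1)
        if fields and fields[0] == "MACRO":
            # The statement after MACRO is the prototype: name and formal parameters.
            proto = lines[i + 1].split(None, 1)
            name = proto[0]
            formals = [p.strip() for p in proto[1].split(",")] if len(proto) > 1 else []
            body = []
            i += 2
            while lines[i].strip() != "ENDM":
                body.append(lines[i])
                i += 1
            macros[name] = (formals, body)
        elif fields and fields[0] in macros:
            # A macro name in the mnemonic field is a call on the macro.
            formals, body = macros[fields[0]]
            actuals = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
            for stmt in body:
                # Positional substitution of actual for formal parameters.
                for formal, actual in zip(formals, actuals):
                    stmt = stmt.replace(formal, actual)
                output.append(stmt)
        else:
            output.append(lines[i])
        i += 1
    return output


source = [
    "MACRO",
    "INCRMT &A, &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X, Y",
]
print(expand_macros(source))   # ['LOAD X', 'ADD Y', 'STORE X']
```

Plain text replacement would misbehave if one parameter name were a prefix of another (for example &A and &AB), so a fuller implementation would match whole parameter names.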
Defining a Macro
Let us take another look at the macro definition unit
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
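-> Directives used in defining a macro
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition.
-> Prototype for the macro
The prototype gives the name of the macro and the macro parameter list. Each parameter starts with the character &.
-> Body: the statements that will be generated as the expansion of the macro. The body ends at MEND.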
Macro Expansion
Each macro invocation statement is expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro:
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first formal parameter, and so on. Consider the prototype and macro call statements once again:
INCRMT &A, &B    (prototype)
INCRMT X, Y      (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example:
STRG DATA1, DATA2, DATA3
GENER ,,DIRECT,,,,,,3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter. Arguments may then appear in any order.
Example:
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
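The pairing of formal and actual parameters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names bind_arguments and expand_call are hypothetical, a keyword argument is recognised simply by the presence of "=", and the INCRMT prototype and model statements are taken from the earlier example.

```python
def bind_arguments(formals, actuals):
    """Pair each formal parameter with an actual parameter.

    formals -- parameter names from the prototype, e.g. ["&A", "&B"]
    actuals -- operands of the call: "X" is positional, "&A=X" is a keyword form
    """
    binding = {}
    position = 0
    for actual in actuals:
        if "=" in actual:
            # Keyword parameter: the keyword names the formal directly.
            keyword, value = actual.split("=", 1)
            formal = keyword if keyword.startswith("&") else "&" + keyword
            if formal not in formals:
                raise ValueError(f"unknown keyword parameter {keyword}")
            binding[formal] = value
        else:
            # Positional parameter: pair by relative position in the lists.
            binding[formals[position]] = actual
            position += 1
    return binding


def expand_call(model_statements, binding):
    """Replace every formal parameter in the model statements by its actual."""
    expanded = []
    for statement in model_statements:
        # Longest names first, so that &AB is not clobbered by &A.
        for formal in sorted(binding, key=len, reverse=True):
            statement = statement.replace(formal, binding[formal])
        expanded.append(statement)
    return expanded


FORMALS = ["&A", "&B"]                       # from the prototype INCRMT &A, &B
MODEL = ["LOAD &A", "ADD &B", "STORE &A"]    # model statements of INCRMT

print(expand_call(MODEL, bind_arguments(FORMALS, ["X", "Y"])))
print(expand_call(MODEL, bind_arguments(FORMALS, ["&B=Y", "&A=X"])))
```

Both calls yield the same LOAD, ADD, STORE sequence, which shows that with keyword parameters the order of the arguments no longer matters.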
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
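The three steps can be sketched as follows. This is a minimal Python illustration, not the processor of any particular assembler: the function name expand_macros and the layout of the MNT and MDT are assumptions, only positional parameters are handled, no formal name is assumed to be a prefix of another, and AIF and AGO are left out.

```python
def expand_macros(source_lines):
    mnt = {}       # Macro Name Table: name -> (index of prototype in MDT, formals)
    mdt = []       # Macro Definition Table: prototype, model statements, ENDM
    program = []   # statements of the main assembly program

    # Step 1: scan the definitions and fill the MNT and MDT.
    lines = iter(source_lines)
    for line in lines:
        if line.strip() == "MACRO":
            prototype = next(lines).replace(",", " ").split()
            mnt[prototype[0]] = (len(mdt), prototype[1:])
            mdt.append(" ".join(prototype))
            for body_line in lines:
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        else:
            program.append(line.strip())

    # Step 2: detect macro calls. Step 3: copy model statements up to ENDM.
    output = []
    for statement in program:
        fields = statement.replace(",", " ").split()
        if fields and fields[0] in mnt:
            start, formals = mnt[fields[0]]
            binding = dict(zip(formals, fields[1:]))
            index = start + 1
            while mdt[index] != "ENDM":
                expanded = mdt[index]
                for formal, actual in binding.items():
                    expanded = expanded.replace(formal, actual)
                output.append(expanded)
                index += 1
        else:
            output.append(statement)
    return output


SOURCE = [
    "MACRO",
    "INCRMT &A, &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X, Y",
    "HALT",
]
print(expand_macros(SOURCE))
```

The MNT entry holds the position of the prototype in the MDT, which is the auxiliary information mentioned in step 1 (iii).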
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB. It holds pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
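To make the roles of the three tables concrete, here is a small Python sketch. It rests on assumptions not stated in the notes: the positional notation is written as ?1, ?2, ..., the procedures are called define and expand after the DEFINE and EXPAND sub-procedures, and the tables are plain Python lists and dictionaries.

```python
DEFTAB = []   # prototype, body in positional notation, then MEND
NAMTAB = {}   # macro name -> (index of first line, index of last line) in DEFTAB


def define(name, formals, body):
    """Enter a macro in DEFTAB and NAMTAB, converting &names to ?n notation."""
    start = len(DEFTAB)
    DEFTAB.append(f"{name} {','.join(formals)}")
    # Longest formal names first, so that &AB is not clobbered by &A.
    numbered = sorted(enumerate(formals, start=1), key=lambda pair: -len(pair[1]))
    for statement in body:
        for number, formal in numbered:
            statement = statement.replace(formal, f"?{number}")
        DEFTAB.append(statement)
    DEFTAB.append("MEND")
    NAMTAB[name] = (start, len(DEFTAB) - 1)


def expand(name, actuals):
    """Expand one invocation using ARGTAB, a list indexed by position."""
    argtab = list(actuals)
    start, end = NAMTAB[name]
    expanded = []
    for statement in DEFTAB[start + 1:end]:
        # Highest position first, so that ?10 is handled before ?1.
        for number in range(len(argtab), 0, -1):
            statement = statement.replace(f"?{number}", argtab[number - 1])
        expanded.append(statement)
    return expanded


define("INCRMT", ["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])
print(DEFTAB)
print(expand("INCRMT", ["X", "Y"]))
```

Because the body is stored in positional form, expansion never has to look up parameter names: each ?n is replaced directly by entry n of ARGTAB.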
(Figure: macro definition and macro call with the tables NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: recognize macro definitions
» Pass 2: recognize macro calls
» nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
(Figure: flowchart of the macro processor procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader: A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program then becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instruction is read line by line, its machine code is obtained, and it is put directly in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages: It is simple to implement.
Disadvantages: In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer needs to know about memory management.
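Since an absolute loader only places already-relocated code at the addresses given to it, its core is very short. The following Python sketch assumes a hypothetical record format, a list of (load address, bytes) pairs, and models main memory as a bytearray; neither is taken from the notes.

```python
def absolute_load(memory, records):
    """Copy each record into memory at the address the translator specified.

    records -- list of (load_address, data) pairs; no relocation is performed
    """
    for address, data in records:
        if address < 0 or address + len(data) > len(memory):
            raise ValueError(f"record at {address} does not fit in memory")
        memory[address:address + len(data)] = data
    return memory


memory = bytearray(16)                        # a toy 16-byte main memory
absolute_load(memory, [(4, b"\x01\x02"), (10, b"\xff")])
print(list(memory))
```

Nothing in the loop inspects or adjusts the contents of a record, which is why any linking and address adjustment has to be done beforehand by the programmer.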
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls on certain input/output functions like printf(), scanf() etc., which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of the storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called a relocating loader.
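The adjustment of address fields can be illustrated with a short Python sketch. The representation is an assumption made for the example: a segment is a list of integer words, and address_fields lists the indices of the words that hold addresses; the function name relocate is hypothetical.

```python
def relocate(words, address_fields, translated_origin, load_origin):
    """Add the relocation constant to every address field of a segment.

    words          -- the segment as translated (list of integers)
    address_fields -- indices of the words that contain addresses
    """
    constant = load_origin - translated_origin   # same constant for every field
    relocated = list(words)
    for index in address_fields:
        relocated[index] += constant
    return relocated


# Program A was translated with start address 100 and is loaded at 400.
segment = [0, 105, 7, 110]            # words 1 and 3 hold addresses
print(relocate(segment, [1, 3], 100, 400))
```

Only the words marked as address fields change; ordinary constants such as the 7 above are left alone, which is why the loader needs relocation information in addition to the code itself.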
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The assembler's output for a relocating loader is the object program together with information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor:
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» a subroutine is loaded and linked to the rest of the program when it is first called (dynamic linking, dynamic loading, or load on call)
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader:
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
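The behaviour of the overlay manager on the example tree can be simulated in Python. This is a sketch under assumptions: the tree is encoded in a hypothetical PARENT dictionary, the class name OverlayManager is illustrative, and loading a segment is modelled only by recording its name at its depth in the tree.

```python
# Overlay tree from the example: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the sequence of segments from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]   # loaded[d] is the segment resident at depth d

    def call(self, target):
        """Ensure the path to target is loaded; return the segments read in."""
        newly_loaded = []
        for depth, segment in enumerate(path_to(target)):
            if depth < len(self.loaded) and self.loaded[depth] == segment:
                continue                      # already resident
            # Siblings share memory: loading this segment overwrites whatever
            # was resident at this depth and everything below it.
            del self.loaded[depth:]
            self.loaded.append(segment)
            newly_loaded.append(segment)
        return newly_loaded


manager = OverlayManager()
print(manager.call("B"))   # root calls a routine in B: A and B are loaded
print(manager.call("C"))   # C replaces its sibling B
print(manager.call("E"))   # D replaces A, then E is loaded
```

A call to a segment already on the resident path, such as an upward call to ROOT, returns an empty list because nothing needs to be read in.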
(Fig. Overlay: overlay tree with ROOT as parent of A and D, A as parent of B and C, and D as parent of E and F)
The essential difference between a linkage editor and a linking loader
(Figure: with a linking loader, the object program(s) and the library are processed by the linking loader, which places the program in memory. With a linkage editor, the object program(s) and the library are processed by the linkage editor, which produces a linked program; a relocating loader then places the linked program in memory.)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into main memory.
-> Let's take a quick look at how our three loaders (absolute loader, relocating loader and direct linking loader, respectively) handle the above four functions:
Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Copy of object program
IPLA parameter (Initial Program Load Address): supplied by the OS or the programmer to specify the address at which to load the first segment
PLA counter (Program Load Address): keeps track of each segment's location
External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA): establishes the correspondence between the ESD ID used in the ESD and RLD cards
ESD (external symbol dictionary)
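The division of work between the two passes can be sketched in Python. This is a simplified model under assumptions, not the card-based format described above: each module is a hypothetical dictionary with a name, the symbols it defines (relative addresses, standing in for ESD entries), its text as a list of integer words, and RLD entries given as (offset, symbol) pairs, where a symbol of None means "relocate relative to this module's own origin". The LESA is not modelled.

```python
def direct_link_load(modules, ipla):
    # Pass 1: allocate each module a location and build the symbol table.
    symbol_table = {}
    origin = {}
    pla = ipla                                   # Program Load Address counter
    for module in modules:
        origin[module["name"]] = pla
        for symbol, relative in module["defines"].items():
            symbol_table[symbol] = pla + relative
        pla += len(module["text"])

    # Pass 2: load the text, then relocate and link using the RLD entries.
    memory = {}
    for module in modules:
        base = origin[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word
        for offset, symbol in module["rld"]:
            # None: local address constant; otherwise an external reference.
            memory[base + offset] += base if symbol is None else symbol_table[symbol]
    return memory, symbol_table


A = {"name": "A", "defines": {"A": 0}, "text": [10, 0, 2],
     "rld": [(1, "B"), (2, None)]}               # word 1 refers to B, word 2 is local
B = {"name": "B", "defines": {"B": 0, "BDATA": 1}, "text": [20, 99], "rld": []}

memory, symbols = direct_link_load([A, B], ipla=100)
print(symbols)
print(memory)
```

Two passes are needed because A refers to B before B has been assigned an address: the value of B is only known once pass 1 has allocated every module.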
Data Structures
Algorithm
Binary Symbolic Subroutine Loader
The assembler assembles each program and provides the loader with:
-> the object program plus relocation information
-> a prefix with information about all other programs it references (the transfer vector)
-> the length of the entire program
-> the length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader: allocation, relocation and linking.
• It outputs the text to a file rather than to memory; this file is called a load module.
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> The language designer's balance:
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search.
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort.
To understand the idea of hashing as a search technique.
To introduce the map abstract data type.
To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element.
• Hash
A hash table is a collection of items stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
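The three search methods can be compared on a small symbol table. The Python sketch below makes some assumptions of its own: the table is a list of (symbol name, address) pairs, the hash function simply sums character codes, and collisions are handled by keeping a short list in each slot (chaining), which goes beyond the one-item-per-slot description above.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; works on an unordered table."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(table, key):
    """Halve the search range each step; the table must be sorted by name."""
    low, high = 0, len(table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1      # continue in the left subarray
        else:
            low = middle + 1       # continue in the right subarray
    return -1


class HashTable:
    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]   # slots are numbered from 0

    def _slot(self, key):
        return sum(ord(ch) for ch in key) % len(self.slots)   # hash function

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (existing, _value) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None


SYMBOLS = [("ALPHA", 100), ("BETA", 104), ("LOOP", 120), ("ZERO", 200)]
print(linear_search(SYMBOLS, "LOOP"), binary_search(SYMBOLS, "LOOP"))

table = HashTable()
for name, address in SYMBOLS:
    table.put(name, address)
print(table.get("LOOP"))
```

Linear search needs up to n comparisons, binary search about log2(n) on an ordered table, and a hash table typically needs only a handful regardless of n, provided the hash function spreads the keys well.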
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
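As an illustration, two of the four algorithms are sketched below in Python: insertion sort and merge sort. The function names and the choice to return a new list rather than sort in place are assumptions made for the example.

```python
def insertion_sort(items):
    """Grow a sorted prefix by inserting each element into its place."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]     # shift larger elements to the right
            j -= 1
        result[j + 1] = current
    return result


def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left, right = merge_sort(items[:middle]), merge_sort(items[middle:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


names = ["LOOP", "ALPHA", "ZERO", "BETA"]
print(insertion_sort(names))
print(merge_sort(names))
```

Sorting a symbol table once in this way is what makes the binary search of the previous section applicable.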
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language);
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ =>G γ (a single-step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
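The definition of a single-step derivation translates directly into code. The Python sketch below assumes that every grammar symbol is a single character, so that strings of symbols are ordinary Python strings, and represents P as a list of (α, β) pairs; the function name and the sample grammar are illustrative only.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each occurrence of alpha in mu,
    mu is split as sigma + alpha + tau and gamma = sigma + beta + tau.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar with productions S -> aSb and S -> ab.
P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", P)))
```

Here σ = "a" and τ = "b" in both derivations; applying the function repeatedly, starting from the start symbol, enumerates the strings derivable from it.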
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (N the nonterminals, Σ here the terminal alphabet, P the production rules, S the start symbol) that satisfies the following two conditions:
a. Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε.
b. If S -> ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition to the modified grammar of a production rule of the form Bb -> b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition of a production rule of the form aE -> EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β which is not of the form S -> ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
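The conditions for the grammar types can be checked mechanically. The Python sketch below assumes the usual reading of those conditions: for Type 1, every rule α -> β has |α| ≤ |β| unless it is S -> ε, and S does not occur on a right-hand side if S -> ε is present; for Type 2, every α is a single nonterminal; for Type 3, every β (other than in S -> ε) is a terminal, or a terminal followed by a nonterminal. Symbols are assumed to be single characters, ε is the empty string, and the function name grammar_type is hypothetical.

```python
def grammar_type(nonterminals, productions, start="S"):
    """Return the highest type number (0 to 3) whose conditions the grammar meets."""
    def is_start_to_empty(alpha, beta):
        return alpha == start and beta == ""

    has_start_to_empty = any(is_start_to_empty(a, b) for a, b in productions)
    start_on_rhs = any(start in b for _a, b in productions)

    # Type 1: condition (a) on lengths and condition (b) on S -> empty.
    condition_a = all(len(a) <= len(b) or is_start_to_empty(a, b)
                      for a, b in productions)
    condition_b = not (has_start_to_empty and start_on_rhs)
    if not (condition_a and condition_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a in nonterminals for a, _b in productions):
        return 1

    # Type 3: right-hand side is a terminal, or a terminal then a nonterminal.
    def is_regular(beta):
        if len(beta) == 1:
            return beta not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    if all(is_start_to_empty(a, b) or is_regular(b) for a, b in productions):
        return 3
    return 2


print(grammar_type({"S", "A"}, [("S", "aA"), ("A", "b")]))      # regular rules
print(grammar_type({"S"}, [("S", "aSb"), ("S", "ab")]))         # context-free rules
print(grammar_type({"S", "B"}, [("S", "aB"), ("Bb", "b")]))     # Bb -> b shrinks
```

The last call shows the effect mentioned above: a rule of the form Bb -> b violates condition (a), so the grammar is no longer of Type 1.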
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A -> Ba or of the form B -> bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
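The conventions above can be illustrated with a minimal Python sketch. The three-rule grammar and the helper name parse_bnf are hypothetical and are not taken from the notes; the sketch only splits each rule at ::=, separates alternatives at |, and then classifies symbols as terminals or nonterminals by whether they appear on a left side.

```python
def parse_bnf(text):
    """Parse BNF rules into {nonterminal: [alternative, ...]}; each alternative is a list of symbols."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules

# Hypothetical grammar, used only for illustration.
GRAMMAR = """
<expr> ::= <term> | <expr> + <term>
<term> ::= <digit>
<digit> ::= 0 | 1
"""

rules = parse_bnf(GRAMMAR)
symbols = {s for alts in rules.values() for alt in alts for s in alt}
nonterminals = set(rules)            # symbols that appear on a left side
terminals = symbols - nonterminals   # symbols that never appear on a left side

print(sorted(nonterminals))
print(sorted(terminals))
```

Because the sketch splits on |, it cannot describe a language in which | is itself a terminal; a fuller reader would need a quoting convention.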
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful for explicitly and concisely defining sets of strings, such as computer languages, provided that a canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, which corresponds to the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the program is built. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
->Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol =
– The identifier count
– The plus sign +
– The identifier rate
– The multiplication sign *
– The constant number 10
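The token list above can be reproduced with a short scanner. This is a minimal Python sketch, not the notes' own algorithm: the token names (IDENT, ASSIGN, and so on) and the function name tokenize are assumptions chosen for illustration.

```python
import re

# Assumed token classes for the example statement; a real language needs many more.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),          # blanks separate tokens but are not tokens themselves
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Break a source string into (token type, token value) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("total = count + rate * 10"))
```

The scanner needs no recursion: each token class is described by a regular expression, which is what separates scanning from parsing.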
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable: roughly 37 components instead of about 210 compilers.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3 =y+ 3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
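Since the tree figures are not reproduced here, the following Python sketch builds the syntax tree for the example as nested tuples. It is a hand-written parser under stated assumptions: the function name parse_assignment and the tuple layout are hypothetical, and the recursive rule expr → expr + expr is handled with a loop that groups to the left, which is one of several valid readings of that ambiguous rule.

```python
import re

def parse_assignment(source):
    """Parse 'id = expr', where expr is operands joined by '+', into a syntax tree."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source)
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok.isdigit():
            return ("number", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("id", tok)
        raise SyntaxError(f"expected operand, got {tok!r}")

    def expr():
        nonlocal pos
        tree = operand()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            tree = ("+", tree, operand())    # operator as interior node, operands as children
        return tree

    target = operand()
    if target[0] != "id" or pos >= len(tokens) or tokens[pos] != "=":
        raise SyntaxError("expected 'id ='")
    pos += 1
    return ("=", target, expr())             # the trailing ';' is not part of the tree

print(parse_assignment("x3 = y + 3;"))
```

The result has = at the root, the identifier x3 as its left child, and the + node with children y and 3 as its right child.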
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
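Walking the semantic tree to produce quads can be sketched as follows. This is an illustrative Python sketch: the nested-tuple tree layout and the name gen_quads are assumptions, and the tree is written out by hand with the conversion operators already inserted by semantic analysis.

```python
from itertools import count

def gen_quads(tree):
    """Walk an assignment tree and return (operation, target, source1, source2) quads."""
    quads = []
    temp_ids = count(1)

    def walk(node):
        if isinstance(node, str):            # leaf: an identifier or a constant
            return node
        op, *children = node
        sources = [walk(child) for child in children]   # operands are evaluated first
        target = f"temp{next(temp_ids)}"
        padding = ["--"] * (2 - len(sources))           # unary operations leave source2 unused
        quads.append((op, target, *sources, *padding))
        return target

    _, target, expression = tree             # ("assign", target, expression)
    quads.append(("assign", target, walk(expression), "--"))
    return quads

# Hypothetical encoding of the semantic tree for the running example.
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
for quad in gen_quads(tree):
    print(*quad)
```

Each interior node yields exactly one quad, emitted after the quads for its children, which is why the sequence matches a bottom-up reading of the tree.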
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
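The two optimizations just described can be sketched over a list of quads. This is a minimal Python illustration, not a general optimizer: the function name optimize is hypothetical, and the merge step assumes that a temporary is used only once, immediately after it is produced.

```python
def optimize(quads):
    """Fold inttoreal of a constant, and merge a temp-to-variable assign into its producer."""
    constants = {}                            # temp name -> folded real constant
    result = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{int(src1)}.0"        # converted at compile time, no quad emitted
        elif op == "assign" and result and result[-1][1] == src1:
            prev_op, _, a, b = result.pop()
            result.append((prev_op, target, a, b))      # write straight into the variable
        else:
            result.append((op, target, src1, src2))
    return result

quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
for quad in optimize(quads):
    print(*quad)
```

On the example, four quads shrink to two: add temp2 id2 3.0 followed by realtoint id1 temp2.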
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code instruction, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical single pass compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                  calls
Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi pass compiler:
Compiler Driver
    calls                  calls                  calls
Syntactic Analyzer     Contextual Analyzer    Code Generator
Source Text → AST → Decorated AST → Object Code
(Each analyzer takes the representation to its left as input and produces the one to its right as output.)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
The intermediate file includes each source statement, its assigned address, and an error indicator.
Pass I
• Separate the symbol, mnemonic op-code, and operand fields.
• Determine the storage requirement for every assembly language statement and update the location counter.
• Build the symbol table (the table that is used to store each label and its corresponding value).
Pass II
• Generate object code (perform the actual translation).
One-Pass Assembler
The translation performed by an assembler is essentially a collection of substitutions: machine operation code for mnemonic, machine address for symbolic, machine encoding of a number for its character representation, etc. Except for one factor, these substitutions could all be performed in one sequential pass over the source text. That factor is the forward reference (a reference to an instruction which has not yet been scanned by the assembler). The separate passes of the two-pass assembler are required to handle forward references without restriction. If certain limitations are imposed, however, it becomes possible to handle forward references without making two passes. Different sets of restrictions lead to different one-pass assemblers. These one-pass assemblers are particularly attractive when secondary storage is either slow or missing entirely, as on many small machines.
• Single Pass Assembler
– Does everything in a single pass
– Cannot resolve forward references in general (only under the restrictions described above)
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels.
• The Location Counter (LOCCTR): This is a variable that is used to help in the assignment of addresses. After each statement is processed, LOCCTR ← LOCCTR + (instruction length).
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program.
• Save the values assigned to all labels for use in Pass 2.
• Perform some processing of assembler directives.
Pass 2
• Assemble instructions.
• Generate data values defined by BYTE, WORD.
• Perform processing of assembler directives not done in Pass 1.
• Write the object program and the assembly listing.
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows.
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references, and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
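The two-pass scheme can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the four-entry OPTAB, the fixed 3-byte size of every statement, the (label, operation, operand) statement layout, and the function names pass1 and pass2.

```python
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C, "J": 0x3C}   # assumed opcode table
WORD_SIZE = 3   # assumption: every instruction and every WORD occupies 3 bytes

def pass1(program, start=0):
    """Assign an address to every statement and record label addresses in SYMTAB."""
    symtab, locctr = {}, start
    for label, _op, _operand in program:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        locctr += WORD_SIZE                  # LOCCTR = LOCCTR + instruction length
    return symtab

def pass2(program, symtab):
    """Translate each statement using the completed SYMTAB."""
    code = []
    for _label, op, operand in program:
        if op == "WORD":
            code.append(f"{int(operand):06X}")
        else:
            code.append(f"{OPTAB[op]:02X}{symtab[operand]:04X}")
    return code

program = [
    (None,   "LDA",  "FIVE"),    # forward reference: FIVE is defined later
    (None,   "ADD",  "FIVE"),
    (None,   "STA",  "SUM"),
    ("FIVE", "WORD", "5"),
    ("SUM",  "WORD", "0"),
]
symtab = pass1(program, start=0x1000)
print(symtab)
print(pass2(program, symtab))
```

The forward reference to FIVE causes no difficulty because pass 2 starts only after pass 1 has filled in every label; a one-pass assembler has to handle exactly this case while still reading the source.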
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation.
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections, and program linking.
Object Program
Header record
Col 1: H
Col 2~7: Program name
Col 8~13: Starting address (hex)
Col 14~19: Length of object program in bytes (hex)
Text record
Col 1: T
Col 2~7: Starting address in this record (hex)
Col 8~9: Length of object code in this record in bytes (hex)
Col 10~69: Object code; (69-10+1)/6 = 10 instructions
End record
Col 1: E
Col 2~7: Address of first executable instruction (hex) (END program_name)
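The column layout above maps directly onto fixed-width string formatting. The following Python sketch is illustrative only; the function names and the sample program name COPY are assumptions, and each object code is assumed to be already available as a string of hex digits.

```python
def header_record(name, start, length):
    """Col 1 'H', col 2-7 program name, col 8-13 start address, col 14-19 length (hex)."""
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_record(start, object_codes):
    """Col 1 'T', col 2-7 start address, col 8-9 length in bytes, col 10-69 object code."""
    body = "".join(object_codes)
    if len(body) > 60:
        raise ValueError("a text record holds at most 60 hex digits of object code")
    return f"T{start:06X}{len(body) // 2:02X}{body}"

def end_record(first_instruction):
    """Col 1 'E', col 2-7 address of the first executable instruction (hex)."""
    return f"E{first_instruction:06X}"

print(header_record("COPY", 0x1000, 0x0F))
print(text_record(0x1000, ["001009", "181009", "0C100C"]))
print(end_record(0x1000))
```

Two hex digits encode one byte, so the length field in a text record is half the number of hex digits in columns 10 to 69.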
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1: M
Col 2-7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9: Length of the address field to be modified, in half-bytes
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
The statements between MACRO and ENDM form the macro definition. The macro call
INCRMT X,Y
has the macro expansion
LOAD X
ADD Y
STORE X
(Figure: source code with macros → macro processor → expanded code → compiler or assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. The call
INCRMT X,Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit.
MACRO
INCRMT &A,&B
LOAD &A
ADD &B
STORE &A
ENDM
The macro header statement (MACRO) indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
->Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates the end of a macro definition.
->Prototype for the macro
Each parameter starts with &.
name MACRO parameter list
(body)
MEND
->Body: the statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called the formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A,&B (prototype)
INCRMT X,Y (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
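The positional pairing described above can be sketched in a few lines of Python. This is only an illustration: the function name expand_positional and the body of INCRMT are assumptions, chosen so that the result matches the expansion shown above.

```python
def expand_positional(formals, body, actuals):
    """Replace each formal parameter in the model statements by the
    actual parameter found at the same position in the macro call."""
    if len(formals) != len(actuals):
        raise ValueError("formal and actual parameter lists differ in length")
    binding = dict(zip(formals, actuals))      # &A -> X, &B -> Y
    expanded = []
    for statement in body:
        tokens = statement.split()
        expanded.append(" ".join(binding.get(tok, tok) for tok in tokens))
    return expanded

# Hypothetical prototype and body for INCRMT (assumed, not from the notes).
INCRMT_FORMALS = ["&A", "&B"]
INCRMT_BODY = ["LOAD &A", "ADD &B", "STORE &A"]

for line in expand_positional(INCRMT_FORMALS, INCRMT_BODY, ["X", "Y"]):
    print(line)
```

Swapping the actual parameters in the call (Y, X) would swap the roles of X and Y in every generated statement, which is the main risk of purely positional parameters.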
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
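A minimal sketch of keyword binding, assuming each argument is written as name=value. The function name bind_keyword_args is hypothetical; the point is only that the pairing is by name, so the order of the arguments in the call does not matter.

```python
def bind_keyword_args(formals, call_args):
    """Pair 'name=value' arguments with formal parameters by name."""
    binding = {}
    for arg in call_args:
        name, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"not a keyword argument: {arg!r}")
        if name not in formals:
            raise ValueError(f"unknown parameter: {name!r}")
        binding[name] = value
    missing = [f for f in formals if f not in binding]
    if missing:
        raise ValueError(f"no value given for: {missing}")
    return binding

# The same call written in two different orders gives the same binding.
first = bind_keyword_args(["TYPE", "CHANNEL"], ["TYPE=DIRECT", "CHANNEL=3"])
second = bind_keyword_args(["TYPE", "CHANNEL"], ["CHANNEL=3", "TYPE=DIRECT"])
print(first == second)
```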
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish the correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO change the normal sequential order, based on expansion-time relations between the values of formal parameters and expansion-time variables.
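The three steps can be sketched as below. The source layout is an assumption made for the sketch (a line MACRO, then the prototype, then the body, then ENDM; comma-separated operands), and AIF/AGO and nested calls are deliberately left out.

```python
import re

def process(source):
    """Expand macro calls in `source`, a list of assembly statements."""
    mnt = {}      # Macro Name Table: name -> (index of prototype in MDT, formals)
    mdt = []      # Macro Definition Table: prototype, body lines, ENDM
    output = []
    lines = iter(source)
    for line in lines:
        fields = line.split(None, 1)
        opcode = fields[0] if fields else ""
        operands = fields[1] if len(fields) > 1 else ""
        if opcode == "MACRO":
            # Step 1: enter the name in the MNT and the definition in the MDT.
            prototype = next(lines)
            name, _, params = prototype.strip().partition(" ")
            formals = [p.strip() for p in params.split(",") if p.strip()]
            mnt[name] = (len(mdt), formals)
            mdt.append(prototype)
            for body_line in lines:
                mdt.append(body_line)
                if body_line.strip() == "ENDM":
                    break
        elif opcode in mnt:
            # Step 2: locate the macro and pair formals with actuals.
            start, formals = mnt[opcode]
            actuals = [a.strip() for a in operands.split(",") if a.strip()]
            binding = dict(zip(formals, actuals))
            # Step 3: emit the model statements until ENDM is reached.
            index = start + 1
            while mdt[index].strip() != "ENDM":
                output.append(re.sub(r"&\w+",
                                     lambda m: binding.get(m.group(0), m.group(0)),
                                     mdt[index]))
                index += 1
        else:
            output.append(line)    # ordinary statement, copied unchanged
    return output

program = ["MACRO", "INCRMT &A,&B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
           "START", "INCRMT X,Y", "END"]
print(process(program))
```

Because definitions are recorded as they are met, a call is expanded only if its definition appeared earlier in the source, which is the single-pass restriction stated in Lecture_no24.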
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked
Subroutine: the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE) and macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB, holding pointers to the beginning and end of each definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position
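The conversion to positional notation and the use of ARGTAB can be illustrated as follows. The ?n marker is an assumed convention for this sketch (the notes do not say how the position is written), and both function names are hypothetical.

```python
import re

def to_positional(formals, body):
    """Rewrite &NAME references as ?n, where n is the position of the
    parameter in the prototype (as stored in DEFTAB)."""
    position = {name: i + 1 for i, name in enumerate(formals)}

    def replace(match):
        name = match.group(0)
        return f"?{position[name]}" if name in position else name

    return [re.sub(r"&\w+", replace, statement) for statement in body]

def expand_from_argtab(deftab_body, argtab):
    """Substitute ARGTAB entries for the ?n markers during expansion."""
    return [re.sub(r"\?(\d+)", lambda m: argtab[int(m.group(1)) - 1], statement)
            for statement in deftab_body]

stored = to_positional(["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])
print(stored)
print(expand_from_argtab(stored, ["X", "Y"]))
```

With this representation the expander never has to search for parameter names at expansion time; it only indexes into ARGTAB.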
(fig: MACRO and CALL processing with NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the Macro processor and explain
(fig: flowchart of the macro processor with the procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning addresses to all the user subroutines and library subroutines. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is now ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instructions are read line by line, their machine code is obtained, and it is placed directly in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines the assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. When the source program is first translated an object program is generated; if the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls to certain input/output functions like printf( ), scanf( ) etc. which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf( ). A and printf( ) would have to be linked with each other. But where in main storage shall we load A and printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf( ) function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf( ) may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf( ) extends from 100 to 150. But there is simply no way A and printf( ) can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
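As a small illustration of "adding a constant to each relative address", the sketch below relocates a segment held as a list of integer words. The representation (a list of words plus the offsets of the address-holding words) and the sample values are assumptions made for the example.

```python
def relocate(words, relocation_offsets, translated_origin, load_origin):
    """Return a copy of `words` adjusted for loading at `load_origin`.

    words              -- the segment's memory words as integers
    relocation_offsets -- positions of the words that hold relative addresses
    """
    delta = load_origin - translated_origin    # the constant to be added
    relocated = list(words)
    for offset in relocation_offsets:
        relocated[offset] += delta
    return relocated

# Hypothetical segment translated at origin 100; words 1 and 3 hold addresses.
segment = [7, 104, 9, 102, 0]
print(relocate(segment, [1, 3], translated_origin=100, load_origin=500))
# -> [7, 504, 9, 502, 0]
```

Only the words listed in relocation_offsets change; the other words are data or opcodes and must not be touched, which is why the loader needs relocation information from the translator.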
Relocating Loader
To avoid possible reassembly of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of the assembler for a relocating loader is the object program together with information about all other programs it references. In addition there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic Linking
The linking function is performed at execution time
-> Postpones the linking function until execution time
» a subroutine is loaded and linked to the rest of the program when it is first called (also known as dynamic linking, dynamic loading or load on call)
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs
» all linking and relocation operations
» automatic library search
» loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls do not require any linker help, since the entire path from the root is already loaded.
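The behaviour of the overlay manager can be sketched as below for the example tree. The class OverlayManager and its attributes are hypothetical; the sketch only tracks which segments are resident and which loads were needed, it does not move any code.

```python
class OverlayManager:
    """Keeps one path of the overlay tree resident at a time."""

    def __init__(self, parent, root="ROOT"):
        self.parent = parent        # segment -> parent segment (root -> None)
        self.resident = [root]      # currently loaded path, root first
        self.loads = []             # segments brought into memory, in order

    def path_to(self, segment):
        path = []
        while segment is not None:
            path.append(segment)
            segment = self.parent[segment]
        return path[::-1]           # root ... segment

    def ensure_loaded(self, target):
        wanted = self.path_to(target)
        keep = 0                    # length of the common prefix of both paths
        while (keep < len(wanted) and keep < len(self.resident)
               and wanted[keep] == self.resident[keep]):
            keep += 1
        if keep == len(wanted):
            return                  # upward call: target is already resident
        # Siblings share memory, so everything below the common prefix
        # is overwritten by the newly loaded segments.
        self.loads.extend(wanted[keep:])
        self.resident = wanted

TREE = {"ROOT": None, "A": "ROOT", "D": "ROOT",
        "B": "A", "C": "A", "E": "D", "F": "D"}
manager = OverlayManager(TREE)
for call_target in ["B", "C", "E", "ROOT"]:
    manager.ensure_loaded(call_target)
print(manager.loads, manager.resident)
```

Calling B from the root loads both A and B, calling C then replaces only B, and calling E replaces the whole A branch with D and E, as the text describes.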
(fig Overlay: ROOT at the top, segments A and D below it, B and C below A, E and F below D)
The essential difference between a linkage editor and a linking loader
(fig: a linking loader takes object program(s) and a library and places the linked program directly in memory; a linkage editor takes object program(s) and a library and produces a linked program, which a relocating loader later places in memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution
2 The system can be dynamically reconfigured
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine is loaded into main memory.
-> Let us take a quick look at who performs each of the above four functions for our three loaders (absolute loader, relocating loader and direct linking loader respectively)
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory card (ESD)
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Directory card (ESD): stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD IDs used in ESD & RLD cards
• ESD (external symbol dictionary)
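The two passes can be sketched as follows. This is a heavily simplified illustration and not the card formats named above: each module is assumed to be a Python dictionary whose keys (name, length, defines, text, rld) stand in for the ESD, TXT and RLD information, and memory is a plain dictionary from address to word.

```python
def pass1(modules, ipla):
    """Allocate each segment and build the external symbol table."""
    symtab = {}               # external symbol -> assigned core address
    pla = ipla                # Program Load Address counter
    for module in modules:
        symtab[module["name"]] = pla
        for symbol, relative in module["defines"].items():
            symtab[symbol] = pla + relative
        pla += module["length"]
    return symtab

def pass2(modules, symtab, memory):
    """Load the text, then relocate and link the address constants."""
    for module in modules:
        base = symtab[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word
        for offset, symbol in module["rld"]:
            # Naming the module itself relocates an internal address;
            # naming another symbol resolves an external reference.
            memory[base + offset] += symtab[symbol]
    return memory

# Hypothetical modules: A refers to B (word 1) and to itself (word 2).
A = {"name": "A", "length": 4, "defines": {},
     "text": [10, 0, 2, 99], "rld": [(1, "B"), (2, "A")]}
B = {"name": "B", "length": 2, "defines": {"BDATA": 1},
     "text": [20, 30], "rld": []}

symbols = pass1([A, B], ipla=1000)
core = pass2([A, B], symbols, {})
print(symbols)
print(core)
```

Two passes are needed because a module may refer to a symbol defined in a module that appears later in the input; the addresses of all external symbols are known only after pass 1 has processed every module.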
Data Structures
Algorithm
Binary Symbolic Subroutine Loader
The assembler assembles each program and provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text to a file rather than memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> The language designer's balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that is not limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
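The three search methods can be compared on a small symbol table. The table contents, the class name HashTable, the number of slots and the character-sum hash function are all assumptions chosen for illustration; collisions are handled here by chaining, which the notes do not specify.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _address) in enumerate(table):
        if name == key:
            return index
    return -1

def binary_search(sorted_table, key):
    """Repeatedly halve a table that is sorted by symbol name."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = sorted_table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1     # continue in the left sub-array
        else:
            low = middle + 1      # continue in the right sub-array
    return -1

class HashTable:
    def __init__(self, slots=11):
        self.slots = [[] for _ in range(slots)]   # slot numbers start at 0

    def _slot(self, key):
        return sum(ord(ch) for ch in key) % len(self.slots)   # hash function

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (existing, _value) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None

# Hypothetical symbol table, already ordered by symbol name.
SYMBOLS = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108), ("LOOP", 112)]
hashed = HashTable()
for symbol, address in SYMBOLS:
    hashed.put(symbol, address)
print(linear_search(SYMBOLS, "GAMMA"), binary_search(SYMBOLS, "LOOP"), hashed.get("BETA"))
```

Linear search needs no ordering but examines up to every entry; binary search needs the ordered table; the hash table goes straight to one slot at the cost of choosing a hash function and handling collisions.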
Sorting
We will discuss four comparison-sort algorithms:
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
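Short sketches of the four algorithms are given below. They are written for clarity rather than speed, each returns a new sorted list, and the quick sort uses the first element as pivot, which is an arbitrary choice made for this illustration.

```python
def selection_sort(items):
    result = list(items)
    for i in range(len(result) - 1):
        # Select the smallest remaining element and move it to position i.
        smallest = min(range(i, len(result)), key=result.__getitem__)
        result[i], result[smallest] = result[smallest], result[i]
    return result

def insertion_sort(items):
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]     # shift larger elements right
            j -= 1
        result[j + 1] = current
    return result

def merge_sort(items):
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left, right = merge_sort(items[:middle]), merge_sort(items[middle:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

def quick_sort(items):
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)

data = [34, 7, 23, 32, 5, 62, 7]
for sort in (selection_sort, insertion_sort, merge_sort, quick_sort):
    print(sort.__name__, sort(data))
```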
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components:
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let us keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference, inference rule or transformation rule is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊂ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
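The definition can be tried out directly: for every production α -> β and every place where α occurs in μ, the string σ β τ is one immediate derivation. The sketch assumes that every symbol is a single character and that α is non-empty; the grammar used (S -> aSb, S -> ab) is a hypothetical example.

```python
def immediate_derivations(mu, productions):
    """Return the set of all strings derivable from `mu` in a single step.

    productions -- list of (alpha, beta) pairs, standing for alpha -> beta
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                 # part of mu before alpha
            tau = mu[start + len(alpha):]      # part of mu after alpha
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results

# Hypothetical grammar: S -> aSb and S -> ab.
P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", P)))
# -> ['aaSbb', 'aabb']
```

A string that contains no left-hand side, such as aabb here, has no immediate derivations at all.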
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε
b If S -> ε is in P, then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb -> b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE -> EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β which is not of the form S -> ε satisfies one of the following conditions:
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
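The conditions above can be checked mechanically. The sketch below returns the most restrictive type that a set of productions satisfies; it assumes single-character symbols, writes ε as the empty string, and uses hypothetical example grammars (the production rules of the examples in the notes are not reproduced here).

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return 3, 2, 1 or 0 according to the definitions above.

    productions  -- list of (alpha, beta) pairs of strings; '' stands for ε
    nonterminals -- set of single-character nonterminal symbols
    """
    epsilon_rule = (start, "")

    def is_type1():
        for alpha, beta in productions:
            if (alpha, beta) != epsilon_rule and len(alpha) > len(beta):
                return False                      # violates condition (a)
        if epsilon_rule in productions:
            # condition (b): S must not occur on any right-hand side
            return not any(start in beta for _alpha, beta in productions)
        return True

    def is_type2():
        return is_type1() and all(alpha in nonterminals
                                  for alpha, _beta in productions)

    def is_type3():
        if not is_type2():
            return False
        for alpha, beta in productions:
            if (alpha, beta) == epsilon_rule:
                continue
            terminal = len(beta) == 1 and beta not in nonterminals
            terminal_then_nonterminal = (len(beta) == 2
                                         and beta[0] not in nonterminals
                                         and beta[1] in nonterminals)
            if not (terminal or terminal_then_nonterminal):
                return False
        return True

    if is_type3():
        return 3
    if is_type2():
        return 2
    if is_type1():
        return 1
    return 0

regular = [("S", "aA"), ("A", "b"), ("A", "aA")]
print(grammar_type(regular, {"S", "A"}))
print(grammar_type(regular + [("A", "Ba")], {"S", "A", "B"}))
print(grammar_type([("S", "aB"), ("Bb", "b")], {"S", "B"}))
```

The second call shows the effect described in the notes of adding a rule of the form A -> Ba to a Type 3 grammar, and the third shows a rule of the form Bb -> b violating condition (a).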
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A -> Ba or of the form B -> bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V -> w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
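To make the notation concrete, the sketch below reads a few BNF rules into a Python dictionary and then picks out the terminals as the symbols that never appear on a left side. The two rules for binary numbers are a hypothetical example, and the simple parser assumes one rule per line and does not support a literal | as a terminal.

```python
import re

def parse_bnf(text):
    """Parse lines of the form '<symbol> ::= seq | seq' into a dictionary
    mapping each nonterminal to its list of alternative sequences."""
    grammar = {}
    for line in text.strip().splitlines():
        left, _, right = line.partition("::=")
        alternatives = [re.findall(r"<[^>]+>|[^\s<>|]+", alternative)
                        for alternative in right.split("|")]
        grammar[left.strip()] = alternatives
    return grammar

def terminals(grammar):
    """Symbols used on a right side that never appear on a left side."""
    used = {symbol for alternatives in grammar.values()
            for sequence in alternatives for symbol in sequence}
    return used - set(grammar)

RULES = """
<digit> ::= 0 | 1
<number> ::= <digit> | <digit> <number>
"""
grammar = parse_bnf(RULES)
print(grammar)
print(sorted(terminals(grammar)))
```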
Formal Grammar-
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems are very useful for explicitly and concisely defining sets of strings, such as computer languages. They become more useful still if a canonic system can also be used as a basis for recognizing strings from the defined sets.
The recognition algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, which gives an intermediate form of the program. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler -
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
  - The identifier total
  - The assignment symbol
  - The identifier count
  - The plus sign
  - The identifier rate
  - The multiplication sign
  - The constant number 10
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =   y   +   3   ;
    x3   =y+ 3  ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3 and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
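The scanning step described above can be sketched with regular expressions, which is possible precisely because no recursion is needed. This is an illustrative sketch only: the pattern names, the choice to keep number lexemes as text, and the `(lexeme, None)` form for operator tokens are assumptions, not part of the notes.

```python
import re

# Token patterns for the running example; names are illustrative.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+*;]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table); identifiers become ('id', index) pairs."""
    symbol_table = []              # entry i-1 holds the name of identifier i
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue               # non-significant blanks are dropped
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))   # operators need no attribute value
    return tokens, symbol_table
```

All three spacings of the example statement give the same token stream, while `x 3 = y + 3;` yields an extra number token after the identifier x.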
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr      → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
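A small hand-written parser shows how such a syntax tree can be built for the assignment grammar. This is a sketch under assumptions: tokens are taken as `(kind, value)` pairs, the tree is represented as nested tuples, and the left-recursive rule for expr is handled with a loop that produces a left-associative tree.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {peek()!r}")
        token = tokens[pos]
        pos += 1
        return token

    def operand():
        if peek() in ("id", "number"):
            return take(peek())
        raise SyntaxError(f"expected operand, found {peek()!r}")

    def expr():
        # expr -> expr + expr is left-recursive, so it is handled with a loop
        # that builds a left-associative tree.
        tree = operand()
        while peek() == "+":
            take("+")
            tree = ("+", tree, operand())
        return tree

    target = take("id")
    take("=")
    value = expr()
    take(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", target, value)
```

For the example statement the result has the operator = at the root, the identifier x3 as its left child, and the + subtree over y and 3 as its right child.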
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --
Each "quad" has the form
    operation target source1 source2
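Walking the tree to emit quads can be sketched as follows. This is a minimal illustration, assuming the tree is given as nested tuples, that only the + operator occurs, that number literals are integers, and that `None` stands for the unused source shown as -- above; the function name `gen_quads` is hypothetical.

```python
from itertools import count

def gen_quads(target, tree, types):
    """Walk an expression tree and emit (operation, target, source1, source2) quads.

    `types` maps identifier names to "int" or "real"; literals are treated as int.
    """
    quads = []
    temps = count(1)

    def new_temp():
        return f"temp{next(temps)}"

    def convert(name, have, want):
        # Insert a unary conversion quad only when the types differ.
        if have == want:
            return name
        temp = new_temp()
        quads.append((f"{have}to{want}", temp, name, None))
        return temp

    def walk(node):
        # Returns (name holding the value, type of the value).
        if node[0] == "number":
            return node[1], "int"
        if node[0] == "id":
            return node[1], types[node[1]]
        _, left, right = node                      # ("+", left, right)
        lname, ltype = walk(left)
        rname, rtype = walk(right)
        result_type = "real" if "real" in (ltype, rtype) else "int"
        lname = convert(lname, ltype, result_type)
        rname = convert(rname, rtype, result_type)
        temp = new_temp()
        quads.append(("add", temp, lname, rname))
        return temp, result_type

    name, have = walk(tree)
    name = convert(name, have, types[target])
    quads.append(("assign", target, name, None))
    return quads
```

Called with id1 declared int and id2 declared real, the sketch reproduces the four quads listed above.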
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
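Both optimizations can be sketched as simple passes over the quad list. This is an illustration only: it assumes quads are `(operation, target, source1, source2)` tuples with `None` for an unused source, that temporaries are named temp1, temp2, ... and that each temporary is used exactly once, which is what makes the second rewrite safe.

```python
def optimize(quads):
    """Two simple optimizations on (op, target, src1, src2) quads."""
    # 1. Constant folding: inttoreal of an integer literal is done at compile time.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and isinstance(src1, str) and src1.isdigit():
            constants[target] = f"{src1}.0"     # e.g. "3" -> "3.0"; no quad emitted
            continue
        folded.append((op, target, src1, src2))

    # 2. Copy elimination: "op tempN ..." followed by "assign x tempN"
    #    becomes "op x ...".
    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if op == "assign" and result and result[-1][1] == src1 and src1.startswith("temp"):
            prev = result.pop()
            result.append((prev[0], target, prev[2], prev[3]))
        else:
            result.append(quad)
    return result
```

Applied to the four quads of the running example, the sketch leaves the two quads shown in the list above.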
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, #3.0    // add float
    RTOI R2, R1          // real to int
    ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code instruction, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes -
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
                Compiler Driver
                      | calls
               Syntactic Analyzer
          calls /              \ calls
    Contextual Analyzer      Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
                          Compiler Driver
            calls /           | calls           \ calls
    Syntactic Analyzer   Contextual Analyzer   Code Generator
    Source Text -> [Syntactic Analyzer] -> AST -> [Contextual Analyzer] -> Decorated AST -> [Code Generator] -> Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
(detailed pass1 assembler flowchart)
Internal data structures
• The Operation Code Table (OPTAB): OPTAB is used to look up mnemonic operation codes and translate them to their machine language equivalents.
• The Symbol Table (SYMTAB): SYMTAB is used to store values (addresses) assigned to labels.
• The Location Counter (LOCCTR): this is a variable that is used to help in the assignment of addresses. After each statement is processed it is updated as LOCCTR <- LOCCTR + (instruction length).
Lecture_no20 Multi-pass assembler with example
Pass 1
• Assign addresses to all statements in the program.
• Save the values assigned to all labels for use in Pass 2.
• Perform some processing of assembler directives.
Pass 2
• Assemble instructions.
• Generate data values defined by BYTE, WORD.
• Perform processing of assembler directives not done in Pass 1.
• Write the object program and the assembly listing.
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 amp 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
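The two-pass scheme can be sketched in a few lines. This is a minimal illustration, not a full assembler: it assumes every statement occupies 3 bytes, statements are given as `(label, operation, operand)` triples, WORD is the only directive, and the opcode values in OPTAB are illustrative.

```python
# Hypothetical 3-byte instruction set; opcode values are illustrative.
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}

def pass1(lines, start=0):
    """Assign addresses and collect label definitions into SYMTAB."""
    symtab, located = {}, []
    locctr = start
    for label, op, operand in lines:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        located.append((locctr, op, operand))
        locctr += 3                      # LOCCTR <- LOCCTR + instruction length
    return symtab, located

def pass2(symtab, located):
    """Translate each statement now that every label has an address."""
    code = []
    for addr, op, operand in located:
        if op == "WORD":
            code.append((addr, f"{int(operand):06X}"))
        else:
            code.append((addr, f"{OPTAB[op]:02X}{symtab[operand]:04X}"))
    return code

program = [
    ("",     "LDA",  "FIVE"),   # forward reference to FIVE
    ("",     "ADD",  "FIVE"),
    ("",     "STA",  "SUM"),
    ("FIVE", "WORD", "5"),
    ("SUM",  "WORD", "0"),
]
symtab, located = pass1(program, start=0x1000)
object_code = pass2(symtab, located)
```

The forward reference to FIVE in the first statement causes no difficulty, because pass 2 runs only after pass 1 has placed FIVE in the symbol table.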
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
    Col 1       H
    Col 2-7     Program name
    Col 8-13    Starting address (hex)
    Col 14-19   Length of object program in bytes (hex)
Text record
    Col 1       T
    Col 2-7     Starting address in this record (hex)
    Col 8-9     Length of object code in this record in bytes (hex)
    Col 10-69   Object code; (69-10+1)/6 = 10 instructions
End record
    Col 1       E
    Col 2-7     Address of first executable instruction (hex) (END program_name)
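The record layout above can be turned into a small formatter. This is a sketch under assumptions: the object code is supplied as a `bytes` value, the program name is padded or cut to six columns, and the function name `object_program` is hypothetical.

```python
def object_program(name, start, code_bytes, first_exec=None):
    """Format H, T and E records for a block of object code (given as bytes)."""
    records = [f"H{name:<6.6}{start:06X}{len(code_bytes):06X}"]
    # Columns 10-69 hold 60 hex digits, i.e. at most 30 bytes per Text record.
    for offset in range(0, len(code_bytes), 30):
        chunk = code_bytes[offset:offset + 30]
        records.append(f"T{start + offset:06X}{len(chunk):02X}{chunk.hex().upper()}")
    exec_addr = start if first_exec is None else first_exec
    records.append(f"E{exec_addr:06X}")
    return records

records = object_program("COPY", 0x1000, bytes.fromhex("001009181009"))
```

With six bytes of object code starting at address 1000 (hex), the sketch produces one Header record, one Text record and one End record.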
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
    Col 1     M
    Col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
    Col 8-9   Length of the address field to be modified, in half-bytes
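How a loader might use modification records can be sketched as follows. This is a simplified illustration: it assumes the program was assembled for address 0, that the M records have already been read into `(offset, half_bytes)` pairs, and that every field has an even number of half-bytes so it covers whole bytes.

```python
def relocate(memory, load_address, mod_records):
    """Add the actual load address to every address field named by an M record.

    memory      -- bytearray holding the program as assembled for address 0
    mod_records -- list of (offset, half_bytes) pairs taken from the M records
    Assumes an even number of half-bytes, so each field is whole bytes.
    """
    for offset, half_bytes in mod_records:
        if half_bytes % 2:
            raise ValueError("this sketch handles whole-byte fields only")
        size = half_bytes // 2
        value = int.from_bytes(memory[offset:offset + size], "big")
        value = (value + load_address) % (1 << (8 * size))
        memory[offset:offset + size] = value.to_bytes(size, "big")
    return memory
```

For example, a 6 half-byte field holding 000006 becomes 005006 when the program is loaded at address 5000 (hex); fields not named by any M record are left untouched.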
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
Macro definition:
    MACRO
    INCRMT &A, &B
    LOAD   &A
    ADD    &B
    STORE  &A
    ENDM
Macro call in the program:
    INCRMT X, Y
Macro expansion:
    LOAD   X
    ADD    Y
    STORE  X
(Figure: Source code (with macro) -> Macro Processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
    INCRMT X, Y
is shown to lead to insertion of the assembly statements
    LOAD   X
    ADD    Y
    STORE  X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit.
Macro definition:
    MACRO
    INCRMT &A, &B
    LOAD   &A
    ADD    &B
    STORE  &A
    ENDM
Macro call in the program:
    INCRMT X, Y
Macro expansion:
    LOAD   X
    ADD    Y
    STORE  X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used during usage of Macro
MACRO indicates the beginning of a macro; MEND indicates the end of a macro.
-> Prototype for Macro
Each argument starts with &. The definition has the form
    Name MACRO Parameter list
        ...
    MEND
-> Body: the statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
    macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is invoked.
Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
    INCRMT &A, &B    prototype
    INCRMT X, Y      macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
    LOAD   X
    ADD    Y
    STORE  X
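The positional pairing can be sketched directly. This is an illustration only: the function name `expand_positional` is hypothetical, and the model statements are assumed to be plain strings in which formal parameters appear as & followed by a name.

```python
import re

def expand_positional(formals, model_statements, actuals):
    """Replace each formal parameter (&NAME) by the actual in the same position."""
    if len(formals) != len(actuals):
        raise ValueError("argument count does not match the prototype")
    binding = dict(zip(formals, actuals))       # first with first, second with second
    pattern = re.compile(r"&\w+")
    return [pattern.sub(lambda m: binding[m.group()], line)
            for line in model_statements]

expansion = expand_positional(
    ["&A", "&B"],
    ["LOAD &A", "ADD &B", "STORE &A"],
    ["X", "Y"],
)
```

The result is the LOAD, ADD, STORE sequence on X and Y shown above.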
Example
    STRG DATA1, DATA2, DATA3
    GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
    STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
    GENER TYPE=DIRECT, CHANNEL=3
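With keyword parameters the pairing no longer depends on position. A minimal sketch, assuming each operand of the call is a string of the form NAME=VALUE and that formal parameter names are stored with their leading & (the function name `bind_keywords` is hypothetical):

```python
def bind_keywords(formals, call_operands):
    """Pair formal parameters with actuals written as NAME=VALUE, in any order."""
    binding = {}
    for operand in call_operands:
        name, _, value = operand.partition("=")
        name = "&" + name.lstrip("&")           # accept TYPE=... or &TYPE=...
        if name not in formals:
            raise ValueError(f"unknown keyword parameter {name}")
        binding[name] = value
    missing = [f for f in formals if f not in binding]
    if missing:
        raise ValueError(f"no value for {missing}")
    return binding

gener = bind_keywords(["&TYPE", "&CHANNEL"], ["CHANNEL=3", "TYPE=DIRECT"])
```

The order in which the keywords are written in the call does not affect the resulting binding.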
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish the correspondence between all formal parameters and their values (i.e., actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
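The three steps can be sketched as two small functions. This is a simplified illustration under several assumptions: the source is a list of statement strings, the MNT is a dictionary from macro name to (MDT index, formal parameters), substitution is a naive textual replace (so no formal parameter may be a prefix of another), and AIF and AGO are not handled. The HALT statement in the usage example is hypothetical.

```python
def build_tables(source):
    """Step 1: record every MACRO ... ENDM definition in the MNT and MDT."""
    mnt, mdt, program = {}, [], []
    lines = iter(source)
    for line in lines:
        if line.strip() != "MACRO":
            program.append(line)           # ordinary statement or macro call
            continue
        name, *formals = next(lines).replace(",", " ").split()   # prototype
        mnt[name] = (len(mdt), formals)    # where the body starts in the MDT
        for body_line in lines:
            mdt.append(body_line.strip())
            if body_line.strip() == "ENDM":
                break
    return mnt, mdt, program

def expand_calls(mnt, mdt, program):
    """Steps 2 and 3: replace each macro call by its model statements."""
    output = []
    for line in program:
        fields = line.replace(",", " ").split()
        if not fields or fields[0] not in mnt:
            output.append(line.strip())
            continue
        index, formals = mnt[fields[0]]
        binding = dict(zip(formals, fields[1:]))
        while mdt[index] != "ENDM":
            statement = mdt[index]
            for formal, actual in binding.items():
                statement = statement.replace(formal, actual)   # naive substitution
            output.append(statement)
            index += 1
    return output

source = ["MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
          "INCRMT X, Y", "HALT"]
mnt, mdt, program = build_tables(source)
expanded = expand_calls(mnt, mdt, program)
```

After the run, the MNT points INCRMT at entry 0 of the MDT, and the call is replaced by the three model statements with X and Y substituted.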
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called.
-> Sub-procedures: macro definition (DEFINE) and macro invocation (EXPAND).
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  - Macro prototype
  - Statements that make up the macro body
  - References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of a macro invocation
  - Arguments are stored in ARGTAB according to their position
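The conversion of parameter references to a positional notation can be sketched as below. The `?1`, `?2` marker syntax is an assumption chosen for illustration (the notes do not say which notation is used), and the function names `define` and `expand` only echo the DEFINE and EXPAND sub-procedures named above.

```python
def define(prototype_params, body):
    """Store a macro body with each &PARAM replaced by its position (?1, ?2, ...)."""
    # Longest names first, so that &AB is not mistaken for &A followed by B.
    ordered = sorted(enumerate(prototype_params, start=1), key=lambda p: -len(p[1]))
    deftab = []
    for line in body:
        for position, param in ordered:
            line = line.replace(param, f"?{position}")
        deftab.append(line)
    return deftab

def expand(deftab, argtab):
    """Substitute ARGTAB entries for the positional markers."""
    out = []
    for line in deftab:
        for position in range(len(argtab), 0, -1):   # handle ?10 before ?1
            line = line.replace(f"?{position}", argtab[position - 1])
        out.append(line)
    return out

deftab = define(["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])
result = expand(deftab, ["X", "Y"])
```

Storing positions instead of names means that expansion only needs the argument's index in ARGTAB, not a search through the parameter names.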
(Figure: MACRO -> NAMTAB, DEFTAB; CALL -> DEFTAB, ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
(Figure: macro processor procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is now ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines the assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain Input/Output functions like printf(), scanf() etc. which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an Input/Output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
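The "add a constant to each relative address" step can be sketched as follows. This is a minimal illustration under stated assumptions: the segment is modelled as a list of integer words, and `reloc_offsets` (a hypothetical name) lists the positions of the words that hold relative addresses; real object formats carry this information in relocation records.

```python
def relocate(words, reloc_offsets, load_address):
    """Return a copy of the segment with each address field adjusted.

    words         -- the segment, translated as if loaded at address 0
    reloc_offsets -- positions of the words that hold relative addresses
    load_address  -- where the loader has decided to place the segment
    """
    relocated = list(words)
    for offset in reloc_offsets:
        relocated[offset] += load_address   # add the relocation constant
    return relocated


# Words 0 and 2 hold addresses; word 1 is an ordinary constant and stays as-is.
print(relocate([10, 5, 20], [0, 2], 300))
```

Only the address fields change; the other words are copied unmodified, which is why relocation is described above as adjustment rather than movement.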
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor -
A linkage editor performs linking prior to load time.
» It produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution.
• Dynamic Linking -
The linking function is performed at execution time.
-> Postpone the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is known as dynamic linking, dynamic loading or load on call.
-> Allow several executing programs to share one copy of a subroutine or library
• Linking loader -
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help since the entire path from the root is already loaded.
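The path rule used by the overlay manager can be illustrated with a short sketch. The tree below is the example tree from these notes; the names `PARENT`, `path` and `segments_to_load` are hypothetical, and the sketch ignores the eviction of sibling segments that share the same memory.

```python
# Example overlay tree from the notes, stored as child segment -> parent segment.
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path(segment):
    """Return the segments from the root down to `segment`."""
    chain = [segment]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain[::-1]


def segments_to_load(loaded, target):
    """Segments on the path to `target` that are not yet in memory."""
    return [s for s in path(target) if s not in loaded]


print(path("E"))                              # path for E: root, D and E
print(segments_to_load({"ROOT"}, "B"))        # root calls into B
print(segments_to_load({"ROOT", "A"}, "B"))   # A calls into B
```

A call from the root into B needs both A and B loaded, while a call from A into B needs only B, matching the cases described above.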
(fig: Overlay tree, with ROOT at the top, A and D below it, B and C below A, and E and F below D)
The essential difference between a linkage editor and a linking loader
(fig: with a linking loader, the object program(s) and library are processed by the linking loader and placed directly in memory; with a linkage editor, the object program(s) and library are processed by the linkage editor into a linked program, which a relocating loader then places in memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine is loaded into main memory.
-> Let us take a quick look at how our three loaders (absolute loader, relocating loader and direct linking loader, respectively) handle the above four functions:

Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages -
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error unless the symbol is declared as external.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation and Linkage Directory)
7 END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
- Copy of the object program
- IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
- PLA counter (Program Load Address), to keep track of each segment location
- External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
- Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD and RLD cards
ESD (external symbol dictionary)
Data Structures
Algorithm
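The two passes listed above can be sketched as follows. This is a simplified illustration, not the card-level algorithm: each object module is modelled as a dictionary with hypothetical fields `name`, `length`, `defs` (external symbols it defines, as relative addresses), `text` (its words) and `rld` (pairs of word offset and symbol whose address must be added), and memory is modelled as a dictionary from address to word.

```python
def pass1(modules, ipla):
    """Allocate each segment and build the external symbol table."""
    symtab = {}
    pla = ipla                              # PLA counter starts at IPLA
    for m in modules:
        symtab[m["name"]] = pla             # segment start address
        for sym, rel in m["defs"].items():
            symtab[sym] = pla + rel         # assigned core address
        pla += m["length"]
    return symtab


def pass2(modules, symtab):
    """Load the text, then relocate and link using the RLD entries."""
    memory = {}
    for m in modules:
        base = symtab[m["name"]]
        for offset, word in enumerate(m["text"]):
            memory[base + offset] = word    # load the program text
        for offset, sym in m["rld"]:
            memory[base + offset] += symtab[sym]   # adjust address constant
    return memory


A = {"name": "A", "length": 3, "defs": {}, "text": [1, 0, 2], "rld": [(1, "B")]}
B = {"name": "B", "length": 2, "defs": {"BDATA": 1}, "text": [7, 8],
     "rld": [(0, "BDATA")]}
table = pass1([A, B], 100)
print(table)
print(pass2([A, B], table))
```

Two passes are needed because a segment may refer to a symbol defined in a segment that appears later in the input, so every address must be assigned before any reference is resolved.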
Binary Symbolic Subroutine Loader -
The assembler assembles each program and provides the loader with:
-> The object program and relocation information
-> A prefix with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder -
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• It outputs the text to a file rather than to memory
- the file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but also in teaching and supporting it
-> Language designers balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
- To be able to explain and implement sequential search and binary search
- To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
- To understand the idea of hashing as a search technique
- To introduce the map abstract data type
- To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
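The three search methods above can be sketched in Python as follows. The function names and the character-sum hash function are assumptions chosen for illustration; many other hash functions are possible.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, item in enumerate(table):
        if item == key:
            return index
    return -1


def binary_search(table, key):
    """Search a sorted table by repeatedly halving the search range."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        if table[mid] == key:
            return mid
        if key < table[mid]:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1


def hash_slot(key, size):
    """Map a symbol name to a slot number in the range 0 .. size-1."""
    return sum(ord(ch) for ch in key) % size


symbols = ["ALPHA", "BETA", "GAMMA", "LOOP"]   # sorted symbol names
print(linear_search(symbols, "GAMMA"), binary_search(symbols, "GAMMA"))
print(hash_slot("AB", 11))
```

Linear search examines up to n entries, while binary search examines about log2(n) entries but requires the table to be kept in order.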
Sorting
We will discuss four comparison-sort algorithms:
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
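As an illustration, two of the listed algorithms, selection sort and merge sort, can be sketched as below. These are minimal versions that return a new sorted list; the function names are chosen for this example.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and swap it into place."""
    a = list(items)
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a


def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


data = [29, 3, 72, 44, 3, 15]
print(selection_sort(data))
print(merge_sort(data))
```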
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3 A set of axioms or axiom schemata: each axiom must be a wff.
4 A set of inference rules.
Components of a Formal System -
A formal system has the following components:
- A finite alphabet of symbols. The alphabet must be finite because if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
- A syntax that defines which strings of symbols are in the language of our formal system.
- A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
- the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
- the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference, inference rule or transformation rule is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system -
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form etc.
Grammar -
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism -
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2 -
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this as L ⊆ T*, where T* denotes the set of all finite concatenations of symbols of T.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule -
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol -
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols -
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar, and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences -
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->_G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
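The single-step derivation can be made concrete with a small sketch. It assumes each symbol is a single character, represents a production α -> β as the pair `(alpha, beta)` with a non-empty left side, and uses the hypothetical function name `immediate_derivations`.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each occurrence of alpha in mu,
    mu is split as sigma + alpha + tau and gamma = sigma + beta + tau.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Example productions S -> aSb and S -> ab (illustrative, not from the notes).
P = [("S", "aSb"), ("S", "ab")]
print(immediate_derivations("aSb", P))
```

Here μ = aSb splits with σ = a, α = S and τ = b, so the two productions give the strings aaSbb and aabb.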
Sentential forms and sentences
DEFINITION -
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy -
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε.
b If S -> ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb -> b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE -> EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β which is not of the form S -> ε satisfies one of the following conditions:
a β is a terminal symbol.
b β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A -> Ba or of the form B -> bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
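The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The sketch below is an illustration under stated assumptions: every symbol is a single character, a production α -> β is the pair `(alpha, beta)`, the empty string stands for ε, and the function name `grammar_type` and the sample grammars are hypothetical.

```python
def grammar_type(productions, nonterminals, start):
    """Return the most restrictive type (0, 1, 2 or 3) the grammar satisfies."""
    def is_start_to_empty(alpha, beta):
        return alpha == start and beta == ""

    has_empty_rule = any(is_start_to_empty(a, b) for a, b in productions)
    start_on_rhs = any(start in b for _, b in productions)

    # Type 1: |alpha| <= |beta| except for S -> empty, and if S -> empty is
    # present then S must not appear on any right-hand side.
    length_ok = all(is_start_to_empty(a, b) or len(a) <= len(b)
                    for a, b in productions)
    if not length_ok or (has_empty_rule and start_on_rhs):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a in nonterminals for a, _ in productions):
        return 1

    # Type 3: every right-hand side is a terminal, or a terminal followed
    # by a nonterminal (S -> empty is exempt).
    def right_linear(beta):
        if len(beta) == 1:
            return beta not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    if all(is_start_to_empty(a, b) or right_linear(b) for a, b in productions):
        return 3
    return 2


print(grammar_type([("S", "aA"), ("A", "b"), ("A", "bA")], {"S", "A"}, "S"))
print(grammar_type([("S", "aSb"), ("S", "ab")], {"S"}, "S"))
print(grammar_type([("S", "aBb"), ("Bb", "b")], {"S", "B"}, "S"))
```

The last grammar contains a rule of the form Bb -> b, whose left side is longer than its right side, so it fails condition (a) and is only Type 0.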
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF -
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
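A BNF specification written with the conventional `::=` separator can be read into a simple table of rules. The sketch below is illustrative only: the two-rule grammar, the function names `parse_bnf` and `terminals`, and the assumption that symbols are separated by spaces and that | is never itself a terminal are all choices made for this example.

```python
def parse_bnf(text):
    """Parse lines of the form '<symbol> ::= alt1 | alt2' into a dictionary.

    Each nonterminal maps to a list of alternatives; each alternative is a
    list of symbols.
    """
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)


GRAMMAR = """
<expr> ::= <term> | <expr> + <term>
<term> ::= 0 | 1
"""
rules = parse_bnf(GRAMMAR)
print(rules["<expr>"])
print(sorted(terminals(rules)))
```

The vertical bar gives `<expr>` two alternatives, and the symbols +, 0 and 1 are identified as terminals because they never appear on a left side.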
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems are very useful for explicitly and concisely defining sets of strings, such as computer languages. They become more useful still if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
This is the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "Language Translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into its constituent pieces, and an intermediate form of the source program is created.
In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler -
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler -
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
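The token breakdown above can be reproduced with a small scanner. This is a minimal sketch using Python's standard `re` module; the token names (`ID`, `NUMBER`, `ASSIGN`, `PLUS`, `TIMES`) and the function name `tokenize` are assumptions for illustration, and only the symbols needed for the statement `total = count + rate * 10` are recognized.

```python
import re

# Token classes, tried in order; SKIP matches whitespace and is discarded.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

The scanner yields seven tokens, in the same order as the list above: three identifiers, the assignment symbol, the plus sign, the multiplication sign and the constant 10.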
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
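The mapping from lexemes to <token-name, attribute-value> pairs can be illustrated with a small scanner. This is only a sketch: the function name scan, the regular expressions, and the choice to store identifiers in a plain list (entry i of the list is symbol-table index i+1) are assumptions made for illustration, and only the symbols =, + and ; are recognized.

```python
import re

# Patterns for the lexeme classes used in the running example.
# These names and expressions are illustrative assumptions.
TOKEN_SPEC = [
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"\d+"),
    ("op", r"[=+;]"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbols) for a source string.

    tokens is a list of (token-name, attribute-value) pairs;
    symbols is the symbol table, where position i holds identifier i+1.
    """
    symbols = []
    tokens = []
    pos = 0
    while pos < len(source):
        m = PATTERN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue  # non-significant blanks are dropped
        if kind == "id":
            if lexeme not in symbols:
                symbols.append(lexeme)
            tokens.append(("id", symbols.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None))  # second component is ignored
    return tokens, symbols


tokens, symbols = scan("x3   =y+ 3  ;")
print(tokens)
print(symbols)
```

Because blanks are skipped, differently spaced versions of the statement give the same token stream, while "x 3 = y + 3;" does not, since x and 3 become separate lexemes.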
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
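As a rough illustration of how a parser turns the token stream into a syntax tree, here is a minimal recursive-descent sketch. The function name parse_assignment and the nested-tuple tree shape are assumptions for this example. Tokens carry the lexeme itself rather than a symbol-table index to keep the output readable, the trailing semicolon is assumed to be part of the statement, and the rule "expr + expr" is handled as a left-associative loop, which is one common way to resolve the ambiguity of that rule.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree as nested tuples.

    Interior nodes are (operator, left, right); leaves are operands.
    """
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def take(expected):
        nonlocal pos
        if peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {peek()!r}")
        value = tokens[pos][1]
        pos += 1
        return value

    def operand():
        # expr -> number | id
        if peek() == "number":
            return take("number")
        return take("id")

    def expr():
        # expr -> expr + expr, treated as left-associative
        tree = operand()
        while peek() == "+":
            take("+")
            tree = ("+", tree, operand())
        return tree

    target = take("id")
    take("=")
    tree = ("=", target, expr())
    take(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return tree


example = [("id", "x3"), ("=", None), ("id", "y"),
           ("+", None), ("number", 3), (";", None)]
print(parse_assignment(example))
```

The operator = ends up at the root with x3 and the + subtree as its children, matching the description of a syntax tree with operators as interior nodes.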
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
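The tree walk that produces this code can be sketched as follows. The representation of the semantic tree as nested tuples, with conversion operators as unary nodes, and the function name gen_three_address are assumptions made for this illustration; only + and unary conversions are handled.

```python
def gen_three_address(tree):
    """Walk an assignment tree and return three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        # Leaves (identifiers, constants) need no code of their own.
        if not isinstance(node, tuple):
            return str(node)
        op = node[0]
        if op == "+":
            left = walk(node[1])
            right = walk(node[2])
            temp = new_temp()
            code.append(f"{temp} = {left} + {right}")
            return temp
        # Any other operator is treated as a unary conversion.
        arg = walk(node[1])
        temp = new_temp()
        code.append(f"{temp} = {op}({arg})")
        return temp

    _, target, rhs = tree
    code.append(f"{target} = {walk(rhs)}")
    return code


# x3 = y + 3 with y real and x3 integer, after semantic analysis
tree = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", 3))))
for line in gen_three_address(tree):
    print(line)
```

Each interior node gets a fresh temporary, which relies on the idealized-machine assumption of an unlimited supply of registers.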
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion itself and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
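These two optimizations can be sketched on quads stored as (operation, target, source1, source2) tuples. The function name optimize is hypothetical, and the second step assumes that a temporary which is immediately copied by an assign is not used anywhere else, which holds in the example but would need checking in a real optimizer.

```python
def optimize(quads):
    """Apply two simple optimizations to a list of quads."""
    # Step 1: perform inttoreal on integer constants at compile time
    # and substitute the result wherever the temporary was used.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and isinstance(src1, int):
            constants[target] = float(src1)
            continue  # the quad itself is no longer needed
        folded.append((op, target, src1, src2))

    # Step 2: merge "compute temp; assign x temp" into "compute x".
    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if op == "assign" and result and result[-1][1] == src1:
            prev_op, _, p1, p2 = result.pop()
            result.append((prev_op, target, p1, p2))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", 3, None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```

The four quads shrink to two, an add with the constant 3.0 and a realtoint that writes directly to id1.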
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one pass. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (Syntax Tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver calls Syntactic Analyzer
Syntactic Analyzer calls Contextual Analyzer and Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver calls Syntactic Analyzer, Contextual Analyzer and Code Generator
Source Text → Syntactic Analyzer → AST → Contextual Analyzer → Decorated AST → Code Generator → object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no20 Multi-pass assembler with example
Pass 1:
- Assign addresses to all statements in the program.
- Save the values assigned to all labels for use in Pass 2.
- Perform some processing of assembler directives.
Pass 2:
- Assemble instructions.
- Generate data values defined by BYTE, WORD.
- Perform processing of assembler directives not done in Pass 1.
- Write the object program and the assembly listing.
(Detailed pass2 assembler flowchart)
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are:
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
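The division of work between the two passes can be sketched for a toy assembler. Everything here is a simplifying assumption for illustration: every statement occupies 3 bytes, the only directive is WORD, source statements are given as (label, mnemonic, operand) tuples, and OPTAB is a small hypothetical opcode table rather than a complete instruction set.

```python
# Hypothetical opcode table for the sketch (mnemonic -> opcode byte).
OPTAB = {"LDA": 0x00, "ADD": 0x18, "STA": 0x0C}


def pass1(lines, start=0):
    """Assign an address to every statement and record label values."""
    symtab = {}
    locctr = start
    for label, mnemonic, operand in lines:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate label {label}")
            symtab[label] = locctr
        locctr += 3  # assumption: every statement is 3 bytes long
    return symtab


def pass2(lines, symtab, start=0):
    """Translate each statement using the completed symbol table."""
    obj = []
    locctr = start
    for label, mnemonic, operand in lines:
        if mnemonic == "WORD":
            obj.append((locctr, f"{int(operand):06X}"))
        else:
            obj.append((locctr, f"{OPTAB[mnemonic]:02X}{symtab[operand]:04X}"))
        locctr += 3
    return obj


program = [
    (None, "LDA", "FIVE"),      # forward reference to FIVE
    (None, "ADD", "ONE"),
    (None, "STA", "RESULT"),
    ("FIVE", "WORD", "5"),
    ("ONE", "WORD", "1"),
    ("RESULT", "WORD", "0"),
]
symtab = pass1(program, start=0x1000)
for address, code in pass2(program, symtab, start=0x1000):
    print(f"{address:04X} {code}")
```

The forward references to FIVE, ONE and RESULT cause no difficulty because pass 2 runs only after pass 1 has filled the symbol table. A one-pass assembler would have to patch those address fields later.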
Assembler Design
Machine Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
Col 1: H
Col 2~7: Program name
Col 8~13: Starting address (hex)
Col 14~19: Length of object program in bytes (hex)
Text record
Col 1: T
Col 2~7: Starting address in this record (hex)
Col 8~9: Length of object code in this record in bytes (hex)
Col 10~69: Object code ((69-10+1)/6 = 10 instructions)
End record
Col 1: E
Col 2~7: Address of first executable instruction (hex) (END program_name)
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
Col 1: M
Col 2~7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8~9: Length of the address field to be modified, in half-bytes
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding of macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X, Y
Macro expansion:
LOAD X
ADD Y
STORE X
(Figure: Source code (with macro) → Macro Processor → Expanded code → Compiler / Assembler → Obj)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the call
INCRMT X, Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit.
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X, Y
Macro expansion:
LOAD X
ADD Y
STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used during usage of Macro
MACRO indicates the beginning of a macro; MEND indicates the end of a macro.
-> Prototype for Macro
The prototype gives the macro name and the parameter list. Each parameter starts with &. The definition is closed by MEND.
-> BODY
The statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is invoked.
Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B (prototype)
INCRMT X, Y (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
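The three steps can be sketched as a small macro processor. This is a simplified illustration, not a full design: the function name process and the table layouts are assumptions, all macro definitions are assumed to sit at the start of the program, parameters are positional only, and the conditional statements AIF and AGO are not handled.

```python
def process(lines):
    """Expand macro calls in a list of source lines."""
    mnt = {}     # Macro Name Table: name -> (index into MDT, formal parameters)
    mdt = []     # Macro Definition Table: model statements, ENDM closes each macro
    output = []
    i = 0

    # Step 1: scan the macro definitions at the start of the program.
    while i < len(lines) and lines[i].strip() == "MACRO":
        name, _, params = lines[i + 1].strip().partition(" ")
        formals = [p.strip() for p in params.split(",") if p.strip()]
        mnt[name] = (len(mdt), formals)
        i += 2
        while lines[i].strip() != "ENDM":
            mdt.append(lines[i].strip())
            i += 1
        mdt.append("ENDM")
        i += 1

    # Step 2: examine the remaining statements for macro calls.
    for line in lines[i:]:
        name, _, args = line.strip().partition(" ")
        if name not in mnt:
            output.append(line.strip())  # ordinary assembly statement
            continue
        start, formals = mnt[name]
        actuals = [a.strip() for a in args.split(",") if a.strip()]
        binding = dict(zip(formals, actuals))  # positional correspondence

        # Step 3: copy model statements from the MDT until ENDM,
        # replacing formal parameters by actual parameters.
        j = start
        while mdt[j] != "ENDM":
            stmt = mdt[j]
            # Longer names first, so &AB is not mistaken for &A.
            for formal in sorted(binding, key=len, reverse=True):
                stmt = stmt.replace(formal, binding[formal])
            output.append(stmt)
            j += 1
    return output


source = [
    "MACRO",
    "INCRMT &A, &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X, Y",
    "HALT",
]
for stmt in process(source):
    print(stmt)
```

The call INCRMT X, Y is replaced by the LOAD, ADD, STORE sequence with X and Y substituted, while the ordinary statement (HALT is just a placeholder mnemonic here) passes through unchanged.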
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called.
-> Sub-procedures: macro definition (DEFINE) and macro invocation (EXPAND).
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB, with pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
(Figure: MACRO and CALL statements with the tables NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the Macro processor and explain.
(Flowchart procedures: PROCESSLINE, DEFINE, EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates the space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program. Such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instructions are read line by line, their machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed at one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a wastage of memory. As this scheme is a combination of assembler and loader activities, this combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls on certain Input/Output functions like Printf ( ), Scanf ( ) etc. which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an Input/Output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300, while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore, the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields, and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
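The adjustment of address fields can be sketched in Python. This is only an illustration under stated assumptions, not the format used by any real loader: the program image is modeled as a list of integers, and `reloc_offsets` is a hypothetical list naming which words hold relative addresses.

```python
def relocate(words, reloc_offsets, translated_start, load_start):
    """Return a copy of `words` with every address field adjusted.

    words            -- program image as a list of integers (assumed layout)
    reloc_offsets    -- indexes of the words that hold addresses (assumed input)
    translated_start -- start address assumed by the translator
    load_start       -- address where the loader actually places the program
    """
    delta = load_start - translated_start  # the constant added to each address
    relocated = list(words)
    for i in reloc_offsets:
        relocated[i] += delta
    return relocated


# Program A was translated to start at 100 but is loaded at 300.
# Words 1 and 3 hold addresses inside A; words 0 and 2 are plain data.
program_a = [7, 120, 42, 180]
loaded = relocate(program_a, [1, 3], translated_start=100, load_start=300)
print(loaded)
```

Only the words named in `reloc_offsets` change; data words are copied untouched, which is why relocation needs information from the translator and cannot be done by moving bytes alone.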
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of a relocating assembler, which is the input to the relocating loader, is the object program together with information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in the main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
A linkage editor performs linking prior to load time.
» It produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution.
• Dynamic linking
The linking function is performed at execution time.
-> The linking function is postponed until execution time.
» A subroutine is loaded and linked to the rest of the program when it is first called. This is also known as dynamic loading or load on call.
-> Several executing programs can share one copy of a subroutine or library.
• Linking loader
A linking loader:
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls do not require any linker help, since the entire path from the root is already loaded.
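The path-loading rule can be sketched in Python for the example tree. The `OverlayManager` class and its simple "one segment per tree level" memory model are assumptions made for illustration; a real overlay manager works with actual memory addresses.

```python
# Parent of each segment in the example tree; ROOT has no parent.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to `segment`, root first."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]  # the root is loaded at program start

    def call(self, target):
        """Ensure the whole path to `target` is loaded before the call."""
        wanted = path_to(target)
        for depth, seg in enumerate(wanted):
            if depth < len(self.loaded) and self.loaded[depth] == seg:
                continue  # already in memory, nothing to do
            # Loading a segment overwrites its sibling at this depth and,
            # in this simplified model, everything loaded below that sibling.
            self.loaded = self.loaded[:depth] + [seg]
        return list(self.loaded)


manager = OverlayManager()
print(manager.call("B"))  # root calls a routine in B: A and B are loaded
print(manager.call("A"))  # upward call from B: nothing needs loading
print(manager.call("E"))  # D replaces A, then E is loaded
```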
(fig Overlay: ROOT at the top of the tree with children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig: Object program(s) and Library -> Linking loader -> Memory; Object program(s) and Library -> Linkage editor -> Linked program -> Relocating loader -> Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. When placing the object code in memory there are two situations: either the address of the object code is absolute, in which case the code can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
Linking and loading must be postponed until execution. If a subroutine is needed during execution, execution has to be suspended until the required subroutine has been loaded into main memory.
-> Let us take a quick look at who performs each of the four functions (allocation, linking, relocation and loading) in our three loaders: the absolute loader, the relocating loader and the direct linking loader, respectively.
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error, unless B is declared to be an external symbol.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
Pass 1 works with:
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation and Linkage Directory) card
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Pass 2 works with:
- A copy of the object program
- IPLA parameter (Initial Program Load Address), supplied by the OS or the programmer to specify the address at which to load the first segment
- PLA counter (Program Load Address), to keep track of each segment's location
- External Symbol Directory (ESD), to store each external symbol and its corresponding assigned core address
- Local External Symbol Array (LESA), to establish correspondence between the ESD IDs used in ESD and RLD cards
- ESD (External Symbol Dictionary)
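The two passes can be sketched in Python under heavy simplification. The segment dictionaries below (keys `name`, `length`, `defs`, `text`, `rld`) are a hypothetical stand-in for the ESD, TXT and RLD cards, and a plain dictionary plays the role of the symbol table; this is not the actual card format.

```python
def pass1(segments, ipla):
    """Assign a load address to each segment and build the symbol table."""
    symtab = {}
    pla = ipla  # the Program Load Address counter starts at the IPLA
    for seg in segments:
        seg_base = pla
        symtab[seg["name"]] = seg_base
        for symbol, offset in seg["defs"].items():
            symtab[symbol] = seg_base + offset  # value of an external symbol
        pla += seg["length"]
    return symtab


def pass2(segments, symtab):
    """Load the text and patch every address constant listed in the RLD."""
    memory = {}
    for seg in segments:
        base = symtab[seg["name"]]
        for offset, word in enumerate(seg["text"]):
            memory[base + offset] = word
        for offset, symbol in seg["rld"]:
            # Relocation and linking: add the symbol's assigned address.
            memory[base + offset] += symtab[symbol]
    return memory


segments = [
    {"name": "A", "length": 3, "defs": {}, "text": [10, 0, 1],
     "rld": [(1, "B"), (2, "A")]},   # word 1 refers to B, word 2 to A+1
    {"name": "B", "length": 2, "defs": {"BDATA": 1}, "text": [20, 30],
     "rld": []},
]
symtab = pass1(segments, ipla=100)
memory = pass2(segments, symtab)
print(symtab)
print(memory)
```

Two passes are needed because segment A refers to B before B's address has been assigned; pass 1 fixes every address first, so pass 2 can resolve all references in one sweep.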
Data Structures
Algorithm
Binary Symbolic Subroutine Loader
The assembler provides the loader with:
-> the object program and relocation information
-> a prefix with information about all other programs it references (the transfer vector)
-> the length of the entire program
-> the length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
– allocation, relocation and linking
• It outputs the text to a file rather than to memory
– this file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing a language, but also in teaching and supporting it
-> Language designers balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
- To be able to explain and implement sequential search and binary search
- To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
- To understand the idea of hashing as a search technique
- To introduce the map abstract data type
- To implement the map abstract data type using hashing
• Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same. Otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
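Both methods can be sketched in Python. The table here is a hypothetical list of symbol names used as keys, already sorted so that the same table works for both searches.

```python
def linear_search(table, key):
    """Compare every entry in turn; return its index, or -1 if absent."""
    for i, entry in enumerate(table):
        if entry == key:
            return i
    return -1


def binary_search(sorted_table, key):
    """Halve the search range at each step; the table must be sorted."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_table[mid] == key:
            return mid
        if key < sorted_table[mid]:
            high = mid - 1   # continue in the left sub-array
        else:
            low = mid + 1    # continue in the right sub-array
    return -1


symbols = ["ALPHA", "BETA", "COUNT", "LOOP", "START"]  # sorted symbol names
print(linear_search(symbols, "LOOP"), binary_search(symbols, "LOOP"))
```

Linear search needs up to n comparisons for a table of n entries, while binary search needs about log2(n), at the price of keeping the table ordered.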
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
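A minimal Python sketch of such a table follows. The hash function (sum of character codes modulo the table size) and the use of a small list per slot to hold colliding items are illustrative choices, not something the notes prescribe.

```python
class HashTable:
    def __init__(self, size=11):
        self.size = size
        self.slots = [[] for _ in range(size)]  # slots are numbered from 0

    def hash_function(self, key):
        """Map a symbol name to a slot number (simple illustrative choice)."""
        return sum(ord(ch) for ch in key) % self.size

    def put(self, key, value):
        bucket = self.slots[self.hash_function(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # update an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        """Return the stored value, or None if the key is absent."""
        for k, v in self.slots[self.hash_function(key)]:
            if k == key:
                return v
        return None


table = HashTable()
table.put("START", 100)
table.put("LOOP", 104)
print(table.get("LOOP"), table.get("MISSING"))
```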
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
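As a brief illustration, the first two algorithms in the list can be sketched in Python as follows. Both functions return a new sorted list and leave the input unchanged; that is a design choice of this sketch.

```python
def selection_sort(items):
    """Repeatedly move the smallest remaining item to the front."""
    a = list(items)
    for i in range(len(a)):
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a


def insertion_sort(items):
    """Insert each item into its place among the already sorted items."""
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]  # shift larger items one place to the right
            j -= 1
        a[j + 1] = current
    return a


data = [29, 10, 14, 37, 13]
print(selection_sort(data), insertion_sort(data))
```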
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let us keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference, inference rule or transformation rule is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this as L ⊆ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ϵ P of G
where σ and τ represent arbitrary strings.
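The three conditions of the definition translate almost directly into code. The Python sketch below assumes that every symbol is a single character and that productions are given as (α, β) pairs of strings; the example grammar S -> aSb | ab is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that mu derives in a single step.

    mu and both sides of each production are plain strings in which each
    character is one symbol (an assumption made to keep the sketch short).
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            # (1) mu = sigma alpha tau
            sigma, tau = mu[:start], mu[start + len(alpha):]
            # (2) gamma = sigma beta tau, using (3) alpha -> beta in P
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", P)))
```

Applying the function repeatedly, starting from the start symbol, enumerates the sentential forms of the grammar one derivation step at a time.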
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
bullChomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε.
b. If S -> ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15: The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb -> b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16: The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE -> EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β which is not of the form S -> ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17: The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A -> Ba or of the form B -> bb to the grammar will result in a non-Type 3 grammar.
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ϵ V and x ϵ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ϵ V and x, v, y ϵ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages. A canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read, broken into a stream of strings and turned into an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol =
– The identifier count
– The plus sign +
– The identifier rate
– The multiplication sign *
– The constant number 10
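The token list above can be reproduced with a small scanner. The Python sketch below uses regular expressions; the token names in `TOKEN_SPEC` are chosen here for illustration and do not come from any particular compiler.

```python
import re

# (token name, regular expression) pairs; names are illustrative only.
TOKEN_SPEC = [
    ("constant",   r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assign",     r"="),
    ("plus",       r"\+"),
    ("times",      r"\*"),
    ("skip",       r"\s+"),   # blanks separate tokens but are not tokens
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Break `source` into a list of (token name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```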
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
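A syntax tree for the assignment expression can be built with a few lines of Python. This is a minimal sketch, assuming the input has already been split into tokens and that + is treated as left-associative; the nested-tuple tree format is a choice made here, not part of the notes.

```python
def parse_assignment(tokens):
    """Parse tokens such as ['x3', '=', 'y', '+', '3'] into a syntax tree.

    The tree is a nested tuple (operator, left, right); leaves are strings.
    """
    if len(tokens) < 3 or tokens[1] != "=":
        raise SyntaxError("expected: id = expr")
    target, rest = tokens[0], tokens[2:]
    if not target.isidentifier():
        raise SyntaxError(f"bad assignment target {target!r}")
    return ("=", target, parse_expr(rest))


def parse_expr(tokens):
    """expr -> operand ('+' operand)*, built left-associatively."""
    if len(tokens) % 2 == 0:
        raise SyntaxError("operand expected")
    tree = parse_operand(tokens[0])
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+":
            raise SyntaxError(f"expected '+', got {tokens[i]!r}")
        tree = ("+", tree, parse_operand(tokens[i + 1]))
    return tree


def parse_operand(token):
    """An operand is an identifier or an unsigned integer constant."""
    if token.isidentifier() or token.isdigit():
        return token
    raise SyntaxError(f"bad operand {token!r}")


print(parse_assignment(["x3", "=", "y", "+", "3"]))
```

The operators end up as interior nodes with their operands as children, which is the shape of the syntax tree described above. The input `x 3 = y + 3` is rejected, because its second token is not the assignment symbol.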
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two sources and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
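The tree walk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Node class, the gen and gen_assign helpers, and the convention of naming temporaries temp1, temp2, ... are hypothetical choices made here, not part of the notes.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Node:
    op: str            # "num", "id", "+", "inttoreal" or "realtoint" (assumed node kinds)
    value: str = ""    # literal text or identifier name, used by leaves only
    kids: tuple = ()   # child nodes, used by operators only

def gen(node, code, temps):
    """Post-order walk: emit code for the children, then for the node itself."""
    if node.op in ("num", "id"):
        return node.value                      # leaves need no instruction
    args = [gen(kid, code, temps) for kid in node.kids]
    target = f"temp{next(temps)}"
    if node.op == "+":
        code.append(f"{target} = {args[0]} + {args[1]}")
    else:                                      # unary conversion operator
        code.append(f"{target} = {node.op}({args[0]})")
    return target

def gen_assign(target_id, expr):
    code, temps = [], count(1)
    code.append(f"{target_id} = {gen(expr, code, temps)}")
    return code

# Semantic tree for x3 = y + 3, with y real and x3 integer (id1 = x3, id2 = y).
tree = Node("realtoint", kids=(
    Node("+", kids=(Node("id", "id2"),
                    Node("inttoreal", kids=(Node("num", "3"),)))),
))
three_address = gen_assign("id1", tree)
for line in three_address:
    print(line)
```

Because the children are visited before the parent, each temporary is defined before it is used, which is why the conversion of 3 is emitted first.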
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
    inttoreal   temp1   3       --
    add         temp2   id2     temp1
    realtoint   temp3   temp2   --
    assign      id1     temp3   --
Each "quad" has the form
    operation   target   source1   source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion itself and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
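The two optimizations above can be sketched as a single pass over the quads. This is a minimal illustration, not a general optimizer: the tuple layout (operation, target, source1, source2) follows the quad form given earlier, while the optimize function and the rule that a temporary is never used again after being copied are assumptions made for the example.

```python
def optimize(quads):
    """quads: list of (operation, target, source1, source2); unused sources are None."""
    out, consts = [], {}
    for op, target, s1, s2 in quads:
        # Replace any source that is known to hold a folded constant.
        s1 = consts.get(s1, s1)
        s2 = consts.get(s2, s2)
        if op == "inttoreal" and s1.isdigit():
            consts[target] = s1 + ".0"         # do the conversion at compile time
            continue
        # Assumption: a temp copied by "assign" is not used anywhere else,
        # so the previous quad can write straight into the final target.
        if op == "assign" and out and out[-1][1] == s1 and s1.startswith("temp"):
            prev = out.pop()
            out.append((prev[0], target, prev[2], prev[3]))
            continue
        out.append((op, target, s1, s2))
    return out

quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
optimized = optimize(quads)
for quad in optimized:
    print(quad)
```

A real compiler would verify the "not used anywhere else" condition with liveness information before merging the last two quads.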
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD    R1, id2
    ADDF  R1, R1, 3.0     add float
    RTOI  R2, R1          real to int
    ST    id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
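The idea that the parser calls the scanner whenever it needs a token, instead of reading a token file, can be sketched as follows. This is an illustration under assumptions: the scanner generator, the Parser class, and the tuple-shaped syntax tree are hypothetical, and the left-recursive rule expr → expr + expr is handled here by a simple loop over operands.

```python
import re

# One token per match: a number, an identifier, or any single other character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def scanner(text):
    """Produce (token type, lexeme) pairs lazily, one per request."""
    for match in TOKEN_RE.finditer(text):
        number, ident, symbol = match.groups()
        if number:
            yield ("number", number)
        elif ident:
            yield ("id", ident)
        elif symbol.strip():               # skip stray blanks
            yield (symbol, symbol)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.lookahead = next(tokens, ("eof", ""))

    def advance(self):
        token = self.lookahead
        self.lookahead = next(self.tokens, ("eof", ""))   # the parser calls the scanner here
        return token

    def expect(self, kind):
        if self.lookahead[0] != kind:
            raise SyntaxError(f"expected {kind}, found {self.lookahead[0]}")
        return self.advance()

    def operand(self):
        if self.lookahead[0] in ("number", "id"):
            return self.advance()
        raise SyntaxError(f"unexpected token {self.lookahead[0]}")

    def expr(self):
        left = self.operand()
        while self.lookahead[0] == "+":
            self.advance()
            left = ("+", left, self.operand())
        return left

    def assignment(self):                  # asst-stmt → id = expr
        target = self.expect("id")
        self.expect("=")
        return ("=", target, self.expr())

tree = Parser(scanner("x3 = y + 3")).assignment()
print(tree)
```

No intermediate token file is written: each call to advance pulls exactly one more token out of the scanner.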
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving it into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code instruction, and uses registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver
        calls
    Syntactic Analyzer
        calls                   calls
    Contextual Analyzer         Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver
        calls                   calls                    calls
    Syntactic Analyzer          Contextual Analyzer      Code Generator
    input: Source Text          input: AST               input: Decorated AST
    output: AST                 output: Decorated AST    output: object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no21 Explanation of flowchart for pass-1 & 2-pass assembler
The differences between one-pass and two-pass assemblers are as follows.
A one-pass assembler passes over the source file exactly once, in the same pass collecting the labels, resolving future references, and doing the actual assembly. The difficult part is to resolve future label references and assemble code in one pass.
A two-pass assembler does two passes over the source file (the second pass can be over a file generated in the first pass). In the first pass, all it does is look for label definitions and enter them in the symbol table. In the second pass, after the symbol table is complete, it does the actual assembly by translating the operations and so on.
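The two-pass scheme can be sketched for a toy machine. Everything machine-specific here is an assumption made for illustration: the OPTAB opcodes, the one-word-per-statement layout, the WORD directive, and the encoding of an instruction as opcode * 256 + address are hypothetical, not taken from the notes.

```python
OPTAB = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JUMP": 0x04}  # hypothetical opcodes

def pass1(lines, start=0):
    """Pass 1: only collect label definitions into the symbol table."""
    symtab, loc = {}, start
    for label, mnemonic, operand in lines:
        if label:
            if label in symtab:
                raise ValueError(f"duplicate label {label}")
            symtab[label] = loc
        loc += 1                    # assumption: every statement occupies one word
    return symtab

def pass2(lines, symtab):
    """Pass 2: translate, now that every label (even a forward one) is known."""
    code = []
    for label, mnemonic, operand in lines:
        if mnemonic == "WORD":      # data definition
            code.append(int(operand))
        else:
            code.append((OPTAB[mnemonic] << 8) | symtab[operand])
    return code

program = [
    ("",     "LOAD",  "X"),
    ("",     "ADD",   "Y"),
    ("",     "JUMP",  "DONE"),      # forward reference to a label defined later
    ("DONE", "STORE", "X"),
    ("X",    "WORD",  "5"),
    ("Y",    "WORD",  "7"),
]
symtab = pass1(program)
object_code = pass2(program, symtab)
print(symtab)
print([f"{word:04X}" for word in object_code])
```

Every operand in this program is a forward reference, which is exactly the case a one-pass assembler has to treat specially and a two-pass assembler handles for free.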
Assembler Design
Machine-Dependent Assembler Features: instruction formats and addressing modes, program relocation
Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking
Object Program
Header record
    Col 1       H
    Col 2-7     Program name
    Col 8-13    Starting address (hex)
    Col 14-19   Length of object program in bytes (hex)
Text record
    Col 1       T
    Col 2-7     Starting address in this record (hex)
    Col 8-9     Length of object code in this record in bytes (hex)
    Col 10-69   Object code; (69-10+1)/6 = 10 instructions
End record
    Col 1       E
    Col 2-7     Address of first executable instruction (hex) (END program_name)
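The column layout above can be turned into a small record writer. This is a sketch under assumptions: the function name object_program, the sample program name COPY, and the sample object code bytes are illustrative only, and the code is assumed to occupy contiguous addresses.

```python
def object_program(name, start, code, first_exec=None):
    """Build Header, Text and End records for contiguous object code (bytes)."""
    # Header: H, 6-column name, 6-digit start address, 6-digit length.
    records = [f"H{name:<6.6}{start:06X}{len(code):06X}"]
    # Text: columns 10-69 hold 60 hex digits, i.e. at most 30 bytes per record.
    for offset in range(0, len(code), 30):
        chunk = code[offset:offset + 30]
        records.append(
            f"T{start + offset:06X}{len(chunk):02X}{chunk.hex().upper()}"
        )
    # End: address of the first executable instruction.
    entry = start if first_exec is None else first_exec
    records.append(f"E{entry:06X}")
    return records

records = object_program("COPY", 0x1000, bytes.fromhex("141033482039001036"))
for record in records:
    print(record)
```

With 3-byte instructions, 30 bytes per Text record is where the figure of 10 instructions per record comes from.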
Absolute Program
The program must be loaded at the specified (absolute) address.
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record
    Col 1     M
    Col 2-7   Starting location of the address field to be modified, relative to the beginning of the program
    Col 8-9   Length of the address field to be modified, in half-bytes
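A loader could apply modification records along the following lines. This is a hedged sketch: the function name and the sample instruction bytes are illustrative, and the handling of an odd half-byte count (the field is taken to occupy the low-order half-bytes of the bytes it spans) is an assumption, since the record layout above does not spell it out.

```python
def apply_modification_records(image, load_address, records):
    """Relocate `image`, a bytearray whose index 0 is the start of the program."""
    for rec in records:
        if not rec.startswith("M"):
            continue
        offset = int(rec[1:7], 16)         # Col 2-7: start of the address field
        half_bytes = int(rec[7:9], 16)     # Col 8-9: field length in half-bytes
        nbytes = (half_bytes + 1) // 2     # whole bytes that contain the field
        mask = (1 << (4 * half_bytes)) - 1
        word = int.from_bytes(image[offset:offset + nbytes], "big")
        field = (word & mask) + load_address          # add the actual start address
        word = (word & ~mask) | (field & mask)        # keep bits outside the field
        image[offset:offset + nbytes] = word.to_bytes(nbytes, "big")
    return image

# Six filler bytes, then a 4-byte instruction whose last 5 half-bytes are an address.
image = bytearray.fromhex("000000000000" + "4B101036")
apply_modification_records(image, 0x5000, ["M00000705"])
print(image[6:10].hex().upper())
```

Only the address field changes; the opcode and flag bits in front of it are preserved by the mask.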
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro-processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example
    MACRO
    INCRMT &A, &B      (macro definition)
    LOAD &A
    ADD &B
    STORE &A
    ENDM
    INCRMT X, Y        (macro call; its expansion is)
    LOAD X
    ADD Y
    STORE X
(Macro program)
(Figure: Source code (with macros) -> Macro Processor -> Expanded code -> Compiler or Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the call
INCRMT X, Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit
    MACRO
    INCRMT &A, &B      (macro definition)
    LOAD &A
    ADD &B
    STORE &A
    ENDM
    INCRMT X, Y        (macro call; its expansion is)
    LOAD X
    ADD Y
    STORE X
(Macro program)
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates its end.
-> Prototype for the macro
Each parameter starts with &.
    name    MACRO    parameter list
    ...
    MEND
-> Body: the statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called the formal and actual parameter lists, are specified in the prototype and macro call statements respectively, and establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B      prototype
INCRMT X, Y        macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
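The difference between positional and keyword parameters shows up only in how the formal-to-actual correspondence is built. A minimal sketch, assuming hypothetical helper names bind and expand and a list-of-strings representation for model statements:

```python
def bind(formals, call_args):
    """Pair formal parameters with actual parameters.

    formals:   names from the prototype, e.g. ["&A", "&B"]
    call_args: actuals from the call, positional ("X") or keyword ("&A=X" or "A=X")
    """
    binding = {}
    positional = [arg for arg in call_args if "=" not in arg]
    for formal, actual in zip(formals, positional):    # pairing by position
        binding[formal] = actual
    for arg in call_args:
        if "=" in arg:                                 # pairing by keyword
            key, value = arg.split("=", 1)
            if not key.startswith("&"):
                key = "&" + key
            if key not in formals:
                raise ValueError(f"unknown parameter {key}")
            binding[key] = value
    return binding

def expand(model_statements, binding):
    """Replace every formal parameter in the model statements by its actual."""
    expanded = []
    for statement in model_statements:
        # Longest names first, so that &AB is never mistaken for &A followed by B.
        for formal in sorted(binding, key=len, reverse=True):
            statement = statement.replace(formal, binding[formal])
        expanded.append(statement)
    return expanded

formals = ["&A", "&B"]
model = ["LOAD &A", "ADD &B", "STORE &A"]
by_position = expand(model, bind(formals, ["X", "Y"]))
by_keyword = expand(model, bind(formals, ["&B=Y", "&A=X"]))
print(by_position)
print(by_keyword)
```

With keywords the order of the arguments in the call no longer matters, which is the point of the second form.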
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e., actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in the MDT in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
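Steps 1 to 3 can be sketched as one small function. This is a simplified illustration: the function name macro_process, the representation of the MNT as a dictionary and the MDT as a list, and the plain text substitution are assumptions, and the conditional statements AIF and AGO are deliberately not handled.

```python
def macro_process(source):
    """Expand macros in `source`, a list of assembly statements (strings)."""
    mnt = {}        # Macro Name Table: name -> (index into MDT, formal parameters)
    mdt = []        # Macro Definition Table: model statements, each body ends with ENDM
    output = []
    lines = iter(source)
    for line in lines:
        words = line.split()
        if words == ["MACRO"]:
            # Step 1: record the prototype in the MNT and the body in the MDT.
            name, *formals = next(lines).replace(",", " ").split()
            mnt[name] = (len(mdt), formals)
            for body_line in lines:
                mdt.append(body_line)
                if body_line.strip() == "ENDM":
                    break
        elif words and words[0] in mnt:
            # Step 2: locate the macro and pair formal with actual parameters.
            start, formals = mnt[words[0]]
            actuals = " ".join(words[1:]).replace(",", " ").split()
            binding = dict(zip(formals, actuals))
            # Step 3: copy model statements from the MDT until ENDM.
            for statement in mdt[start:]:
                if statement.strip() == "ENDM":
                    break
                for formal, actual in binding.items():
                    statement = statement.replace(formal, actual)
                output.append(statement)
        else:
            output.append(line)     # ordinary statement, copied unchanged
    return output

source = [
    "MACRO",
    "INCRMT &A,&B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X,Y",
    "END",
]
expanded = macro_process(source)
print(expanded)
```

Because the MNT entry stores an index into the MDT, several macros can share one definition table without their bodies being confused.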
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called.
-> Sub-procedures: macro definition: DEFINE; macro invocation: EXPAND.
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB. Holds pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
(Figure: a MACRO definition is entered in NAMTAB and DEFTAB; a macro CALL uses ARGTAB)
Two-pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain.
(Flowchart: procedures PROCESSLINE, DEFINE, EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader: A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages: It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
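Since an absolute loader only copies code to the addresses already fixed in the object program, it fits in a few lines. The sketch below assumes object records laid out as H, T and E records (type letter, 6-digit hex address, and for T records a 2-digit hex length followed by the object code); the function name, the sample records, and the use of a bytearray as memory are illustrative assumptions.

```python
def absolute_load(records, memory):
    """Copy each Text record to its stated address; return the entry address."""
    entry = None
    for record in records:
        if record[0] == "T":
            address = int(record[1:7], 16)
            length = int(record[7:9], 16)
            data = bytes.fromhex(record[9:9 + 2 * length])
            memory[address:address + length] = data    # no relocation, no linking
        elif record[0] == "E":
            entry = int(record[1:7], 16)
    return entry

memory = bytearray(0x2000)
records = ["HCOPY  001000000009", "T00100009141033482039001036", "E001000"]
entry = absolute_load(records, memory)
print(hex(entry), memory[0x1000:0x1009].hex().upper())
```

Nothing in the loop adjusts an address, which is why every inter-segment address has to be correct before loading.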
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300, while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (The segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called the relocating loader.
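The "add a constant to each relative address" step can be sketched directly. The representation is an assumption made for illustration: a segment is a list of integer words, and address_fields lists the positions of the words that hold addresses (the relocation information).

```python
def relocate(words, address_fields, translated_origin, load_origin):
    """Return a copy of the segment with every address field adjusted."""
    delta = load_origin - translated_origin    # the relocation constant
    relocated = list(words)
    for index in address_fields:
        relocated[index] += delta              # only address fields change
    return relocated

# Program A was translated with start address 100. printf() keeps 100 to 150,
# so A is loaded from 151 instead. Words 0 and 2 hold addresses; word 1 is a
# plain constant and must not be touched.
segment_a = [120, 7, 199]
loaded_a = relocate(segment_a, address_fields=[0, 2],
                    translated_origin=100, load_origin=151)
print(loaded_a)
```

The constant 7 in the middle is left alone: relocation adjusts address fields, it does not blindly shift every word.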
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader, and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer, there is nothing meaningful in the main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called (dynamic linking, dynamic loading, or load on call)
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
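The overlay manager's behaviour on downward and upward calls can be sketched for the example tree. The PARENT table, the path_to helper and the OverlayManager class are hypothetical names used only for this illustration, and "loading" is modelled by appending the segment name to a log.

```python
# The example overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Sequence of segments leading from the root to `segment`."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]

class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]     # path currently in memory, root first
        self.loads = []            # log of segments brought into memory

    def call(self, target):
        wanted = path_to(target)
        if self.loaded[:len(wanted)] == wanted:
            return                 # upward call or same segment: already loaded
        keep = 0                   # length of the part of the path both share
        while (keep < min(len(wanted), len(self.loaded))
               and wanted[keep] == self.loaded[keep]):
            keep += 1
        self.loads.extend(wanted[keep:])   # siblings overwrite what was there
        self.loaded = wanted

manager = OverlayManager()
for target in ["A", "B", "A", "E"]:    # ROOT->A, A->B, B->A (upward), then a call into E
    manager.call(target)
print(manager.loads, manager.loaded)
```

The call into E replaces A and B because D shares memory with A, while the upward call from B to A costs nothing.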
(fig. Overlay)
            ROOT
           /    \
          A      D
         / \    / \
        B   C  E   F
The essential difference between a linkage editor and a linking loader
(fig. Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking LoadersThe direct linking loader is the most common type of loader This type of loader is a relocatable loader The loader can not have the direct access to the source code And to place the object code in the memory there are two situations either the address of the object code could be absolute which then can be directly placed at the specified location or the address can be relative If at all the address is relative then it is the assembler who informs the loader about the relative addresses
Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, execution has to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at who performs each of the four loader functions (allocation, linking, relocation and loading) in our three loaders: the absolute loader, the relocating loader and the direct linking loader, respectively.
Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment's location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD IDs used on ESD & RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
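The two passes listed above can be sketched in Python under strong simplifying assumptions. The module layout used here (a dict with name, length, defs, text and rld keys) is hypothetical and only stands in for the ESD, TXT and RLD cards; every RLD entry simply says "add the address of this symbol to the word at this offset".

```python
def link_and_load(modules, ipla):
    """Sketch of a two-pass direct linking loader.

    Each module is a dict with hypothetical keys:
      name   - segment name
      length - number of words in the segment
      defs   - {symbol: relative address} for symbols it defines (ESD role)
      text   - list of words (TXT role)
      rld    - list of (offset, symbol) pairs (RLD role)
    """
    # Pass 1: assign a load address to every segment and build the
    # table of external symbols.
    symbol_table = {}
    load_address = {}
    pla = ipla  # Program Load Address counter starts at the IPLA
    for module in modules:
        load_address[module["name"]] = pla
        symbol_table[module["name"]] = pla
        for symbol, relative in module["defs"].items():
            symbol_table[symbol] = pla + relative
        pla += module["length"]

    # Pass 2: load the text, then relocate and link the address constants.
    memory = {}
    for module in modules:
        base = load_address[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word
        for offset, symbol in module["rld"]:
            memory[base + offset] += symbol_table[symbol]
    return symbol_table, memory
```

A local address constant is relocated by naming the segment itself in the RLD entry, while a reference to another segment names an external symbol, so relocation and linking use the same mechanism in this sketch.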
Binary Symbolic Subroutine Loader-
The assembler assembles the program and provides the loader with:
-> Object program + relocation information
-> A prefix with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text in a file rather than memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search.
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort.
To understand the idea of hashing as a search technique.
To introduce the map abstract data type.
To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force, and it is one of the simplest searching algorithms as well. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as its input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
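The three search methods can be sketched in Python as follows. The table layout (a list of (symbol name, value) pairs) and the toy hash function are assumptions made for this illustration, not something prescribed by the notes.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Search a table sorted by symbol name; return the index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1  # continue in the left sub-array
        else:
            low = mid + 1   # continue in the right sub-array
    return -1


def hash_slot(key, table_size):
    """Toy hash function: sum of the character codes modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size
```

For a table of n entries, linear search needs up to n comparisons, binary search about log2(n) on an ordered table, and a hash function computes the slot directly from the key (collision handling is left out of this sketch).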
Sorting
We will discuss four comparison-sort algorithms
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
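As an illustration, two of the listed algorithms are sketched below in Python: selection sort as a simple quadratic method and merge sort as a divide-and-conquer method. Both functions return a new list; the function names are chosen for this example only.

```python
def selection_sort(items):
    """Repeatedly move the smallest remaining item to the front."""
    result = list(items)
    for i in range(len(result)):
        smallest = i
        for j in range(i + 1, len(result)):
            if result[j] < result[smallest]:
                smallest = j
        result[i], result[smallest] = result[smallest], result[i]
    return result


def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```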
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and then what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference, inference rule, or transformation rule is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ ->G γ (a single-step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
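The definition of a single-step derivation can be tried out with a small Python sketch. It assumes that strings over the grammar are ordinary Python strings and that P is a list of (α, β) pairs with non-empty α; the function name immediate_derivations is made up for this example.

```python
def immediate_derivations(mu, productions):
    """Return every string that can be derived from `mu` in a single step.

    For each production alpha -> beta and each occurrence of alpha in mu,
    mu is split as sigma + alpha + tau and sigma + beta + tau is produced.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]
            tau = mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results
```

For the productions S -> aSb and S -> ab, the string aSb has exactly two immediate derivations, aaSbb and aabb, while ab has none because it contains no left-hand side.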
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε.
b. If S -> ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb -> b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE -> EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β which is not of the form S -> ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A -> Ba or of the form B -> bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
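The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The Python sketch below assumes a simplified encoding that is not part of the notes: every symbol is a single character, uppercase letters are nonterminals, lowercase letters are terminals, the empty string stands for ε, and the input is already a valid Type 0 grammar.

```python
def grammar_type(productions, start="S"):
    """Return the highest type (0 to 3) whose conditions the grammar meets.

    `productions` is a list of (alpha, beta) string pairs.
    """
    def is_epsilon_rule(alpha, beta):
        return alpha == start and beta == ""

    # Type 1, condition (a): |alpha| <= |beta| unless the rule is S -> epsilon.
    type1 = all(is_epsilon_rule(a, b) or len(a) <= len(b)
                for a, b in productions)
    # Type 1, condition (b): if S -> epsilon is present, S is on no right side.
    has_epsilon_rule = any(is_epsilon_rule(a, b) for a, b in productions)
    if has_epsilon_rule and any(start in b for _a, b in productions):
        type1 = False
    if not type1:
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a.isupper() for a, _b in productions):
        return 1

    # Type 3: every right-hand side is a terminal, optionally followed
    # by one nonterminal (S -> epsilon is exempt).
    def right_side_ok(beta):
        if len(beta) == 1:
            return beta.islower()
        return len(beta) == 2 and beta[0].islower() and beta[1].isupper()

    if all(is_epsilon_rule(a, b) or right_side_ok(b) for a, b in productions):
        return 3
    return 2
```

Under this encoding, adding a rule such as B -> bb to a Type 3 grammar lowers the result to 2, and adding Bb -> b to any grammar lowers it to 0, which matches the remarks after the examples above.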
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; more sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are non-terminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
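A BNF specification can be held in a simple data structure. The Python sketch below is an illustration only: it assumes one rule per line, symbols separated by blanks, and a small made-up grammar; parse_bnf and terminals are hypothetical helper names.

```python
def parse_bnf(text):
    """Turn BNF lines of the form  <symbol> ::= alt1 | alt2  into a dict.

    The result maps each left-hand symbol to a list of alternatives,
    each alternative being a list of symbols.
    """
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {symbol
            for alternatives in rules.values()
            for alternative in alternatives
            for symbol in alternative}
    return used - set(rules)


GRAMMAR = """
<expr> ::= <term> | <expr> + <term>
<term> ::= <digit> | <term> * <digit>
<digit> ::= 0 | 1
"""
```

For this grammar, the nonterminals are the three bracketed names on the left sides, and the terminals come out as +, *, 0 and 1.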
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages. They would be more useful still if a canonic system could be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---> Compiler ---> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the source program is built. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
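The token list above can be reproduced with a small scanner. The sketch below uses Python's re module; the token names and the TOKEN_SPEC table are assumptions made for this example and cover only the symbols that occur in the statement total = count + rate * 10.

```python
import re

# (token name, regular expression) pairs; "skip" matches blanks to discard.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into a list of (token name, lexeme) pairs."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

Calling tokenize on the example statement yields seven tokens in the same order as the list above, with the blanks removed.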
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
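A syntax tree for the example statement can be built with a very small parser. The Python sketch below is an assumption-laden illustration: it takes a list of lexemes rather than tokens, represents tree nodes as nested tuples, and treats + as left-associative, since the rule expr → expr + expr on its own does not fix the grouping.

```python
def parse_assignment(lexemes):
    """Parse the lexemes of  id = expr ;  into a syntax tree.

    Interior nodes are tuples (operator, left, right); leaves are lexemes.
    """
    position = 0

    def expect(lexeme):
        nonlocal position
        if position >= len(lexemes) or lexemes[position] != lexeme:
            raise SyntaxError(f"expected {lexeme!r}")
        position += 1

    def operand():
        nonlocal position
        if position >= len(lexemes) or not lexemes[position].isalnum():
            raise SyntaxError("expected an identifier or a number")
        position += 1
        return lexemes[position - 1]

    target = operand()
    if target[0].isdigit():
        raise SyntaxError("the left side must be an identifier")
    expect("=")
    tree = operand()
    while position < len(lexemes) and lexemes[position] == "+":
        position += 1
        tree = ("+", tree, operand())  # group additions from the left
    expect(";")
    if position != len(lexemes):
        raise SyntaxError("unexpected input after ;")
    return ("=", target, tree)
```

For the lexemes x3, =, y, +, 3 and ; the result has = at the root, x3 as its left child and a + node with children y and 3 as its right child.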
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
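The quads for the example can be generated by walking a syntax tree. The Python sketch below makes several assumptions that are not in the notes: the tree is a nested tuple such as ("=", "x3", ("+", "y", "3")), the only operator is +, constants are integers, and the symbols argument maps each name to an (id label, type) pair. Unused source fields are written as None instead of --.

```python
def generate_quads(tree, symbols):
    """Walk an assignment tree and return (operation, target, src1, src2) quads."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def convert(operand, kind, wanted):
        """Insert an int/real conversion quad when the types differ."""
        if kind == wanted:
            return operand
        temp = new_temp()
        operation = "inttoreal" if wanted == "real" else "realtoint"
        quads.append((operation, temp, operand, None))
        return temp

    def walk(node):
        if isinstance(node, tuple):  # ("+", left, right)
            left, left_kind = walk(node[1])
            right, right_kind = walk(node[2])
            kind = "real" if "real" in (left_kind, right_kind) else "int"
            left = convert(left, left_kind, kind)
            right = convert(right, right_kind, kind)
            temp = new_temp()
            quads.append(("add", temp, left, right))
            return temp, kind
        if node in symbols:
            return symbols[node]  # (id label, type)
        return node, "int"  # assumed: any other leaf is an integer constant

    _, target, expression = tree
    value, kind = walk(expression)
    target_id, target_kind = symbols[target]
    value = convert(value, kind, target_kind)
    quads.append(("assign", target_id, value, None))
    return quads
```

With x3 entered as ("id1", "int") and y as ("id2", "real"), the function produces the same four quads as listed above.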
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
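Both optimizations can be written as a small pass over the quads. The Python sketch below is an illustration under stated assumptions: quads are (operation, target, source1, source2) tuples with None for unused fields, constants are digit strings, and a temporary that feeds an assign is assumed not to be used anywhere else.

```python
def optimize(quads):
    """Fold constant int-to-real conversions and merge trailing assigns."""
    # 1. Perform inttoreal on integer constants at compile time.
    constants = {}
    folded = []
    for operation, target, source1, source2 in quads:
        if operation == "inttoreal" and source1.isdigit():
            constants[target] = f"{int(source1)}.0"
            continue  # the conversion quad itself disappears
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        folded.append((operation, target, source1, source2))

    # 2. Combine "op temp ..." followed by "assign x temp" into "op x ...".
    result = []
    for quad in folded:
        operation, target, source1, _ = quad
        if operation == "assign" and result and result[-1][1] == source1:
            previous = result.pop()
            result.append((previous[0], target, previous[2], previous[3]))
        else:
            result.append(quad)
    return result
```

Applied to the four quads of the running example, the pass leaves two quads, add temp2 id2 3.0 and realtoint id1 temp2.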
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
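A minimal symbol table along these lines can be sketched in Python. The class and its fields are hypothetical: each entry records a name, a type, and an offset that stands for the place where the compiled program (not the compiler) will keep the variable, and entries are numbered from 1 so that they match tokens such as <id,1>.

```python
class SymbolTable:
    """Sketch of a symbol table shared by the phases of a compiler."""

    def __init__(self):
        self.entries = []      # entry number i is stored at entries[i - 1]
        self.index = {}        # name -> entry number
        self.next_offset = 0   # next free location in the compiled program

    def declare(self, name, type_name, size):
        """Add a variable and return its entry number (1, 2, ...)."""
        if name in self.index:
            raise KeyError(f"{name} is already declared")
        self.entries.append(
            {"name": name, "type": type_name, "offset": self.next_offset})
        self.next_offset += size
        self.index[name] = len(self.entries)
        return self.index[name]

    def lookup(self, name):
        """Return the entry recorded for `name`."""
        return self.entries[self.index[name] - 1]
```

The sizes passed to declare (for example 4 for an integer and 8 for a real) are assumptions of the example; a real compiler would take them from the target machine.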
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
   calls
Syntactic Analyzer
   calls                calls
Contextual Analyzer     Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
   calls: Syntactic Analyzer, Contextual Analyzer, Code Generator
Syntactic Analyzer:   input Source Text,     output AST
Contextual Analyzer:  input AST,             output Decorated AST
Code Generator:       input Decorated AST,   output Object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Relocatable Program
The actual starting address is not known until load time. The program can be loaded at different addresses in memory.
How to solve:
a. Modification record
b. Base relative
c. Program counter relative
Modification record:
Col 1: M
Col 2-7: Starting location of the address field to be modified, relative to the beginning of the program
Col 8-9: Length of the address field to be modified, in half-bytes
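A minimal sketch of how a loader might apply one such record. It assumes that the column values are written in hexadecimal and that the address field occupies the low-order half-bytes of the bytes it spans; neither detail is stated in these notes. The function name `apply_modification` and the sample bytes are illustrative only.

```python
def apply_modification(image: bytearray, record: str, load_address: int) -> None:
    """Add the actual load address to the address field named by one M record."""
    if record[0] != "M":
        raise ValueError("not a modification record")
    start = int(record[1:7], 16)         # Col 2-7: offset from start of program
    half_bytes = int(record[7:9], 16)    # Col 8-9: field length in half-bytes
    n_bytes = (half_bytes + 1) // 2      # bytes that contain the field
    field = int.from_bytes(image[start:start + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1   # selects only the address half-bytes
    relocated = ((field & mask) + load_address) & mask
    field = (field & ~mask) | relocated  # keep the bits outside the field
    image[start:start + n_bytes] = field.to_bytes(n_bytes, "big")
```

For example, with the four bytes 4B 10 10 36 at the start of the program, the record M00000105 and a load address of hexadecimal 5000, the 5-half-byte field 01036 becomes 06036 and the leading half-byte 1 is left untouched.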
Lecture_no22 Macro language & Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Obj
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to the assembly language of Intel's 8-bit microprocessors.
Example
Macro definition:
   MACRO
   INCRMT &A, &B
   LOAD &A
   ADD &B
   STORE &A
   ENDM
Macro call:
   INCRMT X, Y
Macro expansion:
   LOAD X
   ADD Y
   STORE X
(fig Macro program)
(fig Source code (with macro) -> Macro Processor -> Expanded code -> Compiler / Assembler)
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the call
INCRMT X, Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit:
   MACRO
   INCRMT &A, &B
   LOAD &A
   ADD &B
   STORE &A
   ENDM
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
   MACRO indicates the beginning of a macro definition
   MEND indicates the end of a macro definition
-> Prototype for the macro
   Each parameter starts with &
   name   MACRO   parameter list
      ...
   MEND
-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro:
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B     prototype
INCRMT X, Y       macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1 DATA2 DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
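The difference between the two forms can be sketched as a small binding routine. This is an illustrative sketch, not the notes' own algorithm: the function name `bind_arguments` and the rule that any argument containing "=" is a keyword argument are assumptions made for the example.

```python
def bind_arguments(formals, arguments):
    """Pair each formal parameter (e.g. '&A') with an actual parameter."""
    binding = {}
    position = 0
    for argument in arguments:
        if "=" in argument:                      # keyword form: NAME=value
            name, value = argument.split("=", 1)
            name = "&" + name.lstrip("&")        # accept TYPE= or &TYPE=
            if name not in formals:
                raise ValueError(f"unknown parameter {name}")
            binding[name] = value
        else:                                    # positional form: by order
            binding[formals[position]] = argument
            position += 1
    return binding
```

With positional parameters the order of the arguments carries the meaning; with keyword parameters the arguments may be written in any order, as in the STRG example above.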
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish the correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in the MDT in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
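The three steps can be sketched as below. This is a simplified illustration, assuming positional parameters only and no AIF/AGO handling; the function name `process` and the representation of the MNT as a dictionary and the MDT as a list are choices made for the example, not part of the notes.

```python
def process(source_lines):
    """Expand macro calls using a Macro Name Table and a Macro Definition Table."""
    mnt = {}    # MNT: macro name -> (formal parameters, index of definition in MDT)
    mdt = []    # MDT: model statements; each definition ends with ENDM
    output = []
    lines = iter(source_lines)
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "MACRO":                     # Step 1: store the definition
            name, *formals = next(lines).replace(",", " ").split()
            mnt[name] = (formals, len(mdt))
            for body_line in lines:
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        elif fields[0] in mnt:                       # Step 2: a macro call
            formals, index = mnt[fields[0]]
            actuals = line.replace(",", " ").split()[1:]
            binding = dict(zip(formals, actuals))
            while mdt[index] != "ENDM":              # Step 3: expand from the MDT
                statement = mdt[index]
                for formal, actual in binding.items():
                    # Plain text replacement: a simplification that would
                    # confuse parameters such as &A and &AB.
                    statement = statement.replace(formal, actual)
                output.append(statement)
                index += 1
        else:
            output.append(line.strip())              # ordinary statement
    return output
```

Because a definition is stored as soon as it is read and a call is expanded as soon as it is met, the sketch alternates between definition and expansion in one pass, which only works when every macro is defined before it is called.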
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
   Holds the macro prototype and the statements that make up the macro body
   References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
   Role: an index to DEFTAB
   Holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
   Used during the expansion of a macro invocation
   Arguments are stored in ARGTAB according to their position
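The conversion to positional notation and the later use of ARGTAB can be sketched as follows. The notes do not say what the positional notation looks like; the markers ?1, ?2, ... used here are an assumption, as are the function names `to_positional` and `expand`.

```python
def to_positional(prototype_parameters, body):
    """Rewrite &NAME references as ?1, ?2, ... before the body goes into DEFTAB."""
    numbered = list(enumerate(prototype_parameters, start=1))
    # Replace longer names first so that &AB is not mistaken for &A.
    numbered.sort(key=lambda pair: -len(pair[1]))
    converted = []
    for statement in body:
        for index, name in numbered:
            statement = statement.replace(name, f"?{index}")
        converted.append(statement)
    return converted

def expand(deftab_body, argtab):
    """Substitute ARGTAB entries for the ?n markers, purely by position."""
    expanded = []
    for statement in deftab_body:
        for index in range(len(argtab), 0, -1):    # handle ?10 before ?1
            statement = statement.replace(f"?{index}", argtab[index - 1])
        expanded.append(statement)
    return expanded
```

Storing the body in positional form means that expansion never has to search for parameter names: each marker is simply an index into ARGTAB.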
(fig Macro processor tables: NAMTAB, DEFTAB and ARGTAB, used for MACRO definitions and CALLs)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
(fig Macro processor procedures: PROCESSLINE, DEFINE, EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is then ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instruction is read line by line, its machine code is obtained and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads assembled machine instructions into the memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a wastage of memory. As this scheme is a combination of assembler and loader activities, this combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, then the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions like printf(), scanf() etc., which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called the relocating loader.
Relocating Loader
The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The assembler's output, which is the input to a relocating loader, is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
As we turn on the computer, there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called (dynamic linking, dynamic loading or load on call)
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs:
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
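The behaviour of the overlay manager on the example tree can be sketched as follows. The class name `OverlayManager`, the `PARENT` table and the choice to track only segment names (not actual memory) are assumptions made to keep the sketch short.

```python
# Overlay tree from the example: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]

class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]          # currently loaded path, root first

    def call(self, target):
        """Ensure the path to target is loaded; return the segments loaded now."""
        wanted = path_to(target)
        keep = 0                        # length of the common prefix
        while (keep < min(len(wanted), len(self.loaded))
               and wanted[keep] == self.loaded[keep]):
            keep += 1
        newly_loaded = wanted[keep:]
        if newly_loaded:                # siblings share memory, so they are replaced
            self.loaded = wanted
        return newly_loaded
```

A call from the root into B loads A and B; a later upward call into A loads nothing; a call into E then replaces A and B with D and E, because A and D occupy the same memory.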
(fig Overlay)
ROOT
   A
      B
      C
   D
      E
      F
The essential difference between a linkage editor and a linking loader:
Linking loader: Object program(s) + Library -> Linking loader -> Memory
Linkage editor: Object program(s) + Library -> Linkage editor -> Linked program -> Relocating loader -> Memory
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into main memory.
-> Let's take a quick look at who performs the above four functions for our three loaders (absolute loader, relocating loader and direct linking loader, respectively):
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
Inputs and data used:
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) cards
5. A copy of the input (TXT cards)
6. RLD (Relocation & Linkage Directory) cards
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Data used:
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD IDs used in ESD and RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
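A minimal sketch of the two passes described above. The object deck is replaced here by a hypothetical dictionary per module ("defs" standing in for ESD definitions, "text" for TXT cards, "relocate" and "external" for RLD entries), and memory is modelled as one value per address; none of these formats come from the notes.

```python
def pass1(modules, ipla):
    """Pass 1: assign each module a load address and build the symbol table."""
    symbols = {}
    pla = ipla                                   # Program Load Address counter
    for module in modules:
        symbols[module["name"]] = pla
        for symbol, offset in module["defs"].items():
            symbols[symbol] = pla + offset       # external symbols defined here
        pla += module["length"]
    return symbols

def pass2(modules, symbols, memory):
    """Pass 2: load text, relocate address constants, resolve external references."""
    for module in modules:
        base = symbols[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word         # loading
        for offset in module["relocate"]:
            memory[base + offset] += base        # relocation of local addresses
        for offset, symbol in module["external"].items():
            memory[base + offset] += symbols[symbol]   # linking
```

Two passes are needed because an external symbol may be referenced by one module before the module that defines it has been assigned an address.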
Binary Symbolic Subroutine Loader
The assembler assembles the program and provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
   - allocation, relocation and linking
• Outputs the text in a file rather than memory
   - called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designer's balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares the key against every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
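The three lookup methods above can be sketched as follows for a table keyed by symbol name. The toy hash function (sum of character codes modulo the table size) and the use of chaining to handle two keys that map to the same slot are assumptions for the example; the notes do not prescribe either.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, entry in enumerate(table):
        if entry == key:
            return index
    return -1

def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table; return the index of key or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_table[middle] == key:
            return middle
        if key < sorted_table[middle]:
            high = middle - 1          # continue in the left sub-array
        else:
            low = middle + 1           # continue in the right sub-array
    return -1

class HashTable:
    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]    # slots are numbered from 0

    def _slot(self, key):
        return sum(ord(ch) for ch in key) % len(self.slots)   # toy hash function

    def insert(self, key, value):
        bucket = self.slots[self._slot(key)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value        # key already present: update it
                return
        bucket.append([key, value])

    def lookup(self, key):
        for stored_key, value in self.slots[self._slot(key)]:
            if stored_key == key:
                return value
        return None
```

Linear search needs no ordering but examines up to every entry, binary search needs an ordered table, and hashing goes straight to one slot at the cost of handling collisions.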
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language);
• the semantics of a language is what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is a way of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*, where T* denotes the set of all finite strings over T.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of Sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ -> γ (in G), a single-step derivation, if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
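The definition of a single-step derivation can be tried out with a small Python sketch. It assumes, purely for illustration, that every grammar symbol is one character and that a production α -> β is stored as the pair (alpha, beta) with a non-empty alpha; the function name immediate_derivations is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Yield every string gamma that is immediately derived from mu.

    productions is a list of (alpha, beta) pairs, each standing for alpha -> beta.
    """
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                  # arbitrary left context
            tau = mu[start + len(alpha):]       # arbitrary right context
            yield sigma + beta + tau            # gamma = sigma beta tau
            start = mu.find(alpha, start + 1)   # next occurrence of alpha in mu
```

With the productions S -> aSb and S -> ab, the string aSb immediately derives aaSbb and aabb, which is what list(immediate_derivations("aSb", [("S", "aSb"), ("S", "ab")])) returns.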
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that consists only of terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (in this notation N is the set of nonterminals, Σ the terminal alphabet, P the set of production rules, and S the start symbol) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β that is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive: they recognize only a proper subset of the context-free languages.
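To make the link between context-free languages and pushdown automata concrete, here is a minimal Python sketch of a deterministic stack-based recognizer. The language chosen (balanced parentheses, generated for instance by S → (S)S | ε) and the function name accepts_balanced are assumptions made for the example; the notes do not single out this language.

```python
def accepts_balanced(text):
    """Recognize balanced parentheses using a stack, as a pushdown automaton would."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)        # push on an opening parenthesis
        elif ch == ")":
            if not stack:
                return False        # nothing to match: reject
            stack.pop()             # pop the matching opening parenthesis
        else:
            return False            # symbol outside the alphabet
    return not stack                # accept only if the stack is empty
```

A finite automaton could not do this job, because it has no way of remembering an unbounded nesting depth; the stack supplies exactly that memory.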
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
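As an illustration of the BNF conventions just described, the following Python sketch reads a tiny BNF specification into a dictionary and then separates terminals from nonterminals. It assumes the usual ::= separator, one rule per line, and blank-separated symbols; the grammar itself and the helper name parse_bnf are made up for the example.

```python
def parse_bnf(text):
    """Map each nonterminal to its list of alternatives (each a list of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # the vertical bar separates the alternative sequences
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


# Hypothetical example grammar, not taken from the notes.
GRAMMAR = """
<expr> ::= <term> | <expr> + <term>
<term> ::= id | number
"""

rules = parse_bnf(GRAMMAR)
nonterminals = set(rules)   # symbols that appear on a left side
terminals = {s for alts in rules.values() for alt in alts for s in alt} - nonterminals
```

Here terminals comes out as the symbols that never appear on a left side (+, id and number), matching the rule of thumb given above. The sketch cannot represent | itself as a terminal; a fuller reader would need some quoting convention.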
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages. If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read, broken into its constituent pieces, and represented in an intermediate form. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
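The token list above can be reproduced with a short scanner. The sketch below uses Python's re module; the token names (identifier, assign, plus, times, number) and the function name scan are illustrative choices, and only the handful of symbols needed for this one statement are recognized.

```python
import re

# One named group per kind of token; blanks are matched so they can be skipped.
TOKEN_PATTERN = re.compile(r"""
    (?P<identifier>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<assign>=)
  | (?P<plus>\+)
  | (?P<times>\*)
  | (?P<blank>\s+)
""", re.VERBOSE)


def scan(source):
    """Break source into a list of (token type, lexeme) pairs."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "blank":          # blanks are not tokens
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

Calling scan("total = count + rate * 10") yields seven pairs in the same order as the list above, starting with ("identifier", "total") and ending with ("number", "10").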
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y  +   3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
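A hand-written scanner that produces <token-name, attribute-value> pairs and fills a symbol table might look like the following Python sketch. It is deliberately tiny: it assumes identifiers are letters followed by letters or digits, numbers are unsigned integers, and the only symbols are =, + and ;. The function name tokenize and the choice to keep a number's lexeme as its attribute are assumptions for the example.

```python
def tokenize(source):
    """Return (tokens, symbol_table) for a statement such as 'x3 = y + 3;'."""
    symbol_table = []      # entry i-1 holds the identifier behind <id, i>
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                                   # drop non-significant blanks
        elif ch.isalpha():
            j = i
            while j < len(source) and source[j].isalnum():
                j += 1
            name = source[i:j]
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
            i = j
        elif ch.isdigit():
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("number", source[i:j]))   # attribute: the lexeme itself
            i = j
        elif ch in "=+;":
            tokens.append((ch, None))                # second component is ignored
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens, symbol_table
```

Note that the scanner uses only loops, no recursion. It also shows why "x 3" differs from "x3": the blank ends the identifier, so the first input gives an id token followed by a number token.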
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
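A small parser for assignment statements of this shape can be sketched in Python as follows. Two simplifications are assumed: the input is already a list of lexemes, and the left-recursive rule expr → expr + expr is handled with a loop that groups additions from left to right (one of several ways to resolve the ambiguity of that rule). The syntax tree is returned as nested tuples; all names are illustrative.

```python
def parse_assignment(tokens):
    """Parse lexemes such as ["x3", "=", "y", "+", "3", ";"] into a syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {peek()!r}")
        pos += 1

    def operand():
        nonlocal pos
        token = peek()
        if token is None or not token.isalnum():
            raise SyntaxError(f"expected an operand, found {token!r}")
        pos += 1
        return ("number", token) if token.isdigit() else ("id", token)

    def expr():
        tree = operand()
        while peek() == "+":
            expect("+")
            tree = ("+", tree, operand())   # operator as interior node
        return tree

    target = operand()
    if target[0] != "id":
        raise SyntaxError("assignment target must be an identifier")
    expect("=")
    tree = ("=", target, expr())
    expect(";")                             # the terminator is checked, not stored
    if peek() is not None:
        raise SyntaxError("unexpected input after ';'")
    return tree
```

For the example statement the result is ("=", ("id", "x3"), ("+", ("id", "y"), ("number", "3"))): operators are interior nodes, operands are leaves, and the semicolon does not appear in the tree.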
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
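The tree walk that produces this three-address code can be sketched in Python. The sketch assumes a syntax tree given as nested tuples, a dictionary of declared types standing in for the symbol table, only the types "int" and "real", and integer constants without decimal points; generate_three_address and its helpers are hypothetical names.

```python
def generate_three_address(tree, types):
    """Walk ("=", target, expr) and return a list of three-address instructions."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def convert(place, have, want):
        # Insert a unary conversion operator only when the types differ.
        if have == want:
            return place
        temp = new_temp()
        op = "inttoreal" if want == "real" else "realtoint"
        code.append(f"{temp} = {op}({place})")
        return temp

    def walk(node):
        kind = node[0]
        if kind == "id":
            return node[1], types[node[1]]
        if kind == "number":
            return node[1], "int"           # assumption: constants are integers
        left, left_type = walk(node[1])     # kind == "+"
        right, right_type = walk(node[2])
        result_type = "real" if "real" in (left_type, right_type) else "int"
        left = convert(left, left_type, result_type)
        right = convert(right, right_type, result_type)
        temp = new_temp()
        code.append(f"{temp} = {left} + {right}")
        return temp, result_type

    _, target, expression = tree
    place, expr_type = walk(expression)
    place = convert(place, expr_type, types[target[1]])
    code.append(f"{target[1]} = {place}")
    return code
```

With the tree ("=", ("id", "id1"), ("+", ("id", "id2"), ("number", "3"))) and the types {"id1": "int", "id2": "real"}, the function returns the same four instructions as listed above.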
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see:
1. Since 3 is a constant, the compiler can perform the int-to-real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
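Both optimizations can be written as two short passes over a list of quads. In this Python sketch a quad is a tuple (operation, target, source1, source2), with None for an unused source. The second pass assumes that a temporary is used only by the instruction immediately after the one that defines it, which holds for this example but would need a liveness check in general; the function name optimize is illustrative.

```python
def optimize(quads):
    """Fold constant int-to-real conversions, then merge trailing copies."""
    # Pass 1: perform inttoreal on integer constants at compile time.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = src1 + ".0"      # e.g. "3" becomes "3.0"
        else:
            folded.append((op, target, src1, src2))

    # Pass 2: "op temp a b" followed by "assign x temp" becomes "op x a b".
    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if (op == "assign" and result and src1.startswith("temp")
                and result[-1][1] == src1):
            prev = result.pop()
            result.append((prev[0], target, prev[2], prev[3]))
        else:
            result.append(quad)
    return result
```

Applied to the four quads of the running example, the function leaves two: ("add", "temp2", "id2", "3.0") and ("realtoint", "id1", "temp2", None).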
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2:
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
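The idea of a parser that calls the scanner on demand, and emits intermediate code as it goes, can be illustrated with a Python generator. The sketch is heavily simplified: the hypothetical scanner merely splits the input on blanks, and the parser accepts only sums such as "a + b + c".

```python
def scanner(source):
    """Yield one blank-separated token at a time, only when asked for it."""
    for word in source.split():
        yield word


def parse_sum(source):
    """Parse 'a + b + c ...' in one pass, emitting three-address code on the way."""
    tokens = scanner(source)
    code = []
    result = next(tokens, None)        # the parser calls the scanner for a token
    if result is None:
        raise SyntaxError("empty input")
    count = 0
    for operator in tokens:            # each iteration calls the scanner again
        operand = next(tokens, None)
        if operator != "+" or operand is None:
            raise SyntaxError("expected '+ operand'")
        count += 1
        temp = f"temp{count}"
        code.append(f"{temp} = {result} + {operand}")   # code emitted mid-parse
        result = temp
    return code
```

No intermediate token file is ever written: scanning, parsing, and code generation are interleaved within a single pass over the input. For "a + b + c" the result is ["temp1 = a + b", "temp2 = temp1 + c"].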
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
A grammar is ambiguous if there is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical single pass compiler: the Compiler Driver calls the Syntactic Analyzer, which in turn calls the Contextual Analyzer and the Code Generator.
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi pass compiler: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer, and the Code Generator in turn. The Syntactic Analyzer takes the source text as input and outputs an AST; the Contextual Analyzer takes the AST and outputs a decorated AST; the Code Generator takes the decorated AST and outputs object code.
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no22 Macro language amp Macro processor
MACRO PROCESSOR
The assembly language programmer often finds it necessary to repeat some statements or a block of code several times in a program. The block may consist of code to swap sets of registers or to do some arithmetic operations. In this situation the programmer finds a macro instruction facility useful. Macro instructions (often called macros) are single-line abbreviations for groups of instructions. In employing a macro, the programmer essentially defines a single instruction to represent a block of code. For every occurrence of this one-line macro instruction in the program, the macro-processing assembler substitutes the entire block.
A macro represents a commonly used group of statements in the source programming language. The macro processor replaces each macro instruction with the corresponding group of source language statements. This is called expanding the macros.
Basic Macro Processor Functions
Macro Definition and Usage
In this section we will present one more example to highlight salient aspects of the macro processor. The example is very similar to Intel's 8-bit microprocessor assembly language instructions.
Example
Macro definition:
MACRO
INCRMT &A, &B
LOAD &A
ADD &B
STORE &A
ENDM
Macro call:
INCRMT X, Y
Macro expansion:
LOAD X
ADD Y
STORE X
Figure: Source code (with macro) -> Macro Processor -> Expanded code -> Compiler / Assembler -> Obj
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. In the example, the call
INCRMT X, Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
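The expansion just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name expand_call and the way the prototype and model statements are represented are hypothetical, not part of any real assembler.

```python
def expand_call(prototype, model_statements, call):
    """Expand one macro call by substituting actual for formal parameters."""
    name, formals = prototype
    call_name, actuals = call
    if call_name != name or len(actuals) != len(formals):
        raise ValueError("call does not match the prototype")
    binding = dict(zip(formals, actuals))  # pair parameters by position
    expanded = []
    for statement in model_statements:
        # Replace longer names first so that &AB is never clobbered by &A.
        for formal in sorted(binding, key=len, reverse=True):
            statement = statement.replace(formal, binding[formal])
        expanded.append(statement)
    return expanded


# The INCRMT macro from the notes (representation is an assumption).
INCRMT = ("INCRMT", ["&A", "&B"])
BODY = ["LOAD &A", "ADD &B", "STORE &A"]
result = expand_call(INCRMT, BODY, ("INCRMT", ["X", "Y"]))
print(result)
```

Every call is expanded independently, so a program with several INCRMT calls ends up with several copies of the LOAD-ADD-STORE sequence.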
Defining a Macro
Let us take another look at the macro definition unit
Macro definition:
    MACRO
    INCRMT &A, &B
    LOAD &A
    ADD &B
    STORE &A
    ENDM
Macro call:
    INCRMT X, Y
Macro expansion:
    LOAD X
    ADD Y
    STORE X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are the assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used during usage of a macro
MACRO indicates the beginning of a macro; MEND indicates the end of a macro
-> Prototype for a macro
Each argument starts with &
Name MACRO parameter list
MEND
-> Body: the statements that will be generated as the expansion of the macro
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called the formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, and so on. Consider the prototype and macro call statements once again:
INCRMT &A, &B    (prototype)
INCRMT X, Y      (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1 DATA2 DATA3
GENER DIRECT3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
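As a rough illustration of keyword parameters, the Python sketch below binds arguments by name rather than by position. The function bind_keyword_args and the formal parameter names &TYPE and &CHANNEL are assumptions made for the example; the notes give only the call statement.

```python
def bind_keyword_args(formals, call_args):
    """Map 'KEYWORD=value' arguments onto formal parameters by name."""
    binding = {}
    for arg in call_args:
        keyword, _, value = arg.partition("=")
        formal = "&" + keyword.lstrip("&")  # accept TYPE=... or &TYPE=...
        if formal not in formals:
            raise ValueError(f"unknown keyword parameter: {keyword}")
        binding[formal] = value
    return binding


# The order of the arguments in the call no longer matters.
binding = bind_keyword_args(["&TYPE", "&CHANNEL"], ["CHANNEL=3", "TYPE=DIRECT"])
print(binding)
```

Because each value carries its own keyword, arguments can be written in any order, which is the main convenience over positional parameters when a macro has many parameters.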
Features of macro facility
(1)Macro instruction argument
(2)Macro calls within macro(nested macro calls)
(3)Macro instruction defining(macros within macro)
(4)Conditional macro Expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in the MDT in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
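The three steps can be sketched as a small one-pass processor in Python. This is a simplified illustration, not a full design: operands are assumed to be separated by blanks, AIF and AGO are not handled, and the function name process and the HALT instruction in the sample program are hypothetical.

```python
def process(source_lines):
    """One-pass macro processing with an MNT and an MDT."""
    mnt = {}     # macro name -> (index of prototype in MDT, formal parameters)
    mdt = []     # macro definitions, stored line by line
    output = []  # expanded program
    lines = iter(source_lines)
    for line in lines:
        fields = line.split()
        if fields == ["MACRO"]:
            # Step 1: store the definition, up to and including ENDM.
            prototype = next(lines).split()
            name, formals = prototype[0], prototype[1:]
            mnt[name] = (len(mdt), formals)
            mdt.append(" ".join(prototype))
            for body_line in lines:
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        elif fields and fields[0] in mnt:
            # Step 2: locate the macro and pair formals with actuals.
            start, formals = mnt[fields[0]]
            binding = dict(zip(formals, fields[1:]))
            # Step 3: copy model statements from the MDT until ENDM.
            index = start + 1
            while mdt[index] != "ENDM":
                words = [binding.get(w, w) for w in mdt[index].split()]
                output.append(" ".join(words))
                index += 1
        else:
            output.append(line.strip())
    return mnt, mdt, output


program = ["MACRO", "INCRMT &A &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
           "INCRMT X Y", "HALT"]
mnt, mdt, expanded = process(program)
print(expanded)
```

The MNT entry holds only the position of the prototype in the MDT, which is the auxiliary information mentioned in step 1.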
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  - Macro prototype
  - Statements that make up the macro body
  - References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of macro invocations
  - Arguments are stored in ARGTAB according to their position
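The following Python sketch shows one way the positional notation in DEFTAB and the lookup in ARGTAB could work together. The ?n marker (parameter number n) is a common textbook convention assumed here, and the function names to_positional and expand_from_deftab are hypothetical.

```python
def to_positional(formals, body):
    """Rewrite each formal parameter as ?n, where n is its position."""
    marker = {formal: f"?{i}" for i, formal in enumerate(formals, start=1)}
    return [" ".join(marker.get(w, w) for w in line.split()) for line in body]


def expand_from_deftab(deftab_body, argtab):
    """Replace every ?n marker with the n-th entry of ARGTAB."""
    expanded = []
    for line in deftab_body:
        words = []
        for w in line.split():
            if w.startswith("?") and w[1:].isdigit():
                words.append(argtab[int(w[1:]) - 1])
            else:
                words.append(w)
        expanded.append(" ".join(words))
    return expanded


deftab_body = to_positional(["&A", "&B"], ["LOAD &A", "ADD &B", "STORE &A"])
expanded = expand_from_deftab(deftab_body, ["X", "Y"])
print(deftab_body)
print(expanded)
```

Converting to positions once, at definition time, means that expansion needs only an index into ARGTAB instead of a search by parameter name.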
(fig Macro definition and macro call with the tables NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the macro processor and explain.
(fig Macro processor procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads this executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning addresses to all the user subroutines and library subroutines. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. Thus the program now becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf() and scanf(), which are not written by the programmer. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
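A minimal Python sketch of this adjustment is given below. It assumes that the translator marks each word with a flag saying whether it holds an address; that flag list, the sample words and the function name relocate are illustrative assumptions, not a real object-file format.

```python
def relocate(words, is_address, translated_origin, load_origin):
    """Add the relocation constant to every word flagged as an address."""
    constant = load_origin - translated_origin
    return [w + constant if flag else w for w, flag in zip(words, is_address)]


# Program A was translated to start at 200 and is loaded at 500 instead.
segment = [5, 210, 7, 250]            # two plain constants, two addresses
flags = [False, True, False, True]    # which words are address fields
loaded = relocate(segment, flags, translated_origin=200, load_origin=500)
print(loaded)
```

Only the flagged address fields change; the constants 5 and 7 are copied unchanged, which is why relocation is more than moving the program.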
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader is the object program and information about all other programs it references. In addition there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader
When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» a subroutine is loaded and linked to the rest of the program when it is first called
» also called dynamic linking, dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs
» all linking and relocation operations
» automatic library search
» loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
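The behaviour of the overlay manager can be imitated with a short Python sketch. It uses the tree from the example; the class OverlayManager, its fields and the PARENT table are hypothetical names chosen for illustration, and "loading" a segment is reduced to recording its name.

```python
# Parent of each segment in the example overlay tree.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.resident = ["ROOT"]  # path currently in memory, root first
        self.loads = []           # log of segments read in from disk

    def call(self, target):
        """Ensure that the whole path to the call target is loaded."""
        wanted = path_to(target)
        keep = 0
        while (keep < min(len(self.resident), len(wanted))
               and self.resident[keep] == wanted[keep]):
            keep += 1
        if keep < len(wanted):
            # Siblings share memory: whatever lies below the common part
            # of the two paths is overwritten by the newly loaded segments.
            self.resident = self.resident[:keep] + wanted[keep:]
            self.loads.extend(wanted[keep:])


manager = OverlayManager()
manager.call("B")   # loads A and B
manager.call("A")   # upward call: nothing to load
manager.call("E")   # D and E replace A and B
print(manager.loads, manager.resident)
```

The log of loads makes the cost of a poorly chosen tree visible: calls that alternate between A and D reload a segment on every call.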
(fig Overlay tree: ROOT at the top with children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, then the process of execution needs to be suspended until the required subroutine gets loaded into main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader respectively) for the above four functions:

Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment's location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD IDs used in ESD and RLD cards
ESD (external symbol dictionary)
Data Structures
Algorithm
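A minimal sketch of the two passes in Python follows. It is a simplification under stated assumptions: each segment is a dictionary with hypothetical fields (name, length, defs, text, rld) standing in for the ESD, TXT and RLD cards, and memory is modelled as a dictionary from address to word.

```python
def pass1(segments, ipla):
    """Assign a load address to each segment and build the symbol table."""
    symbols = {}
    pla = ipla                       # Program Load Address counter
    for seg in segments:
        symbols[seg["name"]] = pla   # the segment name is an external symbol
        for sym, offset in seg["defs"].items():
            symbols[sym] = pla + offset
        pla += seg["length"]
    return symbols


def pass2(segments, symbols):
    """Load the text and add symbol addresses where the RLD entries say so."""
    memory = {}
    for seg in segments:
        base = symbols[seg["name"]]
        for offset, word in enumerate(seg["text"]):
            memory[base + offset] = word
        for offset, sym in seg["rld"]:
            memory[base + offset] += symbols[sym]  # relocation and linking
    return memory


# Segment A refers to the external symbol B and to its own start address.
A = {"name": "A", "length": 3, "defs": {},
     "text": [10, 0, 1], "rld": [(1, "B"), (2, "A")]}
B = {"name": "B", "length": 2, "defs": {"BDATA": 1},
     "text": [20, 30], "rld": []}
symbols = pass1([A, B], ipla=100)
memory = pass2([A, B], symbols)
print(symbols)
print(memory)
```

Two passes are needed because A refers to B before the loader has seen B: pass 1 fixes every address first, so pass 2 can patch each reference in a single sweep.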
Binary Symbolic Subroutine Loader
The assembler provides the loader with
-> the object program plus relocation information
-> prefixed with information about all other programs it references (transfer vector)
-> the length of the entire program
-> the length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader: allocation, relocation and linking
• It outputs the text to a file rather than to memory; this file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> The language designer balances
-> making computing convenient for people with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search
Binary search takes a sorted array as its input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array. It therefore requires an ordered table.
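Both methods can be written in a few lines of Python. The sketch below searches a small table of symbol names; the table contents are made up for the example, and both functions return -1 when the key is absent (a convention assumed here).

```python
def linear_search(table, key):
    """Compare the key with every entry in turn."""
    for index, entry in enumerate(table):
        if entry == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Halve the search range at every step; the table must be ordered."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_table[mid] == key:
            return mid
        if key < sorted_table[mid]:
            high = mid - 1   # continue in the left sub-array
        else:
            low = mid + 1    # continue in the right sub-array
    return -1


symbol_table = ["ALPHA", "BETA", "COUNT", "LOOP", "TOTAL"]  # already sorted
print(linear_search(symbol_table, "LOOP"), binary_search(symbol_table, "LOOP"))
```

Linear search may look at all n entries, while binary search looks at about log2(n) of them, at the price of keeping the table ordered.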
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
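A small hash table for symbol names might look like the Python sketch below. The hash function (sum of character codes modulo the table size) and the use of linear probing to resolve collisions are assumptions chosen for simplicity; the notes do not prescribe either.

```python
class HashTable:
    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size   # slots are numbered 0 .. size-1

    def hash_function(self, key):
        """Map a symbol name to a slot number."""
        return sum(ord(ch) for ch in key) % self.size

    def put(self, key, value):
        slot = self.hash_function(key)
        for _ in range(self.size):
            if self.slots[slot] is None or self.slots[slot][0] == key:
                self.slots[slot] = (key, value)
                return slot
            slot = (slot + 1) % self.size   # linear probing on a collision
        raise RuntimeError("hash table is full")

    def get(self, key):
        slot = self.hash_function(key)
        for _ in range(self.size):
            if self.slots[slot] is None:
                return None
            if self.slots[slot][0] == key:
                return self.slots[slot][1]
            slot = (slot + 1) % self.size
        return None


table = HashTable()
table.put("LOOP", 104)
table.put("POOL", 200)   # same letters, so it collides with LOOP
print(table.get("LOOP"), table.get("POOL"))
```

LOOP and POOL hash to the same slot because they contain the same letters, so the second insertion is moved to the next free slot by the probing step.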
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of the premises. It is interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet $T$ is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2
A language $L$ is a subset of the set of finite concatenations of symbols in an alphabet $T$.
We write this $L \subseteq T^*$
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string $\gamma$ is immediately derived from a string $\mu$ in a grammar $G$, written $\mu \to_G \gamma$ (a single-step derivation), if and only if
(1) $\mu = \sigma \alpha \tau$
(2) $\gamma = \sigma \beta \tau$
(3) $\alpha \to \beta \in P$ of $G$
where $\sigma$ and $\tau$ represent arbitrary strings.
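The definition translates almost directly into code. The Python sketch below lists every string that can be immediately derived from a given string; it assumes that each symbol is a single character and that productions are given as (α, β) pairs with a non-empty α. The function name and the sample grammar are illustrative.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma obtainable from mu in one derivation step."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]               # part before alpha
            tau = mu[start + len(alpha):]    # part after alpha
            results.add(sigma + beta + tau)  # gamma = sigma beta tau
            start = mu.find(alpha, start + 1)
    return results


P = [("S", "aSb"), ("S", "ab")]
step = immediate_derivations("aSb", P)
print(sorted(step))
```

Here σ = a and τ = b surround the occurrence of S, and each production for S gives one possible γ.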
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol $\Sigma$.
• Chomsky Hierarchy
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar $\langle N, \Sigma, P, S \rangle$ that satisfies the following two conditions:
a Each production rule $\alpha \to \beta$ in $P$ satisfies $|\alpha| \le |\beta|$ if it is not of the form $S \to \epsilon$
b If $S \to \epsilon$ is in $P$, then $S$ does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form $Bb \to b$ will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule $\alpha \to \beta$ satisfies $|\alpha| = 1$, that is, $\alpha$ is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form $aE \to EaE$ to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar $\langle N, \Sigma, P, S \rangle$ in which each of the production rules $\alpha \to \beta$ which is not of the form $S \to \epsilon$ satisfies one of the following conditions:
a $\beta$ is a terminal symbol
b $\beta$ is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
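The conditions above can be checked mechanically. The following Python sketch returns the most restrictive type a grammar satisfies; it assumes single-character symbols, productions given as (α, β) string pairs with the empty string standing for ε, and an input that is already a valid Type 0 grammar. The function name and the sample grammars are made up for illustration.

```python
def grammar_type(nonterminals, productions, start):
    """Return 3, 2, 1 or 0 according to the conditions on the production rules."""
    def start_to_empty(alpha, beta):
        return alpha == start and beta == ""

    has_start_to_empty = any(start_to_empty(a, b) for a, b in productions)
    start_on_rhs = any(start in b for _, b in productions)

    # Type 1: condition (a) on the lengths, condition (b) on the start symbol.
    type1 = all(len(a) <= len(b) or start_to_empty(a, b) for a, b in productions)
    if not type1 or (has_start_to_empty and start_on_rhs):
        return 0
    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a in nonterminals for a, _ in productions):
        return 1

    # Type 3: right-hand side is a terminal, or a terminal plus a nonterminal.
    def type3_rhs(beta):
        if len(beta) == 1:
            return beta not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    if all(start_to_empty(a, b) or type3_rhs(b) for a, b in productions):
        return 3
    return 2


N = {"S", "A", "B"}
regular = [("S", "aA"), ("A", "b"), ("A", "bA")]
print(grammar_type(N, regular, "S"))
print(grammar_type(N, regular + [("A", "Ba")], "S"))
```

Adding a rule of the form A -> Ba to the first grammar lowers the result from Type 3 to Type 2, which agrees with the remarks that accompany the examples.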
Example 1217 The grammar $\langle N, \Sigma, P, S \rangle$ which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form $A \to Ba$ or of the form $B \to bb$ to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
(fig Hierarchy of grammars)
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar $G = (V, T, S, P)$ is a context-free grammar (CFG) if all productions in $P$ have the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form $xAy \to xvy$, where $A \in V$ and $x, v, y \in (V \cup T)^*$ (with $v$ usually required to be non-empty).
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by $A \to v$, while the $x$ and $y$ provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only $kn$ squares long, where $n$ is the length of the input string and $k$ is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, $k = 1$ in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. Symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
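As a minimal sketch of how such derivation rules work, the Python fragment below encodes a small BNF grammar as a dictionary and performs substitutions. The grammar for binary numbers and the names GRAMMAR and derive are assumptions chosen for illustration; they do not come from these notes.

```python
# Hypothetical BNF grammar (illustration only):
#   <number> ::= <digit> | <digit> <number>
#   <digit>  ::= 0 | 1
GRAMMAR = {
    "<number>": [["<digit>"], ["<digit>", "<number>"]],
    "<digit>": [["0"], ["1"]],
}


def derive(symbol, choices):
    """Expand `symbol`; at each nonterminal take the next alternative index from `choices`."""
    if symbol not in GRAMMAR:
        # A symbol that never appears on a left side is a terminal.
        return symbol
    alternative = GRAMMAR[symbol][next(choices)]
    return "".join(derive(part, choices) for part in alternative)


# Choices: <number> -> <digit> <number>, <digit> -> 1, <number> -> <digit>, <digit> -> 0
print(derive("<number>", iter([1, 1, 0, 0])))  # prints 10
```

Each alternative of a rule is one list inside the dictionary entry, which mirrors the role of the vertical bar in the written notation.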
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages, if a canonic system could be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
In the case of a canonic system where all predicates are of degree one and no “cross-referencing” is used
Lecture_no 43 Compiler
Compilers are basically “language translators”.
– Language translators convert texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the source program is built. In the synthesis part, this intermediate form is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a sequence of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol =
– The identifier count
– The plus sign +
– The identifier rate
– The multiplication sign *
– The constant number 10
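The token list above can be reproduced with a small scanner. The following is a minimal sketch in Python, assuming the statement reads total = count + rate * 10; the token names (IDENT, ASSIGN and so on) and the function tokenize are illustrative choices, not part of the notes.

```python
import re

# Token classes as (name, regular expression); the names are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break `source` into (token type, lexeme) pairs, dropping blanks."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

The scanner needs no recursion: every token class is described by a regular expression, which is what separates scanning from parsing.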
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable: about 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
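To make the step from tokens to a syntax tree concrete, here is a minimal recursive-descent sketch in Python for statements of the form id = expr ;. It is an illustration under stated assumptions: the function name parse_assignment and the tuple representation of tree nodes are invented here, and the recursive rule expr → expr + expr is handled with a loop that makes + left-associative, which is one common way to implement it rather than the method of these notes.

```python
import re


def parse_assignment(source):
    """Parse `id = expr ;` and return a syntax tree built from nested tuples."""
    # Simplified scanner: identifiers, numbers, and the symbols = + ;
    # (characters outside these classes are silently skipped in this sketch).
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        pos += 1
        return token

    def operand():
        token = take()
        if token.isdigit():
            return ("number", int(token))
        if token in ("=", "+", ";"):
            raise SyntaxError(f"unexpected symbol {token!r}")
        return ("id", token)

    def expr():
        tree = operand()
        while peek() == "+":          # operator becomes the interior node
            take("+")
            tree = ("+", tree, operand())
        return tree

    target = operand()
    if target[0] != "id":
        raise SyntaxError("left side of an assignment must be an identifier")
    take("=")
    value = expr()
    take(";")
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r} after ';'")
    return ("=", target, value)


print(parse_assignment("x3 = y + 3 ;"))
```

The result has operators as interior nodes and operands as leaves. An input such as x 3 = y + 3 ; is rejected, because after the identifier x the parser expects = and finds 3.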
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one operand) conversion operators “int to real” and “real to int”, as shown on the right.
Intermediate code generation
Many compilers first generate code for an “idealized machine”. For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each “quad” has the form
operation target source1 source2
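Walking the semantic tree to produce quads can be sketched as a post-order traversal. The Python fragment below is an illustration only: the tuple encoding of the tree and the name generate_quads are assumptions, while the operation names and temporaries follow the example above.

```python
def generate_quads(tree):
    """Return a list of (operation, target, source1[, source2]) tuples."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        """Emit quads for `node`; return the name or constant holding its value."""
        if not isinstance(node, tuple):   # leaf: identifier or constant
            return node
        operation, *children = node
        sources = [walk(child) for child in children]   # operands first
        target = new_temp()
        quads.append((operation, target, *sources))
        return target

    operation, target, value = tree
    if operation != "assign":
        raise ValueError("expected an assignment at the root")
    quads.append(("assign", target, walk(value)))
    return quads


# Semantic tree for x3 = y + 3 with the conversion operators inserted
# (id1 stands for x3 and id2 for y, as in the token stream).
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", 3))))
for quad in generate_quads(tree):
    print(*quad)
```

Because children are visited before their parent, the conversion of 3 is emitted first and the assignment last, matching the order of the sequence above.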
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion itself and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
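The two optimizations can be sketched as a single pass over the quads. This Python fragment is a simplified illustration, not a general optimizer: the function name optimize is invented here, and the second rule assumes that each temporary is used exactly once, which holds for the example but not in general.

```python
def optimize(quads):
    """Fold int-to-real conversions of constants and merge a trailing copy."""
    constants = {}   # temporary name -> constant computed at compile time
    result = []
    for operation, target, *sources in quads:
        # Replace temporaries that are known constants by their value.
        sources = [constants.get(s, s) if isinstance(s, str) else s for s in sources]
        if operation == "inttoreal" and isinstance(sources[0], int):
            constants[target] = float(sources[0])   # conversion done by the compiler
            continue
        if operation == "assign" and result and result[-1][1] == sources[0]:
            # The previous quad computed the temporary being copied:
            # let it write to the final target directly.
            previous_operation, _, *previous_sources = result.pop()
            result.append((previous_operation, target, *previous_sources))
            continue
        result.append((operation, target, *sources))
    return result


quads = [
    ("inttoreal", "temp1", 3),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2"),
    ("assign", "id1", "temp3"),
]
for quad in optimize(quads):
    print(*quad)
```

For the four quads of the running example this leaves two: an add with the constant 3.0 as its second source, and a realtoint that writes directly to id1.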
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0     // add float
RTOI R2, R1          // real to int
ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
A grammar is ambiguous if there is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A “pass” is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a “phase”, but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                  calls
Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
                       Compiler Driver
    calls                  calls                    calls
Syntactic Analyzer     Contextual Analyzer      Code Generator
input: Source Text     input: AST               input: Decorated AST
output: AST            output: Decorated AST    output: Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
A macro definition is placed at the start of a program, enclosed between the statements MACRO and ENDM. A MACRO statement indicates that a macro definition starts, while the statement ENDM indicates the end of a macro definition. Thus a group of statements starting with MACRO and ending with ENDM constitutes one macro definition unit. If many macros are to be defined in a program, as many definition modules will exist at the start of the program. Each definition module contains a new operation and defines it to consist of a sequence of assembly language statements. In the example above, INCRMT is defined to be the name of the LOAD-ADD-STORE instruction sequence.
The operation defined by a macro can be used by writing the macro name in the mnemonic field and its operands in the operand field of an assembly statement. Appearance of a macro name in the mnemonic field amounts to a call on the macro. The assembler replaces such a statement by the statement sequence comprising the macro. This is known as macro expansion. For example, the macro call
INCRMT X,Y
is shown to lead to insertion of the assembly statements
LOAD X
ADD Y
STORE X
in its place. All macro calls in a program are expanded in this fashion.
Defining a Macro
Let us take another look at the macro definition unit.
Macro definition:
MACRO
INCRMT  &A, &B
LOAD    &A
ADD     &B
STORE   &A
ENDM
Macro call:
INCRMT  X,Y
Macro expansion:
LOAD    X
ADD     Y
STORE   X
The macro header statement indicates the existence of a macro definition unit. Absence of the header statement as the first statement of a program, or as the first statement following a macro definition unit, signals the start of the main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
-> Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates its end.
-> Prototype for the macro
Each parameter begins with &.
name MACRO parameter-list
MEND
-> Body: the statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement is expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
macro_name p1, p2, …
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B    prototype
INCRMT X,Y       macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X,Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
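The difference between the two forms lies in how a call's operands are matched to formal parameters. The Python sketch below shows both matchings; the function bind_arguments and the assumption that GENER has exactly the formal parameters &TYPE and &CHANNEL are illustrative and not taken from the notes.

```python
def bind_arguments(formals, operands):
    """Return a dict mapping formal parameters (e.g. '&TYPE') to actual parameters."""
    binding = {}
    position = 0
    for operand in operands:
        if "=" in operand:
            # Keyword form NAME=value: the keyword names the parameter.
            name, value = operand.split("=", 1)
            formal = "&" + name.lstrip("&")
            if formal not in formals:
                raise ValueError(f"unknown parameter {name!r}")
        else:
            # Positional form: the n-th operand pairs with the n-th formal.
            if position >= len(formals):
                raise ValueError("too many operands")
            formal, value = formals[position], operand
            position += 1
        binding[formal] = value
    return binding


formals = ["&TYPE", "&CHANNEL"]   # hypothetical prototype: GENER &TYPE, &CHANNEL
print(bind_arguments(formals, ["DIRECT", "3"]))
print(bind_arguments(formals, ["CHANNEL=3", "TYPE=DIRECT"]))
```

Both calls produce the same pairing. With keywords the operands may be written in any order, which matters most for macros with many parameters.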
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed, but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e., actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in the MDT in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
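The three steps can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: definitions and calls are handled in one scan (so every macro must be defined before it is called), parameters are positional only, the conditional statements AIF and AGO are not handled, and the function name process and the table layouts are invented here.

```python
def process(source_lines):
    """Expand macro calls in `source_lines`; return the expanded program."""
    mnt = {}      # Macro Name Table: name -> (formal parameters, index into MDT)
    mdt = []      # Macro Definition Table: model statements, each body closed by ENDM
    output = []
    lines = iter(source_lines)
    for line in lines:
        fields = line.split()
        if fields == ["MACRO"]:
            # Step 1: the prototype follows the header; record name and MDT position.
            name, *rest = next(lines).split(None, 1)
            formals = [p.strip() for p in rest[0].split(",")] if rest else []
            mnt[name] = (formals, len(mdt))
            for model in lines:
                mdt.append(model.strip())
                if model.strip() == "ENDM":
                    break
        elif fields and fields[0] in mnt:
            # Step 2: a macro call; pair actual with formal parameters by position.
            formals, index = mnt[fields[0]]
            operand_field = line.split(None, 1)[1] if len(fields) > 1 else ""
            actuals = [a.strip() for a in operand_field.split(",")] if operand_field else []
            binding = dict(zip(formals, actuals))
            # Step 3: copy model statements from the MDT until ENDM is reached.
            while mdt[index] != "ENDM":
                statement = mdt[index]
                for formal, actual in binding.items():
                    statement = statement.replace(formal, actual)
                output.append(statement)
                index += 1
        else:
            output.append(line.strip())   # ordinary assembly statement
    return output


program = [
    "MACRO",
    "INCRMT &A, &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X,Y",
    "INCRMT P,Q",
]
print("\n".join(process(program)))
```

The MNT entry stores only where the body starts in the MDT, so any number of calls reuse the single stored definition; each call differs only in its binding of actual parameters.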
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
Holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB)
Role: an index to DEFTAB. Holds pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB)
Used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
Figure: macro processor data structures (NAMTAB, DEFTAB, ARGTAB) used for macro definitions and macro calls
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
Figure: flowchart of the macro processor, with the procedures PROCESSLINE, DEFINE and EXPAND
Lecture_no26 Loader and Linker
In this chapter we will understand the concepts of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is now ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader - In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme - In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader - An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls to certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called a relocating loader.
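The "add a constant to each relative address" step can be illustrated with a small Python sketch. The representation is an assumption made for this example only: a segment is a list of integer words, and `relocatable` is a hypothetical set of word positions that hold relative addresses (real object formats carry this as relocation bits or relocation records).

```python
def relocate(words, relocatable, load_address):
    """Return a copy of the segment with load_address added to every
    word whose position is listed in `relocatable`; other words
    (opcodes, plain constants) are left untouched."""
    return [
        word + load_address if index in relocatable else word
        for index, word in enumerate(words)
    ]


# A segment translated as if it started at address 0; the words at
# positions 1 and 3 are relative addresses.
segment = [5, 12, 0, 7]
loaded = relocate(segment, {1, 3}, 200)   # load the segment at address 200
```

Note that the words themselves are not "moved" by this function; only the address fields are adjusted, which is the point made in the text above.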
Relocating Loader: The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The input to a relocating loader (the output of the assembler) is the object program and information about all other programs it references. In addition there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor -
Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking -
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called (this is known as dynamic linking, dynamic loading, or load on call)
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader -
A linking loader performs:
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments
A typical overlay tree ROOT calls A and D A calls B and C D calls E and F
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
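The path-loading rule can be sketched as a toy overlay manager in Python. The tree below is the example tree from the text; the class name `OverlayManager`, the `PARENT` table and the `loads` log are hypothetical names invented for this sketch, and "loading" is only simulated by recording segment names.

```python
# Overlay tree from the text: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path(segment):
    """Sequence of segments from the root down to `segment`."""
    result = []
    while segment is not None:
        result.append(segment)
        segment = PARENT[segment]
    return result[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]   # the path currently resident in memory
        self.loads = []          # log of segments brought in from disk

    def call(self, target):
        """Make sure the whole path to `target` is resident before a call."""
        if target in self.loaded:        # upward call or same segment: nothing to do
            return
        wanted = path(target)
        keep = 0                         # length of the common prefix of both paths
        while (keep < min(len(wanted), len(self.loaded))
               and wanted[keep] == self.loaded[keep]):
            keep += 1
        self.loaded = self.loaded[:keep]  # siblings share memory: evict the old branch
        for segment in wanted[keep:]:
            self.loaded.append(segment)
            self.loads.append(segment)
```

For example, calling a routine in B from the root loads A and then B; a later call into E evicts A and B, because D shares memory with A, and loads D and E.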
(fig. Overlay tree: ROOT at the top with children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig. Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case the code can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into main memory.
-> Let's take a quick look at who performs each of the above four functions for our three loaders (absolute loader, relocating loader and direct linking loader):

Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages -
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error unless it is told that B is defined externally.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) cards
5. A copy of the input (TXT cards)
6. RLD (Relocation and Linkage Directory) cards
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Copy of object program
IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
PLA counter (Program Load Address): keeps track of each segment's location
External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA): establishes the correspondence between the ESD ID used on ESD and RLD cards
ESD: external symbol dictionary
Data Structures
Algorithm
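The two passes listed above can be sketched in Python as follows. This is a simplified illustration, not the card-level algorithm: each object module is a hypothetical dictionary with a `name`, `defs` (its ESD definitions as relative addresses), `text` (a list of integer words) and `rld` (pairs of word offset and symbol whose address must be added). The function names `pass1` and `pass2` are likewise assumptions of this sketch.

```python
def pass1(modules, ipla):
    """Pass 1: allocate a load address to every segment and build the
    table of external symbols, starting from the initial load address."""
    symtab = {}
    pla = ipla                               # program load address counter
    for module in modules:
        symtab[module["name"]] = pla         # the segment name is itself a symbol
        for symbol, relative in module["defs"].items():
            symtab[symbol] = pla + relative  # absolute address of each entry point
        pla += len(module["text"])
    return symtab


def pass2(modules, symtab):
    """Pass 2: load the text, then relocate and link using the RLD entries."""
    memory = {}
    for module in modules:
        base = symtab[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word                 # loading
        for offset, symbol in module["rld"]:
            memory[base + offset] += symtab[symbol]      # relocation / linking
    return memory


main = {"name": "MAIN", "defs": {}, "text": [10, 0, 3],
        "rld": [(1, "SUB"), (2, "MAIN")]}    # word 1 refers to SUB, word 2 is a local address
sub = {"name": "SUB", "defs": {"ENTRY": 1}, "text": [7, 8], "rld": []}

symbols = pass1([main, sub], 100)
core = pass2([main, sub], symbols)
```

An RLD entry naming the segment itself performs plain relocation, while an entry naming a symbol from another segment performs linking; both reduce to adding an address from the symbol table built in pass 1.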
Binary Symbolic Subroutine Loader - The assembler assembles each program and provides the loader with:
-> The object program plus relocation information
-> A prefix with information about all other programs it references (the transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder -
• A binder is a program that performs the same functions as the direct linking loader
– allocation, relocation and linking
• It outputs the text to a file rather than to memory
– the output is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing a language, but in teaching and supporting it
-> The language designer's balance:
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java, and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
- To be able to explain and implement sequential search and binary search
- To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort
- To understand the idea of hashing as a search technique
- To introduce the map abstract data type
- To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as its input (an ordered table). It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
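The three search methods can be compared on a small symbol table. The following Python sketch is illustrative only; the function names and the particular hash function (sum of character codes modulo the table size) are choices made for this example, and many other hash functions are possible.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, item in enumerate(table):
        if item == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table; return the key's index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_table[middle] == key:
            return middle
        if key < sorted_table[middle]:
            high = middle - 1        # continue in the left sub-array
        else:
            low = middle + 1         # continue in the right sub-array
    return -1


def hash_slot(key, size):
    """A simple hash function: map a symbol name to a slot 0..size-1."""
    return sum(ord(ch) for ch in key) % size


symbols = ["ALPHA", "BETA", "COUNT", "LOOP", "TOTAL"]   # already in sorted order
```

Linear search needs up to n comparisons for a table of n entries, binary search about log2(n) on an ordered table, and hashing goes directly to a slot (collisions, which this sketch does not handle, have to be resolved separately).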
Sorting
We will discuss four comparison-sort algorithms
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
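As an illustration of the first two algorithms in this list, here is a minimal Python sketch. Both functions return a new sorted list and leave their input unchanged; that is a design choice of this example, since textbook versions usually sort in place.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining element and move it
    to the front of the unsorted part."""
    a = list(items)
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a


def insertion_sort(items):
    """Insert each element into its place within the already sorted prefix."""
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]          # shift larger elements one place right
            j -= 1
        a[j + 1] = current
    return a
```

Both take time proportional to n squared in the worst case, whereas merge sort and quick sort (on average) take time proportional to n log n.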
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components
- A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
- A syntax that defines which strings of symbols are in the language of our formal system.
- A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
- the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
- the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule, or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2 -
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s → α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbols - Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal symbols -
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences -
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ ⇒G γ (a single-step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α → β ∈ P of G
where σ and τ represent arbitrary strings.
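The three conditions of this definition translate almost directly into code. In the Python sketch below, strings are ordinary Python strings with one character per symbol and the production set P is a list of (α, β) pairs; both representations, the function name and the sample grammar are assumptions made for this illustration.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that can be immediately derived from mu:
    mu = sigma + alpha + tau, gamma = sigma + beta + tau, (alpha, beta) in P."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:                    # try every occurrence of alpha in mu
            sigma = mu[:start]
            tau = mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar generating a^n b^n (n >= 1): S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
```

Applying the function to the string aSb gives the two strings aaSbb and aabb, corresponding to σ = a, τ = b and the two productions for S.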
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy -
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (in this notation N is the set of nonterminals, Σ the terminal alphabet, P the set of production rules and S the start symbol) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
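To make the notation concrete, the Python sketch below encodes a tiny BNF specification as a dictionary and checks whether a string can be derived from it. The grammar (binary numbers), the dictionary encoding and the function names are all invented for this example; the recognizer is a simple backtracking matcher that assumes the grammar has no left recursion.

```python
# BNF being modelled (hypothetical example):
#   <number> ::= <digit> | <digit> <number>
#   <digit>  ::= 0 | 1
GRAMMAR = {
    "<number>": [["<digit>"], ["<digit>", "<number>"]],
    "<digit>": [["0"], ["1"]],
}


def match(symbol, text, pos, grammar):
    """Yield every position where a match of `symbol` starting at text[pos] can end."""
    if symbol not in grammar:                    # terminal: must appear literally
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in grammar[symbol]:          # nonterminal: try each choice after ::=
        ends = {pos}
        for part in alternative:                 # match the sequence left to right
            ends = {e2 for e in ends for e2 in match(part, text, e, grammar)}
        yield from ends


def recognizes(text, start="<number>", grammar=GRAMMAR):
    """True if the whole of `text` can be derived from the start symbol."""
    return len(text) in set(match(start, text, 0, grammar))
```

Here the symbols that appear as dictionary keys play the role of nonterminals (left sides of ::=), and every other symbol is treated as a terminal, mirroring the rule stated above.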
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. A canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system.
This is the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, producing an intermediate form of the source program.
In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
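The token list above can be reproduced with a small scanner. The Python sketch below uses regular expressions from the standard `re` module; the token names (ID, ASSIGN, PLUS, TIMES, NUMBER) and the function name `tokenize` are assumptions of this example, not a fixed convention.

```python
import re

# One named pattern per token class; SKIP matches white space, which is discarded.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into a list of (token name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = PATTERN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


tokens = tokenize("total = count + rate * 10")
```

For the example statement this yields seven tokens, in the same order as the list above: an identifier, the assignment symbol, an identifier, the plus sign, an identifier, the multiplication sign and a number.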
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
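A small recursive-descent parser shows how tokens are grouped into such a tree. The Python sketch below is a simplified illustration: the grammar (an assignment whose right side uses only + and *), the tuple representation of tree nodes and all function names are assumptions made for this example.

```python
import re


def tokenize(source):
    """Minimal scanner for this example: identifiers, numbers and the symbols = + *."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+*]", source)


def parse_assignment(tokens):
    """assignment := ID '=' expr"""
    if len(tokens) < 3 or tokens[1] != "=":
        raise SyntaxError("expected: identifier = expression")
    tree, pos = parse_expr(tokens, 2)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("=", tokens[0], tree)


def parse_expr(tokens, pos):
    """expr := term ('+' term)*"""
    left, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        left = ("+", left, right)
    return left, pos


def parse_term(tokens, pos):
    """term := factor ('*' factor)*   (so * binds tighter than +)"""
    left, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_factor(tokens, pos + 1)
        left = ("*", left, right)
    return left, pos


def parse_factor(tokens, pos):
    """factor := ID | NUMBER"""
    if pos >= len(tokens) or tokens[pos] in {"=", "+", "*"}:
        raise SyntaxError("expected an identifier or a number")
    return tokens[pos], pos + 1


tree = parse_assignment(tokenize("total = count + rate * 10"))
```

Each tuple is an interior node of the tree (operator, left subtree, right subtree) and each plain string is a leaf. For the example statement the multiplication ends up as a subtree below the addition, reflecting its higher precedence.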
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for Gnu C Compiler, but now standing for Gnu Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
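As a rough illustration of how such rules turn a token sequence into a syntax tree, here is a minimal sketch in Python. It assumes the statement is terminated by a semicolon as in C, and it replaces the recursive rule expr → expr + expr by a loop that builds the tree left-associatively (a common way to implement it, not necessarily the one intended by these notes). The helper names lex, parse_operand, parse_expr and parse_assignment are hypothetical.

```python
import re

def lex(text):
    # Minimal scanner for this sketch: identifiers, integers, '=', '+', ';'.
    # Characters outside this set are silently ignored.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", text)

def parse_operand(tokens, i):
    """operand -> number | id"""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok.isdigit():
        return ("number", int(tok)), i + 1
    if tok[0].isalpha() or tok[0] == "_":
        return ("id", tok), i + 1
    raise SyntaxError(f"expected an operand, got {tok!r}")

def parse_expr(tokens, i):
    """expr -> operand ('+' operand)*, built left-associatively."""
    left, i = parse_operand(tokens, i)
    while i < len(tokens) and tokens[i] == "+":
        right, i = parse_operand(tokens, i + 1)
        left = ("+", left, right)     # operator is the interior node
    return left, i

def parse_assignment(text):
    """asst-stmt -> id = expr ;   Returns the syntax tree as nested tuples."""
    tokens = lex(text)
    target, i = parse_operand(tokens, 0)
    if target[0] != "id" or i >= len(tokens) or tokens[i] != "=":
        raise SyntaxError("expected: id = expr ;")
    value, i = parse_expr(tokens, i + 1)
    if tokens[i:] != [";"]:
        raise SyntaxError("expected ';' at the end of the statement")
    return ("=", target, value)

print(parse_assignment("x3 = y + 3;"))
# ('=', ('id', 'x3'), ('+', ('id', 'y'), ('number', 3)))
```

The nested tuples mirror the syntax tree: the operator is the first element and its operands are the children.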
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators "int to real" and "real to int" as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
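The quads above can be produced by a short post-order walk of the tree. The sketch below is written in Python and makes several assumptions: the semantic tree is represented as nested tuples, leaves are plain strings such as "id2" or "3", and the function name gen_quads is hypothetical.

```python
from itertools import count

def gen_quads(tree):
    """Emit (operation, target, source1, source2) quads for an assignment tree."""
    quads = []
    temps = count(1)                       # temp1, temp2, ...

    def walk(node):
        if isinstance(node, str):          # leaf: identifier or constant
            return node
        op, *args = node
        operands = [walk(arg) for arg in args]   # children first (post-order)
        target = f"temp{next(temps)}"
        padding = ["--"] * (2 - len(operands))   # unused source slots
        quads.append((op, target, *operands, *padding))
        return target

    _, target, expr = tree                 # root must be ("assign", name, expr)
    quads.append(("assign", target, walk(expr), "--"))
    return quads

# Tree for x3 = y + 3 after the type conversions have been inserted.
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
for quad in gen_quads(tree):
    print(*quad)
```

Because operands are visited before the operator that uses them, the conversion of 3 is emitted first and the assignment last, which matches the order of the four quads shown.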
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
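Both optimizations can be expressed as a pass over the list of quads. The following Python sketch is an illustration only: the function name optimize is hypothetical, and the second step assumes that each temporary is used exactly once, which holds for this example but would need checking in general.

```python
def optimize(quads):
    """Apply the two easy optimizations to a list of 4-tuples of strings."""
    # 1. Perform inttoreal on integer constants at compile time and
    #    substitute the resulting real constant wherever the temp is used.
    constants = {}
    folded = []
    for op, target, s1, s2 in quads:
        s1 = constants.get(s1, s1)
        s2 = constants.get(s2, s2)
        if op == "inttoreal" and s1.isdigit():
            constants[target] = str(float(int(s1)))   # "3" -> "3.0"
            continue                                   # the quad disappears
        folded.append((op, target, s1, s2))

    # 2. Merge "op tempN a b" followed by "assign x tempN" into "op x a b".
    #    Assumes tempN is not used anywhere else.
    result = []
    for op, target, s1, s2 in folded:
        if op == "assign" and result and result[-1][1] == s1 and s1.startswith("temp"):
            prev_op, _, a, b = result.pop()
            result.append((prev_op, target, a, b))
        else:
            result.append((op, target, s1, s2))
    return result

quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
for quad in optimize(quads):
    print(*quad)
```

Four quads shrink to two: one add with the real constant 3.0 and one realtoint that stores directly into id1.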
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2:
LD R1, id2
ADDF R1, R1, 3.0 (add float)
RTOI R2, R1 (real to int)
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
(fig: the Compiler Driver calls the Syntactic Analyzer; the Syntactic Analyzer calls the Contextual Analyzer and the Code Generator)
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
(fig: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer and the Code Generator in turn; input/output flow: Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
main assembly language program. The next statement in the definition unit is the prototype for a macro call. This statement names the macro and indicates how the operands in any call on the macro would be written.
The prototype is followed by the so-called model statements. These are assembly statements which will replace the macro call as a result of macro expansion.
Basic Macro Processor Functions
->Directives used in a macro definition
MACRO indicates the beginning of a macro definition; MEND indicates its end.
->Prototype for the macro
Gives the macro name and the parameter list; each parameter starts with &.
->Body
The statements that will be generated as the expansion of the macro.
Macro Expansion
Each macro invocation statement will be expanded into the statements that form the body of the macro.
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro: parameter. In the macro invocation: argument.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro:
macro_name p1, p2, …
Difference between macro call and procedure call
Macro call: the statements of the macro body are expanded each time the macro is invoked.
Procedure call: the statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters (also called formal and actual parameter lists), specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B (prototype)
INCRMT X, Y (macro call)
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
Example
STRG DATA1, DATA2, DATA3
GENER ,,DIRECT,,,,,,3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
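The pairing of formal and actual parameters, by position or by keyword, can be sketched in a few lines of Python. This is an illustration under assumptions: the model statements of INCRMT (LOAD &A, ADD &B, STORE &A) are inferred from the expansion shown in these notes, and the function names bind_arguments and expand are hypothetical.

```python
def bind_arguments(formals, call_args):
    """Pair formal parameters (e.g. '&A') with the actual parameters of a call.

    Arguments written as KEY=value are bound by keyword, the rest by position.
    """
    bindings = {}
    position = 0
    for arg in call_args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            name = key if key.startswith("&") else "&" + key
            if name not in formals:
                raise ValueError(f"unknown keyword parameter {key!r}")
            bindings[name] = value
        else:
            bindings[formals[position]] = arg
            position += 1
    return bindings

def expand(model_statements, bindings):
    """Replace every formal parameter in the model statements by its actual value."""
    expanded = []
    for stmt in model_statements:
        # Longer names first, so that a name like &AB is not clobbered by &A.
        for name in sorted(bindings, key=len, reverse=True):
            stmt = stmt.replace(name, bindings[name])
        expanded.append(stmt)
    return expanded

INCRMT_FORMALS = ["&A", "&B"]
INCRMT_BODY = ["LOAD &A", "ADD &B", "STORE &A"]   # inferred model statements

print(expand(INCRMT_BODY, bind_arguments(INCRMT_FORMALS, ["X", "Y"])))
print(expand(INCRMT_BODY, bind_arguments(INCRMT_FORMALS, ["B=Y", "A=X"])))
```

Both calls yield the same three statements; with keywords the order of the arguments no longer matters.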
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in the MDT in their expansion-time order until the MEND statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between the values of formal parameters and expansion-time variables.
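The three steps can be sketched as follows in Python. This is a simplified illustration, not a complete macro processor: it uses MACRO and MEND as the definition delimiters, ignores AIF and AGO, and substitutes parameters by plain text replacement. The function names build_tables and expand_calls, and the HALT statement in the sample program, are hypothetical.

```python
def build_tables(source):
    """Step 1: store each MACRO ... MEND definition in the MDT, index it in the MNT."""
    mnt, mdt, program = {}, [], []
    lines = iter(source)
    for line in lines:
        if line.strip() == "MACRO":
            name, *formals = next(lines).replace(",", " ").split()   # prototype
            mnt[name] = (len(mdt), formals)    # where the definition starts in the MDT
            for body_line in lines:            # copy model statements up to MEND
                mdt.append(body_line.strip())
                if body_line.strip() == "MEND":
                    break
        else:
            program.append(line)
    return mnt, mdt, program

def expand_calls(mnt, mdt, program):
    """Steps 2 and 3: replace every macro call by its model statements."""
    output = []
    for line in program:
        fields = line.replace(",", " ").split()
        if fields and fields[0] in mnt:                  # step 2 i): locate in the MNT
            start, formals = mnt[fields[0]]              # step 2 ii): position in the MDT
            actuals = dict(zip(formals, fields[1:]))     # step 2 iii): bind parameters
            i = start
            while mdt[i] != "MEND":                      # step 3: process until MEND
                stmt = mdt[i]
                for formal, actual in actuals.items():
                    stmt = stmt.replace(formal, actual)
                output.append(stmt)
                i += 1
        else:
            output.append(line.strip())
    return output

source = [
    "MACRO",
    "INCRMT &A, &B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "MEND",
    "INCRMT X, Y",
    "HALT",
]
mnt, mdt, program = build_tables(source)
print(expand_calls(mnt, mdt, program))
```

The MNT entry holds only the starting index and the formal parameter names; the model statements themselves live in the MDT, as described in step 1.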
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked.
Subroutine - the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
->Prerequisite: every macro must be defined before it is called
->Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
- Macro prototype
- Statements that make up the macro body
- References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
- Role: an index to DEFTAB
- Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
- Used during the expansion of a macro invocation
- Arguments are stored in ARGTAB according to their position
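The interplay of the three tables can be illustrated with a small Python sketch. It assumes that the positional notation is written as ?1, ?2, … (a common textbook convention, not stated in these notes), and the function names define and expand are chosen to echo the DEFINE and EXPAND sub-procedures rather than to reproduce them.

```python
def define(deftab, namtab, prototype, body):
    """Enter a macro into DEFTAB and NAMTAB, converting &names to ?n notation."""
    name, *formals = prototype.replace(",", " ").split()
    start = len(deftab)
    deftab.append(prototype)
    # Replace longer parameter names first so &AB is not clobbered by &A.
    numbered = sorted(enumerate(formals, 1), key=lambda pair: -len(pair[1]))
    for stmt in body:
        for n, formal in numbered:
            stmt = stmt.replace(formal, f"?{n}")
        deftab.append(stmt)
    deftab.append("MEND")
    namtab[name] = (start, len(deftab) - 1)   # pointers to beginning and end

def expand(deftab, namtab, call):
    """Expand one macro call using ARGTAB (the arguments stored by position)."""
    name, *argtab = call.replace(",", " ").split()
    start, end = namtab[name]
    lines = []
    for stmt in deftab[start + 1:end]:        # skip the prototype, stop before MEND
        for n in range(len(argtab), 0, -1):   # ?10 before ?1
            stmt = stmt.replace(f"?{n}", argtab[n - 1])
        lines.append(stmt)
    return lines

deftab, namtab = [], {}
define(deftab, namtab, "INCRMT &A, &B", ["LOAD &A", "ADD &B", "STORE &A"])
print(deftab)
print(expand(deftab, namtab, "INCRMT X, Y"))
```

Storing the body in positional notation means that expansion never has to look up parameter names; it only indexes into ARGTAB.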
(fig: macro definition and macro call in relation to NAMTAB, DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
(flowchart: procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates the space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory, so that the program becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads the assembled machine instructions into the memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in the memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: as the name itself says, it does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know the memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls on certain input/output functions like printf( ), scanf( ) etc. which are not written by the programmer himself. During program execution those standard functions must reside in the main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf( ). A and printf( ) would have to be linked with each other. But where in main storage shall we load A and printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf( ) function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of the storage lying between them may go to waste. Another possibility is that both A and printf( ) may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf( ) extends from 100 to 150. But there is simply no way A and printf( ) can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in the storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
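The adjustment of address fields can be made concrete with a small Python sketch. Everything specific here is an assumption for illustration: each module is modelled as a list of words with a set of indices marking which words are address fields, the word values are made up, and the function names allocate and relocate are hypothetical.

```python
def allocate(modules, first_free):
    """Allocation: give every module a load address, one after the other."""
    addresses = {}
    next_free = first_free
    for name, module in modules.items():
        addresses[name] = next_free
        next_free += len(module["words"])
    return addresses

def relocate(module, load_address):
    """Relocation: add a constant to every address field, leave other words alone."""
    delta = load_address - module["origin"]
    return [
        word + delta if index in module["address_fields"] else word
        for index, word in enumerate(module["words"])
    ]

# Both modules were translated with the same start address 100,
# so they cannot be loaded at their translated addresses.
modules = {
    "printf": {"origin": 100, "words": [10, 102, 11], "address_fields": {1}},
    "A":      {"origin": 100, "words": [20, 101, 21, 103], "address_fields": {1, 3}},
}
addresses = allocate(modules, first_free=100)
images = {name: relocate(m, addresses[name]) for name, m in modules.items()}
print(addresses)
print(images)
```

Only the words flagged as address fields change; instruction codes and data constants are copied unchanged, which is why the translator has to supply relocation information with the object program.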
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of a relocating assembler, which is the input to the relocating loader, is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: As we turn on the computer there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
Performs linking prior to load time. A linkage editor
» produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic Linking-
The linking function is performed at execution time.
->Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is called dynamic linking, dynamic loading, or load on call.
->Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader performs
» all linking and relocation operations
» automatic library search
» loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads section A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
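The behaviour of the overlay manager on the example tree can be sketched in Python. This is a simplified model, not an actual overlay implementation: it only tracks which path is currently loaded, treats a segment as resident only while its whole path is resident, and the names PARENT, path_to and OverlayManager are hypothetical.

```python
# Overlay tree from the example: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Return the sequence of segments from the root down to `segment`."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]

class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]            # path currently in memory, root first

    def call(self, target):
        """Ensure the path to `target` is loaded; return the segments read in."""
        wanted = path_to(target)
        keep = 0                          # length of the common prefix
        while (keep < min(len(wanted), len(self.loaded))
               and wanted[keep] == self.loaded[keep]):
            keep += 1
        newly_loaded = wanted[keep:]
        if newly_loaded:
            # New segments overwrite their siblings (and everything below them).
            self.loaded = self.loaded[:keep] + newly_loaded
        return newly_loaded

manager = OverlayManager()
print(manager.call("B"))   # the root calls into B: both A and B are loaded
print(manager.call("A"))   # upward call: nothing to load
print(manager.call("E"))   # D and E replace A and B
```

An upward call finds its target already on the loaded path and loads nothing, while a call into a different subtree replaces the sibling segments that share the same memory.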
(fig Overlay: ROOT at the top with children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig: a linking loader takes the object program(s) and a library and loads the linked program directly into memory; a linkage editor takes the object program(s) and a library and produces a linked program, which a relocating loader later loads into memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, execution has to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions

Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory card (ESD)
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment location
• External Symbol Directory card (ESD), to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD & RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
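The two passes can be sketched as follows. This is a simplified illustration under stated assumptions, not the card-level algorithm: each object module is a hypothetical Python dictionary with a name, a length, its externally visible definitions (`defs`, standing in for ESD information), its program text as a list of words (`text`, standing in for TXT cards) and a list of (offset, symbol) pairs to be relocated or linked (`rld`, standing in for RLD cards). Memory is modelled as a dictionary from address to word.

```python
def pass1(modules, ipla):
    """Pass 1: assign a load address to every segment and build the symbol table."""
    symbol_table = {}
    pla = ipla  # Program Load Address counter, starting at the IPLA
    for module in modules:
        symbol_table[module["name"]] = pla            # the segment name is itself a symbol
        for symbol, relative in module["defs"].items():
            symbol_table[symbol] = pla + relative     # external symbol -> assigned address
        pla += module["length"]
    return symbol_table


def pass2(modules, symbol_table):
    """Pass 2: load the text, then relocate and link using the symbol table."""
    memory = {}
    for module in modules:
        base = symbol_table[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word              # load the program text
        for offset, symbol in module["rld"]:
            memory[base + offset] += symbol_table[symbol]  # adjust the address constant
    return memory


def direct_linking_load(modules, ipla):
    """Run both passes and return (memory, symbol_table)."""
    symbol_table = pass1(modules, ipla)
    return pass2(modules, symbol_table), symbol_table


# Two illustrative modules: A refers to B (linking) and to its own start (relocation).
MODULE_A = {"name": "A", "length": 3, "defs": {}, "text": [10, 0, 2],
            "rld": [(1, "B"), (2, "A")]}
MODULE_B = {"name": "B", "length": 2, "defs": {}, "text": [7, 8], "rld": []}
```

With an IPLA of 100, segment A is placed at 100 and B at 103; the word at offset 1 of A becomes the address of B, and the address constant at offset 2 is relocated by A's load address.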
Binary Symbolic Subroutine Loader- The assembler assembles the program and provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
  - allocation, relocation and linking
• Outputs the text in a file rather than memory
  - called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
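The three methods can be compared on a small symbol table. The sketch below is illustrative only: the table is a list of (symbol name, value) pairs, the hash function (sum of character codes modulo the table size) is a simple choice made for this example, and collisions are handled by chaining, which the notes do not specify.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(table, key):
    """Search a table sorted by symbol name; return the index of the key or -1."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1   # continue in the left sub-array
        else:
            low = mid + 1    # continue in the right sub-array
    return -1


def slot_for(key, size):
    """Illustrative hash function: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in key) % size


class HashTable:
    """Hash table with slots numbered from 0; each slot holds a chain of pairs."""

    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]

    def put(self, key, value):
        chain = self.slots[slot_for(key, len(self.slots))]
        for i, (existing, _old) in enumerate(chain):
            if existing == key:
                chain[i] = (key, value)  # overwrite an existing entry
                return
        chain.append((key, value))

    def get(self, key):
        for existing, value in self.slots[slot_for(key, len(self.slots))]:
            if existing == key:
                return value
        return None


# A small ordered symbol table: (symbol name, value).
SYMBOLS = [("ALPHA", 10), ("BETA", 20), ("LOOP", 30), ("START", 0)]
```

Linear search inspects up to every entry, binary search halves the remaining range at each step but needs the table to be ordered, and the hash table goes straight to one slot.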
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this $L \subseteq T^*$.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s → α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar, and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written $\mu \Rightarrow_G \gamma$ (a single-step derivation), if and only if
(1) μ = σατ
(2) γ = σβτ
(3) α → β ∈ P of G
where σ and τ represent arbitrary strings.
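The definition can be tried out directly. In the sketch below (an illustration, with hypothetical names), strings are Python strings in which every symbol is a single character, and P is a list of (α, β) pairs. For each production and each place where α occurs in μ, the surrounding text plays the role of σ and τ.

```python
def immediate_derivations(mu, productions):
    """Return the set of all strings that are immediately derived from `mu`."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                  # arbitrary string to the left of alpha
            tau = mu[start + len(alpha):]       # arbitrary string to the right of alpha
            results.add(sigma + beta + tau)     # gamma = sigma beta tau
            start = mu.find(alpha, start + 1)   # look for the next occurrence of alpha
    return results


def immediately_derives(mu, gamma, productions):
    """True if gamma is immediately derived from mu in one step."""
    return gamma in immediate_derivations(mu, productions)


# Illustrative grammar: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
```

Starting from S, repeated single steps give S, aSb, aaSbb and so on; replacing the last S by ab ends the derivation with a string of terminals only.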
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b If S → ε is in P, then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1215 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
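The nested conditions can be checked mechanically. The sketch below is an illustration under stated assumptions: every symbol is a single character, ε is written as the empty string, a production α → β is the pair (α, β), any symbol not listed as a nonterminal is treated as a terminal, and the function name grammar_type is hypothetical. It returns the most restricted type whose conditions hold: Type 1 needs |α| ≤ |β| (except for S → ε, and then S may not occur on a right-hand side), Type 2 additionally needs every α to be one nonterminal, and Type 3 additionally needs every β to be a terminal, or a terminal followed by a nonterminal.

```python
def grammar_type(nonterminals, productions, start):
    """Return 0, 1, 2 or 3: the most restricted type whose conditions are met."""

    def is_start_to_empty(alpha, beta):
        return alpha == start and beta == ""

    has_start_to_empty = any(is_start_to_empty(a, b) for a, b in productions)
    others = [(a, b) for a, b in productions if not is_start_to_empty(a, b)]

    # Type 1, condition (a): |alpha| <= |beta| for every rule other than S -> epsilon.
    condition_a = all(len(a) <= len(b) for a, b in others)
    # Type 1, condition (b): if S -> epsilon is present, S is on no right-hand side.
    condition_b = (not has_start_to_empty
                   or all(start not in b for _a, b in productions))
    if not (condition_a and condition_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal symbol.
    if not all(len(a) == 1 and a in nonterminals for a, _b in productions):
        return 1

    # Type 3: the right-hand side is a terminal, or a terminal then a nonterminal.
    def is_type3_rhs(beta):
        if len(beta) == 1:
            return beta not in nonterminals
        if len(beta) == 2:
            return beta[0] not in nonterminals and beta[1] in nonterminals
        return False

    if not all(is_type3_rhs(b) for _a, b in others):
        return 2
    return 3
```

For instance, the rules S → aS and S → b form a Type 3 grammar, while adding a rule such as A → Ba leaves a grammar that is Type 2 but no longer Type 3.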
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
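A BNF specification in this form is easy to read into a data structure. The sketch below is an illustration only: the function names and the sample grammar are hypothetical, one rule is assumed per line, and symbols are assumed to be separated by spaces (so a literal | cannot be used as a terminal here). It also applies the rule stated above that symbols which never appear on a left side are terminals.

```python
def parse_bnf(text):
    """Turn lines of the form '<symbol> ::= alt1 | alt2' into a dictionary.

    Each nonterminal maps to a list of alternatives; each alternative is a
    list of symbols.
    """
    rules = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        left, right = line.split("::=", 1)
        alternatives = [alternative.split() for alternative in right.split("|")]
        rules.setdefault(left.strip(), []).extend(alternatives)
    return rules


def terminals(rules):
    """Symbols used on some right side that never appear on a left side."""
    used = {symbol
            for alternatives in rules.values()
            for alternative in alternatives
            for symbol in alternative}
    return used - set(rules)


# Illustrative specification of a small assignment statement.
SAMPLE = """
<asst-stmt> ::= id = <expr> ;
<expr> ::= number | id | <expr> + <expr>
"""
```

Here <asst-stmt> and <expr> are the nonterminals, and id, number, =, + and ; come out as terminals.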
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, in the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
• In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the program is built.
• In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts
- Lexical Analysis
  • In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
  • In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
  • In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
- Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• Then in the lexical analysis phase this statement is broken up into a series of tokens as follows
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
- Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N*M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3 =y+ 3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1 The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2 The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3 The lexeme y is mapped to the token <id,2>.
4 The lexeme + is mapped to the token <+>.
5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6 The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
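A scanner for the running example can be sketched with regular expressions, which fits the remark that no recursion is needed. This is a minimal illustration, not a full C scanner: it recognises only identifiers, unsigned integers and the symbols =, + and ;. The names TOKEN_PATTERN and scan are hypothetical, tokens are modelled as (token-name, attribute-value) tuples, and the number attribute is simply the integer value, which is one of the options discussed above.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped as non-significant.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>[=+;]))"
)


def scan(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table = []   # entry i (counting from 1) holds the i-th distinct identifier
    tokens = []
    source = source.rstrip()
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise ValueError(f"unexpected character at position {position}")
        position = match.end()
        if match.group("id") is not None:
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("number") is not None:
            tokens.append(("number", int(match.group("number"))))
        else:
            tokens.append((match.group("symbol"), None))  # attribute is ignored
    return tokens, symbol_table
```

Scanning `x3 = y + 3;` with any spacing between lexemes gives the same six tokens, whereas `x 3 = y + 3;` does not, because the blank splits x3 into an identifier and a number.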
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
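A small recursive-descent parser shows how a syntax tree with operators as interior nodes can be built. It is a sketch under stated assumptions: the input is a list of lexeme strings rather than tokens, the tree is a nested tuple (operator, left, right), and the rule expr → expr + expr is handled by a loop that makes + left-associative, which is a choice made here because the rule as written is ambiguous. The function name parse_assignment is hypothetical.

```python
def parse_assignment(lexemes):
    """Parse `id = expr ;` and return a syntax tree as nested tuples."""
    position = 0

    def peek():
        return lexemes[position] if position < len(lexemes) else None

    def expect(value):
        nonlocal position
        if peek() != value:
            raise SyntaxError(f"expected {value!r} at position {position}")
        position += 1

    def operand():
        # expr -> number | id
        nonlocal position
        lexeme = peek()
        if lexeme is None or not lexeme.isalnum():
            raise SyntaxError(f"expected an identifier or number at position {position}")
        position += 1
        return lexeme

    def expr():
        # expr -> expr + expr, parsed as operand (+ operand)* with + left-associative
        nonlocal position
        node = operand()
        while peek() == "+":
            position += 1
            node = ("+", node, operand())
        return node

    target = operand()
    if target[0].isdigit():
        raise SyntaxError("the left-hand side must be an identifier")
    expect("=")
    value = expr()
    expect(";")  # in C the semicolon terminates the statement
    if position != len(lexemes):
        raise SyntaxError("unexpected input after ';'")
    return ("=", target, value)
```

For the lexemes of `x3 = y + 3;` the result is the tuple `("=", "x3", ("+", "y", "3"))`: the assignment is the root, and the + node with operands y and 3 is its right child.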
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
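Walking the semantic tree to produce quads can be sketched as a post-order traversal. The tree encoding below (nested tuples with the operation first) and the function name generate_quads are assumptions made for this illustration; each quad is a tuple (operation, target, source1, source2), with None standing for the unused "--" field.

```python
import itertools


def generate_quads(tree):
    """Return a list of quads for a tree whose root is ("assign", target, value)."""
    quads = []
    counter = itertools.count(1)

    def walk(node):
        """Emit quads for a subtree and return the name that holds its value."""
        if isinstance(node, str):
            return node                               # a leaf: identifier or constant
        operation, *children = node
        sources = [walk(child) for child in children]  # operands are evaluated first
        sources += [None] * (2 - len(sources))         # pad unary operations with None
        temp = f"temp{next(counter)}"                  # a fresh temporary for the result
        quads.append((operation, temp, sources[0], sources[1]))
        return temp

    _operation, target, value = tree
    quads.append(("assign", target, walk(value), None))
    return quads


# Semantic tree for the example: id1 = realtoint(id2 + inttoreal(3))
EXAMPLE_TREE = ("assign", "id1",
                ("realtoint", ("add", "id2", ("inttoreal", "3"))))
```

Applied to EXAMPLE_TREE, the walk emits the four quads in the same order as the listing: inttoreal, add, realtoint and finally assign.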
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1 Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2 The last two quads can be combined into
realtoint id1 temp2
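Both optimizations can be expressed as one pass over the quads. The sketch below is illustrative rather than a general optimizer: quads are tuples (operation, target, source1, source2) with None for an unused field, the function name optimize is hypothetical, and it assumes that a temporary assigned by the quad just before an `assign` is not used anywhere else.

```python
def optimize(quads):
    """Fold int-to-real conversions of constants and merge a trailing assign."""
    folded = {}   # temporary name -> real constant that replaces it
    result = []
    for operation, target, source1, source2 in quads:
        # Replace any operand that was folded into a constant earlier.
        source1 = folded.get(source1, source1)
        source2 = folded.get(source2, source2)
        if operation == "inttoreal" and source1.isdigit():
            folded[target] = f"{source1}.0"    # do the conversion at compile time
            continue
        if operation == "assign" and result and result[-1][1] == source1:
            # The previous quad computed the value being assigned: retarget it.
            previous_operation, _temp, first, second = result.pop()
            result.append((previous_operation, target, first, second))
            continue
        result.append((operation, target, source1, source2))
    return result


# The four quads of the running example.
EXAMPLE_QUADS = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
```

On EXAMPLE_QUADS this leaves two quads, `add temp2 id2 3.0` and `realtoint id1 temp2`, matching the two optimizations described above.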
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, #3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1 Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2 Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3 Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4 Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5 Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6 Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7 Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST)
• A pass can correspond to a "phase", but it does not have to
• Sometimes a single pass corresponds to several phases that are interleaved in time
• What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler
(fig Compiler Driver calls Syntactic Analyzer; Syntactic Analyzer calls Contextual Analyzer and Code Generator)
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler
(fig Compiler Driver calls Syntactic Analyzer, Contextual Analyzer and Code Generator; Source Text is input to Syntactic Analyzer, which outputs AST; AST is input to Contextual Analyzer, which outputs Decorated AST; Decorated AST is input to Code Generator, which outputs object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Arguments from the macro invocation are substituted for the parameters in the macro prototype (according to their positions).
In the definition of a macro they are called parameters; in the macro invocation they are called arguments.
Lecture_no23 Machine-Independent Macro Processor Features
MACRO INVOCATION
A macro invocation statement (a macro call) gives the name of the macro instruction being invoked and the arguments to be used in expanding the macro.
macro_name p1, p2, ...
Difference between macro call and procedure call
Macro call: statements of the macro body are expanded each time the macro is invoked.
Procedure call: statements of the subroutine appear only once, regardless of how many times the subroutine is called.
Macro parameters
i) Positional parameter
The prototype statement indicates how operands in a macro call would be written. These operands are called parameters or arguments. All parameters used in the prototype statement have names starting with the special character &. These parameters are known as formal parameters. A macro call is written using parameter names which do not start with the special character &. These are known as actual parameters.
The lists of formal and actual parameters, also called formal and actual parameter lists, specified in the prototype and macro call statements respectively, establish a correspondence between each formal parameter and an actual parameter. In figure 12 this correspondence is determined by the relative positions of these parameters in their respective lists. Thus the first actual parameter in the list is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again
INCRMT &A, &B prototype
INCRMT X, Y macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements
LOAD X
ADD Y
STORE X
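The positional pairing described above can be sketched in a few lines. The macro body LOAD &A / ADD &B / STORE &A is an assumption inferred from the expansion shown, and the helper names operands and expand_positional are illustrative only.

```python
def operands(statement):
    """Return the operands of a statement, ignoring the leading macro name."""
    return statement.replace(",", " ").split()[1:]

def expand_positional(prototype, body, call):
    """Expand one macro call by pairing formal and actual parameters by position."""
    formals = operands(prototype)            # e.g. ['&A', '&B']
    actuals = operands(call)                 # e.g. ['X', 'Y']
    if len(formals) != len(actuals):
        raise ValueError("wrong number of actual parameters")
    binding = dict(zip(formals, actuals))    # first with first, second with second
    expanded = []
    for model_statement in body:
        words = [binding.get(word, word) for word in model_statement.split()]
        expanded.append(" ".join(words))
    return expanded

body = ["LOAD &A", "ADD &B", "STORE &A"]     # assumed body of INCRMT
result = expand_positional("INCRMT &A, &B", body, "INCRMT X, Y")
print("\n".join(result))
```

Every occurrence of a formal parameter in a model statement is replaced by the actual parameter in the same position of the call.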
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
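With keyword parameters the pairing no longer depends on position. A minimal sketch, assuming the formal parameters of GENER are named &TYPE and &CHANNEL (the notes do not show the prototype) and using an illustrative function name:

```python
def bind_keywords(formals, call_operands):
    """Pair keyword arguments with formal parameters, whatever their order."""
    binding = {}
    for operand in call_operands:
        keyword, _, value = operand.partition("=")
        name = "&" + keyword.lstrip("&")     # accept TYPE=... or &TYPE=...
        if name not in formals:
            raise ValueError("unknown keyword parameter " + keyword)
        binding[name] = value
    return binding

# The arguments are deliberately written in a different order from the formals.
binding = bind_keywords(["&TYPE", "&CHANNEL"], ["CHANNEL=3", "TYPE=DIRECT"])
print(binding)
```

The resulting binding is the same as for the call GENER TYPE=DIRECT, CHANNEL=3, which is the point of keyword parameters.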
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call
i) locate the macro in MNT
ii) obtain information from MNT regarding the position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables.
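The three steps can be sketched as a small one-pass processor. This is a simplified illustration, not the algorithm of any particular assembler: it assumes a definition opens with a line containing only MACRO, is followed by the prototype, and closes with ENDM, and it ignores AIF and AGO. The function name process and the sample program are assumptions.

```python
def process(source_lines):
    """One pass over the source: store definitions, expand calls."""
    mnt = {}      # macro name -> (index of first body line in MDT, formal list)
    mdt = []      # bodies of all macros, each terminated by "ENDM"
    output = []
    lines = iter(source_lines)
    for line in lines:
        words = line.replace(",", " ").split()
        if words == ["MACRO"]:                       # Step 1: a definition
            prototype = next(lines).replace(",", " ").split()
            mnt[prototype[0]] = (len(mdt), prototype[1:])
            for body_line in lines:
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        elif words and words[0] in mnt:              # Step 2: a macro call
            start, formals = mnt[words[0]]
            binding = dict(zip(formals, words[1:]))
            index = start
            while mdt[index] != "ENDM":              # Step 3: expansion
                expanded = [binding.get(w, w) for w in mdt[index].split()]
                output.append(" ".join(expanded))
                index += 1
        else:
            output.append(line.strip())              # ordinary statement
    return mnt, mdt, output

source = ["MACRO", "INCRMT &A, &B", "LOAD &A", "ADD &B", "STORE &A", "ENDM",
          "INCRMT X, Y", "HALT"]
mnt, mdt, output = process(source)
print(output)
```

Because a call is looked up in MNT at the moment it is read, the sketch only works when every macro is defined before it is called, which is the single-pass restriction stated above.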
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro- the statements of the expansion are generated each time the macro is invoked
Subroutine- the statements in a subroutine appear only once
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call
o The statements that form the body of the macro are generated each time a macro is expanded
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB. Holds pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position
(fig MACRO and CALL statements with NAMTAB, DEFTAB and ARGTAB)
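How DEFTAB and ARGTAB work together can be sketched as below. The ?1, ?2 form of the positional notation is an assumption (the notes only say that references are converted to a positional notation), and the function names define and expand are illustrative, loosely following the DEFINE and EXPAND sub-procedures.

```python
def define(prototype, body):
    """Build a DEFTAB entry: each parameter reference becomes ?1, ?2, ..."""
    name, *formals = prototype.replace(",", " ").split()
    position = {formal: "?%d" % (i + 1) for i, formal in enumerate(formals)}
    deftab_entry = [" ".join(position.get(w, w) for w in line.split())
                    for line in body]
    return name, deftab_entry

def expand(deftab_entry, call):
    """Fill ARGTAB from the call, then replace each ?n by ARGTAB[n - 1]."""
    argtab = call.replace(",", " ").split()[1:]
    expanded = []
    for line in deftab_entry:
        words = [argtab[int(w[1:]) - 1] if w[0] == "?" and w[1:].isdigit() else w
                 for w in line.split()]
        expanded.append(" ".join(words))
    return expanded

name, entry = define("INCRMT &A, &B", ["LOAD &A", "ADD &B", "STORE &A"])
print(entry)
print(expand(entry, "INCRMT X, Y"))
```

Converting names to positions once, at definition time, means that expansion needs only an index into ARGTAB rather than a search for each parameter name.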
Two pass algorithm
» Pass1: Recognize macro definitions
» Pass2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm & flowchart for the Macro processor and explain
(fig procedures PROCESSLINE, DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates the space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program thus becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader- In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads the assembled machine instructions into the memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a wastage of memory. As this scheme is a combination of assembler and loader activities, this combination program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in the memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for a programmer to know the memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls on certain Input/Output functions like printf( ), scanf( ) etc. which are not written by the programmer himself. During program execution those standard functions must reside in the main memory. Furthermore, every time an Input/Output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf( ). A and printf( ) would have to be linked with each other. But where in main storage shall we load A and printf( )? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf( ) function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf( ) may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf( ) extends from 100 to 150. But there is simply no way A and printf( ) can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
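Adding a constant to each relative address can be sketched as follows. The representation is an assumption made for illustration: the segment is a list of words translated relative to address 0, and a parallel list of flags marks which words hold addresses. The function name relocate is illustrative.

```python
def relocate(words, is_address_flags, load_address):
    """Add load_address to every word that is flagged as a relative address."""
    return [word + load_address if is_address else word
            for word, is_address in zip(words, is_address_flags)]

segment = [5, 2, 7, 0]               # words 1 and 3 hold relative addresses
flags = [False, True, False, True]   # relocation information from the translator
relocated = relocate(segment, flags, 400)
print(relocated)
```

Only the flagged address fields change; the constants 5 and 7 are left alone, which is why relocation needs information from the translator and is more than moving the program.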
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of the assembler for a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) as to locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translations of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: As we turn on the computer there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage to main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
Performs linking prior to load time. A linkage editor
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
Dynamic Linking-
The linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called (dynamic linking, dynamic loading, or load on call)
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads section A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
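The path rule for the example tree can be sketched as below. The PARENT table encodes the tree described above; the function names path_to and segments_to_load, and the idea of passing the resident segments as a list, are assumptions made for illustration.

```python
# Parent of each segment in the example overlay tree; ROOT has no parent.
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = [segment]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]

def segments_to_load(target, resident):
    """Segments the overlay manager must bring in for a downward call."""
    return [seg for seg in path_to(target) if seg not in resident]

print(path_to("E"))                               # path for E
print(segments_to_load("B", ["ROOT"]))            # root calls a routine in B
print(segments_to_load("A", ["ROOT", "A", "B"]))  # A is already loaded
```

A real overlay manager would also record that a newly loaded segment overwrites its sibling; the sketch only shows which segments on the path are missing.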
(fig Overlay tree: ROOT at the top with children A and D; B and C below A; E and F below D)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be directly placed at the specified location, or the address is relative. If the address is relative, then it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine will be loaded in the main memory only at the time of execution
2 The system can be dynamically reconfigured
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory.
-> Let us take a quick look at who performs each of the above four functions for our three loaders: absolute loader, relocating loader and direct linking loader respectively

Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader

Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error (undefined symbol) unless a special mechanism is provided
Design of Direct linking loader
pass1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Copy of the object program
IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
PLA counter (Program Load Address): keeps track of each segment's location
External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA): establishes correspondence between the ESD ID used in ESD & RLD cards
ESD (external symbol dictionary)
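The division of work between the two passes can be sketched with a much simplified object format. Everything about the format is an assumption made for illustration: each segment is a dictionary with a name, a length, the symbols it defines (standing in for ESD cards), its text words (TXT cards), and a list of (offset, symbol) pairs saying that the address of symbol must be added to the word at that offset (standing in for RLD cards).

```python
def pass1(segments, ipla):
    """Assign a load address to each segment and build the symbol table."""
    symbols, pla = {}, ipla                 # PLA counter starts at IPLA
    for seg in segments:
        symbols[seg["name"]] = pla
        for symbol, offset in seg["defs"].items():
            symbols[symbol] = pla + offset
        pla += seg["length"]
    return symbols

def pass2(segments, symbols):
    """Load the text, then relocate and link the flagged address constants."""
    memory = {}
    for seg in segments:
        base = symbols[seg["name"]]
        for offset, word in enumerate(seg["text"]):
            memory[base + offset] = word
        for offset, symbol in seg["rld"]:
            memory[base + offset] += symbols[symbol]
    return memory

segments = [
    {"name": "MAIN", "length": 3, "defs": {}, "text": [10, 0, 1],
     "rld": [(1, "SUB"), (2, "MAIN")]},     # one external, one internal address
    {"name": "SUB", "length": 2, "defs": {"ENTRY": 1}, "text": [20, 30],
     "rld": []},
]
symbols = pass1(segments, 500)
memory = pass2(segments, symbols)
print(symbols)
print(memory)
```

Two passes are needed because MAIN refers to SUB before the loader has seen SUB; pass 1 fixes every address so that pass 2 can fill in the references.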
Data Structures
Algorithm
Binary Symbolic Subroutine Loader- The assembler assembles each program and provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text in a file rather than memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if it is the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
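The three search methods can be sketched on a small symbol table. The table contents, the function names, the hash function (sum of character codes modulo the table size) and the use of chaining for collisions are all assumptions chosen for illustration.

```python
def linear_search(table, key):
    """Compare every entry in turn; return its index or -1."""
    for index, (name, _) in enumerate(table):
        if name == key:
            return index
    return -1

def binary_search(sorted_table, key):
    """Halve the search range of a table sorted by name; return index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = sorted_table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1
        else:
            low = middle + 1
    return -1

def hash_slot(key, size):
    """An assumed hash function: sum of character codes modulo table size."""
    return sum(ord(ch) for ch in key) % size

def build_hash_table(table, size):
    slots = [[] for _ in range(size)]        # slots are numbered from 0
    for name, value in table:
        slots[hash_slot(name, size)].append((name, value))
    return slots

def hash_search(slots, key):
    for name, value in slots[hash_slot(key, len(slots))]:
        if name == key:
            return value
    return None

table = [("ALPHA", 100), ("BETA", 104), ("LOOP", 108), ("ZERO", 112)]
print(linear_search(table, "LOOP"), binary_search(table, "LOOP"))
print(hash_search(build_hash_table(table, 7), "ZERO"))
```

Binary search needs the table to be ordered by key, which is why the ordered table appears among the methods; the hash table needs no ordering but depends on a good hash function.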
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
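The first two algorithms in the list can be sketched as follows. These are the usual textbook forms, written here to return a new sorted list rather than sort in place; that choice and the function names are assumptions of the sketch.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and swap it into place."""
    items = list(items)
    for i in range(len(items) - 1):
        smallest = i
        for j in range(i + 1, len(items)):
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items

def insertion_sort(items):
    """Insert each item into the already sorted part to its left."""
    items = list(items)
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        while j >= 0 and items[j] > current:
            items[j + 1] = items[j]          # shift larger items right
            j -= 1
        items[j + 1] = current
    return items

data = [5, 2, 9, 1, 5]
print(selection_sort(data))
print(insertion_sort(data))
```

Both make on the order of n squared comparisons in the worst case, whereas merge sort and quick sort divide the table and are usually faster on large inputs.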
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is a rule for drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a Production Rule
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G (written μ -> γ in G, a single step derivation) if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
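The definition can be turned directly into a small sketch. It assumes that every symbol is a single character, so that strings of symbols are ordinary Python strings; the grammar with productions S -> aSb and S -> ab and the function name are illustrative assumptions.

```python
def immediate_derivations(mu, productions):
    """All strings gamma that are derived from mu in a single step.

    For each production alpha -> beta and each split mu = sigma alpha tau,
    the string gamma = sigma beta tau is produced.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)   # next occurrence of alpha
    return results

P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", P)))
```

Starting from aSb, one step yields either aaSbb (still containing a nonterminal) or aabb (containing terminals only), which illustrates how a finite set of productions can describe an infinite language.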
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol (written Σ in this definition)
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions (in this notation Σ is the terminal alphabet and ε is the empty string):
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a): the left-hand side is longer than the right-hand side.
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
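The conditions for Types 1, 2 and 3 can be checked mechanically. The following is a minimal sketch under stated assumptions: symbols are single characters, uppercase letters are nonterminals, everything else is a terminal, and the empty string "" stands for ε. The function name `grammar_type` is hypothetical.

```python
def grammar_type(productions, start="S"):
    """Return 0, 1, 2 or 3: the most restricted type the grammar satisfies.

    productions: list of (alpha, beta) strings; "" stands for the empty string.
    Assumption: uppercase letters are nonterminals, all other symbols terminals.
    """
    eps_rule = (start, "")
    others = [p for p in productions if p != eps_rule]
    start_on_rhs = any(start in beta for _, beta in productions)

    # Type 1, condition (a): |alpha| <= |beta| unless the rule is S -> empty.
    cond_a = all(len(alpha) <= len(beta) for alpha, beta in others)
    # Type 1, condition (b): if S -> empty is present, S is on no right-hand side.
    cond_b = not (eps_rule in productions and start_on_rhs)
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(alpha) == 1 and alpha.isupper() for alpha, _ in productions):
        return 1

    # Type 3: beta is a terminal, or a terminal followed by a nonterminal.
    def regular(beta):
        if len(beta) == 1:
            return not beta.isupper()
        return len(beta) == 2 and not beta[0].isupper() and beta[1].isupper()

    return 3 if all(regular(beta) for _, beta in others) else 2


print(grammar_type([("S", "aA"), ("A", "b")]))      # 3
print(grammar_type([("S", "aSb"), ("S", "ab")]))    # 2
print(grammar_type([("S", "aB"), ("aB", "ab")]))    # 1
print(grammar_type([("S", "Bb"), ("Bb", "b")]))     # 0
```

The last call reproduces the remark about a rule of the form Bb → b: the left-hand side is longer than the right-hand side, so condition (a) fails.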
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
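To see "rewriting without regard to context" in action, here is a short sketch that enumerates sentences of a context-free grammar by always rewriting the leftmost nonterminal. The example grammar S → aSb | ab, the function name `sentences` and the convention that uppercase letters are nonterminals are assumptions made for illustration.

```python
from collections import deque


def sentences(rules, start="S", limit=4):
    """Return the first `limit` sentences found by breadth-first leftmost derivation."""
    found = []
    queue = deque([start])
    while queue and len(found) < limit:
        form = queue.popleft()
        # Position of the leftmost nonterminal (an uppercase letter), if any.
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:
            found.append(form)      # only terminals remain: this is a sentence
            continue
        # The nonterminal is replaced no matter what surrounds it.
        for rhs in rules[form[pos]]:
            queue.append(form[:pos] + rhs + form[pos + 1:])
    return found


rules = {"S": ["aSb", "ab"]}
print(sentences(rules))   # ['ab', 'aabb', 'aaabbb', 'aaaabbbb']
```

A context-sensitive rule of the form xAy → xvy would need an extra check that x and y actually surround A before the replacement is made.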
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are non-terminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
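As a rough illustration, a BNF specification can be read into a table of rules. The sketch assumes one rule per line, `::=` between the two sides, and symbols separated by spaces; the function name `parse_bnf` and the sample grammar are hypothetical. A terminal that is itself the character `|` cannot be expressed in this simplified format.

```python
def parse_bnf(text):
    """Parse lines of the form '<symbol> ::= alt1 | alt2' into a dict.

    Each alternative becomes a list of symbols (assumed space-separated).
    """
    grammar = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alt.split() for alt in right.split("|")]
        grammar[left.strip()] = alternatives
    return grammar


spec = """
<expr> ::= <term> + <expr> | <term>
<term> ::= a | b
"""
g = parse_bnf(spec)
nonterminals = set(g)     # symbols that appear on a left side
terminals = {s for alts in g.values() for alt in alts for s in alt} - nonterminals
print(g["<term>"])        # [['a'], ['b']]
print(sorted(terminals))  # ['+', 'a', 'b']
```

The last two lines apply the rule from the text: symbols that never appear on a left side are the terminals.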
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages, if a canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into its constituent parts, and an intermediate form of it is created. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
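The token list above can be reproduced with a small scanner. This is a minimal sketch built on Python's `re` module; the token category names (`identifier`, `assign`, and so on) are chosen here for illustration, and characters that match no pattern are silently skipped rather than reported as errors.

```python
import re

# (category, regular expression) pairs; the category names are assumptions.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Break source text into (category, lexeme) pairs, dropping blanks."""
    tokens = []
    for match in PATTERN.finditer(source):
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

Running it prints seven pairs, one per token in the list above, starting with `('identifier', 'total')` and ending with `('number', '10')`.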
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
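The mapping from lexemes to <token-name, attribute-value> pairs can be sketched as follows. The function name `scan`, the use of a plain list as the symbol table, and the choice of `None` for an ignored second component are all assumptions made for this illustration; number tokens simply carry their lexeme, which sidesteps the storage question raised above.

```python
import re


def scan(source):
    """Return (tokens, symbol_table) for a tiny assignment language."""
    symbol_table = []      # position + 1 is the attribute value of an id token
    tokens = []
    # No recursion is needed: each lexeme is described by a regular expression.
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))   # second component is ignored
    return tokens, symbol_table


tokens, table = scan("x3   =y+ 3  ;")
print(tokens)   # [('id', 1), ('=', None), ('id', 2), ('+', None), ('number', '3'), (';', None)]
print(table)    # ['x3', 'y']
```

Differently spaced spellings of the same statement give the same token list, while `x 3 = y + 3;` does not, because `x` and `3` become separate lexemes.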
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
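A syntax tree with operators as interior nodes can be built by a small recursive-descent parser. The sketch below is one possible approach, not the method of the notes: the rule expr → expr + expr is replaced by a loop (so + becomes left-associative), trees are nested tuples, and the function name `parse_assignment` is hypothetical. For brevity it does not check that the target of the assignment is an identifier rather than a number.

```python
import re


def parse_assignment(source):
    """Parse 'id = expr ;' and return a syntax tree as nested tuples."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source)
    pos = 0

    def take(expected=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def operand():
        tok = take()
        if tok in ("=", "+", ";"):
            raise SyntaxError(f"operand expected, got {tok!r}")
        return tok

    def expr():
        # expr -> operand ('+' operand)*, a loop in place of left recursion
        node = operand()
        while pos < len(tokens) and tokens[pos] == "+":
            take("+")
            node = ("+", node, operand())
        return node

    target = operand()
    take("=")
    tree = ("=", target, expr())
    take(";")          # the semicolon terminates the statement
    return tree


print(parse_assignment("x3 = y + 3;"))   # ('=', 'x3', ('+', 'y', '3'))
```

The returned tuple has the assignment operator at the root and the addition as its right child, which is the shape of the syntax tree described in the text.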
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
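The walk over the semantic tree can be sketched in a few lines. The tree encoding (nested tuples with the conversion operators already inserted) and the function name `gen_quads` are assumptions for illustration; the operand names id1 and id2 follow the example above.

```python
def gen_quads(tree):
    """Walk an annotated tree and return quads (operation, target, source1, source2)."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):              # leaf: identifier or constant
            return node
        op, *children = node
        operands = [walk(child) for child in children]   # children first
        target = new_temp()
        name = "add" if op == "+" else op
        padding = ["--"] * (2 - len(operands))  # unused source slots
        quads.append((name, target, *operands, *padding))
        return target

    _, target, rhs = tree                      # root is ('=', target, expression)
    source = walk(rhs)
    quads.append(("assign", target, source, "--"))
    return quads


# x3 = y + 3 with x3 an integer (id1) and y a real (id2).
tree = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))
for quad in gen_quads(tree):
    print(*quad)
```

The loop prints the same four quads as listed above, in the same order, because each operator is emitted only after its operands have been evaluated.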
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
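Both rewrites can be expressed as a single pass over the quads. This is only a sketch: the function name `optimize` is hypothetical, and merging an assign into the preceding quad is safe only under the assumption that the temporary is not used anywhere else, which holds for this example but is not checked by the code.

```python
def optimize(quads):
    """Apply two simple rewrites to a list of (op, target, src1, src2) quads."""
    constants = {}     # temp name -> real constant computed at compile time
    out = []
    for op, target, s1, s2 in quads:
        # 1. Convert an integer constant to real at compile time.
        if op == "inttoreal" and s1.isdigit():
            constants[target] = str(float(s1))
            continue
        s1 = constants.get(s1, s1)
        s2 = constants.get(s2, s2)
        # 2. Merge 'assign x temp' into the quad that produced temp
        #    (assumes temp has no other use).
        if op == "assign" and out and out[-1][1] == s1:
            prev_op, _, p1, p2 = out.pop()
            out.append((prev_op, target, p1, p2))
            continue
        out.append((op, target, s1, s2))
    return out


quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
for quad in optimize(quads):
    print(*quad)
# add temp2 id2 3.0
# realtoint id1 temp2 --
```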
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, #3.0    // add float
RTOI R2, R1          // real to int
ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
   calls
Syntactic Analyzer
   calls                  calls
Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
   calls                  calls                  calls
Syntactic Analyzer     Contextual Analyzer    Code Generator
Source Text (input) -> AST (output, then input) -> Decorated AST (output, then input) -> object code (output)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
first is paired with the first of the formal parameters, etc. Considering the prototype and macro call statements once again:
INCRMT &A, &B    prototype
INCRMT X, Y      macro call
We see that X would be paired with &A and Y with &B. While expanding a macro call, any formal parameter appearing within a model statement is replaced by the corresponding actual parameter. This is how expansion of the call INCRMT X, Y leads to the following statements:
LOAD X
ADD Y
STORE X
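The pairing of formal and actual parameters by position can be sketched as below. The model statements of INCRMT are not shown in this part of the notes, so the body used here (LOAD, ADD, STORE on the formal parameters) is inferred from the expansion above; the tuple encoding and the function name `expand` are illustrative assumptions.

```python
def expand(prototype, body, call):
    """Expand a macro call by pairing formal and actual parameters by position."""
    name, formals = prototype[0], prototype[1:]
    call_name, actuals = call[0], call[1:]
    if call_name != name or len(actuals) != len(formals):
        raise ValueError("call does not match prototype")
    binding = dict(zip(formals, actuals))       # first with first, and so on
    expanded = []
    for opcode, operand in body:
        # A formal parameter in a model statement is replaced by its actual.
        expanded.append((opcode, binding.get(operand, operand)))
    return expanded


prototype = ("INCRMT", "&A", "&B")
body = [("LOAD", "&A"), ("ADD", "&B"), ("STORE", "&A")]   # assumed model statements
print(expand(prototype, body, ("INCRMT", "X", "Y")))
# [('LOAD', 'X'), ('ADD', 'Y'), ('STORE', 'X')]
```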
Example
STRG DATA1, DATA2, DATA3
GENER DIRECT, 3
(ii) Keyword parameter
Using a different form of parameter specification, called keyword parameters, each argument value is written with a keyword that names the corresponding parameter.
Example
STRG &a3=DATA1, &a2=DATA2, &a1=DATA3
GENER TYPE=DIRECT, CHANNEL=3
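With keyword parameters the order of the arguments no longer matters, which the following sketch illustrates. The parameter list given for GENER (TYPE, MODE, CHANNEL) is hypothetical, since the notes do not show its definition, and the function name `bind_keyword_args` is likewise an assumption.

```python
def bind_keyword_args(formals, call_args):
    """Map 'NAME=value' arguments to formal parameters, in any order."""
    binding = dict.fromkeys(formals)        # unspecified parameters stay None
    for arg in call_args:
        keyword, value = arg.split("=", 1)
        if keyword not in binding:
            raise ValueError(f"unknown parameter {keyword}")
        binding[keyword] = value
    return binding


formals = ["TYPE", "MODE", "CHANNEL"]       # hypothetical parameter list for GENER
print(bind_keyword_args(formals, ["CHANNEL=3", "TYPE=DIRECT"]))
# {'TYPE': 'DIRECT', 'MODE': None, 'CHANNEL': '3'}
```

Parameters that are not mentioned in the call keep a default (here `None`), so a call does not need to account for every position.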
Features of macro facility
(1) Macro instruction arguments
(2) Macro calls within macros (nested macro calls)
(3) Macro instructions defining macros (macros within macros)
(4) Conditional macro expansion
Lecture_no24 Design of 1-pass macro processor
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of a macro can be found in MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in MNT
ii) obtain information from MNT regarding the position of the macro definition in MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e. actual parameters)
iv) expand the macro call by following the procedure given in step 3
Step 3
Process the statements in the macro definition as found in MDT in their expansion time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order based on certain expansion time relations between values of formal parameters and expansion time variables.
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro - the statements of the expansion are generated each time the macro is invoked.
Subroutine - the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE), macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  - Macro prototype
  - Statements that make up the macro body
  - References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of a macro invocation
  - Arguments are stored in ARGTAB according to their position
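The three tables can be seen working together in a compact one-pass sketch. The source syntax assumed here (a definition opened by `NAME MACRO &P1,&P2` and closed by `ENDM`, comma-separated arguments without spaces) and the function name `process` are illustrative assumptions. The plain text replacement is deliberately naive: it would confuse a parameter such as &A with &AB, and it supports neither nested definitions nor nested calls.

```python
def process(lines):
    """Expand macros in one pass; each macro must be defined before it is called."""
    deftab = []     # macro bodies, parameters in positional notation (?1, ?2, ...)
    namtab = {}     # macro name -> (begin, end) indexes into deftab
    output = []
    i = 0
    while i < len(lines):
        fields = lines[i].split()
        if len(fields) >= 2 and fields[1] == "MACRO":        # DEFINE
            name = fields[0]
            params = fields[2].split(",") if len(fields) > 2 else []
            begin = len(deftab)
            i += 1
            while lines[i].strip() != "ENDM":
                body_line = lines[i]
                for n, param in enumerate(params, start=1):
                    body_line = body_line.replace(param, f"?{n}")
                deftab.append(body_line)
                i += 1
            namtab[name] = (begin, len(deftab))
        elif fields and fields[0] in namtab:                 # EXPAND
            argtab = fields[1].split(",") if len(fields) > 1 else []
            begin, end = namtab[fields[0]]
            for body_line in deftab[begin:end]:
                for n, arg in enumerate(argtab, start=1):
                    body_line = body_line.replace(f"?{n}", arg)
                output.append(body_line)
        else:
            output.append(lines[i])                          # ordinary statement
        i += 1
    return output


source = [
    "INCRMT MACRO &A,&B",
    "LOAD &A",
    "ADD &B",
    "STORE &A",
    "ENDM",
    "INCRMT X,Y",
    "HALT",
]
print(process(source))   # ['LOAD X', 'ADD Y', 'STORE X', 'HALT']
```

Because definitions are entered into the tables as soon as they are read, a single scan is enough as long as every macro is defined before its first call.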
[Figure: macro call, NAMTAB, DEFTAB and ARGTAB]
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads this executable code of the source into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution and loads the executable code into the memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates the space for the program in the memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into the memory. Thus the program now becomes ready for execution. This activity is called loading.
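Relocation and loading can be illustrated with a toy memory model. Everything in this sketch is an assumption made for illustration: memory is a Python list of words, the object program was assembled relative to address 0, and the set `relocation_positions` names the words that hold address constants. Linking is left out.

```python
def load(memory, load_address, code, relocation_positions):
    """Place `code` (a list of words) at load_address, adjusting address constants."""
    if load_address + len(code) > len(memory):
        raise MemoryError("program does not fit")        # allocation check
    for offset, word in enumerate(code):
        if offset in relocation_positions:
            word += load_address                         # relocation
        memory[load_address + offset] = word             # loading
    return load_address                                  # where execution starts


memory = [0] * 32
# Word 1 holds the address of word 3, assembled relative to address 0.
code = [10, 3, 99, 42]
start = load(memory, 16, code, relocation_positions={1})
print(start, memory[16:20])   # 16 [10, 19, 99, 42]
```

After loading at address 16, the address constant 3 has become 19, so it still refers to the word holding 42.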
Types of Loader schemes
1) "Compile and go" loader - In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address.
Advantagesbull This scheme is simple to implement Because assembler is placed at one part of the memory and loader simply loads assembled machine instructions into the memoryDisadvantagesbull In this scheme some portion of memory is occupied by assembler which is simply a wastage of memory As this scheme is combination of assembler and loader activities this combination program occupies large block of memory
2) General Loader Scheme In this loader scheme the source program is converted to object program by some translator (assembler)
Advantagesbull The program need not be retranslated each time while running it This is because initially when source program gets executed an object program gets generated Of program is not modified then loader can make use of this object program to convert it to executable form
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created by the translator; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer needs to know about memory management.
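The behaviour of an absolute loader can be sketched in a few lines of Python. This is only an illustration: the record layout (an absolute address followed by the words to store) and the name absolute_load are assumptions made for the sketch, not a real object file format.

```python
# Hypothetical absolute object program: each record holds the absolute
# address fixed at translation time and the machine words to place there.
def absolute_load(records, memory):
    """Copy every record to the address it names; no relocation, no linking."""
    for address, words in records:
        for offset, word in enumerate(words):
            memory[address + offset] = word
    return memory


memory = [0] * 16
object_program = [(4, [0x1A, 0x2B]), (10, [0x3C])]
absolute_load(object_program, memory)
print(memory[4], memory[5], memory[10])
```

Because the addresses are already fixed in the records, the loader itself stays trivial; all the work of choosing addresses is left to the programmer and the translator.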
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions like printf() and scanf(), which are not written by the programmer. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is program relocation. It can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of the storage lying between them may go to waste. Another possibility is that both A and printf() have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage locations. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (A segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments from a single source file.) The part of a loader which performs relocation is called the relocating loader.
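As a minimal sketch of "adding a constant to each relative address", the Python below marks the address-dependent words of a segment with one flag per word and adds the load address to exactly those words. The flag list and the sample values are assumptions made for illustration only.

```python
def relocate(code, relocation_bits, load_address):
    """Add load_address to every word flagged as address dependent."""
    return [word + load_address if flag else word
            for word, flag in zip(code, relocation_bits)]


# A segment translated as if it started at address 0 (hypothetical values).
code = [10, 3, 99, 0]            # words 1 and 3 hold relative addresses
relocation_bits = [0, 1, 0, 1]   # 1 marks an address-dependent word
print(relocate(code, relocation_bits, 200))   # [10, 203, 99, 200]
```

Words that are not flagged (plain constants or opcodes) are left untouched, which is why relocation is an adjustment of address fields rather than a movement of the program.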
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader is the object program together with information about all other programs it references. In addition there is relocation information, identifying the locations in the program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in the main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory, and the operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
Performs linking prior to load time. A linkage editor
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic Linking-
The linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This scheme is called dynamic linking, dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help since the entire path from the root is already loaded.
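The path rule described above can be sketched as a small Python model. The tree is the one from the text; the class OverlayManager and its method names are hypothetical and only track which segments are resident, they do not load any real code.

```python
# Overlay tree from the text: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]   # the root is loaded when the program starts

    def call(self, target):
        """Make the whole path to target resident; return the segments loaded."""
        wanted = path_to(target)
        if self.loaded[:len(wanted)] == wanted:
            return []            # upward call or already resident: nothing to do
        keep = 0                 # length of the part of the path already loaded
        while (keep < min(len(self.loaded), len(wanted))
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        newly_loaded = wanted[keep:]
        self.loaded = wanted     # siblings sharing the same memory are overwritten
        return newly_loaded


manager = OverlayManager()
print(manager.call("B"))   # ['A', 'B']
print(manager.call("C"))   # ['C']
print(manager.call("E"))   # ['D', 'E']
```

Calling E after C replaces A and C by D and E, which mirrors the statement that sibling segments share the same memory.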
(fig Overlay tree: ROOT at the top with child segments A and D; B and C below A; E and F below D)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library are processed by the linking loader, which places the linked program in memory. Linkage editor: object program(s) and library are processed by the linkage editor, which produces a linked program that a relocating loader later places in memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. It is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, execution has to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at who performs the above four functions for our three loaders: absolute loader, relocating loader and direct linking loader respectively

Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error unless B has been declared as an external symbol
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory) card
7 END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment's location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish the correspondence between the ESD ID used in ESD & RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
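The two passes can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the card-level algorithm: each object module is modelled as a dictionary whose keys esd (symbols defined, relative to the segment start), txt (the words of the program) and rld (offsets whose contents must have a symbol's address added) are hypothetical stand-ins for the ESD, TXT and RLD cards, and memory is a dictionary from address to word.

```python
def pass1(modules, ipla):
    """Pass 1: give each segment a load address and record external symbols."""
    symbol_table, pla = {}, ipla
    for module in modules:
        for name, relative_address in module["esd"].items():
            symbol_table[name] = pla + relative_address
        pla += module["length"]          # next segment starts after this one
    return symbol_table


def pass2(modules, ipla, symbol_table):
    """Pass 2: load the text, then relocate and link using the RLD entries."""
    memory, pla = {}, ipla
    for module in modules:
        for offset, word in enumerate(module["txt"]):
            memory[pla + offset] = word
        for offset, symbol in module["rld"]:
            memory[pla + offset] += symbol_table[symbol]
        pla += module["length"]
    return memory


MODULES = [
    # A refers to the external symbol B in its word 1
    {"length": 3, "esd": {"A": 0}, "txt": [5, 0, 7], "rld": [(1, "B")]},
    # B holds an address constant (B + 1) in its word 1
    {"length": 2, "esd": {"B": 0}, "txt": [9, 1], "rld": [(1, "B")]},
]
symbols = pass1(MODULES, ipla=100)
memory = pass2(MODULES, 100, symbols)
print(symbols)   # {'A': 100, 'B': 103}
print(memory)    # {100: 5, 101: 103, 102: 7, 103: 9, 104: 104}
```

Segment A uses B before B has been seen, which is why the symbol table is completed in pass 1 and only consulted in pass 2.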
Binary Symbolic Subroutine Loader-
The assembler assembles each program and provides the loader with
-> The object program + relocation information
-> A prefix with information about all other programs it references (the transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• It outputs the text to a file rather than to memory
- the output is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
• Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table with the key.
• Binary search
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
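The three search methods can be compared on a small symbol table. The Python below is a minimal sketch: the table contents are made up, the hash function (sum of character codes modulo the table size) is just one simple choice, and collisions are handled by linear probing, which the notes do not specify and is an assumption of this sketch.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (symbol, _value) in enumerate(table):
        if symbol == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Search a table sorted by symbol name; return the index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        symbol = sorted_table[mid][0]
        if symbol == key:
            return mid
        if key < symbol:
            high = mid - 1       # continue in the left sub-array
        else:
            low = mid + 1        # continue in the right sub-array
    return -1


def hash_slot(key, size):
    """A simple hash function: sum of character codes modulo table size."""
    return sum(ord(ch) for ch in key) % size


def hash_insert(slots, key, value):
    """Store (key, value), probing linearly past occupied slots."""
    size = len(slots)
    for step in range(size):
        slot = (hash_slot(key, size) + step) % size
        if slots[slot] is None or slots[slot][0] == key:
            slots[slot] = (key, value)
            return slot
    raise RuntimeError("hash table is full")


def hash_lookup(slots, key):
    """Return the value stored for key, or None if it is absent."""
    size = len(slots)
    for step in range(size):
        slot = (hash_slot(key, size) + step) % size
        if slots[slot] is None:
            return None
        if slots[slot][0] == key:
            return slots[slot][1]
    return None


table = [("LOOP", 4), ("ALPHA", 0), ("END", 9)]
ordered = sorted(table)
slots = [None] * 7
for name, address in table:
    hash_insert(slots, name, address)
print(linear_search(table, "END"), binary_search(ordered, "LOOP"),
      hash_lookup(slots, "ALPHA"))
```

Linear search needs no ordering, binary search needs the ordered table, and the hash table goes (almost) directly to the slot named by the hash function.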
Sorting
We will discuss four comparison-sort algorithms
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model
• A syntax that defines which strings of symbols are in the language of our formal system
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is a way of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form etc
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ ⇒ γ (a single-step derivation in G), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
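The definition can be turned directly into a small Python sketch: for every production α -> β and every occurrence of α in the string, replace that occurrence by β while keeping the left context σ and the right context τ. The function name and the sample grammar (S -> aSb, S -> ab) are assumptions chosen for illustration.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                  # left context
            tau = mu[start + len(alpha):]       # right context
            results.add(sigma + beta + tau)     # gamma = sigma beta tau
            start = mu.find(alpha, start + 1)
    return results


P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", P)))   # ['aaSbb', 'aabb']
```

A string with no occurrence of any left-hand side, such as ab here, has no immediate derivations.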
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that contains only terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (nonterminal symbols N, terminal alphabet Σ, production rules P and start symbol S) that satisfies the following two conditions
a Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b If S → ε is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1.2.15 The grammar of the earlier example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with new ones, in which E is assumed to be a new nonterminal symbol (production rules not reproduced)
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar because of a violation of condition (a)
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language
Example 1.2.16 The grammar of the earlier example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with new ones, in which E is assumed to be a new nonterminal symbol (production rules not reproduced)
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β that is not of the form S → ε satisfies one of the following conditions
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar (production rules not reproduced)
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar
The figure below illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
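The conditions can be checked mechanically. The Python sketch below assumes the usual reading of the definitions: Type 1 requires |α| ≤ |β| for every rule α → β other than S → ε, and S must not occur on a right-hand side when S → ε is present; Type 2 additionally requires α to be a single nonterminal; Type 3 additionally requires β to be a terminal, or a terminal followed by a nonterminal. For the sketch, productions are pairs of strings, uppercase letters stand for nonterminals, lowercase letters for terminals, and the empty string for ε; these conventions and the function name are assumptions.

```python
def grammar_type(productions, start="S"):
    """Return the most restricted type (3, 2, 1 or 0) the productions satisfy."""
    epsilon_rule = (start, "")
    start_on_rhs = any(start in beta for _alpha, beta in productions)

    # Type 1: conditions (a) and (b)
    if epsilon_rule in productions and start_on_rhs:
        return 0
    for alpha, beta in productions:
        if (alpha, beta) != epsilon_rule and len(alpha) > len(beta):
            return 0

    # Type 2: every left-hand side is a single nonterminal
    if not all(len(alpha) == 1 and alpha.isupper() for alpha, _beta in productions):
        return 1

    # Type 3: right-hand side is a terminal, or a terminal then a nonterminal
    for alpha, beta in productions:
        if (alpha, beta) == epsilon_rule:
            continue
        terminal_only = len(beta) == 1 and beta.islower()
        terminal_then_nonterminal = (len(beta) == 2 and beta[0].islower()
                                     and beta[1].isupper())
        if not (terminal_only or terminal_then_nonterminal):
            return 2
    return 3


print(grammar_type([("S", "aA"), ("A", "b")]))        # 3
print(grammar_type([("S", "aSb"), ("S", "ab")]))      # 2
print(grammar_type([("S", "aB"), ("Bb", "b")]))       # 0
```

The last call shows the effect described in the text: a rule of the form Bb → b shortens the string, so condition (a) fails and the grammar is no longer Type 1.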
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
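A short Python sketch shows rules of the form V → w at work. It enumerates the sentences of a small context-free grammar by always rewriting the leftmost nonterminal, whatever surrounds it. The grammar S → aSb | ε, the dictionary representation and the length bounds are assumptions chosen to keep the example finite.

```python
from collections import deque


def generate(rules, start, max_length):
    """List sentences with at most max_length terminals, by leftmost derivation."""
    sentences, seen, queue = set(), {start}, deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost nonterminal, if any
        position = next((i for i, s in enumerate(form) if s in rules), None)
        if position is None:
            sentences.add(form)          # only terminals left: a sentence
            continue
        for w in rules[form[position]]:
            new_form = form[:position] + w + form[position + 1:]
            terminals = sum(1 for s in new_form if s not in rules)
            # both bounds keep the search finite
            if (terminals <= max_length and len(new_form) <= 2 * max_length + 1
                    and new_form not in seen):
                seen.add(new_form)
                queue.append(new_form)
    return sorted(sentences, key=lambda s: (len(s), s))


RULES = {"S": ["aSb", ""]}               # S -> aSb | (empty string)
print(generate(RULES, "S", 4))           # ['', 'ab', 'aabb']
```

The replacement of S never inspects the neighbouring symbols, which is exactly what "context-free" means.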
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
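As a small illustration, the Python sketch below reads rules written with the usual ::= separator and the | choice bar into a dictionary, and then finds the terminals as the symbols that never appear on a left side. The two-rule grammar for binary numbers and the function names are assumptions made for the example, and symbols are assumed to be separated by blanks.

```python
def parse_bnf(text):
    """Turn lines '<symbol> ::= alt | alt' into {symbol: [alternatives]}."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alternative.split() for alternative in right.split("|")]
        rules.setdefault(left.strip(), []).extend(alternatives)
    return rules


def terminals(rules):
    """Symbols that occur on a right side but never on a left side."""
    used = {symbol for alternatives in rules.values()
            for alternative in alternatives for symbol in alternative}
    return used - set(rules)


GRAMMAR = """
<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1
"""
bnf_rules = parse_bnf(GRAMMAR)
print(bnf_rules["<number>"])            # [['<digit>'], ['<digit>', '<number>']]
print(sorted(terminals(bnf_rules)))     # ['0', '1']
```

The recursive alternative <digit> <number> is what lets two finite rules describe binary numbers of any length.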
Formal Grammar-
In formal language theory, a grammar (often called a formal grammar for clarity when the context isn't given) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of this class. A general approach has been taken, so that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings such as computer languages. They become even more useful if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, by a canonic system where all predicates are of degree one and no “cross-referencing” is used.
Lecture_no 43 Compiler
Compilers are basically “language translators”
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
• In the analysis part the source program is read and broken into a stream of strings, giving an intermediate form of the source program
• In the synthesis part this intermediate form of the source language is taken and converted into an equivalent target program
-> Analysis Part
• The analysis part is carried out in three sub-parts
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning)
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string
- Semantic Analysis
• In this step the meaning of the source string is determined
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
- Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
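The token list above can be reproduced with a small scanner. The Python sketch below assumes the statement is total = count + rate * 10 (the token list includes a multiplication sign) and uses regular expressions from the standard re module; the token names and the function tokenize are illustrative choices, not a fixed convention.

```python
import re

# (token name, regular expression) pairs; "skip" covers blanks
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into (token name, lexeme) pairs."""
    tokens, position = [], 0
    while position < len(source):
        match = PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "skip":        # blanks produce no token
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

The scanner produces seven tokens in the same order as the list above: three identifiers, the assignment symbol, the plus sign, the multiplication sign and the constant 10.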
- Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called the parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N*M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
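The scanning step described above can be sketched in a few lines of Python. This is only an illustration, not a real C scanner: the pattern list, the function name scan and the choice to store the symbol table as a plain list are all assumptions made for this sketch.

```python
import re

# Illustrative token patterns only; a real scanner for C has many more.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+;]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbol_table) for a source string."""
    symtab = []   # position + 1 is the attribute value of an id token
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue                      # non-significant blanks are dropped
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))  # second component is ignored
    return tokens, symtab


tokens, symtab = scan("x3 = y + 3;")
print(tokens)
print(symtab)
```

Note that no pattern refers to itself, which matches the remark that lexemes can be defined without recursion.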
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
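As a rough sketch of how a parser could build the syntax tree for the example, the Python function below accepts a list of lexemes and returns nested tuples with the operator first. The function name and the tuple representation are assumptions for illustration. The grammar rule expr → expr + expr is ambiguous, so this sketch simply treats + as left-associative, which is one possible choice and not something the notes prescribe.

```python
def parse_assignment(lexemes):
    """Parse  id = expr ;  and return a syntax tree as nested tuples."""
    pos = 0

    def take():
        nonlocal pos
        if pos >= len(lexemes):
            raise SyntaxError("unexpected end of input")
        lexeme = lexemes[pos]
        pos += 1
        return lexeme

    def operand():
        lexeme = take()
        if not (lexeme.isidentifier() or lexeme.isdigit()):
            raise SyntaxError(f"expected id or number, got {lexeme!r}")
        return lexeme

    def expr():
        # expr -> operand (+ operand)*, grouped to the left
        node = operand()
        while pos < len(lexemes) and lexemes[pos] == "+":
            take()
            node = ("+", node, operand())
        return node

    target = take()
    if not target.isidentifier():
        raise SyntaxError(f"expected id, got {target!r}")
    if take() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if take() != ";":
        raise SyntaxError("expected ';'")
    if pos != len(lexemes):
        raise SyntaxError("unexpected input after ';'")
    return tree


print(parse_assignment(["x3", "=", "y", "+", "3", ";"]))
```

The interior nodes of the result are the operators = and +, and the leaves are the operands, as in the syntax tree described above.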
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversion where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one operand) conversion operators "int to real" and "real to int" as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
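A minimal sketch of how such quads could be produced by a post-order walk of the semantic tree is given below. The tuple encoding of the tree, the function name gen_quads and the use of None for an unused source are assumptions of this sketch.

```python
import itertools


def gen_quads(tree):
    """Emit (operation, target, source1, source2) quads for an assignment tree."""
    quads = []
    counter = itertools.count(1)

    def walk(node):
        if isinstance(node, str):          # leaf: identifier or constant
            return node
        op, *children = node
        sources = [walk(child) for child in children]
        sources += [None] * (2 - len(sources))   # pad unary operations
        temp = f"temp{next(counter)}"
        quads.append((op, temp, sources[0], sources[1]))
        return temp

    op, target, value = tree
    if op != "assign":
        raise ValueError("expected an assignment at the root")
    quads.append(("assign", target, walk(value), None))
    return quads


# Semantic tree for the example, with the two conversion operators inserted.
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
for quad in gen_quads(tree):
    print(quad)
```

Each interior node yields one quad whose target is a fresh temporary, so the output order follows the order in which subtrees are finished.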
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
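The two simplifications can be sketched as a single pass over a list of quads. This is an illustration under stated assumptions, not a general optimizer: quads are (operation, target, source1, source2) tuples, temporaries are named temp..., and each temporary is assumed to be used exactly once.

```python
def optimize(quads):
    """Fold int-to-real conversions of constants and merge a final copy."""
    constants = {}   # temporary name -> real constant computed at compile time
    result = []
    for op, target, src1, src2 in quads:
        # Rule 1: convert an integer constant once, at compile time.
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{int(src1)}.0"
            continue
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        # Rule 2: "temp = op ...; x = temp" becomes "x = op ...".
        if (op == "assign" and result and result[-1][1] == src1
                and src1.startswith("temp")):
            prev_op, _, prev1, prev2 = result.pop()
            result.append((prev_op, target, prev1, prev2))
            continue
        result.append((op, target, src1, src2))
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```

On the example the four quads shrink to two, an add with the constant 3.0 and a realtoint that writes id1 directly.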
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0    // add float
RTOI R2, R1         // real to int
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
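A symbol table of this kind can be sketched as a small Python class. The class name, the sizes in SIZES and the use of byte offsets as the storage location are assumptions chosen for illustration. The offset recorded here is where the compiled program will keep the variable at run time, not where the compiler keeps the entry.

```python
SIZES = {"int": 4, "real": 8}   # assumed sizes in bytes


class SymbolTable:
    def __init__(self):
        self.entries = {}
        self.next_offset = 0    # next free location in the compiled program's data area

    def declare(self, name, type_name):
        """Record the type and run-time storage location of a variable."""
        if name in self.entries:
            raise KeyError(f"{name} is already declared")
        entry = {"type": type_name, "offset": self.next_offset}
        self.entries[name] = entry
        self.next_offset += SIZES[type_name]
        return entry

    def lookup(self, name):
        return self.entries[name]


table = SymbolTable()
table.declare("x3", "int")
table.declare("y", "real")
print(table.lookup("y"))
```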
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (Syntax Tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase" but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
(fig: the Compiler Driver calls the Syntactic Analyzer, which calls the Contextual Analyzer and the Code Generator)
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
(fig: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer and the Code Generator in turn; Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Single pass
» every macro must be defined before it is called
» a one-pass processor can alternate between macro definition and macro expansion
» nested macro definitions may be allowed but nested calls are not
Step 1
Scan all macro definitions one by one. For each macro defined:
i) enter its name in the Macro Name Table (MNT)
ii) store the entire macro definition in the Macro Definition Table (MDT)
iii) add auxiliary information to the MNT indicating where the definition of the macro can be found in the MDT
Step 2
Examine all statements in the assembly source program to detect macro calls. For each macro call:
i) locate the macro in the MNT
ii) obtain information from the MNT regarding the position of the macro definition in the MDT
iii) process the macro call statement to establish correspondence between all formal parameters and their values (i.e., actual parameters)
iv) expand the macro call by following the procedure given in Step 3
Step 3
Process the statements in the macro definition, as found in the MDT, in their expansion-time order until the ENDM statement is encountered. The conditional assembly statements AIF and AGO will enforce changes in the normal sequential order, based on certain expansion-time relations between values of formal parameters and expansion-time variables.
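The three steps can be sketched in Python as follows. This is a simplified illustration under several assumptions: a definition starts with a line containing only MACRO, followed by a prototype line (name and comma-separated formal parameters beginning with &), then the body, then ENDM; conditional assembly (AIF, AGO) and nested calls are not handled; and the function name expand_macros is hypothetical.

```python
def expand_macros(lines):
    """Return the source program with every macro call expanded."""
    mnt = {}       # Macro Name Table: name -> (start index in MDT, formal parameters)
    mdt = []       # Macro Definition Table: body lines, each definition ends with ENDM
    program = []   # statements that are not part of a definition

    # Step 1: scan the definitions and fill the MNT and MDT.
    source = iter(lines)
    for line in source:
        if line.strip() == "MACRO":
            name, *rest = next(source).split(None, 1)
            formals = [p.strip() for p in rest[0].split(",")] if rest else []
            mnt[name] = (len(mdt), formals)
            for body_line in source:
                mdt.append(body_line.strip())
                if body_line.strip() == "ENDM":
                    break
        else:
            program.append(line)

    # Steps 2 and 3: detect calls and expand them from the MDT.
    output = []
    for line in program:
        fields = line.split(None, 1)
        if not fields or fields[0] not in mnt:
            output.append(line)
            continue
        start, formals = mnt[fields[0]]
        actuals = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
        # Longest formal first, so &AB is not clobbered by a formal named &A.
        binding = sorted(zip(formals, actuals), key=lambda pair: -len(pair[0]))
        index = start
        while mdt[index] != "ENDM":
            text = mdt[index]
            for formal, actual in binding:
                text = text.replace(formal, actual)
            output.append(text)
            index += 1
    return output


source = ["MACRO", "INCR &ARG", "ADD 1,&ARG", "ENDM",
          "START", "INCR DATA1", "INCR DATA2", "END"]
print(expand_macros(source))
```

The body of INCR is stored once in the MDT but appears in the output once per call, which is the difference from a subroutine.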
Lecture_no25 Design of pass-2 macro processor
Macro vs Subroutine
Macro: the statements of the expansion are generated each time the macro is invoked.
Subroutine: the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE) and macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB)
  - Macro prototype
  - Statements that make up the macro body
  - References to parameters are converted to a positional notation
o MACRO NAME TABLE (NAMTAB)
  - Role: an index to DEFTAB
  - Pointers to the beginning and end of the definition in DEFTAB
o MACRO ARGUMENT TABLE (ARGTAB)
  - Used during the expansion of macro invocations
  - Arguments are stored in ARGTAB according to their position
(fig: macro processor data structures NAMTAB, DEFTAB and ARGTAB, as used by a MACRO definition and a CALL)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
(fig: flowchart in which PROCESSLINE invokes DEFINE and EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. Thus the program becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf(), scanf(), etc., which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore, the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (the segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
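The adjustment of address fields can be sketched as below. The representation is an assumption made for illustration: the segment is a list of integer words translated relative to address 0, and relocation_indices lists the positions of the words that hold relative addresses (the relocation information).

```python
def relocate(words, relocation_indices, load_address):
    """Return a copy of the segment with each relative address adjusted."""
    relocated = list(words)
    for index in relocation_indices:
        relocated[index] += load_address   # add the same constant to every address field
    return relocated


# Words 1 and 2 hold addresses relative to the start of the segment;
# words 0 and 3 are plain data and must not change.
segment = [5, 0, 12, 7]
print(relocate(segment, [1, 2], 200))
```

Only the words named in the relocation information change, which is why relocation is an adjustment of address fields and not merely a copy of the program to a new place.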
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader (that is, the output of the assembler) is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer, there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called
» Also called dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs:
» All linking and relocation operations
» Automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
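The path-loading rule can be sketched with a small model of the example tree. The class OverlayManager, the PARENT table and the load_log list are hypothetical names for this illustration; a real overlay manager would read segments from disk and patch call stubs, which is not modeled here.

```python
# Overlay tree from the example: child -> parent.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.resident = ["ROOT"]   # path currently held in memory
        self.load_log = []         # segments brought in, in order

    def call(self, target):
        """Ensure the whole path to the call target is loaded."""
        path = path_to(target)
        if self.resident[:len(path)] == path:
            return                 # upward call, or target already loaded
        keep = 0
        while (keep < len(self.resident) and keep < len(path)
               and self.resident[keep] == path[keep]):
            keep += 1
        self.load_log.extend(path[keep:])
        self.resident = path       # sibling segments are overwritten


manager = OverlayManager()
for target in ["A", "B", "A", "E"]:
    manager.call(target)
print(manager.load_log, manager.resident)
```

Calling E after B replaces A and B with D and E, because A and D occupy the same memory.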
(fig: Overlay tree. ROOT has children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig: Linking loader: Object program(s) and Library -> Linking loader -> Memory. Linkage editor: Object program(s) and Library -> Linkage editor -> Linked program -> Relocating loader -> Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be directly placed at the specified location, or the address is relative. If the address is relative, then it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded in main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, then execution needs to be suspended until the required subroutine gets loaded in main memory.
-> Let's take a quick look at which component performs each of the above four functions in our three loaders: the absolute loader, the relocating loader and the direct linking loader, respectively.
Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g., BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
- Copy of the object program
- IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
- PLA counter (Program Load Address): to keep track of each segment location
- External Symbol Directory (ESD) card: to store each external symbol and its corresponding assigned core address
- Local External Symbol Array (LESA): to establish correspondence between the ESD ID used in ESD and RLD cards
- ESD (external symbol dictionary)
Data Structures
Algorithm
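A minimal sketch of the two passes is given below. It does not model the actual ESD, TXT, RLD and END card formats. Instead, each object module is assumed to be a Python dict with a name, a list of text words, a defs table (external symbol -> relative offset), a relocs list (offsets of words that need the module's own load address added) and a refs list (offset, external symbol). All of these field names, and the function name direct_linking_load, are hypothetical.

```python
def direct_linking_load(modules, ipla):
    """Return (memory, symbol_table) after allocating, relocating and linking."""
    # Pass 1: assign a load address to each module and record external symbols.
    symbols = {}
    bases = []
    pla = ipla                               # Program Load Address counter
    for module in modules:
        bases.append(pla)
        definitions = {module["name"]: 0, **module["defs"]}
        for symbol, offset in definitions.items():
            if symbol in symbols:
                raise ValueError(f"duplicate external symbol {symbol}")
            symbols[symbol] = pla + offset
        pla += len(module["text"])

    # Pass 2: load the text, then relocate and resolve external references.
    memory = {}
    for module, base in zip(modules, bases):
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word
        for offset in module["relocs"]:
            memory[base + offset] += base    # relocation
        for offset, symbol in module["refs"]:
            if symbol not in symbols:
                raise ValueError(f"undefined external symbol {symbol}")
            memory[base + offset] += symbols[symbol]   # linking
    return memory, symbols


main = {"name": "MAIN", "text": [10, 2, 0, 99], "defs": {},
        "relocs": [1], "refs": [(2, "SUB")]}
sub = {"name": "SUB", "text": [7, 0], "defs": {"ENTRY2": 1},
       "relocs": [], "refs": []}
memory, symbols = direct_linking_load([main, sub], 500)
print(symbols)
print(memory)
```

Two passes are needed because MAIN refers to SUB before the load address of SUB is known; pass 1 fixes every address so that pass 2 can fill the references in.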
Binary Symbolic Subroutine Loader
The assembler assembles the program and provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader: allocation, relocation and linking
• Outputs the text to a file rather than to memory; the file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search To be able to explain and implement selection sort bubble sort merge sort quick sort insertion sort
and shell sort To understand the idea of hashing as a search technique To introduce the map abstract data type To implement the map abstract data type using hashing
• Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the brute-force design principle and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search
Binary search takes a sorted array as its input. It compares the target (search key) with the middle element of the array and terminates if they are equal; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Ordered table
• Hash
A hash table is a collection of items stored in such a way that they are easy to find later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
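The three search methods above can be sketched as follows. This is only an illustrative sketch: the function names, the character-sum hash function and the use of linear probing for collisions are assumptions made for the example, not part of these notes.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, entry in enumerate(table):
        if entry == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table around its middle element."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_table[middle] == key:
            return middle
        if key < sorted_table[middle]:
            high = middle - 1      # continue in the left sub-array
        else:
            low = middle + 1       # continue in the right sub-array
    return -1


class HashTable:
    """Slots are numbered from 0; collisions are resolved by linear probing (an assumption)."""

    def __init__(self, size=11):
        self.slots = [None] * size

    def _hash(self, key):
        # Hypothetical hash function: sum of character codes modulo the table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        start = self._hash(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise OverflowError("hash table is full")

    def get(self, key):
        start = self._hash(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None:
                return None        # an empty slot ends the probe sequence
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None
```

Linear search needs no ordering of the table, binary search needs a sorted table, and the hash table trades extra storage for lookups that usually touch only one or two slots.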
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
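A minimal sketch of the four comparison sorts listed above is given here. The function names are chosen for the example, each function returns a new sorted list rather than sorting in place, and the quick sort simply takes the first element as pivot; these are simplifying assumptions, not the only way to write these algorithms.

```python
def selection_sort(items):
    a = list(items)
    for i in range(len(a) - 1):
        # Find the smallest remaining element and move it to position i.
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a


def insertion_sort(items):
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        # Shift larger elements one place right to open a gap for current.
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
    return a


def merge_sort(items):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def quick_sort(items):
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)
```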
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
- A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and there would be nothing left to process. Furthermore, we are finite beings, and the model should reflect that.
- A syntax that defines which strings of symbols are in the language of the formal system.
- A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
- the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language);
- the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule, or transformation rule) is the act of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet $T$ is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language $L$ is a subset of the set of finite concatenations of symbols in an alphabet $T$.
We write this $L \subseteq T^*$.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar, and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string $\gamma$ is immediately derived from a string $\mu$ in a grammar $G$, written $\mu \Rightarrow_G \gamma$ (a single-step derivation), if and only if
(1) $\mu = \sigma \alpha \tau$
(2) $\gamma = \sigma \beta \tau$
(3) $(\alpha \to \beta) \in P$ of $G$
where $\sigma$ and $\tau$ represent arbitrary strings.
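The definition of a single-step derivation can be tried out with a small sketch. It assumes, purely for illustration, that every symbol is a single character and that the productions P are given as (α, β) pairs of strings; the function name is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that can be immediately derived from mu."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            # mu = sigma + alpha + tau, so gamma = sigma + beta + tau
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Example grammar with productions S -> aSb and S -> ab
P = [("S", "aSb"), ("S", "ab")]
print(immediate_derivations("aSb", P))   # the two strings aaSbb and aabb
```

Repeating the step from the start symbol until no nonterminal is left yields the sentences of the language.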
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that contains only terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar $\langle N, \Sigma, P, S \rangle$ that satisfies the following two conditions:
a. Each production rule $\alpha \to \beta$ in $P$ satisfies $|\alpha| \le |\beta|$ if it is not of the form $S \to \varepsilon$.
b. If $S \to \varepsilon$ is in $P$, then $S$ does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15: The grammar of Example [number missing] is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones ($E$ is assumed to be a new nonterminal symbol).
[The production rules are missing from the source.]
An addition to the modified grammar of a production rule of the form $Bb \to b$ will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule $\alpha \to \beta$ satisfies $|\alpha| = 1$, that is, $\alpha$ is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16: The grammar of Example [number missing] is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones ($E$ is assumed to be a new nonterminal symbol).
[The production rules are missing from the source.]
An addition of a production rule of the form $aE \to EaE$ to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar $\langle N, \Sigma, P, S \rangle$ in which each of the production rules $\alpha \to \beta$ that is not of the form $S \to \varepsilon$ satisfies one of the following conditions:
a. $\beta$ is a terminal symbol.
b. $\beta$ is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17: The grammar $\langle N, \Sigma, P, S \rangle$ which has the following production rules is a Type 3 grammar.
[The production rules are missing from the source.]
An addition of a production rule of the form $A \to Ba$ or of the form $B \to bb$ to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar $G = (V, T, S, P)$ is a context-free grammar (CFG) if all productions in $P$ have the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form $xAy \to xvy$, where $A \in V$ and $x, v, y \in (V \cup T)^*$.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by $A \to v$, while $x$ and $y$ provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only $kn$ squares long, where $n$ is the length of the input string and $k$ is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, $k = 1$ in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V -> w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. A natural further step is to use a canonic system as a basis for recognizing strings from the defined sets.
The algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, producing an intermediate form of the source program. In the synthesis part, this intermediate form is taken and converted into an equivalent target program.
-> Analysis part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
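The token list above can be reproduced with a small scanner. This is a minimal sketch built on Python's standard `re` module; the token names and the set of recognized symbols are assumptions chosen to match the example statement only.

```python
import re

# Each pair is (token name, regular expression); names are illustrative.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),          # blanks are not significant
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into (token type, token value) pairs."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

Running it prints seven tokens, one for each item in the list above, in the same order.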
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called the parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =  y  +  3  ;
    x3 =y+ 3 ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping, in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt -> id = expr ;
    expr      -> number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
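A syntax tree for the example assignment can be built with a few lines of code. This is a sketch under stated assumptions: the helper names are hypothetical, the mini-scanner knows only identifiers, integers and the symbols = + ;, and the left-recursive rule for expr is handled by a loop, so a + b + c groups to the left.

```python
import re


def tokens_of(text):
    # Hypothetical mini-scanner: identifiers, integers and the symbols = + ;
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", text)


def parse_assignment(text):
    """Return the syntax tree as nested tuples: (operator, left, right)."""
    tokens = tokens_of(text)
    pos = 0

    def expect(predicate, what):
        nonlocal pos
        if pos >= len(tokens) or not predicate(tokens[pos]):
            raise SyntaxError(f"expected {what} at token {pos}")
        pos += 1
        return tokens[pos - 1]

    def is_id(tok):
        return tok[0].isalpha() or tok[0] == "_"

    def is_operand(tok):
        return is_id(tok) or tok.isdigit()

    target = expect(is_id, "identifier")
    expect(lambda t: t == "=", "'='")
    tree = expect(is_operand, "operand")
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        tree = ("+", tree, expect(is_operand, "operand"))
    expect(lambda t: t == ";", "';'")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", target, tree)


print(parse_assignment("x3 = y + 3;"))   # ('=', 'x3', ('+', 'y', '3'))
```

The operators = and + are the interior nodes and the operands x3, y and 3 are the leaves, as in the syntax tree described above. The input `x 3 = y + 3;` is rejected, because x and 3 are scanned as two separate lexemes.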
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --
Each "quad" has the form
    operation target source1 source2
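The walk over the semantic tree can be sketched as follows. The tree encoding (nested tuples whose first element is the operation name) and the function name are assumptions made for this example; a missing source operand is represented by None instead of "--".

```python
def generate_quads(tree):
    """Walk the tree and emit (operation, target, source1, source2) quads."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):            # leaf: identifier or constant
            return node
        operation, *operands = node
        sources = [walk(child) for child in operands]
        padded = sources + [None] * (2 - len(sources))
        target = new_temp()
        quads.append((operation, target, *padded))
        return target

    _, target, expression = tree             # the root is the assignment
    quads.append(("assign", target, walk(expression), None))
    return quads


# Semantic tree for the example, with the conversion operators inserted
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
for quad in generate_quads(tree):
    print(quad)
```

For the example tree this emits the same four quads, in the same order, as the sequence shown above.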
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see:
1. Since 3 is a constant, the compiler can perform the int-to-real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
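These two optimizations can be sketched on a list of quads. The sketch assumes quads are (operation, target, source1, source2) tuples with None for a missing operand, and that each temporary is used exactly once, which is what makes merging the final assign safe; the function name is hypothetical.

```python
def optimize(quads):
    """Fold constant int-to-real conversions and merge a trailing assign."""
    constants = {}   # temporary name -> constant computed at compile time
    folded = []
    for operation, target, source1, source2 in quads:
        if operation == "inttoreal" and source1.isdigit():
            constants[target] = str(float(source1))   # "3" becomes "3.0"
            continue                                   # this quad disappears
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        folded.append((operation, target, source1, source2))

    result = []
    for quad in folded:
        operation, target, source1, source2 = quad
        if (operation == "assign" and result
                and result[-1][1] == source1 and source1.startswith("temp")):
            previous = result.pop()
            # Let the previous quad write directly into the assignment target.
            result.append((previous[0], target) + previous[2:])
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
print(optimize(quads))
# [('add', 'temp2', 'id2', '3.0'), ('realtoint', 'id1', 'temp2', None)]
```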
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1 id2
    ADDF R1 R1 3.0    add float
    RTOI R2 R1        real to int
    ST   id1 R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
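A symbol table of this kind can be sketched as a small class. The class and field names, the 1-based entry index (chosen to match tokens such as <id,1>) and the sizes passed in are illustrative assumptions; the offset recorded is where the compiled program will keep the variable.

```python
class SymbolTable:
    def __init__(self):
        self.entries = []        # one record per declared variable
        self.index_of = {}       # name -> 1-based index used in tokens
        self.next_offset = 0     # next free location in the compiled program's data area

    def declare(self, name, type_name, size):
        if name in self.index_of:
            raise KeyError(f"{name} already declared")
        self.entries.append(
            {"name": name, "type": type_name, "offset": self.next_offset}
        )
        self.index_of[name] = len(self.entries)
        self.next_offset += size
        return self.index_of[name]

    def lookup(self, name):
        return self.entries[self.index_of[name] - 1]


table = SymbolTable()
print(table.declare("x3", "int", 4))    # 1
print(table.declare("y", "real", 8))    # 2
print(table.lookup("y"))                # {'name': 'y', 'type': 'real', 'offset': 4}
```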
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver --calls--> Syntactic Analyzer
    Syntactic Analyzer --calls--> Contextual Analyzer
    Syntactic Analyzer --calls--> Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver --calls--> Syntactic Analyzer, Contextual Analyzer, Code Generator
    Source Text --input--> Syntactic Analyzer --output--> AST
    AST --input--> Contextual Analyzer --output--> Decorated AST
    Decorated AST --input--> Code Generator --output--> Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit, because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
- MACRO PROCESSOR
-
- Macro Definition and Usage
-
- Example
- Defining a Macro
- Linking
- Relocation
-
- High-level language(HLL)
- Sorting and Searching
-
- Objectives
-
- Sorting
-
- Context-free grammars and languages
- Context-sensitive grammars and languages
- Context-free grammar Vs Context-sensitive grammar
-
- Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
-
-
- The Structure of a Compiler
-
- Multiple Phases
- Lexical Analysis (or Scanning)
- Syntax Analysis (or Parsing)
- Semantic Analysis
- Intermediate code generation
- Code optimization
- Code generation
- Symbol-Table Management
- The Grouping of Phases into Passes
-
Macro vs Subroutine
Macro- the statements of the expansion are generated each time the macro is invoked.
Subroutine- the statements in a subroutine appear only once.
Differences between Macro and Subroutine
After macro processing, the expanded file can be used as input to the assembler. The statements generated from the macro expansions will be assembled exactly as though they had been written directly by the programmer.
The differences between macro invocation and subroutine call:
o The statements that form the body of the macro are generated each time a macro is expanded.
o Statements in a subroutine appear only once, regardless of how many times the subroutine is called.
One-Pass Macro Processor
-> Prerequisite: every macro must be defined before it is called
-> Sub-procedures: macro definition (DEFINE) and macro invocation (EXPAND)
What are the data structures used by the macro processor?
Data Structures
o MACRO DEFINITION TABLE (DEFTAB): holds the macro prototype and the statements that make up the macro body. References to parameters are converted to a positional notation.
o MACRO NAME TABLE (NAMTAB): serves as an index to DEFTAB. It holds pointers to the beginning and end of the definition in DEFTAB.
o MACRO ARGUMENT TABLE (ARGTAB): used during the expansion of a macro invocation. Arguments are stored in ARGTAB according to their position.
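The three tables above can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm asked for in the notes: the source syntax (`NAME MACRO &P1,&P2` ... `MEND`), the `?1`, `?2` positional markers and the function name `process` are assumptions made for the example, and plain string replacement is used, so parameter names must not be prefixes of one another.

```python
def process(lines):
    """One-pass macro processor sketch: every macro is defined before it is called."""
    deftab = []    # DEFTAB: prototype and body lines, parameters in positional form
    namtab = {}    # NAMTAB: macro name -> (start, end) indices into DEFTAB
    output = []
    i = 0
    while i < len(lines):
        fields = lines[i].split()
        if len(fields) >= 2 and fields[1] == "MACRO":        # DEFINE
            name = fields[0]
            params = fields[2].split(",") if len(fields) > 2 else []
            start = len(deftab)
            deftab.append(lines[i])                          # macro prototype
            i += 1
            while lines[i].strip() != "MEND":
                body = lines[i]
                for pos, param in enumerate(params, start=1):
                    body = body.replace(param, f"?{pos}")    # positional notation
                deftab.append(body)
                i += 1
            namtab[name] = (start, len(deftab))
        elif fields and fields[0] in namtab:                 # EXPAND
            argtab = fields[1].split(",") if len(fields) > 1 else []   # ARGTAB
            start, end = namtab[fields[0]]
            for body in deftab[start + 1:end]:               # skip the prototype
                for pos, arg in enumerate(argtab, start=1):
                    body = body.replace(f"?{pos}", arg)
                output.append(body)
        else:
            output.append(lines[i])                          # ordinary statement
        i += 1
    return output


source = [
    "COPY MACRO &X,&Y",
    "  LDA &X",
    "  STA &Y",
    "MEND",
    "START",
    "COPY ALPHA,BETA",
    "END",
]
expanded = process(source)
```

Here `expanded` contains `START`, the two body statements with `ALPHA` and `BETA` substituted, and `END`; the definition itself does not appear in the output.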
(fig MACRO and CALL with NAMTAB DEFTAB and ARGTAB)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm & flowchart for the Macro processor and explain
(fig Macro processor flowchart PROCESSLINE DEFINE EXPAND)
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader: A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity, done by the loader, is called relocation.
4) Finally it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program thus becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader- In this type of loader the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads the assembled machine instructions into the memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, then the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
It is simple to implement.
Disadvantages
In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, it is necessary for the programmer to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain Input/Output functions like printf(), scanf() etc. which are not written by the programmer himself. During program execution those standard functions must reside in the main memory. Furthermore, every time an Input/Output function is called by a C language program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
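The idea of adding a constant to each relative address can be sketched as follows. This is a minimal illustration under assumed conventions: the segment is modelled as a list of integer words translated relative to address 0, and `relocation_offsets` (a hypothetical name) lists which words hold relative addresses.

```python
def relocate(segment, relocation_offsets, load_address):
    """Return a copy of the segment with load_address added to each relative address."""
    relocated = list(segment)
    for offset in relocation_offsets:
        relocated[offset] += load_address   # adjust the address field only
    return relocated


# Words 1 and 3 hold relative addresses; words 0 and 2 are ordinary data.
segment_a = [10, 5, 99, 7]
loaded_a = relocate(segment_a, [1, 3], 400)
```

Only the words named in the relocation information change; data words that merely look like small numbers are left alone, which is why the loader needs that information from the translator.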
Relocating Loader: The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The output of the assembler for a relocating loader is the object program and information about all other programs it references. In addition there is information (relocation information) as to locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability while at the same time allowing independent translations of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: As we turn on the computer there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage to main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
Performs linking prior to load time. A linkage editor
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
Dynamic Linking-
Linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is called dynamic linking, dynamic loading or load on call
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and are still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
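The behaviour of the overlay manager on the example tree can be sketched as below. This is a toy model under stated assumptions: segments are just names, "loading" is recorded in a list, and `PARENT`, `path_to` and `OverlayManager` are hypothetical names introduced for the illustration.

```python
# Overlay tree from the example: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]          # path currently in memory, root first

    def call(self, target):
        """Ensure the path to target is loaded; return the segments newly loaded."""
        wanted = path_to(target)
        if self.loaded[:len(wanted)] == wanted:
            return []                   # upward call: path is already in memory
        keep = 0
        while (keep < min(len(self.loaded), len(wanted))
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        newly_loaded = wanted[keep:]    # siblings share memory, so these overwrite
        self.loaded = wanted
        return newly_loaded


manager = OverlayManager()
first = manager.call("B")    # root calls into B: both A and B must be loaded
second = manager.call("A")   # upward call from B to A: nothing to load
third = manager.call("E")    # switching subtree: D and E replace A and B
```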
(fig Overlay)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library are linked and loaded into memory. Linkage editor: object program(s) and library produce a linked program, which a relocating loader loads into memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, then it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded in the main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, then the process of execution needs to be suspended until the required subroutine gets loaded in the main memory.
-> Let's take a quick look at who performs each of the above four functions for our three loaders: absolute loader, relocating loader and direct linking loader
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader
Pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory card (ESD)
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Copy of object program
IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address to load the first segment
PLA counter (Program Load Address): to keep track of each segment location
External Symbol Directory card (ESD): to store each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA): to establish correspondence between the ESD ID used in ESD & RLD cards
ESD (external symbol dictionary)
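The two passes can be sketched in Python as below. This is a simplified model, not the card-based design itself: each object module is a hypothetical dictionary with a `name`, a `length`, `defs` (external symbols defined, as relative offsets), `text` (a list of integer words) and `rld` (pairs of word offset and symbol whose address must be added to that word). The field names are assumptions made for the example.

```python
def pass1(modules, ipla):
    """Pass 1: assign a load address to each segment and build the symbol table."""
    symtab = {}
    pla = ipla                                # PLA counter starts at the IPLA
    for module in modules:
        symtab[module["name"]] = pla          # the segment name is itself a symbol
        for symbol, relative in module["defs"].items():
            symtab[symbol] = pla + relative
        pla += module["length"]
    return symtab


def pass2(modules, symtab):
    """Pass 2: load the text, then relocate and link using the RLD entries."""
    memory = {}
    for module in modules:
        base = symtab[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word
        for offset, symbol in module["rld"]:
            memory[base + offset] += symtab[symbol]   # relocation or linking
    return memory


main = {"name": "MAIN", "length": 3, "defs": {},
        "text": [1, 0, 2], "rld": [(1, "SUB"), (2, "MAIN")]}
sub = {"name": "SUB", "length": 2, "defs": {"ENTRY": 1},
       "text": [7, 8], "rld": []}

symbols = pass1([main, sub], ipla=100)
core = pass2([main, sub], symbols)
```

In this model an RLD entry that names the module's own segment performs relocation, while an entry that names another module's symbol performs linking, so one loop covers both.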
Data Structures
Algorithm
Binary Symbolic Subroutine Loader- The assembler assembles the program and provides the loader with
-> The object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation and linking
• Outputs the text in a file rather than memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
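The three search methods described above can be sketched as follows, using symbol names as keys. The hash function (sum of character codes modulo the table size) and the use of linear probing for collisions are assumptions chosen for the illustration; the notes do not prescribe either, and the sketch assumes the table never becomes full.

```python
def linear_search(table, key):
    """Compare every entry in turn; return its index or -1."""
    for index, item in enumerate(table):
        if item == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table; return the index of key or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_table[mid] == key:
            return mid
        if key < sorted_table[mid]:
            high = mid - 1          # continue in the left sub-array
        else:
            low = mid + 1           # continue in the right sub-array
    return -1


class HashTable:
    def __init__(self, size=11):
        self.slots = [None] * size  # slots are numbered from 0

    def _hash(self, key):
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        slot = self._hash(key)
        while self.slots[slot] is not None and self.slots[slot][0] != key:
            slot = (slot + 1) % len(self.slots)     # linear probing
        self.slots[slot] = (key, value)

    def get(self, key):
        slot = self._hash(key)
        for _ in range(len(self.slots)):
            entry = self.slots[slot]
            if entry is None:
                return None
            if entry[0] == key:
                return entry[1]
            slot = (slot + 1) % len(self.slots)
        return None


symbols = ["ALPHA", "BETA", "GAMMA", "LOOP"]    # already in sorted order
table = HashTable()
table.put("AB", 1)
table.put("BA", 2)      # same hash value as "AB", so it is placed by probing
```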
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
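As a brief illustration, two of the listed algorithms are sketched below: selection sort (repeatedly pick the smallest remaining item) and merge sort (sort each half, then merge). Both return a new list; the function names are chosen for the example.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and move it into place."""
    result = list(items)
    for i in range(len(result)):
        smallest = min(range(i, len(result)), key=result.__getitem__)
        result[i], result[smallest] = result[smallest], result[i]
    return result


def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]    # one of the tails is empty
```

Selection sort makes on the order of n squared comparisons, while merge sort needs on the order of n log n, which matters once a table holds more than a few hundred entries.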
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite, because if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference, inference rule or transformation rule is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
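The definition of a single-step derivation can be checked mechanically. The sketch below assumes every symbol is one character and represents the production set P as a list of (α, β) string pairs; the function name `immediate_derivations` is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string derivable from mu in one step.

    For each production alpha -> beta and each occurrence of alpha in mu,
    split mu = sigma + alpha + tau and emit sigma + beta + tau.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Example grammar with start symbol S: S -> aSb and S -> ab
P = [("S", "aSb"), ("S", "ab")]
step = immediate_derivations("aSb", P)
```

Here `step` holds the two strings obtained by rewriting the single S in `aSb`, with σ = `a` and τ = `b` in both cases.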
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b. If S → ε is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol
b. β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
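The difference between the two rule forms can be tested on concrete productions. The sketch below assumes single-character symbols and productions given as (left, right) string pairs; it also assumes that the replacement string in a context-sensitive rule is non-empty, which is the usual convention but is an assumption here. Both function names are hypothetical.

```python
def is_context_free(productions, nonterminals):
    """True if every left-hand side is exactly one nonterminal."""
    return all(len(lhs) == 1 and lhs in nonterminals for lhs, _ in productions)


def is_context_sensitive_rule(lhs, rhs, nonterminals):
    """True if the rule can be read as xAy -> xvy for some nonterminal A."""
    for i, symbol in enumerate(lhs):
        if symbol not in nonterminals:
            continue
        x, y = lhs[:i], lhs[i + 1:]                     # context around A
        long_enough = len(rhs) >= len(x) + len(y) + 1   # assumption: v is non-empty
        if long_enough and rhs.startswith(x) and rhs.endswith(y):
            return True
    return False
```

For instance, `aSb -> aabb` fits the context-sensitive form with x = `a`, y = `b` and v = `ab`, but it is not a context-free rule because its left-hand side is longer than one symbol.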
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; more sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are non-terminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
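A small BNF specification can be read into a table of alternatives as sketched below. The sketch assumes the conventional `::=` definition symbol, one rule per line, and symbols separated by spaces; the two rules for `<digit>` and `<number>` and the function names are invented for the illustration.

```python
def parse_bnf(text):
    """Map each nonterminal to its list of alternatives (each a list of symbols)."""
    grammar = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        grammar[left.strip()] = [alt.split() for alt in right.split("|")]
    return grammar


def terminals(grammar):
    """Symbols that never appear on a left side are terminals."""
    used = {sym for alts in grammar.values() for alt in alts for sym in alt}
    return used - set(grammar)


RULES = """
<digit> ::= 0 | 1
<number> ::= <digit> | <digit> <number>
"""
grammar = parse_bnf(RULES)
```

The recursive second alternative of `<number>` is what lets this finite specification describe binary numbers of any length.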
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages
If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "Language Translators"
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
- A compiler is a program which takes a program in one language (source program) as input and converts it into an equivalent program in another language (target program)
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model
In the analysis part, the source program is read, broken into a stream of strings, and converted into an intermediate form. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis part
• The analysis part is carried out in three sub-parts:
- Lexical analysis: the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax analysis: the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic analysis: the meaning of the source string is determined.
Properties of a compiler
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
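The token breakdown listed above can be reproduced with a small scanner. The following is only a minimal sketch in Python, not part of the original notes: the table TOKEN_SPEC, the function name tokenize, and the regular expressions are assumptions chosen for this one statement.

```python
import re

# Hypothetical token classes for this sketch; a real scanner has many more.
TOKEN_SPEC = [
    ("constant", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assignment symbol", r"="),
    ("plus sign", r"\+"),
    ("multiplication sign", r"\*"),
    ("skip", r"\s+"),
]
# One alternative per token class, each in its own named group g0, g1, ...
MASTER = re.compile(
    "|".join(f"(?P<g{i}>{pat})" for i, (_, pat) in enumerate(TOKEN_SPEC))
)

def tokenize(source):
    """Return a list of (token class, lexeme) pairs, dropping blanks."""
    tokens, pos = [], 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind = TOKEN_SPEC[int(match.lastgroup[1:])][0]
        if kind != "skip":
            tokens.append((kind, match.group()))
        pos = match.end()
    return tokens

for kind, lexeme in tokenize("total = count + rate * 10"):
    print(f"The {kind} {lexeme}")
```

Each printed line corresponds to one entry of the list above, in the same order.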
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =   y   +   3   ;
    x3   =y+ 3  ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
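The mapping from lexemes to token pairs can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the notes' own scanner: the function name scan is hypothetical, identifiers are assumed to start with a letter, numbers are plain digit runs kept as text, and only the symbols =, + and ; are recognized.

```python
def scan(source):
    """Map lexemes to (token-name, attribute) pairs and build a symbol table."""
    symbol_table = []          # entry i-1 holds the i-th distinct identifier
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():       # non-significant blanks are dropped
            i += 1
        elif ch.isalpha():     # identifier: a letter followed by letters/digits
            j = i
            while j < len(source) and source[j].isalnum():
                j += 1
            name = source[i:j]
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
            i = j
        elif ch.isdigit():     # number: kept as text, storage format undecided
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("number", source[i:j]))
            i = j
        elif ch in "=+;":      # single-character symbols need no attribute
            tokens.append((ch, None))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens, symbol_table

tokens, table = scan("x3   =y+ 3  ;")
print(tokens)
print(table)
```

Note that the scanner is a plain loop with no recursion, in line with the remark above. Differently spaced versions of the statement give the same token stream, while splitting x3 into x and 3 gives a different one.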
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
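To make the step from tokens to a syntax tree concrete, here is a minimal hand-written parser in Python. It is a sketch under assumptions: the token format (kind, text) and the name parse_assignment are hypothetical, the statement is assumed to end with a semicolon, and the left-recursive rule for expr is replaced by a loop, which treats + as left-associative.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree as nested tuples."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind} at token {pos}")
        text = tokens[pos][1]
        pos += 1
        return text

    def operand():
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            return expect(tokens[pos][0])
        raise SyntaxError(f"expected id or number at token {pos}")

    def expr():
        # expr -> expr + expr is left-recursive, so this sketch loops instead.
        nonlocal pos
        tree = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            pos += 1
            tree = ("+", tree, operand())   # operator as interior node
        return tree

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", target, value)

tokens = [("id", "x3"), ("=", "="), ("id", "y"),
          ("+", "+"), ("number", "3"), (";", ";")]
print(parse_assignment(tokens))
```

The result has operators as interior nodes and operands as leaves, which is the shape of the syntax tree rather than the full parse tree.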
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --
Each "quad" has the form
    operation target source1 source2
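Walking the semantic tree to produce quads can be sketched in a few lines of Python. This is an assumption-laden illustration: the tree encoding as nested tuples and the name generate_quads are hypothetical, and the root is assumed to be an assign node.

```python
def generate_quads(tree):
    """Walk a semantic tree bottom-up and emit (operation, target, source1, source2)."""
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):            # leaf: identifier or constant
            return node
        op, *children = node
        sources = [walk(child) for child in children]
        counter += 1
        temp = f"temp{counter}"              # idealized machine: unlimited temporaries
        padding = ["--"] * (2 - len(sources))
        quads.append((op, temp, *sources, *padding))
        return temp

    op, target, value = tree                 # root is assumed to be ("assign", target, value)
    quads.append((op, target, walk(value), "--"))
    return quads

# Semantic tree for x3 = y + 3 with the conversions inserted (id1 is x3, id2 is y).
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
for quad in generate_quads(tree):
    print(*quad)
```

For the example tree this prints the same four quads as above, in the same order, because operands are visited before the operator that uses them.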
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion itself and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
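The two optimizations can be expressed as a single pass over the quad list. The Python sketch below is illustrative only: the function name optimize is hypothetical, and folding the final assign into the preceding quad is safe here only because the temporary is assumed not to be used anywhere else.

```python
def optimize(quads):
    """Apply the two easy optimizations to (op, target, src1, src2) quads."""
    constants = {}      # temporary name -> real constant computed at compile time
    result = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)     # substitute already-folded constants
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{int(src1)}.0"     # do the conversion now
        elif op == "assign" and result and result[-1][1] == src1:
            prev_op, _, a, b = result.pop()          # write straight into the variable
            result.append((prev_op, target, a, b))
        else:
            result.append((op, target, src1, src2))
    return result

quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
for quad in optimize(quads):
    print(*quad)
```

The four input quads shrink to the two quads given in the list above.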
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, 3.0    // add float
    RTOI R2, R1         // real to int
    ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
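The idea of the parser calling the scanner on demand, and emitting intermediate code as it goes, can be sketched with a Python generator. This is only an illustration under assumptions: the names scanner and parse_sum are hypothetical, lexemes are assumed to be separated by blanks, and the only construct handled is a sum of operands.

```python
def scanner(source):
    """Yield one lexeme at a time; nothing is written to an intermediate file."""
    for lexeme in source.split():
        yield lexeme

def parse_sum(source):
    """Parse  operand (+ operand)*  and emit intermediate code while parsing."""
    tokens = scanner(source)
    code = []
    count = 0
    left = next(tokens)                      # the parser asks the scanner for a token
    for operator in tokens:
        if operator != "+":
            raise SyntaxError(f"expected '+', got {operator!r}")
        right = next(tokens, None)
        if right is None:
            raise SyntaxError("missing operand after '+'")
        count += 1
        temp = f"temp{count}"
        code.append(("add", temp, left, right))   # generated during the same pass
        left = temp
    return code

for quad in parse_sum("a + b + c"):
    print(*quad)
```

Scanning, parsing, and code generation are interleaved here, so the input is read only once even though three phases are at work.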
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical single pass compiler:
    Compiler Driver calls Syntactic Analyzer
    Syntactic Analyzer calls Contextual Analyzer
    Syntactic Analyzer calls Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi pass compiler:
    Compiler Driver calls Syntactic Analyzer (input: source text, output: AST)
    Compiler Driver calls Contextual Analyzer (input: AST, output: decorated AST)
    Compiler Driver calls Code Generator (input: decorated AST, output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
(fig Macro processor: labels MACRO, CALL, NAMTAB, DEFTAB, ARGTAB, PROCESSLINE, DEFINE, EXPAND)
Two pass algorithm
» Pass 1: Recognize macro definitions
» Pass 2: Recognize macro calls
» Nested macro definitions are not allowed
Write an algorithm and flowchart for the macro processor and explain.
Lecture_no26 Loader and Linker
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader: A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation, and loading.
1) It allocates the space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity done by the loader is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program thus becomes ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader, the instruction is read line by line, its machine code is obtained, and it is directly put in the main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of the memory and the loader simply loads the assembled machine instructions into the memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme is a combination of assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme, the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, then the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer must know the memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high level language like C. Such a program may contain calls to certain input/output functions like printf(), scanf(), etc., which are not written by the programmer. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300, while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200, while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore, the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment. (The segment is a unit of information that is treated as an entity, be it a program or data. It is possible to produce multiple program or data segments in a single source file.) The part of a loader which performs relocation is called the relocating loader.
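The adjustment of address fields can be sketched as follows. This Python sketch is illustrative and rests on assumptions: the segment is modelled as a list of integer words translated relative to address 0, and a parallel list of flags (a hypothetical stand-in for relocation information) marks which words hold relative addresses.

```python
def relocate(words, relocation_bits, load_address):
    """Add the load address to every word flagged as a relative address."""
    return [
        word + load_address if is_relative else word
        for word, is_relative in zip(words, relocation_bits)
    ]

# A four-word segment translated relative to 0: words 1 and 2 are addresses.
segment = [12, 0, 4, 40]
flags = [False, True, True, False]

# Load the segment at 151, just past a routine occupying 100 to 150.
print(relocate(segment, flags, 151))
```

Only the flagged words change; constants such as 12 and 40 are copied untouched, which is why relocation needs information from the translator and is more than a block move.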
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The object program handed to a relocating loader is accompanied by information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving the programmer complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer, there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is known as dynamic linking, dynamic loading, or load on call.
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader
A linking loader performs:
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960, and they are still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
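The behaviour of the overlay manager on the example tree can be simulated in a few lines of Python. This is a simplified sketch, not a real overlay manager: the names OVERLAY_TREE, path_to, and OverlayManager are hypothetical, and memory is modelled simply as the one path that is currently loaded.

```python
# The example tree: ROOT calls A and D, A calls B and C, D calls E and F.
OVERLAY_TREE = {"ROOT": ["A", "D"], "A": ["B", "C"], "D": ["E", "F"]}
PARENT = {child: parent for parent, kids in OVERLAY_TREE.items() for child in kids}

def path_to(segment):
    """Return the sequence of segments from the root down to the given segment."""
    path = [segment]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]

class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]             # the root is loaded at program start

    def call(self, target):
        """Ensure the path to target is in memory; return the segments fetched."""
        if target in self.loaded:          # upward call: path is already loaded
            return []
        path = path_to(target)
        fetched = [seg for seg in path if seg not in self.loaded]
        self.loaded = path                 # siblings off the path are overwritten
        return fetched

manager = OverlayManager()
print(path_to("E"))
print(manager.call("B"))    # a call from the root into B needs both A and B
print(manager.call("E"))    # D and E replace A and B in the shared memory
```

Because sibling segments share memory, loading the path to E evicts A and B; a later call into B would have to fetch them again.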
(fig Overlay: ROOT at the top; A and D below ROOT; B and C below A; E and F below D)
The essential difference between a linkage editor and a linking loader
(fig Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into main memory.
-> Let's take a quick look at how our three loaders (absolute loader, relocating loader, and direct linking loader, respectively) handle the above four functions:
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation and Linkage Directory) card
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
- Copy of the object program
- IPLA parameter (Initial Program Load Address): supplied by the OS or programmer to specify the address at which to load the first segment
- PLA counter (Program Load Address): keeps track of each segment location
- External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
- Local External Symbol Array (LESA): establishes the correspondence between the ESD ID used in ESD and RLD cards
ESD (external symbol dictionary)
Data Structures
Algorithm
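The two passes can be sketched in Python as follows. This is a heavily simplified illustration, not the card-based algorithm itself: the function name direct_linking_load and the module format (dictionaries with the hypothetical keys name, text, defs, and rld) are assumptions standing in for the ESD, TXT, and RLD cards, and each word is assumed to occupy one address.

```python
def direct_linking_load(modules, ipla):
    """Two-pass sketch of a direct linking loader.

    Each module is a dict with hypothetical keys:
      name - segment name
      text - list of words (stand-in for TXT cards)
      defs - {symbol: relative address} (stand-in for ESD definitions)
      rld  - [(offset in text, symbol whose address is added)]
    """
    # Pass 1: allocate each segment and record every external symbol's address.
    symbols, bases = {}, {}
    pla = ipla                                   # program load address counter
    for mod in modules:
        bases[mod["name"]] = pla
        symbols[mod["name"]] = pla
        for symbol, relative in mod["defs"].items():
            symbols[symbol] = pla + relative
        pla += len(mod["text"])

    # Pass 2: load the text, then relocate and link the address constants.
    memory = {}
    for mod in modules:
        base = bases[mod["name"]]
        for offset, word in enumerate(mod["text"]):
            memory[base + offset] = word
        for offset, symbol in mod["rld"]:
            if symbol not in symbols:
                raise KeyError(f"unresolved external symbol {symbol}")
            memory[base + offset] += symbols[symbol]
    return memory, symbols

prog_a = {"name": "A", "text": [10, 0, 2], "defs": {},
          "rld": [(1, "B"), (2, "A")]}           # word 1 links to B, word 2 is local
prog_b = {"name": "B", "text": [7, 8], "defs": {"BENTRY": 1}, "rld": []}

memory, symbols = direct_linking_load([prog_a, prog_b], ipla=100)
print(symbols)
print(memory)
```

In this model a local address constant is relocated by adding its own segment's address, and an external reference is linked by adding the address of the referenced symbol, so both cases go through the same table built in pass 1.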
Binary Symbolic Subroutine Loader
The assembler assembles each program and provides the loader with:
-> The object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation, and linking
• It outputs the text to a file rather than to memory
- the output is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java, and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
1048698 Searching in a table Searching is the algorithmic process of finding a particular item in a collection of itemsbull Key symbol namebull Methodsbull Linear search
Linear search works on the design principle of brute force and it is one of the simplest searching algorithm as well Linear search is also called as a sequential search It compares exhaustively every entry in the tablebull Binary searchBinary Search takes a sorted array as the input It works by comparing the target (search key) with the middle element of the array and terminates if it is the same else it divides the array into two sub arrays and continues the search in left (right) sub array if the target is less (greater) than the middle element of the array
bull ordered table
bull HashA hash table is a collection of items which are stored in such a way as to make it easy to find them later Each position of the hash table often called a slot can hold an item and is named by an integer value starting at 0 The mapping between an item and the slot where that item belongs in the hash table is called the hash function
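The three search methods can be sketched in Python as follows. This is only an illustration: the table layout (a list of name/value pairs), the sample symbols, and the character-sum hash function are assumptions made for the example and are not taken from the notes.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve the search range of a table sorted by symbol name."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1


def hash_slot(key, table_size):
    """Toy hash function: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in key) % table_size


# Hypothetical symbol table, already ordered by symbol name.
symbols = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108), ("LOOP", 112)]
print(linear_search(symbols, "GAMMA"))   # 2
print(binary_search(symbols, "LOOP"))    # 3
print(hash_slot("AB", 11))               # 10
```

Linear search needs no ordering, binary search needs the ordered table, and the hash function computes the slot directly from the key.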
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊂ T*, where T* is the set of all finite strings over T.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ =>G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ϵ P of G
where σ and τ represent arbitrary strings.
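The definition of a single-step derivation can be tried out with a short Python sketch. The grammar used here (S -> aSb | ab) and the function name are assumptions chosen for illustration; productions are represented as (α, β) pairs of plain strings with α non-empty.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    mu is split as sigma + alpha + tau; applying a production (alpha, beta)
    to that occurrence of alpha gives gamma = sigma + beta + tau.
    """
    results = []
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.append(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
print(immediate_derivations("aSb", P))   # ['aaSbb', 'aabb']
```

In the first result σ = "a", α = "S", β = "aSb" and τ = "b". A string with no occurrence of any left-hand side, such as "ab", has no immediate derivations.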
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that contains only terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (in this notation Σ is the terminal alphabet and S the start symbol) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
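The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The sketch below is an illustration under stated assumptions, not part of the original notes: productions are (left, right) pairs of strings, upper-case letters stand for nonterminals, lower-case letters for terminals, the empty string stands for ε, and the start symbol is "S".

```python
def grammar_type(productions, start="S"):
    """Return the most restrictive type (0 to 3) whose conditions all hold."""
    def is_start_to_empty(lhs, rhs):
        return lhs == start and rhs == ""

    has_start_to_empty = any(is_start_to_empty(l, r) for l, r in productions)
    others = [(l, r) for l, r in productions if not is_start_to_empty(l, r)]

    # Type 1, condition (a): |alpha| <= |beta| except for S -> epsilon.
    cond_a = all(len(l) <= len(r) for l, r in others)
    # Type 1, condition (b): if S -> epsilon is in P, S is on no right-hand side.
    cond_b = not has_start_to_empty or all(start not in r for _, r in productions)
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(l) == 1 and l.isupper() for l, _ in productions):
        return 1

    # Type 3: beta is a terminal, or a terminal followed by a nonterminal.
    def is_regular(rhs):
        if len(rhs) == 1:
            return rhs.islower()
        return len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper()

    return 3 if all(is_regular(r) for _, r in others) else 2


print(grammar_type([("S", "aA"), ("A", "b")]))               # 3
print(grammar_type([("S", "aA"), ("A", "b"), ("B", "bb")]))  # 2
print(grammar_type([("aE", "EaE")]))                         # 1
print(grammar_type([("Bb", "b")]))                           # 0
```

The last three calls mirror the remarks in the examples: adding B → bb breaks Type 3, aE → EaE breaks Type 2, and Bb → b violates condition (a) of Type 1.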
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; more sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
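As an illustration of how BNF rules are read, the sketch below stores a tiny grammar as a Python dictionary and enumerates the terminal strings it derives. The grammar for binary numbers, the dictionary layout and the depth limit are all assumptions made for this example.

```python
# Hypothetical BNF rules for binary numbers:
#   <number> ::= <digit> | <digit> <number>
#   <digit>  ::= 0 | 1
GRAMMAR = {
    "<number>": [["<digit>"], ["<digit>", "<number>"]],
    "<digit>": [["0"], ["1"]],
}


def expand(symbol, depth):
    """Return all terminal strings derivable from symbol within depth steps."""
    if symbol not in GRAMMAR:        # never appears on a left side: a terminal
        return {symbol}
    if depth == 0:                   # no replacement steps left
        return set()
    results = set()
    for alternative in GRAMMAR[symbol]:      # alternatives separated by |
        partial = {""}
        for part in alternative:             # concatenate each part in order
            partial = {a + b for a in partial for b in expand(part, depth - 1)}
        results |= partial
    return results


print(sorted(expand("<number>", 3)))   # ['0', '00', '01', '1', '10', '11']
```

Each dictionary key plays the role of the symbol on the left of ::=, and each inner list is one alternative of the expression on the right.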
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. A canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system
in the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into its constituent pieces, from which an intermediate form is created.
In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable: about 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The phases are far from equal in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
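The mapping from lexemes to tokens can be sketched in Python with a regular expression, which fits the remark that no recursion is needed. This is a minimal illustration under assumptions: the function name scan, the tuple representation of tokens, and the choice to store the number's text as its attribute are not taken from the notes.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>[=+;]))"
)


def scan(source):
    """Return (tokens, symbol_table) for a tiny assignment language."""
    tokens, symbol_table = [], []
    source = source.rstrip()
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character at position {position}")
        position = match.end()
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            # The attribute is the 1-based index of the symbol-table entry.
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("number"):
            tokens.append(("number", match.group("number")))
        else:
            tokens.append((match.group("symbol"), None))
    return tokens, symbol_table


tokens, table = scan("x3 = y + 3;")
print(tokens)
# [('id', 1), ('=', None), ('id', 2), ('+', None), ('number', '3'), (';', None)]
print(table)   # ['x3', 'y']
```

Scanning "x3   =y+ 3  ;" gives the same token list, while "x 3 = y + 3;" starts with the identifier x followed by the number 3, as the text describes.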
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a parse tree.
[Figure: parse tree for x3 = y + 3;]
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the parse tree.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. For the example, the syntax tree has = at the root with children x3 and +, and the + node has children y and 3.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
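A small recursive-descent style parser shows how tokens can be grouped into a syntax tree. The sketch below is an assumption-laden illustration: it reads expr + expr left to right (so + associates to the left) instead of using the ambiguous rule directly, represents the tree as nested tuples, and silently ignores characters its simple tokenizer does not recognize.

```python
import re


def tokenize(source):
    """Split the source into identifier, number, and symbol lexemes."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source)


def parse_operand(tokens, pos):
    """expr -> number | id (a single leaf of the tree)."""
    if pos >= len(tokens) or not (tokens[pos][0].isalnum() or tokens[pos][0] == "_"):
        raise SyntaxError("operand expected")
    return tokens[pos], pos + 1


def parse_expr(tokens, pos):
    """expr -> expr + expr, read left to right so + associates to the left."""
    tree, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_operand(tokens, pos + 1)
        tree = ("+", tree, right)       # operator as the interior node
    return tree, pos


def parse_assignment(source):
    """asst-stmt -> id = expr ;  Returns a syntax tree as nested tuples."""
    tokens = tokenize(source)
    if (len(tokens) < 2 or tokens[1] != "="
            or not re.fullmatch(r"[A-Za-z_]\w*", tokens[0])):
        raise SyntaxError("expected: id = expr ;")
    tree, pos = parse_expr(tokens, 2)
    if tokens[pos:] != [";"]:
        raise SyntaxError("expected ';' at end of statement")
    return ("=", tokens[0], tree)


print(parse_assignment("x3 = y + 3;"))   # ('=', 'x3', ('+', 'y', '3'))
```

The result has = at the root and + as an interior node, matching the shape of the syntax tree described above.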
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int".
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each "quad" has the form
operation target source1 source2
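Generating quads by walking a tree can be sketched as follows. The tree encoding (nested tuples with the operation name first) and the function name generate_quads are assumptions for this illustration; the tree is the semantic tree of the running example with the conversion operators already inserted.

```python
def generate_quads(tree):
    """Walk an expression tree bottom-up and emit (op, target, src1, src2)."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):           # leaf: identifier or constant
            return node
        op, *operands = node
        sources = [walk(child) for child in operands]
        padding = ["--"] * (2 - len(sources))   # unused source slots
        target = new_temp()
        quads.append((op, target, *sources, *padding))
        return target

    op, target, expr = tree                 # root: ("assign", id, expr)
    quads.append((op, target, walk(expr), "--"))
    return quads


# Semantic tree for the example: id1 = realtoint(id2 + inttoreal(3))
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
for quad in generate_quads(tree):
    print(*quad)
# inttoreal temp1 3 --
# add temp2 id2 temp1
# realtoint temp3 temp2 --
# assign id1 temp3 --
```

Each interior node gets a fresh temporary as its target, which is why the idealized machine is assumed to have an unlimited number of registers.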
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see:
1. Since 3 is a constant, the compiler can perform the int-to-real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
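The two optimizations can be expressed as a small pass over the list of quads. This sketch is deliberately narrow and rests on assumptions: it only folds inttoreal of an integer constant, and it only merges an assign with the quad immediately before it, assuming that the temporary involved is not used anywhere else.

```python
def optimize(quads):
    """Apply the two simplifications described above to a list of quads."""
    constants = {}                  # temporary name -> folded constant
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{src1}.0"     # convert at compile time
            continue
        folded.append((op, target, src1, src2))

    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if (op == "assign" and result and src1.startswith("temp")
                and result[-1][1] == src1):
            # Let the previous quad write directly into the final target.
            prev_op, _, prev1, prev2 = result.pop()
            result.append((prev_op, target, prev1, prev2))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
for quad in optimize(quads):
    print(*quad)
# add temp2 id2 3.0
# realtoint id1 temp2 --
```

A real optimizer would first verify that the merged temporary is dead after the assignment; here that is simply assumed.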
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0    // add float
RTOI R2, R1         // real to int
ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar is ambiguous if there is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver -- calls --> Syntactic Analyzer
Syntactic Analyzer -- calls --> Contextual Analyzer
Syntactic Analyzer -- calls --> Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver -- calls --> Syntactic Analyzer, Contextual Analyzer, Code Generator
Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
- Assembler
- Compiler
- Interpreter
- One-Pass Assembler
- The translation performed by an assembler is essentially a collection of substitutions machine operation code
- for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
- MACRO PROCESSOR
-
- Macro Definition and Usage
-
- Example
- Defining a Macro
- Linking
- Relocation
-
- High-level language(HLL)
- Sorting and Searching
-
- Objectives
-
- Sorting
-
- Context-free grammars and languages
- Context-sensitive grammars and languages
- Context-free grammar Vs Context-sensitive grammar
-
- Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
-
- Lexical Analysis: In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis: In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis: In this step the meaning of the source string is determined.
-
- The Structure of a Compiler
-
- Multiple Phases
- Lexical Analysis (or Scanning)
- Syntax Analysis (or Parsing)
- Semantic Analysis
- Intermediate code generation
- Code optimization
- Code generation
- 127 Symbol-Table Management
- The Grouping of Phases into Passes
-
In this chapter we will understand the concept of linking and loading. As discussed earlier, the source program is converted to an object program by the assembler. The loader is a program which takes this object program, prepares it for execution, and loads the executable code into memory for execution.
Definition of Loader
A loader is a utility program which takes object code as input, prepares it for execution, and loads the executable code into memory. Thus the loader is actually responsible for initiating the execution process.
Functions of Loader
The loader is responsible for activities such as allocation, linking, relocation and loading.
1) It allocates space for the program in memory by calculating the size of the program. This activity is called allocation.
2) It resolves the symbolic references (code/data) between the object modules by assigning all the user subroutine and library subroutine addresses. This activity is called linking.
3) There are some address-dependent locations in the program; such address constants must be adjusted according to the allocated space. This activity is called relocation.
4) Finally, it places all the machine instructions and data of the corresponding programs and subroutines into memory. The program is now ready for execution. This activity is called loading.
Types of Loader schemes
1) "Compile and go" loader: In this type of loader the instruction is read line by line, its machine code is obtained, and it is placed directly in main memory at some known address.
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because an object program is generated when the source program is first translated. If the program is not modified, the loader can use this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader for which already relocated object files are created by the assembler; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions like printf(), scanf() etc., which are not written by the programmer himself. During program execution those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300, while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of the storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200, while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not merely to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
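The idea of adding a constant to each relative address can be illustrated with a minimal Python sketch. The segment layout (a list of integer words) and the list of offsets that hold relative addresses are assumptions made only for this illustration; they are not a format defined in these notes.

```python
def relocate(segment, relocation_offsets, load_address):
    """Return a relocated copy of `segment`.

    segment            -- list of integer words, translated as if loaded at address 0
    relocation_offsets -- positions of the words that hold relative addresses
    load_address       -- constant added to every relative address
    """
    relocated = list(segment)            # leave the translated copy untouched
    for offset in relocation_offsets:
        relocated[offset] += load_address
    return relocated


# Hypothetical 4-word segment: words 1 and 3 are relative addresses,
# words 0 and 2 are ordinary data that must not be changed.
segment = [7, 2, 9, 0]
print(relocate(segment, [1, 3], 500))    # [7, 502, 9, 500]
```

Only the words flagged as addresses change, which is why relocation is an adjustment of address fields rather than a plain copy of the program.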
Relocating Loader
To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The output of the assembler for a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader
It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
Performs linking prior to load time. A linkage editor produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution.
• Dynamic linking
The linking function is performed at execution time.
-> The linking function is postponed until execution time: a subroutine is loaded and linked to the rest of the program when it is first called. This is also called dynamic loading or load on call.
-> It allows several executing programs to share one copy of a subroutine or library.
• Linking loader
A linking loader
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls do not require any linker help, since the entire path from the root is already loaded.
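The behaviour of the overlay manager on the example tree can be sketched in Python as follows. The class name `OverlayManager`, the `PARENT` table and the simplification that only one path is resident at a time are assumptions for illustration, not a description of any particular linker.

```python
# Overlay tree from the example: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the sequence of segments from the root down to `segment`."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]


class OverlayManager:
    """Simplified model: exactly one path of the tree is resident at a time."""

    def __init__(self):
        self.loaded = ["ROOT"]           # the root is loaded at program start

    def call(self, target):
        """Ensure the path to `target` is resident; return the segments loaded."""
        if target in self.loaded:        # upward call or already resident
            return []
        needed = path_to(target)
        newly_loaded = [s for s in needed if s not in self.loaded]
        self.loaded = needed             # siblings sharing memory are overwritten
        return newly_loaded


manager = OverlayManager()
print(manager.call("B"))   # ['A', 'B']  root calls into B: both A and B are loaded
print(manager.call("A"))   # []          upward call, path already resident
print(manager.call("E"))   # ['D', 'E']  D overwrites A, E overwrites B
```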
(fig Overlay: ROOT with child segments A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig: A linking loader takes the object program(s) and a library and places the linked program directly in memory. A linkage editor takes the object program(s) and a library and produces a linked program, which a relocating loader later loads into memory.)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine is loaded into main memory.
-> Let us take a quick look at who performs each of the above four functions for our three loaders: the absolute loader, the relocating loader and the direct linking loader, respectively.

Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
Pass 1 uses:
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation & Linkage Directory)
7 END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Pass 2 uses:
• A copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD and RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
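The two passes described above can be sketched in Python under some loud assumptions: each object module is modelled as a dictionary with a hypothetical layout (`name`, `length`, `defs` standing in for ESD entries, `text` for TXT cards, `rld` for RLD entries), and memory is a dictionary from address to word. This is a sketch of the idea, not the card-level algorithm.

```python
def pass1(modules, ipla):
    """Assign a load address to every segment and build the symbol table."""
    symbol_table = {}
    pla = ipla                                   # Program Load Address counter
    for module in modules:
        symbol_table[module["name"]] = pla       # segment start address
        for symbol, relative in module["defs"].items():
            symbol_table[symbol] = pla + relative
        pla += module["length"]
    return symbol_table


def pass2(modules, symbol_table):
    """Load the text, then relocate and link using the RLD entries."""
    memory = {}
    for module in modules:
        base = symbol_table[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word
        # Each RLD entry says: add the address of `symbol` to the word at `offset`.
        # If `symbol` is the segment itself this is relocation, otherwise linking.
        for offset, symbol in module["rld"]:
            memory[base + offset] += symbol_table[symbol]
    return memory


# Hypothetical modules: A refers to external segment B and to its own word 2.
A = {"name": "A", "length": 3, "defs": {},
     "text": [10, 0, 2], "rld": [(1, "B"), (2, "A")]}
B = {"name": "B", "length": 2, "defs": {"BENTRY": 1},
     "text": [20, 21], "rld": []}

table = pass1([A, B], ipla=100)
print(table)                    # {'A': 100, 'B': 103, 'BENTRY': 104}
print(pass2([A, B], table))     # {100: 10, 101: 103, 102: 102, 103: 20, 104: 21}
```

Two passes are needed because A refers to B before B's address is known; pass 1 fixes every address, so pass 2 can patch every reference.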
Binary Symbolic Subroutine Loader
The assembler assembles each program and provides the loader with:
-> The object program plus relocation information
-> Information, prefixed to the program, about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader: allocation, relocation and linking.
• It outputs the text to a file rather than to memory; this file is called a load module.
Lecture_no30 Programming Language
The role of Programming Language
-> Important not only in designing and implementing a system but also in teaching and supporting it
-> The language designer's balance:
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as its input. It compares the target (search key) with the middle element of the array and terminates if they are equal; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element. It therefore requires an ordered table.
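As a minimal sketch of binary search over an ordered table, the following Python function looks up a symbol name in a sorted list. The symbol names are made up for the example.

```python
def binary_search(table, key):
    """Return the index of `key` in the sorted list `table`, or None if absent."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        if table[mid] == key:
            return mid
        if key < table[mid]:
            high = mid - 1      # continue in the left subarray
        else:
            low = mid + 1       # continue in the right subarray
    return None


# Hypothetical ordered symbol table (sorted by symbol name).
symbols = ["ALPHA", "BETA", "LOOP", "START"]
print(binary_search(symbols, "LOOP"))    # 2
print(binary_search(symbols, "GAMMA"))   # None
```

Each comparison halves the remaining range, so a table of n entries needs about log2(n) comparisons, against up to n for linear search.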
• Hash
A hash table is a collection of items stored in such a way as to make them easy to find later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
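A small Python sketch of such a table follows. The hash function (sum of character codes modulo the table size) and the use of linear probing to resolve collisions are assumptions chosen for illustration; the notes do not prescribe either.

```python
class SymbolHashTable:
    """Fixed-size hash table with slots numbered from 0, using linear probing."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size           # each slot holds (key, value) or None

    def _hash(self, key):
        # Illustrative hash function: sum of character codes modulo table size.
        return sum(ord(ch) for ch in key) % self.size

    def put(self, key, value):
        """Store the pair and return the slot number used."""
        slot = self._hash(key)
        for _ in range(self.size):
            if self.slots[slot] is None or self.slots[slot][0] == key:
                self.slots[slot] = (key, value)
                return slot
            slot = (slot + 1) % self.size    # collision: try the next slot
        raise OverflowError("hash table is full")

    def get(self, key):
        """Return the value stored for `key`, or None if it is not present."""
        slot = self._hash(key)
        for _ in range(self.size):
            if self.slots[slot] is None:
                return None
            if self.slots[slot][0] == key:
                return self.slots[slot][1]
            slot = (slot + 1) % self.size
        return None


table = SymbolHashTable()
print(table.put("LOOP", 100))   # 6
print(table.put("POOL", 200))   # 7  (same hash as LOOP, so it probes onward)
print(table.get("POOL"))        # 200
```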
Sorting
We will discuss four comparison-sort algorithms:
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
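As one example from this list, insertion sort can be sketched in Python as follows. It returns a sorted copy rather than sorting in place, which is a choice made for this illustration.

```python
def insertion_sort(items):
    """Return a sorted copy of `items` using insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one place to the right to make room.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


print(insertion_sort([5, 2, 9, 1, 5]))   # [1, 2, 5, 5, 9]
```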
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ =>G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
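The definition of a single-step derivation can be made concrete with a short Python sketch. It assumes, purely for illustration, that every grammar symbol is one character, so that sentential forms can be held in ordinary strings; the example grammar with productions S -> aSb and S -> ab is likewise hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that can be derived from `mu` in one step.

    productions -- list of (alpha, beta) pairs, each meaning alpha -> beta
    """
    results = []
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                   # arbitrary string to the left
            tau = mu[start + len(alpha):]        # arbitrary string to the right
            results.append(sigma + beta + tau)   # gamma = sigma beta tau
            start = mu.find(alpha, start + 1)    # next occurrence of alpha
    return results


# Hypothetical grammar: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
print(immediate_derivations("aSb", P))   # ['aaSbb', 'aabb']
```

Here μ = aSb splits as σ = a, α = S, τ = b, so the two productions give γ = aaSbb and γ = aabb.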
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b If S → ε is in P, then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1215 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
Adding a production rule of the form Bb → b to the modified grammar will result in a non-Type 1 grammar, because it violates condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
Adding a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each production rule α → β that is not of the form S → ε satisfies one of the following conditions:
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
Adding a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
(Figure: Hierarchy of grammars)
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators"
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program --> Compiler --> Target program
Compiler Analysis-Synthesis Model
• In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the source program is built.
• In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler -
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler -
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
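The token breakdown above can be reproduced with a small scanner. What follows is only a minimal sketch in Python, not part of the original notes: the token names (ID, ASSIGN, PLUS, TIMES, NUMBER) and the function name tokenize are assumptions chosen for illustration.

```python
import re

# Token names are illustrative; a real scanner would define many more.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token_name, lexeme) pairs, dropping blanks."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling tokenize("total = count + rate * 10") yields the seven tokens listed above, in the same order.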
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a parse tree.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition that the parse tree expresses.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. In the syntax tree for the example, = is the root with children x3 and +, and + has children y and 3.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
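To make the grouping concrete, here is a small recursive-descent sketch in Python that builds a syntax tree for the example statement. It is an illustration only and rests on several assumptions: the names scan and parse_assignment are hypothetical, the tree is encoded as nested tuples, and the left-recursive rule for expr is replaced by a loop, which is a common way to hand-code such a rule rather than anything prescribed by the notes.

```python
import re

def scan(text):
    """Split the input into lexemes: identifiers, numbers, and the symbols = + ;.

    Characters outside this tiny language are silently ignored in this sketch.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", text)

def parse_assignment(text):
    """Parse `id = expr ;` and return a syntax tree as nested tuples."""
    tokens = scan(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def operand():
        tok = advance()
        if tok.isdigit():
            return ("number", int(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr():
        # expr -> operand ('+' operand)*, a loop standing in for the left recursion
        node = operand()
        while peek() == "+":
            advance("+")
            node = ("+", node, operand())
        return node

    target = operand()
    if target[0] != "id":
        raise SyntaxError("left-hand side must be an identifier")
    advance("=")
    value = expr()
    advance(";")
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return ("=", target, value)
```

For the input x3 = y + 3; the result has = at the root, the identifier x3 as its left child, and a + node with children y and 3 as its right child. The input x 3 = y + 3; is rejected, because the scanner delivers x and 3 as two separate lexemes.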
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one operand) conversion operators "int to real" and "real to int".
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each "quad" has the form
operation target source1 source2
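The walk over the semantic tree that produces these quads can be sketched as follows. This is a hedged illustration, not the notes' own algorithm: the function name generate_quads, the nested-tuple tree encoding, and the use of None for an unused source operand (written -- above) are all assumptions.

```python
from itertools import count

def generate_quads(tree):
    """Walk an annotated syntax tree and emit (operation, target, source1, source2) quads."""
    quads = []
    temp_numbers = count(1)

    def new_temp():
        return f"temp{next(temp_numbers)}"

    def walk(node):
        kind = node[0]
        if kind in ("id", "number"):
            return node[1]                  # leaves are used directly as operands
        if kind in ("inttoreal", "realtoint"):
            source = walk(node[1])
            target = new_temp()
            quads.append((kind, target, source, None))
            return target
        if kind == "add":
            left = walk(node[1])
            right = walk(node[2])
            target = new_temp()
            quads.append(("add", target, left, right))
            return target
        raise ValueError(f"unknown node kind {kind!r}")

    if tree[0] != "assign":
        raise ValueError("expected an assignment at the root")
    _, target, value = tree
    quads.append(("assign", walk(target), walk(value), None))
    return quads

# Semantic tree for the running example, with the two conversions already inserted.
EXAMPLE_TREE = (
    "assign",
    ("id", "id1"),
    ("realtoint", ("add", ("id", "id2"), ("inttoreal", ("number", 3)))),
)
```

Applied to EXAMPLE_TREE, the function emits the same four quads as above, in the same order.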
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
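Both rewrites can be expressed as one pass each over the list of quads. The sketch below is an assumption-laden illustration: the function name optimize is hypothetical, quads are (operation, target, source1, source2) tuples with None for an unused operand, and the second rewrite assumes that a temporary is never used again after it has been copied by an assign.

```python
def optimize(quads):
    """Apply two simple rewrites to a list of quads and return the new list."""
    # 1. Perform inttoreal on an integer constant at compile time and
    #    substitute the resulting real constant wherever the temporary is used.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and isinstance(src1, int):
            constants[target] = float(src1)
            continue
        folded.append((op, target, src1, src2))

    # 2. Merge "op tempN ..." followed by "assign dest tempN" into "op dest ...".
    #    Assumes tempN is not used anywhere else afterwards.
    result = []
    for quad in folded:
        op, target, src1, src2 = quad
        copies_previous_temp = (
            op == "assign"
            and result
            and result[-1][1] == src1
            and str(src1).startswith("temp")
        )
        if copies_previous_temp:
            prev_op, _, prev1, prev2 = result.pop()
            result.append((prev_op, target, prev1, prev2))
        else:
            result.append(quad)
    return result
```

Run on the four quads of the running example, this leaves the two quads shown above.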
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, #3.0    // add float
RTOI R2, R1          // real to int
ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
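A symbol table of this kind can be sketched as a small class. Everything named here is an assumption made for illustration: the class SymbolTable, its methods, the type sizes, and the choice to express the storage location as an offset into the compiled program's data area.

```python
class SymbolTable:
    """Maps each identifier to an entry shared by all compiler phases."""

    SIZES = {"int": 4, "real": 8}   # assumed sizes in bytes, for illustration only

    def __init__(self):
        self.entries = []        # entry i-1 holds the symbol with index i
        self.index_of = {}       # name -> 1-based index
        self.next_offset = 0     # next free offset in the compiled program's data area

    def lookup_or_insert(self, name):
        """Called by the scanner: return the index used in the token <id,index>."""
        if name not in self.index_of:
            self.entries.append({"name": name, "type": None, "offset": None})
            self.index_of[name] = len(self.entries)
        return self.index_of[name]

    def declare(self, name, type_name):
        """Called once the type is known: record it and assign a storage location."""
        entry = self.entries[self.lookup_or_insert(name) - 1]
        entry["type"] = type_name
        entry["offset"] = self.next_offset
        self.next_offset += self.SIZES[type_name]
        return entry
```

With this sketch, scanning x3 and then y gives the indices 1 and 2 used in the tokens <id,1> and <id,2>. The offset recorded by declare describes where the compiled program will keep the variable, not where the compiler keeps the entry.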
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code instruction, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes -
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                  calls
Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    calls                  calls                  calls
Syntactic Analyzer     Contextual Analyzer    Code Generator
Input/output: Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Advantages
• This scheme is simple to implement, because the assembler is placed in one part of memory and the loader simply loads the assembled machine instructions into memory.
Disadvantages
• In this scheme some portion of memory is occupied by the assembler, which is simply a waste of memory. As this scheme combines assembler and loader activities, the combined program occupies a large block of memory.
2) General Loader Scheme: In this loader scheme the source program is converted to an object program by some translator (assembler).
Advantages
• The program need not be retranslated each time it is run. This is because when the source program is first translated, an object program is generated. If the program is not modified, the loader can make use of this object program to convert it to executable form.
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages
• It is simple to implement.
Disadvantages
• In this scheme it is the programmer's duty to adjust all the inter-segment addresses and manually do the linking activity. For that, the programmer needs to know the memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls on certain input/output functions like printf(), scanf(), etc., which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300, while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200, while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore, the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called the relocating loader.
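The adjustment of address fields can be shown in a few lines. The sketch below assumes a deliberately simple object format that is not taken from the notes: the segment is a list of words, and a parallel list of relocation bits marks which words hold relative addresses. The name relocate is hypothetical.

```python
def relocate(words, relocation_bits, load_address):
    """Return a copy of `words` with load_address added to each relocatable word.

    relocation_bits[i] is True when words[i] holds a relative address.
    """
    if len(words) != len(relocation_bits):
        raise ValueError("one relocation bit is needed per word")
    return [
        word + load_address if is_relative else word
        for word, is_relative in zip(words, relocation_bits)
    ]
```

For example, relocate([10, 5, 0], [True, False, True], 300) returns [310, 5, 300]: the two address fields are adjusted by the constant 300, while the plain data word 5 is left alone.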
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The input to a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor -
Performs linking prior to load time. A linkage editor:
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
• Dynamic linking -
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called (dynamic linking, dynamic loading, or load on call)
-> Allows several executing programs to share one copy of a subroutine or library
• Linking loader -
A linking loader performs:
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where it can be a good way to squeeze programs into the available memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
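The bookkeeping done by the overlay manager on a downward call can be sketched as follows. This is an illustrative model only: PARENT encodes the example tree, and the names path_to and call_into are assumptions. The sketch tracks which segments are resident and does not model actual memory or upward calls.

```python
# Overlay tree from the example: child -> parent (ROOT has no parent).
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Return the list of segments from the root down to `segment`."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]

def call_into(target, loaded):
    """Ensure the path to `target` is loaded; return the segments newly loaded.

    `loaded` is the currently resident path, a list starting at ROOT, and is
    updated in place. Segments off the new path are dropped, because sibling
    segments share the same memory.
    """
    new_path = path_to(target)
    if loaded[:len(new_path)] == new_path:
        return []                      # the whole path is already resident
    newly_loaded = [seg for seg in new_path if seg not in loaded]
    loaded[:] = new_path
    return newly_loaded
```

Starting from loaded = ["ROOT"], a call into B loads A and B. A later call from the root into E loads D and E, which overwrite A and B because D shares memory with A.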
(fig Overlay: ROOT with children A and D; A with children B and C; D with children E and F)
The essential difference between a linkage editor and a linking loader
(fig: Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, execution must be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at who performs each of the four functions for our three loaders: the absolute loader, the relocating loader, and the direct linking loader.
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages -
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g., BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it an error, unless a special mechanism (such as declaring B as an external symbol) is provided.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
Pass 1 works with:
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Pass 2 works with:
• A copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment's location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD IDs used in ESD & RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
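The two passes can be sketched in Python under a much simplified object format. The notes do not spell out the algorithm, so what follows is an assumption-based illustration rather than the actual card-based design: each segment is a dictionary, "defs" plays the role of the ESD entries, "rld" lists the address constants to adjust, and memory is modeled as a dictionary from address to word. The function name link_and_load is hypothetical.

```python
def link_and_load(segments, ipla):
    """Two-pass sketch: return (memory, symbol_table).

    Each segment is a dict with:
      "name" - segment name (also entered as an external symbol)
      "text" - list of words (the TXT information)
      "defs" - {symbol: relative offset} defined by the segment (the ESD information)
      "rld"  - list of (offset, symbol): add the symbol's address to text[offset]
    """
    # Pass 1: allocate each segment and fill in the external symbol values.
    symbol_table = {}
    pla = ipla                               # program load address counter
    for seg in segments:
        seg_symbols = {seg["name"]: 0, **seg["defs"]}
        for symbol, offset in seg_symbols.items():
            if symbol in symbol_table:
                raise ValueError(f"duplicate external symbol {symbol!r}")
            symbol_table[symbol] = pla + offset
        pla += len(seg["text"])

    # Pass 2: load the text and adjust every address constant named in the RLD.
    memory = {}
    pla = ipla
    for seg in segments:
        words = list(seg["text"])
        for offset, symbol in seg["rld"]:
            if symbol not in symbol_table:
                raise ValueError(f"undefined external symbol {symbol!r}")
            words[offset] += symbol_table[symbol]
        for i, word in enumerate(words):
            memory[pla + i] = word
        pla += len(words)
    return memory, symbol_table
```

An RLD entry that names the segment itself relocates a local address, while an entry that names a symbol from another segment resolves an external reference, so relocation and linking are handled by the same addition in pass 2.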
Binary Symbolic Subroutine Loader - The assembler assembles the program and provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder -
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation, and linking
• Outputs the text to a file rather than memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java, and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
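The three search methods described above can be sketched briefly in Python. The function names are assumptions, and the hash function shown (sum of character codes modulo the table size) is just one simple choice for illustration, not one prescribed by the notes.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, entry in enumerate(table):
        if entry == key:
            return index
    return -1

def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table; return the index of key or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_table[middle] == key:
            return middle
        if key < sorted_table[middle]:
            high = middle - 1      # continue in the left sub-array
        else:
            low = middle + 1       # continue in the right sub-array
    return -1

def hash_slot(name, table_size):
    """A simple hash function: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in name) % table_size
```

Linear search works on any table, binary search requires the table to be ordered by key, and hash_slot maps a symbol name directly to a slot number between 0 and table_size - 1.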
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e., the alphabet) that can be used for constructing formulas (i.e., finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought and there would be no need to process anything; that would hardly be a useful model. Furthermore, we are finite beings, and we should keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of the formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• The syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language).
• The semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this as L ⊂ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of Sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
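The definition of a single-step derivation can be tried out directly. The sketch below is an illustration under stated assumptions: each symbol is one character, a production α -> β is a pair of non-empty-left-side strings, and the function name and the sample grammar (S -> aSb, S -> ab) are hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma with mu -> gamma in a single step.

    productions is a list of (alpha, beta) pairs meaning alpha -> beta.
    For each occurrence of alpha in mu we split mu = sigma alpha tau
    and produce gamma = sigma beta tau.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar generating a^n b^n.
P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", P)))   # ['aaSbb', 'aabb']
```

A string with no occurrence of any left-hand side, such as "ab" here, has no immediate derivations, which is why it counts as a finished sentence rather than an intermediate form.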
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15: The grammar of the earlier example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with new ones, where E is assumed to be a new nonterminal symbol (production rules not reproduced).
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16: The grammar of the earlier example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with new ones, where E is assumed to be a new nonterminal symbol (production rules not reproduced).
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17: The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar (production rules not reproduced).
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure Hierarchy of grammars
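The conditions that separate the grammar types can be checked mechanically. The following is a minimal sketch, assuming one character per symbol, productions given as (α, β) string pairs, and the empty string standing for the empty word; the function name `grammar_type` and the sample grammars are hypothetical.

```python
def grammar_type(nonterminals, productions, start="S"):
    """Return the most restrictive type (0-3) whose conditions hold."""
    def is_start_to_empty(alpha, beta):
        return alpha == start and beta == ""

    has_start_to_empty = any(is_start_to_empty(a, b) for a, b in productions)
    others = [(a, b) for a, b in productions if not is_start_to_empty(a, b)]

    # Type 1: (a) no rule shrinks the string, (b) S -> empty implies S
    # never appears on a right-hand side.
    cond_a = all(len(a) <= len(b) for a, b in others)
    cond_b = not (has_start_to_empty
                  and any(start in b for _, b in productions))
    if not (cond_a and cond_b):
        return 0
    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a in nonterminals for a, _ in productions):
        return 1

    # Type 3: right-hand side is a terminal, optionally followed by
    # one nonterminal.
    def right_linear(beta):
        if len(beta) == 1:
            return beta not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    if not all(right_linear(b) for _, b in others):
        return 2
    return 3


print(grammar_type({"S", "A"}, [("S", "aA"), ("A", "b"), ("S", "")]))   # 3
print(grammar_type({"S", "A", "B"}, [("S", "aA"), ("A", "Ba")]))        # 2
```

The second call shows the effect described in the text: a rule of the form A → Ba keeps the grammar context-free (Type 2) but breaks the Type 3 conditions.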
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
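As a small illustration of the link between context-free languages and pushdown automata, the sketch below recognizes the language generated by the hypothetical grammar S → aSb | ab, using an explicit stack. It is an assumed example, not taken from these notes; this particular language happens to need only a deterministic machine.

```python
def accepts_anbn(text):
    """Recognize { a^n b^n : n >= 1 } with an explicit stack."""
    stack = []
    seen_b = False
    for ch in text:
        if ch == "a" and not seen_b:
            stack.append("a")        # push one marker per leading 'a'
        elif ch == "b" and stack:
            seen_b = True
            stack.pop()              # match each 'b' against one 'a'
        else:
            return False             # 'a' after 'b', extra 'b', or bad symbol
    return seen_b and not stack      # every 'a' matched, at least one pair


print(accepts_anbn("aabb"))   # True
print(accepts_anbn("aab"))    # False
```

A finite automaton cannot do this job, because it has no way to remember an unbounded count of a's; the stack is exactly the extra memory a pushdown automaton adds.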
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear-bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
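To make the notation concrete, here is a minimal sketch that reads a few BNF rules and separates terminals from nonterminals using the rule stated above (a symbol is a nonterminal exactly when it appears on a left side). It assumes the usual `::=` separator, one rule per line, and symbols separated by blanks; the sample grammar and the name `parse_bnf` are hypothetical.

```python
def parse_bnf(text):
    """Map each nonterminal to its list of alternatives (lists of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alt.split() for alt in right.split("|")]
        rules.setdefault(left.strip(), []).extend(alternatives)
    return rules


GRAMMAR = """
<expr> ::= <term> | <expr> + <term>
<term> ::= id | number
"""

rules = parse_bnf(GRAMMAR)
nonterminals = set(rules)
all_symbols = {s for alts in rules.values() for alt in alts for s in alt}
terminals = all_symbols - nonterminals

print(rules["<term>"])      # [['id'], ['number']]
print(sorted(terminals))    # ['+', 'id', 'number']
```

Because the vertical bar is used to split alternatives, this sketch cannot represent a grammar in which | is itself a terminal; a fuller reader would need a quoting convention.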
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators convert texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings; the result is an intermediate form of the source program. In the synthesis part, this intermediate form is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
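The token breakdown listed above can be reproduced with a small regular-expression scanner. This is a sketch under assumptions: the token names (`identifier`, `assign`, and so on) and the function name `scan` are chosen for illustration, and only the symbols needed for this one statement are handled.

```python
import re

# One named pattern per kind of token; blanks are matched but discarded.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"[0-9]+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return the list of (token kind, text) pairs found in source."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise ValueError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


for token in scan("total = count + rate * 10"):
    print(token)
```

Running it prints seven pairs, one per token in the list above, starting with `('identifier', 'total')` and ending with `('number', '10')`.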
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called the parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
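The mapping from lexemes to <token-name, attribute-value> pairs, including the symbol-table index for identifiers, can be sketched as follows. The names here are hypothetical, the statement is assumed to end with a semicolon as in C, and numbers are simply kept as text, which sidesteps the "what is the something" question raised in item 5.

```python
import re

# Identifiers, unsigned integers, and the three symbols used in the example.
LEXEME = re.compile(r"[A-Za-z_]\w*|\d+|[=+;]")


def tokenize(source):
    """Return (tokens, symbol_table) for a simple assignment statement."""
    symbol_table = []      # position i-1 holds the i-th distinct identifier
    tokens = []
    # findall skips blanks (and, as a simplification, any unknown character).
    for lexeme in LEXEME.findall(source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))   # second component is ignored
    return tokens, symbol_table


tokens, table = tokenize("x3   =y+ 3  ;")
print(tokens)
print(table)    # ['x3', 'y']
```

Differently spaced versions of the statement give the same token list, while `x 3 = y + 3;` does not, because the blank splits `x3` into an identifier `x` followed by the number `3`.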
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
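A syntax tree of the kind described, with the operator as the interior node and the operands as children, can be built by a small hand-written parser. This sketch makes several assumptions: the input is already a list of lexemes, only + is supported, the left-recursive rule for expr is handled by a loop, and the tree is represented as nested tuples. None of these choices comes from the notes themselves.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree as nested tuples."""
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def expect(predicate, description):
        nonlocal position
        token = peek()
        if token is None or not predicate(token):
            raise SyntaxError(f"expected {description}, found {token!r}")
        position += 1
        return token

    def is_identifier(token):
        return token[0].isalpha()

    def is_operand(token):
        return token[0].isalnum()

    def parse_expr():
        # expr -> operand { + operand }: an iterative form of the
        # recursive rule expr -> expr + expr (left-associative).
        tree = expect(is_operand, "an identifier or number")
        while peek() == "+":
            expect(lambda t: t == "+", "'+'")
            right = expect(is_operand, "an identifier or number")
            tree = ("+", tree, right)
        return tree

    target = expect(is_identifier, "an identifier")
    expect(lambda t: t == "=", "'='")
    value = parse_expr()
    expect(lambda t: t == ";", "';'")
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return ("=", target, value)


print(parse_assignment(["x3", "=", "y", "+", "3", ";"]))
# ('=', 'x3', ('+', 'y', '3'))
```

The returned tuple mirrors the syntax tree: `=` at the root, with `x3` and the `+` subtree as its children.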
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
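Walking the semantic tree to produce quads can be sketched as a short recursive function. The tree representation (nested tuples with explicit conversion nodes), the use of None for an unused source, and the name `generate_quads` are assumptions made for illustration.

```python
def generate_quads(tree):
    """Walk a semantic tree and emit (operation, target, source1, source2)."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):              # leaf: identifier or constant
            return node
        operation, *children = node
        sources = [walk(child) for child in children]   # operands first
        target = new_temp()
        sources += [None] * (2 - len(sources))          # pad unary operations
        quads.append((operation, target, sources[0], sources[1]))
        return target

    _, destination, expression = tree          # ("assign", target, expr)
    quads.append(("assign", destination, walk(expression), None))
    return quads


# Semantic tree for x3 = y + 3 with x3 an integer (id1) and y a real (id2).
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
for quad in generate_quads(tree):
    print(quad)
```

The four quads printed are the same as in the listing above, with None where the listing shows "--".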
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see:
1. Since 3 is a constant, the compiler can perform the int-to-real conversion itself and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
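The two optimizations just described can be sketched as simple rewrites over a list of quads. This is an illustrative sketch, not a general optimizer: it assumes quads are (operation, target, source1, source2) tuples, that temporaries are named temp1, temp2, and so on, and that a temporary copied by an assign is not used again afterwards.

```python
def optimize(quads):
    """Fold constant int-to-real conversions and merge trailing assigns."""
    constants = {}      # temporary name -> constant computed at compile time
    folded = []
    for operation, target, source1, source2 in quads:
        if operation == "inttoreal" and source1.isdigit():
            constants[target] = source1 + ".0"   # e.g. "3" becomes "3.0"
            continue                              # the quad disappears
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        folded.append((operation, target, source1, source2))

    result = []
    for quad in folded:
        operation, target, source1, _ = quad
        # "op tempN ..." followed by "assign x tempN" becomes "op x ...".
        if (operation == "assign" and result
                and result[-1][1] == source1
                and source1.startswith("temp")):
            previous = result.pop()
            result.append((previous[0], target) + previous[2:])
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```

For the example, four quads shrink to two: an add with the constant 3.0 and a realtoint that writes straight into id1.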
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 (add float)
RTOI R2, R1 (real to int)
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
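A symbol table holding type information and storage location can be as simple as a dictionary keyed by variable name. In this sketch the class name, the type sizes (4 bytes for int, 8 for real), and the sequential offset allocation are all assumptions for illustration. The offsets describe where the compiled program will keep each variable, not where the compiler keeps anything.

```python
class SymbolTable:
    SIZES = {"int": 4, "real": 8}   # assumed sizes in bytes

    def __init__(self):
        self.entries = {}
        self.next_offset = 0        # next free location in the data area

    def declare(self, name, type_name):
        """Record a variable's type and assign it a storage location."""
        if name in self.entries:
            raise KeyError(f"{name} already declared")
        self.entries[name] = {"type": type_name, "offset": self.next_offset}
        self.next_offset += self.SIZES[type_name]
        return self.entries[name]

    def lookup(self, name):
        """Return the entry for name; later phases use this information."""
        return self.entries[name]


table = SymbolTable()
table.declare("x3", "int")
table.declare("y", "real")
print(table.lookup("y"))   # {'type': 'real', 'offset': 4}
```

The semantic analyzer would consult the type field to decide where conversions are needed, and the code generator would use the offset when emitting loads and stores.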
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving it into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar for which there is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
  calls -> Syntactic Analyzer
    calls -> Contextual Analyzer
    calls -> Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
  calls -> Syntactic Analyzer (input: Source Text; output: AST)
  calls -> Contextual Analyzer (input: AST; output: Decorated AST)
  calls -> Code Generator (input: Decorated AST; output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
3) Absolute Loader: An absolute loader is a kind of loader in which relocated object files are created; the loader accepts these files and places them at the specified locations in memory. This type of loader is called absolute because no relocation information is needed.
As the name itself says, an absolute loader does nothing other than loading.
Advantages: It is simple to implement.
Disadvantages: In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions like printf(), scanf(), etc., which are not written by the programmer. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of the storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
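The idea of adding a constant to each relative address can be sketched in a few lines. The representation below (a list of words plus a list of offsets that hold relative addresses) and the name relocate are assumptions chosen for illustration only.

```python
def relocate(words, relocation_offsets, load_address):
    """Return a copy of the segment with load_address added to each address field."""
    relocated = list(words)
    for offset in relocation_offsets:
        # Only the words flagged as address fields are adjusted.
        relocated[offset] += load_address
    return relocated


# Segment translated as if it started at address 0.
segment = [10, 3, 99, 0]          # words 1 and 3 hold relative addresses
loaded = relocate(segment, [1, 3], load_address=500)
print(loaded)                     # [10, 503, 99, 500]
```

Words that are not address fields (10 and 99 here) are left untouched, which is why relocation is an adjustment of address fields rather than a movement of the program.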
Relocating Loader: To avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer, the general class of relocating loaders was introduced.
The assembler's output for a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor
A linkage editor performs linking prior to load time.
» It produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution.
Dynamic linking
The linking function is performed at execution time.
-> Linking is postponed until execution time.
» A subroutine is loaded and linked to the rest of the program when it is first called. This is also known as dynamic loading or load on call.
-> It allows several executing programs to share one copy of a subroutine or library.
• Linking loader
A linking loader:
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, and D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
(Figure: Overlay tree. ROOT at the top; A and D below ROOT; B and C below A; E and F below D.)
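The behaviour of the overlay manager for the example tree can be sketched as below. This is a simplified model under stated assumptions: the PARENT table, the path_to helper and the OverlayManager class are hypothetical names, and "loading" is modelled only as set membership rather than real memory transfers.

```python
# Overlay tree from the example: child -> parent (ROOT has no parent).
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}


def path_to(segment):
    """Return the list of segments from ROOT down to `segment`."""
    path = [segment]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return path[::-1]


class OverlayManager:
    def __init__(self):
        self.loaded = {"ROOT"}          # the root is loaded at program start

    def call(self, target):
        """Make the whole path to `target` resident; return the segments newly loaded."""
        wanted = path_to(target)
        newly = [s for s in wanted if s not in self.loaded]
        # Siblings share memory: keep only segments on the wanted path or below the target.
        kept = {s for s in self.loaded if s in wanted or target in path_to(s)}
        self.loaded = kept | set(newly)
        return newly


manager = OverlayManager()
print(manager.call("B"))   # ['A', 'B']
print(manager.call("E"))   # ['D', 'E']  (A and B are overwritten)
```

An upward call, such as a routine in B calling the root, needs no work in this model because every segment on the path is already resident.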
The essential difference between a linkage editor and a linking loader
(Figure: with a linking loader, the object program(s) and the library are read by the linking loader, which places the linked program directly in memory. With a linkage editor, the object program(s) and the library are read by the linkage editor, which produces a linked program; a relocating loader later places that linked program in memory.)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case the code can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages:
1. The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages: Linking and loading must be postponed until execution. If a subroutine is needed during execution, execution must be suspended until the required subroutine has been loaded into main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions:
Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error unless a special mechanism, such as declaring B to be an external symbol, is provided.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core.
-> Create a symbol table, filling in the values of the external symbols.
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation and Linkage Directory)
7. END card
Pass 2
-> Load the actual program text.
-> Perform the relocation modification of any address constants needing to be altered.
-> Resolve external references (linking).
- A copy of the object program.
- The IPLA parameter (Initial Program Load Address), supplied by the OS or the programmer to specify the address at which to load the first segment.
- The PLA counter (Program Load Address), to keep track of each segment's location.
- The External Symbol Directory (ESD, also called the external symbol dictionary), to store each external symbol and its corresponding assigned core address.
- The Local External Symbol Array (LESA), to establish the correspondence between the ESD ID numbers used on ESD and RLD cards.
Data Structures
Algorithm
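The two passes of a direct linking loader can be sketched as follows. This is a minimal sketch under simplifying assumptions, not the card-based algorithm itself: each object module is modelled as a hypothetical Python dictionary with a name, a length, a table of defined external symbols (defs), its text as a list of words, and RLD entries of the form (offset, symbol) meaning "add the address of symbol to the word at offset".

```python
def pass1(modules, ipla):
    """Assign each module a load address and record the external symbol values."""
    pla = ipla                          # Program Load Address counter
    symtab = {}                         # external symbol -> absolute address
    for module in modules:
        symtab[module["name"]] = pla
        for symbol, relative in module["defs"].items():
            symtab[symbol] = pla + relative
        pla += module["length"]
    return symtab


def pass2(modules, symtab):
    """Load the text and apply the relocation and linkage (RLD) entries."""
    memory = {}
    for module in modules:
        base = symtab[module["name"]]
        for i, word in enumerate(module["text"]):
            memory[base + i] = word
        for offset, symbol in module["rld"]:
            memory[base + offset] += symtab[symbol]
    return memory


modules = [
    {"name": "A", "length": 3, "defs": {}, "text": [7, 0, 1],
     "rld": [(1, "B"), (2, "A")]},      # word 1 refers to B, word 2 is relative to A
    {"name": "B", "length": 2, "defs": {"BENTRY": 1}, "text": [5, 6], "rld": []},
]
symtab = pass1(modules, ipla=100)
memory = pass2(modules, symtab)
print(symtab)   # {'A': 100, 'B': 103, 'BENTRY': 104}
print(memory)   # {100: 7, 101: 103, 102: 101, 103: 5, 104: 6}
```

Two passes are needed because a module may refer to a symbol defined in a module that appears later in the input; the symbol table must be complete before any address constant is adjusted.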
Binary Symbolic Subroutine (BSS) Loader
The assembler provides the loader with:
-> The object program plus relocation information
-> A prefix with information about all other programs it references (the transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder
• A binder is a program that performs the same functions as the direct linking loader:
- allocation, relocation and linking.
• It outputs the text to a file rather than to memory.
- The output file is called a load module.
Lecture_no30 Programming Language
The role of Programming Language
-> The role of a programming language lies not only in designing and implementing, but also in teaching and supporting it.
-> The language designer balances:
-> making computing convenient for people, with
-> making efficient use of computing machines.
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level languages
- Readability
- Machine independence (portability)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
- To be able to explain and implement sequential search and binary search.
- To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort.
- To understand the idea of hashing as a search technique.
- To introduce the map abstract data type.
- To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods:
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as input. It works by comparing the target (search key) with the middle element of the array and terminates if they are equal; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element of the array.
• Binary search requires an ordered table.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
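The three search methods can be sketched on a small symbol table. The table contents, the function names and the toy hash function (sum of character codes modulo the table size) are assumptions made for illustration; real assemblers may use different schemes.

```python
def linear_search(table, key):
    """Compare every entry in turn; works on an unordered table."""
    for index, (name, _) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(table, key):
    """Repeatedly halve an ordered table around its middle element."""
    low, high = 0, len(table) - 1
    while low <= high:
        middle = (low + high) // 2
        name = table[middle][0]
        if name == key:
            return middle
        if key < name:
            high = middle - 1       # continue in the left subarray
        else:
            low = middle + 1        # continue in the right subarray
    return -1


def hash_slot(key, size):
    """Toy hash function: maps a symbol name to a slot number 0..size-1."""
    return sum(ord(ch) for ch in key) % size


# Symbol table kept in alphabetical order of the key (symbol name).
symbols = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108), ("LOOP", 112)]
print(linear_search(symbols, "GAMMA"), binary_search(symbols, "GAMMA"))  # 2 2
print(hash_slot("AB", 11))                                               # 10
```

Linear search inspects up to n entries and binary search about log2(n), while a hash function computes the slot directly from the key (collisions aside).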
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
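As one concrete instance of the algorithms listed, insertion sort can be sketched as follows. The function name and the choice to return a new list rather than sort in place are assumptions for this example.

```python
def insertion_sort(items):
    """Return a sorted copy, inserting each element into the sorted prefix."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements of the sorted prefix one place to the right.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


print(insertion_sort([5, 2, 4, 6, 1, 3]))   # [1, 2, 3, 4, 5, 6]
```

A table sorted this way can then be searched with binary search.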
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System
A formal system has the following components:
- A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and what kind of model would that be? There would be no need to process anything. Furthermore, we are finite beings, and we should keep in mind what we are trying to model.
- A syntax that defines which strings of symbols are in the language of our formal system.
- A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
- the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language);
- the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbols- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ⇒ γ (a single-step derivation in G), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
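The single-step derivation can be sketched directly from the three conditions. The representation is an assumption for this example: strings are ordinary Python strings, productions are (α, β) pairs, and the function name immediate_derivations is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma with mu = sigma alpha tau and gamma = sigma beta tau."""
    results = []
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.append(sigma + beta + tau)
            start = mu.find(alpha, start + 1)   # try the next occurrence of alpha
    return results


# Hypothetical grammar: S -> aSb | ab
productions = [("S", "aSb"), ("S", "ab")]
print(immediate_derivations("aSb", productions))   # ['aaSbb', 'aabb']
```

Applying the function repeatedly, starting from the start symbol, enumerates the strings derivable in the grammar.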
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that contains only terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear on the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of the earlier example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with new ones that use a new nonterminal symbol E. (The replacement production rules are not reproduced here.)
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of the earlier example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with new ones that use a new nonterminal symbol E. (The replacement production rules are not reproduced here.)
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> with the given production rules is a Type 3 grammar. (The production rules are not reproduced here.)
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
(Figure: Hierarchy of grammars)
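The conditions for the grammar types can be checked mechanically. The sketch below assumes a hypothetical encoding in which every symbol is a single character, uppercase letters are nonterminals, everything else is a terminal, and the empty string stands for the empty right-hand side; the function name grammar_type is also an assumption.

```python
def is_nonterminal(symbol):
    return symbol.isupper()


def grammar_type(productions, start="S"):
    """Return the most restrictive type (3, 2, 1 or 0) that the rules satisfy."""
    # Rules other than start -> empty are the ones the length conditions apply to.
    rules = [(lhs, rhs) for lhs, rhs in productions if not (lhs == start and rhs == "")]
    has_empty_rule = len(rules) != len(productions)
    start_on_rhs = any(start in rhs for _, rhs in productions)

    # Type 1: condition (a) |alpha| <= |beta|, and condition (b) on the start symbol.
    type1 = all(len(lhs) <= len(rhs) for lhs, rhs in rules)
    type1 = type1 and not (has_empty_rule and start_on_rhs)
    # Type 2: additionally, every left-hand side is a single nonterminal.
    type2 = type1 and all(len(lhs) == 1 and is_nonterminal(lhs) for lhs, _ in rules)

    def type3_rhs(rhs):
        # A terminal, or a terminal followed by a nonterminal.
        if len(rhs) == 1:
            return not is_nonterminal(rhs)
        return len(rhs) == 2 and not is_nonterminal(rhs[0]) and is_nonterminal(rhs[1])

    type3 = type2 and all(type3_rhs(rhs) for _, rhs in rules)
    return 3 if type3 else 2 if type2 else 1 if type1 else 0


print(grammar_type([("S", "aA"), ("A", "b")]))      # 3
print(grammar_type([("S", "aSb"), ("S", "ab")]))    # 2
print(grammar_type([("S", "aB"), ("Bb", "b")]))     # 0, since |Bb| > |b| violates (a)
```

The last call mirrors the remark above that adding a rule of the form Bb → b yields a grammar that is no longer of Type 1.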
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar, |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
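A small BNF specification can be read into a table of rules, and the terminals found as the symbols that never appear on a left side. The two-rule grammar, the whitespace-separated symbol convention and the function names parse_bnf and terminals are assumptions made for this sketch; it does not handle a literal | used as a terminal.

```python
def parse_bnf(text):
    """Map each left-side nonterminal to its list of alternatives."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # Each alternative is a sequence of symbols separated by blanks.
        rules[left.strip()] = [alternative.split() for alternative in right.split("|")]
    return rules


def terminals(rules):
    """Symbols used on a right side that never appear on a left side."""
    used = {symbol for alternatives in rules.values()
            for alternative in alternatives for symbol in alternative}
    return used - set(rules)


GRAMMAR = """
<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1
"""
rules = parse_bnf(GRAMMAR)
print(rules["<number>"])          # [['<digit>'], ['<digit>', '<number>']]
print(sorted(terminals(rules)))   # ['0', '1']
```

The recursive alternative `<digit> <number>` is what lets two finite rules describe binary numbers of any length.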
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings such as computer languages. They become more useful still if a canonic system can be used as a basis for recognizing strings from the defined sets.
The algorithm is capable of recognizing strings produced by a Backus-Naur system, which corresponds to the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model
In the analysis part, the source program is read, broken into its constituent pieces, and an intermediate form of it is created. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
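The breaking of this statement into tokens can be sketched with a small regular-expression scanner. This is an illustrative sketch rather than the scanner of any particular compiler; the token names (ID, NUMBER, ASSIGN, PLUS, TIMES) and the function tokenize are assumptions made for the example.

```python
import re

# (token name, regular expression) pairs; SKIP matches blanks, which are discarded.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("SKIP",   r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(source):
    """Return the list of (token name, lexeme) pairs for `source`."""
    tokens = []
    position = 0
    while position < len(source):
        match = PATTERN.match(source, position)
        if match is None:
            raise ValueError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

The seven pairs printed correspond, in order, to the seven tokens listed above.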
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end (shown in green in the accompanying figure) and the back end (shown in pink).
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
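A small recursive-descent parser shows how tokens are grouped into a syntax tree with operators as interior nodes. This is a sketch under assumptions: tokens are (kind, text) pairs, the tree is a nested tuple, and the ambiguous rule expr → expr + expr is read left-associatively. The function name parse_assignment is hypothetical.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        tok = peek()
        if tok is None or tok[0] != kind:
            raise SyntaxError(f"expected {kind}, found {tok}")
        pos += 1
        return tok

    def operand():
        tok = peek()
        if tok is not None and tok[0] in ("id", "number"):
            return expect(tok[0])
        raise SyntaxError(f"expected operand, found {tok}")

    def expr():
        # expr -> operand ('+' operand)*, a left-associative reading
        # of the recursive rule expr -> expr + expr.
        node = operand()
        while peek() is not None and peek()[0] == "+":
            expect("+")
            node = ("+", node, operand())
        return node

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")
    if peek() is not None:
        raise SyntaxError("trailing input after ';'")
    return ("=", target, value)

tokens = [("id", "x3"), ("=", None), ("id", "y"),
          ("+", None), ("number", "3"), (";", None)]
print(parse_assignment(tokens))
```

The assignment operator ends up at the root, with the target identifier and the + subtree as its children.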
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
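The tree walk that produces three-address code can be sketched as a post-order traversal that allocates a fresh temporary for every interior node. The tree encoding (nested tuples with string leaves) and the name gen_three_address are assumptions made for this illustration.

```python
def gen_three_address(tree):
    """Walk a semantic tree ("=", target, expression) and emit three-address code."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        # Leaves are plain strings such as "id2" or "3".
        if isinstance(node, str):
            return node
        op, *children = node
        operands = [walk(child) for child in children]  # children first
        temp = new_temp()
        if len(operands) == 1:
            code.append(f"{temp} = {op}({operands[0]})")
        else:
            code.append(f"{temp} = {operands[0]} {op} {operands[1]}")
        return temp

    op, target, value = tree
    if op != "=":
        raise ValueError("expected an assignment at the root")
    code.append(f"{target} = {walk(value)}")
    return code

# Semantic tree for the example, with the conversion operators inserted.
tree = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))
for line in gen_three_address(tree):
    print(line)
```

For this tree the walk emits the same four instructions as listed in the text.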
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal   temp1   3       --
add         temp2   id2     temp1
realtoint   temp3   temp2   --
assign      id1     temp3   --
Each "quad" has the form
operation   target   source1   source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
   add   temp2   id2   3.0
2. The last two quads can be combined into
   realtoint   id1   temp2
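The two optimizations can be sketched on quads stored as (operation, target, source1, source2) tuples. This is a simplified illustration, not a general optimizer: the merge step assumes the temporary being eliminated is not used anywhere else, and the name optimize is hypothetical.

```python
def optimize(quads):
    """Apply constant folding and assignment merging to a list of quads."""
    # Step 1: fold inttoreal applied to an integer literal, then
    # substitute the resulting constant wherever the temporary was used.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{src1}.0"
            continue
        folded.append((op, target, src1, src2))

    # Step 2: merge "t = op ..." followed directly by "assign x t".
    # Assumes t is not referenced again later (true for the example).
    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if op == "assign" and result and result[-1][1] == src1:
            prev_op, _, a, b = result.pop()
            result.append((prev_op, target, a, b))
        else:
            result.append(quad)
    return result

quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```

The four quads shrink to two: an add of id2 and 3.0, followed by a realtoint that writes id1 directly.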
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD     R1, id2
ADDF   R1, R1, 3.0     (add float)
RTOI   R2, R1          (real to int)
ST     id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code instruction, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What passes a compiler makes over the source program, and how many, is an important design decision.

(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler: the Compiler Driver calls the Syntactic Analyzer, which in turn calls the Contextual Analyzer and the Code Generator.

(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer, and the Code Generator in turn. The Syntactic Analyzer takes the source text as input and outputs an AST; the Contextual Analyzer takes the AST as input and outputs a decorated AST; the Code Generator takes the decorated AST as input and outputs object code.
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing a lot of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translation.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
Absolute loader: As the name itself says, it does nothing other than loading.
Advantages: It is simple to implement.
Disadvantages: In this scheme it is the programmer's duty to adjust all the inter-segment addresses and to do the linking manually. For that, the programmer needs to know about memory management.
Lecture_no27 Machine-Dependent Loader Features - Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions such as printf() and scanf(), which are not written by the programmer. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C program, control should be transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300 while printf() occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of the storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area of storage to another. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
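Adding a constant to each relative address can be sketched as follows. The representation is an assumption made for illustration: a segment is a list of integer words, and a separate list of offsets marks which words hold relative addresses. The name relocate is hypothetical.

```python
def relocate(segment, relocation_offsets, load_address):
    """Return a copy of segment with load_address added to each marked address field."""
    relocated = list(segment)
    for offset in relocation_offsets:
        relocated[offset] += load_address   # adjust the address field only
    return relocated

# A segment translated as if it started at address 0. Words 1 and 3 hold
# relative addresses; the other words are opcodes or data and stay unchanged.
segment = [1, 4, 2, 0, 42]
print(relocate(segment, [1, 3], 500))
```

Only the marked address fields change, which matches the point that relocation is an adjustment of address fields rather than the movement of the program itself.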
Relocating Loader: The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The output of a relocating loader is the object program and information about all other programs it references. In addition, there is information (relocation information) about the locations in this program that need to be changed if it is to be loaded at an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving the programmer complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer there is nothing meaningful in main memory (RAM). A small program is written and stored in ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.

• Linkage editor-
A linkage editor performs linking prior to load time.
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution

• Dynamic Linking-
The linking function is performed at execution time.
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called
» Also called dynamic linking, dynamic loading, or load on call
-> Allows several executing programs to share one copy of a subroutine or library

• Linking loader-
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960, and they are still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
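The overlay manager's bookkeeping for the example tree can be sketched as below. This is a simplified model under assumptions: the resident segments always form one path from the root, and loading a path evicts whatever is not on it. The names OVERLAY_PARENT, path_to, and OverlayManager are hypothetical.

```python
# Parent of each segment in the example overlay tree.
OVERLAY_PARENT = {
    "ROOT": None,
    "A": "ROOT", "D": "ROOT",
    "B": "A", "C": "A",
    "E": "D", "F": "D",
}

def path_to(segment):
    """Return the path of segments from the root down to segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = OVERLAY_PARENT[segment]
    return path[::-1]

class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]   # the root is loaded when the program starts

    def call(self, target):
        """Ensure the path to target is resident; return the segments newly loaded."""
        if target in self.loaded:
            return []            # upward call or already resident: nothing to do
        wanted = path_to(target)
        newly = [seg for seg in wanted if seg not in self.loaded]
        # Siblings share memory, so segments off the wanted path are evicted.
        self.loaded = wanted
        return newly

manager = OverlayManager()
print(manager.call("B"))   # root calls into B: both A and B must be loaded
print(manager.call("E"))   # switching subtrees loads D and E
```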
(fig. Overlay: ROOT at the top with children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader:
(fig.: with a linking loader, the object program(s) and the library are processed by the linking loader and placed directly in memory; with a linkage editor, the object program(s) and the library are processed by the linkage editor into a linked program, which a relocating loader later places in memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.

Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded into main memory.

-> Let us take a quick look at our three loaders (absolute loader, relocating loader, and direct linking loader, respectively) for the above four functions:

Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g., BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader

Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols

1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation and Linkage Directory)
7. END card

Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)

• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Directory (ESD) card: stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD ID used in ESD and RLD cards
• ESD (external symbol dictionary)
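The two passes can be sketched on a toy object format. Everything about the format is an assumption made for illustration: each module is a dictionary with a name, a table of defined symbols (relative addresses), a list of text words, and an "rld" list of (offset, symbol) pairs naming the symbol whose address must be added to that word. The function name and the symbol table name gest are hypothetical, and error handling such as duplicate or undefined symbols is omitted.

```python
def direct_linking_load(modules, ipla):
    """Load modules starting at address ipla; return (memory, symbol_table)."""
    # Pass 1: assign each segment a load address and record the
    # absolute address of every external symbol.
    gest = {}
    bases = []
    pla = ipla                              # program load address counter
    for mod in modules:
        bases.append(pla)
        gest[mod["name"]] = pla
        for symbol, relative in mod["defs"].items():
            gest[symbol] = pla + relative
        pla += len(mod["text"])

    # Pass 2: load the text, then relocate and link by adding the
    # address of the referenced symbol to each marked address constant.
    memory = {}
    for mod, base in zip(modules, bases):
        for offset, word in enumerate(mod["text"]):
            memory[base + offset] = word
        for offset, symbol in mod["rld"]:
            memory[base + offset] += gest[symbol]
    return memory, gest

module_a = {"name": "A", "defs": {}, "text": [2, 0, 7],
            "rld": [(0, "A"), (1, "SUB")]}   # word 0: relocation, word 1: linking
module_b = {"name": "B", "defs": {"SUB": 1}, "text": [9, 8], "rld": []}

memory, symbols = direct_linking_load([module_a, module_b], ipla=100)
print(symbols)
print(memory)
```

Word 0 of A holds an address inside A itself, so it is adjusted by A's own load address (relocation); word 1 refers to SUB in another segment, so it receives SUB's assigned address (linking).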
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler assembles the program and provides the loader with
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion

Binder-
• A binder is a program that performs the same functions as the direct linking loader
  - allocation, relocation, and linking
• Outputs the text in a file rather than memory
  - called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java, and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher -level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search.
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort.
To understand the idea of hashing as a search technique.
To introduce the map abstract data type.
To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
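The three search methods can be sketched for a table keyed by symbol name. The hash function (sum of character codes modulo the table size) and the use of chaining to handle collisions are assumptions chosen for brevity; the notes do not prescribe either.

```python
def linear_search(table, key):
    """Compare every entry in turn; return the index of key or -1."""
    for index, entry in enumerate(table):
        if entry == key:
            return index
    return -1

def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table; return the index of key or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_table[mid] == key:
            return mid
        if key < sorted_table[mid]:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1

class HashTable:
    """Hash table with chaining; slots are numbered from 0."""
    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]

    def _slot(self, key):
        # Illustrative hash function: character codes summed modulo table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)    # replace an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for existing, value in self.slots[self._slot(key)]:
            if existing == key:
                return value
        return None

symbols = ["ALPHA", "BETA", "GAMMA", "LOOP", "START"]   # already in order
print(linear_search(symbols, "LOOP"), binary_search(symbols, "LOOP"))

table = HashTable()
table.put("LOOP", 100)
print(table.get("LOOP"), table.get("MISSING"))
```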
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
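Minimal versions of the four algorithms are sketched below. Each function takes a list and returns a new sorted list, a design choice made for this illustration; textbook presentations often sort in place, and the quick sort here uses the first element as pivot for simplicity.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and swap it into place."""
    a = list(items)
    for i in range(len(a)):
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a

def insertion_sort(items):
    """Insert each item into the already sorted prefix to its left."""
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
    return a

def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

def quick_sort(items):
    """Partition around a pivot, then sort the two partitions recursively."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)

data = [5, 2, 9, 1, 5, 6]
for sort in (selection_sort, insertion_sort, merge_sort, quick_sort):
    print(sort.__name__, sort(data))
```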
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e., the alphabet) that can be used for constructing formulas (i.e., finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference, inference rule, or transformation rule is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s → α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ → γ in G (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α → β ∈ P of G
where σ and τ represent arbitrary strings.
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
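The single-step derivation can be sketched directly from the definition: find each occurrence of a left-hand side α inside μ and replace it by β, leaving the surrounding strings σ and τ untouched. Strings of single-character symbols and the name derive_one_step are assumptions made for this illustration; left-hand sides are assumed to be non-empty.

```python
def derive_one_step(mu, productions):
    """Return every string gamma that is immediately derived from mu."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)      # mu = sigma alpha tau
            start = mu.find(alpha, start + 1)
    return results

# Illustrative grammar: S -> aSb | ab, with start symbol S.
productions = [("S", "aSb"), ("S", "ab")]
print(derive_one_step("S", productions))
print(derive_one_step("aSb", productions))
```

Every string reached by repeating this step from the start symbol is a sentential form; those containing no nonterminal, such as aabb, are sentences.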
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
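The conditions for Types 1, 2, and 3 can be checked mechanically. The sketch below follows the definitions as stated in these notes; the representation (productions as pairs of strings, each symbol a single character, ε written as the empty string) and the name grammar_type are assumptions made for illustration.

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return the most restrictive type (0 to 3) whose conditions the grammar meets."""
    has_empty = (start, "") in productions
    # The rule S -> epsilon is exempt from the length and shape conditions.
    rules = [(lhs, rhs) for lhs, rhs in productions if (lhs, rhs) != (start, "")]

    # Type 1, condition (a): |alpha| <= |beta| for every other rule.
    noncontracting = all(len(lhs) <= len(rhs) for lhs, rhs in rules)
    # Type 1, condition (b): if S -> epsilon is present, S never appears on a right-hand side.
    start_ok = not has_empty or all(start not in rhs for _, rhs in productions)
    if not (noncontracting and start_ok):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(lhs) == 1 and lhs in nonterminals for lhs, _ in rules):
        return 1

    # Type 3: right-hand side is a terminal, or a terminal followed by a nonterminal.
    def right_linear(rhs):
        if len(rhs) == 1:
            return rhs not in nonterminals
        return len(rhs) == 2 and rhs[0] not in nonterminals and rhs[1] in nonterminals

    return 3 if all(right_linear(rhs) for _, rhs in rules) else 2

print(grammar_type([("S", "aA"), ("A", "b"), ("A", "bA")], {"S", "A"}))
print(grammar_type([("S", "aSb"), ("S", "ab")], {"S"}))
print(grammar_type([("S", "aSBc"), ("S", "abc"), ("cB", "Bc"), ("bB", "bb")], {"S", "B"}))
```

The three illustrative grammars are classified as Type 3, Type 2, and Type 1 respectively. Adding a rule such as Bb → b to the last one makes it Type 0, because the rule shortens the string.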
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are strictly less expressive: they recognize only a proper subset of the context-free languages.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)* (v is normally required to be non-empty).
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only k·n squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (w may be empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
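As an illustration of the rule format described above, here is a minimal sketch that reads a BNF specification into a table and then separates terminals from nonterminals by checking which symbols appear on a left side. The function name `parse_bnf` and the two-rule grammar are assumptions made for this example; they do not come from the notes, and the sketch assumes that | is never itself a terminal.

```python
def parse_bnf(text):
    """Parse lines of the form '<symbol> ::= alt1 | alt2' into a dict that maps
    each left-side symbol to a list of alternatives (each a list of symbols)."""
    grammar = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        grammar[left.strip()] = [alt.split() for alt in right.split("|")]
    return grammar

# A small illustrative grammar for binary numerals (hypothetical).
SPEC = """
<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1
"""

grammar = parse_bnf(SPEC)
nonterminals = set(grammar)               # symbols that appear on a left side
terminals = {sym
             for alternatives in grammar.values()
             for alt in alternatives
             for sym in alt
             if sym not in nonterminals}  # symbols that never do

print(sorted(nonterminals))  # ['<digit>', '<number>']
print(sorted(terminals))     # ['0', '1']
```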
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. They become more useful still if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, a canonic system in which all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of tokens, from which an intermediate form of the program is built. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
->Analysis Part
• The analysis part is carried out in three sub-parts
– Lexical Analysis: In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a sequence of characters having a collective meaning).
– Syntax Analysis: In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis: In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e., modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
– Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
– Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
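The mapping from lexemes to tokens can be sketched with a regular expression, which is enough precisely because no recursion is needed. This is a minimal illustration under stated assumptions, not the scanner of any particular compiler: the function name `scan`, the tuple representation of tokens, and the choice to store the number lexeme itself as the attribute are all hypothetical.

```python
import re

# One alternative per token class; leading blanks are skipped as non-significant.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<sym>[=+;]))")

def scan(source):
    """Return (tokens, symbol_table) for a tiny assignment language."""
    tokens, symtab = [], []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.group("id"):
            name = m.group("id")
            if name not in symtab:
                symtab.append(name)          # first sighting: new table entry
            tokens.append(("id", symtab.index(name) + 1))  # 1-based index
        elif m.group("number"):
            tokens.append(("number", m.group("number")))
        else:
            tokens.append((m.group("sym"), None))  # attribute is ignored
    return tokens, symtab

tokens, symtab = scan("x3   =y+ 3  ;")
print(tokens)
# [('id', 1), ('=', None), ('id', 2), ('+', None), ('number', '3'), (';', None)]
print(symtab)  # ['x3', 'y']
```

Scanning `x 3 = y + 3;` with the same function yields a different token stream (an identifier x followed by a number 3), which is why that spelling is not equivalent to the other three.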
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
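A syntax tree of the kind described above can be built by a small hand-written parser. The sketch below is an illustration only: the function name and the nested-tuple tree are assumptions, and because the rule expr → expr + expr is ambiguous and left-recursive, the sketch swaps in the equivalent loop "operand followed by any number of + operand", which groups to the left.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  where  expr -> operand { + operand }.

    tokens is a list of (kind, value) pairs. The result is a nested-tuple
    syntax tree with the operator as the first element of each interior node.
    """
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind!r} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            return expect(tokens[pos][0])
        raise SyntaxError(f"expected an operand at token {pos}")

    target = expect("id")
    expect("=")
    tree = operand()
    while pos < len(tokens) and tokens[pos][0] == "+":
        expect("+")
        tree = ("+", tree, operand())   # left-associative grouping
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", target, tree)

tokens = [("id", "x3"), ("=", None), ("id", "y"),
          ("+", None), ("number", "3"), (";", None)]
print(parse_assignment(tokens))
# ('=', ('id', 'x3'), ('+', ('id', 'y'), ('number', '3')))
```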
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
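The walk that produces these quads can be sketched as follows. This is a minimal illustration, assuming a syntax tree given as nested tuples and a `types` table that records the declared type of each identifier; the function names, the tuple layout, and the restriction to the + operator are assumptions made for the example. `None` plays the role of the unused "--" field.

```python
def gen_quads(target, tree, types):
    """Emit quadruples (operation, target, source1, source2) for  target = tree."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def convert(name, have, want):
        """Insert an inttoreal / realtoint quad when the types differ."""
        if have == want:
            return name
        temp = new_temp()
        quads.append((f"{have}to{want}", temp, name, None))
        return temp

    def walk(node):
        """Return (name, type) of the place holding the value of node."""
        if not isinstance(node, tuple):
            if isinstance(node, str):                     # identifier
                return node, types[node]
            return node, "real" if isinstance(node, float) else "int"
        _, left, right = node                             # only '+' is handled
        lname, ltype = walk(left)
        rname, rtype = walk(right)
        result_type = "real" if "real" in (ltype, rtype) else "int"
        lname = convert(lname, ltype, result_type)
        rname = convert(rname, rtype, result_type)
        temp = new_temp()
        quads.append(("add", temp, lname, rname))
        return temp, result_type

    name, typ = walk(tree)
    name = convert(name, typ, types[target])
    quads.append(("assign", target, name, None))
    return quads

# id1 is x3 (an integer) and id2 is y (a real), as in the running example.
for quad in gen_quads("id1", ("+", "id2", 3), {"id1": "int", "id2": "real"}):
    print(quad)
# ('inttoreal', 'temp1', 3, None)
# ('add', 'temp2', 'id2', 'temp1')
# ('realtoint', 'temp3', 'temp2', None)
# ('assign', 'id1', 'temp3', None)
```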
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion at compile time and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
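Both rewrites can be expressed as a small pass over the list of quads. The sketch below is illustrative only: the function name is hypothetical, quads are assumed to be (operation, target, source1, source2) tuples with `None` for an unused field, and the second rewrite assumes that the temporary being eliminated is not used anywhere else.

```python
def optimize(quads):
    """Apply two simple rewrites to a list of (op, target, src1, src2) quads."""
    # 1. Perform inttoreal on integer constants at compile time and substitute
    #    the resulting real constant wherever the temporary was used.
    constants, folded = {}, []
    for op, target, s1, s2 in quads:
        if op == "inttoreal" and isinstance(s1, int):
            constants[target] = float(s1)
            continue
        folded.append((op, target, constants.get(s1, s1), constants.get(s2, s2)))

    # 2. Merge "op temp ... ; assign dest temp" into "op dest ...".
    result = []
    for quad in folded:
        op, target, s1, s2 = quad
        if op == "assign" and result and result[-1][1] == s1:
            prev_op, _, p1, p2 = result.pop()
            result.append((prev_op, target, p1, p2))
        else:
            result.append(quad)
    return result

quads = [
    ("inttoreal", "temp1", 3, None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
# ('add', 'temp2', 'id2', 3.0)
# ('realtoint', 'id1', 'temp2', None)
```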
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1 id2
ADDF R1 R1 3.0 (add float)
RTOI R2 R1 (real to int)
ST id1 R2
127 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST)
• A pass can correspond to a "phase", but it does not have to
• Sometimes a single pass corresponds to several phases that are interleaved in time
• What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                  calls
Contextual Analyzer     Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    calls Syntactic Analyzer (input: Source Text, output: AST)
    calls Contextual Analyzer (input: AST, output: Decorated AST)
    calls Code Generator (input: Decorated AST, output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit, because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no27 Machine-Dependent Loader Features – Relocation
Program Linking
Linking
The need for linking a program with other programs arises because a program written by a programmer, or its translated version, is rarely of a stand-alone nature. That is, a program generally cannot execute on its own without requiring the presence of some other programs in the computer's memory.
For example, consider a program written in a high-level language like C. Such a program may contain calls to certain input/output functions like printf() and scanf(), which are not written by the programmer himself. During program execution, those standard functions must reside in main memory. Furthermore, every time an input/output function is called by a C language program, control should get transferred to the appropriate function. The linking function makes the addresses of programs known to each other so that such transfers can take place during execution.
Relocation
Another function commonly performed by a loader is that of program relocation. This function can be explained as follows. Assume that a program written in C (let us call it A) calls the standard function printf(). A and printf() would have to be linked with each other. But where in main storage shall we load A and printf()? A possible solution would be to load them according to the addresses assigned when they were translated. For example, as translated, A might be given the storage area from 200 to 300, while the printf() function occupies the area from 100 to 150.
If we were to load these programs at their translated addresses, a lot of the storage lying between them may go to waste. Another possibility is that both A and printf() may have been translated with the identical start address of 100. Thus A extends from 100 to 200 while printf() extends from 100 to 150. But there is simply no way A and printf() can co-exist at the same storage location. Therefore the loader may have to relocate one or both of these programs to avoid address conflicts or storage waste.
It should be noted that relocation is more than simply moving a program from one area to another in storage. It refers to the adjustment of address fields and not to the movement of a program.
The task of relocation is to add some constant value to each relative address in the segment (a segment is a unit of information that is treated as an entity, be it a program or data; it is possible to produce multiple program or data segments in a single source file). The part of a loader which performs relocation is called a relocating loader.
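The adjustment of address fields can be sketched in a few lines. This is a minimal illustration under assumptions that are not in the notes: a segment is modeled as a list of integer words, and the translator is assumed to supply the list of word positions that hold segment-relative addresses (the function name and data layout are hypothetical).

```python
def relocate(words, relocatable, load_address):
    """Return a copy of the segment in which load_address has been added to
    every word listed in relocatable (positions that hold relative addresses).
    Words that are plain data or opcodes are left untouched."""
    relocated = list(words)
    for index in relocatable:
        relocated[index] += load_address
    return relocated

# Words 1 and 3 hold addresses relative to the start of the segment.
segment = [10, 3, 7, 0]
print(relocate(segment, [1, 3], 200))   # [10, 203, 7, 200]
```

Only the marked words change, which is the sense in which relocation is an adjustment of address fields rather than a movement of the program.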
Relocating Loader: The general class of relocating loaders was introduced to avoid possible reassembling of all subroutines when a single subroutine is changed, and to perform the tasks of allocation and linking for the programmer.
The input to a relocating loader (the output of the assembler) is the object program and information about all other programs it references. In addition, there is information (relocation information) as to the locations in this program that need to be changed if it is to be loaded in an arbitrary location in memory.
Direct Linking Loader: It is a general relocatable loader and is perhaps the most popular loading scheme presently used. It has the advantage of allowing the programmer multiple procedure segments and multiple data segments, and of giving him complete freedom in referencing data or instructions contained in other segments. This provides flexible inter-segment referencing and accessing ability, while at the same time allowing independent translation of programs.
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When we turn on the computer, there is nothing meaningful in main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
•Linkage editor-
Performs linking prior to load time. A linkage editor
» Produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution
•Dynamic Linking-
The linking function is performed at execution time
-> Postpones the linking function until execution time
» A subroutine is loaded and linked to the rest of the program when it is first called. This is called dynamic linking, dynamic loading, or load on call
-> Allows several executing programs to share one copy of a subroutine or library
•Linking loader-
A linking loader performs
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
•Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments, where overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
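The bookkeeping done by an overlay manager on a downward call can be sketched as follows. This is a simplified model rather than a description of any real overlay linker: the class name, the `parent` table describing the tree, and the use of a set of segment names in place of actual memory are assumptions made for the example.

```python
class OverlayManager:
    """Tracks which overlay segments are loaded; siblings share one memory area."""

    def __init__(self, parent):
        self.parent = parent      # child segment -> parent segment
        self.loaded = {"ROOT"}    # the root is loaded when the program starts

    def path(self, segment):
        """Segments from the root down to segment, inclusive."""
        result = []
        while segment is not None:
            result.append(segment)
            segment = self.parent.get(segment)
        return result[::-1]

    def call(self, segment):
        """Ensure the whole path to segment is loaded before a downward call."""
        for seg in self.path(segment):
            if seg in self.loaded:
                continue
            # Loading seg overwrites its siblings and everything below them.
            for other in list(self.loaded):
                if any(s != seg and self.parent.get(s) == self.parent[seg]
                       for s in self.path(other)):
                    self.loaded.discard(other)
            self.loaded.add(seg)

# The tree from the text: ROOT calls A and D, A calls B and C, D calls E and F.
tree = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}
manager = OverlayManager(tree)
manager.call("B")
print(sorted(manager.loaded))   # ['A', 'B', 'ROOT']
manager.call("E")
print(sorted(manager.loaded))   # ['D', 'E', 'ROOT']
print(manager.path("E"))        # ['ROOT', 'D', 'E']
```

Calling into E evicts A and B because D occupies the memory that A used, which is the sharing between siblings described above.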
(fig Overlay: ROOT at the top with children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig: Linking loader: Object program(s) and Library -> Linking loader -> Memory. Linkage editor: Object program(s) and Library -> Linkage editor -> Linked program -> Relocating loader -> Memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded in main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded in main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader, and direct linking loader, respectively) for the above four functions:
Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B
• The programmer in program A could write a transfer instruction (e.g., BSR B) to subprogram B
• The assembler does not know the value of this symbol reference and will declare it as an error
Design of Direct linking loader

Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols

Items used in pass 1:
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation and Linkage Directory) card
7. END card

Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)

Items used in pass 2:
• Copy of the object program
• IPLA parameter (Initial Program Load Address): supplied by the OS or the programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address): keeps track of each segment's location
• External Symbol Directory (ESD): stores each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA): establishes the correspondence between the ESD IDs used in ESD and RLD cards

ESD (external symbol dictionary)
Data Structures
Algorithm
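The two passes can be sketched roughly as below. This is a simplified illustration, not the card-level algorithm: object modules are modelled as plain dictionaries instead of ESD/TXT/RLD/END cards, and all field names (`name`, `length`, `defs`, `text`, `relocs`) and function names are hypothetical.

```python
def pass1(modules, ipla):
    """Assign a load address to each module and build the external symbol table."""
    symtab = {}
    pla = ipla  # Program Load Address counter starts at the IPLA

    def define(symbol, address):
        if symbol in symtab:
            raise ValueError(f"duplicate external symbol: {symbol}")
        symtab[symbol] = address

    for mod in modules:
        define(mod["name"], pla)                 # segment name (ESD-style entry)
        for symbol, offset in mod["defs"].items():
            define(symbol, pla + offset)         # entry points inside the segment
        pla += mod["length"]                     # next segment follows this one
    return symtab


def pass2(modules, symtab):
    """Load the text and patch address constants that refer to external symbols."""
    memory = {}
    for mod in modules:
        base = symtab[mod["name"]]
        for offset, word in enumerate(mod["text"]):   # TXT-style data
            memory[base + offset] = word
        for offset, symbol in mod["relocs"]:          # RLD-style data
            if symbol not in symtab:
                raise ValueError(f"undefined external symbol: {symbol}")
            memory[base + offset] += symtab[symbol]
    return memory


modules = [
    {"name": "A", "length": 3, "defs": {"X": 2},
     "text": [0, 0, 7], "relocs": [(0, "B"), (1, "X")]},
    {"name": "B", "length": 2, "defs": {},
     "text": [5, 0], "relocs": [(1, "A")]},
]
symtab = pass1(modules, ipla=100)
memory = pass2(modules, symtab)
print(symtab)
print(memory)
```

Pass 1 never touches the program text; it only needs lengths and symbol definitions, which is why the symbol table is complete before pass 2 starts patching addresses.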
Binary Symbolic Subroutine Loader-
The assembler assembles the program and provides the loader with:
-> Object program + relocation information
-> A prefix with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
  – allocation, relocation and linking
• It outputs the text to a file rather than to memory
  – the file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing a language, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
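The three search methods can be sketched as follows for a symbol table keyed by symbol name. This is an illustrative sketch only: the function and class names are hypothetical, and the hash function (sum of character codes modulo the table size, with chaining for collisions) is one simple choice among many.

```python
def linear_search(table, key):
    """Compare `key` with every (name, value) entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(sorted_names, key):
    """Search an ordered list by repeatedly halving the range; return index or -1."""
    low, high = 0, len(sorted_names) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_names[mid] == key:
            return mid
        if key < sorted_names[mid]:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1


class HashTable:
    """Hash table with numbered slots (starting at 0) and chaining."""

    def __init__(self, size=11):
        self.slots = [[] for _ in range(size)]

    def _slot(self, key):
        # Hash function: maps a symbol name to a slot number.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._slot(key)]
        for i, (name, _old) in enumerate(bucket):
            if name == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for name, value in self.slots[self._slot(key)]:
            if name == key:
                return value
        return None


table = [("LOOP", 12), ("ALPHA", 0), ("DONE", 40)]
names = sorted(name for name, _ in table)
symbols = HashTable()
for name, address in table:
    symbols.put(name, address)
print(linear_search(table, "DONE"), binary_search(names, "DONE"), symbols.get("DONE"))
```

Linear search needs no ordering but inspects up to every entry; binary search needs the ordered table; the hash table goes straight to one slot and only scans the entries that collided there.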
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
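Two of the listed algorithms, insertion sort and merge sort, are sketched below as a reminder of how comparison sorts work. These are textbook versions written for clarity; the function names are chosen for illustration and both return a new list rather than sorting in place.

```python
def insertion_sort(items):
    """Grow a sorted prefix by inserting each element at its correct position."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one place to the right.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def merge_sort(items):
    """Split the list in half, sort each half recursively, then merge."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(insertion_sort(["LOOP", "ALPHA", "DONE"]))
print(merge_sort([5, 2, 9, 1, 5]))
```

Sorting the symbol names is what makes the binary search of an ordered table possible.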
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3. A set of axioms or axiom schemata; each axiom must be a wff
4. A set of inference rules
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule, or transformation rule) is the act of drawing a conclusion based on the form of premises. It can be interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar, and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation, with G as a subscript on the arrow), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
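A single-step derivation can be sketched directly from this definition: find an occurrence of α inside μ, and replace it with β while keeping the surrounding strings σ and τ. The sketch below assumes strings of single-character symbols, non-empty left-hand sides, and a hypothetical example grammar with productions S -> aSb and S -> ab.

```python
def derive_one_step(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    `productions` is a list of (alpha, beta) pairs, each meaning alpha -> beta.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                 # arbitrary string before alpha
            tau = mu[start + len(alpha):]      # arbitrary string after alpha
            results.add(sigma + beta + tau)    # gamma = sigma beta tau
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical example grammar: S -> aSb | ab
productions = [("S", "aSb"), ("S", "ab")]
print(sorted(derive_one_step("S", productions)))
print(sorted(derive_one_step("aSb", productions)))
```

Applying the function repeatedly, starting from the start symbol, enumerates sentential forms; those containing no nonterminals are sentences of the language.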
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol)
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a)
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language
Example 1.2.16 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol)
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol
b. β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
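The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The sketch below assumes a hypothetical encoding: each production is an (alpha, beta) pair of strings, uppercase letters are nonterminals, lowercase letters are terminals, and the empty string stands for ε. The function name `grammar_type` and the example grammars are illustrative only.

```python
def grammar_type(productions, start="S"):
    """Return the most restrictive type (3, 2, 1 or 0) the grammar satisfies."""

    def is_start_to_empty(alpha, beta):
        return alpha == start and beta == ""

    # Type 1, condition (a): |alpha| <= |beta| unless the rule is S -> epsilon.
    type1 = all(is_start_to_empty(a, b) or len(a) <= len(b)
                for a, b in productions)
    # Type 1, condition (b): if S -> epsilon exists, S is on no right-hand side.
    if any(is_start_to_empty(a, b) for a, b in productions):
        if any(start in b for _a, b in productions):
            type1 = False
    if not type1:
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a.isupper() for a, _b in productions):
        return 1

    # Type 3: beta is a terminal, or a terminal followed by a nonterminal.
    def right_side_ok(beta):
        if len(beta) == 1:
            return beta.islower()
        return len(beta) == 2 and beta[0].islower() and beta[1].isupper()

    if all(is_start_to_empty(a, b) or right_side_ok(b) for a, b in productions):
        return 3
    return 2


regular = [("S", "aA"), ("A", "b"), ("A", "bA")]
print(grammar_type(regular))                     # a Type 3 grammar
print(grammar_type(regular + [("A", "bb")]))     # no longer Type 3
print(grammar_type([("S", "aSb"), ("S", "ab")]))
print(grammar_type([("S", "aB"), ("Bb", "b")]))  # violates condition (a)
```

Because each type is defined as a restriction of the previous one, the checks are applied in order and the function stops at the first condition that fails.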
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; more sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
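As an illustration, a small BNF specification can be stored as a table from each nonterminal to its alternatives, and the substitution described above can be carried out mechanically. The grammar below (binary integers) and the names `GRAMMAR`, `terminals` and `generate` are hypothetical examples, not part of any standard.

```python
# <integer> ::= <digit> | <digit> <integer>
# <digit>   ::= 0 | 1
GRAMMAR = {
    "<integer>": [["<digit>"], ["<digit>", "<integer>"]],
    "<digit>": [["0"], ["1"]],
}


def terminals(grammar):
    """Symbols that never appear on a left side are terminals."""
    return {symbol
            for alternatives in grammar.values()
            for sequence in alternatives
            for symbol in sequence
            if symbol not in grammar}


def generate(grammar, symbol, depth):
    """Return all strings obtainable from `symbol` with at most `depth` nested substitutions."""
    if symbol not in grammar:          # terminal: stands for itself
        return {symbol}
    if depth == 0:                     # substitution budget exhausted
        return set()
    strings = set()
    for sequence in grammar[symbol]:   # each alternative separated by |
        partial = {""}
        for part in sequence:
            expansions = generate(grammar, part, depth - 1)
            partial = {left + right for left in partial for right in expansions}
        strings |= partial
    return strings


print(sorted(terminals(GRAMMAR)))
print(sorted(generate(GRAMMAR, "<integer>", 3)))
```

The depth limit is only there to keep the enumeration finite; the grammar itself describes integers of any length.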
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages. If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators"
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program ---> Compiler ---> Target program
Compiler Analysis-Synthesis Model-
• In the analysis part, the source program is read and broken into a stream of strings.
• In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
  • In this part the source program is read and broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
  • In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
  • In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
– Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
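The token list above can be reproduced with a small scanner. The following is a minimal sketch that handles only the symbols of this one statement; the token names (`ID`, `ASSIGN`, `PLUS`, `TIMES`, `NUMBER`) and the function `tokenize` are assumptions made for the illustration.

```python
import re

# One named pattern per kind of token; SKIP covers blanks, which are discarded.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})"
                               for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into a list of (token type, token value) pairs."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```

Each pattern is a regular expression, so no recursion is needed at this stage; grouping the tokens into a hierarchy is left to the parser.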
– Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3 =y+ 3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
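A parser for this tiny grammar can be sketched as below. It is a hand-written recursive-descent style sketch under some assumptions: tokens arrive as (kind, text) pairs with the hypothetical kinds `id`, `number`, `=`, `+` and `;`, the statement ends with a semicolon as in C, and the ambiguous rule expr → expr + expr is read as left-associative. The result is a syntax tree built from nested tuples with the operator first.

```python
def parse_assignment(tokens):
    """Parse `id = expr ;` and return the syntax tree ("=", target, value)."""
    position = 0

    def expect(kind):
        nonlocal position
        if position >= len(tokens) or tokens[position][0] != kind:
            raise SyntaxError(f"expected {kind} at token {position}")
        text = tokens[position][1]
        position += 1
        return text

    def operand():
        # expr -> number | id
        if position < len(tokens) and tokens[position][0] == "number":
            return expect("number")
        return expect("id")

    def expr():
        # expr -> expr + expr, handled as operand (+ operand)*
        tree = operand()
        while position < len(tokens) and tokens[position][0] == "+":
            expect("+")
            tree = ("+", tree, operand())
        return tree

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")
    if position != len(tokens):
        raise SyntaxError("unexpected tokens after the statement")
    return ("=", target, value)


tokens = [("id", "x3"), ("=", "="), ("id", "y"),
          ("+", "+"), ("number", "3"), (";", ";")]
print(parse_assignment(tokens))
```

The operator sits at the interior node and its operands are the children, which is the syntax-tree shape described above rather than the full parse tree.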
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversion where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
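Walking the tree to produce this code can be sketched as follows. The tree encoding (nested tuples with the operator first) and the function name `gen_three_address` are assumptions made for the illustration; the conversion operators are treated as one-operand nodes.

```python
def gen_three_address(tree):
    """Return three-address instructions for an assignment tree ("=", target, expr)."""
    code = []
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp_count += 1
        return f"temp{temp_count}"

    def walk(node):
        # Leaves (identifiers and constants) are used directly as operands.
        if isinstance(node, str):
            return node
        operator, *children = node
        operands = [walk(child) for child in children]   # children first
        temp = new_temp()
        if len(operands) == 2:
            code.append(f"{temp} = {operands[0]} {operator} {operands[1]}")
        else:
            code.append(f"{temp} = {operator}({operands[0]})")
        return temp

    operator, target, expression = tree
    if operator != "=":
        raise ValueError("expected an assignment at the root")
    code.append(f"{target} = {walk(expression)}")
    return code


# Semantic tree for the example, with the conversion operators inserted.
tree = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))
for line in gen_three_address(tree):
    print(line)
```

Each interior node of the tree gets its own temporary, which is why the idealized machine is assumed to have an unlimited number of registers.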
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal  temp1  3      --
add        temp2  id2    temp1
realtoint  temp3  temp2  --
assign     id1    temp3  --
Each "quad" has the form
operation target source1 source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see:
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
   add temp2 id2 3.0
2. The last two quads can be combined into
   realtoint id1 temp2
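These two optimizations can be sketched on a list of quads. This is a toy sketch, not a general optimizer: quads are (operation, target, source1, source2) tuples with `None` for an unused source, temporaries are recognised by the hypothetical naming convention `tempN`, and each temporary is assumed to be used only once.

```python
def optimize(quads):
    """Fold int-to-real conversions of constants and merge trailing assignments."""
    constants = {}   # temporary name -> constant that replaces it
    folded = []
    for operation, target, source1, source2 in quads:
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        if operation == "inttoreal" and source1.isdigit():
            # Convert at compile time; no quad is emitted for it.
            constants[target] = f"{int(source1)}.0"
            continue
        folded.append((operation, target, source1, source2))

    result = []
    for quad in folded:
        operation, target, source1, _source2 = quad
        previous = result[-1] if result else None
        if (operation == "assign" and previous is not None
                and previous[1] == source1 and source1.startswith("temp")):
            # Let the previous quad write straight into the assignment target.
            result[-1] = (previous[0], target, previous[2], previous[3])
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```

For the example, the four quads shrink to an add with the constant 3.0 followed by a realtoint that writes directly into id1.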
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0    // add float
RTOI R2, R1         // real to int
ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value)
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree)
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST)
• A pass can correspond to a "phase", but it does not have to
• Sometimes a single pass corresponds to several phases that are interleaved in time
• What and how many passes a compiler does over the source program is an important design decision
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
  calls -> Syntactic Analyzer
             calls -> Contextual Analyzer
             calls -> Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
  calls -> Syntactic Analyzer   (input: Source Text, output: AST)
  calls -> Contextual Analyzer  (input: AST, output: Decorated AST)
  calls -> Code Generator       (input: Decorated AST, output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
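The multi pass organization can be illustrated with a small sketch. It is not taken from any real compiler: the toy language (sums of identifiers and integers), the function names and the stack-machine instructions LOAD, PUSH and ADD are all assumptions made for illustration. Each pass is a complete traversal, and the output of one pass is stored in a data structure that the next pass reads.

```python
# Hypothetical three-pass compiler for sums such as "a + b + 2".
# Every name below is illustrative; none comes from the notes.

def syntactic_analyzer(source):
    """Pass 1: source text -> AST, built as nested tuples."""
    tokens = source.split()
    if not tokens:
        raise SyntaxError("empty program")
    ast = ("leaf", tokens[0])
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+" or i + 1 >= len(tokens):
            raise SyntaxError("expected '+' followed by an operand")
        ast = ("add", ast, ("leaf", tokens[i + 1]))
    return ast

def contextual_analyzer(ast, declared):
    """Pass 2: AST -> decorated AST (each leaf is tagged 'const' or 'var')."""
    if ast[0] == "leaf":
        name = ast[1]
        if name.isdigit():
            return ("const", int(name))
        if name not in declared:
            raise NameError(f"undeclared identifier {name!r}")
        return ("var", name)
    return ("add",
            contextual_analyzer(ast[1], declared),
            contextual_analyzer(ast[2], declared))

def code_generator(decorated):
    """Pass 3: decorated AST -> list of stack-machine instructions."""
    if decorated[0] == "const":
        return [f"PUSH {decorated[1]}"]
    if decorated[0] == "var":
        return [f"LOAD {decorated[1]}"]
    return code_generator(decorated[1]) + code_generator(decorated[2]) + ["ADD"]

def compiler_driver(source, declared):
    """The driver calls the three passes one after another."""
    ast = syntactic_analyzer(source)
    decorated = contextual_analyzer(ast, declared)
    return code_generator(decorated)

print(compiler_driver("a + b + 2", {"a", "b"}))
```

A single pass compiler would instead interleave these three steps, emitting code for each operand as soon as it has been parsed and checked.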
Lecture_no47 Surprise Test
THANK YOU
programmer multiple procedure segments and multiple data segments and of giving him complete freedom in referencing data or instructions contained in other segments This provides flexible inter segment referencing and accessing ability while at the same time allowing independent translations of programs
Lecture_no28 Loader Design Option
Machine independent loader feature
Bootstrap Loader: When the computer is turned on, there is nothing meaningful in the main memory (RAM). A small program is written and stored in the ROM. This program initially loads the operating system from secondary storage into main memory. The operating system then takes overall control. This program, which is responsible for booting up the system, is called the bootstrap loader.
• Linkage editor-
A linkage editor performs linking prior to load time.
» It produces a linked version of the program (often called a load module or an executable image), which is written to a file or library for later execution.
• Dynamic linking-
The linking function is performed at execution time.
-> Linking is postponed until execution time.
» A subroutine is loaded and linked to the rest of the program when it is first called. This is known as dynamic linking, dynamic loading, or load on call.
-> It allows several executing programs to share one copy of a subroutine or library.
• Linking loader-
A linking loader
» performs all linking and relocation operations
» performs automatic library search
» loads the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D, A calls B and C, D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that leads to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it is not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
(fig Overlay: ROOT at the top with children A and D; A has children B and C; D has children E and F)
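The path-loading rule can be sketched in a few lines. This is only an illustration under stated assumptions: the class name OverlayManager, the PARENT table and the loads log are hypothetical, and a real overlay manager would copy code from secondary storage rather than record segment names.

```python
# Overlay tree from the example: ROOT calls A and D, A calls B and C, D calls E and F.
PARENT = {"A": "ROOT", "D": "ROOT", "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Return the path from the root down to `segment`, e.g. ROOT, D, E."""
    path = [segment]
    while path[-1] != "ROOT":
        path.append(PARENT[path[-1]])
    return list(reversed(path))

class OverlayManager:
    """Hypothetical overlay manager that only records what it would load."""

    def __init__(self):
        self.loaded = ["ROOT"]  # currently resident path; the root is always loaded
        self.loads = []         # log of segments brought in from secondary storage

    def call(self, target):
        """Make sure the whole path to `target` is resident before the call."""
        wanted = path_to(target)
        # Length of the common prefix of the resident path and the wanted path.
        keep = 0
        while (keep < len(self.loaded) and keep < len(wanted)
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        if keep < len(wanted):
            # Sibling segments share memory, so everything below the common
            # prefix is overwritten by the newly loaded segments.
            self.loads.extend(wanted[keep:])
            self.loaded = wanted

manager = OverlayManager()
manager.call("B")   # root calls a routine in B: both A and B are loaded
manager.call("E")   # D and E replace A and B
print(manager.loads, manager.loaded)
```

An upward call (for example from B back to A) finds its target already on the resident path, so nothing is loaded.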
The essential difference between a linkage editor and a linking loader
(fig: Linking loader: object program(s) and library -> linking loader -> memory. Linkage editor: object program(s) and library -> linkage editor -> linked program -> relocating loader -> memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. It is a relocating loader. The loader does not have direct access to the source code. When the object code is placed in memory there are two situations: either the address of the object code is absolute, in which case the code can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1 The overhead on the loader is reduced. The required subroutine is loaded into main memory only at the time of execution.
2 The system can be dynamically reconfigured.
Disadvantages
Linking and loading have to be postponed until execution. If a subroutine is needed during execution, execution has to be suspended until the required subroutine has been loaded into main memory.
-> Let us take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader) for the above four functions:
Function      Absolute loader    Relocating loader    Direct linking loader
Allocation    Programmer         Programmer           Programmer
Linking       Programmer         Programmer           Loader
Relocation    Assembler          Loader               Assembler
Loading       Loader             Loader               Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program a location in core
-> Create a symbol table, filling in the values of the external symbols
1 Input object deck
2 Initial Program Load Address (IPLA)
3 Program Load Address (PLA) counter
4 External Symbol Directory (ESD) card
5 A copy of the input (TXT card)
6 RLD (Relocation and Linkage Directory) card
7 END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment location
• External Symbol Directory card (ESD), to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD and RLD cards
• ESD (external symbol dictionary)
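The two passes can be sketched as follows. The object-module layout used here (a dictionary with name, length, defs, text and rld fields) is a simplifying assumption, not the card format described above, and the function names pass1 and pass2 are illustrative only. Pass 1 advances a PLA counter from the IPLA and records the address of every external symbol; pass 2 loads the text and adds symbol addresses to the address constants named in the RLD entries.

```python
def pass1(modules, ipla):
    """Assign a load address to each segment and build the external symbol table."""
    symtab = {}
    pla = ipla                                  # Program Load Address counter
    for module in modules:
        symtab[module["name"]] = pla            # the segment name is an external symbol
        for symbol, relative in module["defs"].items():
            symtab[symbol] = pla + relative     # relative address -> absolute address
        pla += module["length"]
    return symtab

def pass2(modules, symtab, memory):
    """Load the text, then relocate and link the address constants."""
    for module in modules:
        base = symtab[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word
        for offset, symbol in module["rld"]:
            # Add the absolute address of `symbol` to the address constant.
            memory[base + offset] += symtab[symbol]

# Hypothetical object modules: MAIN refers to SUB and to itself,
# SUB defines the entry point ENTRY at relative address 1.
MODULES = [
    {"name": "MAIN", "length": 3, "defs": {},
     "text": [10, 0, 2], "rld": [(1, "SUB"), (2, "MAIN")]},
    {"name": "SUB", "length": 2, "defs": {"ENTRY": 1},
     "text": [7, 0], "rld": [(1, "ENTRY")]},
]

symbols = pass1(MODULES, ipla=100)
core = {}
pass2(MODULES, symbols, core)
print(symbols)
print(core)
```

Two passes are needed because MAIN refers to SUB before the loader has seen SUB, so the address of SUB is only known once pass 1 has finished.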
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler assembles each program and provides the loader with
-> Object program + relocation information
-> A prefix with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
  - allocation, relocation and linking
• It outputs the text to a file rather than to memory
  - the file is called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing a language but also in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people with
   -> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
To understand the idea of hashing as a search technique
To introduce the map abstract data type
To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as its input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
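The three methods can be sketched for a small symbol table. The table contents, the function names and the particular hash function (sum of character codes modulo the table size) are assumptions chosen for illustration; many other hash functions are possible.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _value) in enumerate(table):
        if name == key:
            return index
    return -1

def binary_search(sorted_table, key):
    """Search a table that is ordered by symbol name; return its index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = sorted_table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1

def hash_slot(key, size):
    """Illustrative hash function: map a symbol name to a slot 0..size-1."""
    return sum(ord(ch) for ch in key) % size

# Hypothetical symbol table: (symbol name, address), ordered by name.
SYMBOLS = [("ALPHA", 100), ("BETA", 104), ("GAMMA", 108), ("LOOP", 112)]

print(linear_search(SYMBOLS, "GAMMA"))
print(binary_search(SYMBOLS, "GAMMA"))
print(hash_slot("LOOP", 11))
```

Linear search inspects up to n entries, binary search about log2(n) entries of an ordered table, and hashing goes directly to a slot but needs a strategy for two keys that map to the same slot.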
Sorting
We will discuss four comparison-sort algorithms:
1 selection sort
2 insertion sort
3 merge sort
4 quick sort
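A minimal sketch of the four algorithms follows. These are textbook formulations written for readability rather than speed; the function names are illustrative, each function returns a new sorted list and leaves its argument unchanged, and the quick sort simply takes the first element as the pivot.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and move it into place."""
    a = list(items)
    for i in range(len(a) - 1):
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a

def insertion_sort(items):
    """Insert each item into the already sorted part to its left."""
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
    return a

def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

def quick_sort(items):
    """Partition around a pivot, then sort the two partitions recursively."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)

data = [5, 2, 9, 1, 5, 6]
for sort in (selection_sort, insertion_sort, merge_sort, quick_sort):
    print(sort.__name__, sort(data))
```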
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1 A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols)
2 A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not
3 A set of axioms or axiom schemata; each axiom must be a wff
4 A set of inference rules
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols)
-> A formula (also called a string or a sentence) is a concatenation of symbols
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T
We write this as L ⊆ T*
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal)
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings
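The definition can be turned into a short sketch that lists every string immediately derivable from a given string. The representation is an assumption made for illustration: strings are ordinary Python strings, each production α -> β is a pair (alpha, beta) with a non-empty left side, and the function name immediate_derivations is hypothetical.

```python
def immediate_derivations(mu, productions):
    """Return every gamma such that mu = sigma alpha tau, gamma = sigma beta tau
    and (alpha -> beta) is one of the productions."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                  # arbitrary string before alpha
            tau = mu[start + len(alpha):]       # arbitrary string after alpha
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)   # look for the next occurrence
    return results

# Hypothetical grammar with productions S -> aSb and S -> ab.
P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", P)))
```

Applying the function repeatedly, starting from the start symbol, enumerates the sentential forms of the grammar one derivation step at a time.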
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions
a Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε
b If S → ε is in P then S does not appear in the right-hand side of any production rule
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar because of a violation of condition (a)
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language
Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions
a β is a terminal symbol
b β is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar
The figure below illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
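The conditions for the grammar types can be checked mechanically. The sketch below assumes the standard reading of the definitions (|α| ≤ |β| for Type 1, a single nonterminal on the left for Type 2, a terminal optionally followed by a nonterminal on the right for Type 3); the function name grammar_type and the representation (one character per symbol, the empty string for ε, productions as (alpha, beta) pairs with non-empty alpha) are assumptions made for illustration.

```python
def grammar_type(nonterminals, productions, start="S"):
    """Return the most restricted type (3, 2, 1 or 0) that the grammar satisfies.
    `nonterminals` is a set of one-character symbols; every other character is
    treated as a terminal."""
    start_to_empty = (start, "") in productions
    others = [(a, b) for a, b in productions if (a, b) != (start, "")]

    # Type 1, condition (a): no rule other than S -> empty shrinks the string.
    type1 = all(len(a) <= len(b) for a, b in others)
    # Type 1, condition (b): if S -> empty is a rule, S is on no right-hand side.
    if start_to_empty and any(start in b for _a, b in productions):
        type1 = False
    if not type1:
        return 0

    # Type 2: every left-hand side is a single nonterminal symbol.
    if not all(a in nonterminals for a, _b in productions):
        return 1

    def terminal_then_optional_nonterminal(b):
        if len(b) == 1:
            return b not in nonterminals
        return len(b) == 2 and b[0] not in nonterminals and b[1] in nonterminals

    # Type 3: each right-hand side is a terminal, or a terminal plus a nonterminal.
    if all(terminal_then_optional_nonterminal(b) for _a, b in others):
        return 3
    return 2

print(grammar_type({"S", "A"}, [("S", "aA"), ("A", "b"), ("A", "bA")]))
print(grammar_type({"S"}, [("S", "aSb"), ("S", "ab")]))
```

With this representation, adding a rule such as Bb → b makes the function return 0, because the rule violates condition (a).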
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
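As a small illustration, a BNF specification can be read into a table of rules, after which the terminals are exactly the symbols that never appear on a left side. The sketch assumes the usual ::= spelling of the definition sign, one rule per line, symbols separated by blanks, and no terminal that contains | or a blank; the grammar for binary numbers and the function names parse_bnf and terminals are hypothetical.

```python
def parse_bnf(text):
    """Read lines of the form '<symbol> ::= alternative | alternative'
    into a dictionary: nonterminal -> list of alternatives (lists of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        left, right = line.split("::=", 1)
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules

def terminals(rules):
    """Symbols that are used on a right side but never defined on a left side."""
    used = {symbol for alts in rules.values() for alt in alts for symbol in alt}
    return used - set(rules)

# Hypothetical BNF specification of binary numbers.
GRAMMAR = """
<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1
"""

rules = parse_bnf(GRAMMAR)
print(rules)
print(sorted(terminals(rules)))
```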
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages
If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators"
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program)
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read and broken into a stream of strings. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free
• It must generate correct machine code
• The generated machine code must run fast
• The compiler itself must run fast (compilation time must be proportional to program size)
• The compiler must be portable (i.e. modular, supporting separate compilation)
• It must give good diagnostics and error messages
• The generated code must work well with existing debuggers
Phases of a compiler-
- Lexical Analysis
• It is also called scanning
• It breaks the complete source code into tokens
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
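A minimal sketch of such a scanner, using Python's standard re module, is shown below. The token names and the regular expressions are assumptions chosen to match the example statement (taken here to contain a multiplication sign between rate and 10, as the token list indicates); a scanner for a real language would need many more token classes.

```python
import re

# Illustrative token classes: (token name, regular expression).
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),          # blanks separate tokens but are not tokens
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenize(source):
    """Break the source string into a list of (token name, lexeme) pairs."""
    tokens = []
    position = 0
    while position < len(source):
        match = PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens

for token in tokenize("total = count + rate * 10"):
    print(token)
```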
- Syntax Analysis
• It is also called parsing
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure
• It determines the structure of the source string by grouping the tokens together
• The hierarchical structure generated in this phase is called a parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink
The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language
The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73
SP NOTES
the latter constituting the output of the lexical analyzer For example any one of the following
x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not
x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
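The grouping of characters into lexemes and tokens described above can be sketched with regular expressions, which need no recursion. This is only an illustrative sketch: the names `TOKEN_SPEC` and `scan`, the pattern set, and the choice to keep a number's attribute as its text are assumptions, not part of the notes.

```python
import re

# Token patterns tried in order; names and patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+;]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(text):
    """Return (tokens, symbol_table) for one line of source text."""
    symtab = []          # identifier lexemes; position + 1 is the attribute value
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                       # non-significant blanks are dropped
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))  # attribute unused for symbols
    return tokens, symtab
```

Calling `scan("x3 = y + 3;")` and `scan("x3   =y+ 3  ;")` gives the same token list, while `scan("x 3 = y + 3;")` starts with an identifier x followed by a number 3.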
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
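A syntax tree with operators as interior nodes can be built by a small hand-written parser. The sketch below is an assumption-laden illustration: the function name `parse_assignment`, the nested-tuple tree format, and the decision to group `+` left to right (the grammar rule expr + expr does not itself fix a grouping) are choices made here, not taken from the notes.

```python
def parse_assignment(lexemes):
    """Parse  id = expr ;  and return a syntax tree as nested tuples."""
    pos = 0

    def peek():
        return lexemes[pos] if pos < len(lexemes) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def operand():
        tok = take()
        if tok.isdigit():
            return ("number", tok)
        if tok.isidentifier():
            return ("id", tok)
        raise SyntaxError(f"unexpected {tok!r}")

    def expr():
        # The recursive rule for expr is unrolled into a loop that
        # groups operands left to right.
        node = operand()
        while peek() == "+":
            take("+")
            node = ("+", node, operand())
        return node

    target = operand()
    if target[0] != "id":
        raise SyntaxError("left side of = must be an identifier")
    take("=")
    tree = ("=", target, expr())
    take(";")                      # statement terminator
    if peek() is not None:
        raise SyntaxError("trailing input")
    return tree
```

For the lexemes x3, =, y, +, 3, ; this returns a tree whose root is `=`, with the identifier x3 as its left child and a `+` node over y and 3 as its right child.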
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversion where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one operand) conversion operators "int to real" and "real to int" as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --
Each "quad" has the form
    operation target source1 source2
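Walking the tree to emit quads can be sketched as follows. The tuple tree format, the `types` map, the helper names, and the rule that integer literals have type int are all assumptions made for this illustration.

```python
def gen_quads(tree, types):
    """Walk a syntax tree and return quads (operation, target, source1, source2)."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def convert(place, have, want):
        # Insert a unary conversion quad only when the types differ.
        if have == want:
            return place
        temp = new_temp()
        op = "inttoreal" if want == "real" else "realtoint"
        quads.append((op, temp, place, None))
        return temp

    def walk(node):
        kind = node[0]
        if kind == "number":
            return node[1], "int"          # assumption: literals are integers
        if kind == "id":
            return node[1], types[node[1]]
        left, ltype = walk(node[1])        # otherwise a binary + node
        right, rtype = walk(node[2])
        result_type = "real" if "real" in (ltype, rtype) else "int"
        left = convert(left, ltype, result_type)
        right = convert(right, rtype, result_type)
        temp = new_temp()
        quads.append(("add", temp, left, right))
        return temp, result_type

    _, target, source = tree
    place, source_type = walk(source)
    place = convert(place, source_type, types[target[1]])
    quads.append(("assign", target[1], place, None))
    return quads
```

With id1 declared int and id2 declared real, the tree for id1 = id2 + 3 yields the four quads listed above, with `None` standing for the unused `--` field.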
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
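These two simplifications can be sketched as a pass over the quad list. The function name `optimize` and the quad tuple layout are assumptions, and the second step is only safe under the stated assumption that the temporary being copied is not used anywhere else.

```python
def optimize(quads):
    """Apply the two simplifications to a list of (op, target, src1, src2)."""
    # 1. Fold inttoreal of an integer constant into the quads that use it.
    folded = {}
    out = []
    for op, target, s1, s2 in quads:
        s1 = folded.get(s1, s1)
        s2 = folded.get(s2, s2)
        if op == "inttoreal" and isinstance(s1, str) and s1.isdigit():
            folded[target] = s1 + ".0"     # e.g. 3 becomes 3.0 at compile time
            continue
        out.append((op, target, s1, s2))
    # 2. Merge a final copy into the quad that produced its source.
    #    Assumes that temporary has no other use.
    if len(out) >= 2 and out[-1][0] == "assign" and out[-1][2] == out[-2][1]:
        op, _, s1, s2 = out[-2]
        out[-2:] = [(op, out[-1][1], s1, s2)]
    return out
```

Applied to the four quads of the running example, this leaves an add of id2 and 3.0 into temp2, followed by a realtoint from temp2 directly into id1.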
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, 3.0    // add float
    RTOI R2, R1         // real to int
    ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
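A minimal sketch of such a table is shown below. The class and field names, the 1-based index matching tokens like <id,1>, and the byte sizes for int and real are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    type: str
    offset: int      # where the compiled program will keep the variable

class SymbolTable:
    SIZES = {"int": 4, "real": 8}    # assumed sizes in bytes

    def __init__(self):
        self.entries = []            # position + 1 is the index used in <id,n>
        self.by_name = {}
        self.next_offset = 0

    def declare(self, name, type_):
        """Add a variable and return its 1-based index."""
        if name in self.by_name:
            raise KeyError(f"{name} already declared")
        entry = Entry(name, type_, self.next_offset)
        self.next_offset += self.SIZES[type_]
        self.entries.append(entry)
        self.by_name[name] = len(self.entries)
        return self.by_name[name]

    def lookup(self, index):
        """Return the entry for a token attribute such as the 1 in <id,1>."""
        return self.entries[index - 1]
```

Later phases can then call `lookup` with the attribute value of an id token to find the type (for semantic analysis) and the offset (for code generation).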
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler: the Compiler Driver calls the Syntactic Analyzer, which in turn calls the Contextual Analyzer and the Code Generator.
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer, and the Code Generator in turn. The Syntactic Analyzer takes the source text as input and outputs an AST; the Contextual Analyzer takes the AST as input and outputs a decorated AST; the Code Generator takes the decorated AST as input and outputs object code.
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
-> Allow several executing programs to share one copy of a subroutine or library
• Linking loader-
A linking loader performs:
» All linking and relocation operations
» Automatic library search
» Loading of the linked program directly into memory for execution
• Overlay
Overlays are another technique that dates back to before 1960 and is still in use in some memory-constrained environments. Overlay techniques can be a good way to squeeze programs into limited memory.
Overlaid programs divide the code into a tree of segments.
A typical overlay tree: ROOT calls A and D; A calls B and C; D calls E and F.
The programmer manually assigns object files or individual object code segments to overlay segments. Sibling segments in the overlay tree share the same memory. In the example, segments A and D share the same memory, B and C share the same memory, and E and F share the same memory. The sequence of segments that lead to a specific segment is called a path, so the path for E includes the root, D, and E.
When the program starts, the system loads the root segment, which contains the entry point of the program. Each time a routine makes a downward inter-segment call, the overlay manager ensures that the path to the call target is loaded. For example, if the root calls a routine in segment A, the overlay manager loads segment A if it's not already loaded. If a routine in A calls a routine in B, the manager has to ensure that B is loaded, and if a routine in the root calls a routine in B, the manager ensures that both A and B are loaded. Upward calls don't require any linker help, since the entire path from the root is already loaded.
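The path-loading rule can be sketched in a few lines. The `PARENT` table encodes the example tree above; the class name `OverlayManager` and its `call` method are hypothetical names, and real overlay managers also deal with addresses and disk I/O, which this sketch ignores.

```python
# Parent of each segment in the example overlay tree.
PARENT = {"ROOT": None, "A": "ROOT", "D": "ROOT",
          "B": "A", "C": "A", "E": "D", "F": "D"}

def path_to(segment):
    """Return the path from the root down to the given segment."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = PARENT[segment]
    return path[::-1]

class OverlayManager:
    def __init__(self):
        self.loaded = ["ROOT"]           # the root is loaded at program start

    def call(self, target):
        """Ensure the path to `target` is resident; return the segments loaded."""
        if target in self.loaded:
            return []                    # upward or same-segment call
        wanted = path_to(target)
        # Keep the common prefix; siblings below it share memory and are evicted.
        keep = 0
        while (keep < len(self.loaded) and keep < len(wanted)
               and self.loaded[keep] == wanted[keep]):
            keep += 1
        newly_loaded = wanted[keep:]
        self.loaded = wanted[:keep] + newly_loaded
        return newly_loaded
```

A call from the root straight into B loads both A and B, while a later call into E replaces A and B with D and E because the siblings A and D share memory.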
(fig Overlay: ROOT at the top, with children A and D; A has children B and C; D has children E and F)
The essential difference between a linkage editor and a linking loader
(fig: with a linking loader, the object program(s) and library are processed by the linking loader and placed directly in memory; with a linkage editor, the object program(s) and library are processed into a linked program, which a relocating loader later places in memory)
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader does not have direct access to the source code. When placing the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at who performs each of the above four functions for our three loaders: absolute loader, relocating loader, and direct linking loader, respectively.
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g., BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD & RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
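The two passes outlined above can be sketched roughly as follows. This is not the card-based algorithm itself: the dictionary layout of a module (name, length, defs, text, rld), the function name `link_and_load`, and the use of a Python dict for memory are simplifying assumptions standing in for ESD, TXT, and RLD cards.

```python
def link_and_load(modules, ipla):
    """Two-pass sketch of a direct linking loader.

    Each module is a dict with hypothetical fields:
      name, length, defs {symbol: relative address}, text [words],
      rld [(offset, symbol)]  meaning: add the symbol's address to text[offset].
    """
    # Pass 1: assign load addresses and fill in external symbol values.
    symtab = {}
    base = {}
    pla = ipla                                   # program load address counter
    for mod in modules:
        base[mod["name"]] = pla
        for sym, rel in [(mod["name"], 0), *mod["defs"].items()]:
            if sym in symtab:
                raise ValueError(f"duplicate external symbol {sym}")
            symtab[sym] = pla + rel
        pla += mod["length"]

    # Pass 2: load the text, then relocate and resolve address constants.
    memory = {}
    for mod in modules:
        start = base[mod["name"]]
        for i, word in enumerate(mod["text"]):
            memory[start + i] = word
        for offset, sym in mod["rld"]:
            if sym not in symtab:
                raise ValueError(f"undefined external symbol {sym}")
            memory[start + offset] += symtab[sym]
    return memory, symtab
```

Pass 1 must finish before pass 2 starts because a module may refer to a symbol defined in a module that appears later in the input.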
Binary Symbolic Subroutine Loader-
The assembler assembles the program and provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
  - allocation, relocation, and linking
• Outputs the text in a file rather than memory
  - called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java, and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if they are the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
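The three lookup methods can be sketched as below, using symbol names as keys. The hash function (sum of character codes modulo the table size) and the use of linear probing for collisions are assumptions chosen for brevity; the notes do not prescribe either.

```python
def linear_search(table, key):
    """Compare every entry in turn; return its index or -1."""
    for i, entry in enumerate(table):
        if entry == key:
            return i
    return -1

def binary_search(sorted_table, key):
    """Halve the search range of a sorted table; return index or -1."""
    lo, hi = 0, len(sorted_table) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_table[mid] == key:
            return mid
        if key < sorted_table[mid]:
            hi = mid - 1              # continue in the left sub-array
        else:
            lo = mid + 1              # continue in the right sub-array
    return -1

class HashTable:
    """Open addressing with linear probing; slots are numbered from 0."""

    def __init__(self, size=11):
        self.slots = [None] * size

    def _hash(self, key):
        return sum(ord(c) for c in key) % len(self.slots)

    def put(self, key, value):
        i = self._hash(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % len(self.slots)     # slot taken, try the next one
        raise OverflowError("hash table is full")

    def get(self, key):
        i = self._hash(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None:
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)
        return None
```

Binary search requires the table to be kept in order, which is why an ordered table is listed alongside it; the hash table avoids ordering at the cost of handling collisions.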
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
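Short sketches of the four algorithms follow. They return a new sorted list rather than sorting in place, and the quick sort uses the first element as pivot; both are simplifying choices made here, not requirements of the algorithms.

```python
def selection_sort(items):
    """Repeatedly move the smallest remaining element to the front."""
    a = list(items)
    for i in range(len(a)):
        smallest = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[smallest] = a[smallest], a[i]
    return a

def insertion_sort(items):
    """Insert each element into the already sorted prefix."""
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = current
    return a

def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

def quick_sort(items):
    """Partition around a pivot, then sort each side recursively."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    return (quick_sort([x for x in rest if x < pivot]) + [pivot]
            + quick_sort([x for x in rest if x >= pivot]))
```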
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e., the alphabet) that can be used for constructing formulas (i.e., finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule, or transformation rule) is the act of drawing a conclusion based on the form of the premises; it can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this $L \subseteq T^*$.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written $\mu \Rightarrow_G \gamma$ (a single-step derivation), if and only if
(1) $\mu = \sigma \alpha \tau$
(2) $\gamma = \sigma \beta \tau$
(3) $\alpha \to \beta \in P$ of G
where σ and τ represent arbitrary strings.
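The single-step derivation can be made concrete with a short sketch. Representing strings as tuples of symbols and productions as (alpha, beta) pairs is an assumption made here for illustration, as is the function name.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    Strings are tuples of symbols; productions is a list of (alpha, beta)
    pairs of tuples, each standing for a rule alpha -> beta in P.
    """
    results = []
    for alpha, beta in productions:
        n = len(alpha)
        for i in range(len(mu) - n + 1):
            if mu[i:i + n] == alpha:
                sigma, tau = mu[:i], mu[i + n:]      # mu = sigma alpha tau
                results.append(sigma + beta + tau)   # gamma = sigma beta tau
    return results
```

With the rules E -> E + E and E -> id, the string E yields E + E and id in one step; repeating the step from the start symbol enumerates sentential forms.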
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule $\alpha \to \beta$ in P satisfies $|\alpha| \le |\beta|$ if it is not of the form $S \to \varepsilon$.
b. If $S \to \varepsilon$ is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form $Bb \to b$ will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule $\alpha \to \beta$ satisfies $|\alpha| = 1$, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form $aE \to EaE$ to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules $\alpha \to \beta$ which is not of the form $S \to \varepsilon$ satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form $A \to Ba$ or of the form $B \to bb$ to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
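The conditions for Types 1, 2, and 3 given above can be checked mechanically. The sketch below is an illustration under assumptions: productions are (alpha, beta) pairs of symbol tuples, an empty tuple stands for the empty string, any symbol not in the nonterminal set is treated as a terminal, and the input is assumed to be a valid Type 0 grammar.

```python
def grammar_type(nonterminals, productions, start):
    """Return the most restricted type (3, 2, 1 or 0) a grammar satisfies."""
    start_to_empty = ((start,), ())
    others = [p for p in productions if p != start_to_empty]
    start_on_rhs = any(start in beta for _, beta in productions)

    # Type 1: |alpha| <= |beta| except for S -> empty, and if S -> empty
    # is present then S never appears on a right-hand side.
    type1 = all(len(a) <= len(b) for a, b in others)
    if start_to_empty in productions and start_on_rhs:
        type1 = False
    if not type1:
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a[0] in nonterminals for a, _ in productions):
        return 1

    # Type 3: right-hand side is a terminal, or a terminal then a nonterminal.
    def right_linear(beta):
        if len(beta) == 1:
            return beta[0] not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    if all(right_linear(b) for _, b in others):
        return 3
    return 2
```

The checks mirror the additions discussed in the examples: adding a rule Bb -> b breaks condition (a) of Type 1, adding aE -> EaE breaks Type 2, and adding A -> Ba breaks Type 3.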
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form $xAy \to xvy$, where $A \in V$ and $x, v, y \in (V \cup T)^*$.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by $A \to v$, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
    V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
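The rule format above can be tried out with a small sketch. The grammar text and the helper names parse_bnf and terminals are assumptions made for illustration only; the sketch also assumes that | is never itself a terminal.

```python
def parse_bnf(text):
    """Turn lines of the form '<symbol> ::= a | b' into a dict.

    Each nonterminal maps to a list of alternatives; each alternative
    is a list of symbols (nonterminals keep their angle brackets).
    """
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alt.split() for alt in right.split("|")]
        rules[left.strip()] = alternatives
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)


# Hypothetical grammar for an optionally signed binary digit.
GRAMMAR = """
<number> ::= <digit> | <sign> <digit>
<sign> ::= + | -
<digit> ::= 0 | 1
"""

rules = parse_bnf(GRAMMAR)
print(rules["<number>"])          # the two alternatives for <number>
print(sorted(terminals(rules)))   # ['+', '-', '0', '1']
```

Keeping each alternative as a list of symbols makes the choice expressed by | explicit, and the terminals fall out of the rule "never appears on a left side".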
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as a computer language. They become more useful still if a canonic system can be used as a basis for recognizing strings from the defined sets.
The recognition algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form is built. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
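The token list above can be reproduced with a minimal scanner sketch. The token class names and the regular expressions are assumptions chosen for this one statement, not a complete lexical specification.

```python
import re

# Token classes for the example statement; the class names are assumptions.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Break source text into (kind, lexeme) pairs, dropping blanks."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, lexeme in tokenize("total = count + rate * 10"):
    print(kind, lexeme)
```

The scanner yields seven tokens, in the same order as the list above.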
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3 =y+ 3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
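A minimal sketch of the lexeme-to-token mapping, with identifiers carrying an index into a symbol table. The function name scan is hypothetical, and storing a number as its lexeme text is only one of the possibilities mentioned above.

```python
import re


def scan(source):
    """Map lexemes to <token-name, attribute-value> pairs.

    Identifiers carry a 1-based index into the symbol table; operators
    and punctuation carry no attribute (None). Blanks are dropped
    because the pattern below never matches them.
    """
    symbol_table = []          # entry i-1 holds the name with index i
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))
    return tokens, symbol_table


tokens, table = scan("x3 = y + 3;")
print(tokens)
print(table)   # ['x3', 'y']
```

Different spacings of the same statement give the same token stream, while x 3 = y + 3; gives a different one because x and 3 become separate lexemes.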
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a tree (figure not reproduced).
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the parse tree.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree, called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree for this example corresponds to the parse tree above.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
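The grammar rules above can be turned into a small recursive-descent sketch that builds a syntax tree. This is an illustration under stated assumptions: the function names are hypothetical, the tree is stored as nested tuples, and the ambiguous rule expr → expr + expr is resolved by treating + as left-associative.

```python
import re


def parse_assignment(source):
    """Parse 'id = expr ;' and return a syntax tree as nested tuples.

    Interior nodes are operators and their children are operands.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source)
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def operand():
        token = next_token()
        if token in ("=", "+", ";"):
            raise SyntaxError(f"expected an operand, found {token!r}")
        return token

    def expr():
        tree = operand()
        while pos < len(tokens) and tokens[pos] == "+":
            next_token()                     # consume '+'
            tree = ("+", tree, operand())
        return tree

    target = operand()
    if target[0].isdigit():
        raise SyntaxError("left side of '=' must be an identifier")
    if next_token() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if next_token() != ";":
        raise SyntaxError("expected ';'")
    return tree


print(parse_assignment("x3 = y + 3;"))   # ('=', 'x3', ('+', 'y', '3'))
```

The loop inside expr replaces the left-recursive rule, which a recursive-descent parser cannot follow directly without looping forever.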
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int".
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
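The tree walk can be sketched as a post-order traversal that allocates one temporary per interior node. The tuple layout of the semantic tree and the function name generate are assumptions of this sketch.

```python
def generate(tree):
    """Walk a semantic tree and return three-address instructions.

    Leaves are strings; interior nodes are tuples (operator, child, ...).
    Unary conversion nodes yield instructions with fewer than three
    operands.
    """
    code = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):            # identifier or constant
            return node
        op, *children = node
        operands = [walk(child) for child in children]
        counter += 1
        temp = f"temp{counter}"
        if len(operands) == 1:
            code.append(f"{temp} = {op}({operands[0]})")
        else:
            code.append(f"{temp} = {operands[0]} {op} {operands[1]}")
        return temp

    _, target, value = tree                  # root is the assignment
    code.append(f"{target} = {walk(value)}")
    return code


# x3 = y + 3 with y real and x3 integer, after semantic analysis.
TREE = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))
for line in generate(TREE):
    print(line)
```

Because children are visited before their parent, the inttoreal conversion gets temp1, the addition temp2, and the realtoint conversion temp3, matching the sequence above.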
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal  temp1  3      --
add        temp2  id2    temp1
realtoint  temp3  temp2  --
assign     id1    temp3  --
Each "quad" has the form
operation  target  source1  source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion and replace the first two quads with
add  temp2  id2  3.0
2. The last two quads can be combined into
realtoint  id1  temp2
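Both optimizations can be sketched as passes over a list of quads. The function names are hypothetical, and merge_assign assumes that the temporary being eliminated is not used anywhere else, which holds for this example but would need checking in general.

```python
def fold_constants(quads):
    """Evaluate inttoreal on integer constants at compile time."""
    known = {}                 # temp name -> constant text
    result = []
    for op, target, src1, src2 in quads:
        src1 = known.get(src1, src1)
        src2 = known.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            known[target] = f"{src1}.0"
        else:
            result.append((op, target, src1, src2))
    return result


def merge_assign(quads):
    """Drop 'assign x t' when t was computed by the preceding quad."""
    result = []
    for quad in quads:
        op, target, src1, _ = quad
        if op == "assign" and result and result[-1][1] == src1:
            prev_op, _, prev1, prev2 = result.pop()
            result.append((prev_op, target, prev1, prev2))
        else:
            result.append(quad)
    return result


# Each quad is (operation, target, source1, source2); None means unused.
QUADS = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in merge_assign(fold_constants(QUADS)):
    print(quad)
```

The four quads shrink to two: add temp2 id2 3.0 followed by realtoint id1 temp2.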
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD    R1, id2
ADDF  R1, R1, 3.0    // add float
RTOI  R2, R1         // real to int
ST    id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver calls Syntactic Analyzer
Syntactic Analyzer calls Contextual Analyzer
Syntactic Analyzer calls Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver calls Syntactic Analyzer (input: Source Text, output: AST)
Compiler Driver calls Contextual Analyzer (input: AST, output: Decorated AST)
Compiler Driver calls Code Generator (input: Decorated AST, output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
The essential difference between a linkage editor and a linking loader
Linking loader: Object program(s) + Library -> Linking loader -> Memory
Linkage editor: Object program(s) + Library -> Linkage editor -> Linked program -> Relocating loader -> Memory
Lecture_no29 Design of Direct linking loader
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case the code can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded into main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine is loaded into main memory.
-> Let's take a quick look at who performs each of the above four functions for our three loaders (absolute loader, relocating loader, and direct linking loader, respectively):
Function     Absolute loader   Relocating loader   Direct linking loader
Allocation   Programmer        Programmer          Programmer
Linking      Programmer        Programmer          Loader
Relocation   Assembler         Loader              Assembler
Loading      Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g., BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
pass1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
pass2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD & RLD cards
• ESD (external symbol dictionary)
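The two passes can be sketched on a toy object format. Everything about the format here is an assumption made for illustration: each module is a dict with a name, a defs map of external symbols to relative addresses, a text list of words, and a relocs list of (offset, symbol) pairs meaning "add the address of symbol to the word at offset". Real ESD, TXT, and RLD cards carry more detail than this.

```python
def pass1(modules, ipla):
    """Assign a load address to each segment and build the symbol table."""
    symbols = {}
    pla = ipla                               # Program Load Address counter
    for module in modules:
        symbols[module["name"]] = pla        # segment name is a symbol too
        for name, offset in module["defs"].items():
            symbols[name] = pla + offset
        pla += len(module["text"])
    return symbols


def pass2(modules, symbols):
    """Load text into memory, then relocate and link address constants."""
    memory = {}
    for module in modules:
        base = symbols[module["name"]]
        for offset, word in enumerate(module["text"]):
            memory[base + offset] = word
        for offset, name in module["relocs"]:
            memory[base + offset] += symbols[name]
    return memory


# Hypothetical modules: MAIN refers to SUB (linking) and to itself
# (relocation of an address constant relative to its own segment).
MAIN = {"name": "MAIN", "defs": {}, "text": [10, 0, 2],
        "relocs": [(1, "SUB"), (2, "MAIN")]}
SUB = {"name": "SUB", "defs": {"ENTRY": 1}, "text": [7, 8], "relocs": []}

symbols = pass1([MAIN, SUB], ipla=100)
memory = pass2([MAIN, SUB], symbols)
print(symbols)   # {'MAIN': 100, 'SUB': 103, 'ENTRY': 104}
print(memory)
```

Pass 1 must finish before pass 2 starts, because MAIN refers to SUB before SUB's load address has been assigned.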
Data Structures
Algorithm
Binary Symbolic Subroutine Loader-
The assembler assembles the program and provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
- allocation, relocation, and linking
• Outputs the text in a file rather than memory
- called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java, and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search.
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort.
• To understand the idea of hashing as a search technique.
• To introduce the map abstract data type.
• To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if it is the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
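The three methods can be sketched as follows. The table contents and the toy hash function (sum of character codes modulo the table size) are assumptions for illustration; practical hash functions are chosen more carefully.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, entry in enumerate(table):
        if entry == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Halve a sorted table repeatedly; return the key's index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_table[middle] == key:
            return middle
        if key < sorted_table[middle]:
            high = middle - 1
        else:
            low = middle + 1
    return -1


def hash_slot(key, size):
    """A toy hash function: sum of character codes modulo the table size."""
    return sum(ord(ch) for ch in key) % size


SYMBOLS = ["count", "rate", "total"]          # already in sorted order
print(linear_search(SYMBOLS, "rate"))         # 1
print(binary_search(SYMBOLS, "total"))        # 2
print(hash_slot("rate", 11))                  # 10
```

Linear search needs no ordering but may inspect every entry; binary search needs the ordered table but discards half of the remaining entries at each step; hashing computes the slot directly from the key.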
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
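As an illustration of two of the algorithms listed, here is a minimal sketch of selection sort and merge sort. The function names and sample data are assumptions; the other two algorithms follow the same comparison-based pattern.

```python
def selection_sort(items):
    """Repeatedly move the smallest remaining item to the front (in place)."""
    for i in range(len(items) - 1):
        smallest = i
        for j in range(i + 1, len(items)):
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items


def merge_sort(items):
    """Sort each half recursively, then merge the halves (new list)."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left, right = merge_sort(items[:middle]), merge_sort(items[middle:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


print(selection_sort([5, 2, 9, 1]))      # [1, 2, 5, 9]
print(merge_sort([5, 2, 9, 1, 5]))       # [1, 2, 5, 5, 9]
```

Selection sort always makes on the order of n² comparisons, while merge sort makes on the order of n log n at the cost of extra storage for the merged lists.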
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e., the alphabet) that can be used for constructing formulas (i.e., finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises, interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
bullThe Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
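The definition can be sketched directly: for each production α -> β and each place where α occurs in μ, replace that occurrence to obtain one γ. The grammar below and the function name are assumptions for illustration, with every symbol written as a single character.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma derivable from mu in a single step.

    For each production alpha -> beta and each occurrence of alpha in
    mu = sigma + alpha + tau, the result contains sigma + beta + tau.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", P)))   # ['aaSbb', 'aabb']
```

Here σ = a and τ = b in both cases; only the production chosen for α = S differs. A string with no nonterminal left, such as aabb, has no further derivations.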
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
bullChomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β that is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba, or of the form B → bb, to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
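The conditions on Type 1, Type 2 and Type 3 grammars can be checked mechanically. The following is a minimal sketch, assuming each production is an (α, β) pair of symbol tuples with the empty tuple standing for the empty string; the function name `grammar_type` and the small grammars below are hypothetical illustrations, not the grammars of the numbered examples.

```python
def grammar_type(productions, nonterminals, start):
    """Return the most restrictive type (0 to 3) whose conditions hold."""
    eps_rule = ((start,), ())                 # the special rule S -> empty
    others = [p for p in productions if p != eps_rule]

    # Type 1 (a): |alpha| <= |beta| unless the rule is S -> empty.
    cond_a = all(len(a) <= len(b) for a, b in others)
    # Type 1 (b): if S -> empty is in P, S is on no right-hand side.
    cond_b = eps_rule not in productions or all(
        start not in b for _, b in productions)
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a[0] in nonterminals for a, _ in productions):
        return 1

    # Type 3: beta is a terminal, or a terminal followed by a nonterminal.
    def regular(b):
        if len(b) == 1:
            return b[0] not in nonterminals
        return (len(b) == 2 and b[0] not in nonterminals
                and b[1] in nonterminals)

    return 3 if all(regular(b) for _, b in others) else 2


N = {"S", "A", "B", "E"}
type3 = [(("S",), ("a", "A")), (("A",), ("b", "B")), (("B",), ("b",))]
with_bb = type3 + [(("B",), ("b", "b"))]            # B -> bb
with_aE = type3 + [(("a", "E"), ("E", "a", "E"))]   # aE -> EaE
with_Bb = type3 + [(("B", "b"), ("b",))]            # Bb -> b
```

Under these assumptions, adding B → bb drops the grammar to Type 2, adding aE → EaE drops it to Type 1, and adding Bb → b violates condition (a) and leaves only Type 0.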
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
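To make the link between a context-free grammar and a pushdown automaton concrete, here is a small sketch for one language only. It assumes the hypothetical grammar S → a S b | a b, which generates the strings aⁿbⁿ with n ≥ 1, and simulates the stack with a Python list. This particular language happens to need no nondeterminism; the function name `accepts_anbn` is an assumption for illustration.

```python
def accepts_anbn(text):
    """Recognize { a^n b^n : n >= 1 }, generated by S -> a S b | a b."""
    stack = []
    seen_b = False
    for ch in text:
        if ch == "a" and not seen_b:
            stack.append("A")      # push: remember one unmatched a
        elif ch == "b" and stack:
            seen_b = True
            stack.pop()            # pop: match this b against an earlier a
        else:
            return False           # wrong symbol, or a b with nothing to match
    # Accept only if at least one b was read and every a was matched.
    return seen_b and not stack
```

A finite automaton cannot do this job because it has no way to count an unbounded number of a's; the stack is exactly the extra memory the theorem refers to.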
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*, with v non-empty.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; more sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
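A BNF specification is easy to load into a data structure. The sketch below assumes the conventional `::=` separator, one rule per line, and symbols separated by blanks; the tiny grammar and the names `parse_bnf`, `terminals`, `GRAMMAR` and `rules` are hypothetical. It does not handle a literal `|` used as a terminal.

```python
def parse_bnf(text):
    """Map each nonterminal to its list of alternatives (lists of symbols)."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # The vertical bar separates the alternative sequences of symbols.
        alternatives = [alt.split() for alt in right.split("|")]
        rules[left.strip()] = alternatives
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)


GRAMMAR = """
<expr> ::= <term> | <expr> + <term>
<term> ::= <digit> | <term> <digit>
<digit> ::= 0 | 1
"""
rules = parse_bnf(GRAMMAR)
```

Here `rules["<expr>"]` holds the two alternatives for an expression, and `terminals(rules)` recovers the symbols `+`, `0` and `1` purely from the rule that terminals never occur on a left side.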
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages. They become more useful still if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, in the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read, broken into a stream of strings, and turned into an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
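The token list above can be reproduced with a small scanner. This is a minimal sketch, assuming the example statement reads `total = count + rate * 10` and using Python's standard `re` module; the token names and the identifiers `TOKEN_SPEC`, `PATTERN` and `tokenize` are illustrative choices, not part of the notes.

```python
import re

# One named pattern per kind of token; blanks are matched and discarded.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Break source into a list of (token type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("total = count + rate * 10")
```

The result is the same seven tokens as in the list: three identifiers, the assignment symbol, the plus sign, the multiplication sign and the constant 10.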
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable: about 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =   y   +   3   ;
    x3   =y+ 3  ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
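The mapping from lexemes to <token-name, attribute-value> pairs, including the symbol-table index for identifiers, can be sketched as follows. The sketch assumes the statement ends with a semicolon as in C and stores a number's text as its attribute, which is only one of the options discussed above; the names `LEXEME` and `scan` are hypothetical.

```python
import re

# Optional blanks, then an identifier, a number, or one of the symbols = + ;
LEXEME = re.compile(r"\s*([A-Za-z_]\w*|\d+|[=+;])")


def scan(source):
    """Return (tokens, symbol_table) for a statement such as 'x3 = y + 3;'."""
    symbol_table = []      # entry i-1 holds the name of identifier number i
    tokens = []
    pos = 0
    source = source.rstrip()
    while pos < len(source):
        match = LEXEME.match(source, pos)
        if match is None:
            raise SyntaxError(f"bad input at position {pos}")
        lexeme = match.group(1)
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))    # second component is ignored
        pos = match.end()
    return tokens, symbol_table
```

Calling `scan` on any of the three equivalent spellings of the statement gives the same token stream, while `x 3 = y + 3;` scans differently because the blank splits x3 into an identifier and a number.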
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
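A syntax tree with operators as interior nodes can be built by a small hand-written parser. The sketch below is one possible approach, not the method of the notes: it assumes the statement is already split into lexemes, requires the trailing semicolon, and resolves the ambiguity of expr → expr + expr by grouping to the left. Trees are nested tuples (operator, left, right); the name `parse_assignment` is hypothetical.

```python
def parse_assignment(tokens):
    """Parse [id, '=', operand, '+', operand, ..., ';'] into a syntax tree."""
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        tok = next_token()
        if not (tok.isidentifier() or tok.isdigit()):
            raise SyntaxError(f"expected id or number, got {tok!r}")
        return tok

    def expr():
        tree = operand()
        while pos < len(tokens) and tokens[pos] == "+":
            next_token()
            tree = ("+", tree, operand())    # left-associative grouping
        return tree

    target = next_token()
    if not target.isidentifier():
        raise SyntaxError("assignment target must be an identifier")
    if next_token() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if next_token() != ";":
        raise SyntaxError("expected ';'")
    return tree


tree = parse_assignment(["x3", "=", "y", "+", "3", ";"])
```

The operands y and 3 end up as the children of the + node, which in turn is the right child of the = node.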
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversion where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators "int to real" and "real to int" as shown on the right.
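Inserting the conversion operators can be sketched as a walk over the syntax tree. This is a simplified illustration that assumes only two types, int and real, a tree made of nested tuples, and a hypothetical dictionary `types` that gives the declared type of each identifier; the function name `annotate` is also an assumption.

```python
def annotate(tree, types):
    """Return (tree with conversion nodes inserted, resulting type)."""
    if isinstance(tree, str):                    # leaf: constant or identifier
        if tree.isdigit():
            return tree, "int"
        return tree, types[tree]
    op, left, right = tree
    if op == "+":
        left, left_type = annotate(left, types)
        right, right_type = annotate(right, types)
        if left_type == right_type:
            return (op, left, right), left_type
        # Mixed operands: widen the int side so that the addition is real.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (op, left, right), "real"
    if op == "=":
        value, value_type = annotate(right, types)
        target_type = types[left]
        if value_type != target_type:
            conv = "realtoint" if target_type == "int" else "inttoreal"
            value = (conv, value)
        return (op, left, value), target_type
    raise ValueError(f"unknown operator {op!r}")


typed, result_type = annotate(("=", "x3", ("+", "y", "3")),
                              {"x3": "int", "y": "real"})
```

With y real and x3 an integer, the constant 3 is wrapped in an int-to-real node and the whole sum in a real-to-int node before the assignment.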
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
    inttoreal  temp1  3      --
    add        temp2  id2    temp1
    realtoint  temp3  temp2  --
    assign     id1    temp3  --
Each "quad" has the form
    operation target source1 source2
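Walking the tree to produce quads can be sketched as below. The tree encoding (nested tuples, with two-element tuples for the unary conversion operators) and the names `to_quads`, `new_temp` and `walk` are assumptions made for illustration; unused source fields are written as `None` instead of `--`.

```python
def to_quads(tree):
    """Return a list of (operation, target, source1, source2) quads."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):              # leaf: identifier or constant
            return node
        if len(node) == 2:                     # unary conversion operator
            op, operand = node
            source = walk(operand)
            target = new_temp()
            quads.append((op, target, source, None))
            return target
        op, left, right = node
        if op == "=":
            quads.append(("assign", left, walk(right), None))
            return left
        if op != "+":
            raise ValueError(f"unsupported operator {op!r}")
        first, second = walk(left), walk(right)
        target = new_temp()
        quads.append(("add", target, first, second))
        return target

    walk(tree)
    return quads


quads = to_quads(("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3")))))
```

Each interior node gets a fresh temporary as its target, which is why the idealized machine is assumed to have an unlimited number of registers.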
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion itself and replace the first two quads with
       add temp2 id2 3.0
2. The last two quads can be combined into
       realtoint id1 temp2
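The two optimizations can be sketched on a list of quads. This is a deliberately narrow illustration: it folds only int-to-real conversions of integer constants, and it merges an assignment into the preceding quad on the assumption that the temporary is not used anywhere else. The function name `optimize` and the quad encoding (with `None` for unused fields) are assumptions.

```python
def optimize(quads):
    """Apply constant conversion and assignment merging to a list of quads."""
    constants = {}            # temporary name -> constant computed at compile time
    folded = []
    for op, target, s1, s2 in quads:
        s1 = constants.get(s1, s1)
        s2 = constants.get(s2, s2)
        if op == "inttoreal" and isinstance(s1, str) and s1.isdigit():
            constants[target] = f"{s1}.0"     # do the conversion now
            continue
        folded.append((op, target, s1, s2))

    result = []
    for quad in folded:
        op, target, s1, s2 = quad
        if op == "assign" and result and result[-1][1] == s1:
            # Assumes the temporary s1 is not needed later.
            prev_op, _, p1, p2 = result.pop()
            result.append((prev_op, target, p1, p2))
        else:
            result.append(quad)
    return result


before = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
after = optimize(before)
```

The four quads shrink to two, matching the hand-optimized sequence above.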
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, 3.0     // add float
    RTOI R2, R1          // real to int
    ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver
        calls Syntactic Analyzer
            calls Contextual Analyzer
            calls Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver
        calls Syntactic Analyzer    (input: source text, output: AST)
        calls Contextual Analyzer   (input: AST, output: decorated AST)
        calls Code Generator        (input: decorated AST, output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Direct Linking Loaders
The direct linking loader is the most common type of loader. This type of loader is a relocatable loader. The loader cannot have direct access to the source code. To place the object code in memory there are two situations: either the address of the object code is absolute, in which case it can be placed directly at the specified location, or the address is relative. If the address is relative, it is the assembler that informs the loader about the relative addresses.
Advantages
1. The overhead on the loader is reduced. The required subroutine will be loaded in the main memory only at the time of execution.
2. The system can be dynamically reconfigured.
Disadvantages
The linking and loading need to be postponed until execution. During execution, if any subroutine is needed, the process of execution needs to be suspended until the required subroutine gets loaded in the main memory.
-> Let's take a quick look at our three loaders (absolute loader, relocating loader and direct linking loader, respectively) for the above four functions:
Function      Absolute loader   Relocating loader   Direct linking loader
Allocation    Programmer        Programmer          Programmer
Linking       Programmer        Programmer          Loader
Relocation    Assembler         Loader              Assembler
Loading       Loader            Loader              Loader
Subroutine Linkages-
• The main program A wishes to transfer to subprogram B.
• The programmer in program A could write a transfer instruction (e.g. BSR B) to subprogram B.
• The assembler does not know the value of this symbol reference and will declare it as an error.
Design of Direct linking loader
Pass 1
-> Allocate and assign each program location in core
-> Create a symbol table, filling in the values of the external symbols
1. Input object deck
2. Initial Program Load Address (IPLA)
3. Program Load Address (PLA) counter
4. External Symbol Directory (ESD) card
5. A copy of the input (TXT card)
6. RLD (Relocation & Linkage Directory)
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
• Copy of the object program
• IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
• PLA counter (Program Load Address), to keep track of each segment location
• External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
• Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD & RLD cards
• ESD (external symbol dictionary)
Data Structures
Algorithm
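The two passes can be sketched in Python under strong simplifying assumptions. Object decks are replaced by plain dictionaries, memory by a dictionary from address to word, and every relocation entry simply adds the address of a named symbol to one word. The field names (`name`, `length`, `defs`, `text`, `rld`) and the function `load` are hypothetical; real ESD, TXT, RLD and END card formats are not modelled.

```python
def load(segments, ipla):
    """Two-pass direct-linking load; returns (memory, symbol_table)."""
    # Pass 1: allocate each segment and record the address of every
    # external symbol it defines.
    symbols = {}
    pla = ipla                                   # program load address counter
    for seg in segments:
        symbols[seg["name"]] = pla
        for name, offset in seg["defs"].items():
            symbols[name] = pla + offset
        pla += seg["length"]

    # Pass 2: copy the text, then relocate and link address constants.
    memory = {}
    pla = ipla
    for seg in segments:
        for offset, word in enumerate(seg["text"]):
            memory[pla + offset] = word
        for offset, symbol in seg["rld"]:
            memory[pla + offset] += symbols[symbol]
        pla += seg["length"]
    return memory, symbols


# Hypothetical segments: A refers to B (linking) and to itself (relocation).
seg_a = {"name": "A", "length": 3, "defs": {},
         "text": [10, 0, 2], "rld": [(1, "B"), (2, "A")]}
seg_b = {"name": "B", "length": 2, "defs": {"BDATA": 1},
         "text": [20, 30], "rld": []}
memory, symbols = load([seg_a, seg_b], 100)
```

Pass 1 must finish before pass 2 starts because segment A refers to B, whose address is known only after A has been allocated; this is the same forward-reference problem that motivates two-pass assemblers.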
Binary Symbolic Subroutine Loader-
The assembler assembles each program and provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder-
• A binder is a program that performs the same functions as the direct linking loader
– allocation, relocation and linking
• Outputs the text in a file rather than memory
– called a load module
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people with
-> making efficient use of computing machines
High-level language (HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search.
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort.
• To understand the idea of hashing as a search technique.
• To introduce the map abstract data type.
• To implement the map abstract data type using hashing.
• Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. Linear search is also called sequential search. It exhaustively compares every entry in the table.
• Binary search
Binary search takes a sorted array as the input. It works by comparing the target (search key) with the middle element of the array and terminates if it is the same; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element of the array.
• Ordered table
• Hash
A hash table is a collection of items which are stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
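The three lookup methods can be sketched on a small symbol table. The table is assumed to be a list of (symbol name, value) pairs kept in order of name, and the hash function shown (sum of character codes modulo the table size) is just one simple, hypothetical choice; collisions are not handled. All names below are illustrative.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, (name, _) in enumerate(table):
        if name == key:
            return index
    return -1


def binary_search(table, key):
    """Search a table ordered by name; return the index of key or -1."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        name = table[mid][0]
        if name == key:
            return mid
        if key < name:
            high = mid - 1        # continue in the left sub-array
        else:
            low = mid + 1         # continue in the right sub-array
    return -1


def hash_slot(key, size):
    """Map a symbol name to a slot number between 0 and size - 1."""
    return sum(ord(ch) for ch in key) % size


SYMBOLS = [("ALPHA", 10), ("BETA", 20), ("GAMMA", 30), ("LOOP", 40)]
```

Linear search needs up to n comparisons for n entries, binary search about log2(n) on an ordered table, and hashing goes to a slot directly at the cost of having to deal with collisions.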
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite: if it were not, each symbol could stand for an entire thought, there would be nothing left to process, and the system would not be much of a model. Furthermore, we are finite beings, and it is our own reasoning that we are trying to model.
• A syntax that defines which strings of symbols are in the language of the formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• The syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language).
• The semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is a pattern for drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this as $L \subset T^*$.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written $\mu \rightarrow_G \gamma$ (a single-step derivation), if and only if
(1) $\mu = \sigma \alpha \tau$
(2) $\gamma = \sigma \beta \tau$
(3) $\alpha \rightarrow \beta \in P$ of G
where σ and τ represent arbitrary strings.
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that contains only terminal symbols.
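The single-step derivation can be made concrete with a short Python sketch. It assumes, purely for illustration, that strings are ordinary Python strings, that each production is a pair (alpha, beta) with a non-empty alpha, and that the example grammar is the hypothetical one with rules S -> aSb and S -> ab.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    For each production alpha -> beta and each occurrence of alpha in mu,
    mu is split as sigma + alpha + tau and gamma = sigma + beta + tau.
    """
    results = []
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]
            tau = mu[start + len(alpha):]
            results.append(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar: S -> aSb | ab (generates a^n b^n for n >= 1).
P = [("S", "aSb"), ("S", "ab")]

step1 = immediate_derivations("S", P)      # sentential forms after one step
step2 = immediate_derivations("aSb", P)    # one more step from "aSb"
```

Here "aSb" is a sentential form (it still contains the nonterminal S), while "ab" and "aabb" are sentences.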
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (nonterminals N, terminal alphabet Σ, production rules P, start symbol S) that satisfies the following two conditions:
a. Each production rule $\alpha \to \beta$ in P satisfies $|\alpha| \le |\beta|$ if it is not of the form $S \to \epsilon$.
b. If $S \to \epsilon$ is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1215 The grammar of Example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones, where E is assumed to be a new nonterminal symbol.
Adding to the modified grammar a production rule of the form $Bb \to b$ will result in a non-Type 1 grammar, because it violates condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule $\alpha \to \beta$ satisfies $|\alpha| = 1$, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1216 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones, where E is assumed to be a new nonterminal symbol.
Adding a production rule of the form $aE \to EaE$ to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules $\alpha \to \beta$ that is not of the form $S \to \epsilon$ satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1217 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
Adding a production rule of the form $A \to Ba$ or of the form $B \to bb$ to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
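The Type 1, Type 2, and Type 3 conditions can be checked mechanically. The following Python sketch is a minimal illustration under stated assumptions: every symbol is a single character, a production is a pair (alpha, beta), the empty string stands for ε, and the small grammars used are hypothetical examples rather than the ones from the numbered examples above.

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return the most restrictive type (3, 2, 1 or 0) the grammar satisfies."""
    start_to_empty = (start, "") in productions
    start_on_rhs = any(start in beta for _, beta in productions)
    others = [(a, b) for a, b in productions if (a, b) != (start, "")]

    def is_type1():
        if start_to_empty and start_on_rhs:          # condition (b)
            return False
        return all(len(a) <= len(b) for a, b in others)   # condition (a)

    def is_type2():
        # Left-hand side must be exactly one nonterminal symbol.
        return is_type1() and all(
            len(a) == 1 and a in nonterminals for a, _ in productions)

    def right_side_ok(beta):
        # (a) one terminal, or (b) a terminal followed by a nonterminal.
        if len(beta) == 1:
            return beta not in nonterminals
        return (len(beta) == 2 and beta[0] not in nonterminals
                and beta[1] in nonterminals)

    def is_type3():
        return is_type2() and all(right_side_ok(b) for _, b in others)

    if is_type3():
        return 3
    if is_type2():
        return 2
    if is_type1():
        return 1
    return 0


N = {"S", "A", "B", "E"}                              # hypothetical nonterminals
regular = [("S", "aA"), ("A", "aA"), ("A", "b")]      # hypothetical Type 3 grammar
context_free = [("S", "aSb"), ("S", "ab")]            # hypothetical Type 2 grammar
```

Adding a rule such as A -> Ba to the first grammar drops it to Type 2, adding aE -> EaE to the second drops it to Type 1, and a rule such as Bb -> b makes a grammar Type 0, in line with the remarks in the examples.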
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form $xAy \to xvy$, where $A \in V$, $x, v, y \in (V \cup T)^*$, and v is nonempty.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by $A \to v$, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
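As a rough illustration of how a BNF specification separates terminals from nonterminals, the Python sketch below reads a small, made-up grammar for binary numbers. The grammar text and the helper names parse_bnf and terminals are assumptions for this example; the sketch also assumes one rule per line and symbols separated by blanks.

```python
def parse_bnf(text):
    """Turn BNF rules into {nonterminal: [alternative, ...]}.

    Each alternative is a list of symbols; alternatives are separated by |.
    """
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)


# Hypothetical grammar for non-empty binary numbers.
GRAMMAR = """
<number> ::= <digit> | <digit> <number>
<digit> ::= 0 | 1
"""

rules = parse_bnf(GRAMMAR)
```

Every key of rules is a nonterminal enclosed in <>, and terminals(rules) gives the symbols 0 and 1, which never appear on a left side.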
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems are very useful for explicitly and concisely defining sets of strings, such as computer languages. A canonic system can also be used as a basis for recognizing strings from the defined sets.
The recognition algorithm is capable of recognizing strings produced by a Backus-Naur system, which corresponds to the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
• In the analysis part, the source program is read and broken into its constituent parts, and an intermediate form of the source program is created.
• In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
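The breakdown above can be reproduced with a small scanner. The following Python sketch is one possible way to do it, using regular expressions; the token names (identifier, number, assign, plus, times) and the helper tokenize are chosen for this example and are not fixed by the notes.

```python
import re

# Hypothetical token classes, tried in this order at each position.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),          # blanks are not significant
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into (token type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("total = count + rate * 10")
```

The resulting list contains seven tokens in the same order as the list above, with the blanks between them discarded.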
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called the parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The phases are definitely not equal in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =   y  +  3  ;
    x3   =y+ 3  ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into a parse tree.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the parse tree.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. For the example, the root = has the children x3 and +, and + in turn has the children y and 3.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
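A small parser for this kind of statement can be sketched in Python as follows. It is a minimal illustration under stated assumptions: the statement has the form id = operand + operand + ... ; with a trailing semicolon as in C, the recursive rule for expr is handled with a loop that groups additions to the left, and the syntax tree is represented as nested tuples (operator, left, right). The function name parse_assignment is invented for the example.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def parse_assignment(source):
    """Parse 'id = expr ;' and return a syntax tree as nested tuples."""
    # Very small scanner: identifiers, numbers, and the symbols = + ;
    # (any other character is silently ignored in this sketch).
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source)
    pos = 0

    def take(accepts, description):
        nonlocal pos
        if pos == len(tokens) or not accepts(tokens[pos]):
            raise SyntaxError(f"expected {description} at token {pos}")
        pos += 1
        return tokens[pos - 1]

    def is_id(tok):
        return IDENT.fullmatch(tok) is not None

    def is_operand(tok):
        return is_id(tok) or tok.isdigit()

    target = take(is_id, "identifier")
    take(lambda t: t == "=", "'='")
    tree = take(is_operand, "operand")
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        tree = ("+", tree, take(is_operand, "operand"))
    take(lambda t: t == ";", "';'")
    if pos != len(tokens):
        raise SyntaxError("unexpected input after ';'")
    return ("=", target, tree)


tree = parse_assignment("x3 = y + 3;")
```

The result has = at the root with x3 and the + node as children, and the + node has y and 3 as children. The input x 3 = y + 3; is rejected, because after the identifier x the parser expects = and finds 3.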
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int".
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
    inttoreal temp1 3 --
    add temp2 id2 temp1
    realtoint temp3 temp2 --
    assign id1 temp3 --
Each "quad" has the form
    operation target source1 source2
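Walking the semantic tree to produce quads can be sketched in Python. The tree encoding (nested tuples whose first element is the operation), the temp naming scheme, and the function name generate_quads are assumptions for this illustration; "--" marks an unused source field as in the listing above.

```python
def generate_quads(tree):
    """Walk a semantic tree and emit (operation, target, source1, source2)."""
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):          # leaf: identifier or constant
            return node
        op, *operands = node
        sources = [walk(child) for child in operands]
        counter += 1
        temp = f"temp{counter}"            # fresh temporary for the result
        padded = sources + ["--"] * (2 - len(sources))
        quads.append((op, temp, *padded))
        return temp

    op, target, expression = tree          # the root is the assignment
    quads.append((op, target, walk(expression), "--"))
    return quads


# Semantic tree for x3 = y + 3 with the conversions inserted,
# using id1 for x3 and id2 for y as in the text.
semantic_tree = ("assign", "id1",
                 ("realtoint", ("add", "id2", ("inttoreal", "3"))))

quads = generate_quads(semantic_tree)
```

The operands are visited before the operator, so the conversion of 3 is emitted first and the assignment last, giving the same four quads as listed above.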
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion itself and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
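These two optimizations can be sketched as a small pass over the quads. The Python below is a simplified illustration, not a general optimizer: it assumes every temporary is used exactly once, that temporaries are named temp1, temp2, and so on, and that the function name optimize is free to choose.

```python
def optimize(quads):
    """Apply the two easy optimizations to a list of quads."""
    # 1. Perform inttoreal on integer constants at compile time.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{src1}.0"     # e.g. 3 becomes 3.0
            continue
        folded.append((op, target,
                       constants.get(src1, src1),
                       constants.get(src2, src2)))

    # 2. Merge "op tempN ..." followed directly by "assign x tempN".
    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if (op == "assign" and result and src1.startswith("temp")
                and result[-1][1] == src1):
            prev_op, _, a, b = result.pop()
            result.append((prev_op, target, a, b))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", "--"),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", "--"),
    ("assign", "id1", "temp3", "--"),
]
optimized = optimize(quads)
```

For the four quads of the example, the pass leaves the two quads add temp2 id2 3.0 and realtoint id1 temp2.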
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, 3.0    add float
    RTOI R2, R1         real to int
    ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
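A symbol table of this kind can be sketched as a small Python class. The field names, the type sizes (4 bytes for int, 8 for real), and the idea of recording the storage location as an offset in the compiled program's data area are assumptions made for this illustration.

```python
class SymbolTable:
    """Maps each variable name to its index, type, and storage location."""

    SIZES = {"int": 4, "real": 8}     # hypothetical sizes in bytes

    def __init__(self):
        self.entries = {}
        self.next_offset = 0          # next free offset in the data area

    def declare(self, name, type_name):
        if name in self.entries:
            raise KeyError(f"{name} is already declared")
        entry = {
            "index": len(self.entries) + 1,   # the 1 in a token like <id,1>
            "type": type_name,
            # Where the compiled program will keep the variable at run time,
            # not where the compiler itself stores it.
            "offset": self.next_offset,
        }
        self.next_offset += self.SIZES[type_name]
        self.entries[name] = entry
        return entry

    def lookup(self, name):
        return self.entries[name]


table = SymbolTable()
table.declare("x3", "int")
table.declare("y", "real")
```

Later phases can call table.lookup("y") to obtain the type needed for semantic analysis and the offset needed for code generation.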
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver
        calls
    Syntactic Analyzer
        calls                   calls
    Contextual Analyzer         Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver
        calls                   calls                     calls
    Syntactic Analyzer          Contextual Analyzer       Code Generator
    input: Source Text          input: AST                input: Decorated AST
    output: AST                 output: Decorated AST     output: object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit, because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
SP NOTES
7. END card
Pass 2
-> Load the actual program text
-> Perform the relocation modification of any address constants needing to be altered
-> Resolve external references (linking)
Copy of object program
IPLA parameter (Initial Program Load Address), supplied by the OS or programmer to specify the address at which to load the first segment
PLA counter (Program Load Address), to keep track of each segment location
External Symbol Directory (ESD) card, to store each external symbol and its corresponding assigned core address
Local External Symbol Array (LESA), to establish correspondence between the ESD ID used in ESD and RLD cards
ESD (External Symbol Dictionary)
Data Structures
Algorithm
Binary Symbolic Subroutine Loader - The assembler provides the loader with:
-> Object program + relocation information
-> Prefixed with information about all other programs it references (transfer vector)
-> The length of the entire program
-> The length of the transfer vector portion
Binder -
• A binder is a program that performs the same functions as the direct linking loader: allocation, relocation and linking.
• It outputs the text to a file rather than to memory. This file is called a load module.
Lecture_no30 Programming Language
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
-> making computing convenient for people, with
-> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that is not limited to one computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of high-level languages
- Readability
- Machine independence (portability)
- Availability of program libraries
- Easy error checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search.
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort.
To understand the idea of hashing as a search technique.
To introduce the map abstract data type.
To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search (ordered table)
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are equal; otherwise it divides the array into two sub-arrays and continues the search in the left (right) sub-array if the target is less (greater) than the middle element.
• Hash
A hash table is a collection of items stored in such a way as to make them easy to find later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
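The three search methods described above can be sketched in Python as follows. This is only an illustrative sketch: the function and class names, the character-sum hash function and the linear-probing collision policy are assumptions made for the example, not something the notes prescribe.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, entry in enumerate(table):
        if entry == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table; return the key's index or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_table[mid] == key:
            return mid
        if key < sorted_table[mid]:
            high = mid - 1      # continue in the left sub-array
        else:
            low = mid + 1       # continue in the right sub-array
    return -1


class HashTable:
    """Fixed-size hash table; collisions are resolved by linear probing (assumed policy)."""

    def __init__(self, size=11):
        self.slots = [None] * size   # slot i holds a (key, value) pair or None

    def _hash(self, key):
        # Hypothetical hash function: sum of character codes modulo the table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def put(self, key, value):
        start = self._hash(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        raise RuntimeError("hash table is full")

    def get(self, key):
        start = self._hash(key)
        for step in range(len(self.slots)):
            i = (start + step) % len(self.slots)
            if self.slots[i] is None:
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None


symbols = ["ALPHA", "BETA", "DELTA", "GAMMA", "LOOP"]   # already in sorted order
```

Linear search needs no ordering but inspects up to every entry, binary search needs the ordered table, and the hash table trades extra storage for near-direct access to the slot.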
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
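A compact sketch of the four comparison sorts listed above is given below in Python. The function names are chosen for this example, each function returns a new sorted list rather than sorting in place, and the quick sort uses the first element as pivot, which is one of several possible choices.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining element and move it to the front."""
    result = list(items)
    for i in range(len(result)):
        smallest = min(range(i, len(result)), key=result.__getitem__)
        result[i], result[smallest] = result[smallest], result[i]
    return result


def insertion_sort(items):
    """Insert each element into its place within the already sorted prefix."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left, right = merge_sort(items[:middle]), merge_sort(items[middle:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def quick_sort(items):
    """Partition around a pivot (here the first element), then sort each part."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)
```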
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let us keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language);
the semantics of a language is what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is a way of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of Sentences -
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
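The single-step derivation and the notion of a sentential form can be illustrated with a short Python sketch. The representation is an assumption made for the example: strings are ordinary Python strings with one character per symbol, a production α -> β is a pair (alpha, beta), and the grammar S -> aSb | ab is a hypothetical one chosen only for illustration.

```python
def immediate_derivations(mu, productions):
    """Return every gamma such that mu = sigma+alpha+tau, gamma = sigma+beta+tau
    and (alpha, beta) is one of the productions."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)   # try the next occurrence of alpha
    return results


def sentential_forms(start_symbol, productions, max_steps):
    """Collect all strings derivable from the start symbol in at most max_steps steps."""
    forms = {start_symbol}
    frontier = {start_symbol}
    for _ in range(max_steps):
        frontier = {gamma
                    for mu in frontier
                    for gamma in immediate_derivations(mu, productions)} - forms
        forms |= frontier
    return forms


# Hypothetical grammar: S -> aSb | ab
P = [("S", "aSb"), ("S", "ab")]
```

With this grammar, the sentential forms that contain no S (such as ab and aabb) are the sentences of the language.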
• Chomsky Hierarchy -
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (N: nonterminals, Σ: terminal alphabet, P: production rules, S: start symbol) that satisfies the following two conditions:
a. Each production rule α -> β in P satisfies |α| ≤ |β| if it is not of the form S -> ε.
b. If S -> ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of the earlier example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
[Production rules missing in the source]
Adding a production rule of the form Bb -> b to the modified grammar results in a non-Type 1 grammar, because it violates condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α -> β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of the earlier example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones (E is assumed to be a new nonterminal symbol).
[Production rules missing in the source]
Adding a production rule of the form aE -> EaE to the grammar results in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α -> β that is not of the form S -> ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
[Production rules missing in the source]
Adding a production rule of the form A -> Ba or of the form B -> bb to the grammar results in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
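The nesting of the grammar types can be made concrete with a small classifier. The sketch below is in Python and rests on assumptions made only for this example: every symbol is a single character, uppercase letters are nonterminals, lowercase letters are terminals, the empty string stands for the empty right-hand side, and a production is a pair (left side, right side). The checks follow the usual textbook conditions: a Type 1 rule may not shrink the string (except start symbol to empty, in which case the start symbol may not occur on any right-hand side), a Type 2 rule has a single nonterminal on its left side, and a Type 3 rule has a right side that is a terminal optionally followed by a nonterminal.

```python
def grammar_type(productions, start="S"):
    """Return the most restrictive type (3, 2, 1 or 0) that the grammar satisfies."""

    def is_start_to_empty(left, right):
        return left == start and right == ""

    has_start_to_empty = any(is_start_to_empty(l, r) for l, r in productions)
    start_on_right = any(start in r for _, r in productions)

    # Type 1: no rule shortens the string, apart from start -> empty ...
    type1 = all(len(l) <= len(r) or is_start_to_empty(l, r) for l, r in productions)
    # ... and if start -> empty is present, start never appears on a right side.
    type1 = type1 and not (has_start_to_empty and start_on_right)
    if not type1:
        return 0

    # Type 2: every left side is a single nonterminal.
    if not all(len(l) == 1 and l.isupper() for l, _ in productions):
        return 1

    def terminal_then_optional_nonterminal(right):
        if len(right) == 1:
            return right.islower()
        return len(right) == 2 and right[0].islower() and right[1].isupper()

    # Type 3: right side is a terminal, or a terminal followed by a nonterminal.
    type3 = all(is_start_to_empty(l, r) or terminal_then_optional_nonterminal(r)
                for l, r in productions)
    return 3 if type3 else 2
```

For instance, a grammar containing Bb -> b is classified as Type 0 only, one containing aE -> EaE (and otherwise non-shrinking rules) as Type 1, and one containing A -> Ba as Type 2.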
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A -> x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy -> xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*, with v non-empty.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A -> v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. Symbols that appear on a left side are nonterminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
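As a small illustration of the notation, the Python sketch below reads BNF rules written one per line and separates terminals from nonterminals using the criterion stated above (a symbol is a nonterminal if it appears on a left side). The two-rule grammar for binary numbers and the names parse_bnf and RULES are hypothetical, and the sketch assumes symbols are separated by blanks and that | is never itself a terminal.

```python
def parse_bnf(text):
    """Turn lines of the form '<symbol> ::= alt1 | alt2' into a dictionary
    mapping each left-side symbol to a list of alternatives (lists of symbols)."""
    grammar = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alternative.split() for alternative in right.split("|")]
        grammar[left.strip()] = alternatives
    return grammar


# Hypothetical example grammar: binary numbers.
RULES = """
<digit> ::= 0 | 1
<number> ::= <digit> | <digit> <number>
"""

grammar = parse_bnf(RULES)
nonterminals = set(grammar)                       # symbols appearing on a left side
terminals = {symbol
             for alternatives in grammar.values()
             for alternative in alternatives
             for symbol in alternative} - nonterminals
```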
Formal Grammar-
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. They would be more useful still if a canonic system could be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, in the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- A language translator converts text from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, producing an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the saving is considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
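The mapping from lexemes to tokens can be sketched with a regular expression, which reflects the remark that no recursion is needed at this level. The Python sketch below is illustrative only: the function name scan, the token representation as (name, attribute) tuples, the choice to store the number's text as its attribute, and the restriction to the symbols =, + and ; are all assumptions of the example.

```python
import re

# One alternative per kind of lexeme; leading blanks are skipped as non-significant.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>[=+;]))")


def scan(source):
    """Return (tokens, symbol_table). An identifier becomes ('id', index), where
    index is its 1-based position in the symbol table."""
    symbol_table = []
    tokens = []
    position = 0
    source = source.rstrip()
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character at position {position}")
        position = match.end()
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("number"):
            tokens.append(("number", match.group("number")))
        else:
            # Symbols carry no attribute; the second component is ignored.
            tokens.append((match.group("symbol"), None))
    return tokens, symbol_table
```

Calling scan on any of the three spacings of the statement yields the same token list, while "x 3 = y + 3;" yields a different one, because the blank separates x and 3 into two lexemes.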
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a tree.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number
     | id
     | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the parse tree.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
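A hand-written parser for this tiny grammar can be sketched as follows in Python. Two assumptions should be stated plainly: the input is a list of lexeme strings rather than real tokens, and the recursive rule expr → expr + expr is replaced by a loop that builds a left-associative tree, which is a common way to avoid left recursion in a recursive-descent parser and is not taken from the notes. The result is a syntax tree of nested tuples with the operator first.

```python
def parse_assignment(tokens):
    """Parse e.g. ['x3', '=', 'y', '+', '3', ';'] into ('=', 'x3', ('+', 'y', '3'))."""
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def expect(predicate, description):
        nonlocal position
        token = peek()
        if token is None or not predicate(token):
            raise SyntaxError(f"expected {description}, found {token!r}")
        position += 1
        return token

    def is_identifier(token):
        return token.isidentifier()

    def is_operand(token):
        return token.isidentifier() or token.isdigit()

    def parse_expr():
        # expr -> operand { '+' operand }, built as a left-associative tree
        tree = expect(is_operand, "identifier or number")
        while peek() == "+":
            expect(lambda t: t == "+", "'+'")
            right = expect(is_operand, "identifier or number")
            tree = ("+", tree, right)
        return tree

    # asst-stmt -> id = expr ;
    target = expect(is_identifier, "identifier")
    expect(lambda t: t == "=", "'='")
    value = parse_expr()
    expect(lambda t: t == ";", "';'")
    if peek() is not None:
        raise SyntaxError("unexpected input after ';'")
    return ("=", target, value)
```

The returned tuple has operators as interior nodes and operands as leaves, which corresponds to the syntax tree rather than the full parse tree.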
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int".
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each "quad" has the form
operation target source1 source2
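Walking the semantic tree to produce quadruples can be sketched in a few lines of Python. The tree encoding (nested tuples with the operation first), the function name generate_quads and the use of None for an unused source field are assumptions of this sketch; the operation names and the temp1, temp2, ... naming follow the example above.

```python
def generate_quads(tree):
    """Return a list of (operation, target, source1, source2) quadruples."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):            # leaf: an identifier or a constant
            return node
        operation, *children = node
        if operation == "assign":
            target, value = children
            quads.append(("assign", target, walk(value), None))
            return target
        sources = [walk(child) for child in children]   # operands first
        temp = new_temp()                                # then a fresh result name
        padding = [None] * (2 - len(sources))
        quads.append((operation, temp, *sources, *padding))
        return temp

    walk(tree)
    return quads


# Semantic tree for x3 = y + 3 with the conversions inserted (id1 = x3, id2 = y).
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
```

Because operands are visited before a temporary is allocated for their parent, the innermost operation receives temp1, which reproduces the four quads listed above.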
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion itself and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
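Both optimizations can be sketched as simple passes over the quad list. The Python below is an illustration under stated assumptions: quads are (operation, target, source1, source2) tuples, names starting with "temp" are compiler temporaries, and each temporary is used exactly once, so merging a quad with the assign that follows it is safe. The function name optimize is hypothetical.

```python
def optimize(quads):
    # Pass 1: perform inttoreal on integer constants at compile time.
    constants = {}                      # temporary name -> folded real constant
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{src1}.0"
            continue                    # the conversion quad disappears
        folded.append((op, target, src1, src2))

    # Pass 2: combine "op temp ..." followed by "assign id temp" into "op id ...".
    result = []
    for quad in folded:
        op, target, src1, src2 = quad
        if (op == "assign" and result
                and src1.startswith("temp") and result[-1][1] == src1):
            previous_op, _, p1, p2 = result.pop()
            result.append((previous_op, target, p1, p2))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
```

Applied to the four quads of the example, the two passes leave exactly the two quads shown above.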
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0    // add float
RTOI R2, R1         // real to int
ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
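A minimal symbol table along these lines might look like the Python sketch below. The class name, the method names and the byte sizes used in the test values are hypothetical. The point of the sketch is that the recorded offset describes where the compiled program will keep the variable at run time, not where the compiler keeps its own entry.

```python
class SymbolTable:
    """Maps each declared name to its type and its run-time storage offset."""

    def __init__(self):
        self.entries = {}
        self.next_offset = 0        # next free offset in the compiled program's data area

    def declare(self, name, type_name, size):
        if name in self.entries:
            raise KeyError(f"{name} is already declared")
        self.entries[name] = {"type": type_name, "offset": self.next_offset}
        self.next_offset += size    # reserve space for this variable
        return self.entries[name]

    def lookup(self, name):
        return self.entries[name]   # raises KeyError for an undeclared name
```

Later phases call lookup to obtain the type (for semantic checks and conversions) and the offset (for code generation).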
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
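The idea of the parser calling the scanner on demand can be sketched with a Python generator. Everything here is an assumption made for illustration: lexemes are separated by blanks, the only construct is a sum of operands, and LOAD and ADD are hypothetical instruction names. The log records the order of events, showing that scanning, parsing and code emission are interleaved within a single pass over the input.

```python
def scanner(source, log):
    """Yield one lexeme at a time, only when the parser asks for it."""
    for lexeme in source.split():
        log.append(f"scan {lexeme}")
        yield lexeme


def parse_and_generate(source):
    """Parse 'a + b + ...' and emit code as soon as each operand is recognized."""
    log = []
    tokens = scanner(source, log)
    first = next(tokens, None)
    if first is None:
        raise SyntaxError("empty input")
    code = [f"LOAD {first}"]
    log.append("emit LOAD")
    for operator in tokens:             # each iteration requests one more token
        if operator != "+":
            raise SyntaxError(f"expected '+', found {operator!r}")
        operand = next(tokens, None)
        if operand is None:
            raise SyntaxError("missing operand after '+'")
        code.append(f"ADD {operand}")
        log.append("emit ADD")
    return code, log
```

No intermediate token file is ever written: the scanner runs only far enough to supply the token the parser currently needs.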
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
A grammar is ambiguous if there is more than one possible parse tree for a given statement.
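Ambiguity can be demonstrated by counting parse trees. The Python sketch below assumes the grammar expr → expr + expr | id (a hypothetical fragment chosen for the example) and counts the distinct parse trees of a sum with a given number of operands: the top-level + can be any of the plus signs, and each choice splits the operands into a left and a right sub-expression.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(operand_count):
    """Number of parse trees of id + id + ... + id under expr -> expr + expr | id."""
    if operand_count == 1:
        return 1                        # a single id has exactly one tree
    return sum(count_parse_trees(left) * count_parse_trees(operand_count - left)
               for left in range(1, operand_count))
```

A statement with three operands already has two parse trees, (id + id) + id and id + (id + id), so this grammar is ambiguous.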
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver calls Syntactic Analyzer
Syntactic Analyzer calls Contextual Analyzer and Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver calls Syntactic Analyzer, Contextual Analyzer and Code Generator
Syntactic Analyzer: input Source Text, output AST
Contextual Analyzer: input AST, output Decorated AST
Code Generator: input Decorated AST, output object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
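The multi pass organization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the dictionary used as an AST, and the imaginary accumulator machine (LOAD, ADD, STORE) are all hypothetical and do not come from any real compiler. Each pass receives the stored output of the previous one, and only the driver knows the order of the passes.

```python
def syntactic_analyzer(source_text):
    # Pass 1: build a tiny AST for input of the form "name = a + b + ...".
    target, expression = [part.strip() for part in source_text.split("=")]
    operands = [term.strip() for term in expression.split("+")]
    return {"target": target, "operands": operands}


def contextual_analyzer(ast):
    # Pass 2: decorate the AST with a kind ("int" or "var") for every operand.
    decorated = dict(ast)
    decorated["kinds"] = ["int" if op.isdigit() else "var" for op in ast["operands"]]
    return decorated


def code_generator(decorated_ast):
    # Pass 3: emit instructions for an imaginary accumulator machine.
    code = []
    pairs = zip(decorated_ast["operands"], decorated_ast["kinds"])
    for index, (operand, kind) in enumerate(pairs):
        text = "#" + operand if kind == "int" else operand
        code.append(("LOAD " if index == 0 else "ADD ") + text)
    code.append("STORE " + decorated_ast["target"])
    return code


def compiler_driver(source_text):
    # The driver calls each pass in turn and hands over the stored result.
    ast = syntactic_analyzer(source_text)
    decorated_ast = contextual_analyzer(ast)
    return code_generator(decorated_ast)


print(compiler_driver("x = y + 3"))
```

In a single pass design the three functions would not be called one after another by the driver; instead the syntactic analyzer would call the other two while it reads the source text.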
Lecture_no47 Surprise Test
THANK YOU
The role of Programming Language
-> Not only in designing and implementing, but in teaching and supporting it
-> Language designers balance
   -> making computing convenient for people, with
   -> making efficient use of computing machines
High-level language(HLL)
A high-level language is an advanced computer programming language that isn't limited by the computer, is designed for a specific job, and is easier to understand. Today there are dozens of high-level languages; some examples include BASIC, C, FORTRAN, Java and Pascal.
Importance of HLL
Lecture_no 31 Details of features of HLL
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level language
- Readability
- Machine independent (portable)
- Availability of program libraries
- Easy error-checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
• To be able to explain and implement sequential search and binary search
• To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort and shell sort
• To understand the idea of hashing as a search technique
• To introduce the map abstract data type
• To implement the map abstract data type using hashing
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares every entry in the table with the key.
• Binary search (ordered table)
Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are equal; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element.
• Hash
A hash table is a collection of items stored in such a way as to make them easy to find later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
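The three search methods above can be sketched in Python as follows. The function names and the sample symbol table are assumptions made for illustration, and the hash function (sum of character codes modulo the table size) is only one simple choice among many.

```python
def linear_search(table, key):
    # Compare the key with every entry in turn; works on unsorted tables.
    for index, entry in enumerate(table):
        if entry == key:
            return index
    return -1


def binary_search(sorted_table, key):
    # Repeatedly halve the search range; the table must already be sorted.
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_table[middle] == key:
            return middle
        if key < sorted_table[middle]:
            high = middle - 1
        else:
            low = middle + 1
    return -1


def hash_slot(key, table_size):
    # Toy hash function: sum of the character codes modulo the table size.
    return sum(ord(character) for character in key) % table_size


symbols = ["ALPHA", "BETA", "COUNT", "LOOP", "TOTAL"]  # sorted symbol names
print(linear_search(symbols, "LOOP"))
print(binary_search(symbols, "COUNT"))
print(hash_slot("LOOP", 11))
```

Linear search inspects up to n entries, binary search about log2(n) entries, and a hash lookup goes directly to one slot (ignoring collisions, which this sketch does not handle).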
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
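As an illustration of the first two algorithms in the list, here is a minimal Python sketch. The function names are assumptions; both functions return a new sorted list and leave the input unchanged.

```python
def selection_sort(items):
    # Repeatedly select the smallest remaining item and move it to the front.
    result = list(items)
    for i in range(len(result) - 1):
        smallest = i
        for j in range(i + 1, len(result)):
            if result[j] < result[smallest]:
                smallest = j
        result[i], result[smallest] = result[smallest], result[i]
    return result


def insertion_sort(items):
    # Grow a sorted prefix by inserting each new item at its proper place.
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


print(selection_sort([5, 2, 9, 1, 5]))
print(insertion_sort(["LOOP", "ALPHA", "TOTAL", "COUNT"]))
```

Both algorithms need on the order of n squared comparisons in the worst case; merge sort and quick sort are usually preferred for large tables.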
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e. the alphabet) that can be used for constructing formulas (i.e. finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite: if it were not, each symbol could stand for any thought, and what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language)
• the semantics of a language is what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question)
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*, where T* denotes the set of all finite strings over T.
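The two definitions can be made concrete with a small Python sketch. Since T* is infinite, the code only enumerates the strings up to a chosen length; the alphabet {a, b} and the example language (strings with equally many a's and b's) are assumptions picked purely for illustration.

```python
from itertools import product


def strings_up_to(alphabet, max_length):
    # All strings over the alphabet of length 0..max_length: a finite slice of T*.
    result = []
    for length in range(max_length + 1):
        for symbols in product(sorted(alphabet), repeat=length):
            result.append("".join(symbols))
    return result


T = {"a", "b"}
universe = strings_up_to(T, 3)

# A hypothetical language over T: strings with as many a's as b's.
L = [string for string in universe if string.count("a") == string.count("b")]

print(len(universe))
print(L)
print(set(L) <= set(universe))
```

The last line checks the subset relation of Definition 2 on the finite slice: every string of L is also a string over T.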
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string µ in a grammar G (written µ -> γ, with G under the arrow; this is a single-step derivation) if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
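The definition of a single-step derivation translates almost directly into code. The sketch below is an illustration only: productions are represented as (alpha, beta) pairs of plain strings, one character per symbol, and the function name and the sample grammar (S -> aSb, S -> ab) are assumptions.

```python
def immediate_derivations(mu, productions):
    # Return every gamma that is immediately derived from mu.
    # For each production alpha -> beta and each occurrence of alpha in mu,
    # split mu = sigma + alpha + tau and build gamma = sigma + beta + tau.
    results = []
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]
            tau = mu[start + len(alpha):]
            results.append(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


P = [("S", "aSb"), ("S", "ab")]
print(immediate_derivations("S", P))
print(immediate_derivations("aSb", P))
```

Applying the function repeatedly, starting from the string S, enumerates the strings that the grammar can derive; those that contain no S are strings of a's followed by equally many b's.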
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that consists only of terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
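The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The following Python sketch is an illustration under stated assumptions: productions are (alpha, beta) pairs of strings with one character per symbol, the empty string stands for ε, and the function name and sample grammars are hypothetical. It returns the most restrictive type whose conditions, as given above, are satisfied by every rule.

```python
def grammar_type(productions, nonterminals, start="S"):
    # Condition (b): if S -> epsilon is present, S may not occur on any right side.
    has_start_epsilon = (start, "") in productions
    start_on_right = any(start in beta for _, beta in productions)
    if has_start_epsilon and start_on_right:
        return 0

    def is_type1(alpha, beta):
        return len(alpha) <= len(beta)

    def is_type2(alpha, beta):
        return is_type1(alpha, beta) and len(alpha) == 1 and alpha in nonterminals

    def is_type3(alpha, beta):
        if not is_type2(alpha, beta):
            return False
        if len(beta) == 1:
            return beta not in nonterminals
        return len(beta) == 2 and beta[0] not in nonterminals and beta[1] in nonterminals

    # The rule S -> epsilon is exempt from the per-rule conditions.
    rules = [rule for rule in productions if rule != (start, "")]
    for level, check in ((3, is_type3), (2, is_type2), (1, is_type1)):
        if all(check(alpha, beta) for alpha, beta in rules):
            return level
    return 0


N = {"S", "A", "B", "E"}
print(grammar_type([("S", "aA"), ("A", "b")], N))
print(grammar_type([("S", "aA"), ("A", "Ba")], N))
print(grammar_type([("S", "aE"), ("aE", "EaE")], N))
print(grammar_type([("S", "Bb"), ("Bb", "b")], N))
```

The last three calls mirror the remarks in the examples: adding A → Ba leaves a grammar that is Type 2 but not Type 3, adding aE → EaE leaves one that is Type 1 but not Type 2, and adding Bb → b violates condition (a).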
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term "context-free" expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
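To make the notation concrete, the following Python sketch reads a small BNF specification and separates its terminals from its nonterminals using exactly the rule stated above: a symbol is a nonterminal if it appears on a left side. The two-rule grammar, the convention of writing terminals in double quotes, and the function name are assumptions for illustration; the sketch also assumes one rule per line and no | character inside a terminal.

```python
import re

BNF_TEXT = """
<expr> ::= <term> | <term> "+" <expr>
<term> ::= "x" | "y" | "3"
"""


def parse_bnf(text):
    # Build {nonterminal: [alternative, ...]}; each alternative is a list of symbols.
    grammar = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = []
        for alternative in right.split("|"):
            # A symbol is either <name> or a double-quoted terminal.
            alternatives.append(re.findall(r'<[^>]+>|"[^"]*"', alternative))
        grammar[left.strip()] = alternatives
    return grammar


grammar = parse_bnf(BNF_TEXT)
nonterminals = set(grammar)
all_symbols = {symbol for alts in grammar.values() for alt in alts for symbol in alt}
terminals = all_symbols - nonterminals

print(grammar["<expr>"])
print(sorted(nonterminals))
print(sorted(terminals))
```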
Formal Grammar-
In formal language theory, a grammar (often called a formal grammar for clarity when the context isn't given) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, in the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators convert texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
• In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the program is built.
• In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
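The token list above can be reproduced with a small regular-expression scanner. This is a minimal sketch, not how any particular compiler does it: the token names (IDENTIFIER, ASSIGN and so on) and the function name are assumptions, and only the symbols needed for the example statement are recognized.

```python
import re

TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),  # blanks separate tokens but are not tokens themselves
]
TOKEN_PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(source):
    # Scan the source left to right and return a list of (token type, lexeme) pairs.
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```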
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name,attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
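The mapping from lexemes to <token-name,attribute-value> pairs, together with the symbol table it builds, can be sketched as follows. The function name and the choice to store the integer value directly in number tokens are assumptions (the text itself leaves the "something" open), and the sketch silently skips any character it does not recognize.

```python
import re


def scan(source):
    # Return (tokens, symbol_table); tokens are (token-name, attribute-value) pairs.
    symbol_table = []  # entry i-1 holds the lexeme of the identifier with index i
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", int(lexeme)))
        else:
            # Operators and punctuation need no attribute value.
            tokens.append((lexeme, None))
    return tokens, symbol_table


tokens, table = scan("x3   =y+ 3  ;")
print(tokens)
print(table)
```

All three spellings of the statement shown above produce the same token stream, because blanks only separate lexemes. The spelling x 3 = y + 3; produces a different stream, since x and 3 become separate lexemes.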
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
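A syntax tree for the example statement can be built with a small hand-written parser. The sketch below is one possible approach, not the method of the notes: it takes a list of lexemes, represents the tree as nested tuples (operator, left, right), and groups repeated additions to the left, which is one way of resolving the fact that the rule expr → expr + expr does not say how a + b + c should be grouped. All names are assumptions.

```python
def parse_assignment(tokens):
    # Parse  id = expr ;  and return the syntax tree ("=", id, expression tree).
    position = 0

    def next_token():
        nonlocal position
        token = tokens[position] if position < len(tokens) else None
        position += 1
        return token

    def parse_operand():
        token = next_token()
        if token is None or token in ("=", "+", ";"):
            raise SyntaxError(f"operand expected, got {token!r}")
        return token

    def parse_expr():
        # expr -> operand { + operand }, with operators as interior nodes.
        tree = parse_operand()
        while position < len(tokens) and tokens[position] == "+":
            next_token()
            tree = ("+", tree, parse_operand())
        return tree

    target = parse_operand()
    if next_token() != "=":
        raise SyntaxError("'=' expected")
    tree = ("=", target, parse_expr())
    if next_token() != ";":
        raise SyntaxError("';' expected")
    return tree


print(parse_assignment(["x3", "=", "y", "+", "3", ";"]))
```

The semicolon is checked but does not appear in the result, which matches the technical point above: the tree represents the assignment expression, not the whole statement.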
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
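The walk over the semantic tree can be sketched in Python. The nested-tuple representation of the tree and the function names are assumptions made for this illustration; the tree below encodes the example with its two conversion operators already inserted.

```python
def generate_three_address_code(tree):
    # Walk the tree bottom-up, storing each intermediate result in a new temporary.
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):  # leaf: an identifier or a constant
            return node
        operator, *children = node
        operands = [walk(child) for child in children]
        temp = new_temp()
        if operator == "+":
            code.append(f"{temp} = {operands[0]} + {operands[1]}")
        else:  # unary conversion operator such as inttoreal
            code.append(f"{temp} = {operator}({operands[0]})")
        return temp

    _, target, expression = tree
    code.append(f"{target} = {walk(expression)}")
    return code


semantic_tree = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))
for instruction in generate_three_address_code(semantic_tree):
    print(instruction)
```

Because the children of a node are visited before the node itself gets its temporary, the innermost conversion receives temp1, which reproduces the numbering of the four instructions shown above.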
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
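The two optimizations can be sketched as a small function over a list of quads. This is an illustration only: quads are tuples (operation, target, source1, source2) with None for unused sources, the function name is an assumption, and the second step assumes that a temporary which is immediately assigned to a variable is not used anywhere else.

```python
def optimize(quads):
    # Step 1: perform inttoreal on integer constants at compile time
    # and substitute the real constant wherever the temporary was used.
    constants = {}
    folded = []
    for operation, target, source1, source2 in quads:
        if operation == "inttoreal" and source1.isdigit():
            constants[target] = str(float(source1))
            continue
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        folded.append((operation, target, source1, source2))

    # Step 2: merge "op temp ..." followed by "assign x temp" into "op x ...".
    result = []
    for quad in folded:
        operation, target, source1, _ = quad
        if operation == "assign" and result and result[-1][1] == source1:
            previous = result.pop()
            result.append((previous[0], target, previous[2], previous[3]))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```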
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
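The idea of the parser calling the scanner on demand can be sketched with a Python generator. Everything here is an assumption made for illustration: the function names, the restriction to statements of the form id = operand + operand ... ;, and the log list, which exists only to make the interleaving of scanning and code emission visible.

```python
import re


def scanner(source, log):
    # Produce one token each time the parser asks for it.
    for lexeme in re.findall(r"\w+|[=+;]", source):
        log.append(f"scan {lexeme}")
        yield lexeme


def compile_in_one_pass(source):
    # The parser pulls tokens from the scanner and emits intermediate code as it goes.
    log = []
    tokens = scanner(source, log)
    target = next(tokens)
    if next(tokens) != "=":
        raise SyntaxError("'=' expected")
    left = next(tokens)
    token = next(tokens)
    temp_count = 0
    while token == "+":
        right = next(tokens)
        temp_count += 1
        log.append(f"emit temp{temp_count} = {left} + {right}")
        left = f"temp{temp_count}"
        token = next(tokens)
    if token != ";":
        raise SyntaxError("';' expected")
    log.append(f"emit {target} = {left}")
    return log


for entry in compile_in_one_pass("x3 = y + 3;"):
    print(entry)
```

The log shows that the first instruction is emitted before the semicolon has even been scanned: no intermediate token file is written and the source is read only once.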
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar is ambiguous if there is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What passes a compiler makes over the source program, and how many, is an important design decision.
(1) Single-Pass Compiler
A single-pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical single-pass compiler: the Compiler Driver calls the Syntactic Analyzer, which in turn calls the Contextual Analyzer and the Code Generator.
(2) Multi-Pass Compiler
A multi-pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi-pass compiler: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer, and the Code Generator in turn. The Syntactic Analyzer takes the source text as input and outputs an AST; the Contextual Analyzer takes the AST and outputs a decorated AST; the Code Generator takes the decorated AST and outputs object code.
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing a lot of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translation.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Data types and data structures
Storage allocation and scope of names
Accessing flexibility
Function modularity
Asynchronous operation
Benefits of higher-level languages
- Readability
- Machine independence (portability)
- Availability of program libraries
- Easy error checking
Lecture_no32 Table processing
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search.
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort.
To understand the idea of hashing as a search technique.
To introduce the map abstract data type.
To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: the symbol name
• Methods:
• Linear search: Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search (ordered table): Binary search takes a sorted array as input. It compares the target (search key) with the middle element of the array and terminates if they are equal; otherwise it divides the array into two subarrays and continues the search in the left (right) subarray if the target is less (greater) than the middle element.
• Hash: A hash table is a collection of items stored in such a way as to make them easy to find later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
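The three search methods above can be sketched in a few lines. This is only an illustrative sketch, not the course's own implementation: the function and class names are hypothetical, the hash function (sum of character codes modulo the table size) is an assumed example, and collisions are resolved by linear probing, which the notes do not prescribe.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for index, entry in enumerate(table):
        if entry == key:
            return index
    return -1


def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table; return the index of key or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_table[middle] == key:
            return middle
        if key < sorted_table[middle]:
            high = middle - 1      # continue in the left subarray
        else:
            low = middle + 1       # continue in the right subarray
    return -1


class HashTable:
    """Fixed-size table with slots 0..size-1 (linear probing is an assumption)."""

    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size  # each slot holds a (key, value) pair or None

    def _hash(self, key):
        # Hypothetical hash function: sum of character codes modulo table size.
        return sum(ord(ch) for ch in key) % self.size

    def put(self, key, value):
        slot = self._hash(key)
        for _ in range(self.size):
            if self.slots[slot] is None or self.slots[slot][0] == key:
                self.slots[slot] = (key, value)
                return
            slot = (slot + 1) % self.size   # probe the next slot
        raise OverflowError("hash table is full")

    def get(self, key):
        slot = self._hash(key)
        for _ in range(self.size):
            if self.slots[slot] is None:
                return None
            if self.slots[slot][0] == key:
                return self.slots[slot][1]
            slot = (slot + 1) % self.size
        return None
```

In a symbol table the key would be the symbol name and the stored value its address or other attributes; binary search additionally requires the table to be kept sorted by name.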
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
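As an illustration of two of the algorithms listed, the sketch below shows insertion sort and merge sort. The function names are hypothetical and both functions return a new sorted list rather than sorting in place; that is a choice made here for clarity, not something the notes specify.

```python
def insertion_sort(items):
    """Grow a sorted prefix by inserting each element into its place."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one position to the right.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


def merge_sort(items):
    """Sort each half recursively, then merge the two sorted halves."""
    if len(items) <= 1:
        return list(items)
    middle = len(items) // 2
    left = merge_sort(items[:middle])
    right = merge_sort(items[middle:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])    # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged
```

Insertion sort takes time proportional to n squared in the worst case, while merge sort takes time proportional to n log n at the cost of extra storage for the merged lists.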
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e., the alphabet) that can be used for constructing formulas (i.e., finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language);
the semantics of a language is what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is a way of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1-
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION-
A string γ is immediately derived from a string µ in a grammar G, written µ ->G γ (a single-step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
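The single-step derivation can be made concrete with a small sketch. It assumes strings are represented as tuples of symbols and productions as (α, β) pairs of such tuples; the function name and the example grammar (S -> a S b, S -> a b) are hypothetical and chosen only for illustration.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    mu is a tuple of symbols; productions is a list of (alpha, beta)
    pairs, each a tuple of symbols. For every occurrence of alpha in mu
    we split mu = sigma + alpha + tau and emit gamma = sigma + beta + tau.
    """
    results = []
    for alpha, beta in productions:
        width = len(alpha)
        for start in range(len(mu) - width + 1):
            if mu[start:start + width] == alpha:
                sigma, tau = mu[:start], mu[start + width:]
                results.append(sigma + beta + tau)
    return results


# Hypothetical example grammar: S -> a S b | a b
PRODUCTIONS = [
    (("S",), ("a", "S", "b")),
    (("S",), ("a", "b")),
]
```

Starting from the tuple ("S",) and applying `immediate_derivations` repeatedly enumerates sentential forms of this example grammar; those containing no S are sentences.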
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
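The conditions for Types 1, 2 and 3 can be checked mechanically. The sketch below assumes the usual reading of those conditions (left side no longer than right side for Type 1, a single nonterminal on the left for Type 2, a terminal optionally followed by a nonterminal on the right for Type 3, with the empty production allowed only for the start symbol). Productions are (left, right) pairs of tuples of symbols, the empty tuple stands for the empty string, and the function name and sample grammar are hypothetical.

```python
def grammar_type(nonterminals, productions, start):
    """Return the most restrictive type (0 to 3) the grammar satisfies."""

    def is_start_to_empty(left, right):
        return left == (start,) and right == ()

    # Type 1, condition (a): no production shrinks the string.
    type1 = all(len(left) <= len(right) or is_start_to_empty(left, right)
                for left, right in productions)
    # Type 1, condition (b): if start -> empty exists, start is on no right side.
    if any(is_start_to_empty(l, r) for l, r in productions):
        if any(start in right for _, right in productions):
            type1 = False
    if not type1:
        return 0

    # Type 2: every left side is a single nonterminal.
    if not all(len(left) == 1 and left[0] in nonterminals
               for left, _ in productions):
        return 1

    # Type 3: right side is a terminal, or a terminal then a nonterminal.
    def right_linear(right):
        if len(right) == 1:
            return right[0] not in nonterminals
        if len(right) == 2:
            return right[0] not in nonterminals and right[1] in nonterminals
        return False

    if all(is_start_to_empty(l, r) or right_linear(r) for l, r in productions):
        return 3
    return 2


# Hypothetical sample grammar: S -> aA, A -> b
NONTERMINALS = {"S", "A", "B", "E"}
SAMPLE = [(("S",), ("a", "A")), (("A",), ("b",))]
```

Adding a production with left side A and right side Ba to SAMPLE lowers the result from 3 to 2, adding one with left side aE and right side EaE lowers it to 1, and adding one with left side Bb and right side b lowers it to 0, matching the remarks that follow each example in the text.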
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a (possibly empty) string of terminals and/or nonterminals. The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
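To illustrate how such a specification can be processed, the sketch below reads rules written one per line with the conventional `::=` separator and `|` between alternatives, and then applies the observation that symbols never appearing on a left side are terminals. The two-rule grammar and the function names are hypothetical, and symbols are assumed to be separated by spaces.

```python
def parse_bnf(text):
    """Parse lines of the form '<symbol> ::= alt1 | alt2' into a dict.

    Each nonterminal maps to a list of alternatives, and each
    alternative is a list of symbols (split on whitespace).
    """
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alternative.split() for alternative in right.split("|")]
        rules[left.strip()] = alternatives
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {symbol
            for alternatives in rules.values()
            for alternative in alternatives
            for symbol in alternative}
    return used - set(rules)


# Hypothetical example specification.
EXAMPLE = """
<expr> ::= <expr> + <term> | <term>
<term> ::= id | number
"""
```

Because the sketch splits on the bar character, it cannot describe a language in which | is itself a terminal; a fuller reader would need a quoting convention for that case.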
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, in the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read and broken into a stream of strings, producing an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part, the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step, the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step, the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase, this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol =
– The identifier count
– The plus sign +
– The identifier rate
– The multiplication sign *
– The constant number 10
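The token list above can be reproduced with a small scanner built on Python's standard `re` module. This is a sketch under stated assumptions: the token names (IDENTIFIER, ASSIGN, and so on) are hypothetical labels chosen here, and only the handful of symbols needed for the example statement, including the multiplication sign, are recognized.

```python
import re

# (token name, regular expression) pairs; names are illustrative only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),          # blanks separate tokens but are not tokens
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})"
                             for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Break the source text into (token name, lexeme) pairs."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

Calling `tokenize("total = count + rate * 10")` yields seven pairs in the same order as the list above, from the identifier total to the constant number 10.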
– Syntax Analysis
• It is also called parsing.
• In this phase, the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically, the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically, the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice, the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
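The mapping from lexemes to token pairs, with identifiers entered into a symbol table, might be sketched as follows. The representation is an assumption made for illustration: tokens are Python tuples, the attribute of a symbol token is None, a number token simply carries its text, and the function name `scan` is hypothetical.

```python
import re

# One lexeme per match: an identifier, a number, or a single symbol.
LEXEME = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>[=+;]))"
)


def scan(source):
    """Return (tokens, symbol_table) for a statement such as 'x3 = y + 3;'."""
    symbol_table = []              # entry i-1 holds the name of identifier i
    tokens = []
    source = source.rstrip()
    position = 0
    while position < len(source):
        match = LEXEME.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r}")
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            # The attribute is the 1-based index into the symbol table.
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("number"):
            tokens.append(("number", match.group("number")))
        else:
            tokens.append((match.group("symbol"), None))
        position = match.end()
    return tokens, symbol_table
```

Differently spaced versions of the statement produce the same token stream, whereas writing x 3 with a blank inside yields an identifier followed by a number, which is why that form is excluded above.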
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol), the semicolon is a statement terminator, not a statement separator.
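A syntax tree with operators as interior nodes can be built by a small hand-written parser. The sketch below is an assumption-laden illustration rather than the method of any particular compiler: it takes a list of lexemes, treats + as left-associative (the grammar above is ambiguous on this point), and represents each interior node as a tuple (operator, left, right). The function name is hypothetical.

```python
def parse_assignment(tokens):
    """Parse lexemes such as ['x3', '=', 'y', '+', '3', ';'] into a tree."""
    position = 0

    def next_token():
        nonlocal position
        if position >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[position]
        position += 1
        return token

    def expect(lexeme):
        token = next_token()
        if token != lexeme:
            raise SyntaxError(f"expected {lexeme!r}, found {token!r}")

    def operand():
        token = next_token()
        if not token.isalnum():
            raise SyntaxError(f"expected an identifier or number, found {token!r}")
        return token

    def expr():
        # expr -> operand { + operand }, grouping to the left.
        tree = operand()
        while position < len(tokens) and tokens[position] == "+":
            next_token()
            tree = ("+", tree, operand())
        return tree

    target = operand()
    if target[0].isdigit():
        raise SyntaxError("the left side of = must be an identifier")
    expect("=")
    value = expr()
    expect(";")
    return ("=", target, value)
```

For the lexemes of x3 = y + 3; the result is the tuple ("=", "x3", ("+", "y", "3")), that is, the assignment operator at the root with the sum as its right child.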
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
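Walking the tree to produce quads can be sketched with a post-order traversal. The tree representation is an assumption for illustration: leaves are strings, interior nodes are tuples whose first element is the operator, and the conversion operators have already been inserted by semantic analysis. The function name and the name EXAMPLE_TREE are hypothetical.

```python
def generate_quads(tree):
    """Translate ('=', target, expression) into a list of quads.

    Each quad is (operation, target, source1, source2), with '--'
    marking an unused source.
    """
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):
            return node                      # a leaf names itself
        operator, *children = node
        sources = [walk(child) for child in children]   # operands first
        target = new_temp()
        sources += ["--"] * (2 - len(sources))
        operation = {"+": "add"}.get(operator, operator)
        quads.append((operation, target, *sources))
        return target

    _, target, value = tree
    quads.append(("assign", target, walk(value), "--"))
    return quads


# The running example after type conversions have been inserted.
EXAMPLE_TREE = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))
```

Applied to EXAMPLE_TREE, the traversal emits the inttoreal, add, realtoint and assign quads in that order, using temp1 to temp3 as in the sequence above.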
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
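These two optimizations can be sketched as a single pass over the quad list. The sketch assumes quads are (operation, target, source1, source2) tuples of strings, that each temporary is used only once after it is defined (so retargeting the previous quad is safe), and that integer constants appear as digit strings. The function name is hypothetical.

```python
def optimize(quads):
    """Fold constant int-to-real conversions and merge trailing copies."""
    constants = {}     # temporary name -> real constant computed at compile time
    result = []
    for operation, target, source1, source2 in quads:
        # Substitute any temporary already known to be a constant.
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        if operation == "inttoreal" and source1.isdigit():
            constants[target] = f"{source1}.0"   # conversion done by the compiler
            continue
        if operation == "assign" and result and result[-1][1] == source1:
            # The previous quad computed the value being copied:
            # let it write directly into the final target instead.
            previous = result.pop()
            result.append((previous[0], target, previous[2], previous[3]))
            continue
        result.append((operation, target, source1, source2))
    return result
```

Run on the four quads of the running example, this leaves the two quads shown above, the add with the constant 3.0 and the realtoint writing directly into id1.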
Code generation
Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2
LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2
127 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location
A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
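The idea of the parser calling the scanner on demand can be sketched with a Python generator. The tiny input language (operands joined by +, separated by blanks) and the names scanner and parse_sum are assumptions made for the illustration; a real front end handles a far richer grammar.

```python
def scanner(text):
    """Yield one token at a time; no intermediate token file is written."""
    for lexeme in text.split():
        if lexeme.isdigit():
            kind = "number"
        elif lexeme.isidentifier():
            kind = "id"
        else:
            kind = lexeme
        yield (kind, lexeme)
    yield ("eof", "")

def parse_sum(text):
    """Parse 'operand + operand + ...' and emit code in the same pass."""
    tokens = scanner(text)
    code = []

    def operand():
        kind, lexeme = next(tokens)          # the parser asks the scanner for a token
        if kind not in ("id", "number"):
            raise SyntaxError(f"operand expected, got {lexeme!r}")
        return lexeme

    left = operand()
    count = 0
    kind, lexeme = next(tokens)
    while kind == "+":
        right = operand()
        count += 1
        temp = f"temp{count}"
        # The intermediate-code generator is invoked at this point of the parse.
        code.append(f"{temp} = {left} + {right}")
        left = temp
        kind, lexeme = next(tokens)
    if kind != "eof":
        raise SyntaxError(f"unexpected {lexeme!r}")
    return code

print(parse_sum("a + b + 3"))
```

The input is read exactly once: scanning, parsing and code generation are interleaved instead of each making its own pass.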
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner)
Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser)
The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization
Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation
Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar
A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar
There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                  calls
Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    calls                 calls                   calls
Syntactic Analyzer    Contextual Analyzer    Code Generator
Data flow (input, output): Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Sorting and Searching
Objectives
To be able to explain and implement sequential search and binary search.
To be able to explain and implement selection sort, bubble sort, merge sort, quick sort, insertion sort, and shell sort.
To understand the idea of hashing as a search technique.
To introduce the map abstract data type.
To implement the map abstract data type using hashing.
Searching in a table
Searching is the algorithmic process of finding a particular item in a collection of items.
• Key: symbol name
• Methods
• Linear search
Linear search works on the design principle of brute force and is one of the simplest searching algorithms. It is also called sequential search. It exhaustively compares the key with every entry in the table.
• Binary search (requires an ordered table)
Binary search takes a sorted array as its input. It compares the target (search key) with the middle element of the array and terminates if they are the same. Otherwise it divides the array into two subarrays and continues the search in the left subarray if the target is less than the middle element, or in the right subarray if it is greater.
• Hash
A hash table is a collection of items stored in such a way as to make it easy to find them later. Each position of the hash table, often called a slot, can hold an item and is named by an integer value starting at 0. The mapping between an item and the slot where that item belongs in the hash table is called the hash function.
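The three search methods can be sketched in Python as follows. The function and class names are chosen for the example. The hash function (sum of character codes modulo the table size) and the use of linear probing to resolve collisions are assumptions, since the notes do not fix either choice.

```python
def linear_search(table, key):
    """Compare the key with every entry in turn; return its index or -1."""
    for i, item in enumerate(table):
        if item == key:
            return i
    return -1

def binary_search(sorted_table, key):
    """Repeatedly halve a sorted table; return the index of key or -1."""
    low, high = 0, len(sorted_table) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_table[mid] == key:
            return mid
        if key < sorted_table[mid]:
            high = mid - 1      # continue in the left subarray
        else:
            low = mid + 1       # continue in the right subarray
    return -1

class HashTable:
    """Fixed-size table whose slots are numbered from 0."""
    def __init__(self, size=11):
        self.slots = [None] * size

    def _hash(self, key):
        # Assumed hash function: character codes summed, modulo table size.
        return sum(ord(c) for c in key) % len(self.slots)

    def put(self, key, value):
        i = self._hash(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % len(self.slots)   # linear probing on collision
        raise OverflowError("hash table is full")

    def get(self, key):
        i = self._hash(key)
        for _ in range(len(self.slots)):
            if self.slots[i] is None:
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)
        return None

symbols = ["ALPHA", "BETA", "COUNT", "LOOP", "TOTAL"]   # already in sorted order
print(linear_search(symbols, "LOOP"), binary_search(symbols, "LOOP"))
```

Linear search inspects up to n entries, binary search about log2(n) entries of a sorted table, and a hash lookup usually only a few slots as long as the table is not nearly full.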
Sorting
We will discuss four comparison-sort algorithms:
1. selection sort
2. insertion sort
3. merge sort
4. quick sort
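As an illustration of the first two algorithms in the list, here is a short Python sketch. Both functions return a new sorted list rather than sorting in place, which is a choice made for the example only.

```python
def selection_sort(items):
    """Repeatedly select the smallest remaining item and move it forward."""
    a = list(items)
    for i in range(len(a) - 1):
        smallest = i
        for j in range(i + 1, len(a)):
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a

def insertion_sort(items):
    """Insert each item into its place within the already sorted prefix."""
    a = list(items)
    for i in range(1, len(a)):
        current = a[i]
        j = i - 1
        while j >= 0 and a[j] > current:
            a[j + 1] = a[j]     # shift larger items one position right
            j -= 1
        a[j + 1] = current
    return a

data = [54, 26, 93, 17, 77, 31]
print(selection_sort(data))
print(insertion_sort(data))
```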
Lecture_no33 Surprise Test
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e., the alphabet) that can be used for constructing formulas (i.e., finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
A finite alphabet of symbols. The alphabet must be finite: if it were not, each symbol could stand for any thought, nothing would need to be processed, and the system would not be much of a model. Furthermore, we are finite beings, and we should keep in mind what we are trying to model.
A syntax that defines which strings of symbols are in the language of our formal system.
A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language);
the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) describes the act of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊆ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of Sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ ->G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
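The single-step derivation can be made concrete with a short Python sketch. Strings are ordinary Python strings with one character per symbol, and productions are (α, β) pairs. This encoding, the function name and the sample grammar S -> aSb | ab are assumptions made for illustration. The left-hand side α is assumed to be non-empty.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that can be derived from mu in one step."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                 # part of mu before alpha
            tau = mu[start + len(alpha):]      # part of mu after alpha
            results.add(sigma + beta + tau)    # gamma = sigma beta tau
            start = mu.find(alpha, start + 1)
    return results

# Assumed example grammar: S -> aSb | ab
productions = [("S", "aSb"), ("S", "ab")]
print(sorted(immediate_derivations("aSb", productions)))
```

Here μ = aSb splits as σ = a, α = S, τ = b, so the two productions give γ = aaSbb and γ = aabb.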
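Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol (written Σ in this definition). A sentence is a sentential form that consists only of terminal symbols.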
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (N the nonterminal symbols, Σ the terminal symbols, P the production rules, S the start symbol) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive: they recognize only a proper subset of the context-free languages.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*, with v non-empty.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear-bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair < >.
The ::= means that the symbol on the left must be replaced with the expression on the right.
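A BNF specification can be read into a simple data structure, as the following Python sketch suggests. The two-rule grammar for binary integers, the function names parse_bnf and terminals, and the restriction that symbols are separated by blanks and that | is never itself a terminal are all assumptions made for the example.

```python
def parse_bnf(text):
    """Turn lines of the form '<symbol> ::= alt | alt' into a dictionary."""
    grammar = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # Each alternative is a sequence of symbols separated by blanks.
        grammar[left.strip()] = [alt.split() for alt in right.split("|")]
    return grammar

def terminals(grammar):
    """Symbols that never appear on a left side are terminals."""
    used = {sym for alts in grammar.values() for alt in alts for sym in alt}
    return used - set(grammar)

rules = """
<integer> ::= <digit> | <digit> <integer>
<digit> ::= 0 | 1
"""
grammar = parse_bnf(rules)
print(grammar)
print(sorted(terminals(grammar)))
```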
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languages. If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by a Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form is built. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
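The token list above can be reproduced with a small regular-expression scanner in Python. The pattern, the names TOKEN, SYMBOL_NAMES and describe_tokens, and the restriction to the three symbols used in the example are assumptions made for this sketch.

```python
import re

# One alternative per token class; leading blanks are skipped.
TOKEN = re.compile(
    r"\s*(?:(?P<identifier>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<symbol>[=+*]))"
)
SYMBOL_NAMES = {
    "=": "assignment symbol",
    "+": "plus sign",
    "*": "multiplication sign",
}

def describe_tokens(statement):
    """Break a statement into tokens and describe each one in words."""
    statement = statement.strip()
    descriptions = []
    pos = 0
    while pos < len(statement):
        match = TOKEN.match(statement, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        kind = match.lastgroup            # name of the alternative that matched
        text = match.group(kind)
        if kind == "identifier":
            descriptions.append(f"identifier {text}")
        elif kind == "number":
            descriptions.append(f"constant number {text}")
        else:
            descriptions.append(SYMBOL_NAMES[text])
        pos = match.end()
    return descriptions

for description in describe_tokens("total = count + rate * 10"):
    print(description)
```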
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
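The mapping from lexemes to <token-name, attribute-value> pairs can be sketched with a hand-written scanner in Python. The function name tokenize, the use of None as the ignored second component, and the choice to store the number's text as its attribute are assumptions made for the example. The scanner uses only loops, no recursion.

```python
def tokenize(source):
    """Return (tokens, symbol_table) for a statement such as 'x3 = y + 3;'."""
    symbol_table = []    # position i holds the lexeme behind the token <id, i+1>
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                     # non-significant blank: skip it
            i += 1
        elif ch.isalpha():                   # identifier: letter, then letters or digits
            j = i
            while j < len(source) and source[j].isalnum():
                j += 1
            lexeme = source[i:j]
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
            i = j
        elif ch.isdigit():                   # number: a run of digits
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("number", source[i:j]))
            i = j
        elif ch in "=+;":                    # single-character symbols
            tokens.append((ch, None))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens, symbol_table

tokens, table = tokenize("x3   =y+ 3  ;")
print(tokens)
print(table)
```

All three acceptable spellings of the statement yield the same token stream, whereas x 3 = y + 3; starts with the identifier x followed by the number 3.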
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
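A small recursive-descent style parser that builds the syntax tree for such a statement might look like this in Python. It is a sketch under assumptions: the input is a list of lexemes, trees are nested tuples with the operator first, and the rule expr → expr + expr is handled iteratively as a left-associative chain of operands, which is one common way to avoid the left recursion.

```python
def parse_assignment(tokens):
    """Parse 'id = expr ;' and return a syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def expect(expected):
        token = advance()
        if token != expected:
            raise SyntaxError(f"expected {expected!r}, got {token!r}")

    def operand():
        # An operand is a number or an identifier.
        token = advance()
        if token is None or not token.isalnum():
            raise SyntaxError(f"operand expected, got {token!r}")
        return token

    def expr():
        # expr -> operand ('+' operand)*, building a left-leaning tree
        tree = operand()
        while peek() == "+":
            advance()
            tree = ("+", tree, operand())
        return tree

    target = operand()
    if not target.isidentifier():
        raise SyntaxError(f"identifier expected, got {target!r}")
    expect("=")
    value = expr()
    expect(";")
    return ("=", target, value)

print(parse_assignment(["x3", "=", "y", "+", "3", ";"]))
```

The operator = is the root, with the identifier x3 and the + subtree as its children, matching the description of a syntax tree with operators as interior nodes.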
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int", as shown on the right.
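The insertion of conversion operators can be sketched as a walk over the syntax tree. The tuple encoding of the tree, the types dictionary standing in for the symbol table, and the function name insert_conversions are assumptions made for this illustration. Only the types int and real and the operators = and + are handled.

```python
def insert_conversions(tree, types):
    """Return (new_tree, type) with inttoreal / realtoint nodes inserted."""
    if isinstance(tree, str):
        if tree.isdigit():
            return tree, "int"            # integer constant
        return tree, types[tree]          # declared type of the variable
    op, left, right = tree
    if op == "+":
        left, left_type = insert_conversions(left, types)
        right, right_type = insert_conversions(right, types)
        if left_type == right_type:
            return ("+", left, right), left_type
        # Mixed operands: convert the int operand to real.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return ("+", left, right), "real"
    if op == "=":
        value, value_type = insert_conversions(right, types)
        target_type = types[left]
        if target_type == "int" and value_type == "real":
            value = ("realtoint", value)
        elif target_type == "real" and value_type == "int":
            value = ("inttoreal", value)
        return ("=", left, value), target_type
    raise ValueError(f"unknown operator {op!r}")

types = {"x3": "int", "y": "real"}
tree = ("=", "x3", ("+", "y", "3"))
print(insert_conversions(tree, types)[0])
```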
Intermediate code generation
Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target
With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce
temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75
SP NOTES
Sometimes three-address code is called quadruples because one can view the previous code sequence as
    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --
Each "quad" has the form
    operation target source1 source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
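The two optimizations above can be sketched on quads held as Python tuples (operation, target, source1, source2). The function name optimize and the tuple layout are assumptions for illustration, and the second rewrite assumes the merged temporary is not used anywhere else, which a real optimizer would have to verify.

```python
def optimize(quads):
    """Apply two local rewrites to a list of (op, target, src1, src2) quads."""
    # 1. Fold inttoreal of an integer constant into a real constant.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        if op == "inttoreal" and src1.isdigit():
            constants[target] = f"{src1}.0"
            continue                       # this quad disappears
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        folded.append((op, target, src1, src2))

    # 2. Merge "op temp ..." followed by "assign x temp" into "op x ...".
    #    Assumption: temp has no other use after the assignment.
    result = []
    for quad in folded:
        op, target, src1, src2 = quad
        if op == "assign" and result and result[-1][1] == src1:
            prev_op, _, prev1, prev2 = result.pop()
            result.append((prev_op, target, prev1, prev2))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```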
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, 3.0    // add float
    RTOI R2, R1         // real to int
    ST   id1, R2
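A rough Python sketch of how the two optimized quads could be turned into the register code shown above. The function name generate, the naive register numbering, and the rule that any target not starting with "temp" is stored to memory are all assumptions of this sketch; it also assumes the second operand of add is a constant and that each loaded value is used only once.

```python
def generate(quads):
    """Translate (op, target, src1, src2) quads into pseudo-assembly lines."""
    lines = []
    reg_of = {}          # name -> register currently holding its value
    count = 0

    def fresh():
        nonlocal count
        count += 1
        return f"R{count}"

    def in_register(name):
        if name not in reg_of:
            reg_of[name] = fresh()
            lines.append(f"LD {reg_of[name]}, {name}")
        return reg_of[name]

    for op, target, src1, src2 in quads:
        if op == "add":
            reg = in_register(src1)
            # The result overwrites the source register.
            lines.append(f"ADDF {reg}, {reg}, {src2}")
        elif op == "realtoint":
            src = in_register(src1)
            reg = fresh()
            lines.append(f"RTOI {reg}, {src}")
        else:
            raise ValueError(f"unsupported operation {op}")
        reg_of[target] = reg
        if not target.startswith("temp"):
            lines.append(f"ST {target}, {reg}")   # program variable: store it
    return lines


asm = generate([("add", "temp2", "id2", "3.0"),
                ("realtoint", "id1", "temp2", None)])
print("\n".join(asm))
```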
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
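A minimal Python sketch of such a table follows. The class name SymbolTable, its methods, and the byte sizes 4 and 8 are assumptions for illustration. The offset field records where the compiled program will keep the variable, not where the compiler itself keeps the entry.

```python
class SymbolTable:
    """Maps identifiers to type information and a storage offset."""

    def __init__(self):
        self.entries = []      # entry i-1 describes identifier number i
        self.index = {}        # name -> 1-based entry number
        self.next_offset = 0   # next free offset in the compiled program's data area

    def declare(self, name, type_name, size):
        if name in self.index:
            raise KeyError(f"{name} already declared")
        self.entries.append({"name": name, "type": type_name,
                             "offset": self.next_offset})
        self.next_offset += size
        self.index[name] = len(self.entries)
        return self.index[name]

    def lookup(self, name):
        return self.entries[self.index[name] - 1]


table = SymbolTable()
table.declare("x3", "int", 4)    # becomes id1
table.declare("y", "real", 8)    # becomes id2
print(table.lookup("y"))
```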
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
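The idea of the parser calling the scanner on demand can be sketched with a Python generator: the scanner produces one token each time the parser asks, so no intermediate token file is written. The names scanner and parse_sum, and the tiny "operand + operand ... ;" language, are assumptions chosen to keep the sketch short.

```python
import re


def scanner(text):
    """Yield one token at a time, only when the parser asks for it."""
    for match in re.finditer(r"[A-Za-z_]\w*|\d+|[=+;]", text):
        yield match.group()


def parse_sum(tokens):
    """Consume operand (+ operand)* ; directly from the scanner."""
    operands = [next(tokens)]
    for token in tokens:
        if token == ";":
            return operands
        if token != "+":
            raise SyntaxError(f"unexpected {token!r}")
        operands.append(next(tokens))
    raise SyntaxError("missing ';'")


result = parse_sum(scanner("y + 3 + z;"))
print(result)
```

Because the scanner is a generator, scanning and parsing are interleaved in a single pass over the input.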
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code instruction, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What passes, and how many, a compiler makes over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver    --calls-->  Syntactic Analyzer
    Syntactic Analyzer --calls-->  Contextual Analyzer
    Syntactic Analyzer --calls-->  Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver --calls-->  Syntactic Analyzer   (input: Source Text, output: AST)
    Compiler Driver --calls-->  Contextual Analyzer  (input: AST, output: Decorated AST)
    Compiler Driver --calls-->  Code Generator       (input: Decorated AST, output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Lecture_no34 MODULE-3
Formal systems and programming language
A formal system is broadly defined as any well-defined system of abstract thought based on the model of mathematics. Euclid's Elements is often held to be the first formal system and displays the characteristics of a formal system.
Formal systems in mathematics consist of the following elements:
1. A finite set of symbols (i.e., the alphabet) that can be used for constructing formulas (i.e., finite strings of symbols).
2. A grammar, which tells how well-formed formulas (abbreviated wff) are constructed out of the symbols in the alphabet. It is usually required that there be a decision procedure for deciding whether a formula is well formed or not.
3. A set of axioms or axiom schemata; each axiom must be a wff.
4. A set of inference rules.
Components of a Formal System-
A formal system has the following components:
• A finite alphabet of symbols. The alphabet must be finite because, if it were not, each symbol could stand for any thought, and so what kind of a model would that be? There would be no need to process anything. Furthermore, we are finite beings, and let's keep in mind what we are trying to model.
• A syntax that defines which strings of symbols are in the language of our formal system.
• A decidable set of axioms and a finite set of rules from which the set of theorems of the system is generated. The rules must take a finite number of steps to apply.
Formal Language-
A formal language is a language that is defined by precise mathematical or machine-processable formulas. Like languages in linguistics, formal languages generally have two aspects:
• the syntax of a language is what the language looks like (more formally, the set of possible expressions that are valid utterances in the language);
• the semantics of a language are what the utterances of the language mean (which is formalized in various ways, depending on the type of language in question).
Formal grammar-
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this L ⊂ T*.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol- Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string µ in a grammar G, written µ ->G γ (a single-step derivation), if and only if
(1) µ = σ α τ
(2) γ = σ β τ
(3) α -> β ∈ P of G
where σ and τ represent arbitrary strings.
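The definition of a single-step derivation translates almost directly into code. In the Python sketch below, strings stand for µ and γ, and a production α -> β is a pair of strings; the function name immediate_derivations and the two-rule example grammar are assumptions for illustration.

```python
def immediate_derivations(mu, productions):
    """Return every string gamma that mu derives in a single step.

    productions is a list of (alpha, beta) string pairs.
    """
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            # mu = sigma + alpha + tau, so gamma = sigma + beta + tau
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical grammar: S -> aSb and S -> ab
P = [("S", "aSb"), ("S", "ab")]
steps = immediate_derivations("aSb", P)
print(sorted(steps))
```

Each occurrence of α in µ gives one choice of σ and τ, so a string may have several immediate derivations.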
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (nonterminals N, terminal alphabet Σ, production rules P, start symbol S) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
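The conditions that separate the grammar types can be checked mechanically. The Python sketch below assumes a simple encoding that is not part of the notes: productions are (left, right) string pairs, uppercase letters are nonterminals, lowercase letters are terminals, and the empty string stands for ε. The function name grammar_type and the sample grammars are likewise assumptions.

```python
def grammar_type(productions, start="S"):
    """Return the most restrictive type (0 to 3) whose conditions hold."""
    has_start_eps = (start, "") in productions
    others = [(a, b) for a, b in productions if (a, b) != (start, "")]

    # Type 1: |alpha| <= |beta| except for S -> epsilon, and if S -> epsilon
    # is present then S never appears on a right-hand side.
    type1 = all(len(a) <= len(b) for a, b in others) and not (
        has_start_eps and any(start in b for _, b in productions))
    if not type1:
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a.isupper() for a, _ in productions):
        return 1

    # Type 3: right-hand side is a terminal, or a terminal then a nonterminal.
    def right_linear(b):
        return (len(b) == 1 and b.islower()) or (
            len(b) == 2 and b[0].islower() and b[1].isupper())

    return 3 if all(right_linear(b) for _, b in others) else 2


regular = [("S", "aA"), ("A", "b")]
print(grammar_type(regular))                   # a Type 3 grammar
print(grammar_type(regular + [("A", "Ba")]))   # no longer Type 3
print(grammar_type([("S", "aE"), ("aE", "EaE"), ("E", "a")]))  # not Type 2
print(grammar_type([("S", "Bb"), ("Bb", "b")]))  # violates condition (a)
```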
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; more sequences are separated by the vertical bar "|", indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The "::=" means that the symbol on the left must be replaced with the expression on the right.
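As a small illustration of these conventions, the Python sketch below reads a BNF specification and separates terminals from nonterminals using the rule stated above (a symbol that never appears on a left side is a terminal). The two-rule grammar and the function names parse_bnf and terminals are assumptions; the sketch does not handle a literal "|" used as a terminal.

```python
def parse_bnf(text):
    """Map each left-hand nonterminal to its list of alternatives."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # Alternatives are separated by "|"; symbols by whitespace.
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


def terminals(rules):
    """Symbols that never appear on a left side."""
    symbols = {s for alts in rules.values() for alt in alts for s in alt}
    return symbols - set(rules)


grammar = """
<expr> ::= <term> | <expr> + <term>
<term> ::= number | id
"""
rules = parse_bnf(grammar)
print(rules["<expr>"])
print(sorted(terminals(rules)))
```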
Formal Grammar-
In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages. If a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators convert texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
– In the analysis part, the source program is read and broken into a stream of strings, which yields an intermediate form of the source program.
– In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol =
– The identifier count
– The plus sign +
– The identifier rate
– The multiplication sign *
– The constant number 10
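The token list above can be reproduced with a small regular-expression scanner. This Python sketch is only illustrative: the token names and the TOKEN_SPEC table are assumptions, and it covers just the symbols that occur in the example statement.

```python
import re

# (token name, regular expression) pairs; order matters for alternation.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token name, lexeme) pairs, dropping blanks."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("total = count + rate * 10")
for token in tokens:
    print(token)
```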
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =  y  +  3  ;
    x3 =y+ 3 ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
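The mapping from lexemes to <token-name, attribute-value> pairs can be sketched in Python as below. The function name scan, the use of a plain list as the symbol table, and the choice to keep the number's text as its attribute are assumptions; as noted above, the right attribute for a number is a design decision.

```python
import re


def scan(source):
    """Return (tokens, symbol_table); identifiers become ('id', index)."""
    symbol_table = []                       # position i-1 holds identifier i
    tokens = []
    # No recursion is needed: each lexeme is matched by a flat pattern.
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))   # second component is ignored
    return tokens, symbol_table


tokens, table = scan("x3 =y+ 3 ;")
print(tokens)
print(table)
```

The differently spaced spellings of the statement produce the same token list, while "x 3 = y + 3;" does not, because x and 3 are scanned as separate lexemes.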
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --
Each "quad" has the form
    operation target source1 source2
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion itself and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
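Both rewrites can be expressed as a small pass over the quads. The sketch below is a simplification: the function name optimize and the tuple layout (operation, target, source1, source2) are illustrative, and the second rewrite assumes the temporary being eliminated is not used anywhere else.

```python
def optimize(quads):
    """Apply two simple rewrites to a list of (op, target, src1, src2) quads."""
    # 1. Fold inttoreal of an integer constant at compile time and
    #    substitute the resulting real constant into later quads.
    consts = {}
    folded = []
    for op, target, s1, s2 in quads:
        s1 = consts.get(s1, s1)
        s2 = consts.get(s2, s2)
        if op == "inttoreal" and s1.isdigit():
            consts[target] = f"{s1}.0"
            continue                        # the conversion quad disappears
        folded.append((op, target, s1, s2))

    # 2. Merge "op t a b" followed by "assign x t" into "op x a b"
    #    (assumes t has no other uses).
    out = []
    for quad in folded:
        op, target, s1, s2 = quad
        if op == "assign" and out and out[-1][1] == s1:
            prev = out.pop()
            out.append((prev[0], target, prev[2], prev[3]))
        else:
            out.append(quad)
    return out


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
optimized = optimize(quads)
for quad in optimized:
    print(quad)
```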
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
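The idea of the parser calling the scanner on demand, instead of reading a token file written by an earlier pass, can be illustrated with a Python generator. Everything here is a toy: the names scanner and parse and the log list are hypothetical, and the "parser" only records the tokens it receives. The point is the interleaving visible in the log.

```python
def scanner(text, log):
    """Produce one token per request; no intermediate token file is written."""
    for lexeme in text.split():
        log.append(f"scan {lexeme}")
        yield lexeme


def parse(text, log):
    """Stand-in for a parser: it pulls each token from the scanner when needed."""
    for token in scanner(text, log):
        log.append(f"parse {token}")


log = []
parse("x3 = y", log)
print(log)
```

Because the scanner runs only when the parser asks for the next token, the two phases share a single pass over the input.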
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and uses registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver calls Syntactic Analyzer
Syntactic Analyzer calls Contextual Analyzer
Syntactic Analyzer calls Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver calls Syntactic Analyzer (input: Source Text, output: AST)
Compiler Driver calls Contextual Analyzer (input: AST, output: Decorated AST)
Compiler Driver calls Code Generator (input: Decorated AST, output: Object Code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
A formal grammar is a precise description of a formal language, that is, of a set of strings. The two main categories of formal grammar are generative grammars, which are sets of rules for how strings in a language can be generated, and analytic grammars, which are sets of rules for how a string can be analyzed to determine whether it is a member of the language.
Rule of Inference
A rule of inference (also called an inference rule or transformation rule) is the act of drawing a conclusion based on the form of the premises. It can be interpreted as a function which takes premises, analyzes their syntax, and returns a conclusion.
Examples of Formal system-
Set theory, Boolean algebra, propositional and predicate calculus, Backus Normal Form, etc.
Grammar-
It is often convenient to specify languages in terms of grammars.
Lecture_no35 Formal specification
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism-
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2-
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this $L \subseteq T^{*}$.
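As a small check of Definition 2, the following sketch tests whether every string of a candidate language is a finite concatenation of symbols from the alphabet. The alphabet {a, b} and the sample language are made up for illustration, and the helper name in_t_star is hypothetical.

```python
def in_t_star(string, alphabet):
    """True if string is a finite concatenation of symbols from alphabet."""
    return all(symbol in alphabet for symbol in string)


T = {"a", "b"}
L = {"", "ab", "aabb", "aaabbb"}        # some strings of the form a^n b^n

is_language_over_T = all(in_t_star(w, T) for w in L)
print(is_language_over_T)
```

The empty string belongs to every such set of concatenations, which is why it may appear in L.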
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components in an English sentence.
Production Rule-
A rule of the form s -> α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal symbol-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written $\mu \to_G \gamma$ (a single-step derivation), if and only if
(1) $\mu = \sigma \alpha \tau$
(2) $\gamma = \sigma \beta \tau$
(3) $\alpha \to \beta \in P$ of G
where σ and τ represent arbitrary strings.
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ.
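The single-step derivation defined above can be tried out directly. In this sketch a production is a pair (alpha, beta) of Python strings with alpha nonempty, and the function enumerates every way of writing the input string as sigma + alpha + tau. The function name derive_once and the two sample productions are assumptions for illustration only.

```python
def derive_once(mu, productions):
    """Return every string obtainable from mu in one derivation step."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            sigma = mu[:start]                   # arbitrary left context
            tau = mu[start + len(alpha):]        # arbitrary right context
            results.add(sigma + beta + tau)      # mu = sigma alpha tau -> sigma beta tau
            start = mu.find(alpha, start + 1)
    return results


# Hypothetical productions: S -> aSb and S -> ab.
P = [("S", "aSb"), ("S", "ab")]
print(sorted(derive_once("aSb", P)))  # ['aaSbb', 'aabb']
```

Here aaSbb is a sentential form that still contains a nonterminal, while aabb consists of terminals only.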
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> that satisfies the following two conditions.
a. Each production rule $\alpha \to \beta$ in P satisfies $|\alpha| \le |\beta|$ if it is not of the form $S \to \epsilon$.
b. If $S \to \epsilon$ is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of the earlier example is not a Type 1 grammar, because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form $Bb \to b$ will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule $\alpha \to \beta$ satisfies $|\alpha| = 1$, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of the earlier example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form $aE \to EaE$ to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules $\alpha \to \beta$ which is not of the form $S \to \epsilon$ satisfies one of the following conditions.
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form $A \to Ba$, or of the form $B \to bb$, to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
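The conditions for Type 1, Type 2, and Type 3 grammars can be checked mechanically. The sketch below assumes a simple encoding that is not part of the notes: each symbol is one character, uppercase letters are nonterminals, lowercase letters are terminals, the empty string stands for ε, and a production is a pair (alpha, beta). The function name grammar_type is hypothetical.

```python
def grammar_type(productions, start="S"):
    """Return the most restrictive type (3, 2, 1 or 0) the rules satisfy."""
    eps_rule = (start, "") in productions
    others = [(a, b) for a, b in productions if (a, b) != (start, "")]

    # Type 1, condition (a): |alpha| <= |beta| unless the rule is S -> epsilon.
    cond_a = all(len(a) <= len(b) for a, b in others)
    # Type 1, condition (b): if S -> epsilon exists, S is on no right-hand side.
    cond_b = not eps_rule or all(start not in b for _, b in productions)
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a.isupper() for a, _ in others):
        return 1

    # Type 3: right-hand side is a terminal, or a terminal followed by a nonterminal.
    def regular(b):
        return (len(b) == 1 and b.islower()) or (
            len(b) == 2 and b[0].islower() and b[1].isupper())

    return 3 if all(regular(b) for _, b in others) else 2


print(grammar_type([("S", "aA"), ("A", "b")]))    # 3
print(grammar_type([("S", "aSb"), ("S", "ab")]))  # 2
print(grammar_type([("aE", "EaE")]))              # 1
print(grammar_type([("Bb", "b")]))                # 0
```

The last two calls mirror the rules mentioned in the text: adding aE → EaE breaks the Type 2 condition, and adding Bb → b breaks condition (a) of Type 1.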
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^{*}$. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form $xAy \to xvy$, where $A \in V$ and $x, v, y \in (V \cup T)^{*}$.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by $A \to v$, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar, |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
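To make the notation concrete, here is a minimal sketch that reads a tiny BNF specification into a dictionary and separates terminals from nonterminals using the rule stated above (a symbol is a nonterminal exactly when it appears on a left side). The two-rule grammar and the function name parse_bnf are invented for illustration, and the sketch assumes one rule per line, symbols separated by blanks, and no literal | among the terminals.

```python
def parse_bnf(spec):
    """Map each nonterminal to its list of alternatives (lists of symbols)."""
    rules = {}
    for line in spec.strip().splitlines():
        left, right = line.split("::=")
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules


spec = """
<expr> ::= <term> | <expr> + <term>
<term> ::= id | number
"""

rules = parse_bnf(spec)
nonterminals = set(rules)                       # symbols that appear on a left side
terminals = {symbol
             for alternatives in rules.values()
             for alternative in alternatives
             for symbol in alternative} - nonterminals
print(sorted(nonterminals), sorted(terminals))
```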
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings, such as computer languages, if a canonic system can be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, in the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read and broken into a stream of strings, producing an intermediate form of the source language. In the synthesis part, this intermediate form is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
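The token list above can be reproduced with a few regular expressions. This sketch assumes the example statement is total = count + rate * 10; the category names and the tokenize function are illustrative and not taken from any particular compiler.

```python
import re

# One named group per token category; "skip" matches blanks, which are discarded.
TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number",     r"\d+"),
    ("assign",     r"="),
    ("plus",       r"\+"),
    ("times",      r"\*"),
    ("skip",       r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def tokenize(source):
    """Return (category, lexeme) pairs; unrecognized characters are ignored here."""
    tokens = []
    for match in PATTERN.finditer(source):
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
    return tokens


tokens = tokenize("total = count + rate * 10")
for token in tokens:
    print(token)
```

A production scanner would report an error for characters that match no category instead of silently skipping them.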
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
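The mapping from lexemes to <token-name, attribute-value> pairs, with identifiers entered into a symbol table, might be sketched as follows. The sketch assumes the statement ends with a semicolon as in C, uses the lexeme itself as the attribute of a number (one of the options discussed above), and the function name scan is hypothetical.

```python
import re


def scan(text):
    """Return (tokens, symtab); identifiers become ("id", index into symtab)."""
    symtab = []                                 # entry i is stored at position i - 1
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", text):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symtab:
                symtab.append(lexeme)
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))       # second component is ignored
    return tokens, symtab


tokens, symtab = scan("x3 = y + 3 ;")
print(tokens)
print(symtab)

# Different spacing gives the same tokens; splitting x3 into "x 3" does not.
same = scan("x3   =y+ 3  ;")[0] == tokens
different = scan("x 3 = y + 3 ;")[0] != tokens
```

No recursion is needed anywhere in the scanner, in line with the remark above.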
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
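A hand-written parser for the two rules above could look like the sketch below. It is not the method of any particular compiler: the left-recursive (and ambiguous) rule for expr is replaced by a loop that reads an operand followed by any number of "+ operand" pairs, which yields a left-associative tree, and the statement is assumed to end with a semicolon. The token format and the name parse_assignment are assumptions for illustration.

```python
def parse_assignment(tokens):
    """Parse "id = expr ;" and return a syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        token = peek()
        if token is None or token[0] != kind:
            raise SyntaxError(f"expected {kind}, found {token}")
        pos += 1
        return token

    def operand():                      # expr -> number | id
        token = peek()
        if token is not None and token[0] in ("id", "number"):
            return expect(token[0])
        raise SyntaxError(f"expected operand, found {token}")

    def expr():                         # expr -> expr + expr, written as a loop
        tree = operand()
        while peek() is not None and peek()[0] == "+":
            expect("+")
            tree = ("+", tree, operand())
        return tree

    target = expect("id")               # asst-stmt -> id = expr ;
    expect("=")
    source = expr()
    expect(";")
    if peek() is not None:
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", target, source)


tokens = [("id", "x3"), ("=", None), ("id", "y"),
          ("+", None), ("number", "3"), (";", None)]
tree = parse_assignment(tokens)
print(tree)
```

The result has the operators = and + as interior nodes and the operands as their children, which is the shape of a syntax tree rather than a full parse tree.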
(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler
Compiler Driver
calls calls calls
Syntactic Analyzer Contextual Analyzer Code GeneratorI
I O I O O
Source Text AST Decorated AST object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers
Compiling involves performing lots of work and early computers did not have enough memory to contain one program
that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or
some representation of it) performing some of the required analysis and translations
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one
pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be
compiled in a single pass (eg Pascal)
Lecture_no47 Surprise Test
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79
SP NOTES
THANK YOU
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80
Formal specifications are mathematically based techniques whose purpose is to help with the implementation of systems and software. They are used to describe a system, to analyze its behavior, and to aid in its design by verifying key properties of interest through rigorous and effective reasoning tools.
Approaching a formalism
DEFINITION 1
-> An alphabet T is a finite set of symbols (terminal symbols).
-> A formula (also called a string or a sentence) is a concatenation of symbols.
DEFINITION 2
A language L is a subset of the set of finite concatenations of symbols in an alphabet T.
We write this as $L \subseteq T^*$.
Lecture_no36 Development of formal specification
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would contain nonterminals that correspond to the structural components of an English sentence.
Production rule
A rule of the form s → α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Nonterminal symbols
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal symbols
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name "terminal").
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The derivation of sentences
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written $\mu \to_G \gamma$ (a single-step derivation), if and only if
(1) μ = σατ, (2) γ = σβτ, and (3) α → β ∈ P of G,
where σ and τ represent arbitrary strings.
Sentential forms and sentences
DEFINITION
A sentential form is any string which can be derived from the starting symbol Σ.
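The two definitions above can be made concrete with a small sketch. It assumes that every symbol is a single character and uses a hypothetical grammar S → aSb | ab with S as the starting symbol; the function names are illustrative and not part of the notes.

```python
def derive_one_step(mu, productions):
    """Return every string gamma that is immediately derived from mu."""
    results = set()
    for alpha, beta in productions:
        start = mu.find(alpha)
        while start != -1:
            # mu = sigma + alpha + tau  and  gamma = sigma + beta + tau
            sigma, tau = mu[:start], mu[start + len(alpha):]
            results.add(sigma + beta + tau)
            start = mu.find(alpha, start + 1)
    return results


def sentential_forms(start_symbol, productions, max_steps):
    """Collect all strings derivable from start_symbol in at most max_steps steps."""
    forms = {start_symbol}
    frontier = {start_symbol}
    for _ in range(max_steps):
        frontier = {g for m in frontier for g in derive_one_step(m, productions)}
        forms |= frontier
    return forms


# Hypothetical grammar: S -> aSb | ab
PRODUCTIONS = [("S", "aSb"), ("S", "ab")]
```

Here `derive_one_step("aSb", PRODUCTIONS)` applies condition (3) at every position where a left-hand side occurs, and `sentential_forms` repeats that step to enumerate sentential forms up to a bounded derivation length.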
• Chomsky hierarchy
Hierarchy of grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (nonterminals N, terminal alphabet Σ, production rules P, start symbol S) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones, where E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones, where E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
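The conditions for Types 1, 2, and 3 can be checked mechanically. The following is a minimal sketch, assuming single-character symbols, uppercase letters for nonterminals, lowercase letters for terminals, and the empty string for ε; the function name and the sample grammars are hypothetical.

```python
def grammar_type(productions, start="S"):
    """Return the most restrictive type (0-3) whose conditions the grammar meets."""
    epsilon_rule = (start, "")
    others = [(a, b) for a, b in productions if (a, b) != epsilon_rule]
    start_on_rhs = any(start in b for _, b in productions)

    # Type 1: condition (a) |alpha| <= |beta|, and condition (b) on S -> epsilon.
    cond_a = all(len(a) <= len(b) for a, b in others)
    cond_b = not (epsilon_rule in productions and start_on_rhs)
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a.isupper() for a, _ in productions):
        return 1

    # Type 3: right-hand side is a terminal, or a terminal followed by a nonterminal.
    def right_linear(b):
        return (len(b) == 1 and b.islower()) or (
            len(b) == 2 and b[0].islower() and b[1].isupper()
        )

    return 3 if all(right_linear(b) for _, b in others) else 2
```

For instance, adding a rule such as Bb → b to a Type 1 grammar makes `grammar_type` return 0, because |Bb| > |b| violates condition (a).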
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar vs context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear-bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
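As an illustration of the notation, the sketch below reads a small BNF specification into a dictionary and then separates terminals from nonterminals using the rule that a symbol never appearing on a left side is a terminal. The sample grammar and the function names are hypothetical, and the sketch assumes one rule per line with symbols separated by spaces.

```python
def parse_bnf(text):
    """Map each left-side symbol to its list of alternatives (lists of symbols)."""
    grammar = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        left, right = line.split("::=", 1)
        # The vertical bar separates alternative sequences of symbols.
        grammar[left.strip()] = [alt.split() for alt in right.split("|")]
    return grammar


def terminals(grammar):
    """Symbols that never appear on a left side are terminals."""
    used = {sym for alts in grammar.values() for alt in alts for sym in alt}
    return used - set(grammar)


SAMPLE_BNF = """
<assignment> ::= <identifier> = <expression>
<expression> ::= <term> | <expression> + <term>
<term> ::= <identifier> | <number>
<identifier> ::= x | y
<number> ::= 0 | 1
"""
```

Calling `terminals(parse_bnf(SAMPLE_BNF))` returns the six symbols =, +, x, y, 0, and 1, none of which occurs on the left of a rule.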
Formal grammar
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. A canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the program is built. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis part
• The analysis part is carried out in three sub-parts:
- Lexical analysis: the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax analysis: the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic analysis: the meaning of the source string is determined.
Properties of a compiler
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler
- Lexical analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase, this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol
- The identifier count
- The plus sign
- The identifier rate
- The multiplication sign
- The constant number 10
- Syntax analysis
• It is also called parsing.
• In this phase, the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =   y   +   3   ;
    x3   =y+ 3  ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
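The mapping from lexemes to tokens described above can be sketched with a regular expression, which reflects the remark that no recursion is needed. This is only an illustrative scanner: the token shapes, the 1-based symbol-table index, and the choice to keep the number's text as its attribute are assumptions, not the notes' prescribed design.

```python
import re

# One alternative per token class; leading blanks are skipped as non-significant.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+(?:\.\d+)?)|(?P<symbol>[=+*;]))"
)


def scan(source):
    """Return (tokens, symbol_table) for a source string."""
    symbol_table = []  # entry i (1-based) holds the name of identifier i
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = match.end()
        if match.group("id"):
            name = match.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            tokens.append(("id", symbol_table.index(name) + 1))
        elif match.group("number"):
            tokens.append(("number", match.group("number")))
        else:
            # Symbols carry no attribute value.
            tokens.append((match.group("symbol"), None))
    return tokens, symbol_table
```

All three spacings of the example statement yield the same token stream, while `x 3 = y + 3;` does not, because the blank splits x3 into an identifier and a number.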
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
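A small recursive-descent parser can build the syntax tree for the example. This sketch makes two assumptions that are not in the notes: tokens arrive as (kind, value) pairs with identifier names as values, and the left-recursive rule for expr is handled by a loop that treats + as left-associative. Trees are nested tuples with the operator first.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return the syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind!r} at token {pos}")
        value = tokens[pos][1]
        pos += 1
        return value

    def operand():
        if peek() == "id":
            return ("id", expect("id"))
        return ("number", expect("number"))

    def expr():
        tree = operand()
        while peek() == "+":
            expect("+")
            # The operator becomes an interior node; operands are its children.
            tree = ("+", tree, operand())
        return tree

    target = ("id", expect("id"))
    expect("=")
    value = expr()
    expect(";")  # in C the semicolon terminates the statement
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", target, value)
```

For the token sequence of `x3 = y + 3;` the result has = at the root, x3 as its left child, and a + node over y and 3 as its right child.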
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two sources and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than three operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --
Each "quad" has the form
    operation target source1 source2
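The walk that produces these quads can be sketched as follows. The tree shape (nested tuples with the operator first), the `types` dictionary, and the rule that a sum is real if either operand is real are assumptions made for illustration; `None` stands for the unused "--" field.

```python
import itertools


def generate_quads(tree, types):
    """Walk an assignment tree and emit (operation, target, source1, source2) quads."""
    quads = []
    counter = itertools.count(1)

    def new_temp():
        return f"temp{next(counter)}"

    def convert(place, from_type, to_type):
        # Insert a unary conversion only when the types differ.
        if from_type == to_type:
            return place
        temp = new_temp()
        quads.append((f"{from_type}to{to_type}", temp, place, None))
        return temp

    def walk(node):
        kind = node[0]
        if kind == "id":
            return node[1], types[node[1]]
        if kind == "number":
            return node[1], "real" if "." in node[1] else "int"
        if kind == "+":
            left, left_type = walk(node[1])
            right, right_type = walk(node[2])
            result_type = "real" if "real" in (left_type, right_type) else "int"
            left = convert(left, left_type, result_type)
            right = convert(right, right_type, result_type)
            temp = new_temp()
            quads.append(("add", temp, left, right))
            return temp, result_type
        raise ValueError(f"unknown node kind {kind!r}")

    _, target, value = tree
    place, value_type = walk(value)
    place = convert(place, value_type, types[target[1]])
    quads.append(("assign", target[1], place, None))
    return quads
```

With id1 declared as int and id2 as real, the tree for id1 = id2 + 3 yields the four quads listed above, in the same order.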
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
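Both optimizations can be expressed as simple passes over the quad list. The sketch below is a toy: it assumes quads are (operation, target, source1, source2) tuples, that temporaries are named temp1, temp2, and so on, and that each temporary is used only once, so it is safe to merge an assignment into the quad that computed its source.

```python
def optimize(quads):
    """Fold int-to-real conversions of constants, then merge trailing assignments."""
    # Pass 1: evaluate inttoreal on integer constants at compile time.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and isinstance(src1, str) and src1.isdigit():
            constants[target] = f"{src1}.0"  # remember the value, drop the quad
            continue
        folded.append((op, target, src1, src2))

    # Pass 2: "op temp ..." followed by "assign dst temp" becomes "op dst ...".
    result = []
    for quad in folded:
        op, target, src1, src2 = quad
        if (op == "assign" and result and result[-1][1] == src1
                and src1.startswith("temp")):
            prev_op, _, prev_src1, prev_src2 = result.pop()
            result.append((prev_op, target, prev_src1, prev_src2))
        else:
            result.append(quad)
    return result
```

Applied to the four quads of the running example, this leaves exactly the two quads shown in items 1 and 2.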
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, 3.0    (add float)
    RTOI R2, R1         (real to int)
    ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
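The idea of the parser calling the scanner on demand, and emitting intermediate code as it goes, can be sketched with a Python generator. This toy handles only sums of operands separated by blanks (for example "a + b + c"); the function names and the quad format are assumptions for illustration.

```python
def scanner(source):
    """Yield lexemes one at a time; no token file is written between phases."""
    for lexeme in source.split():
        yield lexeme


def compile_sum(source):
    """Parse operand (+ operand)* and emit add quads in a single pass."""
    tokens = scanner(source)
    code = []
    result = next(tokens)  # the parser asks the scanner for the first operand
    temp_count = 0
    for operator in tokens:
        if operator != "+":
            raise SyntaxError(f"expected '+', found {operator!r}")
        try:
            operand = next(tokens)
        except StopIteration:
            raise SyntaxError("missing operand after '+'") from None
        # Code is generated as soon as a complete phrase has been recognized.
        temp_count += 1
        temp = f"temp{temp_count}"
        code.append(("add", temp, result, operand))
        result = temp
    return code
```

Because the scanner is a generator, each lexeme is produced only when the parser requests it, so the input is traversed once.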
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
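Ambiguity can be observed directly with the rule expr → number | id | expr + expr used earlier in these notes. The sketch below enumerates every way of grouping a list of operands under that rule; each distinct grouping corresponds to a distinct parse tree. The function name and the tuple representation are hypothetical.

```python
def parse_trees(operands):
    """Return all groupings of operands under  expr -> expr + expr."""
    if len(operands) == 1:
        return [operands[0]]
    trees = []
    # Choose which '+' is applied last, then group each side recursively.
    for split in range(1, len(operands)):
        for left in parse_trees(operands[:split]):
            for right in parse_trees(operands[split:]):
                trees.append(("+", left, right))
    return trees
```

For the statement a + b + c this yields two trees, one grouping as a + (b + c) and one as (a + b) + c, so the grammar is ambiguous.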
Lecture_no 46 Passes of a compiler
Compiler passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single-pass compiler
A single-pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical single-pass compiler:
    Compiler Driver
        calls
    Syntactic Analyzer
        calls                    calls
    Contextual Analyzer      Code Generator
(2) Multi-pass compiler
A multi-pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi-pass compiler:
    Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer, and the Code Generator in turn.
    Source Text -> [Syntactic Analyzer] -> AST -> [Contextual Analyzer] -> Decorated AST -> [Code Generator] -> object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit, because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79
SP NOTES
THANK YOU
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80
To construct a grammar G3 to describe English sentences, one might let the alphabet T comprise all English words rather than letters. V would then contain the nonterminals that correspond to the structural components of an English sentence.
Production Rule-
A rule of the form s → α, where s is a symbol and α is a sequence of symbols, is called a production rule.
Non-terminal Symbols-
Nonterminal symbols are those symbols which can be replaced. They may also be called simply syntactic variables. A formal grammar includes a start symbol, a designated member of the set of nonterminals from which all the strings in the language may be derived by successive applications of the production rules.
Terminal Symbols-
Terminal symbols are literal symbols which may appear in the inputs to or outputs from the production rules of a formal grammar and which cannot be changed using the rules of the grammar (this is the reason for the name terminal).
Lecture_no37 Formal grammars derivation of sentences
Formal Definition
• The Derivation of Sentences-
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ →_G γ (a single-step derivation), if and only if
(1) μ = σ α τ
(2) γ = σ β τ
(3) α → β ∈ P of G
where σ and τ represent arbitrary strings.
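The definition of a single-step derivation can be sketched in Python as follows. This is only an illustration: the function name derive_one_step and the toy grammar S → a S b | a b are assumptions made for the example, not part of the notes. Strings are represented as tuples of symbols, and each production α → β as a pair (alpha, beta).

```python
def derive_one_step(mu, productions):
    """Return every string gamma that is immediately derived from mu.

    mu is a tuple of symbols; productions is a list of (alpha, beta)
    pairs, each a tuple of symbols, standing for the rule alpha -> beta.
    """
    results = set()
    for alpha, beta in productions:
        n = len(alpha)
        for i in range(len(mu) - n + 1):
            if mu[i:i + n] == alpha:
                # mu = sigma alpha tau  and  gamma = sigma beta tau
                sigma, tau = mu[:i], mu[i + n:]
                results.add(sigma + beta + tau)
    return results


# Hypothetical toy grammar: S -> a S b | a b
P = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
step1 = derive_one_step(("S",), P)
step2 = derive_one_step(("a", "S", "b"), P)
```

Applying the function repeatedly, starting from the start symbol, enumerates the sentential forms of the grammar; the forms that contain only terminals are its sentences.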
Sentential forms and sentences
DEFINITION-
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that contains only terminal symbols.
• Chomsky Hierarchy-
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (N is the set of nonterminals, Σ the terminal alphabet, P the set of production rules and S the start symbol) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of the earlier example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with new ones, in which E is assumed to be a new nonterminal symbol. [Production rules not reproduced.]
Adding a production rule of the form Bb → b to the modified grammar will result in a non-Type 1 grammar because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of the earlier example is not a Type 1 grammar and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with new ones, in which E is assumed to be a new nonterminal symbol. [Production rules not reproduced.]
Adding a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β which is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar. [Production rules not reproduced.]
Adding a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
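The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The sketch below is an illustration only: the function name grammar_type, the tuple representation of rules and the sample rules S → aA, A → bB, B → b are assumptions for the example (the rules of Example 1.2.17 are not available here). The empty tuple () stands for ε.

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return the most restrictive type (3, 2, 1 or 0) the rules satisfy.

    productions: list of (alpha, beta) tuples of symbols; () stands for
    the empty string.
    """
    def is_start_epsilon(alpha, beta):
        return alpha == (start,) and beta == ()

    has_start_eps = any(is_start_epsilon(a, b) for a, b in productions)
    start_on_rhs = any(start in b for _, b in productions)

    # Type 1, condition (a): |alpha| <= |beta| unless the rule is S -> epsilon.
    cond_a = all(len(a) <= len(b) or is_start_epsilon(a, b)
                 for a, b in productions)
    # Type 1, condition (b): if S -> epsilon is present, S is on no right side.
    cond_b = not (has_start_eps and start_on_rhs)
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a[0] in nonterminals for a, _ in productions):
        return 1

    # Type 3: beta is a terminal, or a terminal followed by a nonterminal.
    def right_linear(b):
        if len(b) == 1:
            return b[0] not in nonterminals
        return (len(b) == 2 and b[0] not in nonterminals
                and b[1] in nonterminals)

    if all(is_start_epsilon(a, b) or right_linear(b) for a, b in productions):
        return 3
    return 2


# Hypothetical Type 3 rules: S -> aA, A -> bB, B -> b
N = {"S", "A", "B", "E"}
base = [(("S",), ("a", "A")), (("A",), ("b", "B")), (("B",), ("b",))]
```

With these sample rules, adding A → Ba or B → bb drops the result to 2, adding aE → EaE drops it to 1, and adding Bb → b drops it to 0, in line with the remarks in the examples above.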
The figure below illustrates the hierarchy of the different types of grammars.
[Figure: Hierarchy of grammars]
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
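As a small illustration of the link between context-free languages and pushdown automata, the sketch below recognizes the language a^n b^n (n ≥ 1), which is generated by the context-free grammar S → aSb | ab. The grammar and the function name accepts_anbn are assumptions chosen for the example; the recognizer is a deterministic one that uses an explicit stack.

```python
def accepts_anbn(text):
    """Recognize a^n b^n (n >= 1) with an explicit stack."""
    stack = []
    seen_b = False
    for ch in text:
        if ch == "a" and not seen_b:
            stack.append("A")      # push one marker per leading 'a'
        elif ch == "b" and stack:
            seen_b = True
            stack.pop()            # match each 'b' against one marker
        else:
            return False           # wrong symbol, 'a' after 'b', or extra 'b'
    return seen_b and not stack    # every 'a' matched, at least one pair
```

A finite automaton cannot do this job because it has no way to count an unbounded number of a's; the stack supplies exactly that memory.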
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
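To make the notation concrete, the sketch below reads a few BNF rules into a Python dictionary and separates terminals from nonterminals using the rule stated above (a symbol is a nonterminal exactly when it appears on a left side). The function name parse_bnf and the small binary-number grammar are assumptions made for the example, and the sketch assumes symbols are separated by spaces and that | is never itself a terminal.

```python
def parse_bnf(text):
    """Parse lines of the form '<symbol> ::= alt1 | alt2' into a dict
    mapping each left-hand symbol to a list of alternatives."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        alternatives = [alt.split() for alt in right.split("|")]
        rules.setdefault(left.strip(), []).extend(alternatives)
    return rules


# Hypothetical grammar for sums of binary numbers
GRAMMAR = """
<expr> ::= <term> | <expr> + <term>
<term> ::= <digit> | <term> <digit>
<digit> ::= 0 | 1
"""

rules = parse_bnf(GRAMMAR)
nonterminals = set(rules)
terminals = {sym for alts in rules.values()
             for alt in alts for sym in alt} - nonterminals
```

Here rules["<digit>"] holds the two alternatives 0 and 1, and terminals contains exactly the symbols that never occur on a left side.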
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken, such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. constructing syntax-directed translators, proof-checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems are very useful for explicitly and concisely defining sets of strings, such as a computer language. A canonic system can also be used as a basis for recognizing strings from the defined sets.
The recognition algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, by a canonic system in which all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler ---------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read, broken into a stream of strings and turned into an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of a Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol =
– The identifier count
– The plus sign +
– The identifier rate
– The multiplication sign *
– The constant number 10
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end and the back end.
The front end analyzes the source program, determines its constituent parts and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis (or scanning), syntax analysis (or parsing) and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3 and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers and the various symbols and punctuation without using recursion (compare with parsing below).
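The mapping from lexemes to tokens described above can be sketched with regular expressions, which reflects the remark that no recursion is needed for scanning. The names TOKEN_SPEC and scan are assumptions for this illustration, as is the choice to store the attribute of a number as its text; identifiers receive the 1-based index of their symbol-table entry.

```python
import re

# Token classes, tried in this order; "skip" covers non-significant blanks.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*;]"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbol_table) for one source string."""
    symbol_table = []
    tokens = []
    pos = 0
    while pos < len(source):
        m = PATTERN.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind, lexeme = m.lastgroup, m.group()
        if kind == "skip":
            continue                      # blanks produce no token
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))  # second component is ignored
    return tokens, symbol_table


tokens, table = scan("x3 = y + 3;")
```

The three differently spaced spellings of the statement yield the same token list, while x 3 = y + 3; yields a different one because x and 3 become separate lexemes.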
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a tree. [Figure: parse tree for x3 = y + 3;]
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the parse tree.
The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. [Figure: syntax tree corresponding to the parse tree above]
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
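A syntax tree for the assignment above can be built by a small hand-written parser. The sketch below is one possible approach, not the method of any particular compiler: because the rule expr → expr + expr is left-recursive and ambiguous, the sketch replaces it with a loop and assumes that + associates to the left. The function name parse_assignment and the nested-tuple tree format are assumptions for the example, and tokens are taken to be plain strings.

```python
def parse_assignment(tokens):
    """Parse `id = expr ;` and return a syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok=None):
        nonlocal pos
        current = peek()
        if current is None or (tok is not None and current != tok):
            raise SyntaxError(f"expected {tok!r}, got {current!r}")
        pos += 1
        return current

    def operand():
        tok = expect()
        if tok.isdigit():
            return ("number", tok)
        if tok.isidentifier():
            return ("id", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr():
        # expr -> operand { + operand }, taken as left-associative
        node = operand()
        while peek() == "+":
            expect("+")
            node = ("+", node, operand())
        return node

    target = operand()
    if target[0] != "id":
        raise SyntaxError("left-hand side must be an identifier")
    expect("=")
    tree = ("=", target, expr())
    expect(";")
    if peek() is not None:
        raise SyntaxError("trailing tokens")
    return tree


tree = parse_assignment("x3 = y + 3 ;".split())
```

The resulting tree has the operators = and + as interior nodes and the operands as their children, which is the shape of the syntax tree described in the text.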
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators "int to real" and "real to int" into the tree.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
    inttoreal  temp1  3      --
    add        temp2  id2    temp1
    realtoint  temp3  temp2  --
    assign     id1    temp3  --
Each "quad" has the form
    operation  target  source1  source2
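Walking the semantic tree to produce quads can be sketched as a post-order traversal that allocates a fresh temporary for each interior node. The function name gen_quads and the tuple encoding of the tree are assumptions for this illustration; None plays the role of the unused "--" field.

```python
from itertools import count


def gen_quads(tree):
    """Walk a tree and emit (operation, target, source1, source2) quads."""
    quads = []
    temps = count(1)

    def walk(node):
        if isinstance(node, str):          # leaf: identifier or constant
            return node
        op, *children = node
        sources = [walk(child) for child in children]
        target = f"temp{next(temps)}"      # temporary for this node's result
        sources += [None] * (2 - len(sources))   # pad unary operations
        quads.append((op, target, sources[0], sources[1]))
        return target

    op, target, value = tree               # root is ("assign", target, expr)
    quads.append((op, target, walk(value), None))
    return quads


# Semantic tree for x3 = y + 3 with the conversions already inserted
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
quads = gen_quads(tree)
```

For this tree the function yields the same four quads as listed above, in the same order, because children are visited before their parent.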
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
    add  temp2  id2  3.0
2. The last two quads can be combined into
    realtoint  id1  temp2
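The two optimizations just described can be sketched as passes over a list of quads. This is a simplified illustration: the function name optimize is hypothetical, and the second step assumes that each temporary is used only once, which holds for the example but would need checking in general.

```python
def optimize(quads):
    """Apply constant conversion folding and assignment merging to quads."""
    # 1. Perform inttoreal on integer constants at compile time.
    constants = {}
    folded = []
    for op, target, s1, s2 in quads:
        s1, s2 = constants.get(s1, s1), constants.get(s2, s2)
        if op == "inttoreal" and isinstance(s1, str) and s1.isdigit():
            constants[target] = s1 + ".0"   # remember the converted constant
            continue                        # the quad itself disappears
        folded.append((op, target, s1, s2))

    # 2. Merge "op tempN ..." followed by "assign x tempN" into "op x ...".
    result = []
    for quad in folded:
        op, target, s1, s2 = quad
        if (op == "assign" and result and result[-1][1] == s1
                and s1.startswith("temp")):
            prev = result.pop()
            result.append((prev[0], target, prev[2], prev[3]))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
optimized = optimize(quads)
```

On the four quads of the running example this leaves two: the add with the constant 3.0 and the realtoint that writes directly into id1.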
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD    R1, id2
    ADDF  R1, R1, 3.0    // add float
    RTOI  R2, R1         // real to int
    ST    id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
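A symbol table holding type information and a storage location for each variable can be sketched as a small class. The class name SymbolTable, its methods and the byte sizes used below are assumptions for this illustration. The offset recorded for each entry is the position the variable would occupy in the compiled program's data area, not an address inside the compiler.

```python
class SymbolTable:
    """Maps variable names to their index, type and storage offset."""

    def __init__(self):
        self._entries = {}
        self._next_offset = 0

    def declare(self, name, type_name, size):
        if name in self._entries:
            raise KeyError(f"{name} already declared")
        entry = {
            "index": len(self._entries) + 1,   # the 1 in a token like <id,1>
            "type": type_name,
            "offset": self._next_offset,       # location in the compiled program
        }
        self._entries[name] = entry
        self._next_offset += size
        return entry

    def lookup(self, name):
        return self._entries[name]


table = SymbolTable()
table.declare("x3", "int", 4)     # sizes are assumed for the example
table.declare("y", "real", 8)
```

Later phases can then call table.lookup("y") to obtain the type for semantic checks and the offset for code generation.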
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser and semantic analyzer into one pass. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code instruction and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar is ambiguous if there is more than one possible parse tree for a given statement.
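Ambiguity can be made concrete by counting parse trees. The sketch below counts how many distinct parse trees the grammar expr → expr + expr | id allows for a token sequence; this grammar is a simplified variant of the expression rule used earlier and, like the function name count_parse_trees, is an assumption for the example.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for tokens under expr -> expr + expr | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from expr.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":           # try each + as the root operator
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


two_operands = count_parse_trees("id + id".split())
three_operands = count_parse_trees("id + id + id".split())
```

With two operands there is a single tree, but with three operands there are two, one grouping the first addition and one grouping the second, so the grammar is ambiguous.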
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver
      calls
    Syntactic Analyzer
      calls                  calls
    Contextual Analyzer      Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver
      calls Syntactic Analyzer     (input: Source Text, output: AST)
      calls Contextual Analyzer    (input: AST, output: Decorated AST)
      calls Code Generator         (input: Decorated AST, output: Object Code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
SP NOTES
Formal Definition
• The derivation of sentences
A grammar is a finite object that can describe an infinite language.
DEFINITION
A string γ is immediately derived from a string μ in a grammar G, written μ →G γ (a single-step derivation), if and only if
(1) μ = σ α τ, (2) γ = σ β τ, and (3) α → β ∈ P of G,
where σ and τ represent arbitrary strings.
Sentential forms and sentences
DEFINITION
A sentential form is any string which can be derived from the starting symbol Σ. A sentence is a sentential form that consists only of terminal symbols.
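The single-step derivation defined above can be sketched in a few lines of Python. This is only an illustration: the function name `derive_once`, the tuple encoding of strings, and the two-rule grammar for a^n b^n are assumptions made for the example, not part of the notes.

```python
def derive_once(form, productions):
    """Return every string gamma that is immediately derived from form.

    form is a tuple of symbols; productions is a list of (alpha, beta)
    pairs, each side itself a tuple of symbols.
    """
    results = set()
    for alpha, beta in productions:
        n = len(alpha)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == alpha:
                # form = sigma alpha tau  ->  gamma = sigma beta tau
                sigma, tau = form[:i], form[i + n:]
                results.add(sigma + beta + tau)
    return results


# Hypothetical grammar for a^n b^n (n >= 1): S -> a S b | a b
P = [(("S",), ("a", "S", "b")), (("S",), ("a", "b"))]
step1 = derive_once(("S",), P)
step2 = derive_once(("a", "S", "b"), P)
```

Here `step1` holds the two sentential forms reachable from S in one step, and `step2` holds those reachable from a S b. The form a a b b contains no nonterminal, so it is a sentence.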
• Chomsky Hierarchy
Hierarchy of Grammars
The following classes of grammars are obtained by gradually increasing the restrictions that the production rules have to obey.
A Type 1 grammar is a Type 0 grammar <N, Σ, P, S> (nonterminals N, terminal alphabet Σ, production rules P, start symbol S) that satisfies the following two conditions:
a. Each production rule α → β in P satisfies |α| ≤ |β| if it is not of the form S → ε.
b. If S → ε is in P, then S does not appear in the right-hand side of any production rule.
A language is said to be a Type 1 language if there exists a Type 1 grammar that generates the language.
Example 1.2.15 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b). The grammar can be modified to be of Type 1 by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition to the modified grammar of a production rule of the form Bb → b will result in a non-Type 1 grammar, because of a violation of condition (a).
A Type 2 grammar is a Type 1 grammar in which each production rule α → β satisfies |α| = 1, that is, α is a nonterminal symbol. A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language.
Example 1.2.16 The grammar of Example is not a Type 1 grammar, and therefore also not a Type 2 grammar. The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones. E is assumed to be a new nonterminal symbol.
An addition of a production rule of the form aE → EaE to the grammar will result in a non-Type 2 grammar.
A Type 3 grammar is a Type 2 grammar <N, Σ, P, S> in which each of the production rules α → β that is not of the form S → ε satisfies one of the following conditions:
a. β is a terminal symbol.
b. β is a terminal symbol followed by a nonterminal symbol.
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language.
Example 1.2.17 The grammar <N, Σ, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
The figure below illustrates the hierarchy of the different types of grammars.
Figure: Hierarchy of grammars
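The conditions for Type 1, Type 2 and Type 3 grammars can be checked mechanically. The sketch below is one possible reading of those conditions. The function name `grammar_type`, the tuple encoding of rules, and the sample grammars are assumptions made for illustration, and the empty tuple stands for the empty string.

```python
def grammar_type(productions, nonterminals, start="S"):
    """Return the highest type (3, 2, 1 or 0) whose conditions all rules meet.

    Each production is a pair (alpha, beta) of tuples of symbols.
    """
    def is_start_to_empty(alpha, beta):
        return alpha == (start,) and beta == ()

    has_start_to_empty = any(is_start_to_empty(a, b) for a, b in productions)
    start_on_rhs = any(start in b for _, b in productions)

    # Type 1, condition (a): |alpha| <= |beta| unless the rule is S -> empty.
    cond_a = all(len(a) <= len(b) or is_start_to_empty(a, b)
                 for a, b in productions)
    # Type 1, condition (b): if S -> empty exists, S is on no right-hand side.
    cond_b = not (has_start_to_empty and start_on_rhs)
    if not (cond_a and cond_b):
        return 0

    # Type 2: every left-hand side is a single nonterminal.
    if not all(len(a) == 1 and a[0] in nonterminals for a, _ in productions):
        return 1

    # Type 3: beta is a terminal, or a terminal followed by a nonterminal.
    def is_regular(beta):
        if len(beta) == 1:
            return beta[0] not in nonterminals
        if len(beta) == 2:
            return beta[0] not in nonterminals and beta[1] in nonterminals
        return False

    if all(is_start_to_empty(a, b) or is_regular(b) for a, b in productions):
        return 3
    return 2


N = {"S", "A", "B", "E"}
regular = [(("S",), ("a", "A")), (("A",), ("b",))]
type_regular = grammar_type(regular, N)
type_with_A_Ba = grammar_type(regular + [(("A",), ("B", "a"))], N)
type_with_aE = grammar_type(regular + [(("a", "E"), ("E", "a", "E"))], N)
type_with_Bb = grammar_type(regular + [(("B", "b"), ("b",))], N)
```

The three added rules have the same shapes as the counterexamples in the text: A → Ba breaks Type 3, aE → EaE breaks Type 2, and Bb → b breaks Type 1.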
Lecture_no 38 Context-sensitive vs context-free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are strictly less expressive: they recognize only a proper subset of the context-free languages.
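To make the link between context-free languages and pushdown automata concrete, here is a minimal sketch of a stack-based recognizer for the language a^n b^n (n ≥ 1), which is context-free but not regular. The function name `accepts_anbn` and the choice of language are assumptions for illustration. This particular recognizer is deterministic, so it shows only the simpler case covered by the theorem.

```python
def accepts_anbn(s):
    """Recognize a^n b^n (n >= 1) with an explicit stack, as a PDA would."""
    stack = []
    i = 0
    # Phase 1: push one marker for every leading 'a'.
    while i < len(s) and s[i] == "a":
        stack.append("A")
        i += 1
    # Phase 2: pop one marker for every 'b' while markers remain.
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()
        i += 1
    # Accept only if all input was consumed and the stack is empty.
    return len(s) > 0 and i == len(s) and not stack
```

A finite automaton cannot do this job because it has no unbounded memory in which to count the a's. The stack supplies exactly that.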
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V, x, v, y ∈ (V ∪ T)*, and v is non-empty.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only k·n squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input, that is, k = 1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear-bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar "|", indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The "::=" means that the symbol on the left must be replaced with the expression on the right.
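A short sketch may help show how a BNF specification separates terminals from nonterminals. It uses the conventional "::=" separator. The helper `parse_bnf` and the three-rule sample grammar are assumptions made for this example, and the parser handles only the simplest one-rule-per-line layout.

```python
def parse_bnf(text):
    """Parse lines of the form '<symbol> ::= alt1 | alt2' into a dict."""
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        # Each alternative is a sequence of whitespace-separated symbols.
        alternatives = [alt.split() for alt in right.split("|")]
        rules.setdefault(left.strip(), []).extend(alternatives)
    return rules


# Hypothetical grammar for sums of binary numerals.
BNF = """
<expr> ::= <term> | <expr> + <term>
<term> ::= <digit> | <term> <digit>
<digit> ::= 0 | 1
"""
rules = parse_bnf(BNF)
# Symbols on a left side are nonterminals; all others are terminals.
nonterminals = set(rules)
terminals = {s for alts in rules.values() for alt in alts for s in alt} - nonterminals
```

Applying the rule from the text, the symbols that never appear on a left side (here +, 0 and 1) are the terminals.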
Formal Grammar
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context; it describes only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. They would be more useful still if a canonic system could be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators convert text from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---> Compiler ---> Target program
Compiler Analysis-Synthesis Model
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form is built. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of a Compiler
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase, this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need only N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N = 7 and M ≈ 30, so the savings are considerable.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
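The mapping from lexemes to <token-name, attribute-value> pairs can be sketched with a small regular-expression scanner. Everything here is an assumption made for illustration: the function name `scan`, the use of a plain list as the symbol table, the choice to store the number's text as its attribute, and the small set of symbols recognized.

```python
import re

# One identifier, number, or single-character symbol, after optional blanks.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<sym>[=+*;]))"
)


def scan(source):
    """Return (tokens, symbol_table) for a tiny assignment language."""
    tokens, symtab = [], []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        pos = m.end()
        if m.group("id"):
            name = m.group("id")
            if name not in symtab:
                symtab.append(name)
            # The attribute is the 1-based index into the symbol table.
            tokens.append(("id", symtab.index(name) + 1))
        elif m.group("number"):
            tokens.append(("number", m.group("number")))
        else:
            # Symbols carry no attribute; the second component is ignored.
            tokens.append((m.group("sym"), None))
    return tokens, symtab


tokens_a, table_a = scan("x3 = y + 3;")
tokens_b, table_b = scan("x3   =y+ 3  ;")
tokens_c, table_c = scan("x 3 = y + 3;")
```

The first two inputs differ only in non-significant blanks and give the same token stream. The third splits x and 3 into separate lexemes, so its stream differs. The token pattern uses no recursion, which is what places this work in the scanner.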
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
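A syntax tree with operators as interior nodes can be built by a small recursive-descent parser. The sketch below is illustrative only. The function name `parse_assignment` and the tuple encoding of tokens and tree nodes are assumptions. The ambiguous rule expr → expr + expr is also replaced here by the equivalent iteration "operand followed by zero or more + operand", which groups additions to the left.

```python
def parse_assignment(tokens):
    """Build a syntax tree for: asst-stmt -> id = expr ;"""
    pos = 0

    def expect(kind):
        # Consume the next token if it has the required name.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind!r} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            return expect(tokens[pos][0])
        raise SyntaxError(f"expected operand at token {pos}")

    def expr():
        nonlocal pos
        tree = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            pos += 1
            tree = ("+", tree, operand())   # operator node, two children
        return tree

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after ';'")
    return ("=", target, value)


# Token stream for x3 = y + 3; with x3 and y as symbol-table entries 1 and 2.
tokens = [("id", 1), ("=", None), ("id", 2), ("+", None),
          ("number", "3"), (";", None)]
tree = parse_assignment(tokens)
```

The result has = at the root, the identifier x3 as its left child and the + node as its right child. The semicolon is checked but does not appear in the tree.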
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int", as shown on the right.
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each "quad" has the form
operation target source1 source2
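The walk over the semantic tree that yields these quads can be sketched as a post-order traversal that allocates a fresh temporary for each interior node. The function name `gen_quads` and the nested-tuple tree encoding are assumptions for illustration, and `None` plays the role of the "--" placeholder.

```python
def gen_quads(tree):
    """Walk a tree and emit (operation, target, source1, source2) quads."""
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):           # leaf: a name or a constant
            return node
        op, *children = node
        sources = [walk(child) for child in children]
        counter += 1
        temp = f"temp{counter}"
        padding = [None] * (2 - len(sources))   # unused operand slots
        quads.append((op, temp, *sources, *padding))
        return temp

    op, target, value = tree                # the root is the assignment
    quads.append((op, target, walk(value), None))
    return quads


# Semantic tree for x3 = y + 3 with the two conversion operators inserted.
tree = ("assign", "id1", ("realtoint", ("add", "id2", ("inttoreal", "3"))))
quads = gen_quads(tree)
```

Children are visited before their parent, so the temporaries come out in the same order as in the sequence above.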
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion itself and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
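Both rewrites can be expressed as a small pass over the list of quads. This is a sketch under simplifying assumptions: the function name `optimize` is made up for the example, constants are recognized only as digit strings, and the second rewrite assumes the merged temporary is not used anywhere else, which a real optimizer would have to verify.

```python
def optimize(quads):
    """Apply two simple rewrites to (operation, target, source1, source2) quads."""
    # 1. Fold inttoreal of an integer constant into a real constant.
    constants = {}
    folded = []
    for op, target, s1, s2 in quads:
        if op == "inttoreal" and s1.isdigit():
            constants[target] = s1 + ".0"
            continue                        # the conversion quad disappears
        folded.append((op, target,
                       constants.get(s1, s1), constants.get(s2, s2)))
    # 2. Merge "op tempN ..." followed by "assign x tempN" into "op x ...".
    result = []
    for quad in folded:
        op, target, s1, s2 = quad
        if (op == "assign" and result and result[-1][1] == s1
                and s1.startswith("temp")):
            prev_op, _, p1, p2 = result.pop()
            result.append((prev_op, target, p1, p2))
        else:
            result.append(quad)
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
optimized = optimize(quads)
```

The four original quads shrink to two, matching the hand-optimized sequence.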
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, #3.0   // add float
RTOI R2, R1         // real to int
ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
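The idea that the parser calls the scanner whenever it needs another token, instead of reading a token file written by an earlier pass, maps naturally onto a Python generator. The sketch below is a toy: the names `scanner` and `parse_sum`, the blank-separated input format, and the `log` list are all assumptions made to show the interleaving, and the "parser" simply adds the operands as it recognizes them.

```python
def scanner(source, log):
    """Yield lexemes one at a time, noting each hand-off in log."""
    for lexeme in source.split():
        log.append(f"scan {lexeme}")
        yield lexeme


def parse_sum(source):
    """Parse 'operand + operand + ...' while pulling tokens on demand."""
    log = []
    tokens = scanner(source, log)
    total = int(next(tokens))           # the parser calls the scanner
    log.append("parse operand")
    for plus in tokens:
        if plus != "+":
            raise SyntaxError(f"expected '+', got {plus!r}")
        total += int(next(tokens))
        log.append("parse '+' operand")
    return total, log


total, log = parse_sum("1 + 2 + 3")
```

The log shows scanning and parsing steps alternating, which is what it means for two phases to share a single pass over the input.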
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single-pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical single-pass compiler:
Compiler Driver calls the Syntactic Analyzer.
Syntactic Analyzer calls the Contextual Analyzer and the Code Generator.
(2) Multi Pass Compiler
A multi-pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi-pass compiler:
Compiler Driver calls, in turn, the Syntactic Analyzer, the Contextual Analyzer, and the Code Generator.
Syntactic Analyzer: input Source Text, output AST
Contextual Analyzer: input AST, output Decorated AST
Code Generator: input Decorated AST, output Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80
- To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
- PREREQUISITES
- One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
- In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
- AIM
- To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
- OBJECTIVES
- To understand the relationship between system software and machine architecture
- To know the design and implementation of Assemblers
- To know the design and implementation of Linkers and Loaders
- To have an understanding of macroprocessors
- To know the function of Compiler
- EXPECTED OUTCOME
- This course provides knowledge to design various system programs
- Assembler
- Compiler
- Interpreter
- One-Pass Assembler
- The translation performed by an assembler is essentially a collection of substitutions machine operation code
- for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
- MACRO PROCESSOR
-
- Macro Definition and Usage
-
- Example
- Defining a Macro
- Linking
- Relocation
-
- High-level language(HLL)
- Sorting and Searching
-
- Objectives
-
- Sorting
-
- Context-free grammars and languages
- Context-sensitive grammars and languages
- Context-free grammar Vs Context-sensitive grammar
-
- Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
-
- ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
- string
- ndash Semantic Analysis bull In this step the meaning of the source string is determined
-
- The Structure of a Compiler
-
- Multiple Phases
- Lexical Analysis (or Scanning)
- Syntax Analysis (or Parsing)
- Semantic Analysis
- Intermediate code generation
- Code optimization
- Code generation
- 127 Symbol-Table Management
- The Grouping of Phases into Passes
-
SP NOTES
Example 1215 The grammar of Example is not a Type 1 grammar because it does not satisfy condition (b) The grammar can be modified to be of Type 1 by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol
An addition to the modified grammar of a production rule of the form Bb b will result in a non-Type 1 grammar because of a violation to condition (a)
A Type 2 grammar is a Type 1 grammar in which each production rule satisfies | | = 1 that is is a nonterminal symbol A language is said to be a Type 2 language if there exists a Type 2 grammar that generates the language
Example 1216 The grammar of Example is not a Type 1 grammar and therefore also not a Type 2 grammar The grammar can be modified to be a Type 2 grammar by replacing its production rules with the following ones E is assumed to be a new nonterminal symbol
An addition of a production rule of the form aE EaE to the grammar will result in a non-Type 2 grammar
A Type 3 grammar is a Type 2 grammar ltN P Sgt in which each of the production rules which is not of the form S satisfies one of the following conditions
a is a terminal symbol b is a terminal symbol followed by a nonterminal symbol
A language is said to be a Type 3 language if there exists a Type 3 grammar that generates the language
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 62
SP NOTES
Example 1217 The grammar ltN P Sgt which has the following production rules is a Type 3
An addition of a production rule of the form A Ba or of the form B bb to the grammar will result in a non-Type 3 grammar
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 63
SP NOTES
Context-free grammars and languagesA grammar G = (V T S P) is a context free grammar (CFG) if all productions in P have the form A x where A V and x (V T) A language is called context-free if it is generated by a context-free grammar
Theorem A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton
Deterministic pushdown automata are somewhat less expressive
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy xvy where A V and x v y (V T)
A context-sensitive language is a language generated by a context-sensitive grammar
The name context-sensitive comes from the fact that the actual string modification is given by A v while the x and y provide the context in which the rule may be applied
A Turing machine has an infinite supply of blank tape A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64
SP NOTES
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V rarr w
where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65
SP NOTES
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols. Multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
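The rule format described above can be made concrete with a small sketch. The grammar text, the helper names parse_bnf, terminals and derive, and the use of a list of choice indices are all assumptions made for illustration; they are not part of BNF itself. The sketch reads rules of the form <symbol> ::= alternatives, finds the terminals as the symbols that never appear on a left side, and performs a leftmost derivation.

```python
# A tiny, hypothetical BNF specification used only for illustration.
BNF_TEXT = """
<number> ::= <digit> | <digit> <number>
<digit>  ::= "0" | "1" | "2"
"""

def parse_bnf(text):
    """Return {nonterminal: [alternative, ...]}; each alternative is a list of symbols.

    Assumes one rule per line and that "|" is never itself a terminal.
    """
    rules = {}
    for line in text.strip().splitlines():
        left, right = line.split("::=")
        rules[left.strip()] = [alt.split() for alt in right.split("|")]
    return rules

def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {sym for alts in rules.values() for alt in alts for sym in alt}
    return used - set(rules)

def derive(rules, start, choices):
    """Leftmost derivation: each entry of `choices` picks which alternative
    replaces the leftmost nonterminal in the current sentential form."""
    form = [start]
    for choice in choices:
        i = next(k for k, sym in enumerate(form) if sym in rules)
        form[i:i + 1] = rules[form[i]][choice]
    return form

RULES = parse_bnf(BNF_TEXT)
print(sorted(terminals(RULES)))
print(derive(RULES, "<number>", [1, 0, 0, 2]))
```

Each step of derive replaces the symbol on the left of a rule with one of the sequences on its right, which is exactly the substitution that ::= expresses.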
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g. in constructing syntax-directed translators, proof checking systems and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems are very useful for explicitly and concisely defining sets of strings, such as a computer language. They are more useful still if a canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system where all predicates are of degree one and no “cross-referencing” is used.
Lecture_no 43 Compiler
Compilers are basically “language translators”.
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program that takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the source program is built. In the synthesis part, this intermediate form is taken and converted into an equivalent target program.
->Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol =
– The identifier count
– The plus sign +
– The identifier rate
– The multiplication sign *
– The constant number 10
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable: about 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice, the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
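The mapping from lexemes to tokens can be sketched with regular expressions, which need no recursion. This is only an illustrative sketch: the names TOKEN_SPEC and scan, the choice to keep the number as text, and the use of None for an ignored second component are assumptions, not the way any particular compiler does it.

```python
import re

# Token patterns, tried in order. Each one is a plain regular expression.
TOKEN_SPEC = [
    ("id",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"[0-9]+"),
    ("symbol", r"[=+;]"),
    ("blank",  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(source):
    """Return (tokens, symbol_table) for one line of source text."""
    symbol_table = []      # position + 1 is the attribute value of an id token
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "blank":
            continue                          # non-significant blanks are dropped
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme)) # kept as text; storage type is decided later
        else:
            tokens.append((lexeme, None))     # e.g. ("=", None): second component ignored
    return tokens, symbol_table

print(scan("x3 = y + 3;"))
print(scan("x3   =y+ 3  ;"))   # same tokens: the blanks make no difference
print(scan("x 3 = y + 3;"))    # different tokens: x and 3 are separate lexemes
```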
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
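As a rough illustration of how a syntax tree with operators as interior nodes could be built for the assignment grammar above, here is a small hand-written parser. The function name parse_assignment, the (name, value) token pairs and the nested-tuple tree are assumptions made for this sketch. The recursive rule expr → expr + expr is handled with a loop, which is a common substitute technique and makes + left-associative; it is not necessarily how the parser in the notes works.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree as nested tuples.

    `tokens` is a list of (name, value) pairs, e.g. ("id", 1) or ("+", None).
    """
    pos = 0

    def expect(name):
        # Consume the next token, which must have the given name.
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != name:
            raise SyntaxError(f"expected {name!r} at token {pos}")
        token = tokens[pos]
        pos += 1
        return token

    def operand():
        # expr -> number | id
        if pos < len(tokens) and tokens[pos][0] == "number":
            return expect("number")
        return expect("id")

    def expr():
        # expr -> expr + expr, written as a loop over "+ operand"
        tree = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            expect("+")
            tree = ("+", tree, operand())
        return tree

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")                      # the semicolon terminates the statement
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return ("=", target, value)

TOKENS = [("id", 1), ("=", None), ("id", 2), ("+", None), ("number", "3"), (";", None)]
print(parse_assignment(TOKENS))
```

The returned tuple has the operator first and its operands after it, so the = node has the target identifier and the + subtree as children.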
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversion where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one operand) conversion operators “int to real” and “real to int”, as shown on the right.
Intermediate code generation
Many compilers first generate code for an “idealized machine”. For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each “quad” has the form
operation target source1 source2
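Walking the semantic tree to produce quads can be sketched as a post-order traversal. The tree encoding as nested tuples, the function name generate_quads, and the use of None for an unused source are assumptions made for this example only.

```python
def generate_quads(tree):
    """Walk a semantic tree and return quads (operation, target, source1, source2).

    Unused sources are None, which corresponds to "--" in the listing.
    """
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"            # symbol-table index, e.g. id2
        if kind == "number":
            return node[1]
        if kind in ("inttoreal", "realtoint"):
            source = walk(node[1])           # operand first, then the conversion
            target = new_temp()
            quads.append((kind, target, source, None))
            return target
        if kind == "+":
            left = walk(node[1])
            right = walk(node[2])
            target = new_temp()
            quads.append(("add", target, left, right))
            return target
        raise ValueError(f"unknown node {kind!r}")

    if tree[0] != "=":
        raise ValueError("expected an assignment at the root")
    source = walk(tree[2])
    quads.append(("assign", walk(tree[1]), source, None))
    return quads

# x3 = y + 3 with y real and x3 integer, after the conversions were inserted.
SEMANTIC_TREE = ("=", ("id", 1),
                 ("realtoint", ("+", ("id", 2), ("inttoreal", ("number", "3")))))
for quad in generate_quads(SEMANTIC_TREE):
    print(quad)
```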
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
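The two rewrites above can be sketched as passes over a list of quads. The function name optimize and the tuple layout (operation, target, source1, source2) are assumptions for illustration. The second rewrite also assumes that the temporary being merged away is not used anywhere later, which a real optimizer would have to check.

```python
def optimize(quads):
    """Apply two simple rewrites to a list of (op, target, src1, src2) quads."""
    # 1. Perform inttoreal on an integer constant at compile time.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and isinstance(src1, str) and src1.isdigit():
            constants[target] = f"{src1}.0"   # e.g. temp1 now stands for 3.0
            continue                          # the conversion quad disappears
        folded.append((op, target, src1, src2))

    # 2. Merge "op temp ... ; assign id temp" into "op id ...".
    result = []
    for quad in folded:
        op, target, src1, src2 = quad
        if (op == "assign" and result and isinstance(src1, str)
                and src1.startswith("temp") and result[-1][1] == src1):
            prev_op, _, prev_src1, prev_src2 = result.pop()
            result.append((prev_op, target, prev_src1, prev_src2))
        else:
            result.append(quad)
    return result

QUADS = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(QUADS):
    print(quad)
```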
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, #3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
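The idea of the parser calling the scanner on demand, and emitting intermediate code as it goes, can be sketched with a generator. Everything here is a simplified assumption: the scanner just splits on blanks, the input language is only operand (+ operand)*, and the names scanner and parse_sum are invented for the example.

```python
def scanner(source):
    """Yield one token at a time; nothing is scanned until the parser asks."""
    for lexeme in source.split():
        yield lexeme

def parse_sum(source, emitted):
    """Parse  operand (+ operand)*  in a single pass over `source`.

    Tokens come from calling the scanner, and intermediate code is appended
    to `emitted` at selected points during parsing, not in a later pass.
    """
    tokens = scanner(source)
    result = next(tokens)                 # first operand
    count = 0
    for operator in tokens:               # each iteration asks the scanner for a token
        if operator != "+":
            raise SyntaxError(f"expected '+', got {operator!r}")
        operand = next(tokens, None)
        if operand is None:
            raise SyntaxError("missing operand after '+'")
        count += 1
        target = f"temp{count}"
        emitted.append(("add", target, result, operand))
        result = target
    return result

code = []
print(parse_sum("a + b + c", code))
print(code)
```

Because the token stream is never written out and read back, the input is traversed once, which is the saving in I/O that combining phases into a pass is meant to achieve.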
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A “pass” is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a “phase”, but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver --calls--> Syntactic Analyzer
Syntactic Analyzer --calls--> Contextual Analyzer
Syntactic Analyzer --calls--> Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver --calls--> Syntactic Analyzer (input: Source Text, output: AST)
Compiler Driver --calls--> Contextual Analyzer (input: AST, output: Decorated AST)
Compiler Driver --calls--> Code Generator (input: Decorated AST, output: object code)
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU
Example 12.17: The grammar <N, P, S> which has the following production rules is a Type 3 grammar.
An addition of a production rule of the form A → Ba or of the form B → bb to the grammar will result in a non-Type 3 grammar.
Below Figure illustrates the hierarchy of the different types of grammars
Figure Hierarchy of grammars
Lecture_no38 context sensitive vs context free grammar
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)*.
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input that is k=1 in all cases That definition leads to an equivalent class of machines because we can compensate for the shorter tape by having a larger tape alphabet
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 64
SP NOTES
Context-free grammar Vs Context-sensitive grammar
In formal language theory a context-free grammar (CFG) is a grammar in which every production rule is of the form
V rarr w
where V is a single nonterminal symbol and w is a string of terminals andor nonterminal (possibly empty) The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur A formal language is context-free if some context-free grammar generates it
Context-free grammars play a central role in the description and design of programming languages and compilers They are also used for analyzing the syntax of natural languages
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols Context-sensitive grammars are more general than context-free grammars but still orderly enough to be parsed by a linear bounded automaton
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context A formal language that can be described by a context-sensitive grammar is called a context-sensitive language
Lecture_no 39 Backus-Naur Form(BNF)
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 65
SP NOTES
In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory
Introduction to BNF-
A BNF specification is a set of derivation rules written as
ltsymbolgt = __expression__
where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt
The = means that the symbol on the left must be replaced with the expression on the right
Formal Grammar-
In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form
Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66
SP NOTES
Lecture_no 40 Canonic System
Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67
SP NOTES
Lecture_no41 Examples on Canonic system
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68
SP NOTES
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 69
SP NOTES
Lecture_no 43 Compiler
Compiler are basically ldquoLanguage Translatorrdquo
ndash Language translators switch texts from one language into another making sure that the translated version conforms to the grammar and style rules of the target language
-- Compiler is a program which takes one language (source program) as input and converts into an equivalent another language (target program)
Source program---------gt Compiler -------------gt Target program
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 70
SP NOTES
Computer Analysis Synthesis Model -
In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an
equivalent target program
-gtAnalysis Partbull The analysis part is carried out in three sub-parts
ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
string
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71
SP NOTES
ndash Semantic Analysis bull In this step the meaning of the source string is determined
Properties of Compiler-
bull It must be bug free
bull It must generate correct machine code
bull The generated machine code must run fast
bull The compiler itself must run fast (compilation time must be proportional to program size)
bull The compiler must be portable (ie modular supporting separate compilation)
bull It must give good diagnostics and error messages
bull The generated code must work well with existing debuggers
Phases of a compiler-
ndash Lexical Analysis
bull It is also called scanning
bull It breaks the complete source code into tokens
For example total = count + rate 10
bull Then in lexical analysis phase this statement is broken up into series of tokens as follows
ndash The identifier total
ndash The assignment symbol
ndash The identifier count
ndash The plus sign
ndash The identifier rate
ndash The multiplication sign
ndash The constant number 10
- Syntax Analysis
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72
SP NOTES
bull It is also called parsing
bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure
bull It determines the structure of the source string by grouping the token together
bull The hierarchical structure generated in this phase is called parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink
The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language
The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language
This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable
Multiple Phases
The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code
The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate
Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y  +  3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
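A minimal sketch of such a scanner is given below. It is not the scanner of any particular compiler: the function name scan, the tuple layout (token-name, attribute-value), and the choice to store number lexemes as strings are all assumptions made for illustration. Note that it uses only loops, with no recursion.

```python
def scan(source):
    """Group characters into lexemes and map them to (name, attribute) tokens.

    Returns (tokens, symbols); symbols is a simple symbol table in which
    the identifier at list position i corresponds to the token ("id", i + 1).
    """
    tokens = []
    symbols = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                     # non-significant blanks are dropped
            i += 1
        elif ch.isalpha():                   # identifier: letter (letter | digit)*
            j = i
            while j < len(source) and source[j].isalnum():
                j += 1
            name = source[i:j]
            if name not in symbols:
                symbols.append(name)
            tokens.append(("id", symbols.index(name) + 1))
            i = j
        elif ch.isdigit():                   # number: digit digit*
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("number", source[i:j]))
            i = j
        elif ch in "=+;":                    # single-character symbols
            tokens.append((ch, None))        # second component is ignored
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return tokens, symbols


if __name__ == "__main__":
    print(scan("x3 = y + 3;"))
    print(scan("x 3 = y + 3;")[0])   # the blank splits x3 into two lexemes
```

Differently spaced versions of the statement give the same token stream, while x 3 yields an identifier followed by a number.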
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into a tree (shown in the original figure).
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree in the figure corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
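The following is a small recursive-descent sketch that builds a syntax tree for the example statement. It makes several assumptions: tokens are (name, attribute) tuples, the left-recursive rule expr → expr + expr is replaced by a loop that treats + as left-associative, and tree nodes are plain tuples. The function name parse_assignment is hypothetical.

```python
def parse_assignment(tokens):
    """Parse  id = operand (+ operand)* ;  and return a syntax tree.

    Interior nodes are ("=", left, right) or ("+", left, right);
    leaves are the original ("id", n) or ("number", text) tokens.
    """
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            raise SyntaxError(f"expected {kind!r} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            return expect(tokens[pos][0])
        raise SyntaxError(f"expected an operand at token {pos}")

    def expr():
        nonlocal pos
        left = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            pos += 1                          # consume '+'
            left = ("+", left, operand())     # operator becomes the interior node
        return left

    target = expect("id")
    expect("=")
    tree = ("=", target, expr())
    expect(";")                               # statement terminator, not in the tree
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    return tree


if __name__ == "__main__":
    toks = [("id", 1), ("=", None), ("id", 2), ("+", None), ("number", "3"), (";", None)]
    print(parse_assignment(toks))
```

The semicolon is checked but left out of the tree, which matches the point that the syntax tree represents the assignment expression rather than the statement.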
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int", as shown in the figure.
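The insertion of conversion operators can be sketched as a walk over the syntax tree. The tuple-based tree, the type names "int" and "real", the node names inttoreal and realtoint, and the function name check_types are assumptions for this illustration; a real compiler would also report errors for illegal combinations.

```python
def check_types(node, id_types):
    """Return (tree, type), inserting conversion nodes where types differ.

    id_types maps a symbol-table index to "int" or "real".
    """
    kind = node[0]
    if kind == "id":
        return node, id_types[node[1]]
    if kind == "number":
        return node, "real" if "." in node[1] else "int"
    if kind == "+":
        left, left_type = check_types(node[1], id_types)
        right, right_type = check_types(node[2], id_types)
        if left_type == right_type:
            return ("+", left, right), left_type
        # Mixed arithmetic: widen the integer operand to real.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return ("+", left, right), "real"
    if kind == "=":
        target, target_type = check_types(node[1], id_types)
        value, value_type = check_types(node[2], id_types)
        if target_type != value_type:
            convert = "realtoint" if target_type == "int" else "inttoreal"
            value = (convert, value)
        return ("=", target, value), target_type
    raise ValueError(f"unknown node kind {kind!r}")


if __name__ == "__main__":
    tree = ("=", ("id", 1), ("+", ("id", 2), ("number", "3")))
    # x3 is entry 1 (an integer), y is entry 2 (a real)
    print(check_types(tree, {1: "int", 2: "real"})[0])
```

For the running example, the constant 3 is wrapped in inttoreal and the sum is wrapped in realtoint before the assignment.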
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3 --
add temp2 id2 temp1
realtoint temp3 temp2 --
assign id1 temp3 --
Each "quad" has the form
operation target source1 source2
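The tree walk that produces these quads can be sketched as follows. The tuple-based tree with inttoreal and realtoint nodes, the use of None for an unused source, and the function name gen_quads are assumptions; the temporaries are simply numbered in the order they are created.

```python
def gen_quads(tree):
    """Walk an assignment tree and return quads (operation, target, source1, source2)."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        """Emit code for node and return the name that holds its value."""
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "number":
            return node[1]
        if kind in ("inttoreal", "realtoint"):
            source = walk(node[1])
            temp = new_temp()
            quads.append((kind, temp, source, None))
            return temp
        if kind == "+":
            left = walk(node[1])
            right = walk(node[2])
            temp = new_temp()
            quads.append(("add", temp, left, right))
            return temp
        raise ValueError(f"unknown node kind {kind!r}")

    if tree[0] != "=":
        raise ValueError("expected an assignment at the root")
    value = walk(tree[2])
    quads.append(("assign", walk(tree[1]), value, None))
    return quads


if __name__ == "__main__":
    tree = ("=", ("id", 1),
            ("realtoint", ("+", ("id", 2), ("inttoreal", ("number", "3")))))
    for quad in gen_quads(tree):
        print(quad)
```

Run on the example tree, the walk emits the same four quads listed above, in the same order.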
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
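Both optimizations can be sketched as simple passes over the quad list. This is a toy under stated assumptions: quads are (operation, target, source1, source2) tuples, constants are strings, and a temporary that is copied by an assign is assumed not to be used again afterwards. The function name optimize is hypothetical.

```python
def optimize(quads):
    """Fold constant int-to-real conversions and merge a trailing copy."""
    # Pass 1: evaluate inttoreal at compile time when its operand is a literal.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)      # substitute known constants
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = str(float(src1))   # "3" becomes "3.0"
        else:
            folded.append((op, target, src1, src2))

    # Pass 2: "op tempN a b" followed by "assign x tempN" becomes "op x a b".
    # Assumes tempN has no later use.
    result = []
    for quad in folded:
        op, target, src1, _ = quad
        if (op == "assign" and result
                and result[-1][1] == src1 and src1.startswith("temp")):
            prev_op, _, prev_src1, prev_src2 = result.pop()
            result.append((prev_op, target, prev_src1, prev_src2))
        else:
            result.append(quad)
    return result


if __name__ == "__main__":
    quads = [
        ("inttoreal", "temp1", "3", None),
        ("add", "temp2", "id2", "temp1"),
        ("realtoint", "temp3", "temp2", None),
        ("assign", "id1", "temp3", None),
    ]
    for quad in optimize(quads):
        print(quad)
```

On the four quads of the running example, this leaves the two quads shown in the list above.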
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
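A deliberately naive sketch of this last step is shown below. It handles only the two quad kinds left after optimization of the running example, hands out registers in order without ever reusing them, and assumes the second operand of add is a literal. The function name gen_target and the textual instruction format are assumptions, not a real instruction set.

```python
def gen_target(quads):
    """Translate (operation, target, source1, source2) quads into register code."""
    code = []
    register_of = {}          # name -> register currently holding its value
    next_register = 1

    def load(name):
        """Return the register holding name, loading it from memory if needed."""
        nonlocal next_register
        if name not in register_of:
            register_of[name] = f"R{next_register}"
            next_register += 1
            code.append(f"LD {register_of[name]}, {name}")
        return register_of[name]

    for op, target, src1, src2 in quads:
        if op == "add":
            reg = load(src1)
            code.append(f"ADDF {reg}, {reg}, {src2}")   # result overwrites the source
            register_of.pop(src1, None)                  # reg no longer holds src1
            register_of[target] = reg
        elif op == "realtoint":
            reg = load(src1)
            dest = f"R{next_register}"
            next_register += 1
            code.append(f"RTOI {dest}, {reg}")
            code.append(f"ST {target}, {dest}")          # store to the variable's location
        else:
            raise ValueError(f"unsupported operation {op!r}")
    return code


if __name__ == "__main__":
    optimized = [("add", "temp2", "id2", "3.0"), ("realtoint", "id1", "temp2", None)]
    print("\n".join(gen_target(optimized)))
```

A real code generator must also cope with running out of registers, which this sketch ignores.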
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
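A symbol table of this kind can be sketched as a small class. The class name SymbolTable, the byte sizes in SIZES, and the idea of recording the storage location as an offset from the start of the compiled program's data area are assumptions made for the example.

```python
from dataclasses import dataclass, field

SIZES = {"int": 4, "real": 8}     # assumed sizes in bytes


@dataclass
class SymbolTable:
    """Maps each variable name to its type and its run-time storage offset."""
    entries: dict = field(default_factory=dict)
    next_offset: int = 0

    def declare(self, name, type_name):
        if name in self.entries:
            raise KeyError(f"{name} is already declared")
        # The offset says where the compiled program will keep the variable,
        # not where the compiler itself keeps this entry.
        self.entries[name] = {"type": type_name, "offset": self.next_offset}
        self.next_offset += SIZES[type_name]

    def lookup(self, name):
        return self.entries[name]


if __name__ == "__main__":
    table = SymbolTable()
    table.declare("x3", "int")
    table.declare("y", "real")
    print(table.lookup("y"))
```

Later phases would call lookup to find the type for semantic checks and the offset for code generation.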
Lecture_no 45 Phases of a compiler (part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
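The arrangement in which the parser calls the scanner on demand can be sketched with a Python generator. This is a simplified illustration only: the regular expression, the names scanner and translate, and the quad format are assumptions, and type handling is left out entirely.

```python
import re

# number | identifier | any other non-blank character, with leading blanks skipped
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z]\w*)|(\S))")


def scanner(text):
    """Yield one token at a time; no token file is ever written."""
    for match in TOKEN.finditer(text):
        number, ident, other = match.groups()
        if number:
            yield ("number", number)
        elif ident:
            yield ("id", ident)
        else:
            yield (other, None)


def translate(text):
    """Parse  id = operand (+ operand)* ;  and emit quads in the same pass."""
    tokens = scanner(text)
    code = []

    def next_token():
        return next(tokens, ("eof", None))     # the parser pulls from the scanner

    def operand():
        kind, value = next_token()
        if kind not in ("id", "number"):
            raise SyntaxError(f"expected an operand, got {kind!r}")
        return value

    kind, target = next_token()
    if kind != "id":
        raise SyntaxError("expected an identifier")
    if next_token()[0] != "=":
        raise SyntaxError("expected '='")
    result = operand()
    temps = 0
    while True:
        kind, _ = next_token()
        if kind == ";":
            break
        if kind != "+":
            raise SyntaxError(f"expected '+' or ';', got {kind!r}")
        right = operand()
        temps += 1
        code.append(("add", f"temp{temps}", result, right))   # emitted while parsing
        result = f"temp{temps}"
    code.append(("assign", target, result, None))
    return code


if __name__ == "__main__":
    print(translate("x3 = y + 3;"))
```

Because the scanner is a generator, each token is produced only when the parser asks for it, so the source text is read once.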
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): Checks the token sequence against the grammar, which specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar is ambiguous if there is more than one possible parse tree for a given statement.
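Ambiguity can be made concrete by counting parse trees. The sketch below assumes the rule expr → expr + expr | id and counts the distinct parse trees for a sum of n identifiers; the function name count_parse_trees is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(n_operands):
    """Number of parse trees for id + id + ... + id under expr -> expr + expr | id."""
    if n_operands == 1:
        return 1                      # expr -> id
    # The top-level '+' can split the operands into k on the left and the rest on the right.
    return sum(
        count_parse_trees(k) * count_parse_trees(n_operands - k)
        for k in range(1, n_operands)
    )


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_parse_trees(n))
```

For id + id + id there are two trees, corresponding to (id + id) + id and id + (id + id), so this grammar is ambiguous.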
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical single pass compiler:
Compiler Driver
  calls Syntactic Analyzer
    calls Contextual Analyzer
    calls Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi pass compiler:
Compiler Driver
  calls Syntactic Analyzer: input Source Text, output AST
  calls Contextual Analyzer: input AST, output Decorated AST
  calls Code Generator: input Decorated AST, output Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no 47 Surprise Test
THANK YOU
Context-free grammars and languages
A grammar G = (V, T, S, P) is a context-free grammar (CFG) if all productions in P have the form A → x, where A ∈ V and x ∈ (V ∪ T)*. A language is called context-free if it is generated by a context-free grammar.
Theorem: A language is context-free if and only if it is recognized by a (nondeterministic) pushdown automaton.
Deterministic pushdown automata are somewhat less expressive.
Context-sensitive grammars and languages
A context-sensitive grammar is one whose productions are all of the form xAy → xvy, where A ∈ V and x, v, y ∈ (V ∪ T)* (v is normally required to be non-empty).
A context-sensitive language is a language generated by a context-sensitive grammar.
The name context-sensitive comes from the fact that the actual string modification is given by A → v, while the x and y provide the context in which the rule may be applied.
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (LBA) is a Turing machine whose tape is only kn squares long, where n is the length of the input string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an LBA to use only the portion of the tape that is occupied by the input; that is, k=1 in all cases. That definition leads to an equivalent class of machines, because we can compensate for the shorter tape by having a larger tape alphabet.
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
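The difference between the two production forms can be checked mechanically. In the sketch below a production is a pair of tuples (left-hand side, right-hand side); this representation, the function names, and the requirement that v be non-empty are assumptions for illustration.

```python
def is_context_free(production, nonterminals):
    """True if the production has the form A -> w with A a single nonterminal."""
    lhs, _ = production
    return len(lhs) == 1 and lhs[0] in nonterminals


def is_context_sensitive(production, nonterminals):
    """True if the production has the form xAy -> xvy with v non-empty."""
    lhs, rhs = production
    for i, symbol in enumerate(lhs):
        if symbol not in nonterminals:
            continue
        x, y = lhs[:i], lhs[i + 1:]            # candidate left and right context
        v_length = len(rhs) - len(x) - len(y)
        if (v_length >= 1
                and rhs[:len(x)] == x
                and rhs[len(rhs) - len(y):] == y):
            return True
    return False


if __name__ == "__main__":
    V = {"S", "A", "B"}
    rules = [
        (("A",), ("a", "A")),                  # A -> aA
        (("a", "A", "b"), ("a", "c", "b")),    # aAb -> acb
        (("A", "B"), ("B", "A")),              # AB -> BA
    ]
    for rule in rules:
        print(rule, is_context_free(rule, V), is_context_sensitive(rule, V))
```

The rule aAb → acb rewrites A only in the context a…b, so it is context-sensitive but not context-free, while AB → BA fits neither form as written.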
Lecture_no 39 Backus-Naur Form (BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance, in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal, and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are non-terminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
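A BNF specification in this style can be read into a simple data structure. The sketch below assumes one rule per line, the standard ::= separator, symbols separated by blanks, and no use of | as a terminal; the function names parse_bnf and terminals and the sample grammar text are assumptions for illustration.

```python
def parse_bnf(text):
    """Return a dict mapping each nonterminal to its list of alternatives.

    Each alternative is a list of symbols (strings).
    """
    rules = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("::=")
        alternatives = [alt.split() for alt in rhs.split("|")]   # '|' marks a choice
        rules[lhs.strip()] = alternatives
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    return {
        symbol
        for alternatives in rules.values()
        for alternative in alternatives
        for symbol in alternative
        if symbol not in rules
    }


if __name__ == "__main__":
    grammar = """
    <asst-stmt> ::= id = <expr> ;
    <expr> ::= number | id | <expr> + <expr>
    """
    parsed = parse_bnf(grammar)
    print(parsed)
    print(sorted(terminals(parsed)))
```

Here id, number, =, + and ; never appear on a left side, so they are reported as terminals.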
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no 41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful for explicitly and concisely defining sets of strings, such as computer languages, particularly if a canonic system can also be used as a basis for recognizing strings from the defined sets.
This algorithm is capable of recognizing strings produced by a Backus-Naur system. This corresponds to the case of a canonic system where all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler ---------> Target program
Compiler Analysis-Synthesis Model-
In the analysis part, the source program is read, broken into a stream of strings, and converted into an intermediate form. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (tokens are collections of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
- Syntax Analysis
• It is also called parsing.
bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure
bull It determines the structure of the source string by grouping the token together
bull The hierarchical structure generated in this phase is called parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink
The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language
The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language
This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable
Multiple Phases
The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code
The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate
Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73
SP NOTES
the latter constituting the output of the lexical analyzer For example any one of the following
x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not
x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and
A token is a lttoken-nameattribute-valuegt pair For example
1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases
2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =
3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to
ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table
6 The lexeme is mapped to the token ltgt
Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception
Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example
x3 = y + 3 would be parsed into the tree on the right
This parsing would result from a grammar containing rules such as
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74
SP NOTES
asst-stmt rarr id = expr expr rarr number | id | expr + expr
Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right
The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning
Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it
(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator
Semantic Analysis
There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary
For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right
Intermediate code generation
Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target
With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce
temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75
SP NOTES
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form
operation target source1 source2
Code optimization
This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see
1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with
2 add temp2 id2 30
3 The last two quads can be combined into 4 realtoint id1 temp2
Code generation
Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2
LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2
127 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location
A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76
SP NOTES
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass
For example one could have the entire front end as one pass
The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking
As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77
SP NOTES
scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar is ambiguous if there is more than one possible parse tree for a given statement.
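To make the last item concrete, the following small sketch counts the parse trees that the ambiguous rule expr → expr + expr | id allows for a sum of identifiers. The function name count_parse_trees is hypothetical, and the counting argument (choose which + sits at the root, then count each side independently) is an illustration rather than something stated in the notes.

```python
def count_parse_trees(n_operands: int) -> int:
    """Number of parse trees for id + id + ... + id with n_operands identifiers."""
    if n_operands == 1:
        return 1  # a lone id has exactly one tree
    # Pick the '+' that becomes the root: k operands go left, the rest go right.
    return sum(
        count_parse_trees(k) * count_parse_trees(n_operands - k)
        for k in range(1, n_operands)
    )


for n in (1, 2, 3, 4):
    print(n, count_parse_trees(n))
```

Any statement with a count above one is enough to show that the grammar is ambiguous.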
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What passes a compiler makes over the source program, and how many, is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
   calls
Syntactic Analyzer
   calls                 calls
Contextual Analyzer      Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
   calls                 calls                   calls
Syntactic Analyzer       Contextual Analyzer     Code Generator
Data flow: Source Text → Syntactic Analyzer → AST → Contextual Analyzer → Decorated AST → Code Generator → object code
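The multi pass arrangement can be sketched as a driver that calls each component in turn and hands the stored result of one pass to the next. Everything here is a deliberately tiny assumption made for illustration: the source language is a single "a + b" expression, the AST is a tuple, and the pseudo-instructions LOAD and ADD are invented.

```python
def syntactic_analyzer(source_text):
    """Pass 1: source text -> AST (here just a tuple)."""
    left, right = (part.strip() for part in source_text.split("+"))
    return ("+", left, right)


def contextual_analyzer(ast, declared_types):
    """Pass 2: AST -> decorated AST (each operand annotated with its type)."""
    op, left, right = ast
    return (op, (left, declared_types[left]), (right, declared_types[right]))


def code_generator(decorated_ast):
    """Pass 3: decorated AST -> object code (readable pseudo-instructions)."""
    _, (left, _), (right, _) = decorated_ast
    return [f"LOAD {left}", f"ADD {right}"]


def compiler_driver(source_text, declared_types):
    """The driver calls each pass; each pass sees only the previous pass's output."""
    ast = syntactic_analyzer(source_text)
    decorated = contextual_analyzer(ast, declared_types)
    return code_generator(decorated)


print(compiler_driver("a + b", {"a": "int", "b": "int"}))
```

In a single pass design the driver would call only the syntactic analyzer, which would itself call the contextual analyzer and code generator as it recognizes each construct.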
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Context-free grammar Vs Context-sensitive grammar
In formal language theory, a context-free grammar (CFG) is a grammar in which every production rule is of the form
V → w
where V is a single nonterminal symbol and w is a string of terminals and/or nonterminals (possibly empty). The term context-free expresses the fact that nonterminals can be rewritten without regard to the context in which they occur. A formal language is context-free if some context-free grammar generates it.
Context-free grammars play a central role in the description and design of programming languages and compilers. They are also used for analyzing the syntax of natural languages.
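As an illustration of productions of the form V → w, the sketch below derives a string from a small context-free grammar by always rewriting the leftmost nonterminal. The grammar S → a S b | (empty) and the names GRAMMAR and derive are assumptions chosen for the example; they do not come from the notes.

```python
# Each production has a single nonterminal on the left and a (possibly empty)
# list of terminals and nonterminals on the right.
GRAMMAR = {
    "S": [["a", "S", "b"], []],  # S -> a S b | (empty)
}


def derive(start, choices):
    """Rewrite the leftmost nonterminal once per entry in `choices`.

    Each entry selects which alternative of that nonterminal to apply.
    Returns every sentential form seen along the way.
    """
    form = [start]
    steps = ["".join(form)]
    for choice in choices:
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append("".join(form))
    return steps


print(" => ".join(derive("S", [0, 0, 1])))
```

Note that the rewrite of S never inspects the symbols around it, which is exactly what "context-free" means.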
A context-sensitive grammar (CSG) is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols. Context-sensitive grammars are more general than context-free grammars, but still orderly enough to be parsed by a linear bounded automaton.
The concept of context-sensitive grammar was introduced by Noam Chomsky in the 1950s as a way to describe the syntax of natural language, where it is indeed often the case that a word may or may not be appropriate in a certain place depending upon the context. A formal language that can be described by a context-sensitive grammar is called a context-sensitive language.
Lecture_no 39 Backus-Naur Form(BNF)
In computer science, BNF (Backus Normal Form or Backus–Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets, and communication protocols. It is applied wherever exact descriptions of languages are needed, for instance, in official language specifications, in manuals, and in textbooks on programming language theory.
Introduction to BNF-
A BNF specification is a set of derivation rules, written as
<symbol> ::= __expression__
where <symbol> is a nonterminal and the __expression__ consists of one or more sequences of symbols; multiple sequences are separated by the vertical bar |, indicating a choice, the whole being a possible substitution for the symbol on the left. Symbols that never appear on a left side are terminals. On the other hand, symbols that appear on a left side are nonterminals and are always enclosed between the pair <>.
The ::= means that the symbol on the left must be replaced with the expression on the right.
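A BNF specification is easy to load into a data structure. The sketch below assumes the conventional ::= separator, nonterminals written as <name>, and terminals written in double quotes; it also assumes that no quoted terminal contains a vertical bar. The function names parse_bnf and terminals are hypothetical.

```python
import re


def parse_bnf(text):
    """Turn lines like '<a> ::= <b> "x" | <c>' into {lhs: [[symbol, ...], ...]}."""
    rules = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("::=")
        # Each alternative becomes a list of <nonterminal> or "terminal" symbols.
        rules[lhs.strip()] = [
            re.findall(r'<[^>]+>|"[^"]*"', alternative)
            for alternative in rhs.split("|")
        ]
    return rules


def terminals(rules):
    """Symbols that never appear on a left side are terminals."""
    used = {symbol for alts in rules.values() for alt in alts for symbol in alt}
    return used - set(rules)


SPEC = '''
<digit> ::= "0" | "1"
<number> ::= <digit> | <digit> <number>
'''
rules = parse_bnf(SPEC)
print(rules)
print(sorted(terminals(rules)))
```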
Formal Grammar-
In formal language theory, a grammar (when the context isn't given, often called a formal grammar for clarity) is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in whatever context, only their form.
Formal language theory, the discipline which studies formal grammars and languages, is a branch of applied mathematics. Its applications are found in theoretical computer science, theoretical linguistics, formal semantics, mathematical logic, and other areas.
Lecture_no 40 Canonic System
Canonic systems, which are applied variants of Post's canonical systems, have been used to define sets of strings of symbols recursively. A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems. One of the classes contains canonic systems which generate only recursive sets, including all context-sensitive languages. An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above. A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications, e.g., in constructing syntax-directed translators, proof-checking systems, and linguistic analysis. Some mathematical properties of the classes of canonic systems are also presented.
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful in explicitly and concisely defining sets of strings, such as computer languages. A canonic system can also be used as a basis for recognizing strings from the defined sets.
The recognition algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, the case of a canonic system in which all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
– Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program that takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into a stream of strings. In the synthesis part, the intermediate form of the source program produced by the analysis is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and then broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase, this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
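The token list above can be reproduced with a few regular expressions. The sketch assumes the example statement reads total = count + rate * 10 (the multiplication sign appears in the token list), and the token-type names and the tokenize function are hypothetical choices made for the illustration. Characters that match no pattern are silently skipped here; a real scanner would report them.

```python
import re

TOKEN_SPEC = [
    ("identifier", r"[A-Za-z_]\w*"),
    ("number", r"\d+"),
    ("assign", r"="),
    ("plus", r"\+"),
    ("times", r"\*"),
    ("skip", r"\s+"),  # blanks separate tokens but are not tokens themselves
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(statement):
    """Return (token type, lexeme) pairs in source order."""
    tokens = []
    for match in PATTERN.finditer(statement):
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
    return tokens


for token in tokenize("total = count + rate * 10"):
    print(token)
```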
– Syntax Analysis
• It is also called parsing.
• In this phase, the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called the parse tree or syntax tree.
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice, the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase being the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand, there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
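The mapping from lexemes to <token-name, attribute-value> pairs can be sketched as follows. The scan function is hypothetical; it assumes the statement ends in a semicolon, represents the symbol table as a plain list whose position (counted from 1) is the identifier's index, and simply keeps the text of a number as its attribute, which sidesteps the question raised in item 5.

```python
import re


def scan(source):
    """Return (tokens, symbol_table) for a one-line statement."""
    symbol_table = []  # entry i-1 holds the lexeme of identifier number i
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))
        else:
            tokens.append((lexeme, None))  # second component is ignored
    return tokens, symbol_table


print(scan("x3 = y + 3;"))
print(scan("x3   =y+ 3  ;")[0] == scan("x3 = y + 3;")[0])  # blanks do not matter
print(scan("x 3 = y + 3;")[0][:2])  # but splitting x3 changes the lexemes
```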
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
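A syntax tree with operators as interior nodes can be built by a very small parser. The sketch below is an assumption-laden illustration: it takes a list of lexemes without the trailing semicolon (so it parses the assignment expression), represents the tree as nested tuples, and resolves the ambiguity of expr → expr + expr by grouping to the left. The function names are hypothetical.

```python
def parse_expr(tokens):
    """Parse operand (+ operand)* into a left-leaning tree of ('+', left, right)."""
    tree = tokens[0]
    for i in range(1, len(tokens), 2):
        if tokens[i] != "+" or i + 1 >= len(tokens):
            raise SyntaxError("expected '+ operand'")
        tree = ("+", tree, tokens[i + 1])
    return tree


def parse_assignment(tokens):
    """Parse id = expr into ('=', id, expression tree)."""
    target, equals, *rest = tokens
    if equals != "=":
        raise SyntaxError("expected '='")
    return ("=", target, parse_expr(rest))


print(parse_assignment(["x3", "=", "y", "+", "3"]))
```

The operator = sits at the root with the target and the expression as its children, and + is an interior node whose children are its two operands.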
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one operand) conversion operators "int to real" and "real to int", as shown on the right.
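Inserting the conversion operators amounts to a walk over the syntax tree that compares operand types. The sketch below assumes a nested-tuple tree, only the two types int and real, binary operators only, and that an all-digit leaf is an integer constant. The names check_expr and check_assignment are hypothetical.

```python
def check_expr(node, types):
    """Return (tree, type), inserting inttoreal where an int meets a real."""
    if isinstance(node, str):
        return node, ("int" if node.isdigit() else types[node])
    op, left, right = node
    left, left_type = check_expr(left, types)
    right, right_type = check_expr(right, types)
    if left_type != right_type:
        # Mixed operands: promote the int side so the operation is done in real.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (op, left, right), "real"
    return (op, left, right), left_type


def check_assignment(node, types):
    """Convert the right-hand side to the declared type of the target."""
    _, target, expr = node
    expr, expr_type = check_expr(expr, types)
    if types[target] == "int" and expr_type == "real":
        expr = ("realtoint", expr)
    elif types[target] == "real" and expr_type == "int":
        expr = ("inttoreal", expr)
    return ("=", target, expr)


tree = ("=", "x3", ("+", "y", "3"))
print(check_assignment(tree, {"x3": "int", "y": "real"}))
```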
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal  temp1  3      --
add        temp2  id2    temp1
realtoint  temp3  temp2  --
assign     id1    temp3  --
Each "quad" has the form
operation target source1 source2
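Walking the semantic tree to emit quads can be sketched as a post-order traversal that allocates a fresh temporary for every interior node. The tuple tree format, the use of None for an unused source, and the function name gen_quads are assumptions made for this illustration.

```python
OP_NAMES = {"+": "add"}  # tree operator -> quad operation name


def gen_quads(tree):
    """Return a list of (operation, target, source1, source2) tuples."""
    quads = []
    counter = 0

    def walk(node):
        nonlocal counter
        if isinstance(node, str):
            return node  # identifiers and constants are used directly
        op, *args = node
        sources = [walk(arg) for arg in args]  # children first (post-order)
        counter += 1
        temp = f"temp{counter}"
        padding = [None] * (2 - len(sources))  # unary operators leave source2 empty
        quads.append((OP_NAMES.get(op, op), temp, *sources, *padding))
        return temp

    _, target, expr = tree
    quads.append(("assign", target, walk(expr), None))
    return quads


semantic_tree = ("=", "id1", ("realtoint", ("+", "id2", ("inttoreal", "3"))))
for quad in gen_quads(semantic_tree):
    print(quad)
```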
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
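Both optimizations can be expressed as one pass over the quads. The sketch below is a simplified illustration: it assumes quads are (operation, target, source1, source2) tuples, that constants are stored as text, and that a temporary copied by an assign is not used anywhere else, which holds for this example but would need checking in general. The function name optimize is hypothetical.

```python
def optimize(quads):
    """Fold inttoreal of a constant and merge a trailing copy into its producer."""
    constants = {}  # temporary name -> folded constant (as text)
    result = []
    for op, target, src1, src2 in quads:
        # Substitute any temporaries already known to be constants.
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and src1.isdigit():
            constants[target] = str(float(src1))  # convert at compile time
            continue
        if op == "assign" and result and result[-1][1] == src1:
            prev_op, _, a, b = result.pop()
            result.append((prev_op, target, a, b))  # write straight into target
            continue
        result.append((op, target, src1, src2))
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```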
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, 3.0   // add float
RTOI R2, R1        // real to int
ST   id1, R2
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
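A symbol table of this kind can be sketched as a small class. The class name, the method names, the type sizes, and the choice to record the storage location as a byte offset in the compiled program's data area are all assumptions made for the illustration.

```python
class SymbolTable:
    """Maps each variable name to its type and its storage offset."""

    SIZES = {"int": 4, "real": 8}  # assumed sizes in bytes

    def __init__(self):
        self.entries = {}
        self.next_offset = 0

    def declare(self, name, type_name):
        """Add a variable and return its 1-based index (as used in <id,1>)."""
        if name in self.entries:
            raise KeyError(f"{name} already declared")
        self.entries[name] = {"type": type_name, "offset": self.next_offset}
        self.next_offset += self.SIZES[type_name]
        return len(self.entries)

    def lookup(self, name):
        """Return the information later phases need about a variable."""
        return self.entries[name]


table = SymbolTable()
print(table.declare("x3", "int"))
print(table.declare("y", "real"))
print(table.lookup("y"))
```

The scanner can create entries, the semantic analyzer can read the type, and the code generator can read the offset, which is how the table carries information across phases.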
SP NOTES
In computer science BNF (Backus Normal Form or BackusndashNaur Form) is a notation technique for context-free grammars often used to describe the syntax of languages used in computing such as computer programming languages document formats instruction sets and communication protocols It is applied wherever exact descriptions of languages are needed for instance in official language specifications in manuals and in textbooks on programming language theory
Introduction to BNF-
A BNF specification is a set of derivation rules written as
ltsymbolgt = __expression__
where ltsymbolgt is a nonterminal and the __expression_consists of one or more sequences of symbols more sequences are separated by the vertical bar | indicating a choice the whole being a possible substitution for the symbol on the left Symbols that never appear on a left side are terminals On the other hand symbols that appear on a left side are non-terminals and are always enclosed between the pair ltgt
The = means that the symbol on the left must be replaced with the expression on the right
Formal Grammar-
In formal language theory a grammar (when the context isnt given often called a formal grammar for clarity) is a set of formation rules for strings in a formal language The rules describe how to form strings from the languages alphabet that are valid according to the languages syntax A grammar does not describe the meaning of the strings or what can be done with them in whatever contextmdashonly their form
Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 66
SP NOTES
Lecture_no 40 Canonic System
Canonic systems which are applied variants of Posts canonical systems have been used to define sets of strings of symbols recursively A hierarchy of classes of canonic systems is obtained by introducing restrictions on the general form of canonic systems One of the classes contains canonic systems which generate only recursive sets including all context sensitive languages An algorithm has been programmed in APL to parse strings over the alphabet of languages specified by canonic systems of the class mentioned above A general approach has been taken such that the canonic systems and the parsing algorithm cover a wide variety of applications as eg in constructing syntax directed translators proof checking systems and linguistical analysis Some mathematical properties of the classes of canonic systems are also presented
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 67
SP NOTES
Lecture_no41 Examples on Canonic system
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 68
SP NOTES
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems will prove very useful in explicitly and concisely defining sets of strings such as computer languageIf a canonic system could be used as a basis for recognizing strings from the defined sets
This algorithm is capable of recognizing strings produced by Backus-Naur system
In the case of a canonic system where all predicates are of degree one and no ldquocross-referencingrdquo is used
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators switch texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program which takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler -------------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into its constituent pieces, from which an intermediate form is built. In the synthesis part, this intermediate form of the source language is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
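The token list above can be reproduced with a very small scanner. The following is only a minimal Python sketch and is not part of the original notes; the token category names and the regular expressions in TOKEN_SPEC are assumptions chosen for illustration.

```python
import re

# Hypothetical token categories for this sketch; the notes name them only informally.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, lexeme) pairs, dropping blanks."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, lexeme in tokenize("total = count + rate * 10"):
    print(kind, lexeme)
```

The scanner only groups characters; it does not check whether the sequence of tokens forms a legal statement. That is left to syntax analysis.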
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 Functions of each phase of a compiler (part I)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N*M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable: about 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3  ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
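The mapping from lexemes to <token-name, attribute-value> pairs can be sketched in a few lines of Python. This is an illustration only, not the notes' own scanner: the function name scan, the single regular expression, and the choice to keep the spelling of a number as its attribute are all assumptions. The example statement is x3 = y + 3 followed by a terminating semicolon.

```python
import re


def scan(source):
    """Map lexemes to (token-name, attribute) pairs.

    Identifiers receive the 1-based index of their symbol-table entry.
    """
    symbol_table = []                       # entry i-1 holds the lexeme for <id, i>
    tokens = []
    # No recursion is needed: one flat pattern describes every lexeme.
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|\S", source):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))   # spelling kept; storage type decided later
        else:
            tokens.append((lexeme, None))       # attribute unused for = + ; and similar
    return tokens, symbol_table


tokens, table = scan("x3 =y+ 3 ;")
print(tokens)
print(table)
```

Different spacings of the same statement give the same token stream, while x 3 = y + 3 does not, because the blank separates x and 3 into two lexemes.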
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C, an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
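A syntax tree for the example can be built with a small hand-written parser. The Python sketch below is an assumption-laden illustration, not the notes' method: it takes an already split list of lexemes, represents the tree as nested tuples (operator, left, right), and replaces the left-recursive rule expr → expr + expr by the loop "operand followed by zero or more + operand", which always groups additions to the left. A trailing semicolon is accepted but not stored in the tree.

```python
def parse_assignment(tokens):
    """Parse  id = expr  (optionally followed by ';') into a nested-tuple syntax tree."""
    pos = 0

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        return token

    def operand():
        token = next_token()
        if not (token.isidentifier() or token.isdigit()):
            raise SyntaxError(f"expected id or number, got {token!r}")
        return token

    def expr():
        # expr -> operand ('+' operand)*   (left-associative stand-in for expr -> expr + expr)
        tree = operand()
        while pos < len(tokens) and tokens[pos] == "+":
            next_token()
            tree = ("+", tree, operand())
        return tree

    target = next_token()
    if not target.isidentifier():
        raise SyntaxError(f"expected identifier, got {target!r}")
    if next_token() != "=":
        raise SyntaxError("expected '='")
    tree = ("=", target, expr())
    if pos < len(tokens) and tokens[pos] == ";":
        next_token()                      # statement terminator, not part of the tree
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_assignment("x3 = y + 3 ;".split()))
```

The operator sits at the interior node and the operands are its children, which is the syntax-tree shape described above.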
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one operand) conversion operators "int to real" and "real to int" as shown on the right.
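The insertion of conversion operators can be sketched as a walk over the syntax tree. The Python below is a minimal illustration under stated assumptions: the tree is a nested tuple (operator, left, right), the DECLARED table stands in for the declarations (x3 integer, y real, as in the text), only + and = are handled, and integer literals are typed as int.

```python
DECLARED = {"x3": "int", "y": "real"}   # assumed declarations from the example


def annotate(tree):
    """Return (tree_with_conversions, type) for a name, an integer literal, '+' or '='."""
    if isinstance(tree, str):
        return (tree, "int") if tree.isdigit() else (tree, DECLARED[tree])
    op, left, right = tree
    if op == "+":
        left, left_type = annotate(left)
        right, right_type = annotate(right)
        if left_type == right_type:
            return ("+", left, right), left_type
        # Mixed int/real addition: widen the int operand with a unary conversion.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return ("+", left, right), "real"
    if op == "=":
        target_type = DECLARED[left]
        right, right_type = annotate(right)
        if right_type != target_type:
            conversion = "realtoint" if target_type == "int" else "inttoreal"
            right = (conversion, right)
        return ("=", left, right), target_type
    raise ValueError(f"unsupported operator {op!r}")


tree, _ = annotate(("=", "x3", ("+", "y", "3")))
print(tree)
```

For the example, the constant 3 is wrapped in inttoreal before the addition and the sum is wrapped in realtoint before the assignment.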
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions, one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each "quad" has the form
operation target source1 source2
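Walking the semantic tree to emit quads can be sketched as follows. This Python is illustrative only: the nested-tuple tree, the SYMBOLS mapping from names to id1 and id2, and the function name generate_quads are assumptions, and only the operators of the running example are supported. A missing source operand is written as None instead of --.

```python
SYMBOLS = {"x3": "id1", "y": "id2"}     # assumed symbol-table indices from the example


def generate_quads(tree):
    """Return a list of (operation, target, source1, source2) tuples."""
    quads = []
    temp_count = 0

    def new_temp():
        nonlocal temp_count
        temp_count += 1
        return f"temp{temp_count}"

    def walk(node):
        if isinstance(node, str):                 # leaf: identifier or constant
            return SYMBOLS.get(node, node)
        if len(node) == 2:                        # unary conversion operator
            operation, argument = node
            source = walk(argument)
            target = new_temp()
            quads.append((operation, target, source, None))
            return target
        operation, left, right = node
        if operation == "=":
            source = walk(right)
            quads.append(("assign", SYMBOLS[left], source, None))
            return SYMBOLS[left]
        if operation == "+":
            source1, source2 = walk(left), walk(right)
            target = new_temp()
            quads.append(("add", target, source1, source2))
            return target
        raise ValueError(f"unsupported operator {operation!r}")

    walk(tree)
    return quads


semantic_tree = ("=", "x3", ("realtoint", ("+", "y", ("inttoreal", "3"))))
for quad in generate_quads(semantic_tree):
    print(quad)
```

Each interior node of the tree yields exactly one quad, and a fresh temporary is created for every intermediate result, which matches the unlimited-register assumption.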
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
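Both optimizations can be expressed as one pass over the quads. The Python below is a sketch under explicit assumptions, not a general optimizer: quads are (operation, target, source1, source2) tuples, the code is straight-line, and every temporary is used exactly once, so merging an assign into the quad that produced its source is safe.

```python
def optimize(quads):
    """Fold inttoreal of a constant and merge a trailing assign into its producer."""
    folded = {}                                   # temporary -> real constant computed at compile time
    result = []
    for operation, target, source1, source2 in quads:
        source1 = folded.get(source1, source1)
        source2 = folded.get(source2, source2)
        if operation == "inttoreal" and source1.isdigit():
            folded[target] = source1 + ".0"       # conversion done by the compiler
            continue
        if operation == "assign" and result and result[-1][1] == source1:
            # Assumes the temporary has no other use: retarget the previous quad.
            previous_op, _, prev1, prev2 = result.pop()
            result.append((previous_op, target, prev1, prev2))
            continue
        result.append((operation, target, source1, source2))
    return result


quads = [
    ("inttoreal", "temp1", "3", None),
    ("add", "temp2", "id2", "temp1"),
    ("realtoint", "temp3", "temp2", None),
    ("assign", "id1", "temp3", None),
]
for quad in optimize(quads):
    print(quad)
```

The four quads shrink to two: an add with the constant 3.0 and a realtoint that stores directly into id1.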
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD   R1, id2
ADDF R1, R1, #3.0   // add float
RTOI R2, R1         // real to int
ST   id1, R2
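The step from optimized quads to register code can be sketched as below. This Python is a rough illustration only: the mnemonics LD, ADDF, RTOI and ST are taken from the example, but the naive register assignment (a fresh register for every load and conversion, no reuse, no spilling) and the "#" prefix for constants are assumptions, and only the add and realtoint operations are handled. It also assumes the first operand of add is a variable or temporary, never a constant.

```python
def generate_code(quads, memory_vars):
    """Translate (operation, target, source1, source2) quads into register instructions."""
    code = []
    register_of = {}                      # temporary -> register holding its value
    next_register = 1

    def fresh_register():
        nonlocal next_register
        register = f"R{next_register}"
        next_register += 1
        return register

    def operand(value):
        if value in register_of:
            return register_of[value]
        if value in memory_vars:          # variables must be loaded before use
            register = fresh_register()
            code.append(f"LD {register}, {value}")
            return register
        return f"#{value}"                # anything else is treated as a constant

    for operation, target, source1, source2 in quads:
        if operation == "add":
            left, right = operand(source1), operand(source2)
            code.append(f"ADDF {left}, {left}, {right}")   # result overwrites the left register
            result = left
        elif operation == "realtoint":
            source = operand(source1)
            result = fresh_register()
            code.append(f"RTOI {result}, {source}")
        else:
            raise ValueError(f"unsupported operation {operation!r}")
        if target in memory_vars:
            code.append(f"ST {target}, {result}")
        else:
            register_of[target] = result
    return code


optimized = [("add", "temp2", "id2", "3.0"), ("realtoint", "id1", "temp2", None)]
print("\n".join(generate_code(optimized, {"id1", "id2"})))
```

A real code generator must also decide which values to keep in the limited set of registers; this sketch sidesteps that by never running out.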
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
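A symbol table of this kind can be sketched as a small class. Everything specific here is an assumption made for illustration: the class and method names, the byte sizes in SIZES, and the use of a running offset into the compiled program's data area as the "storage location".

```python
SIZES = {"int": 4, "real": 8}            # assumed sizes in bytes, for illustration only


class SymbolTable:
    """Records, per variable, its index, its type, and where the compiled program will keep it."""

    def __init__(self):
        self.entries = {}
        self.next_offset = 0             # next free offset in the compiled program's data area

    def declare(self, name, type_name):
        if name in self.entries:
            raise KeyError(f"{name} is already declared")
        entry = {
            "index": len(self.entries) + 1,
            "type": type_name,
            "offset": self.next_offset,
        }
        self.next_offset += SIZES[type_name]
        self.entries[name] = entry
        return entry

    def lookup(self, name):
        return self.entries[name]


table = SymbolTable()
table.declare("x3", "int")
table.declare("y", "real")
print(table.lookup("y"))
```

The offset describes the running program's memory, not the compiler's: the compiler itself keeps the entry in an ordinary dictionary.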
Lecture_no 45 Phases of a compiler (part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
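The idea of the parser calling the scanner on demand, instead of reading a token file produced by an earlier pass, can be sketched with a Python generator. This is an illustration under assumptions: the function names, the log list used to show the order of events, and the restricted input form (id = operand + operand + ...) are not from the notes, and tree building stands in for the call to the intermediate-code generator.

```python
import re


def scanner(source, log):
    """Yield one lexeme at a time; nothing is scanned until the parser asks for it."""
    for match in re.finditer(r"[A-Za-z_]\w*|\d+|\S", source):
        log.append("scan " + match.group())
        yield match.group()


def parse(source):
    """Parse  id = operand + operand + ...  while pulling tokens from the scanner."""
    log = []
    tokens = scanner(source, log)
    target = next(tokens, None)
    if target is None or next(tokens, None) != "=":
        raise SyntaxError("expected: id = ...")
    tree = next(tokens, None)
    if tree is None:
        raise SyntaxError("missing right-hand side")
    for operator in tokens:
        right = next(tokens, None)
        if operator != "+" or right is None:
            raise SyntaxError("expected: + operand")
        tree = ("+", tree, right)
        log.append("build + node")        # stand-in for calling the code generator
    return ("=", target, tree), log


tree, log = parse("a = b + c + d")
print(tree)
print(log)
```

The log shows scanning and tree building interleaved in a single pass over the input: the first + node is built before the second + has been scanned.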
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
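Ambiguity can be made concrete by counting parse trees. The Python sketch below assumes the small grammar expr → expr + expr | id (an illustration, not a grammar fixed by this list) and counts the distinct parse trees for a sum of a given number of operands by choosing, in every possible way, which + is applied last.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_tree_count(operands):
    """Number of parse trees for id + id + ... + id under expr -> expr + expr | id."""
    if operands == 1:
        return 1                          # a single id has exactly one tree
    # The top-level + splits the operands into a left part of size k and a right part.
    return sum(
        parse_tree_count(k) * parse_tree_count(operands - k)
        for k in range(1, operands)
    )


for n in range(1, 5):
    print(n, parse_tree_count(n))
```

With three operands there are already two trees, (id + id) + id and id + (id + id), so the grammar is ambiguous.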
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
    calls
Syntactic Analyzer
    calls                  calls
Contextual Analyzer        Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
    calls                  calls                  calls
Syntactic Analyzer         Contextual Analyzer    Code Generator
Data flow (input -> output): Source Text -> Syntactic Analyzer -> AST -> Contextual Analyzer -> Decorated AST -> Code Generator -> object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, which each made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
compiled in a single pass (eg Pascal)
Lecture_no47 Surprise Test
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79
SP NOTES
THANK YOU
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80
Lecture_no41 Examples on Canonic system
Lecture_no 42 Recognition and Translation Algorithm
Canonic systems prove very useful for explicitly and concisely defining sets of strings, such as computer languages. It is natural to ask whether a canonic system can also be used as a basis for recognizing strings from the defined sets.
The recognition algorithm is capable of recognizing strings produced by a Backus-Naur system, that is, a canonic system in which all predicates are of degree one and no "cross-referencing" is used.
Lecture_no 43 Compiler
Compilers are basically "language translators".
- Language translators convert texts from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
- A compiler is a program that takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler ---------> Target program
Compiler Analysis-Synthesis Model -
In the analysis part, the source program is read and broken into its constituent pieces, from which an intermediate form of the program is built. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
- Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
- Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
- Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of Compiler-
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e., modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler-
- Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase, this statement is broken up into a series of tokens as follows:
- The identifier total
- The assignment symbol =
- The identifier count
- The plus sign +
- The identifier rate
- The multiplication sign *
- The constant number 10
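The token list above can be reproduced with a small scanner. The following is a minimal sketch, not the scanner of any particular compiler: the token-kind names (IDENTIFIER, NUMBER, ASSIGN, PLUS, TIMES) and the function name scan are assumptions made for illustration, and only the symbols used in the example statement are handled.

```python
import re

# Token kinds and the patterns that recognize them (names are illustrative).
TOKEN_SPEC = [
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",     r"\d+"),
    ("ASSIGN",     r"="),
    ("PLUS",       r"\+"),
    ("TIMES",      r"\*"),
    ("SKIP",       r"\s+"),   # blanks separate tokens but are not tokens
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def scan(source):
    """Break a source string into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    for kind, lexeme in scan("total = count + rate * 10"):
        print(kind, lexeme)
```

Running the sketch on the example statement yields seven tokens, in the same order as the list above.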
- Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 Functions of each phase of a compiler(part I)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green in the figure on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need only N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M~30, so the savings are considerable (roughly 37 components instead of roughly 210).
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =   y   +   3   ;
x3   =y+ 3  ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
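The mapping from lexemes to <token-name, attribute-value> pairs can be sketched as follows. This is only an illustration under stated assumptions: the function name tokenize, the use of a Python list as the symbol table, and the choice to keep the digits of a number as its attribute (one of the options mentioned in item 5) are not taken from the notes. The patterns use no recursion, and blanks between lexemes are simply skipped.

```python
import re

# One lexeme per match: an identifier, a number, or one of the symbols = + ;
LEXEME = re.compile(
    r"\s*(?:(?P<id>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<sym>[=+;]))"
)


def tokenize(source):
    """Return (tokens, symbol_table) for a statement such as 'x3 = y + 3;'."""
    symbol_table = []   # entry i-1 holds the i-th distinct identifier
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        m = LEXEME.match(source, pos)
        if m is None:
            raise SyntaxError(f"bad input at position {pos}")
        if m.group("id"):
            name = m.group("id")
            if name not in symbol_table:
                symbol_table.append(name)
            # The attribute is the 1-based index into the symbol table.
            tokens.append(("id", symbol_table.index(name) + 1))
        elif m.group("number"):
            # Assumption: keep the digits themselves as the attribute.
            tokens.append(("number", m.group("number")))
        else:
            # Symbols need no attribute; the second component is ignored.
            tokens.append((m.group("sym"), None))
        pos = m.end()
    return tokens, symbol_table


if __name__ == "__main__":
    print(tokenize("x3 = y + 3;"))
```

With this sketch the differently spaced versions of the statement produce the same token list, while x 3 = y + 3; does not, because x and 3 become two separate lexemes.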
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
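A syntax tree with operators as interior nodes can be built by a small hand-written parser. The sketch below is illustrative only: the nested-tuple tree representation and the name parse_assignment are assumptions, and the rule expr → expr + expr is handled with a loop that makes + left-associative, which is one common way to implement such a rule rather than something the notes prescribe.

```python
def parse_assignment(tokens):
    """Parse 'id = expr ;' from (name, attribute) tokens into a syntax tree."""
    pos = 0

    def expect(name):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != name:
            raise SyntaxError(f"expected {name!r} at token {pos}")
        tok = tokens[pos]
        pos += 1
        return tok

    def operand():
        nonlocal pos
        if pos < len(tokens) and tokens[pos][0] in ("id", "number"):
            tok = tokens[pos]
            pos += 1
            return tok
        raise SyntaxError(f"expected id or number at token {pos}")

    def expr():
        # expr -> expr + expr, implemented as a loop (left-associative +).
        nonlocal pos
        tree = operand()
        while pos < len(tokens) and tokens[pos][0] == "+":
            pos += 1
            tree = ("+", tree, operand())
        return tree

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")   # in C the semicolon terminates the statement
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after ';'")
    # Operator as interior node, operands as its children.
    return ("=", target, value)


if __name__ == "__main__":
    stmt = [("id", 1), ("=", None), ("id", 2), ("+", None),
            ("number", "3"), (";", None)]
    print(parse_assignment(stmt))
```

For the example statement the result is an = node whose children are id 1 and a + node over id 2 and the number 3.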
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one operand) conversion operators "int to real" and "real to int", as shown on the right.
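The insertion of conversion operators can be sketched as a walk over the syntax tree. The nested-tuple tree, the id_types dictionary (standing in for type information from the symbol table), and the function name decorate are all assumptions for this illustration; only the two types int and real are handled.

```python
def decorate(node, id_types):
    """Return (tree, type), inserting conversion nodes where types differ."""
    kind = node[0]
    if kind == "id":
        return node, id_types[node[1]]
    if kind == "number":
        return node, ("real" if "." in node[1] else "int")
    if kind == "+":
        left, left_type = decorate(node[1], id_types)
        right, right_type = decorate(node[2], id_types)
        if left_type == right_type:
            return ("+", left, right), left_type
        # Mixed operands: convert the int operand to real.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return ("+", left, right), "real"
    if kind == "=":
        target, target_type = decorate(node[1], id_types)
        value, value_type = decorate(node[2], id_types)
        if target_type != value_type:
            # Convert the value to the type of the variable being assigned.
            op = "realtoint" if target_type == "int" else "inttoreal"
            value = (op, value)
        return ("=", target, value), target_type
    raise ValueError(f"unknown node kind {kind!r}")


if __name__ == "__main__":
    tree = ("=", ("id", 1), ("+", ("id", 2), ("number", "3")))
    types = {1: "int", 2: "real"}   # x3 is an integer, y is a real
    print(decorate(tree, types)[0])
```

With x3 an integer and y a real, the constant 3 is wrapped in inttoreal and the sum is wrapped in realtoint before the assignment.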
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each "quad" has the form
operation target source1 source2
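Walking the tree to produce quadruples can be sketched in a few lines. The tree below is written out by hand as nested tuples with the conversion operators already inserted; that representation, the function name gen_quads, and the use of None for an unused source (shown as -- in the notes) are assumptions for illustration.

```python
def gen_quads(tree):
    """Walk an assignment tree and emit (operation, target, source1, source2)."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "number":
            return node[1]
        if kind in ("inttoreal", "realtoint"):
            source = walk(node[1])
            temp = new_temp()
            quads.append((kind, temp, source, None))   # unary: one source
            return temp
        if kind == "+":
            left = walk(node[1])
            right = walk(node[2])
            temp = new_temp()
            quads.append(("add", temp, left, right))
            return temp
        raise ValueError(f"unknown node kind {kind!r}")

    _, target, value = tree
    target_name = walk(target)
    value_name = walk(value)
    quads.append(("assign", target_name, value_name, None))
    return quads


if __name__ == "__main__":
    tree = ("=", ("id", 1),
            ("realtoint", ("+", ("id", 2), ("inttoreal", ("number", "3")))))
    for quad in gen_quads(tree):
        print(quad)
```

For the example tree the sketch emits the same four quads as in the text, in the same order.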
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see:
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
   add temp2 id2 3.0
2. The last two quads can be combined into
   realtoint id1 temp2
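The two optimizations can be sketched as simple rewrites over a list of quads. This is a toy illustration, not a general optimizer: the function name optimize is an assumption, quads are (operation, target, source1, source2) tuples with None for an unused source, and the second rewrite assumes each temporary is used only once.

```python
def optimize(quads):
    """Apply constant conversion and assignment merging to a list of quads."""
    # 1. Perform inttoreal on integer constants at compile time and
    #    substitute the converted constant wherever the temporary was used.
    constants = {}
    folded = []
    for op, target, src1, src2 in quads:
        src1 = constants.get(src1, src1)
        src2 = constants.get(src2, src2)
        if op == "inttoreal" and isinstance(src1, str) and src1.isdigit():
            constants[target] = f"{src1}.0"
            continue
        folded.append((op, target, src1, src2))

    # 2. Combine "op temp ..." followed by "assign x temp" into "op x ...".
    #    Assumption: the temporary is not needed anywhere else.
    result = []
    for quad in folded:
        op, target, src1, src2 = quad
        if (op == "assign" and result
                and result[-1][1] == src1 and src1.startswith("temp")):
            prev_op, _, prev1, prev2 = result.pop()
            result.append((prev_op, target, prev1, prev2))
        else:
            result.append(quad)
    return result


if __name__ == "__main__":
    quads = [
        ("inttoreal", "temp1", "3", None),
        ("add", "temp2", "id2", "temp1"),
        ("realtoint", "temp3", "temp2", None),
        ("assign", "id1", "temp3", None),
    ]
    for quad in optimize(quads):
        print(quad)
```

Applied to the four quads of the running example, the sketch leaves the two quads given above: the add with the constant 3.0 and the realtoint that writes directly to id1.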
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2:
LD   R1, id2
ADDF R1, R1, 3.0    // add float
RTOI R2, R1         // real to int
ST   id1, R2
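A very small code generator for the two optimized quads can be sketched as below. It is a toy under several assumptions that are not from the notes: the function name gen_code, a naive "next free register" allocation, an add whose second source is always a constant, and every add being a floating-point add. The instruction names LD, ADDF, RTOI and ST are the ones used in the example above.

```python
def gen_code(quads):
    """Translate (operation, target, source1, source2) quads to instructions."""
    lines = []
    reg_of = {}      # register currently holding each variable or temporary
    next_reg = 1

    def fresh():
        nonlocal next_reg
        reg = f"R{next_reg}"
        next_reg += 1
        return reg

    def load(name):
        # Bring a value into a register unless it is already in one.
        if name not in reg_of:
            reg_of[name] = fresh()
            lines.append(f"LD {reg_of[name]}, {name}")
        return reg_of[name]

    for op, target, src1, src2 in quads:
        if op == "add":
            result = load(src1)
            # Assumption: src2 is a constant usable as an immediate operand.
            lines.append(f"ADDF {result}, {result}, {src2}")
            reg_of.pop(src1, None)   # the register no longer holds src1
        elif op == "realtoint":
            source = load(src1)
            result = fresh()
            lines.append(f"RTOI {result}, {source}")
        else:
            raise ValueError(f"unsupported operation {op!r}")
        if target.startswith("id"):
            lines.append(f"ST {target}, {result}")   # variables live in memory
        else:
            reg_of[target] = result                  # temporaries stay in registers
    return lines


if __name__ == "__main__":
    optimized = [("add", "temp2", "id2", "3.0"),
                 ("realtoint", "id1", "temp2", None)]
    print("\n".join(gen_code(optimized)))
```

A real code generator must also handle running out of registers, which this sketch ignores.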
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
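A symbol table holding type information and storage location might look like the following sketch. The class name SymbolTable, the sizes of 4 and 8 bytes, and the layout of variables at consecutive offsets without alignment padding are assumptions chosen for illustration. The offset recorded here is where the compiled program would keep the variable, not where the compiler keeps the entry.

```python
# Assumed sizes in bytes for the two types used in the running example.
SIZES = {"int": 4, "real": 8}


class SymbolTable:
    """Maps each declared name to its type and run-time storage offset."""

    def __init__(self):
        self.entries = {}
        self.next_offset = 0

    def declare(self, name, type_name):
        if name in self.entries:
            raise KeyError(f"{name} is already declared")
        entry = {"type": type_name, "offset": self.next_offset}
        self.entries[name] = entry
        self.next_offset += SIZES[type_name]   # next variable goes after this one
        return entry

    def lookup(self, name):
        return self.entries[name]


if __name__ == "__main__":
    table = SymbolTable()
    table.declare("x3", "int")
    table.declare("y", "real")
    print(table.lookup("x3"), table.lookup("y"))
```

Later phases would call lookup to find the type when checking an expression and the offset when generating loads and stores.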
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
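The idea of the parser calling the scanner on demand, and calling the code generator at selected points, can be sketched with a Python generator. Everything here is an illustrative assumption: the function names scanner and compile_assignment, the trace list that records the order of events, and the restriction to statements of the form id = operand + operand ... ;. Semantic checks are omitted.

```python
import re


def scanner(source, trace):
    """Yield one lexeme at a time; scanning happens only when asked."""
    for m in re.finditer(r"[A-Za-z_]\w*|\d+|[=+;]", source):
        trace.append(("scan", m.group()))
        yield m.group()


def compile_assignment(source):
    """Parse and generate code in one pass; return the order of events."""
    trace = []
    tokens = scanner(source, trace)
    temp_count = 0

    def emit(line):
        # Stands in for the call to the intermediate-code generator.
        trace.append(("emit", line))

    target = next(tokens, None)
    if next(tokens, None) != "=":
        raise SyntaxError("expected '='")
    left = next(tokens, None)
    tok = next(tokens, None)
    while tok == "+":
        right = next(tokens, None)
        temp_count += 1
        temp = f"temp{temp_count}"
        emit(f"{temp} = {left} + {right}")
        left = temp
        tok = next(tokens, None)
    if tok != ";":
        raise SyntaxError("expected ';'")
    emit(f"{target} = {left}")
    return trace


if __name__ == "__main__":
    for event in compile_assignment("x3 = y + 3;"):
        print(event)
```

The trace shows scanning and code emission interleaved: the first instruction is emitted before the final semicolon has been scanned, so no intermediate token file is ever written or re-read.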
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving it into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language; the parser checks the token stream against it.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar is ambiguous if there is more than one possible parse tree for a given statement.
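Ambiguity can be made concrete by counting parse trees. The sketch below counts the parse trees of a token sequence under the rule expr → number | id | expr + expr; the function name count_parse_trees and the representation of tokens as plain strings are assumptions for illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of tokens under expr -> number | id | expr + expr."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from expr.
        if hi - lo == 1:
            return 1 if tokens[lo] in ("id", "number") else 0
        total = 0
        # Try every + as the top-level operator of expr + expr.
        for k in range(lo + 1, hi - 1):
            if tokens[k] == "+":
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


if __name__ == "__main__":
    print(count_parse_trees(["id", "+", "number"]))            # 1
    print(count_parse_trees(["id", "+", "id", "+", "id"]))     # 2
```

A sum of two operands has a single parse tree, but a sum of three has two, one grouping the left pair first and one grouping the right pair first, so this grammar is ambiguous.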
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What passes a compiler makes over the source program, and how many, is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
Compiler Driver
  calls Syntactic Analyzer
    calls Contextual Analyzer
    calls Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
Compiler Driver
  calls Syntactic Analyzer: input Source Text, output AST
  calls Contextual Analyzer: input AST, output Decorated AST
  calls Code Generator: input Decorated AST, output object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with
2 add temp2 id2 30
3 The last two quads can be combined into 4 realtoint id1 temp2
Code generation
Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2
LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2
127 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location
A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76
SP NOTES
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass
For example one could have the entire front end as one pass
The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking
As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77
SP NOTES
scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code
Compilers
1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units
called token (token type token value)
2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language
3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space
4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as
possible
5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the
programming language
6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)
7 Ambiguous GrammarThere is more than one possible parse tree for a given statement
Lecture_no 46 Passes of a compiler
Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision
(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler
Compiler Driver calls
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78
SP NOTES
Syntactic Analyzercalls calls
Contextual Analyzer Code Generator
(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler
Compiler Driver
calls calls calls
Syntactic Analyzer Contextual Analyzer Code GeneratorI
I O I O O
Source Text AST Decorated AST object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers
Compiling involves performing lots of work and early computers did not have enough memory to contain one program
that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or
some representation of it) performing some of the required analysis and translations
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one
pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be
compiled in a single pass (eg Pascal)
Lecture_no47 Surprise Test
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79
SP NOTES
THANK YOU
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80
Lecture_no 43 Compiler
Compilers are basically “language translators”.
– A language translator converts text from one language into another, making sure that the translated version conforms to the grammar and style rules of the target language.
– A compiler is a program that takes a program in one language (the source program) as input and converts it into an equivalent program in another language (the target program).
Source program ---------> Compiler ---------> Target program
Compiler Analysis-Synthesis Model
In the analysis part, the source program is read and broken into a stream of strings, from which an intermediate form of the program is built. In the synthesis part, this intermediate form of the source program is taken and converted into an equivalent target program.
-> Analysis Part
• The analysis part is carried out in three sub-parts:
– Lexical Analysis
• In this part the source program is read and broken into a stream of strings. Such strings are called tokens (a token is a collection of characters having some meaning).
– Syntax Analysis
• In this step the tokens are arranged in a hierarchical structure that ultimately helps in finding the syntax of the source string.
– Semantic Analysis
• In this step the meaning of the source string is determined.
Properties of a Compiler
• It must be bug free.
• It must generate correct machine code.
• The generated machine code must run fast.
• The compiler itself must run fast (compilation time must be proportional to program size).
• The compiler must be portable (i.e. modular, supporting separate compilation).
• It must give good diagnostics and error messages.
• The generated code must work well with existing debuggers.
Phases of a compiler
– Lexical Analysis
• It is also called scanning.
• It breaks the complete source code into tokens.
For example: total = count + rate * 10
• In the lexical analysis phase this statement is broken up into a series of tokens as follows:
– The identifier total
– The assignment symbol
– The identifier count
– The plus sign
– The identifier rate
– The multiplication sign
– The constant number 10
– Syntax Analysis
• It is also called parsing.
• In this phase the tokens generated by the lexical analyzer are grouped together to form a hierarchical structure.
• It determines the structure of the source string by grouping the tokens together.
• The hierarchical structure generated in this phase is called a parse tree or syntax tree.
Lecture_no 44 Functions of each phase of a compiler (part I)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M≈30, so the savings are considerable: about 37 components instead of about 210 compilers.
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually, there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
x3 = y + 3;
x3  =  y  +  3  ;
x3 =y+ 3 ;
but not
x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
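The mapping from lexemes to tokens described above can be sketched as a small scanner. This is only an illustration: the regular expressions, the function name scan, and the choice to keep the number's text as its attribute are assumptions of this sketch, not something the notes prescribe. It also assumes the statement ends with a semicolon.

```python
import re

# Token classes; the names "id" and "number" follow the notes,
# the regular expressions themselves are assumptions of this sketch.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+;]"),
    ("skip",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    """Return (tokens, symbol_table) for a statement such as 'x3 = y + 3;'."""
    symbols = []   # symbol table: entry number i+1 holds the i-th distinct identifier
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                               # non-significant blanks are dropped
        if kind == "id":
            if lexeme not in symbols:
                symbols.append(lexeme)
            tokens.append(("id", symbols.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))      # keep the text; storage format is decided later
        else:
            tokens.append((lexeme, None))          # second component is ignored for symbols
    return tokens, symbols
```

Calling scan("x3 = y + 3;") and scan("x3 =y+ 3 ;") gives the same token list, while scan("x 3 = y + 3;") does not, because x and 3 become separate lexemes.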
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
asst-stmt → id = expr ;
expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably if a recursive definition is involved it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
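A minimal sketch of a parser for the two grammar rules above, producing a syntax tree as nested tuples. The function names and the tuple representation are assumptions made for illustration. Because the rule expr → expr + expr is left-recursive, the sketch handles it as "operand followed by zero or more + operand", which is a common rewrite and makes + left-associative; the notes themselves do not specify this.

```python
import re

def tokenize(text):
    # Minimal scanner for this sketch: identifiers, integers and the symbols = + ;
    # (any other character is silently ignored, which a real scanner would not do).
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", text)

def parse_operand(tokens, i):
    # expr -> number | id
    tok = tokens[i] if i < len(tokens) else None
    if tok is None or tok in {"=", "+", ";"}:
        raise SyntaxError(f"operand expected at token {i}")
    kind = "number" if tok.isdigit() else "id"
    return (kind, tok), i + 1

def parse_expr(tokens, i):
    # expr -> expr + expr, handled as: operand (+ operand)*
    left, i = parse_operand(tokens, i)
    while i < len(tokens) and tokens[i] == "+":
        right, i = parse_operand(tokens, i + 1)
        left = ("+", left, right)          # operator is the interior node
    return left, i

def parse_assignment(tokens):
    # asst-stmt -> id = expr ;
    target, i = parse_operand(tokens, 0)
    if target[0] != "id" or i >= len(tokens) or tokens[i] != "=":
        raise SyntaxError("expected: id = expr ;")
    value, i = parse_expr(tokens, i + 1)
    if tokens[i:] != [";"]:
        raise SyntaxError("expected ';' at end of statement")
    return ("=", target, value)
```

For the running example, parse_assignment(tokenize("x3 = y + 3;")) yields a tree with = at the root, the identifier x3 as its left child, and a + node over y and 3 as its right child.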
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g. the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e. one-operand) conversion operators “int to real” and “real to int”, as shown on the right.
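The insertion of conversion operators can be sketched as a walk over the syntax tree. The tuple-based tree shape, the function name annotate, and the restriction to just the two types int and real are assumptions of this sketch.

```python
def annotate(node, types):
    """Return (tree_with_conversions, type) for a syntax tree given declared types."""
    op = node[0]
    if op == "number":
        return node, ("int" if node[1].isdigit() else "real")
    if op == "id":
        if node[1] not in types:
            raise NameError(f"undeclared identifier {node[1]}")   # a semantic error
        return node, types[node[1]]
    if op == "+":
        left, lt = annotate(node[1], types)
        right, rt = annotate(node[2], types)
        if lt == rt:
            return ("+", left, right), lt
        # Mixed int/real addition: convert the int operand to real.
        if lt == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return ("+", left, right), "real"
    if op == "=":
        target, tt = annotate(node[1], types)
        value, vt = annotate(node[2], types)
        if tt != vt:
            # Convert the value to the type of the variable being assigned.
            value = ("realtoint" if tt == "int" else "inttoreal", value)
        return ("=", target, value), tt
    raise ValueError(f"unknown node {op}")
```

With y declared real and x3 declared int, the tree for x3 = y + 3 gains an inttoreal node around the constant 3 and a realtoint node around the sum.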
Intermediate code generation
Many compilers first generate code for an “idealized machine”. For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two sources and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
temp1 = inttoreal(3)
temp2 = id2 + temp1
temp3 = realtoint(temp2)
id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples, because one can view the previous code sequence as
inttoreal temp1 3     --
add       temp2 id2   temp1
realtoint temp3 temp2 --
assign    id1   temp3 --
Each “quad” has the form
operation target source1 source2
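Walking the semantic tree to produce quads can be sketched as a post-order traversal that allocates a fresh temporary for every interior node. The tuple tree shape, the name gen_quads, and the use of None for an unused source (shown as -- above) are assumptions of this sketch.

```python
from itertools import count

OP_NAMES = {"+": "add"}   # tree operator -> quad operation name

def gen_quads(tree, symbol_index):
    """Return a list of (operation, target, source1, source2) for an assignment tree."""
    if tree[0] != "=":
        raise ValueError("expected an assignment at the root")
    quads, temps = [], count(1)

    def walk(node):
        op = node[0]
        if op == "id":
            return f"id{symbol_index[node[1]]}"      # refer to the symbol-table entry
        if op == "number":
            return node[1]
        sources = [walk(child) for child in node[1:]]   # children first (post-order)
        sources += [None] * (2 - len(sources))          # pad unary operations
        target = f"temp{next(temps)}"
        quads.append((OP_NAMES.get(op, op), target, sources[0], sources[1]))
        return target

    value = walk(tree[2])
    quads.append(("assign", walk(tree[1]), value, None))
    return quads
```

Applied to the tree for the running example (with x3 as entry 1 and y as entry 2), this yields the same four quads as listed above, in the same order.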
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
add temp2 id2 3.0
2. The last two quads can be combined into
realtoint id1 temp2
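The two optimizations just described can be sketched as two small passes over the list of quads. The function name and the quad tuple layout are assumptions, and the second step assumes the temporary being eliminated is not used anywhere else, which holds in the example but would need checking in general.

```python
def optimize(quads):
    """Apply the two optimizations from the text to (op, target, src1, src2) quads."""
    # 1. Perform inttoreal on integer constants at compile time.
    constants = {}
    folded = []
    for op, target, s1, s2 in quads:
        s1 = constants.get(s1, s1)             # substitute already-folded temporaries
        s2 = constants.get(s2, s2)
        if op == "inttoreal" and isinstance(s1, str) and s1.isdigit():
            constants[target] = s1 + ".0"      # e.g. 3 becomes 3.0
            continue                           # the conversion quad disappears
        folded.append((op, target, s1, s2))
    # 2. Merge "op tempN ..." directly followed by "assign x tempN" into "op x ...".
    #    Assumes tempN is not referenced again later.
    result = []
    for quad in folded:
        op, target, s1, s2 = quad
        if op == "assign" and result and result[-1][1] == s1 and s1.startswith("temp"):
            prev_op, _, p1, p2 = result.pop()
            result.append((prev_op, target, p1, p2))
        else:
            result.append(quad)
    return result
```

On the four quads of the running example this leaves two: an add of id2 and 3.0 into temp2, and a realtoint from temp2 into id1.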
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g. the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
LD R1, id2
ADDF R1, R1, 3.0 // add float
RTOI R2, R1 // real to int
ST id1, R2
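A rough sketch of how the optimized quads could be turned into the instruction sequence above. The instruction mnemonics are the ones used in the notes; the function name, the naive "always take a fresh register" allocation, and the assumption that the first operand of add is an identifier or temporary (never a constant) are simplifications made for illustration. The register limit is ignored entirely.

```python
def generate(quads):
    """Translate optimized quads into instructions for a hypothetical register machine."""
    code = []
    reg_of = {}        # temporary name -> register currently holding its value
    next_reg = 1

    def new_reg():
        nonlocal next_reg
        reg = f"R{next_reg}"
        next_reg += 1
        return reg

    def operand(src):
        if src in reg_of:                  # temporaries already live in a register
            return reg_of[src]
        if src.startswith("id"):           # variables are loaded from memory
            reg = new_reg()
            code.append(f"LD {reg}, {src}")
            return reg
        return src                         # constants are used as immediates

    for op, target, s1, s2 in quads:
        if op == "add":
            a, b = operand(s1), operand(s2)
            code.append(f"ADDF {a}, {a}, {b}")   # result overwrites the first source
            result = a
        elif op == "realtoint":
            a = operand(s1)
            result = new_reg()
            code.append(f"RTOI {result}, {a}")
        else:
            raise NotImplementedError(op)
        if target.startswith("id"):
            code.append(f"ST {target}, {result}")   # store back to the variable
        else:
            reg_of[target] = result
    return code
```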
1.2.7 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
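A minimal sketch of such a table, holding a type and a storage location for each variable. The class and field names, the byte sizes, and the simple "next free offset" layout are all assumptions of this sketch; the offset is where the compiled program would keep the variable, not an address inside the compiler.

```python
from dataclasses import dataclass

SIZES = {"int": 4, "real": 8}   # byte sizes are assumptions for the sketch

@dataclass
class Entry:
    name: str
    type: str
    offset: int      # location the compiled program will use for the variable

class SymbolTable:
    def __init__(self):
        self.entries = []        # list position i holds entry number i + 1
        self.by_name = {}
        self.next_offset = 0

    def declare(self, name, type_):
        if name in self.by_name:
            raise KeyError(f"{name} already declared")
        entry = Entry(name, type_, self.next_offset)
        self.next_offset += SIZES[type_]
        self.entries.append(entry)
        self.by_name[name] = len(self.entries)
        return self.by_name[name]      # the index used in tokens such as <id,1>

    def lookup(self, index):
        return self.entries[index - 1]
```

For example, declaring x3 as int and then y as real gives them indices 1 and 2, and later phases can call lookup(2) to find the type and offset of y.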
Lecture_no 45 Phases of a compiler (part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e. a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
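The idea of the parser calling the scanner on demand, and emitting intermediate code at selected points, can be sketched with a Python generator. The function names, the log used to show the interleaving, and the tiny statement form "id = operand + operand + ... ;" are assumptions of this sketch; semantic analysis is left out to keep it short.

```python
import re

def scanner(text, log):
    """Generator: produces one token each time the parser asks for the next one."""
    for m in re.finditer(r"[A-Za-z_]\w*|\d+|[=+;]", text):
        log.append(f"scan {m.group()}")
        yield m.group()

def parse_and_emit(text):
    """Single pass over the input: scanning, parsing and code emission interleaved."""
    log, code = [], []
    tokens = scanner(text, log)
    advance = lambda: next(tokens, None)     # None signals end of input
    target = advance()
    if advance() != "=":
        raise SyntaxError("expected =")
    left = advance()
    temp = 0
    tok = advance()
    while tok == "+":
        right = advance()
        temp += 1
        code.append(f"temp{temp} = {left} + {right}")   # parser calls the code generator here
        log.append(f"emit temp{temp}")
        left = f"temp{temp}"
        tok = advance()
    if tok != ";":
        raise SyntaxError("expected ;")
    code.append(f"{target} = {left}")
    return code, log
```

The returned log shows that the first instruction is emitted before the rest of the statement has been scanned, i.e. no intermediate token file is ever written or re-read.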
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each piece of intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form (BNF) grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: There is more than one possible parse tree for a given statement.
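Ambiguity can be made concrete by counting parse trees. The sketch below counts the trees that the rule expr → expr + expr | id | number allows for a given token sequence; the function name and the token representation (a list of strings in which anything other than + is an operand) are assumptions made for illustration.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Number of parse trees for tokens under  expr -> expr + expr | id | number."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):                     # trees deriving tokens[lo:hi]
        if hi - lo == 1:
            return 0 if tokens[lo] == "+" else 1
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "+":         # try every + as the root operator
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))
```

A statement such as y + 3 has exactly one tree, but a + b + c has two (grouping to the left or to the right), so this grammar is ambiguous. Real grammars avoid this by fixing associativity and precedence.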
Lecture_no 46 Passes of a compiler
Compiler Passes
• A “pass” is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a “phase”, but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
            Compiler Driver
                   |
                 calls
                   |
           Syntactic Analyzer
             /            \
          calls          calls
           /                \
Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
                        Compiler Driver
            /                  |                   \
         calls               calls                calls
          /                    |                     \
Syntactic Analyzer     Contextual Analyzer      Code Generator
  input: Source Text     input: AST               input: Decorated AST
  output: AST            output: Decorated AST    output: Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no 47 Surprise Test
THANK YOU
SP NOTES
Computer Analysis Synthesis Model -
In Analysis Part the source program is read and then it is broken into stream of strings In synthesis part ndash This intermediate form of the source language is taken and converted into an
equivalent target program
-gtAnalysis Partbull The analysis part is carried out in three sub-parts
ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
string
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 71
SP NOTES
ndash Semantic Analysis bull In this step the meaning of the source string is determined
Properties of Compiler-
bull It must be bug free
bull It must generate correct machine code
bull The generated machine code must run fast
bull The compiler itself must run fast (compilation time must be proportional to program size)
bull The compiler must be portable (ie modular supporting separate compilation)
bull It must give good diagnostics and error messages
bull The generated code must work well with existing debuggers
Phases of a compiler-
ndash Lexical Analysis
bull It is also called scanning
bull It breaks the complete source code into tokens
For example total = count + rate 10
bull Then in lexical analysis phase this statement is broken up into series of tokens as follows
ndash The identifier total
ndash The assignment symbol
ndash The identifier count
ndash The plus sign
ndash The identifier rate
ndash The multiplication sign
ndash The constant number 10
- Syntax Analysis
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72
SP NOTES
bull It is also called parsing
bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure
bull It determines the structure of the source string by grouping the token together
bull The hierarchical structure generated in this phase is called parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink
The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language
The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language
This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable
Multiple Phases
The front and back end are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following
    x3 = y + 3;
    x3  =  y  +  3  ;
    x3 =y+ 3;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
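The mapping from lexemes to tokens described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scanner of any particular compiler: the token classes in TOKEN_SPEC, the function name scan, and the choice to store the number's text as its attribute are all assumptions made for the example. Identifiers receive a 1-based index into a symbol table, and blanks are discarded.

```python
import re

# Hypothetical token classes for this sketch; a real scanner has many more.
TOKEN_SPEC = [
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("number", r"\d+"),
    ("op", r"[=+*;]"),
    ("blank", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbol_table) for one line of source text."""
    symbol_table = []  # position i holds the (i+1)-th distinct identifier
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "blank":
            continue  # non-significant blanks are dropped
        if kind == "id":
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", lexeme))  # attribute choice is an assumption
        else:
            tokens.append((lexeme, None))  # second component is ignored
    return tokens, symbol_table
```

With this sketch, the differently spaced spellings of the assignment produce the same token stream, while splitting x3 into x 3 produces a different one, because x and 3 become separate lexemes.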
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
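As a rough illustration of how a syntax tree with operators as interior nodes can be built, here is a small hand-written parser in Python. It is a sketch under stated assumptions: tokens are (kind, value) pairs, the function name parse_assignment is hypothetical, and the rule expr → expr + expr is handled with a loop that groups + to the left, since a direct recursive-descent translation of a left-recursive rule would not terminate. That loop is a substitution made for the example, not something the grammar above prescribes.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return the syntax tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        token = peek()
        if token is None or token[0] != kind:
            raise SyntaxError(f"expected {kind}, found {token}")
        pos += 1
        return token

    def operand():
        token = peek()
        if token is not None and token[0] in ("id", "number"):
            return expect(token[0])
        raise SyntaxError(f"expected an operand, found {token}")

    def expr():
        tree = operand()
        # Each further "+ operand" becomes a new interior node (left grouping).
        while peek() is not None and peek()[0] == "+":
            expect("+")
            tree = ("+", tree, operand())
        return tree

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")
    if peek() is not None:
        raise SyntaxError(f"unexpected token after ';': {peek()}")
    return ("=", target, value)
```

For the tokens of x3 = y + 3; the result is the tuple ("=", ("id", "x3"), ("+", ("id", "y"), ("number", "3"))): the operators = and + are interior nodes and the operands are leaves.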
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one-operand) conversion operators "int to real" and "real to int" as shown on the right.
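The insertion of conversion operators can be sketched as a walk over the syntax tree. The following Python is a minimal sketch, assuming only two types ("int" and "real"), a tree of nested tuples, and a dictionary of declared types; the function name annotate and the node names inttoreal and realtoint are choices made for the example.

```python
def annotate(tree, types):
    """Return (tree_with_conversions, result_type) for a small syntax tree.

    Leaves are ("id", name) or ("number", text); interior nodes are
    ("+", left, right) and ("=", target, value). Only "int" and "real"
    are modelled in this sketch.
    """
    op = tree[0]
    if op == "id":
        return tree, types[tree[1]]
    if op == "number":
        return tree, "real" if "." in tree[1] else "int"
    if op == "+":
        left, left_type = annotate(tree[1], types)
        right, right_type = annotate(tree[2], types)
        if left_type == right_type:
            return ("+", left, right), left_type
        # Mixed operands: convert the int side so the addition is done in real.
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return ("+", left, right), "real"
    if op == "=":
        target, target_type = annotate(tree[1], types)
        value, value_type = annotate(tree[2], types)
        if target_type != value_type:
            conversion = "inttoreal" if target_type == "real" else "realtoint"
            value = (conversion, value)
        return ("=", target, value), target_type
    raise ValueError(f"unknown node {op!r}")
```

With y declared real and x3 declared int, the constant 3 is wrapped in an inttoreal node and the sum is wrapped in a realtoint node before it is assigned.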
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two source and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --
Each "quad" has the form
    operation target source1 source2
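Walking the tree to produce quadruples can be sketched as follows. This is an illustrative Python sketch, not a definitive generator: the tuple-based tree, the function name generate_quads, and the use of None for an unused source (shown as -- above) are assumptions of the example.

```python
def generate_quads(tree):
    """Walk a typed syntax tree and return (operation, target, source1, source2) quads."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        op = node[0]
        if op in ("id", "number"):
            return node[1]  # leaves need no code; return their name or text
        if op in ("inttoreal", "realtoint"):
            source = walk(node[1])
            target = new_temp()
            quads.append((op, target, source, None))
            return target
        if op == "+":
            left = walk(node[1])
            right = walk(node[2])
            target = new_temp()
            quads.append(("add", target, left, right))
            return target
        if op == "=":
            source = walk(node[2])
            target = walk(node[1])
            quads.append(("assign", target, source, None))
            return target
        raise ValueError(f"unknown node {op!r}")

    walk(tree)
    return quads
```

Applied to the tree for id1 = realtoint(id2 + inttoreal(3)), the sketch yields the four quads in the order inttoreal, add, realtoint, assign, using the temporaries temp1 to temp3.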
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int to real conversion and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
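Both optimizations can be sketched as two small passes over a list of quads. The Python below is a hedged sketch: the function name optimize and the tuple layout are assumptions, and the second pass is only safe under the assumption that each temporary is used exactly once (which holds for code produced by a simple tree walk, but not in general).

```python
def optimize(quads):
    """Apply two simple optimizations to (operation, target, source1, source2) quads."""
    # Pass 1: perform inttoreal on integer constants at compile time.
    constants = {}
    folded = []
    for op, target, source1, source2 in quads:
        source1 = constants.get(source1, source1)
        source2 = constants.get(source2, source2)
        if op == "inttoreal" and source1.isdigit():
            constants[target] = f"{int(source1)}.0"  # remember the converted constant
            continue  # the quad itself disappears
        folded.append((op, target, source1, source2))

    # Pass 2: merge "op tempN ..." followed by "assign id tempN" into "op id ...".
    # Assumption: tempN is not used anywhere else.
    result = []
    for op, target, source1, source2 in folded:
        if (op == "assign" and result
                and result[-1][1] == source1 and source1.startswith("temp")):
            previous = result.pop()
            result.append((previous[0], target, previous[2], previous[3]))
        else:
            result.append((op, target, source1, source2))
    return result
```

On the four quads of the running example, the first pass removes the inttoreal quad and rewrites the add to use 3.0, and the second pass folds the final assign into the realtoint quad, leaving two quads.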
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, 3.0    (add float)
    RTOI R2, R1         (real to int)
    ST   id1, R2
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
Lecture_no 45 Phases of a compiler (part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
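The idea of a parser that pulls tokens from the scanner on demand and emits intermediate code as it goes can be sketched with a Python generator. This is a deliberately small sketch under several assumptions: it handles only statements of the form id = operand + operand + ... ;, it does no type checking, and the names scanner and compile_assignment are hypothetical.

```python
import re


def scanner(source):
    """Yield one lexeme at a time; nothing is scanned until the parser asks."""
    for match in re.finditer(r"[A-Za-z_]\w*|\d+|\S", source):
        yield match.group()


def compile_assignment(source):
    """Parse  id = operand + operand ... ;  and emit three-address code in one pass."""
    tokens = scanner(source)
    code = []
    temps = 0
    target = next(tokens)  # the parser calls the scanner for each token
    if next(tokens, None) != "=":
        raise SyntaxError("expected '='")
    left = next(tokens, None)
    if left is None:
        raise SyntaxError("expected an operand")
    for token in tokens:
        if token == ";":
            break
        if token != "+":
            raise SyntaxError(f"expected '+' or ';', found {token!r}")
        right = next(tokens, None)
        if right is None:
            raise SyntaxError("expected an operand after '+'")
        temps += 1
        # Code is generated as soon as a phrase is recognised; no tree is stored.
        code.append(f"temp{temps} = {left} + {right}")
        left = f"temp{temps}"
    code.append(f"{target} = {left}")
    return code
```

Because the scanner is a generator, no token file is written between the phases: the source text is read once, and scanning, parsing, and code generation are interleaved in time.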
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar is ambiguous if there is more than one possible parse tree for a given statement.
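To make the last definition concrete, consider the rules expr → id | expr + expr. The statement id + id + id can be derived with the first + at the root or with the second + at the root, so the grammar is ambiguous. The short Python sketch below counts the parse trees for a chain of operands under exactly these two rules; the function name count_parse_trees is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(operands):
    """Number of parse trees for id + id + ... + id under expr -> id | expr + expr."""
    if operands == 1:
        return 1  # only expr -> id applies
    # expr -> expr + expr: choose how many ids belong to the left operand.
    return sum(
        count_parse_trees(k) * count_parse_trees(operands - k)
        for k in range(1, operands)
    )
```

For two operands there is a single tree, but three operands already give two trees and four operands give five, which is why practical grammars are rewritten (or given precedence and associativity rules) to remove the ambiguity.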
Lecture_no 46 Passes of a compiler
Compiler Passes-
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
              Compiler Driver
                    | calls
             Syntactic Analyzer
            / calls          \ calls
    Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
                          Compiler Driver
             / calls            | calls            \ calls
    Syntactic Analyzer   Contextual Analyzer   Code Generator
    input: Source Text   input: AST            input: Decorated AST
    output: AST          output: Decorated AST output: Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
Phases of a compiler-
ndash Lexical Analysis
bull It is also called scanning
bull It breaks the complete source code into tokens
For example total = count + rate 10
bull Then in lexical analysis phase this statement is broken up into series of tokens as follows
ndash The identifier total
ndash The assignment symbol
ndash The identifier count
ndash The plus sign
ndash The identifier rate
ndash The multiplication sign
ndash The constant number 10
- Syntax Analysis
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 72
SP NOTES
bull It is also called parsing
bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure
bull It determines the structure of the source string by grouping the token together
bull The hierarchical structure generated in this phase is called parse tree or syntax tree
Lecture_no 44 functions of each phase of a compiler(partI)
The Structure of a Compiler
Modern compilers contain two (large) parts each of which is often subdivided These two parts are the front end shown in green on the right and the back end shown in pink
The front end analyzes the source program determines its constituent parts and constructs an intermediate representation of the program Typically the front end is independent of the target language
The back end synthesizes the target program from the intermediate representation produced by the front end Typically the back end is independent of the source language
This frontback division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages Instead of NM compilers we need N front ends and M back ends For gcc (originally standing for Gnu C Compiler but now standing for Gnu Compiler Collection) N=7 and M~30 so the savings is considerable
Multiple Phases
The front and back end are themselves each divided into multiple phases The input to each phase is the output of the previous Sometime a phase changes the representation of the input For example the lexical analyzer converts a character stream input into a token stream output Sometimes the representation is unchanged For example the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code
The diagram is definitely not drawn to scale in terms of effort or lines of code In practice the optimizers especially the machine-dependent one dominate
Conceptually there are three phases of analysis with the output of one phase the input of the next The phases are called lexical analysis or scanning syntax analysis or parsing and semantic analysis
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes which are then mapped into tokens
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 73
SP NOTES
the latter constituting the output of the lexical analyzer For example any one of the following
x3 = y + 3 x3 = y + 3 x3 =y+ 3 but not
x 3 = y + 3would be grouped into the lexemes x3 = y + 3 and
A token is a lttoken-nameattribute-valuegt pair For example
1 The lexeme x3 would be mapped to a token such as ltid1gt The name id is short for identifier The value 1 is the index of the entry for x3 in the symbol table produced by the compiler This table is used to pass information to subsequent phases
2 The lexeme = would be mapped to the token lt=gt In reality it is probably mapped to a pair whose second component is ignored The point is that there are many different identifiers so we need the second component but there is only one assignment symbol =
3 The lexeme y is mapped to the token ltid2gt 4 The lexeme + is mapped to the token lt+gt 5 The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters It is mapped to
ltnumbersomethinggt but what is the something On the one hand there is only one 3 so we could just use the token ltnumber3gt However there can be a difference between how this should be printed (eg in an error message produced by subsequent phases) and how it should be stored (fixed vs float vs double) Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored Another possibility is to have a separate numbers table
6 The lexeme is mapped to the token ltgt
Note that non-significant blanks are normally removed during scanning In C most blanks are non-significant Blanks inside strings are an exception
Note that we can define identifiers numbers and the various symbols and punctuation without using recursion (compare with parsing below)
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases which are often represented in a parse tree For example
x3 = y + 3 would be parsed into the tree on the right
This parsing would result from a grammar containing rules such as
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 74
SP NOTES
asst-stmt rarr id = expr expr rarr number | id | expr + expr
Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right
The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning
Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it
(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator
Semantic Analysis
There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary
For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right
Intermediate code generation
Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target
With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce
temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75
SP NOTES
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form
operation target source1 source2
Code optimization
This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see
1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with
2 add temp2 id2 30
3 The last two quads can be combined into 4 realtoint id1 temp2
Code generation
Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2
LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2
127 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location
A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 76
SP NOTES
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase ie a pipeline In practice some phases are combined into a pass
For example one could have the entire front end as one pass
The term pass is used to indicate that the entire input is read during this activity So two passes means that the input is read twice We have discussed two pass approaches for both assemblers and linkers If we implement each phase separately and use multiple passes for some of them the compiler will perform a large number of IO operations an expensive undertaking
As a result techniques have been developed to reduce the number of passes We will see in the next chapter how to combine the scanner parser and semantic analyzer into one phase Consider the parser When it needs another token rather than reading the input file (presumably produced by the scanner) the parser calls the
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 77
SP NOTES
scanner instead At selected points during the production of the syntax tree the parser calls the intermediate-code generator which performs semantic analysis as well as generating a portion of the intermediate code
Compilers
1 Lexical Analysis (Lexical Analyzer Scanner)Reads the source program one character at a time carving the source program into a sequence of atomic units
called token (token type token value)
2 Syntax Analysis (Syntax Analyzer Parser)The grammar specifies the form or syntax of legal statements in the language
3 Code OptimalizationImprove the intermediate code so that the ultimate object program run fast andor takes less space
4 Code GenerationAllocate memory location select machine code for each imtermediate code utilize registers as efficiently as
possible
5 GrammarBackus Naur Form grammar consists of a set of rules each which defines the syntax of some construct in the
programming language
6 Parse TreeIt is often convenient to display the analysis of source statement in terms of a grammar as a tree (Syntax Tree)
7 Ambiguous GrammarThere is more than one possible parse tree for a given statement
Lecture_no 46 Passes of a compiler
Compiler Passes-bull A ldquopassrdquo is a complete traversal of the source program or a complete traversal of some internal representation of the source program (such as an AST)bull A pass can correspond to a ldquophaserdquo but it does not have tobull Sometimes a single pass corresponds to several phases that are interleaved in timebull What and how many passes a compiler does over the source program is an important design decision
(1)Single Pass CompilerA single pass compiler makes a single pass over the source text parsing analyzing and generating code all at onceDependency diagram of a typical Single Pass Compiler
Compiler Driver calls
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 78
SP NOTES
Syntactic Analyzercalls calls
Contextual Analyzer Code Generator
(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler
Compiler Driver
calls calls calls
Syntactic Analyzer Contextual Analyzer Code GeneratorI
I O I O O
Source Text AST Decorated AST object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers
Compiling involves performing lots of work and early computers did not have enough memory to contain one program
that did all of this work So compilers were split up into smaller programs which each made a pass over the source (or
some representation of it) performing some of the required analysis and translations
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler and one
pass compilers generally compile faster than multi-pass compilers Many languages were designed so that they could be
compiled in a single pass (eg Pascal)
Lecture_no47 Surprise Test
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 79
SP NOTES
THANK YOU
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 80
- To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
- PREREQUISITES
- One high level procedural language knowledge to assembly language and knowledge of data structures and computer organization
- In order to understand system software basic knowledge of the notion of a system number representation and coding schemes is essential In addition some awareness about the components of a program programming techniques data and file structure and writing and analyzing algorithms is also necessary for a formal understanding of system programming Moreover the ability to apply logic and common sense is considered important for the development of system software
- AIM
- To have an understanding of foundations of design of assemblers loaders linkers macro processors and compiler
- OBJECTIVES
- To understand the relationship between system software and machine architecture
- To know the design and implementation of Assemblers
- To know the design and implementation of Linkers and Loaders
- To have an understanding of macroprocessors
- To know the function of Compiler
- EXPECTED OUTCOME
- This course provides knowledge to design various system programs
- Assembler
- Compiler
- Interpreter
- One-Pass Assembler
- The translation performed by an assembler is essentially a collection of substitutions machine operation code
- for mnemonic machine address for symbolic machine encoding of a number for its character representation etc Except for one factor these substitutions could all be performed in one sequential pass over the source text That factor is the forward reference (reference to an instruction which has not yet been scanned by an assembler) The separate passes of the two pass assemblers are required to handle forward references without restriction If certain limitations are imposed however it becomes possible to handle forward references without making two passes Different sets of restrictions lead to the one pass assembler These one- pass assemblers are particularly attractive when secondary storage is either slow or missing entirely as on many small machines
- MACRO PROCESSOR
-
- Macro Definition and Usage
-
- Example
- Defining a Macro
- Linking
- Relocation
-
- High-level language(HLL)
- Sorting and Searching
-
- Objectives
-
- Sorting
-
- Context-free grammars and languages
- Context-sensitive grammars and languages
- Context-free grammar Vs Context-sensitive grammar
-
- Formal language theory the discipline which studies formal grammars and languages is a branch of applied mathematics Its applications are found in theoretical computer science theoretical linguistics formal semantics mathematical logic and other areas
-
- ndash Lexical Analysis bull In this part the source program is read and then it is broken into stream of strings Such strings are called tokens (tokens are collection of characters having some meaning) ndash Syntax Analysis bull In this step the tokens are arranged in hierarchical structure that ultimately helps in finding the syntax of the source
- string
- ndash Semantic Analysis bull In this step the meaning of the source string is determined
-
- The Structure of a Compiler
-
- Multiple Phases
- Lexical Analysis (or Scanning)
- Syntax Analysis (or Parsing)
- Semantic Analysis
- Intermediate code generation
- Code optimization
- Code generation
- 127 Symbol-Table Management
- The Grouping of Phases into Passes
-
SP NOTES
bull It is also called parsing
bull In this phase the tokens generated by lexical analyzer are grouped together to form a hierarchical structure
bull It determines the structure of the source string by grouping the token together
bull The hierarchical structure generated in this phase is called parse tree or syntax tree
Lecture_no 44 Functions of each phase of a compiler(part I)
The Structure of a Compiler
Modern compilers contain two (large) parts, each of which is often subdivided. These two parts are the front end, shown in green on the right, and the back end, shown in pink.
The front end analyzes the source program, determines its constituent parts, and constructs an intermediate representation of the program. Typically the front end is independent of the target language.
The back end synthesizes the target program from the intermediate representation produced by the front end. Typically the back end is independent of the source language.
This front/back division very much reduces the work for a compiling system that can handle several (N) source languages and several (M) target languages. Instead of N×M compilers, we need N front ends and M back ends. For gcc (originally standing for GNU C Compiler, but now standing for GNU Compiler Collection), N=7 and M is about 30, so the savings are considerable: roughly 7 + 30 = 37 components instead of 7 × 30 = 210 separate compilers.
Multiple Phases
The front and back ends are themselves each divided into multiple phases. The input to each phase is the output of the previous one. Sometimes a phase changes the representation of the input. For example, the lexical analyzer converts a character stream input into a token stream output. Sometimes the representation is unchanged. For example, the machine-dependent optimizer transforms target-machine code into (hopefully improved) target-machine code.
The diagram is definitely not drawn to scale in terms of effort or lines of code. In practice the optimizers, especially the machine-dependent one, dominate.
Conceptually there are three phases of analysis, with the output of one phase the input of the next. The phases are called lexical analysis or scanning, syntax analysis or parsing, and semantic analysis.
Lexical Analysis (or Scanning)
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer. For example, any one of the following, which differ only in their spacing,
    x3 = y + 3;
    x3  =  y  +  3  ;
    x3   =y+ 3  ;
but not
    x 3 = y + 3;
would be grouped into the lexemes x3, =, y, +, 3, and ;.
A token is a <token-name, attribute-value> pair. For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to pass information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol =.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The lexeme 3 is somewhat interesting and is discussed further in subsequent chapters. It is mapped to <number,something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g., in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table, where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note that non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. Blanks inside strings are an exception.
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
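The scanning step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scanner of any particular compiler: the names TOKEN_SPEC, MASTER and scan are hypothetical, the token patterns cover just the running example x3 = y + 3;, and numbers are simply stored as integers (one of the options discussed in item 5).

```python
import re

# Hypothetical token patterns for the tiny language of the running example.
# None of them is recursive, which is what makes this scanning, not parsing.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[=+;]"),
    ("blank",  r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    """Return (tokens, symbol_table) for one source string."""
    symtab = []          # list position i holds the identifier with index i + 1
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "blank":
            continue                          # non-significant blanks are dropped
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif kind == "number":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme, None))     # second component is ignored
    return tokens, symtab
```

Under these assumptions, scan("x3 = y + 3;") and scan("x3   =y+ 3  ;") give the same token list, while scan("x 3 = y + 3;") does not, because x and 3 become two separate lexemes.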
Syntax Analysis (or Parsing)
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree. For example,
    x3 = y + 3;
would be parsed into the tree on the right.
This parsing would result from a grammar containing rules such as
    asst-stmt → id = expr ;
    expr → number | id | expr + expr
Note the recursive definition of expression (expr). Note also the hierarchical decomposition in the figure on the right.
The division between scanning and parsing is somewhat arbitrary, but invariably, if a recursive definition is involved, it is considered parsing, not scanning.
Often we utilize a simpler tree called the syntax tree, with operators as interior nodes and operands as the children of the operator. The syntax tree on the right corresponds to the parse tree above it.
(Technical point.) The syntax tree represents an assignment expression, not an assignment statement. In C an assignment statement includes the trailing semicolon. That is, in C (unlike in Algol) the semicolon is a statement terminator, not a statement separator.
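To make the grouping concrete, here is a minimal recursive-descent sketch that turns the token stream for x3 = y + 3; into a syntax tree with operators as interior nodes. It is an assumption-laden illustration: the function name parse_assignment and the nested-tuple tree layout are hypothetical, and the rule expr → expr + expr is handled as a left-associative chain of operands rather than implemented literally.

```python
def parse_assignment(tokens):
    """Parse  id = expr ;  and return a syntax tree as nested tuples.

    `tokens` is a list of (token-name, attribute-value) pairs.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def expect(name):
        nonlocal pos
        tok = peek()
        if tok[0] != name:
            raise SyntaxError(f"expected {name!r}, found {tok[0]!r}")
        pos += 1
        return tok

    def operand():
        nonlocal pos
        tok = peek()
        if tok[0] in ("id", "number"):
            pos += 1
            return tok                      # a leaf such as ("id", 2)
        raise SyntaxError(f"expected operand, found {tok[0]!r}")

    def expr():
        tree = operand()
        while peek()[0] == "+":
            expect("+")
            tree = ("+", tree, operand())   # operator becomes an interior node
        return tree

    target = expect("id")
    expect("=")
    value = expr()
    expect(";")                             # the semicolon terminates the statement
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after ';'")
    return ("=", target, value)
```

For the six tokens of the running example this returns ("=", ("id", 1), ("+", ("id", 2), ("number", 3))). The semicolon is consumed but does not appear in the tree, which matches the technical point above.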
Semantic Analysis
There is more to a front end than simply syntax. The compiler needs semantic information, e.g., the types (integer, real, pointer to array of integers, etc.) of the objects involved. This enables checking for semantic errors and inserting type conversions where necessary.
For example, if y was declared to be a real and x3 an integer, we need to insert (unary, i.e., one operand) conversion operators "int to real" and "real to int", as shown on the right.
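The insertion of conversion operators can be sketched as a walk over the syntax tree. The sketch below assumes a nested-tuple tree such as ("=", ("id", 1), ("+", ("id", 2), ("number", 3))) and a hypothetical types mapping from symbol-table index to "int" or "real"; the function name annotate is likewise only illustrative.

```python
def annotate(tree, types):
    """Return (typed_tree, type), inserting conversion nodes where needed.

    `types` maps a symbol-table index to "int" or "real" (hypothetical layout).
    """
    kind = tree[0]
    if kind == "number":
        return tree, ("int" if isinstance(tree[1], int) else "real")
    if kind == "id":
        return tree, types[tree[1]]
    if kind == "+":
        left, lt = annotate(tree[1], types)
        right, rt = annotate(tree[2], types)
        if lt == rt:
            return ("+", left, right), lt
        # Mixed int/real operands: widen the int one so the addition is real.
        if lt == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return ("+", left, right), "real"
    if kind == "=":
        target, tt = annotate(tree[1], types)
        value, vt = annotate(tree[2], types)
        if tt != vt:
            conv = "inttoreal" if tt == "real" else "realtoint"
            value = (conv, value)           # convert to the type of the target
        return ("=", target, value), tt
    raise ValueError(f"unknown node {kind!r}")
```

With x3 (index 1) declared int and y (index 2) declared real, the constant 3 is wrapped in inttoreal and the sum is wrapped in realtoint before the assignment.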
Intermediate code generation
Many compilers first generate code for an "idealized machine". For example, the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation. Another common assumption is that machine operations take (up to) three operands: two sources and one target.
With these assumptions one generates three-address code by walking the semantic tree. Our example C instruction would produce
    temp1 = inttoreal(3)
    temp2 = id2 + temp1
    temp3 = realtoint(temp2)
    id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands.
Sometimes three-address code is called quadruples because one can view the previous code sequence as
    inttoreal temp1 3     --
    add       temp2 id2   temp1
    realtoint temp3 temp2 --
    assign    id1   temp3 --
Each "quad" has the form
    operation target source1 source2
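Generating quads by walking the tree can be sketched as a post-order traversal that allocates a fresh temporary for every interior node. The sketch assumes the typed tree is a nested tuple such as ("=", ("id", 1), ("realtoint", ...)); the name gen_quads and the use of None for a missing source (shown as -- above) are illustrative choices, not part of the notes.

```python
def gen_quads(tree):
    """Walk a typed tree and return a list of (op, target, src1, src2) quads."""
    quads = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"           # operands are named by table index
        if kind == "number":
            return str(node[1])
        if kind in ("inttoreal", "realtoint"):
            src = walk(node[1])
            temp = new_temp()
            quads.append((kind, temp, src, None))
            return temp
        if kind == "+":
            left = walk(node[1])
            right = walk(node[2])
            temp = new_temp()
            quads.append(("add", temp, left, right))
            return temp
        if kind == "=":
            src = walk(node[2])             # evaluate the right-hand side first
            target = walk(node[1])
            quads.append(("assign", target, src, None))
            return target
        raise ValueError(f"unknown node {kind!r}")

    walk(tree)
    return quads
```

Applied to the typed tree of the running example, this yields the same four quads as listed above, in the same order.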
Code optimization
This is a very serious subject, one that we will not really do justice to in this introductory course. Some optimizations are fairly easy to see.
1. Since 3 is a constant, the compiler can perform the int-to-real conversion itself and replace the first two quads with
    add temp2 id2 3.0
2. The last two quads can be combined into
    realtoint id1 temp2
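The two simplifications above can be sketched as a small rewriting pass over a list of quads. This is a sketch under stated assumptions, not a general optimizer: quads are (op, target, src1, src2) tuples with None for a missing source, the function name optimize is hypothetical, and the second rule assumes a temporary is not used again after the assignment that copies it.

```python
def optimize(quads):
    """Apply the two simplifications from the text to a list of quads."""
    # Rule 1: perform inttoreal on integer constants at compile time (3 -> 3.0)
    # and substitute the result wherever the temporary was used.
    consts = {}
    folded = []
    for op, target, s1, s2 in quads:
        if op == "inttoreal" and s1.isdigit():
            consts[target] = str(float(int(s1)))
            continue
        folded.append((op, target, consts.get(s1, s1), consts.get(s2, s2)))

    # Rule 2: "op tempN ..." directly followed by "assign x tempN" becomes
    # "op x ...". Assumes tempN is not used anywhere else.
    result = []
    for quad in folded:
        op, target, s1, s2 = quad
        if (op == "assign" and result
                and result[-1][1] == s1 and s1.startswith("temp")):
            prev = result.pop()
            result.append((prev[0], target, prev[2], prev[3]))
        else:
            result.append(quad)
    return result
```

On the four quads of the running example this leaves exactly the two quads shown in the list above.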
Code generation
Modern processors have only a limited number of registers. Although some processors, such as the x86, can perform operations directly on memory locations, we will for now assume only register operations. Some processors (e.g., the MIPS architecture) use three-address instructions. However, some processors permit only two addresses; the result overwrites one of the sources. With these assumptions, code something like the following would be produced for our example, after first assigning memory locations to id1 and id2.
    LD   R1, id2
    ADDF R1, R1, 3.0    (add float)
    RTOI R2, R1         (real to int)
    ST   id1, R2
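A very naive translation from optimized quads to register code of this kind might look as follows. The mnemonics LD, ADDF, RTOI and ST are taken from the listing above; everything else (the function name gen_code, the OPCODES table, the quad layout) is a hypothetical illustration. Unlike the hand-written listing, this sketch never reuses a register, so it quietly assumes an unlimited supply of them, and it only knows the two operations of the running example.

```python
OPCODES = {"add": "ADDF", "realtoint": "RTOI"}   # mnemonics from the listing

def gen_code(quads):
    """Translate (op, target, src1, src2) quads into register instructions."""
    lines = []
    reg_of = {}                      # temporary name -> register holding it
    count = 0

    def fresh():
        nonlocal count
        count += 1
        return f"R{count}"

    def fetch(name):
        if name in reg_of:           # a temporary is already in a register
            return reg_of[name]
        if name.startswith("id"):    # a variable in memory must be loaded
            reg = fresh()
            lines.append(f"LD {reg}, {name}")
            return reg
        return name                  # a constant is used as an immediate

    for op, target, s1, s2 in quads:
        sources = [fetch(s) for s in (s1, s2) if s is not None]
        dest = fresh()
        lines.append(f"{OPCODES[op]} {dest}, " + ", ".join(sources))
        if target.startswith("id"):
            lines.append(f"ST {target}, {dest}")   # write the variable back
        else:
            reg_of[target] = dest
    return lines
```

For the two optimized quads of the example it produces a load, a float add, a conversion and a store, using R1 to R3 where the listing above manages with R1 and R2. A real code generator would add register allocation on top of this.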
Symbol-Table Management
The symbol table stores information about program variables that will be used across phases. Typically, this includes type information and storage location.
A possible point of confusion: the storage location does not give the location where the compiler has stored the variable. Instead, it gives the location where the compiled program will store the variable.
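A minimal symbol table along these lines could be sketched as below. All names (Entry, SymbolTable, intern, assign_storage) and the byte sizes in SIZES are assumptions made for illustration. The offset field is the location the compiled program will use at run time, not a location inside the compiler.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    name: str
    type: str = "unknown"    # filled in by semantic analysis
    offset: int = -1         # run-time location, filled in before code generation

class SymbolTable:
    def __init__(self):
        self.entries = []    # the entry with index i is stored at position i - 1
        self.index_of = {}

    def intern(self, name):
        """Return the 1-based index for name, adding it on first sight."""
        if name not in self.index_of:
            self.entries.append(Entry(name))
            self.index_of[name] = len(self.entries)
        return self.index_of[name]

    def __getitem__(self, index):
        return self.entries[index - 1]

SIZES = {"int": 4, "real": 8}    # hypothetical sizes in bytes

def assign_storage(table):
    """Give every variable an offset in the compiled program's data area."""
    offset = 0
    for entry in table.entries:
        entry.offset = offset
        offset += SIZES[entry.type]
    return offset                # total size of the data area
```

In this sketch the scanner would call intern("x3") and intern("y") to obtain the indices 1 and 2 used in the tokens, semantic analysis would fill in the type fields, and assign_storage would run before code generation.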
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one pass. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
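The idea of a parser that calls the scanner on demand can be sketched with a Python generator: the scanner yields one lexeme each time the parser asks for it, so no token file is ever written, and intermediate code is emitted while parsing is still in progress. The function names scanner and translate are hypothetical, type checking is left out, operands are not validated, and characters that match no pattern are silently skipped.

```python
import re

def scanner(source):
    """Yield one lexeme at a time; nothing is written to an intermediate file."""
    for m in re.finditer(r"\d+|[A-Za-z_]\w*|[=+;]", source):
        yield m.group()
    yield None                               # end-of-input marker

def translate(source):
    """Parse  id = operand + operand ... ;  and emit code in a single pass."""
    tokens = scanner(source)
    code = []
    temps = 0

    target = next(tokens)                    # the parser calls the scanner
    if next(tokens) != "=":
        raise SyntaxError("expected '='")
    left = next(tokens)
    tok = next(tokens)
    while tok == "+":
        right = next(tokens)
        temps += 1
        temp = f"temp{temps}"
        code.append(f"{temp} = {left} + {right}")   # emitted in mid-parse
        left = temp
        tok = next(tokens)
    if tok != ";":
        raise SyntaxError("expected ';'")
    code.append(f"{target} = {left}")
    return code
```

For example, translate("a = b + c + d;") reads the input once and returns three lines of three-address code, the first of which is produced before the scanner has even seen d.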
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving it into a sequence of atomic units called tokens, each a (token type, token value) pair.
2. Syntax Analysis (Syntax Analyzer, Parser): Groups the tokens according to the grammar, which specifies the form or syntax of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate-code instruction, and uses registers as efficiently as possible.
5. Grammar: A Backus-Naur Form (BNF) grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar is ambiguous if there is more than one possible parse tree for a given statement.
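The rule expr → expr + expr used earlier in these notes is a convenient example of ambiguity. The following sketch counts how many distinct parse trees that rule allows for a chain of operands joined by +. The function name count_trees is hypothetical, and the count considers only the shape of the tree above the operands.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(n_operands):
    """Number of parse trees for operand + operand + ... under expr -> expr + expr."""
    if n_operands == 1:
        return 1                     # a single id or number has one tree
    # Pick which '+' is the root: k operands go to the left subtree,
    # the remaining n_operands - k go to the right subtree.
    return sum(count_trees(k) * count_trees(n_operands - k)
               for k in range(1, n_operands))
```

Two operands give a single tree, but three operands (such as a + b + c) already give two, (a + b) + c and a + (b + c), so the grammar is ambiguous. Practical grammars remove the ambiguity, for instance by making + left-associative.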
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single-pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical single-pass compiler:
    Compiler Driver
        calls
    Syntactic Analyzer
        calls                  calls
    Contextual Analyzer    Code Generator
(2) Multi Pass Compiler
A multi-pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical multi-pass compiler:
    Compiler Driver
        calls                  calls                   calls
    Syntactic Analyzer     Contextual Analyzer     Code Generator
    input:  Source Text    input:  AST             input:  Decorated AST
    output: AST            output: Decorated AST   output: Object Code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
SP NOTES
asst-stmt rarr id = expr expr rarr number | id | expr + expr
Note the recursive definition of expression (expr) Note also the hierarchical decomposition in the figure on the right
The division between scanning and parsing is somewhat arbitrary but invariably if a recursive definition is involved it is considered parsing not scanning
Often we utilize a simpler tree called the syntax tree with operators as interior nodes and operands as the children of the operator The syntax tree on the right corresponds to the parse tree above it
(Technical point) The syntax tree represents an assignment expression not an assignment statement In C an assignment statement includes the trailing semicolon That is in C (unlike in Algol) the semicolon is a statement terminator not a statement separator
Semantic Analysis
There is more to a front end than simply syntax The compiler needs semantic information eg the types (integer real pointer to array of integers etc) of the objects involved This enables checking for semantic errors and inserting type conversion where necessary
For example if y was declared to be a real and x3 an integer we need to insert (unary ie one operand) conversion operators ldquoint to realrdquo and ldquoreal to intrdquo as shown on the right
Intermediate code generation
Many compilers first generate code for an ldquoidealized machinerdquo For example the intermediate code generated would assume that the target has an unlimited number of registers and that any register can be used for any operation Another common assumption is that machine operations take (up to) three operands two source and one target
With these assumptions one generates three-address code by walking the semantic tree Our example C instruction would produce
temp1 = inttoreal(3)temp2 = id2 + temp1temp3 = realtoint(temp2)id1 = temp3
We see that three-address code can include instructions with fewer than 3 operands
Mrs Nirjharinee Parida Sr Lecturer(SIETDhenkanal) Page 75
SP NOTES
Sometimes three-address code is called quadruples because one can view the previous code sequence as
inttoreal temp1 3 --add temp2 id2 temp1realtoint temp3 temp2 --assign id1 temp3 --Each ldquoquadrdquo has the form
operation target source1 source2
Code optimization
This is a very serious subject one that we will not really do justice to in this introductory course Some optimizations are fairly easy to see
1 Since 3 is a constant the compiler can perform the int to real conversion and replace the first two quads with
2 add temp2 id2 30
3 The last two quads can be combined into 4 realtoint id1 temp2
Code generation
Modern processors have only a limited number of register Although some processors such as the x86 can perform operations directly on memory locations we will for now assume only register operations Some processors (eg the MIPS architecture) use three-address instructions However some processors permit only two addresses the result overwrites one of the sources With these assumptions code something like the following would be produced for our example after first assigning memory locations to id1 and id2
LD R1 id2 ADDF R1 R1 30 add float RTOI R2 R1 real to int ST id1 R2
127 Symbol-Table Management
The symbol table stores information about program variables that will be used across phases Typically this includes type information and storage location
A possible point of confusion the storage location does not give the location where the compiler has stored the variable Instead it gives the location where the compiled program will store the variable
Lecture_no45 Phases of a compiler(part II)
The Grouping of Phases into Passes
Logically, each phase is viewed as a separate program that reads input and produces output for the next phase, i.e., a pipeline. In practice, some phases are combined into a pass.
For example, one could have the entire front end as one pass.
The term pass is used to indicate that the entire input is read during this activity. So two passes means that the input is read twice. We have discussed two-pass approaches for both assemblers and linkers. If we implement each phase separately and use multiple passes for some of them, the compiler will perform a large number of I/O operations, an expensive undertaking.
As a result, techniques have been developed to reduce the number of passes. We will see in the next chapter how to combine the scanner, parser, and semantic analyzer into one phase. Consider the parser. When it needs another token, rather than reading the input file (presumably produced by the scanner), the parser calls the scanner instead. At selected points during the production of the syntax tree, the parser calls the intermediate-code generator, which performs semantic analysis as well as generating a portion of the intermediate code.
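The idea of the parser calling the scanner on demand can be sketched with a Python generator. Everything here is an assumption made for illustration: the tiny grammar (an assignment whose right side is a sum), the token format, and the class and function names. Semantic analysis is omitted; the sketch only shows tokens being pulled one at a time and intermediate code being emitted while parsing.

```python
import re

TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

def scanner(text):
    """Yield (kind, value) tokens one at a time, only when asked."""
    for number, name, other in TOKEN.findall(text):
        if number:
            yield ("num", number)
        elif name:
            yield ("id", name)
        elif other.strip():
            yield ("op", other)
    yield ("eof", "")

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens          # the scanner itself, not a token file
        self.current = next(tokens)   # one token of lookahead
        self.code = []
        self.temp_count = 0

    def advance(self):
        token = self.current
        # "Call the scanner" for the next token; stay at eof once it is exhausted.
        self.current = next(self.tokens, ("eof", ""))
        return token

    def expect(self, kind, value=None):
        token = self.advance()
        if token[0] != kind or (value is not None and token[1] != value):
            raise SyntaxError(f"unexpected token {token}")
        return token

    def operand(self):
        kind, value = self.advance()
        if kind not in ("id", "num"):
            raise SyntaxError(f"expected operand, got {value!r}")
        return value

    def assignment(self):
        # assignment -> id '=' operand { '+' operand }
        target = self.expect("id")[1]
        self.expect("op", "=")
        left = self.operand()
        while self.current == ("op", "+"):
            self.advance()
            right = self.operand()
            self.temp_count += 1
            temp = f"temp{self.temp_count}"
            self.code.append(f"{temp} = {left} + {right}")  # emitted during parsing
            left = temp
        self.code.append(f"{target} = {left}")
        return self.code

code = Parser(scanner("x = y + 3")).assignment()
print(code)
```

No intermediate token file is ever written, which is exactly the I/O saving the text describes.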
Compilers
1. Lexical Analysis (Lexical Analyzer, Scanner): Reads the source program one character at a time, carving the source program into a sequence of atomic units called tokens (token type, token value).
2. Syntax Analysis (Syntax Analyzer, Parser): The grammar specifies the form, or syntax, of legal statements in the language.
3. Code Optimization: Improves the intermediate code so that the ultimate object program runs faster and/or takes less space.
4. Code Generation: Allocates memory locations, selects machine code for each intermediate code instruction, and utilizes registers as efficiently as possible.
5. Grammar: A Backus-Naur Form grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language.
6. Parse Tree: It is often convenient to display the analysis of a source statement in terms of a grammar as a tree (syntax tree).
7. Ambiguous Grammar: A grammar is ambiguous if there is more than one possible parse tree for a given statement.
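The last item can be checked by counting parse trees. The sketch below assumes a small example grammar that does not appear in the notes, E -> E + E | E * E | id, and counts how many distinct parse trees a token sequence has under it; more than one means the statement is parsed ambiguously.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under the assumed grammar E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

trees = count_parse_trees(["id", "+", "id", "*", "id"])
print(trees)
```

The two trees found for id + id * id correspond to grouping the addition first or the multiplication first, which is why practical grammars encode operator precedence.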
Lecture_no 46 Passes of a compiler
Compiler Passes
• A "pass" is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as an AST).
• A pass can correspond to a "phase", but it does not have to.
• Sometimes a single pass corresponds to several phases that are interleaved in time.
• What and how many passes a compiler does over the source program is an important design decision.
(1) Single Pass Compiler
A single pass compiler makes a single pass over the source text, parsing, analyzing, and generating code all at once.
Dependency diagram of a typical Single Pass Compiler:
    Compiler Driver
      calls -> Syntactic Analyzer
                 calls -> Contextual Analyzer
                 calls -> Code Generator
(2) Multi Pass Compiler
A multi pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
Dependency diagram of a typical Multi Pass Compiler:
    Compiler Driver
      calls -> Syntactic Analyzer   (input: Source Text,   output: AST)
      calls -> Contextual Analyzer  (input: AST,           output: Decorated AST)
      calls -> Code Generator       (input: Decorated AST, output: Object Code)
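The multi pass structure can be sketched as a driver that calls three functions in sequence, each consuming the complete output of the one before. The toy source language (sums of numeric literals), the tuple representations, and the stack-style pseudo-instructions are assumptions chosen to keep the example short; only the three component names come from the diagram.

```python
def syntactic_analyzer(source_text):
    """Pass 1: source text -> AST (a nested tuple for a sum of literals)."""
    terms = [term.strip() for term in source_text.split("+")]
    ast = ("lit", terms[0])
    for term in terms[1:]:
        ast = ("add", ast, ("lit", term))
    return ast

def contextual_analyzer(ast):
    """Pass 2: AST -> decorated AST (each node gains a type tag as its last field)."""
    if ast[0] == "lit":
        return ("lit", ast[1], "real" if "." in ast[1] else "int")
    left = contextual_analyzer(ast[1])
    right = contextual_analyzer(ast[2])
    result_type = "real" if "real" in (left[-1], right[-1]) else "int"
    return ("add", left, right, result_type)

def code_generator(decorated):
    """Pass 3: decorated AST -> object code (stack-machine pseudo-instructions)."""
    if decorated[0] == "lit":
        return [f"PUSH {decorated[1]}"]
    _, left, right, result_type = decorated
    add = "ADDF" if result_type == "real" else "ADD"
    return code_generator(left) + code_generator(right) + [add]

def compiler_driver(source_text):
    """The driver calls each pass in turn on the previous pass's full output."""
    ast = syntactic_analyzer(source_text)
    decorated = contextual_analyzer(ast)
    return code_generator(decorated)

object_code = compiler_driver("1 + 2.5 + 3")
print(object_code)
```

In a single pass design the syntactic analyzer would instead call the other two components itself as it recognizes each construct, and no complete AST would be built. The sketch also skips the int-to-real conversion instructions a real compiler would insert.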
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work, and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one-pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g., Pascal).
Lecture_no47 Surprise Test
THANK YOU
SP NOTES
Syntactic Analyzercalls calls
Contextual Analyzer Code Generator
(2)Multi Pass CompilerA multi pass compiler makes several passes over the program The output of a preceding phase is stored in a data structure and used by subsequent phasesDependency diagram of a typical Multi Pass Compiler
Compiler Driver
calls calls calls
Syntactic Analyzer Contextual Analyzer Code GeneratorI
I O I O O
Source Text AST Decorated AST object code
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves a lot of work, and early computers did not have enough memory to hold one program that did all of it. So compilers were split up into smaller programs, each of which made a pass over the source (or some representation of it), performing some of the required analysis and translation.
The ability to compile in a single pass is often seen as a benefit because it simplifies the job of writing a compiler, and one pass compilers generally compile faster than multi-pass compilers. Many languages were designed so that they could be compiled in a single pass (e.g. Pascal).
Lecture_no47 Surprise Test
THANK YOU