Study material (PDF, 2016-05-15)
TRANSCRIPT
CS6303 – COMPUTER ARCHITECTURE
“Chase Your Dreams; Dreams Do Come True” – Sachin Tendulkar
VI SEMESTER
PREPARED BY V.BALAMURUGAN, ASST.PROF/IT
STUDY MATERIAL
ANNA UNIVERSITY
REGULATION 2013
CS6303
COMPUTER ARCHITECTURE
UNIT 1 – OVERVIEW AND INSTRUCTIONS
PART – A
1. List the eight ideas invented by computer architects
Design for Moore’s Law
Use Abstraction to simplify Design
Make the common case fast
Performance via Parallelism
Performance via Pipelining
Performance via Prediction
Hierarchy of memories
Dependability via redundancy
2. What is pipelining?
Pipelining is a set of data processing elements connected in series, where the output of one element is the input of the next element.
3. What are the major hardware components?
Input Unit (Keyboard, Mouse, etc.)
CPU (Memory Unit, ALU, Control Unit)
Output Unit (monitor, printer, speaker, etc.)
4. What are CPU and ALU?
CPU: Central Processing Unit
It is also called the brain of the computer
Input and output devices work according to the CPU
ALU: Arithmetic Logic Unit
It performs arithmetic and Logical operations
It is present inside the CPU
It uses main memory (RAM) for operations
5. What is a control unit?
It is present inside the CPU
It controls the operation of input unit, output unit and ALU
It has the overall control of the computer
It tells memory unit to send/receive data
It tells the ALU what operation to perform
6. What are response time and throughput?
Response time is the time between the start and completion of a task. It is also called “execution time”.
Throughput is the total amount of work done in a given time.
7. What is CPU time?
The amount of time the CPU spends doing a task is called CPU time
It is also called CPU execution time
Time spent waiting for I/O is not included.
CPU time = CPU time spent in the program (user CPU time) + CPU time spent in the OS (system CPU time)
8. Write down the formula for power consumed by the CPU.
The formula for finding the power consumed by the CPU is
P = C × V² × f
Where, P = power, C = capacitive loading, V = voltage, f = frequency
9. What are multiprocessor systems? Give their advantages.
Computer systems that contain more than one processor are called multiprocessor systems
They execute more than one application in parallel
They are also called shared-memory multiprocessor systems
High performance, high cost, high complexity
Advantages:-
Improved cost-performance ratio
High speed processing
If one processor fails, the other processors continue working.
10. What are an instruction and an instruction set?
An instruction is a command, written as one step of a procedure, that tells the CPU to complete a certain task
An instruction set is the complete set of instructions that a processor can execute
Examples of instruction classes:-
Arithmetic instructions (ADD, SUB)
Logic instructions (AND, OR, NOT)
Data transfer instructions (MOVE, LOAD, STORE)
Control flow instructions (GOTO, CALL, RETURN)
11. What is instruction format?
The format in which instructions are written is called the instruction format
Each instruction has three fields:
OPCODE – specifies which operation is to be performed
MODE – specifies how to find the effective address
ADDRESS – specifies the address in memory/register
OPCODE MODE ADDRESS
12. What are the different logical instructions?
INSTRUCTION EXAMPLE Equivalent to
AND AND $1, $2, $3 $1 = $2 & $3
OR OR $1, $2, $3 $1 = $2 | $3
NOR NOR $1, $2, $3 $1 = ~($2 | $3)
ANDI ANDI $1, $2, imme $1 = $2 & imme
ORI ORI $1, $2, imme $1 = $2 | imme
SHIFT LEFT LOGICAL SLL $1, $2, 10 $1 = $2 << 10
SHIFT RIGHT LOGICAL SRL $1, $2, 10 $1 = $2 >> 10 (zero fill)
SHIFT RIGHT ARITHMETIC SRA $1, $2, 10 $1 = $2 >> 10 (sign extend)
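The equivalences in the table can be checked with Python's bitwise operators (a small illustrative sketch; MIPS registers such as $1 are modeled as plain variables, masked to 32 bits):

```python
MASK = 0xFFFFFFFF  # keep values within 32 bits, like a MIPS register

s2, s3 = 0b1100, 0b1010

and_r = (s2 & s3) & MASK   # AND: bitwise and
or_r  = (s2 | s3) & MASK   # OR: bitwise or
nor_r = ~(s2 | s3) & MASK  # NOR: complement of OR, masked to 32 bits
sll   = (s2 << 2) & MASK   # SLL by 2: zeroes shifted in on the right
srl   = (s2 & MASK) >> 2   # SRL by 2: zeroes shifted in on the left

print(bin(and_r), bin(or_r), hex(nor_r), bin(sll), bin(srl))
```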
13. Write the different control operations.
Conditional branch:-
BEQ instruction – Branch on EQual (BEQ $s, $t, offset)
BNE instruction – Branch on Not Equal (BNE $s, $t, offset)
Unconditional branch:-
J instruction – Jump (J target)
JAL instruction – Jump And Link (JAL target)
JR instruction – Jump Register (JR $s)
14. What is PC-relative addressing?
It is also called Program Counter relative addressing
The address of the Data or Instruction is specified as an offset, relative to the incremented Program counter.
It is used in conditional branches
Offset value can be direct or indirect value
Operand address = PC + Offset
Ex: BEQZ $t0, strEnd
BEQZ = Branch if EQual to Zero
15. State Moore’s law
“Number of transistors per square inch, on Integrated Circuits (IC) had doubled every year since the IC was invented”
Computer architects must anticipate where the technology will be when the design is finished, not where it is when the design starts.
16. State Amdahl’s law
The performance improvement obtained by speeding up one part of a system is limited by the fraction of time that part is used.
This is the reasoning behind “make the common case fast”: improving the common case helps overall performance more than improving a rare case.
Overall speedup = 1 / ((1 − f) + f / s), where f is the fraction of execution time affected and s is the speedup of that fraction.
PART – B
1. Explain the eight ideas invented by the architects for designing the computer system.
Design for Moore’s Law
Use Abstraction to simplify Design
Make the common case fast
Performance via Parallelism
Performance via Pipelining
Performance via Prediction
Hierarchy of memories
Dependability via redundancy
Design for Moore’s law:-
It was proposed by Gordon Moore
He is the co-founder of Intel
“Number of transistors per square inch, on Integrated Circuits (IC) had doubled every year since the IC was invented”
“Computer architects should concentrate on, after the design is finished, where the technology will be. Don’t bother where it has started”.
Moore’s law graph:- o “Up and to the Right” o Represent designing for quick change
Use abstraction to simplify the design:-
Abstraction means hiding the lower-level details of a component.
It is used by the programmers and architects
It is used to represent the design at many levels.
At each level, its low level details are hidden.
This concept improves productivity
It simplifies the design
It reduces time for designing
Ex:-
o In an OS, I/O management details are hidden
o In high-level languages, the underlying sequence of machine instructions is hidden.
Make the common case fast:-
Many big improvements in the performance of a computer come from the improvements in common case
Improving the common case enhances performance more than optimizing the rare case.
The common case is often simpler than the rare case, so it is easier to make fast.
This makes the design both simpler and faster.
The benefit of improving the common case is quantified by Amdahl’s law.
Ex:-
o It is easy to design a sports car from an ordinary car
o but not from a van.
Performance via parallelism:-
Computer architects improve performance by performing operations in parallel.
A processor handles several activities simultaneously in the execution of an instruction.
Advantage: Faster performance of CPU.
Performance via pipelining:-
It is the extended concept of parallelism.
Pipeline is a set of data processing elements connected in series, where the output of one element is the input for the next element.
The pipeline stages operate in parallel on different instructions, improving throughput.
Performance via prediction:-
In some cases it is faster to guess and start working than to wait until the answer is known.
The processor predicts the outcome of a branch condition and starts executing the indicated instructions.
This is better than waiting for the correct answer.
If the prediction is accurate, performance is improved.
Hierarchy of memories:-
Programmers want memory to be fast, large, and cheap.
Cache is a small, fast memory that holds recently used data.
Memory hierarchy, top to bottom (fastest/most expensive/smallest at the top, slowest/cheapest/largest at the bottom):
In-board storage: Registers, Cache, Main Memory
Out-board storage: Optical disk (CD, DVD, Blu-ray), magnetic disk
Off-line storage: Magnetic tapes
Dependability via redundancy:-
This idea is expensive
It uses the RAID concept
RAID – Redundant Array of Inexpensive Disks
In RAID, data is stored redundantly on multiple disks
If one disk fails, the others continue working
2. State the CPU performance equation and discuss the factors that affect performance.
CPU performance equation:-
CPU execution time is defined as the product of the instruction count and the number of steps to execute one instruction, divided by the clock rate
T = (N × S) / R
Where, T = CPU Execution time / Program Execution Time
N = No. of instructions
S = No. of steps to execute one instruction
R = Clock rate
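A quick numeric sketch of the equation, with made-up values (not from the syllabus):

```python
# T = (N x S) / R : execution time from instruction count,
# steps (cycles) per instruction, and clock rate
def cpu_execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    return (n_instructions * steps_per_instruction) / clock_rate_hz

# Illustrative values: 10 million instructions, 2 steps each, 1 GHz clock
t = cpu_execution_time(10_000_000, 2, 1_000_000_000)
print(t)  # 0.02 seconds
```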
Speed:-
It is used to measure how quickly a computer executes programs.
Response time:-
Time between the starting and ending of a task
It is measured in seconds per program.
It includes disk access, memory access and I/O activities.
It is also called execution time, wall-clock time, or elapsed time
Throughput:-
Total amount of work done in a given time.
Decreasing response time almost always increases throughput.
Increasing the speed:-
To increase the speed of a computer:-
o Decrease the response time
o Increase the throughput
To decrease response time and increase throughput:-
o Use a faster version of the processor
o Add extra processors
Relation between performance and execution time:-
Performance = 1 / Execution time
Let X and Y be two different computers with Performance X > Performance Y
1 / Execution time X > 1 / Execution time Y
Execution time Y > Execution time X
CPU time:-
The amount of time the CPU spends doing a task is called CPU time
It is also called CPU execution time
Time spent waiting for I/O is not included.
CPU time = CPU time spent in the program (user CPU time) + CPU time spent in the OS (system CPU time)
Ex:- Given: user CPU time = 90.7 sec, system CPU time = 12.9 sec, elapsed time = 2 min 39 sec (159 sec)
CPU time = 90.7 + 12.9 = 103.6 sec
Fraction of elapsed time spent in the CPU = 103.6 / 159 ≈ 0.65
Performance equation – 1
CPU execution time = CPU clock cycles × Clock cycle time
Clock rate is the inverse of clock cycle time:
Clock rate = 1 / Clock cycle time
CPU execution time = CPU clock cycles / Clock rate
Performance can be improved by reducing the length of the clock cycle or the number of clock cycles
Execution time depends on the number of instructions in the program.
CPU clock cycles = Instructions × average clock cycles per instruction
CPI (Clock cycles Per Instruction) = average number of clock cycles taken by each instruction for execution
Performance equation – 2
CPU execution time = Instruction Count × CPI × Clock cycle time
CPU execution time = (IC × CPI) / Clock rate
Power equation:-
Increasing performance usually means increasing the clock speed.
Increasing the clock speed increases power consumption.
Increased power consumption produces more heat.
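The dynamic power formula P = C × V² × f can be sketched numerically (illustrative component values, not measurements), showing why lowering the supply voltage saves so much power:

```python
def dynamic_power(capacitive_load, voltage, frequency):
    # P = C * V^2 * f
    return capacitive_load * voltage ** 2 * frequency

# Illustrative: same chip at 3.3 V vs 1.0 V, 1 GHz clock, 10 nF effective load
p_old = dynamic_power(10e-9, 3.3, 1e9)
p_new = dynamic_power(10e-9, 1.0, 1e9)
print(p_old / p_new)  # ~10.9x: power falls with the *square* of voltage
```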
Formula for power consumed by the CPU:-
P = C × V² × f
Where, P = power, C = capacitive loading, V = voltage, f = frequency
3. What is instruction format? What are the types of instructions available?
The format in which instructions are written is called the instruction format
Some specific rules have to be followed while writing instructions.
Each instruction has three fields:
OPCODE – specifies which operation is to be performed
MODE – specifies how to find the effective address
ADDRESS – specifies the address in memory/register
OPCODE MODE ADDRESS
Types of instructions:-
Three address instructions
Two address instructions
One address instructions
Zero address instructions
Three address instructions:-
Three Addresses of three registers are mentioned
Bits are required to specify three addresses of three operands
Bits are required to specify the operation
Syntax: Operation Destination, source1, source2.
Ex: ADD A, B, C
This instruction adds B + C and stores the result in A (A ← B + C)
Where, ADD → operation, A → destination, B, C → sources
More execution time is taken because of the three addresses.
Two address instructions:-
Two Addresses of two registers are mentioned
Bits are required to specify two addresses of two operands
Bits are required to specify the operation
Syntax: Operation Destination, source.
Ex: ADD A, B
This instruction adds A + B and stores the result in A (A ← A + B)
Where, ADD → operation, A → destination, B → source
Less execution time than three address instructions.
One address instructions:-
One Address of one register is mentioned
Bits are required to specify one address of one operand.
Bits are required to specify the operation
Syntax: Operation Destination (or) Operation Source
Lesser execution time than two address instructions
Ex: ADD A
This instruction adds the contents of register A to the accumulator (AC ← AC + A)
Where, ADD → operation; the accumulator implicitly acts as the destination
Ex: LOAD A
This instruction loads the contents of register A into the accumulator (AC ← A)
Where, LOAD → operation, A → source
CS6303 – COMPUTER ARCHITECTURE
“Chase Your Dreams; Dreams Do Come True” – Sachin Tendulkar
Zero address instructions:- They contain no address fields
Source and destination operands are specified implicitly
A special register (the stack pointer) is automatically incremented or decremented
Operands are taken from the top of a pushdown stack
Bits are required to represent the operation only
Syntax: Operation
Ex: ADD
Very less execution time than one address instructions.
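The contrast between the three-address and zero-address styles can be illustrated by evaluating A = B + C in each (a hypothetical sketch; the values are invented):

```python
# Three-address style: one instruction names destination and both sources
regs = {"B": 4, "C": 6}
regs["A"] = regs["B"] + regs["C"]          # ADD A, B, C

# Zero-address (stack) style: operands are implicit, on a pushdown stack
stack = []
stack.append(4)                            # PUSH B
stack.append(6)                            # PUSH C
stack.append(stack.pop() + stack.pop())    # ADD (pops two, pushes the sum)
result = stack.pop()                       # POP A
print(regs["A"], result)  # 10 10
```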
4. What is a logical instruction? Explain some logical instructions with examples.
Instructions that perform logical operations which manipulate Boolean values are called logical instructions.
INSTRUCTION EXAMPLE Equivalent to
AND AND $1, $2, $3 $1 = $2 & $3
OR OR $1, $2, $3 $1 = $2 | $3
NOR NOR $1, $2, $3 $1 = ~($2 | $3)
ANDI ANDI $1, $2, imme $1 = $2 & imme
ORI ORI $1, $2, imme $1 = $2 | imme
SHIFT LEFT LOGICAL SLL $1, $2, 10 $1 = $2 << 10
SHIFT RIGHT LOGICAL SRL $1, $2, 10 $1 = $2 >> 10 (zero fill)
SHIFT RIGHT ARITHMETIC SRA $1, $2, 10 $1 = $2 >> 10 (sign extend)
AND instruction:- It contains three register operands
Syntax: Operation, destination, source1, source2
Ex: AND $1, $2, $3
It performs Bitwise-AND operation between source1 and source2 and stores the result in Destination.
It is equivalent to $1 = $2 & $3
OR instruction:- It contains three register operands
Syntax: Operation, destination, source1, source2
Ex: OR $1, $2, $3
It performs Bitwise-OR operation between source1 and source2 and stores the result in Destination.
It is equivalent to $1 = $2 | $3
NOR instruction:- It contains three register operands
Syntax: Operation, destination, source1, source2
Ex: NOR $1, $2, $3
It performs Bitwise-NOR operation between source1 and source2 and stores the result in Destination.
It is equivalent to $1 = ~($2 | $3)
ANDI instruction:- It contains two register operands and an immediate value
Syntax: Operation destination, source, immediate
Ex: ANDI $1, $2, imme
It performs a bitwise AND between the source register and the immediate value.
It stores the result in the destination register. It is equivalent to $1 = $2 & imme
ORI instruction:- It contains two register operands and an immediate value
Syntax: Operation destination, source, immediate
Ex: ORI $1, $2, imme
It performs a bitwise OR between the source register and the immediate value.
It stores the result in the destination register. It is equivalent to $1 = $2 | imme
Shift Left Logical Instruction:- It contains two register operands
Syntax: Operation destination, source, constant
Ex: SLL $1, $2, 10
It shifts the value of $2 register Left side by 10 places
Extra Zeroes are shifted in. It stores result in destination $1
It is equivalent to $1 = $2 << 10
Shift Right Logical Instruction:- It contains two register operands
Syntax: Operation destination, source, constant
Ex: SRL $1, $2, 10
It shifts the value of $2 register right side by 10 places
Extra Zeroes are shifted in. It stores result in destination $1
It is equivalent to $1 = $2 >> 10
Shift Right Arithmetic Instruction:- It contains two register operands
Syntax: Operation destination, source, constant
Ex: SRA $1, $2, 10
It shifts the value of $2 register right side by 10 places
Sign bit is shifted in. It stores result in destination $1
It is equivalent to $1 = $2 >> 10, with copies of the sign bit shifted in (arithmetic shift)
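The difference between SRL and SRA only appears when the sign bit is set. A sketch in Python, where the logical and arithmetic shifts are modeled by hand on 32-bit values:

```python
MASK = 0xFFFFFFFF

def srl(value, amount):
    # logical right shift: zeroes shifted in from the left
    return (value & MASK) >> amount

def sra(value, amount):
    # arithmetic right shift: copies of the sign bit shifted in
    value &= MASK
    if value & 0x80000000:                  # negative in two's complement
        return ((value >> amount) | (MASK << (32 - amount))) & MASK
    return value >> amount

x = -8 & MASK            # 0xFFFFFFF8, i.e. -8 as a 32-bit value
print(hex(srl(x, 2)))    # 0x3ffffffe  (zero fill: a large positive number)
print(hex(sra(x, 2)))    # 0xfffffffe  (sign fill: still -2)
```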
5. What is addressing mode? Explain the types of addressing modes.
Register addressing mode
Absolute addressing mode (or) Direct mode
Immediate addressing mode
Indirect addressing mode
Index addressing mode
Relative addressing mode
Auto increment mode
Auto decrement mode
Register addressing mode:-
It is the simplest addressing mode
Both the operands are registers.
It is much faster than other addressing modes
because no memory access is needed.
The name of the register is mentioned in the instruction.
Ex: ADD R1, R2 (R1 ← R1 + R2)
Where, ADD → operation, R1 → destination, R2 → source
Absolute addressing mode:-
It is also called as direct addressing mode
Because the address of location of operand is given directly in the instruction.
Ex: MOVE A, 2000
This instruction copies the contents of memory location 2000 into the Register A.
Immediate addressing mode:-
The operand is given directly as a numerical value
It doesn’t require any extra memory access to fetch operand
It executes faster (immediate).
Ex: MOVE A, #20
The # symbol says that it is an immediate operand.
The value 20 is moved to the Register A
Ex: ADDI $t1, $0, 1
Where, ADDI → ADD Immediate instruction, $t1 → destination operand, $0 → source register, 1 → immediate value.
Indirect mode:-
It is also called as Register Indirect addressing mode.
Here, the address is not given directly
The memory address should be determined from the instruction
These addresses are called as Effective Address (EA)
The effective address of the operand is the contents of a register (or) the main memory location, whose address is given directly in the instruction.
When the effective address of the operand is the contents of a register, it is called register indirect addressing mode.
Ex: MOVE A, (R0)
It copies the contents of memory addressed by the contents of register R0 into the register A
Register given within the parenthesis ( ) is called as Pointer
Index addressing mode:-
Indexing is a technique that allows programmer to refer the data (operand) stored in memory locations one by one.
In index addressing mode, the Effective address of the operand is generated by adding a constant value to the contents of a register
That constant is specified in the instruction
Ex: MOVE 20(R1), R2
It loads the contents of register R2 into memory location whose address is contents of R1 + 20
Where, MOVE → operation, 20(R1) → destination (address = contents of R1 + offset 20), R2 → source operand
Relative addressing mode:-
It is also called as PC-Relative addressing mode
Because Program Counter is used in this mode.
Here, the effective address is calculated by the index mode using the program counter.
It is generally used in branch instructions
Operand address = PC + offset
Ex: BEQZ $t0, END
Where, $t0 → source operand, END → branch target label
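In MIPS, the stored branch offset counts words and is added to the incremented PC. A minimal sketch of the target computation (the addresses are invented):

```python
def branch_target(pc, offset_words):
    # MIPS PC-relative: target = (PC + 4) + offset * 4
    # PC + 4 is the incremented program counter; the offset counts words
    return (pc + 4) + offset_words * 4

print(hex(branch_target(0x00400000, 5)))   # forward branch
print(hex(branch_target(0x00400000, -3)))  # backward branch
```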
Auto Increment mode:-
Here the effective address of the operand is the contents of a register specified in the instruction.
After accessing the operand, the contents of this register are incremented to address the next location.
Ex: MOVE (R2)+, R0
The contents of R0 is copied into the memory location whose address is in the register R2
After copying, the contents of register R2 is automatically incremented by 1.
Auto decrement mode:-
The contents of the register is decremented by one and then it is used as the effective address of the operand.
Decrement operation for the register is done first, and then the instruction is continued.
Ex: MOVE R1, -(R0)
The contents of R0 are first decremented by one; the decremented value is then used as the effective address of the operand, which is moved to register R1.
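The register indirect, indexed, and auto increment modes above can be modeled with a small dictionary standing in for memory (a sketch; addresses and data values are invented, and a 4-byte word size is assumed for the increment):

```python
memory = {2000: 10, 2004: 20, 2020: 0}
R0, R1, R2 = 2000, 2000, 77

# Register indirect: MOVE A, (R0) -> operand is at the address held in R0
A = memory[R0]                    # A = 10; R0 acts as a pointer

# Indexed: MOVE 20(R1), R2 -> effective address = contents of R1 + 20
memory[R1 + 20] = R2              # stores 77 at address 2020

# Auto increment: access via R0, then advance R0 to the next location
operand = memory[R0]              # fetch 10
R0 += 4                           # assumed 4-byte words
print(A, memory[2020], R0)  # 10 77 2004
```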
6. Assume a two-address format specified as source, destination. Examine the following sequence of instructions and explain the addressing mode used and the operation done in each instruction:-
i) MOVE (R5)+, R0
ii) ADD (R5)+, R0
iii) MOVE R0, (R5)
iv) MOVE 16(R5), R3
v) ADD #40, R5
Solution:-
i) MOVE (R5)+, R0
Addressing mode: Auto increment addressing mode.
This instruction can be split as:
MOVE (R5), R0
INCREMENT R5
This is also called as automatic post-increment mode.
Because, the increment is done after the operation
Operation: R5 is source, R0 is destination (given)
R5 contains some memory address. Go to that memory address and fetch the data from there.
MOVE it to Register R0
Then increment R5
ii) ADD (R5)+, R0
Addressing mode: Auto increment addressing mode.
This instruction can be split as:
ADD (R5), R0
INCREMENT R5
This is also called as automatic post-increment mode.
Because, the increment is done after the operation
Operation: R5 is source, R0 is destination (given)
R5 contains some memory address. Go to that memory address and fetch the data from there.
ADD it to Register R0, and store in R0.
Then increment R5
iii) MOVE R0, (R5)
Addressing mode: Register indirect addressing mode
Because only registers are used in this instruction
And, one register is given indirectly within parenthesis.
R0 Source, R5 destination
The contents of R0 is moved to the memory location whose address is contained in R5.
iv) MOVE 16(R5), R3
Addressing mode: indexed addressing mode
Operation:-
The contents of the memory location at address (R5) + 16 are moved to R3.
Effective address (EA) = (R5) + 16
v) ADD #40, R5
Addressing mode: Immediate addressing mode
Operation:-
The # sign indicates that this is immediate operand
Here source #40, Destination R5 (given)
It Adds the value of operand 40 to the value of Register R5
And stores the result in R5
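The five instructions can be traced with a tiny simulation (the initial register and memory values are invented; the increment step follows the text's "incremented by 1", though real machines typically bump by the operand size):

```python
memory = {1000: 5, 1001: 7, 1018: 42}
R5, R0, R3 = 1000, 0, 0

# i) MOVE (R5)+, R0 : fetch from the address in R5, then increment R5
R0 = memory[R5]; R5 += 1          # R0 = 5, R5 = 1001

# ii) ADD (R5)+, R0 : add the memory operand to R0, then increment R5
R0 += memory[R5]; R5 += 1         # R0 = 12, R5 = 1002

# iii) MOVE R0, (R5) : store R0 at the address held in R5
memory[R5] = R0                   # memory[1002] = 12

# iv) MOVE 16(R5), R3 : indexed, effective address = (R5) + 16
R3 = memory[R5 + 16]              # memory[1018] = 42

# v) ADD #40, R5 : immediate operand 40 added to R5
R5 += 40                          # R5 = 1042
print(R0, R3, R5)  # 12 42 1042
```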
7. Explain the components of a computer system in detail.
HARDWARE COMPONENTS:-
The organization of a computer has four major parts. They are:-
o Input Unit
o CPU
o Output Unit
o Memory unit
Input Unit:-
Input devices get data from the user and convert it into machine-understandable form.
Ex: Keyboard, Mouse, Scanner, Joystick, Light pen, Card reader, Webcam, Microphone
Keyboard:-
It is a standard input device attached to all type of computers
It contains keys arranged in the form of QWERTY
It contains many keys such as TAB, CAPSLOCK, SPACE BAR, ALT, CTRL, ENTER, HOME, END, etc
It contains 101 to 104 keys
If we press the keys in the keyboard, electrical signals are sent to the computer.
Mouse:-
It is used with personal computer
Old type of mice have magnetic ball at the back.
Nowadays, Infrared mice are used, that works on infrared light at the back of the mouse.
Scanner:-
Keyboard can give input only the characters
Scanners can give a picture as input to the computer
Scanner is an optical device that takes a picture and gives as input to the computer
Ex: MICR, OMR, OCR
CPU:- It is called the brain of the computer
It performs tasks such as arithmetic and logical operations
CPU is divided into three parts: ALU, Control Unit, Registers
ALU:-
After the system gets input data, it is stored in primary storage.
The actual processing of data takes place at Arithmetic Logic Unit (ALU)
It performs addition, subtraction, multiplication, division, logical comparison, etc.
It also performs AND, OR, NOT, XOR, etc. operations Control Unit:-
It acts like the supervisor of a computer
It controls the overall activities of a computer components
It checks all the operations of a computer are going correctly or not.
It determines how the instructions are executed one by one
It controls all the input and output operations of a computer
For executing an instruction, it performs the following steps:-
o Address of the instruction is placed on the address bus
o Instruction is read from the memory
o Instruction is sent for decoding
o Data from that address is read from the memory
o These data and address are sent on for processing
o Again the next instruction is taken from the memory
Registers:-
They are high speed memory units for storing temporary data
they are small in size
It stores data, instruction, address, etc.
The ALU works closely with the registers
Types: Accumulator, GPR, SPR( PC, MAR, MBR, IR)
Accumulator: to store the operands before execution. It receives the result of ALU operation
GPR: General Purpose Registers are used to store data and intermediate results
SPR: Special Purpose Registers used for certain purpose.
PC: Program Counter, MAR: Memory Address Register, MBR: Memory Buffer Register
[Figure: Components of a computer — Hardware (Input, CPU = ALU + CU, Memory Unit, Output) and Software (System software: program development environment, program runtime environment; Application software: Java, games, MS Office, etc.)]
Memory Unit:-
Primary storage:-
It is a part of CPU
Its storage capacity is limited
It contains magnetic core or semiconductor cells
It is used for temporary storage
ROM:-
o Read Only Memory, a major type of memory in a computer
o It can be read, but cannot be written
o It is used for storing permanent values
o ROM does not get erased, even after the power is switched off
o It is non-volatile (information cannot be erased)
o We can store important data in ROM
o Types: PROM, EPROM, EEPROM
RAM:-
o Random Access Memory
o Used for storing programs and data that are being executed
o It is different from ROM: it can be read and written
o It is volatile: when the power is turned off, its contents are erased
o It is also called RWM (Read Write Memory)
o It is faster than ROM
o Static RAM (SRAM) and Dynamic RAM (DRAM) are its types
o Its cost is high, and its processing speed is also high
Cache memory:-
o It is a very small memory used to store intermediate results and data
o It stores the data that are most frequently used
o It is present inside the CPU, near the processor
o It is used for faster execution
Secondary storage:-
o The speed of primary memory is fast, but secondary memory is slow
o The capacity of primary memory is low, so secondary memory is used
o It contains large memory space
o It is also called additional memory or auxiliary memory
o Data is stored in it permanently
o Ex: Magnetic tape, Hard disk, Floppy, Optical disc, etc.
Magnetic tape:-
o It is used with large computers like mainframe computers
o Large volumes of data are stored for a long time
o It is like an old tape-recorder cassette
o It is cheap, compact, portable, and offers practically unlimited storage
o It stores data permanently
Optical Disk:-
CD-ROM:-
Compact Disk
They are made of reflective material
High power laser beam is passed to store data onto CD
Cost is low, storage capacity is 700MB
It can only be READ, can’t be written
Only a single side can be used for storage
Merits – CD: large capacity compared to ROM; cheaper; light weight; reliable, removable and efficient
Demerits – CD: read only, cannot be updated; access is slow compared to magnetic disk; needs careful handling, easily gets scratched
CD-RW:-
o We can read and write data on this CD
o Maximum capacity of 700MB
o Light weight, reliable, removable, efficient
o A lot of space is wasted on the outer tracks
DVD:
Digital Video Disk
It is the improved version of CD
Available in 4.7GB, 8.54GB, 9.4GB, 17.08GB
Both the sides are used for storage
They cannot be scratched or damaged like CD
We can store full movies or an OS on one single DVD
USB drives:-
They are commonly called as PEN DRIVES
They are removable storage
They are connected to the USB port of a computer
They are fast and portable
They store larger data when compared to CD, DVD (1GB to 64GB pen drives)
Hard disk:-
Hard disks store more data and work faster
They can store 10GB to 2TB
A disk consists of platters, with read/write heads attached to a single arm
Information on a hard disk is stored in tracks
Floppy Disk:-
They can store 1.44MB of data
They are 5.25 to 3.5 inches in diameter
They are cheap and portable
Output Unit:-
It is a medium between the computer and the human
After the CPU finishes an operation, the output is displayed on the output unit
Types of output:- Hardcopy, Softcopy
Hardcopy: output that can be seen physically, printed using a printer
Softcopy: the electronic version of output stored in the computer, a memory card, or a hard disk
Ex: Monitor, Printer, Plotter
Monitor:-
It is the most popular output device
It is also called as Visual Display Unit (VDU)
It is connected to a computer through a cable called Video cord
LCD: Liquid Crystal Display monitors
o Flat screen; liquid crystals are used for the display
CRT: Cathode Ray Tube monitors
o They are old-fashioned, TV-set-like monitors
Printer:-
The output of a computer can be printed using Printer, to get the hardcopy
Laser printers, inkjet printers: non-impact printers. They give fast printouts with good quality (laser printers use a laser beam)
Dot matrix printers: impact printers. Their quality is poorer; they are used for billing purposes
Plotter:-
They are used for printing graphics
They are used in CAD/CAM
Pen plotters print by moving a pen across the paper
SOFTWARE COMPONENTS
1. System Software:-
They are in-built within the computer system
They are essential for a computer to operate.
A computer cannot be run without them
They control and manage the hardware components
Software for the Program Development Environment:-
Text Editor: To type the program and make changes
Compiler: Converts high-level language to machine code
Assembler: Converts Assembly level language to m/c code
Linker: Combines OBJ programs and creates EXE code
Debugger: To clear errors in EXE program
Software for the Runtime Environment:-
OS: It operates the overall computer system
Loader: Loads the EXE file into memory for execution
Libraries: Precompiled LIB files that are used by other programs
Dynamic Linker: Loads and links shared libraries at run time
2. Application Software:-
They are software packages used for problem solving.
Programs such as JAVA, Games, MS-Word, Dictionary, Emulator, etc are the examples.
They are not necessary for a computer to operate.
They are optional; If user wants, he/she can install them.
UNIT 3 – PROCESSOR & CONTROL UNIT
Part – A
1. How is the performance of the CPU measured?
Instruction Count: It is determined by Instruction Set Architecture (ISA) and compiler.
Cycles Per Instruction (CPI) and Clock cycle time: It is determined by the CPU hardware
2. Write the basic performance equation of CPU
CPU time = Instruction count X CPI X Clock cycle (or)
CPU time = (Instruction count X CPI) / Clock Rate
3. What is MIPS?
Million Instructions Per Second
It is a metric used to measure CPU performance
It is defined as the ratio of the instruction count to the product of the execution time and 10^6
MIPS = Instruction count / (Execution time × 10^6)
4. What are the types of instructions in the MIPS instruction set?
Memory reference instructions : Load word (LW), store word (SW)
Arithmetic logical instructions : ADD, SUB, AND, OR, SLT
Control flow instructions : BEQ, JUMP
5. What are the steps involved in MIPS instruction execution?
Fetch instruction from memory
Decode the instruction
Execute the operation
Access an operand in Data memory
Write the result into a register
6. What is a data path?
Data path is the pathway that data takes through the CPU
Data travels through data path; control unit regulates it
It consists of functional units that perform ALU operations
7. What is PC?
Program counter is defined as a register which is used to store the address of the instruction in the program being executed
It is a 32 bit register, written automatically after end of clock cycle
No WRITE control signal is needed
8. What is pipelining? Mention its purpose and advantages.
Pipelining is defined as the implementation technique in which more than one instruction is overlapped for simultaneous execution
It is used to make the processors fast
It is divided into stages; each stage finishes a part of execution in parallel
All stages are connected one to next one to form a pipe
It increases instruction throughput.
9. What are the stages of MIPS pipeline?
IF : Instruction fetch from memory
ID : Instruction Decode
EX : Execute the operation
MEM : Access the memory for an operand
WB : Write back the results in a register

10. Define hazard. Mention its types.
Any condition that makes pipeline to stall is called HAZARD
It prevents the next instruction in the instruction stream from executing in its designated clock cycle
Structural hazard: Two instructions use the same resource at the same time
Data hazard: Data are not available at expected time in pipeline
Control hazard: the branch decision is needed before the branch condition has been evaluated
11. What is data hazard? Mention its types.
Data hazard occurs when data are not available at expected time in a pipeline
Consider two instructions: I1 occurs before I2
RAW: Read After Write: I2 reads before I1 writes it
WAW : Write After Write : I2 writes before I1 writes it
WAR : Write After Read : I2 writes before I1 reads it

12. What are the methods to handle Data hazard?
Forwarding: Result is passed forward from a previous instruction to a later instruction
Bypassing: passing the result directly to the unit that needs it, bypassing the register file
13. What are the methods to handle control hazard?
Stall the pipeline
Predict branch not taken
Predict branch taken
Delayed branch

14. Define an exception with Ex.
It is also called as interrupt
It is defined as an unscheduled event that disturbs the normal execution of the program
Ex:
ADD R1, R2, R1 ; R1 = R2 + R1
Arithmetic overflow has occurred
Part – B

1. Explain the types of MIPS instruction formats.
R-Format:-
Opcode Rs Rt Rd SHAMT FUNCT
31-26 25-21 20-16 15-11 10-6 5-0
Also called as Register format
Because only registers are used
Three register operands : Rs, Rt, Rd
Rs, Rt : source registers
Rd : destination register
SHAMT : shift amount
FUNCT : ALU function (ADD, SUB, AND, OR, SLT)
Opcode for R-format = 0
ALU control lines Function
000 AND
001 OR
010 ADD
110 SUB
111 SLT (Set on Less Than)
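The R-format field layout above can be packed and unpacked with bit shifts; a quick sketch (`encode_r`/`decode_r` are illustrative helper names, not part of any MIPS toolchain):

```python
# Pack/unpack a 32-bit MIPS R-format word: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6).
def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def decode_r(word):
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,
        "rt":     (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
        "shamt":  (word >> 6)  & 0x1F,
        "funct":  word & 0x3F,
    }

# ADD rd=8, rs=9, rt=10 uses funct = 100000 and opcode = 0
word = encode_r(rs=9, rt=10, rd=8, shamt=0, funct=0b100000)
assert decode_r(word)["rd"] == 8
```

A decode followed by an encode of the same fields round-trips to the same 32-bit word, which is a handy sanity check when studying the format.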
I-Format:-
Opcode Rs Rt Address
31-26 25-21 20-16 15-0
For Load/Store instructions
o For LOAD, opcode = 35
o For STORE, opcode = 43
o Rs : base register
o Rt : for load, the destination register; for store, the source register
o Memory address = base register + 16-bit address field
For Branch instructions
o For BRANCH, opcode = 4
o Rs, Rt : source registers
o Target address = PC + (sign-extended 16-bit offset << 2)
J-Format:-
Opcode Address
31-26 25-0
For Jump instructions, opcode = 2
Destination address = PC[31-28] || (offset address << 2)
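The branch and jump target formulas above can be sketched in Python; the PC and offset values below are made-up examples, not taken from any real program:

```python
# Branch target = (PC + 4) + (sign-extended 16-bit offset << 2)
# Jump target   = top 4 bits of PC + 4, concatenated with (26-bit address << 2)
def sign_extend16(x):
    return x - 0x10000 if x & 0x8000 else x

def branch_target(pc, offset16):
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF

def jump_target(pc, addr26):
    return ((pc + 4) & 0xF0000000) | (addr26 << 2)

assert branch_target(0x1000, 0x0003) == 0x1000 + 4 + 12   # forward branch
assert branch_target(0x1000, 0xFFFF) == 0x1000 + 4 - 4    # offset of -1
assert jump_target(0x00401000, 0x0100) == 0x00000400
```

Note how the negative offset case depends on sign extension: 0xFFFF becomes -1 before the shift, which is exactly why the datapath needs a dedicated sign extension unit.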
2. Explain Datapath and its control in detail.
(Or) Explain Datapath and its control implementation schemes for MIPS instruction formats with neat diagrams.
Data path:-
Data path is the pathway that data takes through the CPU
Data travels through data path; control unit regulates it
It consists of functional units that perform ALU operations
Functional units of the data path:-
Instruction memory:-
It is a memory unit used to store and supply the instructions of a program
Program Counter:-
Program counter is defined as a register which is used to store the address of the instruction in the program being executed
Adder:-
Increments the PC to point to the next instruction
An ALU is connected to perform addition of its two 32-bit inputs and place the result on its output
Registers:-
It is a structure that contains processor’s 32 GPR
They can be read / Written
It contains 4 inputs (2 read ports + 1 write port + 1 writeData)
It contains 2 outputs (two read data)
ALU:-
Input: two 32-bit operands
Output: 32-bit result
Data memory unit:-
Input: Address and write data
Output: Read result
Sign extension unit:-
Input: 16-bit value
Output: 32-bit sign-extended value
MUX:-
It is also called data selector
It allows multiple connections to the input of an element and uses a control signal to SELECT among the inputs.
Building a data path:-
Fetch instructions:-
To execute any instruction, first the instruction is fetched from memory
To prepare for executing next instruction, PC is incremented by 4 bytes, which points to next instruction
Data path for R-Format instructions:-
Register file and ALU are needed in addition to the previous components.
ALU gets input from DataRead ports of register File
The register file is written with the ALUResult output of the ALU when the RegWrite signal is asserted
Data path for Load/Store instructions:-
Data memory unit and sign extension unit are needed additionally
Three register inputs are read from instruction field
Memory address is calculated based on the instruction field
For load, data at the memory address is read from data memory
For store, write data is written into data memory
Data path for Branch/Jump instructions
Branch target = incremented PC + (sign-extended lower 16 bits of the instruction, shifted left 2 bits)
Compare register contents using ALU
Combining data paths for simple implementation:-
ALU control lines Function
000 AND
001 OR
010 ADD
110 SUB
111 (SLT)Set on Less Than
Opcode   ALUOp   Operation         FUNCT    ALU action   ALU control input
LW       00      Load word         XXXXXX   Add          010
SW       00      Store word        XXXXXX   Add          010
BEQ      01      Branch on equal   XXXXXX   Sub          110
R-type   10      ADD               100000   Add          010
R-type   10      SUB               100010   Sub          110
R-type   10      AND               100100   And          000
R-type   10      OR                100101   Or           001
R-type   10      Set on Less Than  101010   SLT          111
Truth Table for Three ALU control bits:-
ALUOp1 ALUOp0   F5 F4 F3 F2 F1 F0   ALU control input
0 0             X  X  X  X  X  X    0010 (Add)
X 1             X  X  X  X  X  X    0110 (Sub)
1 X             X  X  0  0  0  0    0010 (Add)
1 X             X  X  0  0  1  0    0110 (Sub)
1 X             X  X  0  1  0  0    0000 (And)
1 X             X  X  0  1  0  1    0001 (Or)
1 X             X  X  1  0  1  0    0111 (SLT)
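The truth table above maps directly onto a small decode function; the sketch below is illustrative only, not generated hardware:

```python
# ALU control from the 2-bit ALUOp and the funct field, per the truth table above.
def alu_control(aluop, funct):
    if aluop == 0b00:              # lw / sw -> address add
        return 0b0010
    if aluop == 0b01:              # beq -> subtract for comparison
        return 0b0110
    # ALUOp = 10: R-type, decode the funct field
    return {
        0b100000: 0b0010,          # add
        0b100010: 0b0110,          # sub
        0b100100: 0b0000,          # and
        0b100101: 0b0001,          # or
        0b101010: 0b0111,          # slt
    }[funct]

assert alu_control(0b00, 0) == 0b0010          # load/store
assert alu_control(0b10, 0b100010) == 0b0110   # R-type SUB
```

The two-level decode (opcode to ALUOp, then funct to control bits) is the same structure the main control unit and ALU control unit implement in hardware.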
Control Signals used in Control implementation:-
Signal     When deasserted (0)                                           When asserted (1)
RegDst     Destination register number comes from Rt (bits 20-16)        Destination register number comes from Rd (bits 15-11)
RegWrite   None                                                          Register on the Write register input is written with the value on the Write data input
ALUSrc     Second ALU operand comes from the second register file output Second ALU operand is the sign-extended lower 16 bits of the instruction
PCSrc      PC is replaced by the output of the adder that computes PC+4  PC is replaced by the output of the adder that computes the branch target
MemRead    None                                                          Data memory contents designated by the address input are put on the Read data output
MemWrite   None                                                          Data memory contents designated by the address input are replaced by the value on the Write data input
MemtoReg   Value fed to the register Write data input comes from the ALU Value fed to the register Write data input comes from the data memory
Control signal settings per instruction class:-
Signal     R-format   LW   SW   BEQ
RegDst     1          0    X    X
ALUSrc     0          1    1    0
MemtoReg   0          1    X    X
RegWrite   1          1    0    0
MemRead    0          1    0    0
MemWrite   0          0    1    0
Branch     0          0    0    1
ALUOp1     1          0    0    0
ALUOp0     0          0    0    1
Data path : R-Type:-
Fetch instruction and increment PC
Get operands from register file, based on Src reg num
Perform ALU operation using ALUSrc=0
Select o/p from ALU using MemtoReg=0
WB to destination register (RegWrite=1, RegDst=1)
Data path : Memory access (Load):-
Fetch instruction and increment PC
Get base register operand from reg file
Perform addition of register value with ALUsrc=1
Use ALU result as address for data memory
Use MemtoReg = 1 to select read data and WB to destination register using RegWrite=1, RegDst=0
Data path : memory access (store):-
Fetch instruction and increment PC
Get base register and data from register file
Perform addition of register value with ALUsrc=1
Use ALU result as address for data memory
Using MemWrite=1, write the data operand to the memory address
Data path : branch:-
Fetch instruction and increment PC
Read 2 registers from register file for comparison
ALU subtracts data values using ALUsrc=0
Generate branch address: PC+4 plus the sign-extended offset shifted left by 2
Use zero o/p from ALU to find which result to be used for updating PC
If equal, use branch address
Else, use incremented PC
Data path : Jump:-
Shift instruction bits 25-0 left two bits to create a 28-bit value
Combine with bits 31-28 of PC+4 to get the 32-bit jump address
Additional MUX uses Jump control to select instruction address
0: incremented PC (or) Branch target
1: Jump address

3. What is pipelining? Explain its stages with an example.
Pipelining is defined as the implementation technique in which more than one instruction is overlapped for simultaneous execution
Purpose:-
It is used to make the processors fast
It is divided into stages; each stage finishes a part of execution in parallel
All stages are connected one to next one to form a pipe Advantage:-
It increases instruction throughput.
Example:-
Consider the following instructions:-
LW R1, 100(R0)
LW R2, 200(R0)
LW R3, 300(R0)
Without pipeline:-
With pipeline:-
Stages of pipeline:-
IF : Instruction fetch from memory
ID : Instruction Decode
EX : Execute the operation
MEM : Access the memory for an operand
WB : Write back the results in a register
Graphical representation:-

4. Explain the types of hazards with examples.
Structural hazard
Data hazard
Control hazard
Structural hazard:-
It occurs when two instructions use same resource at the same time
Here, the 1st instruction is accessing data from memory while the 4th instruction is fetching an instruction from that same memory at the same time.
Time     CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
Instr1   IF   ID   EX   MEM  WB
Instr2        IF   ID   EX   MEM  WB
Instr3             IF   ID   EX   MEM  WB
Instr4                  IF   ID   EX   MEM  WB
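The timing pattern above follows a general formula; here is a small sketch, assuming an ideal stall-free pipeline (real pipelines lose cycles to hazards):

```python
# Ideal cycle counts for n instructions on a k-stage pipeline (no stalls assumed).
def cycles_unpipelined(n, k=5):
    return n * k          # each instruction occupies all k stages alone

def cycles_pipelined(n, k=5):
    return k + (n - 1)    # fill the pipe once, then one instruction completes per cycle

# Four instructions on the five-stage pipeline finish in clock cycle 8: 5 + (4 - 1)
assert cycles_pipelined(4) == 8
assert cycles_unpipelined(4) == 20
```

For large n the speedup approaches k, which is why the ideal speedup of a pipeline equals its number of stages.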
Data hazard:-
It occurs when data are not available at the expected
time in the pipelined execution.
Consider two instructions: I1 occurs before I2
RAW
Read After Write
I2 reads before I1 writes it
So, I2 gets incorrect value
WAW
Write After Write
I2 writes before I1 writes it
I1 modifies the value, so I2 gets incorrect value
WAR
Write After read
I2 writes before I1 reads it
So I1 incorrectly reads the new value instead of the old one
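The three cases above can be detected mechanically from each instruction's read and write register sets; a small illustrative sketch (the register sets below are made-up examples):

```python
# Classify the hazard between two instructions I1 (earlier) and I2 (later),
# given the sets of registers each one reads and writes.
def classify_hazard(i1_reads, i1_writes, i2_reads, i2_writes):
    hazards = []
    if i1_writes & i2_reads:
        hazards.append("RAW")   # I2 reads what I1 writes
    if i1_writes & i2_writes:
        hazards.append("WAW")   # both write the same register
    if i1_reads & i2_writes:
        hazards.append("WAR")   # I2 overwrites what I1 still needs to read
    return hazards

# I1: ADD R1, R2, R3   I2: SUB R4, R1, R5  ->  RAW on R1
assert classify_hazard({"R2", "R3"}, {"R1"}, {"R1", "R5"}, {"R4"}) == ["RAW"]
```

This set-intersection view is exactly what forwarding logic checks in hardware when it compares destination and source register numbers between pipeline stages.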
Handling data hazard (solutions):-
Forwarding
Bypassing
Control hazards:-
It is also called branch hazard
It occurs when the branch decision is needed before the branch condition has been evaluated
Handling control hazard (solutions):-
Stall the pipeline:-
Predict branch not taken:-
Predict branch taken:-
Delayed branch:-

5. Explain the pipelined data path and its control.
Stages of pipeline:-
IF : Instruction fetch from memory
ID : Instruction Decode
EX : Execute the operation
MEM : Access the memory for an operand
WB : Write back the results in a register
For Load instruction:-
1. Instruction Fetch (IF)
Read instr’n from memory using address in PC
Place the fetched instr’n in IF/ID pipelined register
Increment the PC contents by 4 (PC ← PC + 4)
2. Instruction Decode (ID)
IF/ID pipeline registers supply two registers to be read
Read data from those two registers
Store them in ID/EX pipeline register
3. Execute instruction (EX)
Read the reg1 contents and the sign-extended offset from the ID/EX pipeline register
Add them using the ALU
Place the sum in the EX/MEM pipeline register
4. Memory access (MEM)
Read data memory using address from EX/MEM pipeline
Load data into MEM/WB pipeline register
5. Write Back (WB)
Read data from MEM/WB pipeline register
Write it into register file
Instruction fetch:-
Instruction decode:-
Instruction Execution:-
Memory access:-
Write Back:-
For Store instruction:-
1. Instruction Fetch (IF)
Read instr’n from memory using address in PC
Place the fetched instr’n in IF/ID pipelined register
Increment the PC contents by 4 (PC ← PC + 4)
2. Instruction Decode (ID)
IF/ID pipeline registers supply two registers to be read
Read data from those two registers
Store them in ID/EX pipeline register
3. Execute instruction (EX)
Read reg2 contents from ID/EX pipeline register
Add them using ALU
Place the sum in EX/MEM pipeline register
4. Memory access (MEM)
Write the store data to data memory using the address from the EX/MEM pipeline register
5. Write Back (WB)
No operation is performed (diagrams: same as above for load instructions)
UNIT 4 – PARALLELISM
PART – A

1. Distinguish between strong scaling and weak scaling.
Strong scaling Weak scaling
Strong scaling means, at a constant problem size, the parallel speed up increases linearly with the number of processors used.
Weak scaling means, the time to solve a problem with increasing size can be held constant by enlarging the number of processors used.
It is limited by Amdahl’s law.
It is limited by memory.
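Amdahl's law, which bounds strong scaling as noted above, can be sketched numerically; the fraction `f` below is an illustrative parameter, not a measured value:

```python
# Amdahl's law: speedup with p processors when a fraction f of the work
# is parallelizable. The serial fraction (1 - f) limits strong scaling.
def speedup(f, p):
    return 1.0 / ((1 - f) + f / p)

assert round(speedup(0.9, 10), 2) == 5.26   # far below the ideal 10x
assert speedup(1.0, 10) == 10.0             # perfectly parallel work scales linearly
```

Even with 90% of the work parallelized, ten processors yield barely 5x, which is why strong scaling plateaus while weak scaling (growing the problem with the machine) does not.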
2. Distinguish between UMA and NUMA
UMA NUMA
It is a type of shared memory architecture.
It is a type of shared memory architecture.
All the processors are identical, connected to a network, have equal access to all memory regions.
All the processors are identical, connected to a network, have individual memory units attached to it.
They are also called as Symmetric Multi-Processor machines (SMP).
They are also called as Asymmetric Multi-Processor machines (AMP)
3. What is Flynn's classification?
Flynn has classified parallel computer architectures based on number of concurrent instructions and data streams
They are: SISD, SIMD, MISD, MIMD
Name   Full form                             No. of processors   No. of instruction streams   No. of data streams
SISD   Single Instruction, Single Data       1                   1                            1
SIMD   Single Instruction, Multiple Data     N                   1                            N
MISD   Multiple Instruction, Single Data     N                   N                            1
MIMD   Multiple Instruction, Multiple Data   N                   N                            N
4. Define Multi-threading.
The ability of a CPU or a processor to execute multiple processes or threads concurrently is called as Multi-threading
It allows multiple threads to share the functional units of a single processor in overlapped fashion.
5. Define parallelism. What are its goals? Mention its types.
Parallelism is defined as the process of doing multiple operations at the same time.
Goals:- o Speed up the processing, increase the speed. o Increase the throughput o Improve the performance
Types:- o Instruction level parallelism o Task parallelism o Bit-level parallelism
6. What is ILP? What are the approaches to exploit ILP?
The technique which is used to overlap the execution of instructions and improve performance is called as Instruction-Level-Parallelism
Approaches:- o Dynamic hardware intensive approach o Static compiler intensive approach
7. Define Loop level Parallelism
The common way to increase the amount of parallelism available among instructions is, to exploit parallelism among iterations of a loop. This is called as Loop-level parallelism.
8.. What are the types of dependencies?
Data dependence
Name dependence
Control dependence
9. What are the types of Data hazard?
RAW (Read After Write)
WAW (Write after Write)
WAR (Write after Read)

10. Define IPC
Inter process communication is defined as a set of programming interfaces that allow a programmer to coordinate activities among different program processes that can run concurrently in an operating system.
It allows the program to handle many user requests concurrently.
11. Mention the three ways to implement Hardware MT
Coarse-grained Multi-threading
Fine-grained multi-threading
Simultaneous Multi-threading (SMT)

12. Mention the advantages of multi-threading
To tolerate latency of memory operations, dependent instructions, etc.
To improve system throughput by exploiting TLP
To reduce context switch penalty

13. What are multi-core processors? Give its applications.
A multi-core processor is a single computing component that contains two or more distinct cores in the same package.
Applications: General purpose, embedded systems, Networks, Digital Signal Processing, Graphics.
PART – B

1. Explain Flynn's classification of parallel computer architectures with neat diagrams.
Flynn proposed a concept for describing a machine’s structure based on stream.
Stream means a sequence of items
There are two types of streams:- o Data stream (Sequence of Data) o Instruction stream (Sequence of instructions)
Flynn classified parallel computing architectures based on number of concurrent instructions and data streams
They are: SISD, SIMD, MISD, MIMD

Name   Full form                             No. of processors   No. of instruction streams   No. of data streams
SISD   Single Instruction, Single Data       1                   1                            1
SIMD   Single Instruction, Multiple Data     N                   1                            N
MISD   Multiple Instruction, Single Data     N                   N                            1
MIMD   Multiple Instruction, Multiple Data   N                   N                            N
SISD:-
Single Instruction Single Data
Each processor executes SINGLE instruction on a SINGLE data stream
Ex: IBM 704, VAX, CRAY-I
Advantages                       Disadvantages
Simple and easy to implement     Low performance achieved
Less penalty will be levied      Low throughput is yielded
Less overhead will occur         Low level of parallelism exploited
SIMD:-
Single Instruction, Multiple Data
SINGLE instruction is executed on MULTIPLE data streams by multiple processors is called as SIMD
Ex: ILLIAC – IV, MPP, CM-2, STARAN
MISD:-
Multiple Instruction, Single Data
MULTIPLE INSTRUCTIONS are executed on a SINGLE DATA stream by multiple processors
Ex: Pipelined Architecture
Advantages                      Disadvantages
Better throughput than SISD     High complexity
Less penalty than SISD          High bandwidth required
Better performance than SISD    Low level of parallelism exploited
MIMD:-
Multiple Instruction, Multiple Data
MULTIPLE INSTRUCTIONS are executed on MULTIPLE DATA streams on multiple processors (CPUs)
Advantages                             Disadvantages
Better throughput than MISD            High complexity
Less penalty than MISD                 Difficult to deploy and repair
High level of parallelism exploited    Difficult to learn
2. What is multi-threading? Explain hardware multi-threading and its classification with illustrations
Definition:-
The ability of a CPU or a processor to execute multiple processes or threads concurrently is called as Multi-threading
It allows multiple threads to share the functional units of a single processor in overlapped fashion.
Use:-
To increase the usage of existing hardware resources.
Purpose:-
To tolerate latency of memory operations, dependent instructions, etc.
To improve system throughput by exploiting TLP
To reduce context switch penalty
Three ways to implement hardware multi-threading:-
Coarse-grained Multi-threading
Fine-grained multi-threading
Simultaneous Multi-threading (SMT)
i) Coarse-grained multi-threading:-
When a thread is stalled due to some event, switch to a different hardware context. This is called as coarse-grained multi-threading
It is also called as switch-on-event multi-threading
Advantages:-
o It eliminates the need for very fast thread-switching
o It does not slow down the thread, because instructions from other threads are issued only when the thread faces a costly stall
Disadvantages:-
o Since the CPU issues instructions from one thread, when a stall occurs the pipeline must be emptied or frozen
o A new thread must fill the pipeline before instructions can complete
ii)Fine-grained multi-threading:-
Switch to another thread every cycle, such that no two instructions from the same thread are in the pipeline concurrently
It improves the usage of pipeline by taking advantage of multiple threads
Advantages:-
o No need to check dependency between instructions, because only one instruction from a single thread is in the pipeline
o No need for branch prediction logic
o Bubble cycles are used for executing useful instructions from different threads
o Improved system throughput, latency tolerance, and utilization
Disadvantages:-
o Extra hardware complexity: many hardware contexts and thread selection logic are required
o Single-thread performance is reduced
o Resource conflicts are created between the threads
iii) Simultaneous Multi-threading (SMT):-
Intel introduced SMT in 2002 (Intel Pentium IV – 3.06GHz)
It uses resources of a dynamically scheduled processors to exploit ILP
At the same time as exploiting ILP, it converts TLP into ILP
It also exploits the following features of recent processors:
o Multiple functional units: recent processors have more functional units than a single thread can use
o Register renaming and dynamic scheduling: multiple instructions from independent threads can co-exist and co-execute
Advantages:-
More threads execute concurrently
Processor utilization is maximized
High performance is achieved
Disadvantages:-
It is highly complex for software developers to write software that exploits SMT on the given hardware
There is also a security concern: Intel's hyper-threading technology has a known drawback where, on a system with many concurrent processes, one process can steal information (such as login details) from another process.
Illustration:- (issue-slot diagram comparing Superscalar, Coarse-grained, Fine-grained, and SMT execution, showing slots filled by Thread 1, Thread 2, Thread 3, and idle slots)
3. What is ILP? Explain the methods to enhance the performance of ILP.
Definition:-
The technique which is used to overlap the execution of instructions and improve performance is called as Instruction-Level-Parallelism
Principle:-
There are many instructions in code that don’t depend on each other so it’s possible to execute those instructions in parallel.
Build compilers to analyse the code
Build hardware to be even smarter than that code
Approaches:-
Dynamic and hardware intensive approach:-
o It depends on hardware to exploit the parallelism dynamically at run time
o It is used in desktop, server, and a wide range of processors
o Ex: Pentium III and IV, Athlon, MIPS R10000/12000, Sun UltraSPARC III, PowerPC 603, Alpha 21264
Static and compiler intensive approach:-
o It depends on software technology to find parallelism statically at compile time
o It is used in embedded systems
o Ex: Intel IA-64 architecture, Intel Itanium
Methods to enhance performance of ILP:- i) LLP ii) Vector instructions
Loop-level parallelism:-
The common way to increase the amount of parallelism available among instructions is, to exploit parallelism among iterations of a loop. This is called as Loop-level parallelism.
Ex:-
for (i = 1; i <= 1000; i = i + 1) {
    x[i] = x[i] + y[i];
}
Every iteration of the loop can overlap with any other iteration.
Within each loop iteration, there is less chance for overlap.
LLP means, parallelism existing within a loop.
This parallelism can cross loop iterations Techniques to convert LLP to ILP:-
Loop unrolling: converting the loop level parallelism into instruction level parallelism.
Either compiler or the hardware is able to exploit the parallelism inherent in the loop
Ex:-
for (i = 1; i <= 1000; i = i + 4) {
    x[i]     = x[i]     + y[i];
    x[i + 1] = x[i + 1] + y[i + 1];
    x[i + 2] = x[i + 2] + y[i + 2];
    x[i + 3] = x[i + 3] + y[i + 3];
}
This technique works by unrolling the loop statically by the compiler or dynamically by the hardware.
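The rolled and unrolled loops compute the same result; a Python sketch of the equivalence (array contents below are made-up, and the cleanup loop handles lengths that are not multiples of 4):

```python
# Unrolling by 4: same result, fewer loop-control operations per element.
def add_rolled(x, y):
    for i in range(len(x)):
        x[i] += y[i]

def add_unrolled4(x, y):
    i, n = 0, len(x)
    while i + 4 <= n:                 # body does four elements per iteration
        x[i] += y[i]; x[i+1] += y[i+1]; x[i+2] += y[i+2]; x[i+3] += y[i+3]
        i += 4
    while i < n:                      # cleanup loop for leftover elements
        x[i] += y[i]; i += 1

a, b = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
c = a[:]
add_rolled(a, b)
add_unrolled4(c, b)
assert a == c == [11, 22, 33, 44, 55]
```

The unrolled body exposes four independent additions to the scheduler in each iteration, which is exactly the loop-level parallelism being converted to ILP.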
Vector Instructions:-
A vector instruction operates on a sequence of data items
This sequence executes in four instructions:-
o Two instructions to load the vectors X and Y from memory
o One instruction to add the vectors
o One instruction to store the result vector
Processors that exploit ILP have replaced the vector-based processors
But still the vector based processors are used in graphics, digital signal processing, multimedia applications.
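The vector add described above can be mimicked in plain Python, with a single whole-array operation standing in for the vector instruction (no real vector hardware is involved; the values are made-up):

```python
# A "vector add" expressed as one whole-array operation instead of an element loop.
def vector_add(x, y):
    return [xi + yi for xi, yi in zip(x, y)]   # one logical vector operation

X = [1.0, 2.0, 3.0]
Y = [0.5, 0.5, 0.5]
assert vector_add(X, Y) == [1.5, 2.5, 3.5]
```

The point of the vector form is that one instruction expresses work on every element, so the hardware needs no per-element fetch/decode overhead.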
4. What are multicore processors? Explain their mechanisms and applications in detail
A multi-core processor is a single computing component that contains two or more distinct cores in the same package.
Multiple cores can run multiple instructions at the same time, and it increases the overall speed.
It implements multiprocessing in a single physical component.
Previous technologies:-
Types of multicore processors:-
Two cores:-
Dual-core CPUs
Ex: AMD Phenom II X2, Intel Core Duo
Four cores:- Quad core CPUs Ex: AMD Phenom II X4, Intel i5 and i7
Six cores:- Hexa-core CPUs Ex: AMD Phenom II X6, Intel i7 extreme
Eight cores:- Octa-core CPUs Ex: Intel Xeon, AMD FX-8350
Ten cores:- Deca-core CPUs Ex: Intel Xeon E7-2850
Applications:-
General purpose
Embedded systems
Networks
Digital Signal Processing
Graphics.
Fundamental principle:-
These type of processors take advantage of relationship between power and frequency.
Each core is able to run at lower frequency
The power that would be given to a single core is divided among the cores.
Therefore the performance is increased.
This technique is used for designing dual core, quad core, hexa core, octa core CPUs.
The power consumed is less.
To achieve this, expensive research techniques and equipment are needed, so big MNCs like Intel can do it.
Continuous advances in silicon process technology from 65nm to 45nm to increase transistor density. Intel delivers superior energy efficient performance transistors.
Enhancing the performance of each core with the help of advanced micro architectures every two years
Improve the memory system and data access among the cores. This decreases the latency and increases the speed and efficiency.
Optimizing the interconnect fabric that connects the cores to improve performance.
Optimizing and expanding instruction set to enhance the capabilities. If this is done, the industries can use this Intel processors for producing advanced applications with high performance, low power.
Heterogeneous Multi-core processors:-
(Diagrams: early processors – 1 chip, 1 core, 1 thread; hyper-threading processors – 1 chip, 1 core, 2 threads; multi-core chips – several cores per chip)
Advantages:-
Massive parallelism is achieved
Special type of hardware available for different tasks Disadvantages:-
Developer productivity: training is needed to use software.
Portability: software written for one GPU will not run on other GPUs or on CPUs
Manageability: multiple GPUs and CPUs in a grid need a balanced workload.
5. Explain the types of dependences with examples.
Types:-
Data dependence
Name dependence
o Anti-dependence
o Output dependence
Control dependence
Data dependence:-
It is also called as true data dependences
An instruction ’j’ is data dependent on instruction ‘I’ if any one of these conditions is true:-
Condition 1: instruction 'i' produces a result that may be used by instruction 'j' (i → j)
Condition 2: instruction 'j' is data dependent on instruction 'k', and instruction 'k' is data dependent on instruction 'i' (i → k → j)
Ex:-
Loop: L.D    F0, 0(R1)    ; F0 = array element
      ADD.D  F4, F0, F2   ; add scalar in F2
      S.D    F4, 0(R1)    ; store result
      DADDUI R1, R1, #8   ; increment the pointer by 8 bytes
      BNE    R1, R2, Loop ; branch if R1 != R2
Here, dependences exist between all the instructions
It is shown by the arrows
This order should not be changed
If any order is changed, it will create a hazard in the pipeline.
Importance of data dependence:-
It tells whether a hazard will occur or not
It tells the order of the instructions for execution
It sets a limit on how much parallelism can be exploited
Overcoming data dependence:-
Maintain the dependence but avoid hazard
Eliminate the dependence by changing the code
Name dependence:-
It occurs when two instructions use the same register or memory location (called the "name"), but there is no data flow between the instructions related to that name
It is not true data dependence because values are not transmitted between instructions
Types of name dependence:-
Anti-dependence:-
Anti-dependence between instruction ‘I’ and instruction ‘j’ occurs when instruction ‘j’ writes a register or memory location that instruction ‘I’ reads.
The original order must be preserved to make sure that ‘I’ reads the correct value
Output dependence:-
Output dependence occurs when instruction ‘I’ and instruction ‘j’ write the same register or memory location.
The original order should be preserved to make sure that the final value is written to instruction ‘j’
Register renaming:-
Name dependence is not a true dependence , therefore they can be executed simultaneously or, be reordered if the name in two instructions doesn’t conflict
The renaming can be done for register operands, where it is called as register renaming.
They can be done statically by compiler or dynamically by h/w
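Register renaming as described above can be sketched with a toy renamer; the physical register names `p0`, `p1`, ... are invented for illustration and do not correspond to any real microarchitecture:

```python
# Rename architectural destination registers to fresh physical names so that
# WAR/WAW "name" dependences disappear; only true (RAW) dependences remain.
def rename(instrs):
    mapping = {}                    # architectural reg -> current physical reg
    fresh = iter(range(100))        # supply of fresh physical register numbers
    out = []
    for dst, srcs in instrs:        # each instruction: (destination, [sources])
        srcs = [mapping.get(s, s) for s in srcs]   # read current mappings first
        mapping[dst] = f"p{next(fresh)}"           # fresh name for each write
        out.append((mapping[dst], srcs))
    return out

# R1 is written twice (an output dependence) -> each write gets its own
# physical register, while the true dependence through R1 is preserved.
prog = [("R1", ["R2", "R3"]), ("R4", ["R1"]), ("R1", ["R5", "R6"])]
renamed = rename(prog)
assert renamed[0][0] != renamed[2][0]     # WAW removed
assert renamed[1][1] == [renamed[0][0]]   # RAW preserved
```

Reading the source mappings before updating the destination mapping is the key ordering detail: it is what lets an instruction that reads and writes the same register still see the old value.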
Control dependence:-
A control dependence determines the correct order of an instruction 'i' with respect to a branch instruction
So that the instruction ‘I’ is executed in correct program order
For every instruction, control dependence is preserved.
Ex:-
if (P1) { Statement1; }
if (P2) { Statement2; }
S1 is control dependent on P1
S2 is control dependent on P2, but not on P1
Conditions:-
1. An instruction that is control dependent on a branch cannot be moved before the branch, because then its execution would no longer be controlled by the branch
Ex:- a statement in the ELSE block cannot be executed before the IF
2. An instruction that is not control dependent on a branch cannot be moved after the branch, because then its execution would become controlled by the branch
Ex:- we cannot take a statement from the IF block and move it into the ELSE block
Preserving control dependence:-
Instructions executed in program order: It makes sure that the instruction that occurs before a branch is executed before the branch.
Find the control hazard or branch hazard: It makes sure that an instruction is control dependent on a branch is not executed until the branch direction is known.
If the processors follow program order, then the control dependence is automatically preserved.
Ignoring control dependence:-
It is not must to preserve control dependence
It can be violated if the instructions that should not be executed is executed.
If we want program correctness, Exception behaviour and data flow is needed
Preserving exception behaviour:-
Preserving exception behaviour means, any changes in ordering of the instruction must not change how exceptions are raised in the program
Ex:-
DADDU R2, R3, R4
BEQZ  R2, L1
LW    R1, 0(R2)
L1:
Problem: moving LW before BEQZ
If Data dependence with R2 is not maintained, the result of the program can be changed.
If we ignore the control dependence and move LW before BEQZ, LW instruction will create memory protection exception.
Preserving data flow:-
Data flow is the flow of data among instructions that produce results
Branches make data flow dynamic, since they allow the source of data for a given instruction to come from many points
Ex:-
DADDU R1, R2, R3
BEQZ  R4, L
DSUBU R1, R5, R6
L:    ...
OR    R7, R1, R8
Value of R1 used by OR instruction depends on whether branch is taken or not
OR instruction is data dependent on DADDU, DSUBU
If branch is taken, value of R1 computed by DADDU is used by OR
If branch not taken, value of R1 computed by DSUBU is used by OR
Speculation:-
Speculation lets the processor execute an instruction before the controlling branch resolves; it is valid only when violating the control dependence cannot affect the exception behaviour or the data flow, as in the following:-
Ex:-
DADDU R1, R2, R3
BEQZ  R12, skipnext
DSUBU R4, R5, R6
DADDU R5, R4, R9
skipnext: OR R7, R8, R9

6. Explain the types of data hazards with examples.
Hazard:-
A hazard is created whenever:
o There is a dependence between instructions, and
o They are close enough that the overlap introduced by pipelining would change the order of access to the operand involved in the dependence.
Types:-
RAW (Read After Write)
WAW (Write After Write)
WAR (Write After Read)
Consider two instructions i and j, with i occurring before j in program order.
RAW
j tries to read data before i writes it
So j gets the old value instead of the new value
This is the most common type of hazard
It is true data dependence
Program order should be preserved to make sure that j receives value from i
Ex:- I: ADD R1, R2, R3
J: ADD R4, R1, R5
WAW
j tries to write data before i writes it
The writes are performed in the wrong order
In program order, i writes first and j writes second
So the value written by j should remain in the end
But here i's write happens after j's, so the value left behind is i's, which is wrong
This is the WAW hazard
It occurs in pipelines that allow an instruction to proceed even if a previous instruction is stalled.
Ex:- I: SUB R1, R4, R3
J: ADD R1, R2, R3
WAR:-
j tries to write data before i reads it, so i gets the new value instead of the old value.
This is anti dependence
It does not occur in static pipelines
It occurs when some instructions write results early in the pipeline and others read data late in the pipeline, or when the instructions are reordered.
Ex:- I: ADD R4, R1, R5
J: SUB R5, R1, R2
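The three hazard types can be checked mechanically by comparing the destination and source registers of two instructions i and j (i earlier in program order). A minimal sketch; the tuple encoding of an instruction here is invented for illustration:

```python
# Classify data hazards between instruction i (earlier) and j (later).
# Each instruction is encoded as (destination_register, set_of_source_registers).
def hazards(i, j):
    i_dest, i_srcs = i
    j_dest, j_srcs = j
    found = []
    if i_dest in j_srcs:      # j reads what i writes -> true dependence
        found.append("RAW")
    if i_dest == j_dest:      # both write the same register
        found.append("WAW")
    if j_dest in i_srcs:      # j writes what i reads -> anti dependence
        found.append("WAR")
    return found

# I: ADD R1, R2, R3   J: ADD R4, R1, R5  -> RAW on R1
print(hazards(("R1", {"R2", "R3"}), ("R4", {"R1", "R5"})))  # ['RAW']
# I: SUB R1, R4, R3   J: ADD R1, R2, R3  -> WAW on R1
print(hazards(("R1", {"R4", "R3"}), ("R1", {"R2", "R3"})))  # ['WAW']
# I: ADD R4, R1, R5   J: SUB R5, R1, R2  -> WAR on R5
print(hazards(("R4", {"R1", "R5"}), ("R5", {"R1", "R2"})))  # ['WAR']
```

The three test cases are exactly the RAW, WAW, and WAR examples given above.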
RAR:-
Read After Read is not a hazard
Any number of read operations can be done.
Because it is not going to change any data
7. Explain the challenges in parallel processing.
Concurrency: reduce latency, hide latency, increase throughput
Data distribution
IPC: cost of communication, latency vs bandwidth, visibility of communications, synchronous vs asynchronous communication, scope of communication, efficiency of communication
Load balancing: equal partition, dynamic assignment
Implementation and debugging
Concurrency:-
It is a property of a system representing the fact that more than one activity can be executed at the same time
Algorithm should be divided into group of operations
Then only performance is improved by parallelism
All problems do not have the same amount of concurrency
The cleverness and experience of the programmer determine how close an algorithm comes to maximal concurrency
Three ways for improving performance using concurrency:-
Reduce latency: work is divided into small parts and executed concurrently
Hide latency: Long running tasks are executed together concurrently
Increase throughput: If we execute multiple tasks concurrently, throughput of the system is increased
Data distribution:-
Distribution of a problem’s data is a challenge
Old type of parallel computers have data locality
It means, some data will be stored in memory that is closer to a particular processor and accessed quickly
Data locality occurs due to each processor having its own local memory
Because of data locality, a parallel programmer must concentrate on where the data is stored with respect to the processors
If more local values are there, the processor will access them quickly and complete the work
Distributing data and distributing work are tightly coupled
If we want optimal design, we have to concentrate on both of them
IPC:-
Inter process communication
It is a set of programming interfaces that allow a programmer to coordinate activities among different program processes that can run concurrently in OS
It allows the program to handle many user requests at the same time
These factors to be considered in IPC:-
Cost of communications: IPC virtually always creates overhead. Machine cycles and resources that could be used for computation are instead used to pack and transmit data
Latency vs Bandwidth: latency is the time taken to send a minimal message from point A to B (expressed in microseconds)
Bandwidth is the amount of data that can be sent per unit of time (expressed in MBps or GBps)
Sending many small messages makes latency dominate and creates communication overhead
To make better use of bandwidth, many small messages are packed into a larger message
Visibility of communications: with message passing IPC, communication is explicit, visible, and under the control of the programmer
With the data parallel model, communications are transparent to the programmer, particularly on distributed memory architectures
The programmer cannot know exactly how the underlying message passing IPC is working.
Synchronous and asynchronous communications:-
Synchronous communications:
They are also called blocking communications
Because the work must wait until the communication is completed
Asynchronous communications:
They are also called non-blocking communications
Because the work can continue even if the communication is not completed
Scope of communications: it is hard to identify, at design time, which tasks must communicate with each other in a parallel code
Efficiency of communications: communication should be efficient; only the important messages should be transmitted between the tasks
Load balancing:-
Load balancing means, the practice of distributing approximately equal amounts of work among tasks, so that all tasks are busy always.
It is important to parallel programs for performance
It can be achieved by:-
Equal partition: if a task receives work, divide it equally
For array/matrix operations, each task does similar work, so distribute the data equally among the tasks
For loop iterations, the work done in every iteration is similar, so distribute the iterations across the tasks
Dynamic assignment: certain types of problems create load imbalance even after data is distributed equally among the tasks:-
Sparse arrays: some tasks have actual data to work on, others have only zeroes
Adaptive grid: some tasks need to refine their mesh while others do not
N-body simulations: some particles may migrate from the original task domain to another task domain
If the amount of work each task will perform cannot be predicted, we can use a scheduler-task approach: when a task finishes its work, it joins a queue to get new work
We need to design an algorithm that finds and handles load imbalances because they occur dynamically inside the code.
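The difference between static equal partition and dynamic assignment can be sketched by greedily handing each task to the currently least-loaded worker. The task costs below are invented to mimic a sparse-array style imbalance:

```python
# Compare static equal partition vs dynamic assignment for uneven task costs.
def static_partition(costs, n_workers):
    loads = [0] * n_workers
    for k, c in enumerate(costs):      # round-robin split, ignoring cost
        loads[k % n_workers] += c
    return max(loads)                  # finishing time = busiest worker

def dynamic_assign(costs, n_workers):
    loads = [0] * n_workers
    for c in costs:                    # always give work to the idlest worker
        i = loads.index(min(loads))
        loads[i] += c
    return max(loads)

costs = [9, 1, 1, 1, 8, 1, 1, 1]       # two "heavy" tasks among light ones
print(static_partition(costs, 2))      # round-robin gives one worker both heavy tasks
print(dynamic_assign(costs, 2))        # dynamic assignment balances the load
```

With this cost list, the static split finishes in 19 units while the dynamic one finishes in 12, which is the imbalance the scheduler-task approach above is meant to fix.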
Implementation and debugging:-
Programmers need to design parallel algorithms by creating a single task that executes on each processor
Program is designed to perform different calculations and communications based on processor’s ID.
It is called as Single Program Multiple Data (SPMD), its advantage is, only one program must be written
Another way is, Multiple Program Multiple Data (MPMD)
In SPMD and MPMD, executable must be created to cooperatively perform computation while managing data
If we want to implement such program, the knowledge of sequential programming is needed.
UNIT 5 – MEMORY & I/O SYSTEMS PART – A
1. Differentiate between volatile and non-volatile memory.
Volatile memory Non-volatile memory
Memory that loses its contents when the computer is switched OFF is called as Volatile memory
Memory that does not lose its contents when the computer is switched OFF is called as non-volatile memory
We need to refresh main memory content periodically
We need not refresh main memory content periodically
Ex: RAM Ex: ROM
2. Differentiate between SRAM and DRAM.
SRAM: Static Random Access Memory
DRAM: Dynamic Random Access Memory
Information is stored in one bit cell, called as Flip Flop
Information is stored as charge across capacitor
Information is retained as long as power is ON, without refreshing
Information leaks away unless refreshed periodically, even while power is ON
We need not refresh memory periodically
We need to refresh memory periodically
Less packaging density High packaging density
More complex hardware Less complex hardware
More expensive Less expensive
3. Define locality of reference.
Instructions in localized area of the program are executed repeatedly during some period, and remaining of the program is not accessed frequently.
This is called as locality of reference
Reference is within the locality = Locality of reference
Ex: simple loops, nested loops
4. What are the types of Locality of reference?
Temporal Locality of Reference (locality in time)
o Recently executed instructions are likely to be executed again
o Ex: loops, reuse
Spatial Locality of Reference (locality in space)
o Instructions stored near the recently executed instructions are also likely to be executed soon
o Ex: straight line code, array access
5. What are the techniques to improve cache performance?
Reducing the miss rate: reduce the chances of two different memory blocks fighting for the same cache location
Reducing the miss penalty: Add additional level to the hierarchy called as multi-level caching.
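Both techniques can be related through the standard average memory access time (AMAT) formula, AMAT = hit time + miss rate × miss penalty. The cycle counts below are made up for illustration:

```python
# Average memory access time (AMAT) shows why both techniques help:
# AMAT = hit time + miss rate x miss penalty
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Hypothetical numbers: 1-cycle hit, 64-cycle miss penalty, 1/16 miss rate.
print(amat(1, 0.0625, 64))   # 5.0 cycles
# Halving the miss rate, or halving the penalty via a second-level cache,
# both cut the average access time:
print(amat(1, 0.03125, 64))  # 3.0 cycles
print(amat(1, 0.0625, 32))   # 3.0 cycles
```

Reducing the miss rate and reducing the miss penalty enter the formula symmetrically, which is why both are listed as cache performance techniques.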
6. What is the formula for calculating CPU execution time?
CPU execution time = (CPU clock cycles + Memory-stall clock cycles) × Clock cycle time
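The formula can be applied directly; the cycle counts and clock period below are hypothetical:

```python
# CPU execution time = (CPU clock cycles + memory-stall clock cycles)
#                      x clock cycle time
def cpu_execution_time(cpu_cycles, stall_cycles, cycle_time_ns):
    return (cpu_cycles + stall_cycles) * cycle_time_ns

# Hypothetical program: 2,000,000 CPU cycles, 500,000 stall cycles,
# 0.5 ns clock cycle (a 2 GHz clock).
print(cpu_execution_time(2_000_000, 500_000, 0.5))  # 1250000.0 ns
```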
7. Define virtual memory.
Virtual memory is defined as a technique that is used to extend the size of the physical memory.
In virtual memory concept, Operating system moves the program and data between main memory and secondary memory.
It is also called as imaginary memory.
-Main memory acts as cache for secondary memory.
8. Define TLB. What is its purpose?
Translation Look-aside Buffer
It is a cache memory.
The page table is kept in main memory, but a copy of a small portion of it is placed inside the processor (in the memory management unit). This is called the TLB
It contains page table entries of most recently accessed pages and their virtual addresses.
It can contain 32 page table entries
TLB coupled with a 4KB page size, covers 128KB memory addresses
9. What is DMA?
Direct Memory Access
Transferring a large block of data directly between an external device and main memory is called as DMA
External device controls the data transfer
External device generates address and control signals to control data transfer.
This external device which controls the data transfer is called as DMA controller.
10. Define interrupts
The event that creates interruption is called as interrupt.
The special routine that is executed to service the interrupt is called the Interrupt Service Routine (ISR)
It is an external event that affects the normal flow of execution.
It is caused by external hardware such as keyboard, mouse, printer, etc
11. What is exception? What are the types of exception?
An interrupt stops the currently executing program and starts another program.
This interrupt is created by external hardware.
Like this, many events can create interrupts.
All such events that stop the current program and start another program are called exceptions.
Types: Faults, Traps, Aborts.
Faults: exceptions that are detected & serviced before execution of an instruction that creates problem
Traps: exceptions that are reported immediately after execution of instruction that creates some problem
Aborts: exceptions that do not allow execution of the instruction that creates the problem.
12. What are the features (or) functions of IOP?
IOP can fetch and execute its own instructions
Instructions are specially designed for I/O processing
8089 IOP can perform data transfer, arithmetic and logical operations, branches, searching, translation.
It also performs I/O transfer, device set up, programmed I/O, DMA operation.
It can transfer data from 8-bit source to 16-bit destination
It supports a multiprocessing environment
13. Differentiate between programmed I/O and DMA
programmed I/O DMA
Software controlled data transfer
Hardware controlled data transfer
Data transfer speed is low Data transfer speed is high
CPU is involved in transfer CPU is not involved in it
No controller is needed
DMA controller is needed
During transfer, data goes through the processor
During transfer, data does not go through the processor
PART – B
1. Explain the various memory technologies in detail with neat diagrams if necessary.
There are five basic memory technologies that are in current trend. They are:-
RAM (SRAM, DRAM)
ROM (PROM, EPROM, EEPROM)
Flash memory (Flash cards, Flash Drives)
Magnetic Disc memory
Optical Disc memory (CD-R, CD-RW, DVD-R, DVD-RW)
RAM:-
They are classified into SRAM and DRAM
They can store data only as long as the power is ON
SRAM:-
SRAM means, Static Random Access Memory
They are built on MOS and Bipolar technology.
MOS – MOS SRAM cell; Bipolar – TTL RAM cell
MOS SRAM cell:-
Enhancement mode MOSFET transistors are used.
T1 and T2 forms basic cross coupled inverters
T3 and T4 acts as load resistors for T1 and T2
X and Y lines are used for addressing the cell. When X and Y both are HIGH (1), Cell is selected
When X = 1, T5 and T6 are ON, and the cell is connected to the data and data′ lines
If Y = 1, then T7 and T8 are ON.
Because of this, either READ or WRITE is possible.
WRITE operation:-
Enable W = 1
If W = 1, and Din = 1, Node D is also 1.
This makes T2 ON, T1 OFF.
If next data of Din is 0, then T2 turns OFF, T1 turns ON
READ operation:-
Enable R = 1
If R = 1, T10 becomes ON.
This connects data output line to data out
This makes the complement of the bit stored in the cell available at the output.
TTL RAM cell:-
TTL – Transistor-Transistor Logic
Bipolar Memory cell is implemented using TTL multiple emitter technology
It stores 1 bit of information (0 or 1)
It is just like a Flip-Flop
Information remains as long as power is ON
X and Y select lines select a cell from matrix.
Q1 and Q2 are cross coupled inverters (one is OFF, other is ON always)
If Q1 is ON, Q2 is OFF, 1 is stored in the cell.
If Q1 is OFF, Q2 is ON, 0 is stored in the cell.
State of the cell is changed to “0” by applying “HIGH” to Q1 emitter
This makes Q1 off
If Q1 is OFF, then Q2 will be ON (one should be ON always)
As long as Q2 is ON, Q2 collector is LOW.
“1” can be rewritten by applying “HIGH” to the Q2 emitter
DRAM:-
DRAM stores data as electric charge on a capacitor
It contains 1000s of DRAM cells like the above diagram.
When column (SENSE) and row (CONTROL) lines are HIGH, MOSFET conducts charge to the capacitor
When SENSE and CONTROL lines are LOW, MOSFET opens and capacitor’s charge is locked.
By this way, it stores 1 bit.
Since only a single MOSFET and capacitor needed, DRAM contains more memory cells compared to SRAM
Information is lost if power is switched OFF
We need to refresh the memory every millisecond.
It is less complex hardware and less expensive
Write operation:-
To enable a WRITE operation, the R/W line is made LOW
This enables the input buffer and disables the output buffers
To write “1” into the cell: Din = HIGH, transistor = ON, ROW line = HIGH
This allows capacitor to charge a positive voltage
When 0 is stored, LOW is applied to Din.
Capacitor remains uncharged.
If it previously stored “1”, the capacitor is discharged.
When ROW line is made LOW, transistor turns OFF, disconnects capacitor from data line
Therefore storing the charge (0 or 1) on the capacitor.
Read operation:-
To read data from cell, R/W line is made HIGH
This enables output buffer, disables input buffer
Then, ROW line is made HIGH.
It turns the transistor ON, connecting the capacitor to the Dout line through the output buffer
Refresh operation:-
To enable the refresh operation, the R/W line, ROW line, and REFRESH line are made HIGH
This makes transistor ON, connects capacitor to COLUMN line
As R/W is high, output buffer is enabled
The stored data bit is applied to input of refresh buffer
The enabled refresh buffer produces a voltage on the COLUMN line corresponding to the stored bit
Therefore the capacitor is refreshed.
SDRAM:-
DRAM whose operation is directly synchronized with a clock signal is called as SDRAM
Synchronous Dynamic Random Access Memory
In DRAM, processor sends addresses and control signals to the memory.
After some time delay, DRAM either reads or writes data
During this delay, DRAM performs various internal functions
The processor has to wait in this delay.
To avoid this problem, SDRAM is produced.
SDRAM exchanges data with processor synchronized to an external clock signal.
This lets the processor read and write data without waiting
SDRAM latches the address sent by the processor and then responds after a number of clock cycles.
Meanwhile the processor can do other task.
[Diagrams: writing “1” into a DRAM cell; writing “0” into a DRAM cell; reading “1” from a DRAM cell; refreshing “1” in a DRAM cell]
Timing Diagram of burst data transfer of length 4
DDR SDRAM:-
Fastest version of SDRAM
DDR – Double Data Rate
SDRAM performs operations on rising edge of the clock signal
But DDR SDRAM performs operations on both the edges of clock signals
The bandwidth is doubled in DDR
It is also called as faster SDRAM
Two banks of cell arrays are there in DDR SDRAM
It is dual bank architecture
Each bank can be accessed separately
Nowadays, DDR versions II and III have been released
ROM:-
They can store the data even after power is OFF
We cannot write data to it
Non-volatile memory
It is used to store binary codes
It contains only Diode and decoder
Address lines A0 and A1 are decoded by 2 : 4 decoder
PROM:-
Programmable ROM
It has diodes in every bit position
Output is initially all 0s
Each diode has fusible series link.
By addressing a bit and applying a proper current pulse at the output, we can blow out that fuse and store “1” at that bit position
Fuse is made up of nichrome
For blowing, pass 20 – 50 mA current for 5 – 20 µs
This blowing occurs according to truth table of PROM
A device called a PROM programmer does this
That is why it is called PROM
They are one-time programmable, once programmed, information cannot be erased.
EPROM:-
Erasable PROM
They use MOS circuit
They store 0s and 1s as a packet of charge in IC.
They also can be programmed by EPROM programmers
We can erase the data in it, by exposing the chip to UV light through quartz window for 15 – 20 mins
We cannot erase selective information, all information will be vanished.
It can be re-programmed and re-used many times
EEPROM:-
It also uses MOS circuit
Data is stored as: CHARGE or NO CHARGE
A 20 – 25 V pulse is used to move the charges
We can selectively erase information
They are more expensive than ROM
Flash memory:-
They are RW memories (both READ and WRITE)
We can read contents of a single cell, but can write whole block of cells
It is based on single transistor controlled by trapped charge
They have higher capacity, less power consumption
It is suitable for Laptop, tablets, smartphones, iPod, etc.
Types:
Flash card (memory card): 1 GB to 64 GB
Flash drive (pen drive): maximum 64 GB capacity
Magnetic Disk memory:-
It is a thin circular metal plate, coated with thin magnetic film
Digital information is stored on it, by magnetizing the magnetic surface
A magnetizing (read/write) head is positioned over the disc, which spins about its axis on a spindle
It is usually connected to a computer using SCSI bus
Transfer speed in SCSI bus is much faster
Optical Disk:-
CD-ROM:-
Compact Disk – ROM, max capacity 700 MB
Data is stored on a single side; the other side is wrapped
Data recording is done by focussing a laser beam on the surface of the spinning disc
The disc is divided into tracks and sectors
Merits – CD:
Large capacity compared to ROM
Cheaper, light weight
Reliable, removable and efficient
Demerits – CD:
Read only, cannot be updated
Access is slow compared to magnetic disk
Needs careful handling, easily gets scratched
CD-RW:-
o We can read and write data on the CD
o Maximum capacity of 700 MB
o Light weight, reliable, removable, efficient
o A lot of space is wasted on the outer tracks
DVD:-
o Digital Versatile Disk
o It is used for many purposes; that is why it is called DVD
o We can store data on both sides
o Available in 4.7 GB, 8.54 GB, 9.4 GB, 17.08 GB
o Larger capacity than CD
o We can store a full movie or an OS on a single DVD
2. Define Cache memory. Explain the types of Cache memories and cache updating policies.
Cache:-
Every time the processor of a computer system has to fetch program and data from main memory for its operations, which is time consuming.
So a new kind of memory was introduced to hold a copy of frequently used data; it can be accessed very fast because it is very small in size.
This is called as cache memory.
It is very smaller than RAM, placed between RAM and processor
Cache is made up of faster memory (SRAM)
Main memory (RAM) is made up of DRAM (slower)
If the processor requests data that is not available in cache, it is called a Cache Miss
If the requested data is available in cache, it is called as Cache Hit
Data is stored in Blocks of memory.
Cache controller decides which memory block should be moved in / moved out of cache and main memory
Locality of reference is responsible for best usage of cache
Instructions in the localized area of the program are executed repeatedly during some period and remainder of the program is not accessed frequently. This is called as locality of reference. Ex: Simple Loops, Nested Loops
Temporal Locality:- (Temporal Time)
Recently executed instructions have more chances of being executed again (very soon).
It is also called as Locality in Time
Example: Loops, Reuse
Whenever the data is needed, it should be brought into cache.
Spatial Locality:- (Spatial Space)
Instructions stored near recently executed instructions have more chances of being executed soon.
Ex: straight line code, array access, etc.
Whenever data is needed, that particular data alone is not placed into cache; the whole memory block is placed into cache.
Types:-
Primary Cache:-
It is also called processor cache (within the processor)
It is also called L1 (or) Level 1 cache
Secondary Cache:-
It is also called Level 2 (or) L2 cache
It is placed between the primary cache and main memory (RAM)
Merits – Cache:
Faster than main memory
Quick access time
Stores data quickly
Demerits – Cache:
Very small in size
Very expensive
Difficult to design
Cache updating Policies:-
Cache stores some blocks at a time.
If the cache is smaller than all the blocks in main memory, only the active segments of the program are placed in cache, and execution time is reduced
Processor requests for a word, if it is not present, cache controller decides which block should be removed out of cache.
Read Hit:-
The data requested by the processor is available in cache
That data is obtained from cache and sent to the processor
Write Hit:-
Cache memory has copies of data in main memory
Write-Through protocol: contents in cache and main memory are updated simultaneously, to avoid inconsistency
Write-Back protocol: only the cache is updated, and the block is marked with a dirty/modified bit; main memory contents are updated when the block has to be removed from cache to insert a new block
[Diagram: Processor ↔ Cache (SRAM) ↔ Main Memory (DRAM), managed by the Cache Controller]
Read Miss:-
During a READ, if the requested word is not in cache, a READ MISS occurs
The block of words that contains the requested word is copied from main memory to cache
After the entire block is loaded into cache, the requested word is sent to the processor
To avoid waiting for the whole block, the Load-Through / Early Restart protocol is used: the requested word is forwarded to the processor as soon as it is read
Write Miss:-
If the requested word does not exist in cache during a write operation, a WRITE MISS occurs
If the write-through protocol is used, data is written directly to main memory
If the write-back protocol is used, the block containing the addressed word is first brought into cache, then the required word in cache is overwritten with the new data.
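The difference between the two write policies can be sketched with a toy cache model. The `Cache` class, addresses, and values below are invented for illustration, not a full cache simulator:

```python
# Minimal sketch of write-through vs write-back on a write hit.
class Cache:
    def __init__(self, memory, write_back=False):
        self.memory = memory        # dict: address -> value ("main memory")
        self.data = {}              # cached copies
        self.dirty = set()          # blocks modified only in cache
        self.write_back = write_back

    def write(self, addr, value):
        self.data[addr] = value
        if self.write_back:
            self.dirty.add(addr)            # memory updated later, on eviction
        else:
            self.memory[addr] = value       # write-through: update both at once

    def evict(self, addr):
        if addr in self.dirty:              # flush dirty block before removal
            self.memory[addr] = self.data[addr]
            self.dirty.discard(addr)
        self.data.pop(addr, None)

wt = Cache({0x10: 0})                  # write-through
wt.write(0x10, 7)
print(wt.memory[0x10])                 # 7: memory already up to date

wb = Cache({0x10: 0}, write_back=True) # write-back
wb.write(0x10, 7)
print(wb.memory[0x10])                 # 0: memory is stale until eviction
wb.evict(0x10)
print(wb.memory[0x10])                 # 7: flushed when the block is removed
```

The dirty set plays the role of the dirty/modified bit: it records which cached blocks must be written back to main memory before they are replaced.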
3. Explain the techniques used to reduce the cache miss. (or)
Explain the methods of mapping functions and how are they useful in improving cache performance.
(or) Explain the mapping techniques in cache with neat diagram
Usually cache memory can store only limited number of blocks at a time. So, it can hold only a very small amount of blocks from main memory
This management of blocks between main memory and cache memory is called as mapping function.
There are two kinds of mapping techniques in cache organization:-
Before going into techniques, some assumptions are made:-
Cache consists of 128 blocks of 16 words each
Total cache size = 128 × 16 = 2048 (2K) words
1 page in main memory = group of 128 blocks of 16 words each
Main memory has 32 pages
128 × 16 = 2048 words per page; 2048 × 32 = 65536
Main memory has 65536 words
Direct Mapping:-
Simplest mapping technique
Each block from main memory has one location in cache.
Block i of main memory is mapped to block i mod 128 of cache
Main memory blocks 0, 128, 256, … are stored in cache block 0
Main memory blocks 1, 129, 257, … are stored in cache block 1
Here, the address is divided into three fields: Tag, Block, Word
Word field: Select a word out of 16 words in cache
Block field: contains 7 bits, because there are 128 blocks in cache (2^7 = 128)
Tag field: selects a page among the 32 pages in main memory
Higher order 5 bits are compared with tag bits associated with that cache location
If they match, then required word is present in that cache block
If they do not match, the required block is not present in cache; so it is read from main memory and loaded into cache
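Under the assumptions above (2K-word cache, 64K-word main memory), a 16-bit word address splits into Tag (5 bits) | Block (7 bits) | Word (4 bits). A sketch of the field extraction:

```python
# Split a 16-bit word address into Tag (5) | Block (7) | Word (4) fields,
# matching the assumptions above: 16-word blocks, 128 cache blocks, 32 pages.
def direct_map(addr):
    word  = addr & 0xF           # low 4 bits: word within a 16-word block
    block = (addr >> 4) & 0x7F   # next 7 bits: cache block (i mod 128)
    tag   = addr >> 11           # high 5 bits: which of the 32 pages
    return tag, block, word

# Memory blocks 0, 128, 256, ... all land in cache block 0,
# distinguished only by their tags:
print(direct_map(0 * 16))       # (0, 0, 0)
print(direct_map(128 * 16))     # (1, 0, 0)
print(direct_map(256 * 16))     # (2, 0, 0)
```

This also shows the demerit listed below: blocks 0 and 128 compete for the same cache block and can only be distinguished by tag.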
Merits – Direct Mapping:
Easy to implement and understand
Less time consumed, since the mapping is computed directly
Cache is directly mapped with main memory
Demerits – Direct Mapping:
If the processor frequently needs the same cache location from two different pages of main memory, only one of them can be present in cache at a time
Not flexible
Fully Associative Mapping:-
A main memory block can be placed into any cache block position
Address contains only two fields: Word, Tag
Tag: To identify a memory block when it is in cache
Higher order 12 bits of an address received from CPU compared with tag bits of each cache block, to check whether required block is present or not.
If required block is present in cache, Word field is used to find required word from cache
We have freedom of choosing cache location for storing main memory block
If a new block enters cache, it has to remove the old blocks in cache only if the cache is full.
Here, for replacement of cache blocks, replacement algorithms are used (LRU, LFU, FIFO, Random).
The higher order bits of the main memory address are compared with all 128 tags, one per block, to check whether the requested block is present in cache.
Merits – Associative Mapping:
A main memory block can be placed anywhere in cache
Demerits – Associative Mapping:
The tag bits must be compared with all 128 tags of the cache to check whether a block is present or not
Mapping techniques:
Direct Mapping
Associative Mapping
o Fully Associative
o Set Associative
Two-way set associative mapping:-
Set associative = direct mapping + associative mapping
Many groups of direct mapped blocks operate as many direct mapped caches in parallel
A block of data from any page in main memory can go into particular block of directly mapped cache
Required address comparison depends on number of direct mapped caches in cache system
These comparisons are always less than the comparisons in fully associative mapping
Size of 1 page in main memory = size of 1 directly mapped cache
It is called as two way set associative because, each block from main memory has two choices for placing block.
Main memory Blocks 0, 64, 128, … can map into any 1 of cache blocks of set 0
Main memory blocks 1, 65, 129, … can map into any 1 of cache blocks of set 1, and so on
Three fields are needed
Word field: select one of 16 words in a block
Set field: find the requested block among sets 0 to 63
Tag field: 6 bits, because there are 64 pages (2^6 = 64)
Merits – Set-Associative Mapping:
Two directly mapped caches available;
Only two comparisons are required to check whether a given block is present or not
Reduced hardware cost
improved cache hit ratio
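With the same 16-bit addresses, two-way set associative splits an address as Tag (6 bits) | Set (6 bits) | Word (4 bits), since the 128 cache blocks form 64 two-way sets. A sketch of the field extraction:

```python
# Two-way set associative: address = Tag (6) | Set (6) | Word (4).
# 128 cache blocks / 2 ways = 64 sets.
def set_assoc_fields(addr):
    word = addr & 0xF            # word within a 16-word block
    s    = (addr >> 4) & 0x3F    # which of the 64 sets
    tag  = addr >> 10            # which of the 64 pages
    return tag, s, word

# Memory blocks 0, 64, 128, ... all map to set 0; each set holds two
# blocks, so only the two tags in that set must be compared.
print(set_assoc_fields(0 * 16))    # (0, 0, 0)
print(set_assoc_fields(64 * 16))   # (1, 0, 0)
print(set_assoc_fields(128 * 16))  # (2, 0, 0)
```

Compared with the fully associative sketch (128 tag comparisons) this needs only two, which is the merit listed above.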
4. Explain the organization of virtual memory and its address translation technique with neat diagrams.
In modern computers, main memory is not enough for all the operations required by a processor of a computer
So, virtual memory (VM) technique is used to extend the size of main memory (RAM)
It uses secondary storage such as disks, pendrives, etc.
Virtual means imaginary. An imaginary memory is created by the operating system, so that the user gets the feeling that main memory is that large.
For example, if a 32GB movie has to be displayed, main memory is not enough to store it, so that 32GB movie is divided into segments.
Now, the currently running segment of the movie is played in main memory, remaining are stored in secondary storage.
If next segment of movie is needed, it replaces the previous segment in main memory
OS is responsible for management of VM
Here, the addresses issued by processor called as virtual address / logical address
They are converted into physical address (real address).
Similarly, many applications can be run on a computer at the same time, such as MS word, VLC, games, etc
There is not enough space in main memory to contain all these applications
But, in all these applications, only a small part will be currently active; so it is enough to load that part alone into RAM
This concept is called as VM
Address Translation:-
The virtual address is broken into a virtual page number and a page offset
The virtual page number is converted to a physical page number
The physical page number forms the upper portion of the physical address; the page offset forms the lower portion
The number of bits in the page offset determines the page size
page table is used to maintain information about main memory location of each page
The page is stored in which address of main memory, current status of page also stored in page table
To find address of corresponding entry in page table, Virtual page number + contents of page table base register
Page table base register has starting address of page table
The entry in page table gives physical page number
Add this physical page number + offset to get physical address in main memory
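The translation steps above can be sketched directly. The page-table contents are hypothetical; 4KB pages give a 12-bit offset:

```python
# Translate a virtual address using a page table; 4KB pages = 12-bit offset.
PAGE_OFFSET_BITS = 12

def translate(virtual_addr, page_table):
    vpn    = virtual_addr >> PAGE_OFFSET_BITS            # virtual page number
    offset = virtual_addr & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn not in page_table:
        raise KeyError("PAGE FAULT: page %d not in main memory" % vpn)
    ppn = page_table[vpn]                                # physical page number
    return (ppn << PAGE_OFFSET_BITS) | offset            # ppn + offset

page_table = {0: 5, 1: 2}        # hypothetical mapping: vpn -> ppn
print(hex(translate(0x1234, page_table)))   # vpn 1 -> ppn 2: 0x2234
```

A lookup of a virtual page missing from the table raises the page fault that the PAGE FAULT ROUTINE described below would service.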
If required page is not present in main memory, PAGE FAULT occurs; that page is loaded from secondary storage to main memory by a program, PAGE FAULT ROUTINE
The technique of getting desired page in main memory is called as DEMAND PAGING
To support Demand Paging and VM, processor has to access page table in main memory
To avoid this access time, a copy of a small part of the page table is kept inside the processor, called the TLB (Translation Lookaside Buffer)
Buffer means, a temporary storage place.
TLB stores part of page table entries (recently used pages)
Virtual address to physical address translation
Segment Translation:-
Every segment selector has a linear base address associated with it and stored in segment descriptor.
A selector is used to point to the descriptor for the segment in a table of descriptors
The linear base address from the descriptor is then added to the 32-bit offset to generate the 32-bit linear address
This process is called as SEGMENTATION or SEGMENT TRANSLATION
If paging unit is not enabled, then the 32bit linear address corresponds to the physical address.
If the paging unit is enabled, the paging mechanism translates the linear address space into the physical address space by the paging process.
Segment translation = convert a logical address to a linear address
Page Translation:-
It is the second phase of address translation
Segment translation translates a logical address to a linear address; page translation converts that linear address to a physical address
When paging is enabled, the paging unit divides the address space into 1,048,576 pages of 4096 bytes (4KB) each
5. Explain the purpose and working of TLB with a diagram.
It is also called as Page Translation Cache
If the processor had to refer to two tables in memory (the page directory and the page table) on every access, performance would be reduced. To solve this problem, the processor stores the most recently used page table entries in an ON-CHIP cache. This is called the TLB.
It can hold up to 32 page table entries.
32 page table entry coupled with 4K page size, results in coverage of 128K bytes of memory addresses
The page table is placed in main memory, but a copy of a small portion of it is kept on the processor chip. This on-chip memory is the TLB
Based on virtual address, MMU (Memory Management Unit) searches TLB for required page
If page table entry for that page is found in TLB, Physical address can be obtained immediately
If the entry is not found, there is a miss in the TLB; the required entry is fetched from the page table in RAM and then stored in the TLB
If OS makes any changes to any entry in the page table, control bit in TLB will invalidate that entry in TLB
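The TLB lookup described above can be sketched as a small simulation (class and method names here are illustrative, not a real MMU interface):

```python
class TLB:
    """Toy TLB: caches up to `capacity` page-table entries.
    A hit returns the physical frame immediately; a miss walks the
    full page table (modelled as a dict living in 'main memory')."""
    def __init__(self, page_table, capacity=32):
        self.page_table = page_table   # full table, resides in RAM
        self.capacity = capacity
        self.entries = {}              # virtual page -> physical frame
        self.hits = self.misses = 0

    def translate(self, vpage):
        if vpage in self.entries:
            self.hits += 1             # TLB hit: no memory access needed
        else:
            self.misses += 1           # TLB miss: read page table in RAM
            if len(self.entries) >= self.capacity:
                self.entries.pop(next(iter(self.entries)))  # evict oldest
            self.entries[vpage] = self.page_table[vpage]
        return self.entries[vpage]

    def invalidate(self, vpage):
        """OS changed the page table: drop the stale TLB entry."""
        self.entries.pop(vpage, None)
```

The `invalidate` method models the control bit mentioned above: after the OS edits a page-table entry, the corresponding TLB entry must be discarded so the next access re-reads the table.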
When a program generates an access request to a page that is not in main memory, a page fault occurs.
That page should be brought from secondary storage (disk)
When it detects a page fault, the MMU raises an interrupt (exception) to the processor so that the OS can handle it
OS will suspend the execution of the task which has created page fault, and starts execution of another task, whose pages are ready in main memory.
When the suspended task resumes, the interrupted instruction must either be continued from the point of interruption or be restarted.
If a new page is brought from Disk, and main memory is full now, then that new page should replace a page from the main memory according to LRU algorithm (Least Recently Used)
Modified page is written to disk before removed from main memory.
The write-back protocol is used for this task.
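The LRU replacement mentioned above can be sketched with a page-reference string (a minimal illustration; `lru_reference_string` is an invented helper name):

```python
from collections import OrderedDict

def lru_reference_string(pages, frames):
    """Run a page-reference string through `frames` physical frames
    with LRU replacement; return the number of page faults."""
    memory = OrderedDict()      # page -> None, least recently used first
    faults = 0
    for p in pages:
        if p in memory:
            memory.move_to_end(p)          # p is now most recently used
        else:
            faults += 1                    # page fault: bring page from disk
            if len(memory) == frames:
                memory.popitem(last=False) # evict the least recently used page
            memory[p] = None
    return faults
```

With 3 frames, the string 7, 0, 1, 2, 0, 3, 0, 4 causes 6 faults (the two repeated references to page 0 are hits).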
6. Explain the Programmed I/O data transfer technique.
I/O operation means a data transfer between an I/O device and memory, or between an I/O device and the processor
In a computer system, if all the I/O operations are controlled by processor, then that system is using PROGRAMMED I/O
If that technique is used, processor executes programs that start, run and end the I/O operations including sensing device status, sending a R/W command or transferring data
Processor periodically checks status of I/O system until the operation is completed
Example:-
Processor’s software checks each of I/O devices regularly
During the check, microprocessor sees whether any device needs any service or not.
The following diagram shows a routine that services I/O ports A, B and C
The routine (program) checks the status of I/O ports
It first transfers status of I/O port A into accumulator
Then the routine checks the contents of the accumulator to see whether the service-request bit is SET or RESET
If SET, I/O port A service routine is called
After completing, it moves on to port B
The process is repeated again
It continues till all the I/O ports status registers are tested and all I/O ports are serviced.
Once this is done, processor continues to execute normal programs
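The polling routine described above can be sketched like this (a toy model; the port names and the use of bit 0 as the service-request bit are assumptions for illustration):

```python
def service_ports(status, handlers):
    """Programmed-I/O polling sketch: check each port's status register
    in turn; if its service-request bit is SET, call that port's service
    routine, then move on to the next port."""
    serviced = []
    for port in ("A", "B", "C"):
        if status.get(port, 0) & 0x01:   # service-request bit SET?
            handlers[port]()             # run the port's service routine
            serviced.append(port)
    return serviced                      # then the processor resumes normal work
```

Ports whose service-request bit is RESET are simply skipped, just as in the routine above.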
When programmed I/O is used, processor fetches I/O related instructions from memory and gives the necessary I/O commands to I/O system for execution
The technique used for I/O addressing (memory-mapped I/O or I/O-mapped I/O) determines how these ports are addressed
When processor sees an I/O instruction, the addressed I/O port is expected to be ready to respond, to avoid info loss.
Thus, a processor should know I/O device status always.
In Programmed I/O systems, processor is usually programmed to test the I/O device status before data transfer.
7. What is DMA? Explain DMA cycles and configuration with neat diagrams.
It comes under hardware controlled data transfer
An external device is used to control the data transfer
The external device generates the address and control signals to control the data transfer
It allows the peripheral device to directly access the memory
This technique is called as DIRECT MEMORY ACCESS
That external device that controls the data transfer is called as DMA CONTROLLER
DMA Idle Cycle:-
When the system is turned ON, the switches are in the 'A' position
The buses are connected from processor to system memory and peripherals
Processor executes the program until it needs to read a block of data from disk
To do this, processor sends series of commands to disk controller, telling it to search and read desired block of data
When disk controller is ready to transfer first byte of data from disk, it sends DMA request (DRQ), which is a signal to DMA controller.
Then DMA controller sends a hold request (HRQ), which is a signal to the processor to HOLD input
The processor responds to this HOLD signal by sending acknowledgement (HLDA) to DMA controller.
When DMA controller receives HLDA signal, it sends control signal to change switch position from A to B
This disconnects the processor from the buses and connects DMA controller to the buses
DMA Active Cycle:-
When DMA controller gets control of the buses, it sends memory address where first byte of data from the disk is to be written
It also sends DMA Acknowledge, DACK signal to disk controller device, telling it to get ready to send the byte
Finally it asserts IOR and MEMW signals on control bus
The IOR (I/O Read) signal enables the disk controller to place the byte of data from the disk on the data bus
The MEMW (Memory Write) signal enables the addressed memory to accept the data from the data bus
CPU is involved only at the beginning and at the end of data transfer operations.
Data transfer is monitored by DMA controller, which is also called as DMA channel
When the CPU wants to read or write a block of data, it issues a command to the DMA module with these instructions:-
Read/Write operation
Address of the I/O device involved in this operation
Starting address in memory to read or write
Number of words to be read/written
DMA channel:-
It consists of Data counter, data register, address register, control logic
Data counter stores number of data transfers to be done in one DMA cycle
It is decremented automatically after each word transfer
Data register acts as a buffer
Address register stores starting address of device
When data counter is ZERO, DMA transfer is stopped
DMA controller sends an interrupt to processor saying that the DMA operation is finished.
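The behaviour of the data counter and address register can be sketched as follows (a simplified model of one DMA cycle, not controller firmware; the function name is invented):

```python
def dma_transfer(memory, start_address, data, count):
    """One DMA cycle sketch: the address register starts at
    `start_address`, the data counter at `count`; each word moved
    decrements the counter and increments the address.  When the
    counter reaches ZERO, the controller raises an interrupt."""
    address = start_address        # address register
    counter = count                # data counter
    for word in data[:count]:      # data register buffers each word
        memory[address] = word     # write the word into memory
        address += 1               # address register incremented
        counter -= 1               # data counter decremented
    interrupt = (counter == 0)     # done: interrupt the processor
    return memory, interrupt
```

When the counter reaches zero the transfer stops and the "DMA finished" interrupt is signalled, as described above.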
Diagram (a) shows that the CPU, DMA module, I/O system and memory share the same system bus
Here Programmed I/O is used
Data transferred between memory and I/O system through DMA module
Each transfer of a word consumes two bus cycles
Diagram (b) shows that there is a different path between DMA module and IO system
This is another DMA configuration
The third diagram shows the third type of DMA configuration
Here the I/O devices are connected to the DMA module using a separate I/O bus
This reduces number of I/O interfaces in DMA module
8. Explain the different data transfer modes in DMA.
A DMA controller transfers data in any one of the following modes:
Single Transfer Mode (Cycle Stealing)
Block Transfer Mode
Demand (or) Burst Transfer Mode
Single Transfer Mode:-
In this mode, the device can make only one transfer (byte). After each transfer, the DMAC gives control of all buses back to the processor.
Series of operations:-
I/O device asserts DRQ line when it is ready to transfer data
DMAC asserts the HOLD line to request use of the buses from the processor
Processor asserts HLDA, granting bus control to DMAC
DMAC asserts DACK to request I/O device, executes DMA bus cycle and data transfer
I/O device deasserts its DRQ after data transfer of 1 Byte
DMA deasserts DACK line
Byte transfer count is decremented, memory address is incremented
HOLD line deasserted to give back control of all buses to processor
HOLD signal reasserted to request use of buses when I/O device ready to transfer another byte; same process repeated until last transfer
When the data transfer count is ZERO, the transfer is finished.
Block Transfer Mode:-
Here, device can make number of transfers as programmed in the word count register.
After each transfer of word, count is decremented by 1, address is incremented by 1
DMA transfer is continued until word count becomes ZERO
It is used when the DMAC needs to transfer a block of data
Series of operations:-
I/O device asserts DRQ line when it is ready to transfer data
DMAC asserts the HOLD line to request use of the buses from the processor
Processor asserts HLDA, granting bus control to DMAC
DMAC asserts DACK to request I/O device, executes DMA bus cycle and data transfer
I/O device deasserts its DRQ after data transfer of 1 Byte
DMA deasserts DACK line
Transfer count is decremented, memory address is incremented
If the transfer count is not ZERO, the data transfer is not complete; the DMAC waits for another DMA request from the I/O device
When the transfer count = ZERO, the data transfer is finished; the DMAC deasserts HOLD to tell the processor that it does not need the buses any more
Processor then deasserts HLDA signal to tell DMAC that it has got back control of the buses
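The difference between single-transfer (cycle-stealing) and block-transfer mode can be illustrated by counting bus handshakes (a toy model; the function names are invented for illustration):

```python
def block_transfer(words, count_register):
    """Block-transfer-mode sketch: one HOLD/HLDA handshake, then the
    DMAC keeps the buses until the word count reaches ZERO."""
    handshakes = 1                 # DMAC requests the buses only once
    transferred = []
    count = count_register
    while count > 0 and words:
        transferred.append(words.pop(0))
        count -= 1                 # count decremented after every word
    return transferred, handshakes

def single_transfer(words):
    """Single-transfer (cycle-stealing) sketch: the buses are requested
    and released again around every individual byte."""
    handshakes = 0
    transferred = []
    while words:
        handshakes += 1            # HOLD asserted for this byte only
        transferred.append(words.pop(0))
    return transferred, handshakes
```

Both modes move the same data; single-transfer mode simply pays one bus handshake per byte, which is why it "steals" cycles from the processor rather than locking it out.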
Demand Transfer Mode:-
Here the device is programmed to continue data transfer until TC (Terminal Count) or EOP (End of Process) signal is encountered, or until DREQ (DMA Request) is inactive
Series of operations:-
1. I/O device asserts DRQ line when it is ready to transfer data
2. DMAC asserts the HOLD line to request use of the buses from the processor
3. Processor asserts HLDA, granting bus control to DMAC
4. DMAC asserts DACK to the I/O device, executes the DMA bus cycle and transfers the data
5. I/O device deasserts its DRQ after the transfer of 1 byte
6. DMAC deasserts the DACK line
7. Byte transfer count is decremented, memory address is incremented
8. DMAC continues to execute data transfers until TC or EOP is encountered
9. The I/O device can restart the DMA request by sending the DRQ signal once again
10. Data transfer continues until the transfer count = ZERO
Single Transfer Mode:-
Block Transfer Mode:-
Demand Transfer Mode:-
9. Explain in detail the Bus Arbitration techniques in DMA
The device that is allowed to initiate data transfer on bus at any given time is called as BUS MASTER
In a computer system, there may be more than one bus master such as processor, DMA controller, etc
They share system bus
When current Bus Master gives back bus control, another bus master gets the bus control.
Bus arbitration is defined as the process by which the next device to become bus master is selected and bus mastership is transferred to it.
Selection is done on priority basis
There are two types of bus arbitration techniques in DMA:-
Centralized arbitration technique
Distributed arbitration technique
Centralized arbitration Technique:-
A single bus arbiter performs the arbitration
The bus arbiter may be processor or a separate controller
There are three types of centralized arbitration. They are:-
Daisy chaining
Polling Method
Independent Request
Daisy Chaining:-
It is a simpler and easy method
All masters make use of same line for bus request
In response to a bus request, controller sends a BUS GRANT signal if bus is free
BUS GRANT signal serially propagates through each master until it encounters first one that is requesting access to bus
This master blocks the propagation of the BUS GRANT signal, activates BUSY LINE signal and gains control of bus
Any other requesting module will not receive grant signal and cannot get bus access
Advantages of Daisy Chaining:-
It is a simpler and cheaper method
It requires the least number of lines, and this number is independent of the number of masters in the system
Disadvantages of Daisy Chaining:-
The propagation delay of the bus grant signal is proportional to the number of masters in the system. This makes arbitration slow, so only a limited number of masters are allowed in a system
The priority of a master is fixed by its physical location
Failure of one master causes the whole system to fail
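The daisy-chain grant propagation can be sketched as follows (a minimal model assuming master 0 is electrically closest to the controller; the function name is invented):

```python
def daisy_chain_grant(requesting):
    """Daisy-chain sketch: the BUS GRANT signal enters at master 0 and
    propagates down the chain; the first master that is requesting
    absorbs the grant and becomes bus master.  Priority is therefore
    fixed by physical position in the chain."""
    for position, wants_bus in enumerate(requesting):
        if wants_bus:
            return position        # this master blocks further propagation
    return None                    # no requests: the grant goes unused
```

Note how a requesting master further down the chain can never win against one nearer the controller — exactly the fixed-priority disadvantage listed above.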
Polling Method:-
Controller is used to generate addresses for masters
Number of address lines required depends on number of masters connected in the system
If there are 8 masters in the system, at least three address lines needed
If any master sends a bus request, the controller generates a sequence of master addresses
When the requesting master finds its own address, it activates the BUSY line signal.
Advantages of the Polling Method:-
Priority can be changed by changing polling sequence in the controller
If one module fails, entire system does not fail
More improved than the daisy chaining method.
Independent priority method:-
Each master has a separate pair of bus request and BUS GRANT lines and each pair has a priority assigned to it
The built-in priority decoder within the controller selects highest priority request and asserts corresponding BUS GRANT signal.
Advantages of Independent Priority:-
Due to the separate pairs of bus request and bus grant signals, arbitration is fast
Arbitration time is independent of the number of masters in the system
Disadvantages of Independent Priority:-
It requires more bus request and bus grant signal lines
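The controller's priority decoder in the independent-request scheme can be sketched as follows (illustrative only; lower numbers are assumed here to mean higher priority, and the function name is invented):

```python
def independent_request_grant(requests, priority):
    """Independent-request sketch: every master has its own
    request/grant pair; the controller's priority decoder asserts the
    grant of the highest-priority active request."""
    active = [m for m, req in requests.items() if req]
    if not active:
        return None                # no requests: no grant asserted
    # lower priority number = higher priority (an assumption of this sketch)
    return min(active, key=lambda m: priority[m])
```

Because every master has its own lines, the decoder sees all requests in parallel — the reason arbitration time does not grow with the number of masters.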
Distributed arbitration:-
All devices participate in selection of next bus master
Each device on bus is assigned a 4bit ID
The number of bits in ID depends on number of devices
When one or more devices request for bus control, they assert START-ARBITRATION signal and place their 4bit ID on arbitration lines, ARB0 to ARB3
More than one device can place their 4bit ID to indicate that they need control of bus
If one device puts 1 on bus line, another device puts 0 on same bus line, bus line status will be 0
Device reads status of all lines through inverter buffers, so device reads bus status 0 as logic 1
Device having highest ID, has highest priority
When two or more devices place their ID on bus lines, it is necessary to find highest ID from status of bus line
For example, consider two devices A and B having ID 1 and 6, request for bus
Device A puts bit pattern 0001, device B puts 0110
With this combination, the bus line status will be 1000
Through the inverter buffers, the code seen by both devices is 0111
Each device compares code formed on arbitration lines to its own ID, starting from MSB
If it finds a difference at any bit position, it disables its drivers at that position and at all lower-order positions by placing 0 at the inputs of those drivers
Here, device A detects a difference on line ARB2
It disables its drivers on lines ARB2, ARB1 and ARB0
This makes the code on the arbitration lines change to 0110
0110 = 6, which is the ID of B
This means, B wins the competition
Adv:-
It offers high reliability because operation of bus is not dependent on any single device.
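The self-selection on the ARB lines can be simulated as follows (a simplified model of the open-collector comparison; dropping out of the competition is modelled by removing the device from the contender set):

```python
def distributed_arbitration(ids, width=4):
    """Distributed-arbitration sketch.  Each requesting device drives
    its ID onto open-collector lines (a driven 1 pulls the physical
    line low; devices read the lines back through inverters, so the
    code seen is the OR of all driven IDs).  From the MSB down, a
    device driving 0 on a line where it sees a 1 disables its drivers
    on that line and all lower ones, dropping out.  Highest ID wins."""
    contenders = set(ids)
    for bit in reversed(range(width)):                    # MSB first
        line = any((d >> bit) & 1 for d in contenders)    # code on this line
        if line:
            # devices driving 0 here see a 1 and drop out
            contenders = {d for d in contenders if (d >> bit) & 1}
    (winner,) = contenders         # exactly one device left (IDs are unique)
    return winner
```

Running the example from the text, devices with IDs 1 (0001) and 6 (0110) compete: device 1 sees a mismatch at ARB2, drops out, and 6 wins.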
10.What are Interrupts? Explain the Interrupt hardware in detail with necessary diagrams. Interrupts:-
An external event that affects the normal flow of instruction execution, generated by external hardware devices such as the keyboard, mouse, etc., is called an interrupt
Ex: the computer should respond to the keyboard, mouse, etc. when they request service.
If a device wants to tell the processor about the completion of an operation, it sends a hardware signal; that signal is called an interrupt
A special Routine that is executed to give service to the interrupts is called as Interrupt Service Routine (ISR)
Interrupt request line is used to alert the processor
A program can be interrupted in three ways:-
By an external signal
By a special instruction in the program
By some other condition
Ex:-
Main program
Instruction 1: ______
; INTERRUPT OCCURS HERE ;
Instruction n: ______
ISR
. . . .
An interrupt caused by an external signal is called a hardware interrupt
Conditional interrupts, or interrupts created by special instructions, are called software interrupts
Interrupt Hardware:-
An I/O device requests an interrupt by activating a bus line called as interrupt request (or) request
Interrupts are classified as single-level and multi-level interrupts
Single Level Interrupts:-
There can be many interrupting devices, but all interrupt requests are made via a single input pin of the CPU
When interrupted, CPU has to poll the I/O ports to identify requested device
Polling is a software routine that checks the state of each device.
Once the interrupting I/O port is found, CPU will service it and then return to task it was performing before the interrupt
Interrupt requests from all devices are logically ORed and connected to the interrupt input of the processor
The interrupt request from any device is routed to processor interrupt input
After getting interrupted, processor identifies requesting device by reading interrupt status of each device
All devices are connected to the INTR line via switches to ground
To request an interrupt, a device closes its associated switch
When no device requests service (all switches I0 ... In are open), the interrupt request line is pulled up to VDD
When a device closes its switch to request service, the voltage on the line drops to zero, and the processor reads this active-low level as INTR = 1
Open-collector and open-drain gates are used to drive this line
This is because the output of an open-collector (or open-drain) gate is equivalent to a switch to ground that is normally open
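The wired-OR behaviour of the INTR line can be sketched as follows (a toy model; as described above, the active-low line level is read by the processor as INTR = 1):

```python
VDD = 1

def intr_line(switch_closed):
    """Single-level interrupt line sketch: the line is pulled up to
    Vdd; any device that requests service closes its switch to ground,
    pulling the line low.  The processor treats the low (active) level
    as INTR = 1."""
    voltage = 0 if any(switch_closed) else VDD   # wired-OR to ground
    return 1 if voltage == 0 else 0              # INTR asserted when line is low
```

A single closed switch is enough to assert INTR; the processor must then poll the devices to find which one requested service.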
Multi-Level interrupts:-
Processor has more than one interrupt pins
I/O devices are tied to individual interrupt pins
The interrupting device can be immediately identified by the CPU upon receiving an interrupt request from it
This allows processor to go directly to that I/O device and service it without polling concept
This saves interrupt-processing time
When a processor is interrupted, it stops executing its current program and calls a special routine
The event that causes the interruption is called an interrupt
The processor first finishes executing its current instruction; execution is never cut off mid-instruction
The current contents of the program counter are stored on the stack
Then the PC is loaded with the address of the ISR
The ISR runs to completion, after which the interrupted program resumes
Enabling and Disabling interrupts:-
Maskable interrupts are enabled and disabled under program control
By setting and resetting particular flip-flops in the processor, interrupts can be masked or unmasked
When masked, processor does not respond to interrupt even though interrupt is activated
Most of the processors give masking facility
In some kinds of processors, those inputs which can be masked under software control are called as maskable interrupts
The interrupts that cannot be masked under software control are called as non-maskable interrupts
Exceptions:-
An interrupt is an event that suspends processing of currently executing program and begins execution of another program
Many kinds of events can cause interrupts; these events are called exceptions
An I/O interrupt is a subtype of exception
Exceptions can be classified as: Faults, Traps (or) Aborts
Faults:-
Faults are a type of exception that is detected and serviced BEFORE the execution of the faulting instruction
Ex: In VM, if a page or segment referenced by the processor is not present, the OS fetches that page from disk using a fault exception routine.
Traps:-
Traps are exceptions that are reported immediately AFTER the execution of instructions which causes the problem
Ex: user-defined interrupts, such as the divide-by-zero error
Aborts:-
Aborts are exceptions which do not permit precise location of the instruction causing the exception to be found
They are used to report severe errors such as hardware error, illegal values in system.
Debugging:-
System software contains a system program called a debugger
A debugger is a program that helps programmer to find and clear errors in a program
It uses two types of exceptions: Trace, Breakpoint
To use trace exception, it is necessary to program the processor in trace mode
If processor is in trace mode, an exception occurs after execution of every instruction
This is used to execute debug program as an exception service routine
This exception service routine lets the user examine the contents of registers, memory locations, etc.
Trace exception is disabled during the execution of debugging program
A debugger allows programmer to set breakpoints at any point in the program
In this mode, the system executes instructions up to the breakpoint and creates break point exception
This exception routine allows to find contents of registers, memory locations for checking process
The programmer can verify whether the program is correct up to that point or not.
11. Write notes on I/O processor and explain its features with a neat diagram.
An I/O processor is a processor with DMA and interrupt capability that reduces the work load of the CPU in communicating with I/O devices
A computer system may have one CPU and one or more IOPs
An IOP that communicates with remote terminals over communication lines and other communication media is called as data communication processor (DCP)
An IOP is not dependent on CPU
It transfers data between external devices and memory under the control of I/O program
I/O program is initiated by CPU
Communication between the IOP and a device attached to it is similar to programmed I/O
IOP and memory communication is through DMA
The CPU sends instructions to the IOP to start it or to test the status of the IOP
When an I/O operation is desired, CPU informs IOP where to find I/O programs
The I/O program contains instructions regarding the data transfer
The instructions in I/O program are prepared by system programmers, called as “commands”
These commands are different from CPU instructions
Features of IOP:-
An IOP can fetch and execute its own instructions
Instructions are specially designed for I/O processing
Intel 8089 IOP can perform arithmetic, logical operations, data transfer operations, searching, branching and translation
IOP does all work involved in I/O transfer including device set up, programmed I/O, DMA
IOP can transfer data from an 8bit source to 16bit destination
Communication between IOP and CPU is through memory based control blocks; CPU defines tasks in control blocks to find a program sequence, called as channel program
IOP supports multiprocessing; IOP and CPU can do processing at the same time.
Intel 8089 IOP:-