

DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY, SIRUVACHUR.

EC2303 - COMPUTER ARCHITECTURE AND ORGANIZATION

QUESTION BANK WITH ANSWERS

    16 MARKS

UNIT I

    1. Describe in detail the different kinds of addressing modes with an example.

Addressing modes
Each instruction of a computer specifies an operation on certain data. There are various ways of specifying the address of the data to be operated on. These different ways of specifying data are called the addressing modes. The most common addressing modes are:

Immediate addressing mode
Direct addressing mode
Indirect addressing mode
Register addressing mode
Register indirect addressing mode
Displacement addressing mode
Stack addressing mode

To specify the addressing mode of an instruction, several methods are used. The most often used are:
a) Different operands use different addressing modes.
b) One or more bits in the instruction format are used as a mode field. The value of the mode field determines which addressing mode is to be used.

The effective address will be either a main memory address or a register.

Immediate Addressing:
This is the simplest form of addressing. Here, the operand is given in the instruction itself. This mode is used to define a constant or to set the initial values of variables. The advantage of this mode is that no memory reference other than the instruction fetch is required to obtain the operand.
The disadvantage is that the size of the number is limited to the size of the address field, which in most instruction sets is small compared to the word length.

Direct Addressing:
In direct addressing mode, the effective address of the operand is given in the address field of the instruction. It requires one memory reference to read the operand from the given location and provides only a limited address space. The length of the address field is usually less than the word


length. Ex: Move P, R0; Add Q, R0, where P and Q are the addresses of the operands.

Indirect Addressing:
In indirect addressing mode, the address field of the instruction refers to the address of a word in memory, which in turn contains the full-length address of the operand.
The advantage of this mode is that for a word length of N, an address space of 2^N can be addressed. The disadvantage is that instruction execution requires two memory references to fetch the operand. Multilevel or cascaded indirect addressing can also be used.

Register Addressing:
Register addressing mode is similar to direct addressing. The only difference is that the address field of the instruction refers to a register rather than a memory location. Only 3 or 4 bits are needed as the address field to reference 8 to 16 general-purpose registers. The advantage of register addressing is that only a small address field is needed in the instruction.

Register Indirect Addressing:
This mode is similar to indirect addressing. The address field of the instruction refers to a register, and the register contains the effective address of the operand. This mode uses one memory reference to obtain the operand. The address space is limited to the width of the registers available to store the effective address.

Displacement Addressing:
In displacement addressing mode there are three types of addressing. They are:
1) Relative addressing
2) Base register addressing
3) Indexed addressing.
This is a combination of direct addressing and register indirect addressing. The value contained in one address field, A, is used directly, and the other address field refers to a register whose contents are added to A to produce the effective address.

Stack Addressing:
A stack is a linear array of locations accessed in last-in first-out (LIFO) order. The stack is a reserved block of locations; items are appended to or deleted from only the top of the stack. The stack pointer is a register which stores the address of the top-of-stack location. This mode of addressing is also known as implicit addressing.
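To make the common modes concrete, the following is a small illustrative sketch (in Python, with a made-up memory map and register file that are not part of any real machine) showing how the operand is obtained under several of the modes described above.

```python
# Toy model of operand resolution for a few addressing modes (illustrative only).
memory = {100: 42, 200: 100}          # address -> contents
registers = {"R0": 200, "R1": 7}      # register -> contents

def fetch_operand(mode, field):
    """Return the operand value for the given addressing mode and address field."""
    if mode == "immediate":            # operand is the field itself
        return field
    if mode == "direct":               # field is the memory address of the operand
        return memory[field]
    if mode == "indirect":             # field points to a word holding the operand's address
        return memory[memory[field]]
    if mode == "register":             # field names a register holding the operand
        return registers[field]
    if mode == "register_indirect":    # register holds the operand's memory address
        return memory[registers[field]]
    raise ValueError("unknown addressing mode")

print(fetch_operand("immediate", 5))             # 5
print(fetch_operand("direct", 100))              # 42
print(fetch_operand("indirect", 200))            # memory[memory[200]] = memory[100] = 42
print(fetch_operand("register_indirect", "R0"))  # memory[200] = 100
```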


2. Explain the various Instruction types with examples.
Types of Instruction:
1. Data Transfer
2. Data Processing
3. Program-control instructions

Data Transfer:
An instruction can specify only one memory address at a time, so multi-operand instructions such as add and multiply must use CPU registers to store some of their operands. Data-processing instructions are therefore supported by data-transfer instructions that load input operands into CPU registers or transfer results from the CPU to main memory.

Program-control instructions:
The group of instructions called program-control or branch instructions determine the sequence in which instructions are executed. The Program Counter (PC) specifies the address of the next instruction to be executed. The IAS has two unconditional branch instructions, called jump or goto instructions, which load part of X into the PC so that the next instruction is taken from the left half or the right half of M(X).

Instruction Execution:
The IAS fetches and executes instructions in several steps that form an instruction cycle. Because two instructions are packed into a 40-bit word, the IAS fetches two instructions in each instruction cycle. One instruction has its opcode placed in the instruction register and its address field (if any) placed in the address register.

3. Briefly explain the organization of an ISA computer.
The Instruction Set Architecture
The 3 most common types of ISAs are:
1. Stack - The operands are implicitly on top of the stack.
2. Accumulator - One operand is implicitly the accumulator.
3. General Purpose Register (GPR) - All operands are explicitly mentioned; they are either registers or memory locations.

Let's look at the assembly code of
A = B + C;
in all 3 architectures:

Stack       Accumulator   GPR
PUSH A      LOAD A        LOAD R1, A
PUSH B      ADD B         ADD R1, B
ADD         STORE C       STORE R1, C
POP C       -             -

Not all processors can be neatly tagged into one of the above categories. The i8086 has many instructions that use implicit operands although it has a general register set. The i8051 is another example: it has 4 banks of GPRs, but most instructions must have the A register as one of their operands.

What are the advantages and disadvantages of each of these approaches?

Stack
Advantages: Simple model of expression evaluation (reverse Polish). Short instructions.
Disadvantages: A stack cannot be randomly accessed. This makes it hard to generate efficient code. The stack itself is accessed on every operation and becomes a bottleneck.

Accumulator
Advantages: Short instructions.
Disadvantages: The accumulator is only temporary storage, so memory traffic is the highest for this approach.

superscalar processor -- can execute more than one instruction per cycle.
cycle -- smallest unit of time in a processor.
parallelism -- the ability to do more than one thing at once.
pipelining -- overlapping parts of a large task to increase throughput without decreasing latency.

We'll look at some of the decisions facing an instruction set architect, and how those decisions were made in the design of the MIPS instruction set. MIPS, like SPARC, PowerPC, and Alpha AXP, is a RISC (Reduced Instruction Set Computer) ISA:
fixed instruction length
few instruction formats
load/store architecture
RISC architectures worked because they enabled pipelining. They continue to thrive because they enable parallelism.

    Instruction Length

Variable-length instructions (Intel 80x86, VAX) require multi-step fetch and decode, but allow for a much more flexible and compact instruction set.

    Fixed-length instructions allow easy fetch and decode, and simplify pipelining and


parallelism.

Accessing the Operands
Operands are generally in one of two places:
registers (32 int, 32 fp)
memory (2^32 locations)
Registers are:
easy to specify
close to the processor (fast access)
The idea that we want to access registers whenever possible led to load-store architectures:
normal arithmetic instructions only access registers
memory is accessed only with explicit loads and stores.

How Many Operands?
Most instructions have three operands (e.g., z = x + y). Well-known ISAs specify 0-3 (explicit) operands per instruction. Operands can be specified implicitly or explicitly.

Basic ISA Classes

Accumulator:
1 address     add A            acc <- acc + mem[A]
Stack:
0 address     add              tos <- tos + next
General Purpose Register:
2 address     add A, B         EA(A) <- EA(A) + EA(B)
3 address     add A, B, C      EA(A) <- EA(B) + EA(C)
Load/Store:
3 address     add Ra, Rb, Rc   Ra <- Rb + Rc
              load Ra, Rb      Ra <- mem[Rb]
              store Ra, Rb     mem[Rb] <- Ra
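As an illustration of the zero-address (stack) class in the table above, here is a minimal, purely hypothetical Python sketch of a stack machine executing the PUSH A, PUSH B, ADD, POP C sequence; the memory layout and opcode names are assumptions made for this example.

```python
# Minimal zero-address stack machine executing: PUSH A, PUSH B, ADD, POP C (illustrative).
memory = {"A": 3, "B": 4, "C": 0}   # symbolic addresses -> values
stack = []

program = [("PUSH", "A"), ("PUSH", "B"), ("ADD", None), ("POP", "C")]

for opcode, addr in program:
    if opcode == "PUSH":             # push mem[addr] onto the stack
        stack.append(memory[addr])
    elif opcode == "ADD":            # tos <- tos + next (operands are implicit)
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
    elif opcode == "POP":            # store the top of the stack back to memory
        memory[addr] = stack.pop()

print(memory["C"])                   # 7, i.e. A + B
```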

4. With a neat block diagram explain the Accumulator based CPU.
The CPU organization proposed by von Neumann and his colleagues for the IAS computer is the basis for most subsequent designs. It comprises a small set of registers and the circuits needed to execute a functionally complete set of instructions. One of the CPU registers, the accumulator, plays a central role, being used to store an input or output operand in the execution of many instructions. This shows at the register level the essential structure of a small accumulator-


oriented CPU. This organization is typical of first-generation computers and low-cost microcontrollers.
Assume for simplicity that instructions and data have some fixed word size of n bits and that each instruction can be expressed by means of register-transfer operations in our HDL. Instructions are fetched by the program control unit (PCU), whose main register is the program counter (PC). They are executed in the data processing unit (DPU), which contains an n-bit arithmetic-logic unit (ALU) and two data registers AC and DR.
Most instructions perform operations of the form
X1 := fi(X1, X2)

    5. Explain in detail about CPU organization.

Datapath Design:
Capabilities and performance characteristics of the principal functional units (FUs) (e.g., registers, ALU, shifters, logic units, ...)
Ways in which these components are interconnected (bus connections, multiplexers, etc.)
How information flows between components

Control Unit Design:
Logic and means by which such information flow is controlled
Control and coordination of FU operation to realize the targeted Instruction Set Architecture to be implemented (can be implemented using either a finite state machine or a microprogram)
Hardware description with a suitable language, possibly using Register Transfer Notation (RTN)

For a specific program compiled to run on a specific machine A, the following parameters are provided:
The total instruction count of the program
The average number of cycles per instruction (average CPI)
The clock cycle of machine A
How can one measure the performance of this machine running this program?
Intuitively, the machine is said to be faster or to have better performance running this program if the total execution time is shorter. Thus the inverse of the total measured program execution time is a possible performance measure or metric:

Performance_A = 1 / Execution Time_A
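Because Execution Time = Instruction Count x CPI x Clock Cycle Time, the metric can be tied together with a short sketch; the figures below are invented purely for illustration.

```python
# Execution time and performance from the basic CPU performance equation (illustrative numbers).
instruction_count = 2_000_000     # total dynamic instructions executed
average_cpi = 1.5                 # average clock cycles per instruction
clock_cycle_time = 1e-9           # seconds per cycle (1 GHz clock)

execution_time = instruction_count * average_cpi * clock_cycle_time
performance = 1 / execution_time  # Performance_A = 1 / Execution Time_A

print(f"Execution time: {execution_time:.6f} s")   # 0.003000 s
print(f"Performance:    {performance:.1f} programs/s")
```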


6. Describe register-level components.
Register-level circuits are composed of word-oriented devices. The key sequential component, which gives this level of abstraction its name, is a register, a storage device for words.

Word gates
Word gates are universal in that they suffice to implement any logic circuit. Moreover, word-gate circuits can be analyzed using Boolean algebra. A single gate symbol is also used to represent operations on scalar and vector (word) operands.

Multiplexer
A multiplexer is a device intended to route data from one of several sources to a common destination; the source is specified by applying appropriate control signals to the multiplexer. If the maximum number of input sources is K and each I/O data line carries m bits, the multiplexer is referred to as a K-input, m-bit multiplexer. Multiplexers have the interesting property that they can compute any combinational function and so form a type of universal logic generator.
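A small illustrative sketch of the universal-logic-generator property (a toy model, not tied to any particular device): a 4-input multiplexer realizes an arbitrary 2-variable function when the function's truth table is wired to its data inputs.

```python
# A 4-input multiplexer used as a universal logic generator for 2-variable functions (toy model).
def mux4(data, s1, s0):
    """Route data[index] to the output, where index is formed from the select lines s1 s0."""
    return data[(s1 << 1) | s0]

# Truth table of XOR wired to the data inputs: f(0,0)=0, f(0,1)=1, f(1,0)=1, f(1,1)=0.
xor_table = [0, 1, 1, 0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mux4(xor_table, a, b))   # reproduces a XOR b
```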

Decoder
o Main application: address decoding.
o A 1-out-of-2^n (or 1/2^n) decoder is a combinational circuit with n input lines x and 2^n output lines z such that exactly one output line is activated for each of the 2^n possible input combinations.
Encoder
o An encoder is intended to generate the address or index of an active input line. It is therefore the inverse of a decoder. It has 2^k input lines and k output lines.

Arithmetic Elements
o Simple arithmetic functions, addition and subtraction of fixed-point numbers, can be implemented by combinational register-level components. Adders and subtracters for fixed-point binary numbers are basic register-level components from which we can derive a variety of arithmetic circuits.
o The adder's carry-in and carry-out lines allow several copies of this component to be chained together to add numbers of arbitrary size.
o Another arithmetic component is the magnitude comparator, whose function is to compare the magnitudes of two binary numbers.

Programmable Logic Devices
A class of components called programmable logic devices, or PLDs, is a term applied to ICs containing many gates or other general-purpose cells whose interconnections can be configured or programmed to implement any desired combinational or sequential function. Two techniques are used to program PLDs:
i) Mask programming


ii) Field programming

Mask Programming
o It requires a few special steps in the IC-manufacturing process.
Field Programming
o It is done by the designer or end user in the field via low-cost programming units.

Programmable Logic Array
o A PLA is a universal function generator capable of realizing a set of logic functions that depend on some maximum number of variables. It consists of an array of AND gates, which realize a set of product terms, and a set of OR gates, which form various logical sums of the product terms.

Programmable Array Logic
o PAL circuits have an AND plane that is programmable but an OR plane with fixed connections designed to link each output line to a fixed set of AND terms.
o A PAL output can realize only a two-level expression containing at most 8 terms. A PAL's advantages are ease of use in some applications as well as higher speed, because output fan-out is restricted.

    UNIT II

1. Illustrate Booth's Algorithm with an example.
Observation: If besides addition we also use subtraction, we can reduce the number of consecutive additions and therefore make the multiplication faster. This requires recoding the multiplier in such a way that the number of consecutive 1s in the multiplier (and hence the number of consecutive additions we would otherwise have to do) is reduced.
The key to Booth's algorithm is to scan the multiplier and classify groups of bits into the beginning, the middle and the end of a run of 1s.
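As a worked illustration, here is a minimal Python sketch of Booth's algorithm for n-bit two's-complement operands; the default 8-bit width and the packed [A | Q | Q-1] register representation are choices made for this example.

```python
# Booth's multiplication of two n-bit two's-complement integers (illustrative sketch).
def booth_multiply(multiplicand, multiplier, n=8):
    """Return multiplicand * multiplier using Booth's algorithm on n-bit operands."""
    m = multiplicand & ((1 << n) - 1)              # multiplicand as an n-bit pattern
    reg = (multiplier & ((1 << n) - 1)) << 1       # combined register [A | Q | Q-1]; A = 0, Q-1 = 0
    width = 2 * n + 1                              # total width of [A | Q | Q-1]
    mask = (1 << width) - 1
    for _ in range(n):
        q0, q_1 = (reg >> 1) & 1, reg & 1
        if (q0, q_1) == (1, 0):                    # beginning of a run of 1s: A <- A - M
            reg = (reg - (m << (n + 1))) & mask
        elif (q0, q_1) == (0, 1):                  # end of a run of 1s: A <- A + M
            reg = (reg + (m << (n + 1))) & mask
        sign = (reg >> (width - 1)) & 1            # arithmetic right shift of [A | Q | Q-1]
        reg = (reg >> 1) | (sign << (width - 1))
    product = reg >> 1                             # drop Q-1; [A | Q] is the 2n-bit product
    return product - (1 << (2 * n)) if product >> (2 * n - 1) else product

print(booth_multiply(7, -3))    # -21
print(booth_multiply(-6, -5))   # 30
```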

2. Design a 4-bit Carry-Lookahead Adder and explain its operation with an example.

In a ripple-carry adder, stage i produces
s_i = x_i XOR y_i XOR c_i
c_(i+1) = x_i y_i + x_i c_i + y_i c_i = x_i y_i + (x_i XOR y_i) c_i
Defining the generate and propagate functions
G_i = x_i y_i
P_i = x_i XOR y_i
gives
c_(i+1) = G_i + P_i c_i
Expanding this recurrence lets every carry be computed directly from the operand bits and c_0:
c_(i+1) = G_i + P_i G_(i-1) + P_i P_(i-1) G_(i-2) + ... + P_i P_(i-1) ... P_1 G_0 + P_i P_(i-1) ... P_0 c_0
For a 4-bit carry-lookahead adder the carries are
c_1 = G_0 + P_0 c_0
c_2 = G_1 + P_1 G_0 + P_1 P_0 c_0
c_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 c_0
c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0
A 16-bit carry-lookahead adder is built from four such 4-bit blocks; each block supplies block-generate and block-propagate functions so that the carries between blocks can themselves be produced by a second-level lookahead circuit.

Propagation delay (4-bit):
Ripple-carry adder: 2n gate delays = 8 gate delays
Carry-lookahead adder: 4 gate delays
Propagation delay (16-bit):
Ripple-carry adder: 2n gate delays = 32 gate delays
Cascading four 4-bit carry-lookahead blocks: 10 gate delays
Carry-lookahead adder (with higher-level functions): 8 gate delays
Propagation delay (32-bit):
Ripple-carry adder: 2n gate delays = 64 gate delays
Cascading eight 4-bit carry-lookahead blocks: 18 gate delays
Carry-lookahead adder (with higher-level functions): 10 gate delays
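The 4-bit lookahead equations can be checked with a short sketch (Python, illustrative only) in which every carry is computed as a two-level function of the G_i, P_i signals and c_0.

```python
# 4-bit carry-lookahead adder: every carry is a two-level function of G_i, P_i and c0 (illustrative).
def cla_add4(x, y, c0=0):
    """Add two 4-bit numbers given as bit lists [bit0..bit3]; return (sum_bits, carry_out)."""
    g = [a & b for a, b in zip(x, y)]   # generate:  G_i = x_i y_i
    p = [a ^ b for a, b in zip(x, y)]   # propagate: P_i = x_i XOR y_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = [p[i] ^ carries[i] for i in range(4)]   # s_i = P_i XOR c_i
    return s, c4

s, cout = cla_add4([1, 0, 0, 1], [1, 1, 1, 0])  # 9 + 7
print(s, cout)   # [0, 0, 0, 0] 1  -> 16
```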

3. With a neat block diagram explain in detail about the coprocessor.
Complicated arithmetic operations like exponential and trigonometric functions are costly to implement in CPU hardware, while software implementations of these operations are slow.


To overcome this problem, the alternative method is to design an auxiliary unit that performs complex arithmetic operations, called an arithmetic coprocessor or simply a coprocessor, which provides fast and low-cost hardware.
In general, it is a separate instruction-set processor. The coprocessor is closely coupled to the CPU, and both the CPU and the coprocessor execute instructions from the same program.
The instructions intended for the coprocessor are fetched by the CPU, jointly decoded by the CPU and the coprocessor, and executed by the coprocessor in a manner that is transparent to the programmer.
The interface requires control lines to couple the CPU to the coprocessor and to handle the instructions that are executed by the coprocessor.

Connection between CPU and Coprocessor
o The coprocessor is connected to the CPU by several control lines that allow the activities of the two processors to be coordinated.
o In this CPU-coprocessor interface, the CPU acts as the MASTER and the coprocessor is a SLAVE device to the CPU.
o When coprocessor instructions are encountered, the communication between the CPU and the coprocessor needed to initiate and terminate execution of coprocessor instructions occurs automatically.
o Thus the coprocessor approach makes it possible to provide either hardware or software support without altering the source code of the program being executed.

A coprocessor instruction typically consists of three fields:
1. Opcode (F0) - distinguishes coprocessor instructions from other CPU instructions.
2. Address (F1) - indicates the address of a particular coprocessor when several coprocessors are used in the system.
3. Operand (F2) - specifies the operation to be executed by the coprocessor.


4. Write the algorithm for division of floating point numbers and illustrate with an example.
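As a brief illustration of the usual steps (determine the sign, subtract the exponents, divide the significands, normalize, round), here is a simplified Python sketch operating on (sign, exponent, significand) triples; it ignores rounding modes, overflow/underflow and special values, so it is only an approximation of real IEEE 754 division.

```python
# Simplified floating-point division on (sign, exponent, significand) triples (illustrative only).
def fp_divide(a, b, precision=10):
    """Divide normalized values a = (sa, ea, ma) by b = (sb, eb, mb), with 1 <= m < 2."""
    sa, ea, ma = a
    sb, eb, mb = b
    sign = sa ^ sb                   # step 1: result sign
    exponent = ea - eb               # step 2: subtract exponents
    significand = ma / mb            # step 3: divide significands
    while significand < 1.0:         # step 4: normalize so that 1 <= m < 2
        significand *= 2.0
        exponent -= 1
    significand = round(significand, precision)   # step 5: round the result
    return sign, exponent, significand

# (1.5 * 2^3) / (1.2 * 2^1) = 1.25 * 2^2, i.e. 12.0 / 2.4 = 5.0
print(fp_divide((0, 3, 1.5), (0, 1, 1.2)))   # (0, 2, 1.25)
```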


    UNIT III

    1. (a). Write short notes on Nano Programming.

    Nanoprogramming

Use a 2-level control storage organization:
The top level is a vertical-format memory.
The output of the top-level memory drives the address register of the bottom (nano-level) memory.
The nanomemory uses the horizontal format and produces the actual control signal outputs.
The advantage of this approach is a significant saving in control memory size (bits).
The disadvantage is more complexity and slower operation (two memory accesses are needed for each microinstruction).
Nanoprogrammed machine example: Suppose that a system is being designed with 200 control points and 2048 microinstructions.

Assume that only 256 different combinations of control points are ever used.
A single-level control memory would require 2048 x 200 = 409,600 storage bits.
A nanoprogrammed system would use:
Microstore of size 2048 x 8 = 16,384 bits
Nanostore of size 256 x 200 = 51,200 bits
Total size = 67,584 storage bits
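The arithmetic behind this saving can be checked in a couple of lines; the 8-bit microstore width comes from needing log2(256) bits to index the 256 distinct nanowords.

```python
# Control-store sizing for the nanoprogramming example above.
import math

microinstructions = 2048
control_points = 200
unique_combinations = 256

single_level_bits = microinstructions * control_points          # 409,600 bits
nanoword_index_bits = int(math.log2(unique_combinations))       # 8 bits to address 256 nanowords
microstore_bits = microinstructions * nanoword_index_bits       # 2048 x 8 = 16,384 bits
nanostore_bits = unique_combinations * control_points           # 256 x 200 = 51,200 bits

print(single_level_bits, microstore_bits + nanostore_bits)      # 409600 67584
```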

Nanoprogramming has been used in many CISC microprocessors.

Applications of Microprogramming
Microprogramming application: emulation
The use of a microprogram on one machine to execute programs originally written to run on another (different!) machine.
By changing the microcode of a machine, you can make it execute software from another machine.
Commonly used in the past to permit new machines to continue to run old software.
The VAX 11/780 had 2 modes.

(b). Describe the characteristics of superscalar processing.
Today, a computer designer is usually faced with maintaining binary compatibility, i.e., maintaining instruction set compatibility and a sequential execution model (which typically implies precise interrupts). For high performance, however, superscalar processor


implementations deviate radically from sequential execution -- much has to be done in parallel. As a result, the program binary nowadays should be viewed as a specification of what has to be done, not how it is done in reality. A modern superscalar microprocessor takes the sequential specification as embodied in the program binary and removes much of the nonessential sequentiality to turn the program into a parallel, higher-performance version, yet the processor retains the outward appearance of sequential execution.

Elements of High Performance Processing
Simply stated, achieving higher performance means processing a given program in a smaller amount of time. Each individual instruction takes some time to fetch and execute; this time is the instruction's latency. To reduce the time to execute a sequence of instructions (e.g. a program), one can: (i) reduce individual instruction latencies, or (ii) execute more instructions in parallel. Because superscalar processor implementations are distinguished by the latter (while adequate attention is also paid to the former), we will concentrate on the latter method here.

Nevertheless, a significant challenge in superscalar design is to not increase instruction latencies due to the increased hardware complexity brought about by the drive for enhanced parallelism. Parallel instruction processing requires: the determination of the dependence relationships between instructions, adequate hardware resources to execute multiple operations in parallel, strategies to determine when an operation is ready for execution, and techniques to pass values from one operation to another. When the effects of instructions are committed, and the visible state of the machine updated, the appearance of sequential execution must be maintained. More precisely, in hardware terms, this means a superscalar processor implements:
i) Instruction fetch strategies that simultaneously fetch multiple instructions, often by predicting the outcomes of, and fetching beyond, conditional branch instructions,
ii) Methods for determining true dependences involving register values, and mechanisms for communicating these values to where they are needed during execution,
iii) Methods for initiating, or issuing, multiple instructions in parallel,
iv) Resources for parallel execution of many instructions, including multiple pipelined functional units and memory hierarchies capable of simultaneously servicing multiple memory references,
v) Methods for communicating data values through memory via load and store instructions, and memory interfaces that allow for the dynamic and often unpredictable performance behavior of memory hierarchies. These interfaces must be well matched with the instruction execution strategies.


vi) Methods for committing the process state in correct order; these mechanisms maintain an outward appearance of sequential execution.
Although we will discuss the above items separately, in reality they cannot be completely separated, nor should they be. In good superscalar designs they are often integrated in a cohesive, almost seamless, manner.

    2. Discuss the various hazards that might arise in a pipeline. What are the

    remedies commonly adopted to overcome/minimize these hazards.

    The Pipeline Defined

    Pipelining

John Hayes provides a definition of a pipeline as it applies to a computer processor:
"A pipeline processor consists of a sequence of processing circuits, called segments or stages, through which a stream of operands can be passed. Partial processing of the operands takes place in each segment. ... a fully processed result is obtained only after an operand set has passed through the entire pipeline."

In everyday life, people do many tasks in stages. For instance, when we do the laundry, we place a load in the washing machine. When it is done, it is transferred to the dryer and another load is placed in the washing machine. When the first load is dry, we pull it out for folding or ironing, moving the second load to the dryer and starting a third load in the washing machine. We proceed with folding or ironing of the first load while the second and third loads are being dried and washed, respectively. We may never have thought of it this way, but we do laundry by pipeline processing.

A pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages.
Let us review Hayes' definition as it pertains to our laundry example. The washing machine is one "sequence of processing circuits", or a stage. The second is the dryer. The third is the folding or ironing stage.
"A significant aspect of our civilization is the division of labor. Major engineering achievements are based on subdividing the total work into individual tasks which can be handled despite their inter-dependencies. Overlap and pipelining are essentially operation management techniques based on job sub-divisions under a precedence constraint."


Types of Pipelines

Instructional pipeline:
where different stages of an instruction fetch and execution are handled in a pipeline.

Arithmetic pipeline:
where different stages of an arithmetic operation are handled along the stages of a pipeline.

The above definitions are correct but are based on a narrow perspective, considering only the central processor. There are other types of computing pipelines. Pipelines are used to compress and transfer video data. Another is the use of specialized hardware to perform graphics display tasks. Discussing graphics displays, Ware Myers wrote:
"...the pipeline concept ... transforms a model of some object into representations that successively become more machine-dependent and finally results in an image upon a particular screen."
This example of pipelining fits the definitions from Hayes and Chen but not the categories offered by Tabaz. These broader categories are beyond the scope of this discussion and are mentioned only to alert the reader that different authors mean different things when referring to pipelining.

Disadvantages
There are two disadvantages of pipeline architecture. The first is complexity. The second is the inability to continuously run the pipeline at full speed, i.e. the pipeline stalls.
Let us examine why the pipeline cannot run at full speed. There are phenomena called pipeline hazards which disrupt the smooth execution of the pipeline. The resulting delays in the pipeline flow are called bubbles. These pipeline hazards include:
structural hazards from hardware conflicts
data hazards arising from data dependencies
control hazards that arise from branch, jump, and other control flow changes
These issues can be and are successfully dealt with. But detecting and avoiding the hazards leads to a considerable increase in hardware complexity. The control paths controlling the gating between stages can contain more circuit levels than the data paths being controlled. In 1970, this complexity is one reason that led Foster to call pipelining "still-controversial".
The stages of a typical pipeline are:
1. Instruction fetch
2. Instruction decode and register fetch


3. Execute
4. Memory access
5. Register write back
Hazards: When a programmer (or compiler) writes assembly code, they make the assumption that each instruction is executed before execution of the subsequent instruction is begun. This assumption is invalidated by pipelining. When this causes a program to behave incorrectly, the situation is known as a hazard. Various techniques for resolving hazards, such as forwarding and stalling, exist.

Hazard (computer architecture)
In computer architecture, a hazard is a potential problem that can happen in a pipelined processor. It refers to the possibility of erroneous computation when a CPU tries to simultaneously execute multiple instructions which exhibit data dependence. There are typically three types of hazards: data hazards, structural hazards, and branching hazards (control hazards).
Instructions in a pipelined processor are performed in several stages, so that at any given time several instructions are being executed, and instructions may not be completed in the desired order.

    A hazard occurs when two or more of these simultaneous (possibly out of order)

    instructions conflict.

1 Data hazards
  1.1 RAW - Read After Write
  1.2 WAR - Write After Read
  1.3 WAW - Write After Write
2 Structural hazards
3 Branch (control) hazards
4 Eliminating hazards
  4.1 Eliminating data hazards
  4.2 Eliminating branch hazards

    Data hazards

A major effect of pipelining is to change the relative timing of instructions by overlapping their execution. This introduces data and control hazards. Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on the unpipelined machine. Consider the pipelined execution of these instructions:

                 1     2     3      4      5      6      7      8     9
ADD R1, R2, R3   IF    ID    EX     MEM    WB
SUB R4, R5, R1         IF    IDsub  EX     MEM    WB
AND R6, R1, R7               IF     IDand  EX     MEM    WB
OR  R8, R1, R9                      IF     IDor   EX     MEM    WB
XOR R10, R1, R11                           IF     IDxor  EX     MEM   WB

All the instructions after the ADD use the result of the ADD instruction (in R1).

The ADD instruction writes the value of R1 in the WB stage (cycle 5), and the SUB instruction reads the value during its ID stage (IDsub). This problem is called a data hazard. Unless precautions are taken to prevent it, the SUB instruction will read the wrong value and try to use it.
The AND instruction is also affected by this data hazard. The write of R1 does not complete until the end of


cycle 5. Thus, the AND instruction that reads the registers during cycle 4 (IDand) will receive the wrong result.
The OR instruction can be made to operate without incurring a hazard by a simple implementation technique: perform register file reads in the second half of the cycle, and writes in the first half. Because both WB for ADD and IDor for OR are performed in cycle 5, the write to the register file by ADD is performed in the first half of the cycle, and the read of the registers by OR is performed in the second half of the cycle.
The XOR instruction operates properly, because its register read occurs in cycle 6, after the register write by ADD.
Forwarding is a technique to eliminate the stalls for the hazard involving the SUB and AND instructions. We can also classify the data hazards and consider the cases when stalls cannot be eliminated, and see what the compiler can do to schedule the pipeline to avoid stalls.

A hazard is created whenever there is a dependence between instructions, and they are close enough that the overlap caused by pipelining would change the order of access to an operand. Our example hazards have all been with register operands, but it is also possible to create a dependence by writing and reading the same memory location. In the DLX pipeline, however, memory references are always kept in order, preventing this type of hazard from arising.
All the data hazards discussed here involve registers within the CPU. By convention, the hazards are named by the ordering in the program that must be preserved by the pipeline:

RAW (read after write)
WAW (write after write)
WAR (write after read)
Consider two instructions i and j, with i occurring before j. The possible data hazards are:
RAW (read after write) - j tries to read a source before i writes it, so j incorrectly gets the old value. This is the most common type of hazard and the kind that we use forwarding to overcome.
WAW (write after write) - j tries to write an operand before it is written by i. The writes end up being


performed in the wrong order, leaving the value written by i rather than the value written by j in the destination.
This hazard is present only in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled. The DLX integer pipeline writes a register only in WB and avoids this class of hazards. WAW hazards would be possible if we made the following two changes to the DLX pipeline.
Here is a sequence of two instructions showing the execution in this revised pipeline, highlighting the pipe stage that writes the result:
LW  R1, 0(R2)   IF ID EX MEM1 MEM2 WB
ADD R1, R2, R3  IF ID EX WB
Unless this hazard is avoided, execution of this sequence on this revised pipeline will leave the result of the first write (the LW) in R1, rather than the result of the ADD.
Allowing writes in different pipe stages introduces other problems, since two instructions can try to write during the same clock cycle. The DLX FP pipeline, which has both writes in different stages and different pipeline lengths, has to deal with both write conflicts and WAW hazards in detail.

WAR (write after read) - j tries to write a destination before it is read by i, so i incorrectly gets the new value.
This cannot happen in our example pipeline because all reads are early (in ID) and all writes are late (in WB). This hazard occurs when there are some instructions that write results early in the instruction pipeline, and other instructions that read a source late in the pipeline. Because of the natural structure of a pipeline, which typically reads values before it writes results, such hazards are rare. Pipelines for complex instruction sets that support autoincrement addressing and require operands to be read late in the pipeline could create WAR hazards. If we modified the DLX pipeline as in the above example and also read some operands late, such as the source value for a store instruction, a WAR hazard could occur. Here is the pipeline timing for such a potential hazard, highlighting the stage where the conflict occurs:
SW  R1, 0(R2)   IF ID EX MEM1 MEM2 WB
ADD R2, R3, R4  IF ID EX WB
If the SW reads R2 during the second half of its MEM2 stage and the ADD writes R2 during the first half of its WB stage, the SW will incorrectly read and store the value produced by the ADD.

    RAR (read after read) - this case is not a hazard :).
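A tiny sketch (with a hypothetical instruction encoding, purely for illustration) of how RAW, WAR and WAW hazards between an instruction i and a later instruction j can be detected by comparing their source and destination registers:

```python
# Classify data hazards between instruction i and a later instruction j (illustrative encoding).
def data_hazards(i, j):
    """Each instruction is a dict with a 'dst' register and a list of 'src' registers."""
    hazards = []
    if i["dst"] is not None and i["dst"] in j["src"]:
        hazards.append("RAW")      # j reads a value before i has written it
    if j["dst"] is not None and j["dst"] in i["src"]:
        hazards.append("WAR")      # j writes a register that i still has to read
    if i["dst"] is not None and i["dst"] == j["dst"]:
        hazards.append("WAW")      # both write the same register; order must be preserved
    return hazards

add_ = {"op": "ADD", "dst": "R1", "src": ["R2", "R3"]}
sub_ = {"op": "SUB", "dst": "R4", "src": ["R5", "R1"]}
print(data_hazards(add_, sub_))    # ['RAW'] -- SUB reads R1 written by ADD
```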

Structural hazards
A structural hazard occurs when a part of the processor's hardware is needed by two or more instructions at the same time. A structural hazard might occur, for


instance, if a program were to execute a branch instruction followed by a computation instruction.

Branch (control) hazards
Branching hazards (also known as control hazards) occur when the processor is told to branch - i.e., if a certain condition is true, then jump from one part of the instruction stream to another - not necessarily to the next instruction sequentially. In such a case, the processor cannot tell in advance whether it should process the next instruction (when it may instead have to move to a distant instruction). This can result in the processor doing unwanted actions.
Two common causes of stalls are:
A cache miss. A cache miss stalls all the instructions in the pipeline, both before and after the instruction causing the miss.
A hazard in the pipeline. Eliminating a hazard often requires that some instructions in the pipeline be allowed to proceed while others are delayed. When an instruction is stalled, all the instructions issued later than the stalled instruction are also stalled. Instructions issued earlier than the stalled instruction must continue, since otherwise the hazard will never clear.
A hazard causes pipeline bubbles to be inserted. The following tables show how the stalls are actually implemented. As a result, no new instructions are fetched during clock cycle 4, and no instruction will finish during clock cycle 8.
In case of structural hazards:

Instr       1    2    3    4       5       6       7       8       9    10
Instr i     IF   ID   EX   MEM     WB
Instr i+1        IF   ID   EX      MEM     WB
Instr i+2             IF   ID      EX      MEM     WB
Stall                      bubble  bubble  bubble  bubble  bubble
Instr i+3                          IF      ID      EX      MEM     WB
Instr i+4                                  IF      ID      EX      MEM  WB

To simplify the picture it is also commonly shown like this:

Clock cycle number
Instr       1    2    3    4      5    6    7    8    9    10
Instr i     IF   ID   EX   MEM    WB
Instr i+1        IF   ID   EX     MEM  WB
Instr i+2             IF   ID     EX   MEM  WB
Instr i+3                  stall  IF   ID   EX   MEM  WB
Instr i+4                         IF   ID   EX   MEM  WB

    In case of data hazards:


Clock cycle number
Instr       1    2    3    4       5    6    7    8    9    10
Instr i     IF   ID   EX   MEM     WB
Instr i+1        IF   ID   bubble  EX   MEM  WB
Instr i+2             IF   bubble  ID   EX   MEM  WB
Instr i+3                  bubble  IF   ID   EX   MEM  WB
Instr i+4                          IF   ID   EX   MEM  WB

which appears the same with stalls:

Clock cycle number
Instr       1    2    3    4       5    6    7    8    9    10
Instr i     IF   ID   EX   MEM     WB
Instr i+1        IF   ID   stall   EX   MEM  WB
Instr i+2             IF   stall   ID   EX   MEM  WB
Instr i+3                  stall   IF   ID   EX   MEM  WB
Instr i+4                          IF   ID   EX   MEM  WB

    UNIT IV

1. What do you mean by memory hierarchy? Briefly discuss.

Memory is technically any form of electronic storage. Personal computer systems have a hierarchical memory structure consisting of auxiliary memory (disks), main memory (DRAM) and cache memory (SRAM). A design objective of computer system architects is to have the memory hierarchy work as though it were entirely composed of the fastest memory type in the system.

2. What is Cache memory?
Cache memory: If the active portions of the program and data are stored in a fast, small memory, the average memory access time can be reduced, thus reducing the total execution time of the program. Such a fast, small memory is referred to as cache memory. It is placed between the CPU and main memory as shown in the figure.
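The benefit can be quantified with the average memory access time, AMAT = hit time + miss rate x miss penalty; the numbers in the sketch below are illustrative assumptions, not measurements.

```python
# Average memory access time with and without a cache (illustrative figures).
hit_time = 1            # cycles to access the cache on a hit
miss_rate = 0.05        # fraction of accesses that miss in the cache
miss_penalty = 100      # extra cycles to fetch the block from main memory

amat_with_cache = hit_time + miss_rate * miss_penalty
amat_without_cache = miss_penalty            # every access goes to main memory

print(amat_with_cache, amat_without_cache)   # 6.0 vs 100 cycles
```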

    3. What do you mean by interleaved memory?

The memory is partitioned into a number of modules connected to common memory address and data buses. A memory module is a memory array together with its own address and data registers. The figure shows a memory unit with four modules.
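With low-order interleaving, consecutive word addresses fall in consecutive modules; a minimal sketch, assuming four modules:

```python
# Low-order interleaving: consecutive addresses map to consecutive memory modules (illustrative).
NUM_MODULES = 4

def locate(address):
    """Return (module number, address within the module) for a given word address."""
    return address % NUM_MODULES, address // NUM_MODULES

for addr in range(8):
    print(addr, locate(addr))   # addresses 0..7 rotate through modules 0, 1, 2, 3, 0, 1, 2, 3
```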


4. How many memory chips of (128 x 8) are needed to provide a memory capacity of 4096 x 16?
Required memory capacity is 4096 x 16.
Each chip is 128 x 8.
Number of chips required = (4096 / 128) x (16 / 8) = 32 x 2 = 64 chips.
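The same chip-count calculation, expressed as a quick check:

```python
# Number of (128 x 8) chips needed to build a 4096 x 16 memory.
words_needed, bits_per_word = 4096, 16
words_per_chip, bits_per_chip_word = 128, 8
chips = (words_needed // words_per_chip) * (bits_per_word // bits_per_chip_word)
print(chips)   # 64
```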

5. Explain about main memory.
Ans. RAM is used as main memory or primary memory in the computer. This memory is mainly used by the CPU, so it is termed primary memory; RAM is also referred to as the primary memory of the computer. RAM is volatile memory because its contents are erased after the electrical power is switched off. ROM also comes under the category of primary memory. ROM is non-volatile memory: its contents will be retained even after the electrical power is switched off. ROM is read-only memory and RAM is read-write memory. Primary memory is high-speed memory; it can be accessed immediately and randomly.

    UNIT V

1. Explain in detail about interrupt handling.
INTERRUPT HANDLING
Handling Interrupts
There are many situations where the processor should ignore interrupt requests:
Interrupt-disable
Interrupt-enable
Typical scenario:
Device raises an interrupt request
Processor interrupts the program being executed
Processor disables interrupts and acknowledges the interrupt
Interrupt-service routine is executed
Interrupts are enabled and program execution is resumed
An equivalent circuit for an open-drain bus is used to implement a common interrupt-request line.
Handling Multiple Devices
Interrupt Priority
During execution of an interrupt-service routine:
Disable interrupts from devices at the same priority level or lower
Continue to accept interrupt requests from higher-priority devices
Privileged instructions are executed in supervisor mode
Controlling device requests:
Interrupt-enable bits in the device interface (e.g., KEN, DEN)


Polled interrupts: Priority is determined by the order in which the processor polls the devices (polls their status registers).
Vectored interrupts: Priority is determined by the order in which the processor is told of the interrupt via INTA. In daisy chaining, if a device has not requested service, it passes the INTA signal to the next device; if it needs service, it does not pass the INTA and puts its own code on the address lines (priority is fixed by the order of connection in the chain).
Multiple Interrupts
Priority in the Processor Status Word:
Status Register -- active program
Status Word -- inactive program
Changed only by a privileged instruction
Mode changes -- automatic or by privileged instruction
Interrupt enable/disable, by device or system-wide

Common Functions of Interrupts
An interrupt transfers control to the interrupt service routine, generally through the interrupt vector table, which contains the addresses of all the service routines. The interrupt architecture must save the address of the interrupted instruction and the contents of the processor status register.
Incoming interrupts are disabled while another interrupt is being processed, to prevent a lost interrupt.
A software-generated interrupt may be caused either by an error or by a user request (sometimes called a trap).
An operating system is interrupt driven.
Hardware interrupts come from I/O devices, memory, or the processor; software interrupts are generated by a program.

2. Explain in detail about standard I/O interface.
From the discussion so far, the reader must have understood that the input/output system of a computer is accommodated in many layers, like the memory devices. We have already discussed cache memory and a special high-speed bus to communicate with it, designated as the cache bus. The main memory of the system is also interfaced with the processor through a dedicated memory bus, so that delay in I/O operations, which is quite normal and happens frequently, does not retard the instructions that have to be carried out.
The innermost layer of I/O devices is directly interfaced with the processor through its address, data and control buses (designated as the I/O bus), and communicates in a synchronous manner (synchronous parallel communication).


Note that although these devices communicate with the processor in synchronous fashion, with the external world they communicate in an asynchronous manner. Examples of I/O devices at this layer are 8255-based ports, timers/counters for real-time operations, a USART for serial communication, and other similar devices.
Generic Model of an I/O Module:
CPU checks the I/O module's device status
I/O module returns the status
If ready, CPU requests a data transfer
I/O module gets data from the device
I/O module transfers the data to the CPU
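The handshake above is essentially programmed (polled) I/O. A hypothetical sketch of the loop, with an invented device model and status values (not any real device's register interface):

```python
# Programmed (polled) I/O handshake between the CPU and an I/O module (hypothetical device model).
import random

class IOModule:
    def status(self):                 # CPU checks the device status
        return "READY" if random.random() < 0.3 else "BUSY"
    def read_data(self):              # module gets data from the device and returns it
        return 0x5A

def programmed_io_read(module):
    while module.status() != "READY": # poll until the module reports READY
        pass
    return module.read_data()         # module transfers the data to the CPU

print(hex(programmed_io_read(IOModule())))   # 0x5a
```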

3. Describe the functions of SCSI with a neat diagram.
SCSI Bus: defined by ANSI X3.131
Small Computer System Interface
50, 68 or 80 pins
Maximum transfer rate 160 MB/s or 320 MB/s.

4. Discuss the DMA driven data transfer technique.
Polling or interrupt-driven I/O incurs considerable overhead:
Multiple program instructions
Saving program state
Incrementing memory addresses
Keeping track of the word count
To transfer large amounts of data at high speed without continuous intervention by the processor, a special control circuit is required in the I/O device interface, called a DMA controller.
DMA Controller:
Part of the I/O device interface (DMA channels)
Performs functions that would normally be carried out by the processor:
Provides the memory address
Generates the bus signals that control the transfer
Keeps track of the number of transfers
Operates under control of the processor


    5. Describe Bus Arbitration.

In a single-bus architecture, when more than one device requests the bus, a controller called the bus arbiter decides who gets the bus; this is called bus arbitration.
In computing, bus mastering is a feature supported by many bus architectures that enables a device connected to the bus to initiate transactions.
Bus arbitration is the procedure in bus communication that chooses between connected devices contending for control of the shared bus; the device currently in control of the bus is often termed the bus master. Devices may be allocated differing priority levels that will determine the choice of bus master in case of contention.
A device not currently bus master must request control of the bus before attempting to initiate a data transfer via the bus.
The normal protocol is that only one device may be bus master at any time and that all other devices act as slaves to this master.
Only a bus master may initiate a normal data transfer on the bus; slave devices respond to commands issued by the current bus master by supplying the data requested or accepting the data sent.
There are two approaches: centralized arbitration and distributed arbitration.
In distributed arbitration, all devices have equal responsibility in carrying out the arbitration process. Each device on the bus is assigned an identification number, and contending devices place their ID numbers on four open-collector lines.
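A sketch of such a distributed (self-selection) scheme, where each contender drives its ID onto open-collector lines and the device with the highest ID wins; the 4-bit width and wired-OR model are assumptions for illustration.

```python
# Distributed bus arbitration over open-collector lines (wired-OR), 4-bit device IDs (illustrative).
def arbitrate(ids, width=4):
    """Each contender drives its ID; a device drops out when it sees a 1 on a higher-order
    line where its own bit is 0. The surviving device (highest ID) becomes bus master."""
    contenders = set(ids)
    for bit in reversed(range(width)):                        # examine lines from MSB to LSB
        line = any((d >> bit) & 1 for d in contenders)        # wired-OR of the drivers on this line
        if line:
            contenders = {d for d in contenders if (d >> bit) & 1}   # devices with 0 here drop out
    return max(contenders)

print(arbitrate([5, 6, 3]))    # 6 -> the device with the highest ID wins the bus
```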