

DHANALAKSHMI SRINIVASAN INSTITUTE OF RESEARCH AND TECHNOLOGY, SIRUVACHUR.

EC2303 - COMPUTER ARCHITECTURE AND ORGANIZATION

QUESTION BANK WITH ANSWERS

    16 MARKS

UNIT I

    1. Describe in detail the different kinds of addressing modes with an example.

Addressing modes
Each instruction of a computer specifies an operation on certain data. There are various ways of specifying the address of the data to be operated on. These different ways of specifying data are called the addressing modes. The most common addressing modes are:

Immediate addressing mode
Direct addressing mode
Indirect addressing mode
Register addressing mode
Register indirect addressing mode
Displacement addressing mode
Stack addressing mode

To specify the addressing mode of an instruction, several methods are used. The most often used are:
a) Different operands use different addressing modes.
b) One or more bits in the instruction format are used as a mode field. The value of the mode field determines which addressing mode is to be used.

The effective address will be either a main memory address or a register.

Immediate Addressing:
This is the simplest form of addressing. Here, the operand is given in the instruction itself. This mode is used to define a constant or to set the initial values of variables. The advantage of this mode is that no memory reference other than the instruction fetch is required to obtain the operand.
The disadvantage is that the size of the number is limited to the size of the address field, which in most instruction sets is small compared to the word length.

Direct Addressing:
In direct addressing mode, the effective address of the operand is given in the address field of the instruction. It requires one memory reference to read the operand from the given location and provides only a limited address space. The length of the address field is usually less than the word


length. Ex: Move P, R0; Add Q, R0, where P and Q are the addresses of the operands.

Indirect Addressing:
In indirect addressing mode, the address field of the instruction refers to the address of a word in memory, which in turn contains the full-length address of the operand.
The advantage of this mode is that for a word length of N, an address space of 2^N can be addressed. The disadvantage is that instruction execution requires two memory references to fetch the operand. Multilevel or cascaded indirect addressing can also be used.

Register Addressing:
Register addressing mode is similar to direct addressing. The only difference is that the address field of the instruction refers to a register rather than a memory location. Only 3 or 4 bits are needed as the address field to reference 8 to 16 general-purpose registers. The advantage of register addressing is that only a small address field is needed in the instruction.

Register Indirect Addressing:
This mode is similar to indirect addressing. The address field of the instruction refers to a register, and the register contains the effective address of the operand. This mode uses one memory reference to obtain the operand. The address space is limited to the width of the registers available to store the effective address.

Displacement Addressing:
In displacement addressing mode there are three types of addressing. They are:
1) Relative addressing
2) Base register addressing
3) Indexed addressing.
This is a combination of direct addressing and register indirect addressing. The value contained in one address field, A, is used directly, and the other address field refers to a register whose contents are added to A to produce the effective address.

Stack Addressing:
A stack is a linear array of locations accessed in last-in first-out (LIFO) order. The stack is a reserved block of locations; items are appended to or deleted from only the top of the stack. The stack pointer is a register which stores the address of the top-of-stack location. This mode of addressing is also known as implicit addressing.
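To make the common modes concrete, the following is a small illustrative sketch (in Python, with a made-up memory map and register file that are not part of any real machine) showing how the operand is obtained under several of the modes described above.

```python
# Toy model of operand resolution for a few addressing modes (illustrative only).
memory = {100: 42, 200: 100}          # address -> contents
registers = {"R0": 200, "R1": 7}      # register -> contents

def fetch_operand(mode, field):
    """Return the operand value for the given addressing mode and address field."""
    if mode == "immediate":            # operand is the field itself
        return field
    if mode == "direct":               # field is the memory address of the operand
        return memory[field]
    if mode == "indirect":             # field points to a word holding the operand's address
        return memory[memory[field]]
    if mode == "register":             # field names a register holding the operand
        return registers[field]
    if mode == "register_indirect":    # register holds the operand's memory address
        return memory[registers[field]]
    raise ValueError("unknown addressing mode")

print(fetch_operand("immediate", 5))             # 5
print(fetch_operand("direct", 100))              # 42
print(fetch_operand("indirect", 200))            # memory[memory[200]] = memory[100] = 42
print(fetch_operand("register_indirect", "R0"))  # memory[200] = 100
```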


2. Explain the various Instruction types with examples.
Types of Instruction:
1. Data Transfer
2. Data Processing
3. Program-control instructions

Data Transfer:
An instruction can specify only one memory address at a time, so multi-operand instructions such as add and multiply must use CPU registers to store some of their operands. Data-processing instructions are therefore supported by data-transfer instructions that load input operands into CPU registers or transfer results from the CPU to main memory.

Program-control instructions:
The group of instructions called program-control or branch instructions determine the sequence in which instructions are executed. The Program Counter (PC) specifies the address of the next instruction to be executed. The IAS has two unconditional branch instructions, called jump or goto instructions, which load part of X into the PC so that the next instruction is taken from the left half or the right half of M(X).

Instruction Execution:
The IAS fetches and executes instructions in several steps that form an instruction cycle. Because two instructions are packed into a 40-bit word, the IAS fetches two instructions in each instruction cycle. One instruction has its opcode placed in the instruction register and its address field (if any) placed in the address register.

3. Briefly explain the organization of an ISA computer.
The Instruction Set Architecture
The 3 most common types of ISAs are:
1. Stack - The operands are implicitly on top of the stack.
2. Accumulator - One operand is implicitly the accumulator.
3. General Purpose Register (GPR) - All operands are explicitly mentioned; they are either registers or memory locations.

Let's look at the assembly code of
A = B + C;
in all 3 architectures:

Stack       Accumulator   GPR
PUSH A      LOAD A        LOAD R1, A
PUSH B      ADD B         ADD R1, B
ADD         STORE C       STORE R1, C
POP C       -             -

Not all processors can be neatly tagged into one of the above categories. The i8086 has many instructions that use implicit operands although it has a general register set. The i8051 is another example: it has 4 banks of GPRs, but most instructions must have the A register as one of their operands.

What are the advantages and disadvantages of each of these approaches?

Stack
Advantages: Simple model of expression evaluation (reverse Polish). Short instructions.
Disadvantages: A stack cannot be randomly accessed. This makes it hard to generate efficient code. The stack itself is accessed on every operation and becomes a bottleneck.

Accumulator
Advantages: Short instructions.
Disadvantages: The accumulator is only temporary storage, so memory traffic is the highest for this approach.

superscalar processor -- can execute more than one instruction per cycle.
cycle -- smallest unit of time in a processor.
parallelism -- the ability to do more than one thing at once.
pipelining -- overlapping parts of a large task to increase throughput without decreasing latency.

We'll look at some of the decisions facing an instruction set architect, and how those decisions were made in the design of the MIPS instruction set. MIPS, like SPARC, PowerPC, and Alpha AXP, is a RISC (Reduced Instruction Set Computer) ISA:
fixed instruction length
few instruction formats
load/store architecture
RISC architectures worked because they enabled pipelining. They continue to thrive because they enable parallelism.

    Instruction Length

Variable-length instructions (Intel 80x86, VAX) require multi-step fetch and decode, but allow for a much more flexible and compact instruction set.

    Fixed-length instructions allow easy fetch and decode, and simplify pipelining and


parallelism.

Accessing the Operands
Operands are generally in one of two places:
registers (32 int, 32 fp)
memory (2^32 locations)
Registers are:
easy to specify
close to the processor (fast access)
The idea that we want to access registers whenever possible led to load-store architectures:
normal arithmetic instructions only access registers
memory is accessed only with explicit loads and stores.

How Many Operands?
Most instructions have three operands (e.g., z = x + y). Well-known ISAs specify 0-3 (explicit) operands per instruction. Operands can be specified implicitly or explicitly.

Basic ISA Classes

Accumulator:
1 address     add A            acc <- acc + mem[A]
Stack:
0 address     add              tos <- tos + next
General Purpose Register:
2 address     add A, B         EA(A) <- EA(A) + EA(B)
3 address     add A, B, C      EA(A) <- EA(B) + EA(C)
Load/Store:
3 address     add Ra, Rb, Rc   Ra <- Rb + Rc
              load Ra, Rb      Ra <- mem[Rb]
              store Ra, Rb     mem[Rb] <- Ra
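As an illustration of the zero-address (stack) class in the table above, here is a minimal, purely hypothetical Python sketch of a stack machine executing the PUSH A, PUSH B, ADD, POP C sequence; the memory layout and opcode names are assumptions made for this example.

```python
# Minimal zero-address stack machine executing: PUSH A, PUSH B, ADD, POP C (illustrative).
memory = {"A": 3, "B": 4, "C": 0}   # symbolic addresses -> values
stack = []

program = [("PUSH", "A"), ("PUSH", "B"), ("ADD", None), ("POP", "C")]

for opcode, addr in program:
    if opcode == "PUSH":             # push mem[addr] onto the stack
        stack.append(memory[addr])
    elif opcode == "ADD":            # tos <- tos + next (operands are implicit)
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)
    elif opcode == "POP":            # store the top of the stack back to memory
        memory[addr] = stack.pop()

print(memory["C"])                   # 7, i.e. A + B
```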

4. With a neat block diagram explain the Accumulator based CPU.
The CPU organization proposed by von Neumann and his colleagues for the IAS computer is the basis for most subsequent designs. It comprises a small set of registers and the circuits needed to execute a functionally complete set of instructions. One of the CPU registers, the accumulator, plays a central role, being used to store an input or output operand in the execution of many instructions. This shows at the register level the essential structure of a small accumulator-


oriented CPU. This organization is typical of first-generation computers and low-cost microcontrollers.
Assume for simplicity that instructions and data have some fixed word size of n bits and that each instruction can be expressed by means of register-transfer operations in our HDL. Instructions are fetched by the program control unit (PCU), whose main register is the program counter (PC). They are executed in the data processing unit (DPU), which contains an n-bit arithmetic-logic unit (ALU) and two data registers AC and DR.
Most instructions perform operations of the form
X1 := fi(X1, X2)

    5. Explain in detail about CPU organization.

Datapath Design:
Capabilities and performance characteristics of the principal functional units (FUs) (e.g., registers, ALU, shifters, logic units, ...)
Ways in which these components are interconnected (bus connections, multiplexers, etc.)
How information flows between components

Control Unit Design:
Logic and means by which such information flow is controlled
Control and coordination of FU operation to realize the targeted Instruction Set Architecture to be implemented (can be implemented using either a finite state machine or a microprogram)
Hardware description with a suitable language, possibly using Register Transfer Notation (RTN)

For a specific program compiled to run on a specific machine A, the following parameters are provided:
The total instruction count of the program
The average number of cycles per instruction (average CPI)
The clock cycle of machine A
How can one measure the performance of this machine running this program?
Intuitively, the machine is said to be faster or to have better performance running this program if the total execution time is shorter. Thus the inverse of the total measured program execution time is a possible performance measure or metric:

Performance_A = 1 / Execution Time_A
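Because Execution Time = Instruction Count x CPI x Clock Cycle Time, the metric can be tied together with a short sketch; the figures below are invented purely for illustration.

```python
# Execution time and performance from the basic CPU performance equation (illustrative numbers).
instruction_count = 2_000_000     # total dynamic instructions executed
average_cpi = 1.5                 # average clock cycles per instruction
clock_cycle_time = 1e-9           # seconds per cycle (1 GHz clock)

execution_time = instruction_count * average_cpi * clock_cycle_time
performance = 1 / execution_time  # Performance_A = 1 / Execution Time_A

print(f"Execution time: {execution_time:.6f} s")   # 0.003000 s
print(f"Performance:    {performance:.1f} programs/s")
```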


6. Describe register-level components.
Register-level circuits are composed of word-oriented devices. The key sequential component, which gives this level of abstraction its name, is a register, a storage device for words.

Word gates
Word gates are universal in that they suffice to implement any logic circuit. Moreover, word-gate circuits can be analyzed using Boolean algebra. A single gate symbol is also used to represent operations on scalar and vector (word) operands.

Multiplexer
A multiplexer is a device intended to route data from one of several sources to a common destination; the source is specified by applying appropriate control signals to the multiplexer. If the maximum number of input sources is K and each I/O data line carries m bits, the multiplexer is referred to as a K-input, m-bit multiplexer. Multiplexers have the interesting property that they can compute any combinational function and so form a type of universal logic generator.
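A small illustrative sketch of the universal-logic-generator property (a toy model, not tied to any particular device): a 4-input multiplexer realizes an arbitrary 2-variable function when the function's truth table is wired to its data inputs.

```python
# A 4-input multiplexer used as a universal logic generator for 2-variable functions (toy model).
def mux4(data, s1, s0):
    """Route data[index] to the output, where index is formed from the select lines s1 s0."""
    return data[(s1 << 1) | s0]

# Truth table of XOR wired to the data inputs: f(0,0)=0, f(0,1)=1, f(1,0)=1, f(1,1)=0.
xor_table = [0, 1, 1, 0]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mux4(xor_table, a, b))   # reproduces a XOR b
```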

Decoder
o Main application: address decoding.
o A 1-out-of-2^n (or 1/2^n) decoder is a combinational circuit with n input lines x and 2^n output lines z such that exactly one output line is activated for each of the 2^n possible input combinations.
Encoder
o An encoder is intended to generate the address or index of an active input line. It is therefore the inverse of a decoder. It has 2^k input lines and k output lines.

Arithmetic Elements
o Simple arithmetic functions, addition and subtraction of fixed-point numbers, can be implemented by combinational register-level components. Adders and subtracters for fixed-point binary numbers are basic register-level components from which we can derive a variety of arithmetic circuits.
o The adder's carry-in and carry-out lines allow several copies of this component to be chained together to add numbers of arbitrary size.
o Another arithmetic component is the magnitude comparator, whose function is to compare the magnitudes of two binary numbers.

Programmable Logic Devices
A class of components called programmable logic devices, or PLDs, is a term applied to ICs containing many gates or other general-purpose cells whose interconnections can be configured or programmed to implement any desired combinational or sequential function. Two techniques are used to program PLDs:
i) Mask programming


ii) Field programming

Mask Programming
o It requires a few special steps in the IC-manufacturing process.
Field Programming
o It is done by the designer or end user in the field via low-cost programming units.

Programmable Logic Array
o A PLA is a universal function generator capable of realizing a set of logic functions that depend on some maximum number of variables. It consists of an array of AND gates, which realize a set of product terms, and a set of OR gates, which form various logical sums of the product terms.

Programmable Array Logic
o PAL circuits have an AND plane that is programmable but an OR plane with fixed connections designed to link each output line to a fixed set of AND terms.
o A PAL output can realize only a two-level expression containing at most 8 terms. A PAL's advantages are ease of use in some applications as well as higher speed, because output fan-out is restricted.

    UNIT II

1. Illustrate Booth's Algorithm with an example.
Observation: If besides addition we also use subtraction, we can reduce the number of consecutive additions and therefore make the multiplication faster. This requires recoding the multiplier in such a way that the number of consecutive 1s in the multiplier (and hence the number of consecutive additions we would otherwise have to do) is reduced.
The key to Booth's algorithm is to scan the multiplier and classify groups of bits into the beginning, the middle and the end of a run of 1s.
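As a worked illustration, here is a minimal Python sketch of Booth's algorithm for n-bit two's-complement operands; the default 8-bit width and the packed [A | Q | Q-1] register representation are choices made for this example.

```python
# Booth's multiplication of two n-bit two's-complement integers (illustrative sketch).
def booth_multiply(multiplicand, multiplier, n=8):
    """Return multiplicand * multiplier using Booth's algorithm on n-bit operands."""
    m = multiplicand & ((1 << n) - 1)              # multiplicand as an n-bit pattern
    reg = (multiplier & ((1 << n) - 1)) << 1       # combined register [A | Q | Q-1]; A = 0, Q-1 = 0
    width = 2 * n + 1                              # total width of [A | Q | Q-1]
    mask = (1 << width) - 1
    for _ in range(n):
        q0, q_1 = (reg >> 1) & 1, reg & 1
        if (q0, q_1) == (1, 0):                    # beginning of a run of 1s: A <- A - M
            reg = (reg - (m << (n + 1))) & mask
        elif (q0, q_1) == (0, 1):                  # end of a run of 1s: A <- A + M
            reg = (reg + (m << (n + 1))) & mask
        sign = (reg >> (width - 1)) & 1            # arithmetic right shift of [A | Q | Q-1]
        reg = (reg >> 1) | (sign << (width - 1))
    product = reg >> 1                             # drop Q-1; [A | Q] is the 2n-bit product
    return product - (1 << (2 * n)) if product >> (2 * n - 1) else product

print(booth_multiply(7, -3))    # -21
print(booth_multiply(-6, -5))   # 30
```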

2. Design a 4-bit Carry-Lookahead Adder and explain its operation with an example.

In a ripple-carry adder, stage i produces
s_i = x_i XOR y_i XOR c_i
c_(i+1) = x_i y_i + x_i c_i + y_i c_i = x_i y_i + (x_i XOR y_i) c_i
Defining the generate and propagate functions
G_i = x_i y_i
P_i = x_i XOR y_i
gives
c_(i+1) = G_i + P_i c_i
Expanding this recurrence lets every carry be computed directly from the operand bits and c_0:
c_(i+1) = G_i + P_i G_(i-1) + P_i P_(i-1) G_(i-2) + ... + P_i P_(i-1) ... P_1 G_0 + P_i P_(i-1) ... P_0 c_0
For a 4-bit carry-lookahead adder the carries are
c_1 = G_0 + P_0 c_0
c_2 = G_1 + P_1 G_0 + P_1 P_0 c_0
c_3 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 c_0
c_4 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 c_0
A 16-bit carry-lookahead adder is built from four such 4-bit blocks; each block supplies block-generate and block-propagate functions so that the carries between blocks can themselves be produced by a second-level lookahead circuit.

Propagation delay (4-bit):
Ripple-carry adder: 2n gate delays = 8 gate delays
Carry-lookahead adder: 4 gate delays
Propagation delay (16-bit):
Ripple-carry adder: 2n gate delays = 32 gate delays
Cascading four 4-bit carry-lookahead blocks: 10 gate delays
Carry-lookahead adder (with higher-level functions): 8 gate delays
Propagation delay (32-bit):
Ripple-carry adder: 2n gate delays = 64 gate delays
Cascading eight 4-bit carry-lookahead blocks: 18 gate delays
Carry-lookahead adder (with higher-level functions): 10 gate delays
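The 4-bit lookahead equations can be checked with a short sketch (Python, illustrative only) in which every carry is computed as a two-level function of the G_i, P_i signals and c_0.

```python
# 4-bit carry-lookahead adder: every carry is a two-level function of G_i, P_i and c0 (illustrative).
def cla_add4(x, y, c0=0):
    """Add two 4-bit numbers given as bit lists [bit0..bit3]; return (sum_bits, carry_out)."""
    g = [a & b for a, b in zip(x, y)]   # generate:  G_i = x_i y_i
    p = [a ^ b for a, b in zip(x, y)]   # propagate: P_i = x_i XOR y_i
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = [p[i] ^ carries[i] for i in range(4)]   # s_i = P_i XOR c_i
    return s, c4

s, cout = cla_add4([1, 0, 0, 1], [1, 1, 1, 0])  # 9 + 7
print(s, cout)   # [0, 0, 0, 0] 1  -> 16
```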

3. With a neat block diagram explain in detail about the coprocessor.
Complicated arithmetic operations like exponential and trigonometric functions are costly to implement in CPU hardware, while software implementations of these operations are slow.


To overcome this problem, the alternative method is to design an auxiliary unit that performs complex arithmetic operations, called an arithmetic coprocessor or simply a coprocessor, which provides fast and low-cost hardware.
In general, it is a separate instruction-set processor. The coprocessor is closely coupled to the CPU, and both the CPU and the coprocessor execute instructions from the same program.
The instructions intended for the coprocessor are fetched by the CPU, jointly decoded by the CPU and the coprocessor, and executed by the coprocessor in a manner that is transparent to the programmer.
The interface requires control lines to couple the CPU to the coprocessor and to handle the instructions that are executed by the coprocessor.

Connection between CPU and Coprocessor
o The coprocessor is connected to the CPU by several control lines that allow the activities of the two processors to be coordinated.
o In this CPU-coprocessor interface, the CPU acts as the MASTER and the coprocessor is a SLAVE device to the CPU.
o When coprocessor instructions are encountered, the communication between the CPU and the coprocessor needed to initiate and terminate execution of coprocessor instructions occurs automatically.
o Thus the coprocessor approach makes it possible to provide either hardware or software support without altering the source code of the program being executed.

A coprocessor instruction typically consists of three fields:
1. Opcode (F0) - distinguishes coprocessor instructions from other CPU instructions.
2. Address (F1) - indicates the address of a particular coprocessor when several coprocessors are used in the system.
3. Operand (F2) - specifies the operation to be executed by the coprocessor.


4. Write the algorithm for division of floating point numbers and illustrate with an example.
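As a brief illustration of the usual steps (determine the sign, subtract the exponents, divide the significands, normalize, round), here is a simplified Python sketch operating on (sign, exponent, significand) triples; it ignores rounding modes, overflow/underflow and special values, so it is only an approximation of real IEEE 754 division.

```python
# Simplified floating-point division on (sign, exponent, significand) triples (illustrative only).
def fp_divide(a, b, precision=10):
    """Divide normalized values a = (sa, ea, ma) by b = (sb, eb, mb), with 1 <= m < 2."""
    sa, ea, ma = a
    sb, eb, mb = b
    sign = sa ^ sb                   # step 1: result sign
    exponent = ea - eb               # step 2: subtract exponents
    significand = ma / mb            # step 3: divide significands
    while significand < 1.0:         # step 4: normalize so that 1 <= m < 2
        significand *= 2.0
        exponent -= 1
    significand = round(significand, precision)   # step 5: round the result
    return sign, exponent, significand

# (1.5 * 2^3) / (1.2 * 2^1) = 1.25 * 2^2, i.e. 12.0 / 2.4 = 5.0
print(fp_divide((0, 3, 1.5), (0, 1, 1.2)))   # (0, 2, 1.25)
```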


    UNIT III

    1. (a). Write short notes on Nano Programming.

    Nanoprogramming

Use a 2-level control storage organization:
The top level is a vertical-format memory.
The output of the top-level memory drives the address register of the bottom (nano-level) memory.
The nanomemory uses the horizontal format and produces the actual control signal outputs.
The advantage of this approach is a significant saving in control memory size (bits).
The disadvantage is more complexity and slower operation (two memory accesses are needed for each microinstruction).
Nanoprogrammed machine example: Suppose that a system is being designed with 200 control points and 2048 microinstructions.

Assume that only 256 different combinations of control points are ever used.
A single-level control memory would require 2048 x 200 = 409,600 storage bits.
A nanoprogrammed system would use:
Microstore of size 2048 x 8 = 16,384 bits
Nanostore of size 256 x 200 = 51,200 bits
Total size = 67,584 storage bits
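The arithmetic behind this saving can be checked in a couple of lines; the 8-bit microstore width comes from needing log2(256) bits to index the 256 distinct nanowords.

```python
# Control-store sizing for the nanoprogramming example above.
import math

microinstructions = 2048
control_points = 200
unique_combinations = 256

single_level_bits = microinstructions * control_points          # 409,600 bits
nanoword_index_bits = int(math.log2(unique_combinations))       # 8 bits to address 256 nanowords
microstore_bits = microinstructions * nanoword_index_bits       # 2048 x 8 = 16,384 bits
nanostore_bits = unique_combinations * control_points           # 256 x 200 = 51,200 bits

print(single_level_bits, microstore_bits + nanostore_bits)      # 409600 67584
```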

Nanoprogramming has been used in many CISC microprocessors.

Applications of Microprogramming
Microprogramming application: emulation
The use of a microprogram on one machine to execute programs originally written to run on another (different!) machine.
By changing the microcode of a machine, you can make it execute software from another machine.
Commonly used in the past to permit new machines to continue to run old software.
The VAX 11/780 had 2 modes.

(b). Describe the characteristics of superscalar processing.
Today, a computer designer is usually faced with maintaining binary compatibility, i.e., maintaining instruction set compatibility and a sequential execution model (which typically implies precise interrupts). For high performance, however, superscalar processor


implementations deviate radically from sequential execution -- much has to be done in parallel. As a result, the program binary nowadays should be viewed as a specification of what has to be done, not how it is done in reality. A modern superscalar microprocessor takes the sequential specification as embodied in the program binary and removes much of the nonessential sequentiality to turn the program into a parallel, higher-performance version, yet the processor retains the outward appearance of sequential execution.

Elements of High Performance Processing
Simply stated, achieving higher performance means processing a given program in a smaller amount of time. Each individual instruction takes some time to fetch and execute; this time is the instruction's latency. To reduce the time to execute a sequence of instructions (e.g. a program), one can: (i) reduce individual instruction latencies, or (ii) execute more instructions in parallel. Because superscalar processor implementations are distinguished by the latter (while adequate attention is also paid to the former), we will concentrate on the latter method here.

Nevertheless, a significant challenge in superscalar design is to not increase instruction latencies due to the increased hardware complexity brought about by the drive for enhanced parallelism. Parallel instruction processing requires: the determination of the dependence relationships between instructions, adequate hardware resources to execute multiple operations in parallel, strategies to determine when an operation is ready for execution, and techniques to pass values from one operation to another. When the effects of instructions are committed, and the visible state of the machine updated, the appearance of sequential execution must be maintained. More precisely, in hardware terms, this means a superscalar processor implements:
i) Instruction fetch strategies that simultaneously fetch multiple instructions, often by predicting the outcomes of, and fetching beyond, conditional branch instructions,
ii) Methods for determining true dependences involving register values, and mechanisms for communicating these values to where they are needed during execution,
iii) Methods for initiating, or issuing, multiple instructions in parallel,
iv) Resources for parallel execution of many instructions, including multiple pipelined functional units and memory hierarchies capable of simultaneously servicing multiple memory references,
v) Methods for communicating data values through memory via load and store instructions, and memory interfaces that allow for the dynamic and often unpredictable performance behavior of memory hierarchies. These interfaces must be well matched with the instruction execution strategies.


vi) Methods for committing the process state in correct order; these mechanisms maintain an outward appearance of sequential execution.
Although we will discuss the above items separately, in reality they cannot be completely separated, nor should they be. In good superscalar designs they are often integrated in a cohesive, almost seamless, manner.

    2. Discuss the various hazards that might arise in a pipeline. What are the

    remedies commonly adopted to overcome/minimize these hazards.

    The Pipeline Defined

    Pipelining

John Hayes provides a definition of a pipeline as it applies to a computer processor:
"A pipeline processor consists of a sequence of processing circuits, called segments or stages, through which a stream of operands can be passed. Partial processing of the operands takes place in each segment. ... a fully processed result is obtained only after an operand set has passed through the entire pipeline."

In everyday life, people do many tasks in stages. For instance, when we do the laundry, we place a load in the washing machine. When it is done, it is transferred to the dryer and another load is placed in the washing machine. When the first load is dry, we pull it out for folding or ironing, moving the second load to the dryer and starting a third load in the washing machine. We proceed with folding or ironing of the first load while the second and third loads are being dried and washed, respectively. We may never have thought of it this way, but we do laundry by pipeline processing.

A pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages.
Let us review Hayes' definition as it pertains to our laundry example. The washing machine is one "sequence of processing circuits", or a stage. The second is the dryer. The third is the folding or ironing stage.
"A significant aspect of our civilization is the division of labor. Major engineering achievements are based on subdividing the total work into individual tasks which can be handled despite their inter-dependencies. Overlap and pipelining are essentially operation management techniques based on job sub-divisions under a precedence constraint."


Types of Pipelines

Instructional pipeline:
where different stages of an instruction fetch and execution are handled in a pipeline.

Arithmetic pipeline:
where different stages of an arithmetic operation are handled along the stages of a pipeline.

The above definitions are correct but are based on a narrow perspective, considering only the central processor. There are other types of computing pipelines. Pipelines are used to compress and transfer video data. Another is the use of specialized hardware to perform graphics display tasks. Discussing graphics displays, Ware Myers wrote:
"...the pipeline concept ... transforms a model of some object into representations that successively become more machine-dependent and finally results in an image upon a particular screen."
This example of pipelining fits the definitions from Hayes and Chen but not the categories offered by Tabaz. These broader categories are beyond the scope of this discussion and are mentioned only to alert the reader that different authors mean different things when referring to pipelining.

Disadvantages
There are two disadvantages of pipeline architecture. The first is complexity. The second is the inability to continuously run the pipeline at full speed, i.e. the pipeline stalls.
Let us examine why the pipeline cannot run at full speed. There are phenomena called pipeline hazards which disrupt the smooth execution of the pipeline. The resulting delays in the pipeline flow are called bubbles. These pipeline hazards include:
structural hazards from hardware conflicts
data hazards arising from data dependencies
control hazards that arise from branch, jump, and other control flow changes
These issues can be and are successfully dealt with. But detecting and avoiding the hazards leads to a considerable increase in hardware complexity. The control paths controlling the gating between stages can contain more circuit levels than the data paths being controlled. In 1970, this complexity is one reason that led Foster to call pipelining "still-controversial".
The stages of a typical pipeline are:
1. Instruction fetch
2. Instruction decode and register fetch


3. Execute
4. Memory access
5. Register write back
Hazards: When a programmer (or compiler) writes assembly code, they make the assumption that each instruction is executed before execution of the subsequent instruction is begun. This assumption is invalidated by pipelining. When this causes a program to behave incorrectly, the situation is known as a hazard. Various techniques for resolving hazards, such as forwarding and stalling, exist.

Hazard (computer architecture)
In computer architecture, a hazard is a potential problem that can happen in a pipelined processor. It refers to the possibility of erroneous computation when a CPU tries to simultaneously execute multiple instructions which exhibit data dependence. There are typically three types of hazards: data hazards, structural hazards, and branching hazards (control hazards).
Instructions in a pipelined processor are performed in several stages, so that at any given time several instructions are being executed, and instructions may not be completed in the desired order.

    A hazard occurs when two or more of these simultaneous (possibly out of order)

    instructions conflict.

1 Data hazards
  1.1 RAW - Read After Write
  1.2 WAR - Write After Read
  1.3 WAW - Write After Write
2 Structural hazards
3 Branch (control) hazards
4 Eliminating hazards
  4.1 Eliminating data hazards
  4.2 Eliminating branch hazards

    Data hazards

A major effect of pipelining is to change the relative timing of instructions by overlapping their execution. This introduces data and control hazards. Data hazards occur when the pipeline changes the order of read/write accesses to operands so that the order differs from the order seen by sequentially executing instructions on the unpipelined machine. Consider the pipelined execution of these instructions:

                 1     2     3      4      5      6      7      8     9
ADD R1, R2, R3   IF    ID    EX     MEM    WB
SUB R4, R5, R1         IF    IDsub  EX     MEM    WB
AND R6, R1, R7               IF     IDand  EX     MEM    WB
OR  R8, R1, R9                      IF     IDor   EX     MEM    WB
XOR R10, R1, R11                           IF     IDxor  EX     MEM   WB

All the instructions after the ADD use the result of the ADD instruction (in R1).

The ADD instruction writes the value of R1 in the WB stage (cycle 5), and the SUB instruction reads the value during its ID stage (IDsub). This problem is called a data hazard. Unless precautions are taken to prevent it, the SUB instruction will read the wrong value and try to use it.
The AND instruction is also affected by this data hazard. The write of R1 does not complete until the end of


cycle 5. Thus, the AND instruction that reads the registers during cycle 4 (IDand) will receive the wrong result.
The OR instruction can be made to operate without incurring a hazard by a simple implementation technique: perform register file reads in the second half of the cycle, and writes in the first half. Because both WB for ADD and IDor for OR are performed in cycle 5, the write to the register file by ADD is performed in the first half of the cycle, and the read of the registers by OR is performed in the second half of the cycle.
The XOR instruction operates properly, because its register read occurs in cycle 6, after the register write by ADD.
Forwarding is a technique to eliminate the stalls for the hazard involving the SUB and AND instructions. We can also classify the data hazards and consider the cases when stalls cannot be eliminated, and see what the compiler can do to schedule the pipeline to avoid stalls.

A hazard is created whenever there is a dependence between instructions, and they are close enough that the overlap caused by pipelining would change the order of access to an operand. Our example hazards have all been with register operands, but it is also possible to create a dependence by writing and reading the same memory location. In the DLX pipeline, however, memory references are always kept in order, preventing this type of hazard from arising.
All the data hazards discussed here involve registers within the CPU. By convention, the hazards are named by the ordering in the program that must be preserved by the pipeline:

RAW (read after write)
WAW (write after write)
WAR (write after read)
Consider two instructions i and j, with i occurring before j. The possible data hazards are:
RAW (read after write) - j tries to read a source before i writes it, so j incorrectly gets the old value. This is the most common type of hazard and the kind that we use forwarding to overcome.
WAW (write after write) - j tries to write an operand before it is written by i. The writes end up being


performed in the wrong order, leaving the value written by i rather than the value written by j in the destination.
This hazard is present only in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled. The DLX integer pipeline writes a register only in WB and avoids this class of hazards. WAW hazards would be possible if we made the following two changes to the DLX pipeline.
Here is a sequence of two instructions showing the execution in this revised pipeline, highlighting the pipe stage that writes the result:
LW  R1, 0(R2)   IF ID EX MEM1 MEM2 WB
ADD R1, R2, R3  IF ID EX WB
Unless this hazard is avoided, execution of this sequence on this revised pipeline will leave the result of the first write (the LW) in R1, rather than the result of the ADD.
Allowing writes in different pipe stages introduces other problems, since two instructions can try to write during the same clock cycle. The DLX FP pipeline, which has both writes in different stages and different pipeline lengths, has to deal with both write conflicts and WAW hazards in detail.

WAR (write after read) - j tries to write a destination before it is read by i, so i incorrectly gets the new value.
This cannot happen in our example pipeline because all reads are early (in ID) and all writes are late (in WB). This hazard occurs when there are some instructions that write results early in the instruction pipeline, and other instructions that read a source late in the pipeline. Because of the natural structure of a pipeline, which typically reads values before it writes results, such hazards are rare. Pipelines for complex instruction sets that support autoincrement addressing and require operands to be read late in the pipeline could create WAR hazards. If we modified the DLX pipeline as in the above example and also read some operands late, such as the source value for a store instruction, a WAR hazard could occur. Here is the pipeline timing for such a potential hazard, highlighting the stage where the conflict occurs:
SW  R1, 0(R2)   IF ID EX MEM1 MEM2 WB
ADD R2, R3, R4  IF ID EX WB
If the SW reads R2 during the second half of its MEM2 stage and the ADD writes R2 during the first half of its WB stage, the SW will incorrectly read and store the value produced by the ADD.

    RAR (read after read) - this case is not a hazard :).
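A tiny sketch (with a hypothetical instruction encoding, purely for illustration) of how RAW, WAR and WAW hazards between an instruction i and a later instruction j can be detected by comparing their source and destination registers:

```python
# Classify data hazards between instruction i and a later instruction j (illustrative encoding).
def data_hazards(i, j):
    """Each instruction is a dict with a 'dst' register and a list of 'src' registers."""
    hazards = []
    if i["dst"] is not None and i["dst"] in j["src"]:
        hazards.append("RAW")      # j reads a value before i has written it
    if j["dst"] is not None and j["dst"] in i["src"]:
        hazards.append("WAR")      # j writes a register that i still has to read
    if i["dst"] is not None and i["dst"] == j["dst"]:
        hazards.append("WAW")      # both write the same register; order must be preserved
    return hazards

add_ = {"op": "ADD", "dst": "R1", "src": ["R2", "R3"]}
sub_ = {"op": "SUB", "dst": "R4", "src": ["R5", "R1"]}
print(data_hazards(add_, sub_))    # ['RAW'] -- SUB reads R1 written by ADD
```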

Structural hazards
A structural hazard occurs when a part of the processor's hardware is needed by two or more instructions at the same time. A structural hazard might occur, for


instance, if a program were to execute a branch instruction followed by a computation instruction.

Branch (control) hazards
Branching hazards (also known as control hazards) occur when the processor is told to branch - i.e., if a certain condition is true, then jump from one part of the instruction stream to another - not necessarily to the next instruction sequentially. In such a case, the processor cannot tell in advance whether it should process the next instruction (when it may instead have to move to a distant instruction). This can result in the processor doing unwanted actions.
Two common causes of stalls are:
A cache miss. A cache miss stalls all the instructions in the pipeline, both before and after the instruction causing the miss.
A hazard in the pipeline. Eliminating a hazard often requires that some instructions in the pipeline be allowed to proceed while others are delayed. When an instruction is stalled, all the instructions issued later than the stalled instruction are also stalled. Instructions issued earlier than the stalled instruction must continue, since otherwise the hazard will never clear.
A hazard causes pipeline bubbles to be inserted. The following tables show how the stalls are actually implemented. As a result, no new instructions are fetched during clock cycle 4, and no instruction will finish during clock cycle 8.
In case of structural hazards:

Instr       1    2    3    4       5       6       7       8       9    10
Instr i     IF   ID   EX   MEM     WB
Instr i+1        IF   ID   EX      MEM     WB
Instr i+2             IF   ID      EX      MEM     WB
Stall                      bubble  bubble  bubble  bubble  bubble
Instr i+3                          IF      ID      EX      MEM     WB
Instr i+4                                  IF      ID      EX      MEM  WB

To simplify the picture it is also commonly shown like this:

Clock cycle number
Instr       1    2    3    4      5    6    7    8    9    10
Instr i     IF   ID   EX   MEM    WB
Instr i+1        IF   ID   EX     MEM  WB
Instr i+2             IF   ID     EX   MEM  WB
Instr i+3                  stall  IF   ID   EX   MEM  WB
Instr i+4                         IF   ID   EX   MEM  WB

    In case of data hazards:


Clock cycle number
Instr       1    2    3    4       5    6    7    8    9    10
Instr i     IF   ID   EX   MEM     WB
Instr i+1        IF   ID   bubble  EX   MEM  WB
Instr i+2             IF   bubble  ID   EX   MEM  WB
Instr i+3                  bubble  IF   ID   EX   MEM  WB
Instr i+4                          IF   ID   EX   MEM  WB

which appears the same with stalls:

Clock cycle number
Instr       1    2    3    4       5    6    7    8    9    10
Instr i     IF   ID   EX   MEM     WB
Instr i+1        IF   ID   stall   EX   MEM  WB
Instr i+2             IF   stall   ID   EX   MEM  WB
Instr i+3                  stall   IF   ID   EX   MEM  WB
Instr i+4                          IF   ID   EX   MEM  WB

    UNIT IV

1. What do you mean by memory hierarchy? Briefly discuss.

Memory is technically any form of electronic storage. Personal computer systems have a hierarchical memory structure consisting of auxiliary memory (disks), main memory (DRAM) and cache memory (SRAM). A design objective of computer system architects is to have the memory hierarchy work as though it were entirely composed of the fastest memory type in the system.

2. What is Cache memory?
Cache memory: If the active portions of the program and data are stored in a fast, small memory, the average memory access time can be reduced, thus reducing the total execution time of the program. Such a fast, small memory is referred to as cache memory. It is placed between the CPU and main memory as shown in the figure.
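The benefit can be quantified with the average memory access time, AMAT = hit time + miss rate x miss penalty; the numbers in the sketch below are illustrative assumptions, not measurements.

```python
# Average memory access time with and without a cache (illustrative figures).
hit_time = 1            # cycles to access the cache on a hit
miss_rate = 0.05        # fraction of accesses that miss in the cache
miss_penalty = 100      # extra cycles to fetch the block from main memory

amat_with_cache = hit_time + miss_rate * miss_penalty
amat_without_cache = miss_penalty            # every access goes to main memory

print(amat_with_cache, amat_without_cache)   # 6.0 vs 100 cycles
```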

    3. What do you mean by interleaved memory?

The memory is partitioned into a number of modules connected to common memory address and data buses. A memory module is a memory array together with its own address and data registers. The figure shows a memory unit with four modules.
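With low-order interleaving, consecutive word addresses fall in consecutive modules; a minimal sketch, assuming four modules:

```python
# Low-order interleaving: consecutive addresses map to consecutive memory modules (illustrative).
NUM_MODULES = 4

def locate(address):
    """Return (module number, address within the module) for a given word address."""
    return address % NUM_MODULES, address // NUM_MODULES

for addr in range(8):
    print(addr, locate(addr))   # addresses 0..7 rotate through modules 0, 1, 2, 3, 0, 1, 2, 3
```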


4. How many memory chips of (128 x 8) are needed to provide a memory capacity of 4096 x 16?
Required memory capacity is 4096 x 16.
Each chip is 128 x 8.
Number of chips required = (4096 / 128) x (16 / 8) = 32 x 2 = 64 chips.
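The same chip-count calculation, expressed as a quick check:

```python
# Number of (128 x 8) chips needed to build a 4096 x 16 memory.
words_needed, bits_per_word = 4096, 16
words_per_chip, bits_per_chip_word = 128, 8
chips = (words_needed // words_per_chip) * (bits_per_word // bits_per_chip_word)
print(chips)   # 64
```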

5. Explain about main memory.
Ans. RAM is used as main memory or primary memory in the computer. This memory is mainly used by the CPU, so it is termed primary memory; RAM is also referred to as the primary memory of the computer. RAM is volatile memory because its contents are erased after the electrical power is switched off. ROM also comes under the category of primary memory. ROM is non-volatile memory: its contents will be retained even after the electrical power is switched off. ROM is read-only memory and RAM is read-write memory. Primary memory is high-speed memory; it can be accessed immediately and randomly.

    UNIT V

1. Explain in detail about interrupt handling.
INTERRUPT HANDLING
Handling Interrupts
There are many situations where the processor should ignore interrupt requests:
Interrupt-disable
Interrupt-enable
Typical scenario:
Device raises an interrupt request
Processor interrupts the program being executed
Processor disables interrupts and acknowledges the interrupt
Interrupt-service routine is executed
Interrupts are enabled and program execution is resumed
An equivalent circuit for an open-drain bus is used to implement a common interrupt-request line.
Handling Multiple Devices
Interrupt Priority
During execution of an interrupt-service routine:
Disable interrupts from devices at the same priority level or lower
Continue to accept interrupt requests from higher-priority devices
Privileged instructions are executed in supervisor mode
Controlling device requests:
Interrupt-enable bits in the device interface (e.g., KEN, DEN)


Polled interrupts: Priority is determined by the order in which the processor polls the devices (polls their status registers).
Vectored interrupts: Priority is determined by the order in which the processor is told of the interrupt via INTA. In daisy chaining, if a device has not requested service, it passes the INTA signal to the next device; if it needs service, it does not pass the INTA and puts its own code on the address lines (priority is fixed by the order of connection in the chain).
Multiple Interrupts
Priority in the Processor Status Word:
Status Register -- active program
Status Word -- inactive program
Changed only by a privileged instruction
Mode changes -- automatic or by privileged instruction
Interrupt enable/disable, by device or system-wide

Common Functions of Interrupts
An interrupt transfers control to the interrupt service routine, generally through the interrupt vector table, which contains the addresses of all the service routines. The interrupt architecture must save the address of the interrupted instruction and the contents of the processor status register.
Incoming interrupts are disabled while another interrupt is being processed, to prevent a lost interrupt.
A software-generated interrupt may be caused either by an error or by a user request (sometimes called a trap).
An operating system is interrupt driven.
Hardware interrupts come from I/O devices, memory, or the processor; software interrupts are generated by a program.

2. Explain in detail about standard I/O interface.
From the discussion so far, the reader must have understood that the input/output system of a computer is accommodated in many layers, like the memory devices. We have already discussed cache memory and a special high-speed bus to communicate with it, designated as the cache bus. The main memory of the system is also interfaced with the processor through a dedicated memory bus, so that delay in I/O operations, which is quite normal and happens frequently, does not retard the instructions that have to be carried out.
The innermost layer of I/O devices is directly interfaced with the processor through its address, data and control buses (designated as the I/O bus), and communicates in a synchronous manner (synchronous parallel communication).


Note that although these devices communicate with the processor in synchronous fashion, with the external world they communicate in an asynchronous manner. Examples of I/O devices at this layer are 8255-based ports, timers/counters for real-time operations, a USART for serial communication, and other similar devices.
Generic Model of an I/O Module:
CPU checks the I/O module's device status
I/O module returns the status
If ready, CPU requests a data transfer
I/O module gets data from the device
I/O module transfers the data to the CPU
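The handshake above is essentially programmed (polled) I/O. A hypothetical sketch of the loop, with an invented device model and status values (not any real device's register interface):

```python
# Programmed (polled) I/O handshake between the CPU and an I/O module (hypothetical device model).
import random

class IOModule:
    def status(self):                 # CPU checks the device status
        return "READY" if random.random() < 0.3 else "BUSY"
    def read_data(self):              # module gets data from the device and returns it
        return 0x5A

def programmed_io_read(module):
    while module.status() != "READY": # poll until the module reports READY
        pass
    return module.read_data()         # module transfers the data to the CPU

print(hex(programmed_io_read(IOModule())))   # 0x5a
```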

3. Describe the functions of SCSI with a neat diagram.
SCSI Bus: defined by ANSI X3.131
Small Computer System Interface
50, 68 or 80 pins
Maximum transfer rate 160 MB/s or 320 MB/s.

4. Discuss the DMA driven data transfer technique.
Polling or interrupt-driven I/O incurs considerable overhead:
Multiple program instructions
Saving program state
Incrementing memory addresses
Keeping track of the word count
To transfer large amounts of data at high speed without continuous intervention by the processor, a special control circuit is required in the I/O device interface, called a DMA controller.
DMA Controller:
Part of the I/O device interface (DMA channels)
Performs functions that would normally be carried out by the processor:
Provides the memory address
Generates the bus signals that control the transfer
Keeps track of the number of transfers
Operates under control of the processor


    5. Describe Bus Arbitration.

In a single-bus architecture, when more than one device requests the bus, a controller called the bus arbiter decides who gets the bus; this is called bus arbitration.
In computing, bus mastering is a feature supported by many bus architectures that enables a device connected to the bus to initiate transactions.
Bus arbitration is the procedure in bus communication that chooses between connected devices contending for control of the shared bus; the device currently in control of the bus is often termed the bus master. Devices may be allocated differing priority levels that will determine the choice of bus master in case of contention.
A device not currently bus master must request control of the bus before attempting to initiate a data transfer via the bus.
The normal protocol is that only one device may be bus master at any time and that all other devices act as slaves to this master.
Only a bus master may initiate a normal data transfer on the bus; slave devices respond to commands issued by the current bus master by supplying the data requested or accepting the data sent.
There are two approaches: centralized arbitration and distributed arbitration.
In distributed arbitration, all devices have equal responsibility in carrying out the arbitration process. Each device on the bus is assigned an identification number, and contending devices place their ID numbers on four open-collector lines.
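A sketch of such a distributed (self-selection) scheme, where each contender drives its ID onto open-collector lines and the device with the highest ID wins; the 4-bit width and wired-OR model are assumptions for illustration.

```python
# Distributed bus arbitration over open-collector lines (wired-OR), 4-bit device IDs (illustrative).
def arbitrate(ids, width=4):
    """Each contender drives its ID; a device drops out when it sees a 1 on a higher-order
    line where its own bit is 0. The surviving device (highest ID) becomes bus master."""
    contenders = set(ids)
    for bit in reversed(range(width)):                        # examine lines from MSB to LSB
        line = any((d >> bit) & 1 for d in contenders)        # wired-OR of the drivers on this line
        if line:
            contenders = {d for d in contenders if (d >> bit) & 1}   # devices with 0 here drop out
    return max(contenders)

print(arbitrate([5, 6, 3]))    # 6 -> the device with the highest ID wins the bus
```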