William Stallings, Computer Organization and Architecture, Chapter 13: Instruction Level Parallelism and Superscalar Processors

Upload: fachmiuun

Post on 18-Dec-2015

TRANSCRIPT

  • William Stallings: Computer Organization and Architecture

    Chapter 13: Instruction Level Parallelism and Superscalar Processors

  • What is Superscalar?

    - Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently
    - Equally applicable to RISC & CISC
    - In practice usually RISC

  • Why Superscalar?

    - Most operations are on scalar quantities (see RISC notes)
    - Improve these operations to get an overall improvement

  • General Superscalar Organization

  • Superpipelined

    - Many pipeline stages need less than half a clock cycle
    - Double internal clock speed gets two tasks per external clock cycle
    - Superscalar allows parallel fetch and execute

  • Superscalar v Superpipeline

  • Limitations

    - Instruction level parallelism
    - Compiler based optimisation
    - Hardware techniques
    - Limited by
      - True data dependency
      - Procedural dependency
      - Resource conflicts
      - Output dependency
      - Antidependency

  • True Data Dependency

    - ADD r1, r2 (r1 := r1+r2;)
    - MOVE r3,r1 (r3 := r1;)
    - Can fetch and decode second instruction in parallel with first
    - Can NOT execute second instruction until first is finished
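
    The check behind this slide can be sketched in a few lines. The `Instr` record and `has_true_dependency` helper are hypothetical names introduced here, not part of the slides, and the third instruction is added only for contrast.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, any number of sources."""
    name: str
    dest: str
    srcs: tuple

def has_true_dependency(first: Instr, second: Instr) -> bool:
    """True if `second` reads the register that `first` writes."""
    return first.dest in second.srcs

# ADD r1, r2   (r1 := r1 + r2)
add = Instr("ADD", dest="r1", srcs=("r1", "r2"))
# MOVE r3, r1  (r3 := r1)
move = Instr("MOVE", dest="r3", srcs=("r1",))
# Unrelated instruction, not on the slide, shown only for contrast
other = Instr("ADD", dest="r4", srcs=("r5", "r6"))

print(has_true_dependency(add, move))   # True: MOVE must wait for ADD
print(has_true_dependency(add, other))  # False: could execute in parallel
```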

  • Procedural Dependency

    - Cannot execute instructions after a branch in parallel with instructions before a branch
    - Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed
    - This prevents simultaneous fetches

  • Resource Conflict

    - Two or more instructions requiring access to the same resource at the same time
      - e.g. two arithmetic instructions
    - Can duplicate resources
      - e.g. have two arithmetic units
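
    As a rough illustration of the effect of duplicating a resource, the sketch below counts how many cycles a group of instructions needs to issue. It assumes the instructions are otherwise independent and that each occupies its unit for one cycle; `issue_cycles` and the unit name "alu" are made up for this example.

```python
from collections import Counter
from math import ceil

def issue_cycles(needed_units, available):
    """Cycles needed to issue independent one-cycle instructions.

    needed_units: the functional unit each instruction requires.
    available:    how many copies of each unit the machine has.
    """
    demand = Counter(needed_units)
    return max(ceil(count / available[unit]) for unit, count in demand.items())

# Two arithmetic instructions competing for one arithmetic unit
print(issue_cycles(["alu", "alu"], {"alu": 1}))  # 2
# Same instructions with the arithmetic unit duplicated
print(issue_cycles(["alu", "alu"], {"alu": 2}))  # 1
```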

  • Dependencies

  • Design Issues

    - Instruction level parallelism
      - Instructions in a sequence are independent
      - Execution can be overlapped
      - Governed by data and procedural dependency
    - Machine Parallelism
      - Ability to take advantage of instruction level parallelism
      - Governed by number of parallel pipelines

  • Instruction Issue Policy

    - Order in which instructions are fetched
    - Order in which instructions are executed
    - Order in which instructions change registers and memory

  • In-Order Issue In-Order Completion

    - Issue instructions in the order they occur
    - Not very efficient
    - May fetch >1 instruction
    - Instructions must stall if necessary

  • In-Order Issue In-Order Completion (Diagram)

  • In-Order Issue Out-of-Order Completion

    - Output dependency
      - R3:= R3 + R5; (I1)
      - R4:= R3 + 1; (I2)
      - R3:= R5 + 1; (I3)
      - I2 depends on result of I1 - data dependency
      - If I3 completes before I1, I1's later write overwrites I3's result and R3 is left with the wrong value - output (write-write) dependency
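
    A small sketch of why completion order matters for the I1..I3 sequence above. The starting register values and the `run` helper are invented for illustration; operands are assumed to be read correctly, so only the order in which results reach the register file is modelled.

```python
def run(completion_order):
    """Apply the results of I1..I3 to the register file in the given order."""
    regs = {"R3": 10, "R4": 0, "R5": 2}   # arbitrary starting values
    # Values each instruction should produce, evaluated in program order
    i1 = regs["R3"] + regs["R5"]          # I1: R3 := R3 + R5  -> 12
    i2 = i1 + 1                           # I2: R4 := R3 + 1   -> 13
    i3 = regs["R5"] + 1                   # I3: R3 := R5 + 1   -> 3
    writes = {"I1": ("R3", i1), "I2": ("R4", i2), "I3": ("R3", i3)}
    for name in completion_order:
        dest, value = writes[name]
        regs[dest] = value
    return regs

print(run(["I1", "I2", "I3"])["R3"])  # 3: program order, R3 holds I3's result
print(run(["I3", "I1", "I2"])["R3"])  # 12: I3 finished first, R3 is stale
```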

  • In-Order Issue Out-of-Order Completion (Diagram)

  • Out-of-Order Issue Out-of-Order Completion

    - Decouple decode pipeline from execution pipeline
    - Can continue to fetch and decode until this pipeline is full
    - When a functional unit becomes available an instruction can be executed
    - Since instructions have been decoded, processor can look ahead
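
    The look-ahead idea can be sketched as a toy issue loop. This is a simplification under stated assumptions, not a model of any real processor: only true dependencies are tracked (each destination register is written once), every instruction takes `latency` cycles, and the whole program fits in the instruction window. The `schedule` function and the register names are hypothetical.

```python
def schedule(program, units=2, latency=1, in_order=False):
    """Return {instruction name: issue cycle} for a greedy issue policy."""
    # Results of instructions that have not issued yet are never ready for now
    ready_at = {dest: float("inf") for _, dest, _ in program}
    issued = {}
    waiting = list(program)
    cycle = 0
    while waiting:
        free = units                       # functional units free this cycle
        for instr in list(waiting):
            name, dest, srcs = instr
            operands_ready = all(ready_at.get(s, 0) <= cycle for s in srcs)
            if free > 0 and operands_ready:
                issued[name] = cycle
                ready_at[dest] = cycle + latency
                waiting.remove(instr)
                free -= 1
            elif in_order:
                break                      # in-order issue: a stall blocks the rest
        cycle += 1
    return issued

# (name, destination, sources); each destination is written only once
program = [
    ("I1", "R3b", ("R3a", "R5a")),
    ("I2", "R4b", ("R3b",)),
    ("I3", "R3c", ("R5a",)),
    ("I4", "R7b", ("R3c", "R4b")),
]

print(schedule(program))                 # {'I1': 0, 'I3': 0, 'I2': 1, 'I4': 2}
print(schedule(program, in_order=True))  # {'I1': 0, 'I2': 1, 'I3': 1, 'I4': 2}
```

    With out-of-order issue, I3 is issued alongside I1 in cycle 0 instead of waiting behind the stalled I2.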

  • Out-of-Order Issue Out-of-Order Completion (Diagram)

  • Antidependency

    - Read-write (write-after-read) dependency
      - R3:=R3 + R5; (I1)
      - R4:=R3 + 1; (I2)
      - R3:=R5 + 1; (I3)
      - R7:=R3 + R4; (I4)
      - I3 cannot complete before I2 starts as I2 needs a value in R3 and I3 changes R3
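
    For the four-instruction sequence above, the dependencies can be listed mechanically. The sketch below is one possible way to do it: it tracks the last writer of each register and the instructions that have read it since. `find_dependencies` is a hypothetical helper; a read by an instruction that also writes the same register (as in I1) is treated as covered by the output dependency.

```python
def find_dependencies(program):
    """Return (later, earlier, kind) tuples for a straight-line program."""
    deps = []
    last_writer = {}   # register -> most recent instruction that wrote it
    readers = {}       # register -> instructions that read it since that write
    for name, dest, srcs in program:
        for reg in srcs:
            if reg in last_writer:
                deps.append((name, last_writer[reg], "true"))    # needs that result
        for reader in readers.get(dest, []):
            deps.append((name, reader, "anti"))    # must not overwrite before the read
        if dest in last_writer:
            deps.append((name, last_writer[dest], "output"))     # writes must stay ordered
        for reg in srcs:
            readers.setdefault(reg, []).append(name)
        last_writer[dest] = name
        readers[dest] = []                         # new value, no readers yet
    return deps

program = [
    ("I1", "R3", ("R3", "R5")),   # R3 := R3 + R5
    ("I2", "R4", ("R3",)),        # R4 := R3 + 1
    ("I3", "R3", ("R5",)),        # R3 := R5 + 1
    ("I4", "R7", ("R3", "R4")),   # R7 := R3 + R4
]

for later, earlier, kind in find_dependencies(program):
    print(f"{later} on {earlier}: {kind}")
```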

  • Register Renaming

    - Output and antidependencies occur because register contents may not reflect the correct ordering from the program
    - May result in a pipeline stall
    - Registers allocated dynamically
      - i.e. registers are not specifically named

  • Register Renaming example

    - R3b:=R3a + R5a (I1)
    - R4b:=R3b + 1 (I2)
    - R3c:=R5a + 1 (I3)
    - R7b:=R3c + R4b (I4)
    - Without subscript refers to logical register in instruction
    - With subscript is hardware register allocated
    - Note R3a R3b R3c
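
    The renaming in this example can be reproduced with a short sketch: every write allocates a fresh hardware register, and every read uses the most recent allocation. The `rename` function and the letter-suffix scheme are assumptions chosen to mirror the slide's notation, not a description of real renaming hardware.

```python
import string

def rename(program, logical_regs):
    """Give every write a fresh hardware register (suffix a, b, c, ...)."""
    version = {reg: 0 for reg in logical_regs}   # current suffix index per register

    def tag(reg):
        return f"{reg}{string.ascii_lowercase[version[reg]]}"

    renamed = []
    for dest, srcs in program:
        # Sources read the current mapping; constants pass through unchanged
        new_srcs = [tag(s) if s in version else s for s in srcs]
        version[dest] += 1                       # allocate a new hardware register
        renamed.append((tag(dest), new_srcs))
    return renamed

program = [
    ("R3", ["R3", "R5"]),   # I1: R3 := R3 + R5
    ("R4", ["R3", "1"]),    # I2: R4 := R3 + 1
    ("R3", ["R5", "1"]),    # I3: R3 := R5 + 1
    ("R7", ["R3", "R4"]),   # I4: R7 := R3 + R4
]

for dest, srcs in rename(program, ["R3", "R4", "R5", "R7"]):
    print(f"{dest} := {' + '.join(srcs)}")
# R3b := R3a + R5a
# R4b := R3b + 1
# R3c := R5a + 1
# R7b := R3c + R4b
```

    After renaming, I3 writes R3c rather than the register I1 writes and I2 reads, so only true dependencies remain.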

  • Machine Parallelism

    - Duplication of Resources
    - Out of order issue
    - Renaming
    - Not worth duplicating functions without register renaming
    - Need instruction window large enough (more than 8)

  • Branch Prediction

    - 80486 fetches both next sequential instruction after branch and branch target instruction
    - Gives two cycle delay if branch taken

  • RISC - Delayed Branch

    - Calculate result of branch before unusable instructions pre-fetched
    - Always execute single instruction immediately following branch
    - Keeps pipeline full while fetching new instruction stream
    - Not as good for superscalar
      - Multiple instructions need to execute in delay slot
      - Instruction dependence problems
    - Revert to branch prediction

  • Superscalar Execution

  • Superscalar Implementation

    - Simultaneously fetch multiple instructions
    - Logic to determine true dependencies involving register values
    - Mechanisms to communicate these values
    - Mechanisms to initiate multiple instructions in parallel
    - Resources for parallel execution of multiple instructions
    - Mechanisms for committing process state in correct order

  • Required Reading

    - Stallings chapter 13
    - Manufacturers web sites
    - IMPACT web site
      - research on predicated execution