ch 13 95 [read-only] [compatibility mode]
DESCRIPTION
comp modeTRANSCRIPT
-
William Stallings William Stallings Computer Organization
d A hit tand Architecture
Chapter 13Instruction Level ParallelismInstruction Level Parallelismand Superscalar Processors
-
What is Superscalar?What is Superscalar?
aCommon instructions (arithmetic, load/store, conditional branch) can be initiated and
t d i d d tlexecuted independentlyaEqually applicable to RISC & CISCaI ti ll RISCaIn practice usually RISC
-
Why Superscalar?Why Superscalar?
aMost operations are on scalar quantities (see RISC notes)aImprove these operations to get an overall
improvement
-
General Superscalar OrganizationOrganization
-
SuperpipelinedSuperpipelined
aMany pipeline stages need less than half a clock cycleaDouble internal clock speed gets two tasks per
external clock cycleaS l ll ll l f t h taSuperscalar allows parallel fetch execute
-
Superscalar vSuperpipelineSuperpipeline
-
LimitationsLimitationsaInstruction level parallelismaInstruction level parallelismaCompiler based optimisationaHardware techniquesaHardware techniquesaLimited by`True data dependency`True data dependency`Procedural dependency`Resource conflicts`Output dependency`Antidependency
-
True Data DependencyTrue Data Dependency
aADD r1, r2 (r1 := r1+r2;)aMOVE r3,r1 (r3 := r1;)aCan fetch and decode second instruction in
parallel with firstaCan NOT execute second instruction until first is
finished
-
Procedural DependencyProcedural Dependency
aCan not execute instructions after a branch in parallel with instructions before a branchaAlso, if instruction length is not fixed,
instructions have to be decoded to find out how many fetches are neededmany fetches are neededaThis prevents simultaneous fetches
-
Resource ConflictResource Conflict
aTwo or more instructions requiring access to the same resource at the same time
h`e.g. two arithmetic instructionsaCan duplicate resources` h t ith ti it`e.g. have two arithmetic units
-
D d iDependencies
-
Design IssuesDesign Issues
aInstruction level parallelism`Instructions in a sequence are independent`Execution can be overlapped`Governed by data and procedural dependency
aMachine Pa allelismaMachine Parallelism`Ability to take advantage of instruction level
parallelismparallelism`Governed by number of parallel pipelines
-
Instruction Issue PolicyInstruction Issue Policy
aOrder in which instructions are fetchedaOrder in which instructions are executedaOrder in which instructions change registers and
memory
-
In-Order Issue In Order CompletionIn-Order Completion
aIssue instructions in the order they occuraNot very efficientaMay fetch >1 instructionaInstructions must stall if necessary
-
In-Order Issue In-Order Completion (Diagram)Completion (Diagram)
-
In-Order Issue Out of Order CompletionOut-of-Order Completion
aOutput dependency`R3:= R3 + R5; (I1)`R4:= R3 + 1; (I2)`R3:= R5 + 1; (I3)`I2 depends on result of I1 data dependency`I2 depends on result of I1 - data dependency`If I3 completes before I1, the result from I1 will be
wrong - output (read-write) dependencyg p ( ) p y
-
In-Order Issue Out-of-Order Completion (Diagram)Completion (Diagram)
-
Out-of-Order IssueOut of Order CompletionOut-of-Order Completion
aDecouple decode pipeline from execution pipelineaCan continue to fetch and decode until this
pipeline is fullaWh f ti l it b il blaWhen a functional unit becomes available an
instruction can be executedaSi i t ti h b d d daSince instructions have been decoded, processor
can look ahead
-
Out-of-Order Issue Out-of-Order Completion (Diagram)Completion (Diagram)
-
AntidependencyAntidependency
aWrite-write dependency`R3:=R3 + R5; (I1)`R4:=R3 + 1; (I2)`R3:=R5 + 1; (I3)`R7:=R3 + R4; (I4)`R7:=R3 + R4; (I4)`I3 can not complete before I2 starts as I2 needs a
value in R3 and I3 changes R3g
-
Register RenamingRegister Renaming
aOutput and antidependencies occur because register contents may not reflect the correct
d i f thordering from the programaMay result in a pipeline stallaR i t ll t d d i llaRegisters allocated dynamically`i.e. registers are not specifically named
-
Register Renaming exampleRegister Renaming exampleaR3b:=R3a + R5a (I1)aR3b: R3a + R5a (I1)aR4b:=R3b + 1 (I2)aR3c:=R5a + 1 (I3)aR3c:=R5a + 1 (I3)aR7b:=R3c + R4b (I4)aWithout subscript refers to logical register inaWithout subscript refers to logical register in
instructionaWith subscript is hardware register allocatedaWith subscript is hardware register allocatedaNote R3a R3b R3c
-
Machine ParallelismMachine Parallelism
aDuplication of ResourcesaOut of order issueaRenamingaNot worth duplication functions without register
renamingaNeed instruction window large enough (more
than 8)
-
Branch PredictionBranch Prediction
a80486 fetches both next sequential instruction after branch and branch target instructionaGives two cycle delay if branch taken
-
RISC Delayed BranchRISC - Delayed Branch
aCalculate result of branch before unusable instructions pre-fetchedaAlways execute single instruction immediately
following branchaK i li f ll hil f t hi i t tiaKeeps pipeline full while fetching new instruction
streamaN t d f laNot as good for superscalar`Multiple instructions need to execute in delay slot`Instruction dependence problems`Instruction dependence problems
aRevert to branch prediction
-
Superscalar ExecutionSuperscalar Execution
-
Superscalar ImplementationSuperscalar Implementation
aSimultaneously fetch multiple instructionsaLogic to determine true dependencies involving
register valuesaMechanisms to communicate these valuesaMechanisms to initiate multiple instructions in
parallelaResources for parallel execution of multiple
instructionsaM h i f i i iaMechanisms for committing process state in
correct order
-
Required ReadingRequired Reading
aStallings chapter 13aManufacturers web sitesaIMPACT web site`research on predicated execution