lecture 9 instruction level parallelism topics review appendix c dynamic scheduling scoreboarding...
TRANSCRIPT
Lecture 9Instruction Level Parallelism
Lecture 9Instruction Level Parallelism
Topics Topics Review Appendix C Dynamic Scheduling
ScoreboardingTomasulo
Readings: Chapter 3Readings: Chapter 3
October 8, 2014
CSCE 513 Computer Architecture
– 2 – CSCE 513 Fall 2015
OverviewOverviewLast TimeLast Time
Dynamic Scheduling MIPS R4000 Pipeline/ Scoreboard
NewNew Stalls in Diagrams revisited Scoreboard Review Dealing with Control Hazards in the 5-stage
ReferencesReferences Appendix A.8, Chapter 3
– 3 – CSCE 513 Fall 2015
Figure 3.1 ILP techniques referencesFigure 3.1 ILP techniques references
Review: Review:
– 6 – CSCE 513 Fall 2015
Figure C.28 - Branch hazardsFigure C.28 - Branch hazards
Figure C.28 The stall from branch hazards can be reduced by moving the zero test and branch-target calculation into the ID phase of the pipeline. Notice that we
have made two important changes, each of which removes 1 cycle from the 3-cycle stall for branches. The first change is to move both the branch-target address
calculation and the branch condition decision to the ID cycle. The second change is to write the PC of the instruction in the IF phase, using either the branch-
target address computed during ID or the incremented PC computed during IF. In comparison, Figure C.22 obtained the branch-target address from the
EX/MEM register and wrote the result during the MEM clock cycle. As mentioned in Figure C.22, the PC can be thought of as a pipeline register (e.g., as part of
ID/IF), which is written with the address of the next instruction at the end of each IF cycle.
– 7 – CSCE 513 Fall 2015
Fig C.29 Revised IF, IDFig C.29 Revised IF, ID
Pipeline Fetch and Decode Stages for Branch InstructionsPipeline Fetch and Decode Stages for Branch Instructions
FetchFetch IF/ID.IR Mem[PC] IF/ID.NPC (if ((IF/ID.opcode == branch) &
(regs[IF/ID.IR[rs] op 0))
{IF?ID.NPC + sign-ext(IF/ID.IR[Imm] << 2)}
else {PC + 4}
DecodeDecode ID/EX.A Regs[IF/ID.IR[rs] ID/EX.B Regs[IF/ID.IR[rt]] ID/EX.IR IF/ID.IR ID/EX.Imm (IF/ID.IR[16]16 ## IF/ID.IR[16-31])
– 9 – CSCE 513 Fall 2015
C. 4 What Makes Pipelining Hard to Implement? Exceptions!C. 4 What Makes Pipelining Hard to Implement? Exceptions! I/ O device request I/ O device request Invoking an operating system service from a user Invoking an operating system service from a user
program program Tracing instruction execution Tracing instruction execution Breakpoint (programmer-requested interrupt) Breakpoint (programmer-requested interrupt) Integer arithmetic overflow Integer arithmetic overflow FP arithmetic anomaly FP arithmetic anomaly Page fault (not in main memory) Page fault (not in main memory) Misaligned memory accesses (if alignment is Misaligned memory accesses (if alignment is
required) required) Memory protection violation Memory protection violation Using an undefined or unimplemented instruction Using an undefined or unimplemented instruction Hardware malfunctionsHardware malfunctions
– 10 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
IntroductionIntroduction
Pipelining became universal technique in 1985Pipelining became universal technique in 1985 Overlaps execution of instructions Exploits “Instruction Level Parallelism”
Beyond this, there are two main approaches:Beyond this, there are two main approaches: Hardware-based dynamic approaches
Used in server and desktop processorsNot used as extensively in PMP processors
Compiler-based static approachesNot as successful outside of scientific applications
Intro
du
ction
– 11 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Instruction-Level ParallelismInstruction-Level Parallelism
When exploiting instruction-level parallelism, When exploiting instruction-level parallelism, goal is to maximize CPIgoal is to maximize CPI Pipeline CPI =
Ideal pipeline CPI +Structural stalls +Data hazard stalls +Control stalls
Parallelism with basic block is limitedParallelism with basic block is limited Typical size of basic block = 3-6 instructions Must optimize across branches
Intro
du
ction
– 12 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Data DependenceData Dependence
Loop-Level ParallelismLoop-Level Parallelism Unroll loop statically or dynamically Use SIMD (vector processors and GPUs)
Challenges:Challenges: Data dependency
Instruction j is data dependent on instruction i if» Instruction i produces a result that may be used by
instruction j» Instruction j is data dependent on instruction k and
instruction k is data dependent on instruction i
Dependent instructions cannot be executed Dependent instructions cannot be executed simultaneouslysimultaneously
Intro
du
ction
– 13 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Data DependenceData DependenceDependencies are a property of programsDependencies are a property of programs
Pipeline organization determines if dependence is Pipeline organization determines if dependence is detected and if it causes a stalldetected and if it causes a stall
Data dependence conveys:Data dependence conveys: Possibility of a hazard Order in which results must be calculated Upper bound on exploitable instruction level
parallelism
Dependencies that flow through memory locations Dependencies that flow through memory locations are difficult to detectare difficult to detect
Intro
du
ction
– 14 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Name DependenceName Dependence
Two instructions use the same name but no Two instructions use the same name but no flow of informationflow of information Not a true data dependence, but is a problem
when reordering instructions Antidependence: instruction j writes a register or
memory location that instruction i readsInitial ordering (i before j) must be preserved
Output dependence: instruction i and instruction j write the same register or memory locationOrdering must be preserved
To resolve, use renaming techniquesTo resolve, use renaming techniques
Intro
du
ction
– 15 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Other FactorsOther Factors
Data HazardsData Hazards Read after write (RAW) Write after write (WAW) Write after read (WAR)
Control DependenceControl Dependence Ordering of instruction i with respect to a branch
instructionInstruction control dependent on a branch cannot be
moved before the branch so that its execution is no longer controller by the branch
An instruction not control dependent on a branch cannot be moved after the branch so that its execution is controlled by the branch
Intro
du
ction
– 16 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
ExamplesExamplesOR instruction dependent OR instruction dependent
on DADDU and DSUBUon DADDU and DSUBU
Assume R4 isn’t used after Assume R4 isn’t used after skipskip Possible to move DSUBU
before the branch
Intro
du
ction
• Example 1:
DADDU R1,R2,R3
BEQZ R4,L
DSUBU R1,R1,R6
L: …
OR R7,R1,R8
•Example 2:
DADDU R1,R2,R3
BEQZ R12,skip
DSUBU R4,R5,R6
DADDU R5,R4,R9
skip:
OR R7,R8,R9
– 17 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Compiler Techniques for Exposing ILPCompiler Techniques for Exposing ILP
Pipeline schedulingPipeline scheduling Separate dependent instruction from the source instruction
by the pipeline latency of the source instruction
Example:Example:for (i=999; i>=0; i=i-1) x[i] = x[i] + s;
Co
mp
iler Tech
niq
ues
– 18 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Pipeline StallsPipeline StallsLoop:Loop: L.DL.D F0,0(R1)F0,0(R1)
stallstall
ADD.D F4,F0,F2ADD.D F4,F0,F2
stallstall
stallstall
S.D F4,0(R1)S.D F4,0(R1)
DADDUI R1,R1,#-8DADDUI R1,R1,#-8
stallstall (assume integer load latency is 1) (assume integer load latency is 1)
BNE R1,R2,LoopBNE R1,R2,Loop
Co
mp
iler Tech
niq
ues
– 19 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Pipeline SchedulingPipeline SchedulingScheduled code:Scheduled code:
Loop:Loop: L.DL.D F0,0(R1)F0,0(R1)
DADDUI R1,R1,#-8DADDUI R1,R1,#-8
ADD.D F4,F0,F2ADD.D F4,F0,F2
stallstall
stallstall
S.D F4,8(R1)S.D F4,8(R1)
BNE R1,R2,LoopBNE R1,R2,Loop
Co
mp
iler Tech
niq
ues
– 20 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Loop UnrollingLoop UnrollingLoop unrolling (not yet scheduled)Loop unrolling (not yet scheduled)
Unroll by a factor of 4 (assume # elements is divisible by 4) Eliminate unnecessary instructions
Loop:Loop: L.D F0,0(R1)L.D F0,0(R1)
ADD.D F4,F0,F2ADD.D F4,F0,F2
S.D F4,0(R1) S.D F4,0(R1) ;drop DADDUI & BNE;drop DADDUI & BNE
L.D F6,-8(R1)L.D F6,-8(R1)
ADD.D F8,F6,F2ADD.D F8,F6,F2
S.D F8,-8(R1) S.D F8,-8(R1) ;drop DADDUI & BNE;drop DADDUI & BNE
L.D F10,-16(R1)L.D F10,-16(R1)
ADD.D F12,F10,F2ADD.D F12,F10,F2
S.D F12,-16(R1) S.D F12,-16(R1) ;drop DADDUI & BNE;drop DADDUI & BNE
L.D F14,-24(R1)L.D F14,-24(R1)
ADD.D F16,F14,F2ADD.D F16,F14,F2
S.D F16,-24(R1)S.D F16,-24(R1)
DADDUI R1,R1,#-32DADDUI R1,R1,#-32
BNE R1,R2,LoopBNE R1,R2,Loop
Co
mp
iler Tech
niq
ues
note: number of live
registers vs. original
loop
– 21 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Loop Unrolling/Pipeline SchedulingLoop Unrolling/Pipeline Scheduling
Pipeline schedule the unrolled loop:Pipeline schedule the unrolled loop:Loop:Loop: L.D F0,0(R1)L.D F0,0(R1)
L.D F6,-8(R1)L.D F6,-8(R1)
L.D F10,-16(R1)L.D F10,-16(R1)
L.D F14,-24(R1)L.D F14,-24(R1)
ADD.D F4,F0,F2ADD.D F4,F0,F2
ADD.D F8,F6,F2ADD.D F8,F6,F2
ADD.D F12,F10,F2ADD.D F12,F10,F2
ADD.D F16,F14,F2ADD.D F16,F14,F2
S.D F4,0(R1)S.D F4,0(R1)
S.D F8,-8(R1)S.D F8,-8(R1)
DADDUI R1,R1,#-32DADDUI R1,R1,#-32
S.D F12,16(R1)S.D F12,16(R1)
S.D F16,8(R1)S.D F16,8(R1)
BNE R1,R2,LoopBNE R1,R2,Loop
Co
mp
iler Tech
niq
ues
– 22 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Strip MiningStrip Mining
Unknown number of loop iterations?Unknown number of loop iterations? Number of iterations = n Goal: make k copies of the loop body Generate pair of loops:
First executes n mod k timesSecond executes n / k times“Strip mining”
Co
mp
iler Tech
niq
ues
– 23 – CSCE 513 Fall 2015
Static Branch PredictionStatic Branch Prediction
Static Branch Prediction in general is a prediction that Static Branch Prediction in general is a prediction that uses information that was gathered before the uses information that was gathered before the execution of the program. execution of the program.
always taken (MIPS-X, Stanford) always taken (MIPS-X, Stanford)
always not taken (Motorola MC88000). always not taken (Motorola MC88000).
Another way is to run the program with a profiler and Another way is to run the program with a profiler and some training input data predict the branch direction some training input data predict the branch direction which was most frequently taken according to profiling. which was most frequently taken according to profiling. This prediction can be indicated by a hint bit in the This prediction can be indicated by a hint bit in the branch opcode. branch opcode.
– 24 – CSCE 513 Fall 2015
Dynamic Branch PredictionDynamic Branch Prediction
Dynamic Branch Prediction or just Branch Prediction Dynamic Branch Prediction or just Branch Prediction uses information about taken or not taken branches uses information about taken or not taken branches gathered at run-time to predict the outcome of a gathered at run-time to predict the outcome of a branchbranch
– 25 – CSCE 513 Fall 2015
2 Bit Branch predictor Fig C.182 Bit Branch predictor Fig C.18
N-bit predictors
– 26 – CSCE 513 Fall 2015
2 Bit Saturating Counter Branch predictor2 Bit Saturating Counter Branch predictor
..
http://en.wikipedia.org/wiki/Branch_predictor
– 27 – CSCE 513 Fall 2015
Branch History Table IndexingBranch History Table Indexing
Instruction addr Instruction addr x x11=0 and x=0 and x00=0 why?=0 why?
xx3131xx3030xx2929xx2828 … x … x12 12 xx11 …11 …xx2 2 0000
PredictionPrediction
What was done last timeWhat was done last time
2-bit 2-bit
n-bit n-bit
prediction
– 28 – CSCE 513 Fall 2015
Correlating Branch PredictorsCorrelating Branch Predictors
If(a == 2)If(a == 2)
a = 0;a = 0;
If(b ==2) If(b ==2)
b = 0;b = 0;
If (a != b) {If (a != b) {
……
– 29 – CSCE 513 Fall 2015
(m, n) predictors(m, n) predictors
m last branches are used to predictm last branches are used to predict
One of 2One of 2mm n-bit predictors n-bit predictors
– 30 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Branch PredictionBranch Prediction
Basic 2-bit predictor:Basic 2-bit predictor: For each branch:
Predict taken or not taken If the prediction is wrong two consecutive times, change
prediction
Correlating predictor:Correlating predictor: Multiple 2-bit predictors for each branch One for each possible combination of outcomes of preceding
n branches
Local predictor:Local predictor: Multiple 2-bit predictors for each branch One for each possible combination of outcomes for the last n
occurrences of this branch
Tournament predictor:Tournament predictor: Combine correlating predictor with local predictor
Bran
ch P
redictio
n
– 31 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Branch Prediction PerformanceBranch Prediction Performance Bran
ch P
redictio
n
Branch predictor performance
– 32 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Dynamic SchedulingDynamic Scheduling
Rearrange order of instructions to reduce stalls while Rearrange order of instructions to reduce stalls while maintaining data flowmaintaining data flow
Advantages:Advantages: Compiler doesn’t need to have knowledge of
microarchitecture Handles cases where dependencies are unknown at compile
time
Disadvantage:Disadvantage: Substantial increase in hardware complexity Complicates exceptions
Bran
ch P
redictio
n
– 33 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Dynamic SchedulingDynamic Scheduling
Dynamic scheduling implies:Dynamic scheduling implies: Out-of-order execution Out-of-order completion
Creates the possibility for WAR and WAW hazardsCreates the possibility for WAR and WAW hazards
Tomasulo’s ApproachTomasulo’s Approach Tracks when operands are available Introduces register renaming in hardware
Minimizes WAW and WAR hazards
Bran
ch P
redictio
n
– 34 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Register RenamingRegister Renaming
Example:Example:
DIV.D F0,F2,F4DIV.D F0,F2,F4
ADD.D ADD.D F6F6,F0,F8,F0,F8
S.D F6,0(R1)S.D F6,0(R1)
SUB.D F8,F10,F14SUB.D F8,F10,F14
MUL.D MUL.D F6F6,F10,F8,F10,F8
+ name dependence with F6+ name dependence with F6
Bran
ch P
redictio
n
antidependence
antidependence
– 35 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Register RenamingRegister Renaming
Example:Example:
DIV.D F0,F2,F4DIV.D F0,F2,F4
ADD.D ADD.D SS,F0,F8,F0,F8
S.D S.D SS,0(R1),0(R1)
SUB.D SUB.D TT,F10,F14,F10,F14
MUL.D MUL.D F6,F6,F10,F10,TT
Now only RAW hazards remain, which can be strictly Now only RAW hazards remain, which can be strictly orderedordered
Bran
ch P
redictio
n
– 36 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Register RenamingRegister Renaming
Register renaming is provided by reservation stations Register renaming is provided by reservation stations (RS)(RS) Contains:
The instructionBuffered operand values (when available)Reservation station number of instruction providing the
operand values RS fetches and buffers an operand as soon as it becomes
available (not necessarily involving register file) Pending instructions designate the RS to which they will
send their output Result values broadcast on a result bus, called the common data bus
(CDB)
Only the last output updates the register file As instructions are issued, the register specifiers are
renamed with the reservation station May be more reservation stations than registers
Bran
ch P
redictio
n
– 37 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Tomasulo’s AlgorithmTomasulo’s Algorithm
Load and store buffersLoad and store buffers Contain data and addresses, act like reservation stations
Top-level design: Top-level design:
Bran
ch P
redictio
n
– 38 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Tomasulo’s AlgorithmTomasulo’s Algorithm
Three Steps:Three Steps: Issue
Get next instruction from FIFO queue If available RS, issue the instruction to the RS with operand
values if available If operand values not available, stall the instruction
ExecuteWhen operand becomes available, store it in any reservation
stations waiting for itWhen all operands are ready, issue the instructionLoads and store maintained in program order through effective
addressNo instruction allowed to initiate execution until all branches
that proceed it in program order have completed Write result
Write result on CDB into reservation stations and store buffers» (Stores must wait until address and value are received)
Bran
ch P
redictio
n
– 39 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
ExampleExample Bran
ch P
redictio
n
– 40 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Hardware-Based SpeculationHardware-Based Speculation
Execute instructions along predicted execution paths Execute instructions along predicted execution paths but only commit the results if prediction was correctbut only commit the results if prediction was correct
Instruction commit: allowing an instruction to update Instruction commit: allowing an instruction to update the register file when instruction is no longer the register file when instruction is no longer speculativespeculative
Need an additional piece of hardware to prevent any Need an additional piece of hardware to prevent any irrevocable action until an instruction commitsirrevocable action until an instruction commits I.e. updating state or taking an execution
Bran
ch P
redictio
n
– 41 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Reorder BufferReorder Buffer
Reorder buffer – holds the result of instruction between Reorder buffer – holds the result of instruction between completion and commitcompletion and commit
Four fields:Four fields: Instruction type: branch/store/register Destination field: register number Value field: output value Ready field: completed execution?
Modify reservation stations:Modify reservation stations: Operand source is now reorder buffer instead of functional
unit
Bran
ch P
redictio
n
– 42 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Reorder BufferReorder Buffer
Register values and memory values are not written until Register values and memory values are not written until an instruction commitsan instruction commits
On misprediction:On misprediction: Speculated entries in ROB are cleared
Exceptions:Exceptions: Not recognized until it is ready to commit
Bran
ch P
redictio
n
– 43 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Multiple Issue and Static SchedulingMultiple Issue and Static Scheduling
To achieve CPI < 1, need to complete multiple To achieve CPI < 1, need to complete multiple instructions per clockinstructions per clock
Solutions:Solutions: Statically scheduled superscalar processors VLIW (very long instruction word) processors dynamically scheduled superscalar processors
Mu
ltiple Issu
e and
Static S
ched
ulin
g
– 44 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Multiple IssueMultiple IssueM
ultip
le Issue an
d S
tatic Sch
edu
ling
– 45 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
VLIW ProcessorsVLIW Processors
Package multiple operations into one instructionPackage multiple operations into one instruction
Example VLIW processor:Example VLIW processor: One integer instruction (or branch) Two independent floating-point operations Two independent memory references
Must be enough parallelism in code to fill the available Must be enough parallelism in code to fill the available slotsslots
Mu
ltiple Issu
e and
Static S
ched
ulin
g
– 46 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
VLIW ProcessorsVLIW Processors
Disadvantages:Disadvantages: Statically finding parallelism Code size No hazard detection hardware Binary code compatibility
Mu
ltiple Issu
e and
Static S
ched
ulin
g
– 47 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Dynamic Scheduling, Multiple Issue, and SpeculationDynamic Scheduling, Multiple Issue, and Speculation
Modern microarchitectures:Modern microarchitectures: Dynamic scheduling + multiple issue + speculation
Two approaches:Two approaches: Assign reservation stations and update pipeline control
table in half clock cyclesOnly supports 2 instructions/clock
Design logic to handle any possible dependencies between the instructions
Hybrid approaches
Issue logic can become bottleneckIssue logic can become bottleneck
Dyn
amic S
ched
ulin
g, M
ultip
le Issue, an
d S
pecu
lation
– 48 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Dyn
amic S
ched
ulin
g, M
ultip
le Issue, an
d S
pecu
lation
Overview of DesignOverview of Design
– 49 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Limit the number of instructions of a given class that Limit the number of instructions of a given class that can be issued in a “bundle”can be issued in a “bundle” I.e. on FP, one integer, one load, one store
Examine all the dependencies amoung the instructions Examine all the dependencies amoung the instructions in the bundlein the bundle
If dependencies exist in bundle, encode them in If dependencies exist in bundle, encode them in reservation stationsreservation stations
Also need multiple completion/commitAlso need multiple completion/commit
Dyn
amic S
ched
ulin
g, M
ultip
le Issue, an
d S
pecu
lation
Multiple IssueMultiple Issue
– 50 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Loop:Loop: LD R2,0(R1)LD R2,0(R1) ;R2=array element;R2=array element
DADDIU R2,R2,#1DADDIU R2,R2,#1 ;increment R2;increment R2
SD R2,0(R1)SD R2,0(R1) ;store result;store result
DADDIU R1,R1,#8DADDIU R1,R1,#8 ;increment pointer;increment pointer
BNE R2,R3,LOOPBNE R2,R3,LOOP ;branch if not last element;branch if not last element
Dyn
amic S
ched
ulin
g, M
ultip
le Issue, an
d S
pecu
lation
ExampleExample
– 51 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Dyn
amic S
ched
ulin
g, M
ultip
le Issue, an
d S
pecu
lation
Example (No Speculation)Example (No Speculation)
– 52 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Dyn
amic S
ched
ulin
g, M
ultip
le Issue, an
d S
pecu
lation
Dual Issue/Dual Commit ExampleDual Issue/Dual Commit Example
– 53 – CSCE 513 Fall 2015Copyright © 2012, Elsevier Inc. All rights reserved.
Need high instruction bandwidth!Need high instruction bandwidth! Branch-Target buffers
Next PC prediction buffer, indexed by current PC
Ad
v. Tech
niq
ues fo
r Instru
ction
Delivery an
d S
pecu
lation
Branch-Target BufferBranch-Target Buffer
– 55 – CSCE 513 Fall 2015
Review of Data HazardsReview of Data Hazards
Assume instruction i comes before instruction jAssume instruction i comes before instruction j
• Instruction j is data dependant on I ifInstruction j is data dependant on I if• i produces a result that j uses• j depends on k and k depends on I
• Name dependence – two instructions use the same Name dependence – two instructions use the same registerregister
• Antidependence when j writes a register that i readsAntidependence when j writes a register that i reads
• Output dependence when both i and j write the same Output dependence when both i and j write the same registerregister
• HazardsHazards• RAW• WAW• WAR