Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
ECE232: Hardware Organization and Design
Part 13: Branch prediction (Chapter 4/6)
http://www.ecs.umass.edu/ece/ece232/
ECE232: BrPredict 2 Adapted from Computer Organization and Design,Patterson&Hennessy, UCB, Kundu,UMass Koren
Branch Instructions Cause Control Hazards
[Figure: pipeline diagram in instruction order (beq, lw, Inst 3, Inst 4), each instruction passing through IM, Reg, ALU, DM, Reg (stages F, D, EX, M, W); jr is also shown as a control instruction.]
BEQ resolved during the MEM stage
[Figure: pipelined datapath with PC, instruction memory, register file, sign extend, shift left 2 and adder for the branch target, ALU, data memory, the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, and control; the Branch signal in the MEM stage drives PCSrc.]
One Way to “Fix” a Control Hazard
[Figure: instruction order beq, lw, Inst 3 through IM, Reg, ALU, DM, Reg, with three stall cycles inserted after beq.]
Fix the branch hazard by waiting: introduce stalls
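Waiting costs a fixed number of bubbles on every branch, so the slowdown is easy to estimate. A minimal sketch, assuming an ideal CPI of 1 and a hypothetical branch frequency of 20% (an illustrative value, not from the slides), with the three stalls shown here for a branch resolved in MEM:

```python
def effective_cpi(base_cpi: float, branch_fraction: float, stall_cycles: int) -> float:
    """Average CPI when every branch stalls the pipeline for a fixed number of cycles."""
    return base_cpi + branch_fraction * stall_cycles


# Assumed workload: 20% of instructions are branches (illustrative value only).
print(f"{effective_cpi(1.0, 0.20, 3):.2f}")  # branch resolved in MEM: 1.60
print(f"{effective_cpi(1.0, 0.20, 1):.2f}")  # one stall left after moving the decision earlier: 1.20
```

The second line shows why moving the decision point earlier pays off: each stall cycle removed saves `branch_fraction` cycles per instruction on average.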
Reducing branch penalty through HW design
Reducing Control Hazards’ Penalties
• Stalls hurt performance
  • Deeper pipelines have higher penalties
1. Move the decision point as early in the pipeline as possible: reduces the number of stalls at the cost of additional hardware
2. Delay the decision (requires compiler support): “Delayed Branch”
   • Not effective for deeper pipes, which require more than one delay slot to be filled
3. Predict the outcome of the branch

      beq $1,$2,NEXT
      add $4,$3,$5
      sub $7,$2,$8
NEXT:
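The three-instruction listing on this slide can be read as a delayed-branch example: the instruction right after beq fills the delay slot. A toy interpreter sketch, assuming exactly one delay slot and a hypothetical tuple encoding of the instructions (not MIPS machine code), shows that the slot instruction runs even when the branch is taken:

```python
def execute(op, regs, trace):
    """Run one ALU instruction ('add'/'sub', rd, rs, rt) and record its name."""
    name, rd, rs, rt = op
    regs[rd] = regs[rs] + regs[rt] if name == "add" else regs[rs] - regs[rt]
    trace.append(name)


def run(program, regs, delay_slot):
    """Interpret a toy program; return the names of the ALU instructions executed."""
    labels = {op[1]: i for i, op in enumerate(program) if op[0] == "label"}
    pc, trace = 0, []
    while pc < len(program):
        op = program[pc]
        if op[0] == "label":
            pc += 1
        elif op[0] == "beq":
            taken = regs[op[1]] == regs[op[2]]
            if delay_slot:
                # Assumption: the slot holds an ALU instruction; it always executes.
                execute(program[pc + 1], regs, trace)
                pc = labels[op[3]] if taken else pc + 2
            else:
                pc = labels[op[3]] if taken else pc + 1
        else:
            execute(op, regs, trace)
            pc += 1
    return trace


program = [
    ("beq", "$1", "$2", "NEXT"),
    ("add", "$4", "$3", "$5"),  # the delay-slot instruction
    ("sub", "$7", "$2", "$8"),
    ("label", "NEXT"),
]
equal = {f"${i}": 0 for i in range(9)}  # $1 == $2, so the branch is taken

print(run(program, dict(equal), delay_slot=True))   # ['add']
print(run(program, dict(equal), delay_slot=False))  # []
```

The compiler's job under this scheme is to find an instruction that is useful (or at least harmless) on both paths to place in the slot.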
Branch Prediction
• Easiest: static prediction
  • Always taken, or always not taken
  • Opcode based
  • Displacement based (forward not taken, backward taken)
  • Compiler directed (branch likely, branch not likely)
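Displacement-based prediction needs only the sign of the branch offset, which is encoded in the instruction itself. A minimal sketch of "backward taken, forward not taken", with made-up example addresses:

```python
def predict_taken_btfn(branch_pc: int, target_pc: int) -> bool:
    """Displacement-based static prediction: backward taken, forward not taken."""
    # A target at or before the branch usually closes a loop, so predict taken.
    return target_pc <= branch_pc


print(predict_taken_btfn(0x0040_0020, 0x0040_0000))  # backward (loop): True
print(predict_taken_btfn(0x0040_0020, 0x0040_0040))  # forward (skip ahead): False
```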
• Dynamic prediction: prediction per branch in the program
  • 1-bit predictor: remember whether each branch was last taken or not taken
  • Use a branch-history table (BHT) with a 1-bit entry
  • Use part of the PC (low-order bits) to index the table. Why?
  • Multiple branches may share the same bit
  • Invert the bit if the prediction is wrong
[Figure: BHT of 128 one-bit predictors (Predictor 0, Predictor 1, …, Predictor 127) indexed by the branch PC.]
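A minimal sketch of the 128-entry, 1-bit BHT pictured here. It assumes 4-byte instructions, so the two always-zero low PC bits are skipped before taking the index bits; the class name and the example addresses are hypothetical, chosen so the two branches collide in the table:

```python
class OneBitBHT:
    """Branch-history table of 1-bit entries, indexed by low-order PC bits."""

    def __init__(self, entries: int = 128):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.mask = entries - 1
        self.bits = [0] * entries  # 0 = predict not taken, 1 = predict taken

    def index(self, pc: int) -> int:
        # Assumption: word-aligned PCs, so drop bits 1..0 before masking.
        return (pc >> 2) & self.mask

    def predict(self, pc: int) -> bool:
        return self.bits[self.index(pc)] == 1

    def update(self, pc: int, taken: bool) -> None:
        # Storing the actual outcome inverts the bit exactly when the prediction was wrong.
        self.bits[self.index(pc)] = int(taken)


bht = OneBitBHT()
a, b = 0x0040_0000, 0x0040_0200  # 128 words apart, so they share an entry
bht.update(a, taken=True)
print(bht.index(a) == bht.index(b), bht.predict(b))  # True True: b inherits a's history
```

No tag is stored, which keeps the table tiny but means unrelated branches can overwrite each other's history.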
Branch Prediction: 1-bit predictor
• Backward branches for loops will be mispredicted twice
  • Ex: If a loop branch is taken 9 times in a row and then not taken once, what is the prediction accuracy?
  • Mispredictions at the first and last loop iterations (the bit was left at “not taken” by the previous loop exit) => 80% prediction accuracy, although the branch is taken 90% of the time
• Modern processors issue multiple instructions per cycle, so more branch hazards occur per cycle
  • The cost of a mispredicted branch goes up
  • Pentium II: 3 instructions issued per cycle, 12+ cycle misprediction penalty
  • Huge penalty for a misfetched path following a branch
[Figure: outcome sequence T … T T T T N T T …, with the 1-bit prediction flipping to N after the not-taken outcome.]
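The 80% figure can be checked by simulating a single history bit on the loop's outcome sequence. A sketch, assuming the bit starts at not taken because the previous execution of the loop ended with a fall-through:

```python
def one_bit_accuracy(outcomes, initial_taken=False):
    """Fraction of correct predictions for one branch using a single history bit."""
    predict_taken, hits = initial_taken, 0
    for taken in outcomes:
        hits += predict_taken == taken
        predict_taken = taken  # remember only the last outcome
    return hits / len(outcomes)


loop = [True] * 9 + [False]  # loop branch: taken 9 times, then falls through
print(f"{one_bit_accuracy(loop):.0%}")      # 80%
print(f"{one_bit_accuracy(loop * 5):.0%}")  # 80%: the same two misses on every pass
```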
2-bit Branch Prediction
• 4 states instead of 2, allowing for more information about tendencies
• A prediction must miss twice before it is changed
• Good for backward branches of loops
[Figure: 2-bit saturating counter state diagram with two “Predict taken” states and two “Predict not taken” states, linked by T and N transitions; on the outcome sequence T … T T T T T T T … with a single N, the prediction stays T.]
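A sketch of the 2-bit saturating counter, assuming the common 0 to 3 encoding in which the upper two states predict taken. Run on the 9-taken, 1-not-taken loop branch used for the 1-bit predictor, only the loop exit is mispredicted:

```python
def two_bit_accuracy(outcomes, initial_state=2):
    """Fraction of correct predictions using a 2-bit saturating counter.

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    """
    state, hits = initial_state, 0
    for taken in outcomes:
        hits += (state >= 2) == taken
        # Saturate at 0 and 3 so one surprise cannot flip a strongly held prediction.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return hits / len(outcomes)


loop = [True] * 9 + [False]  # loop branch: taken 9 times, then falls through
# Assumption: the previous loop exit left the counter in state 2 (weakly taken).
print(f"{two_bit_accuracy(loop):.0%}")      # 90%
print(f"{two_bit_accuracy(loop * 5):.0%}")  # 90%
```

The single not-taken outcome only moves the counter from 3 to 2, so the next entry into the loop is still predicted taken.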
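Branch History Table (BHT)
• 2 bits by N entries (e.g., 4K entries)
• Uses low-order bits of the branch PC to choose an entry
• Plot shows misprediction rate instead of prediction accuracy
[Figure: BHT of 2-bit entries (Predictor 0, Predictor 1, …, Predictor 4095) indexed by the branch PC; example entry value 01.]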
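Sizing and indexing the 4K-entry table is simple arithmetic: 4096 entries need 12 index bits, and 2 bits per entry give 8192 bits (1 KB) of storage. A sketch, assuming word-aligned 4-byte instructions so the index comes from PC bits 13 through 2; the addresses are made up:

```python
ENTRIES = 4096                          # "e.g. 4K entries" from the slide
INDEX_BITS = ENTRIES.bit_length() - 1   # 12 index bits for a power-of-two table
STORAGE_BITS = 2 * ENTRIES              # two bits per entry


def bht_index(pc: int) -> int:
    """Pick a BHT entry from low-order PC bits (assumes 4-byte instructions)."""
    return (pc >> 2) & (ENTRIES - 1)


print(INDEX_BITS, STORAGE_BITS // 8)                    # 12 1024: index bits, table size in bytes
print(bht_index(0x0040_0010), bht_index(0x0040_4010))  # 4 4: branches 16 KB apart alias
```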
Is Branch Predictor Enough?
• When is using branch prediction beneficial?
  • Clearly when the outcome is known later than the target
  • Otherwise: if we predict the branch is taken (and suppose the prediction is correct), what is the target address?
  • Need a mechanism to provide the target address as well
  • Use a Branch Target Buffer (BTB) that includes the target address
• Can we eliminate the one-cycle delay for the 5-stage pipeline?
  • Need to fetch from the branch target immediately after the branch was fetched
Branch Target Buffer (BTB)
• The BTB is a cache that contains the predicted PC value (e.g., a loop address) instead of whether the branch will be taken or not
• Is the current instruction a branch?
  • The BTB provides the answer before the current instruction is decoded, and therefore enables fetching from the target to begin right after the branch's IF stage
• What is the branch target?
  • The BTB provides the branch target if the prediction is a taken branch (for not-taken branches the next PC is simply PC+4)
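A minimal direct-mapped BTB sketch. It assumes the full branch PC is kept as the tag (real designs often store fewer bits), 4-byte instructions, and that only predicted-taken branches are entered; the class and method names are hypothetical:

```python
class BranchTargetBuffer:
    """Direct-mapped cache from branch PC to predicted next PC."""

    def __init__(self, entries: int = 64):
        self.entries = entries
        self.tags = [None] * entries  # branch PC stored in each slot (full-PC tag)
        self.targets = [0] * entries

    def _slot(self, pc: int) -> int:
        return (pc >> 2) % self.entries

    def next_pc(self, pc: int) -> int:
        """Looked up during IF, before decode: a hit means 'predicted taken branch'."""
        slot = self._slot(pc)
        if self.tags[slot] == pc:
            return self.targets[slot]
        return pc + 4  # miss: fetch continues sequentially

    def insert(self, pc: int, target: int) -> None:
        slot = self._slot(pc)
        self.tags[slot], self.targets[slot] = pc, target


btb = BranchTargetBuffer()
btb.insert(0x0040_0020, 0x0040_0000)  # a loop branch jumping back
print(hex(btb.next_pc(0x0040_0020)))  # 0x400000 (hit: fetch the target next)
print(hex(btb.next_pc(0x0040_0120)))  # 0x400124 (same slot, tag mismatch)
```

The tag check is what lets the BTB answer "is this a branch?" during fetch: without it, any instruction mapping to the slot would be redirected.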
BTB
BTB operations
• BTB hit, prediction taken → 0-cycle delay
• BTB hit, misprediction → ≥ 2-cycle penalty; correct the BTB
• BTB miss, taken branch → ≥ 1-cycle penalty (detected in the ID stage and entered in the BTB)

[Flowchart spanning the IF, ID, and EX stages:]
Send PC to memory and branch-target buffer. Entry found in branch-target buffer?
• No: Is instruction a taken branch?
  • No: Normal instruction execution
  • Yes: Enter branch instruction address and next PC into branch-target buffer
• Yes: Send out predicted PC. Taken branch?
  • Yes: Branch correctly predicted; continue execution with no stalls
  • No: Mispredicted branch, kill fetched instruction; restart fetch at other target; update target buffer
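The flowchart can be sketched as a function that charges the slide's minimum penalties (0, 2, and 1 cycles; a real pipeline may pay more, hence the ≥). It assumes a plain dictionary as the BTB and models "update target buffer" as fixing a wrong target or dropping a branch that turned out not taken; the function name is hypothetical:

```python
HIT_CORRECT, HIT_MISPREDICT, MISS_TAKEN = 0, 2, 1  # minimum penalties from the slide


def fetch_with_btb(btb: dict, pc: int, taken: bool, target: int) -> int:
    """Walk the BTB flowchart for one instruction; return its stall cycles.

    taken:  whether the instruction turns out to be a taken branch.
    target: its actual target when taken (ignored otherwise).
    """
    if pc in btb:                         # entry found: send out predicted PC
        if taken and btb[pc] == target:
            return HIT_CORRECT            # correctly predicted, no stalls
        # Mispredicted: kill the fetched instruction, restart at the other target,
        # and update the buffer (assumption: fix the target or drop the entry).
        if taken:
            btb[pc] = target
        else:
            del btb[pc]
        return HIT_MISPREDICT
    if taken:                             # taken branch missing from the BTB
        btb[pc] = target                  # enter branch address and next PC
        return MISS_TAKEN
    return 0                              # normal instruction execution


btb = {}
loop = [True] * 9 + [False]  # loop branch at a made-up address, jumping back
penalties = [fetch_with_btb(btb, 0x0040_0020, taken, 0x0040_0000) for taken in loop]
print(penalties)       # [1, 0, 0, 0, 0, 0, 0, 0, 0, 2]
print(sum(penalties))  # 3
```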
Branch Prediction Summary
• The better we predict, the lower the penalty we might incur
• 2-bit predictors capture tendencies well
• Correlating predictors improve accuracy, particularly when combined with 2-bit predictors
• Accurate branch prediction does no good if we don’t know there was a branch to predict
• The BTB identifies branches in the IF stage
• The BTB combined with a branch prediction table identifies branches to predict, and predicts them well