Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

ECE232: Hardware Organization and Design

Part 13: Branch prediction (Chapter 4/6)

http://www.ecs.umass.edu/ece/ece232/



ECE232: BrPredict 2 Adapted from Computer Organization and Design, Patterson & Hennessy, UCB, Kundu, UMass Koren

Branch Instructions Cause Control Hazards

[Figure: pipeline diagram, instruction order vs. time. Instructions shown include beq, lw, Inst 3, Inst 4, and jr. Each instruction passes through the IM, Reg, ALU, DM, Reg datapath blocks, labeled as stages F, D, EX, M, W.]


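BEQ resolved during the MEM stage

[Figure: pipelined datapath. Components shown: PC, Add (with the constant 4), Instruction Memory (Read Address), Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), Sign Extend (16 to 32 bits), Shift left 2, Add, ALU, Data Memory (Address, Write Data, Read Data), the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, Control, the Branch signal, and PCSrc.]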


One Way to “Fix” a Control Hazard

Fix branch hazard by waiting: introduce stalls

[Figure: pipeline diagram, instruction order vs. time. beq is followed by three stall cycles before lw and Inst 3 proceed through the pipeline (IM, Reg, ALU, DM, Reg).]
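
The cost of stalling can be estimated with a short calculation. The sketch below is only an illustration: the 3 stall cycles follow from beq being resolved in the MEM stage, but the base CPI of 1 and the 20% branch frequency are assumed values, not figures from the slides.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Average CPI when every branch stalls the pipeline for stall_cycles."""
    return base_cpi + branch_fraction * stall_cycles


# Assumed workload: ideal CPI of 1, 20% of instructions are branches
# (hypothetical numbers), 3 stalls per branch (beq resolved in MEM).
cpi = cpi_with_branch_stalls(base_cpi=1.0, branch_fraction=0.2, stall_cycles=3)
print(f"CPI with stalls: {cpi:.2f}")
```

Under these assumptions each branch adds 3 cycles, so the average CPI rises well above the ideal value, which is the motivation for the techniques on the following slides.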


Reducing branch penalty through HW design


Reducing Control Hazards’ Penalties

• Stalls hurt performance
• Deeper pipelines have higher penalties

1. Move the decision point as early in the pipeline as possible: reduces the number of stalls at the cost of additional hardware
2. Delay the decision (requires compiler support), the “Delayed Branch”:
   • not effective for deeper pipes, which require more than one delay slot to be filled
3. Predict the outcome of the branch

beq $1,$2,NEXT
add $4,$3,$5
sub $7,$2,$8
NEXT:


Branch Prediction

Easiest: static prediction
• Always taken, always not taken
• Opcode based
• Displacement based (forward not taken, backward taken)
• Compiler directed (branch likely, branch not likely)

Dynamic prediction: prediction per branch in program
• 1-bit predictor: remember last taken/not taken per branch
• Use a branch-history table (BHT) with 1-bit entries
• Use part of the PC (low-order bits) to index the table. Why?
• Multiple branches may share the same bit
• Invert the bit if the prediction is wrong

[Figure: BHT with entries Predictor 0, Predictor 1, ..., Predictor 127, indexed by the branch PC.]
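
A 1-bit BHT with 128 entries can be sketched in a few lines. This is a minimal illustration, not the slides' hardware: the class name, the example addresses, and the choice to drop the two byte-offset bits of the PC before indexing (assuming 4-byte, word-aligned instructions) are all assumptions.

```python
class OneBitBHT:
    """1-bit branch-history table indexed by low-order PC bits (sketch)."""

    def __init__(self, entries=128):
        self.entries = entries
        self.bits = [0] * entries  # 0 = predict not taken, 1 = predict taken

    def index(self, pc):
        # Assumption: instructions are 4 bytes, so skip the 2 offset bits.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.bits[self.index(pc)] == 1

    def update(self, pc, taken):
        # Invert the bit if the prediction was wrong.
        if self.predict(pc) != taken:
            self.bits[self.index(pc)] ^= 1


bht = OneBitBHT()
pc_a = 0x00400010            # hypothetical branch address
pc_b = pc_a + 128 * 4        # a second branch that maps to the same entry
bht.update(pc_a, taken=True)
print(bht.index(pc_a) == bht.index(pc_b), bht.predict(pc_b))
```

Because only the low-order bits select the entry, the two branches above share one bit, so training on the first changes the prediction for the second.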


Branch Prediction: 1-bit predictor

• Backward branches for loops will be mispredicted twice
  Example: if a loop branch is taken 9 times in a row and then not taken once, what is the prediction accuracy?
  The predictor mispredicts at the first and last loop iterations, giving 80% prediction accuracy even though the branch is taken 90% of the time.

Modern processors: multiple instructions issued per cycle, so more branch hazards will occur per cycle
• Cost of a mispredicted branch goes up
• Pentium II: 3 instructions issued per cycle, 12+ cycle misprediction penalty
• Huge penalty for a misfetched path following a branch

[Figure: branch outcome sequence T ... T T T T N T T ...]
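
The 80% figure can be checked with a small simulation. This is a sketch under stated assumptions: the function name is made up for illustration, the predictor is a single bit for one branch, and it starts out predicting not taken.

```python
def one_bit_accuracy(outcomes, initial=False):
    """Fraction of correct predictions for a 1-bit predictor on one branch."""
    last = initial  # the single history bit: last observed outcome
    correct = 0
    for taken in outcomes:
        if last == taken:
            correct += 1
        last = taken  # remember the most recent outcome
    return correct / len(outcomes)


# Loop branch: taken 9 times, then not taken once, repeated 100 times.
loop = [True] * 9 + [False]
accuracy = one_bit_accuracy(loop * 100)
print(accuracy)
```

Each pass through the loop costs two mispredictions: one on the first iteration (the bit still says "not taken" from the previous exit) and one on the exit itself.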


2-bit Branch Prediction

• 4 states instead of 2, allowing for more information about tendencies
• A prediction must miss twice before it is changed
• Good for backward branches of loops
• 2-bit saturating counter

[Figure: 4-state diagram with two "Predict taken" states and two "Predict not taken" states; transitions are labeled T (taken) and N (not taken). Below it, the branch outcome sequence T ... T T T T T T T ... with N.]
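
The benefit for loop branches can be seen by simulating a 2-bit saturating counter on the same 9-taken, 1-not-taken pattern. This is a sketch only: the state encoding (0 to 3, predict taken when the counter is 2 or 3) and the starting state of 3 are assumptions made for the example.

```python
def two_bit_accuracy(outcomes, state=3):
    """Fraction of correct predictions for a 2-bit saturating counter."""
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2  # states 2 and 3 predict taken
        if predicted_taken == taken:
            correct += 1
        # Count up on taken, down on not taken, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


# Loop branch: taken 9 times, then not taken once, repeated 100 times.
loop = [True] * 9 + [False]
accuracy = two_bit_accuracy(loop * 100)
print(accuracy)
```

The single not-taken outcome at the loop exit only moves the counter from 3 to 2, so the next pass still starts with a "taken" prediction and only the exit itself is mispredicted.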


Branch History Table - BHT

• 2 bits by N (e.g. 4K entries)
• Uses low-order bits of branch PC to choose entry
• Plot misprediction instead of prediction

[Figure: BHT with entries Predictor 0, Predictor 1, ..., Predictor 4095, indexed by the branch PC; the selected entry holds a 2-bit value such as 01.]
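
For the 4K-entry example, the table size and index follow from simple arithmetic. The sketch below assumes 4-byte aligned instructions (so the two lowest PC bits are skipped) and uses a hypothetical branch address; neither detail is stated on the slide.

```python
ENTRIES = 4096          # 4K entries, as in the slide's example
BITS_PER_ENTRY = 2      # one 2-bit counter per entry

index_bits = ENTRIES.bit_length() - 1        # PC bits needed to pick an entry
storage_bits = ENTRIES * BITS_PER_ENTRY      # total predictor state
storage_bytes = storage_bits // 8


def bht_index(pc):
    """Select an entry using low-order PC bits (byte offset dropped)."""
    return (pc >> 2) & (ENTRIES - 1)


print(index_bits, storage_bits, storage_bytes, bht_index(0x00400010))
```

Twelve index bits and 1 KB of state suggest why a table like this is cheap relative to the stalls it can avoid.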


Is Branch Predictor Enough?

When is using branch prediction beneficial?
• Clearly when the outcome is known later than the target
• Otherwise: if we predict the branch is taken (and suppose it is correct), what is the target address?
• Need a mechanism to provide the target address as well
• Use a Branch Target Buffer (BTB) that includes the target address

Can we eliminate the one cycle delay for the 5-stage pipeline?
• Need to fetch from the branch target immediately after the branch was fetched


Branch Target Buffer (BTB)

BTB is a cache that contains the predicted PC value instead of whether the branch will take place or not (e.g. loop address)

Is the current instruction a branch?
• BTB provides the answer before the current instruction is decoded and therefore enables fetching to begin after IF-stage (for branch)

What is the branch target?
• BTB provides the branch target if the prediction is a taken branch (for not taken branches the target is simply PC+4)
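
The lookup described above can be sketched as a small table keyed by the fetch PC. This is only an illustration: a real BTB is a fixed-size tagged cache, while the dictionary, the class and method names, and the example addresses here are assumptions.

```python
class BranchTargetBuffer:
    """Toy BTB: maps the PC of a predicted-taken branch to its target."""

    def __init__(self):
        self.targets = {}  # branch PC -> predicted next PC

    def next_fetch_pc(self, pc):
        # Hit: fetch from the stored target. Miss: fall through to PC + 4.
        return self.targets.get(pc, pc + 4)

    def record_taken(self, pc, target):
        self.targets[pc] = target


btb = BranchTargetBuffer()
loop_branch_pc, loop_top = 0x00400020, 0x00400000  # hypothetical addresses
first = btb.next_fetch_pc(loop_branch_pc)    # not yet known: PC + 4
btb.record_taken(loop_branch_pc, loop_top)
second = btb.next_fetch_pc(loop_branch_pc)   # now redirected to the loop top
print(hex(first), hex(second))
```

The lookup needs only the PC, so it can run in the IF stage before the instruction has been decoded.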


BTB


BTB operations

• BTB hit, prediction taken → 0 cycle delay
• BTB hit, misprediction: ≥ 2 cycle penalty (correct the BTB)
• BTB miss, branch: ≥ 1 cycle penalty (detected at the ID stage and entered in BTB)

[Flowchart, spanning the IF, ID and EX stages:]
Send PC to memory and branch-target buffer
Entry found in branch-target buffer?
• No → Is instruction a taken branch?
    • No → Normal instruction execution
    • Yes → Enter branch instruction address and next PC into branch-target buffer
• Yes → Send out predicted PC, then: Taken branch?
    • No → Mispredicted branch, kill fetched instruction; restart fetch at other target; update target buffer
    • Yes → Branch correctly predicted; continue execution with no stalls
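
The three cases can be sketched as one function that applies a resolved branch to the BTB and returns the minimum penalty. This is an illustration under assumptions: the BTB is a plain dictionary, the penalties are the lower bounds from the list above (0, 1 and 2 cycles), and "update target buffer" is taken to mean deleting the entry of a branch that turned out not taken.

```python
def btb_step(btb, pc, taken, target):
    """Apply one resolved branch to the BTB; return the minimum penalty."""
    if pc in btb:  # BTB hit: the branch was predicted taken
        if taken and btb[pc] == target:
            return 0  # correctly predicted, no stall
        # Misprediction: correct the BTB (assumed policy, see lead-in).
        if taken:
            btb[pc] = target
        else:
            del btb[pc]
        return 2
    if taken:  # BTB miss on a taken branch: enter it into the BTB
        btb[pc] = target
        return 1
    return 0  # BTB miss, not taken: normal execution


# Loop branch taken 9 times then not taken once, executed twice.
btb = {}
branch_pc, loop_top = 0x00400020, 0x00400000  # hypothetical addresses
total = 0
for _ in range(2):
    for taken in [True] * 9 + [False]:
        total += btb_step(btb, branch_pc, taken, loop_top)
print(total)
```

With this deletion policy each pass through the loop pays one miss penalty on entry and one misprediction penalty on exit; a design that kept the entry and used a 2-bit predictor alongside it could avoid the repeated miss.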


Branch Prediction Summary

• The better we predict, the lower penalty we might incur
• 2-bit predictors capture tendencies well
• Correlating predictors improve accuracy, particularly when combined with 2-bit predictors
• Accurate branch prediction does no good if we don’t know there was a branch to predict
• BTB identifies branches in IF stage
• BTB combined with branch prediction table identifies branches to predict, and predicts them well