
Dynamic History-Length Fitting: A Third Level of Adaptivity for Branch Prediction

Toni Juan, Sanji Sanjeevan, Juan J. Navarro

Department of Computer Architecture, Universitat Politècnica de Catalunya

Presented by Danyao Wang, ECE1718, Fall 2008

ISCA '98


Overview

• Branch prediction background

• Dynamic branch predictors

• Dynamic history-length fitting (DHLF)

– Without context switches

– With context switches

• Results

• Conclusion


Why branch prediction?

• Superscalar processors with deep pipelines

– Intel Core 2 Duo: 14 stages

– AMD Athlon 64: 12 stages

– Intel Pentium 4: 31 stages

• Many cycles pass before a branch is resolved

– Waiting for the outcome wastes time

– Better to do useful work in the meantime

• Branch prediction!


What does it do?

    sub r1, r2, r3
    bne r1, r0, L1
    add r4, r5, r6
    …
L1: add r4, r7, r8
    sub r9, r4, r2

[Figure: pipeline timeline. The sub and bne instructions pass through fetch and decode in successive cycles. When the branch is fetched it is predicted taken, so fetch continues from L1 and the instructions at L1 execute speculatively. Once the branch is resolved, the prediction is validated: correct.]


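What happens when mispredicted?

    sub r1, r2, r3
    bne r1, r0, L1
    add r4, r5, r6
    …
L1: add r4, r7, r8
    sub r9, r4, r2

[Figure: pipeline timeline. The sub and bne instructions pass through fetch and decode in successive cycles. When the branch is fetched it is predicted taken, so fetch continues from L1 and the instructions at L1 execute speculatively. Once the branch is resolved, the prediction is validated: incorrect, and the speculative instructions are squashed.]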


How to predict branches?

• Statically at compile time

– Simple hardware

– Not accurate enough…

• Dynamically at execution time

– Hardware predictors

    • Last-outcome predictor

    • Saturation counter

    • Pattern predictor

    • Tournament predictor

(Listed from simplest to most complex and most accurate.)


Last-Outcome Branch Predictor

• Simplest dynamic branch predictor

• Branch prediction table with 1-bit entries

• Intuition: history repeats itself

[Figure: the lower N bits of the PC index a branch prediction table with 2^N entries. Each entry holds a 1-bit prediction (T or NT), read at fetch and written on a misprediction.]
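The table lookup described on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the class name, the initial not-taken state and the example trace are assumptions made for the sketch.

```python
class LastOutcomePredictor:
    """Table of 1-bit entries indexed by the lower N bits of the PC."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        # Assumption: every entry starts as "not taken" (False).
        self.table = [False] * (1 << index_bits)

    def predict(self, pc):
        # Read at fetch: return the last recorded outcome for this entry.
        return self.table[pc & self.mask]

    def update(self, pc, taken):
        # Write on misprediction only: a correct prediction leaves the bit as is.
        if self.table[pc & self.mask] != taken:
            self.table[pc & self.mask] = taken


def count_mispredictions(predictor, trace):
    """Run a trace of (pc, taken) pairs and count wrong predictions."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical trace: a loop branch taken 9 times, then not taken, run twice.
loop_trace = [(0x40, t) for t in ([True] * 9 + [False]) * 2]
print(count_mispredictions(LastOutcomePredictor(4), loop_trace))  # prints 4
```

Each pass through the loop costs two mispredictions here: one on entry and one on exit, which is the weakness the saturation counter addresses.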


Saturation Counter Predictor

• Observation: branches highly bimodal

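• n-bit saturation counter

– Hysteresis

– n-bit entries in branch prediction table

[Figure: state diagram of a 2-bit bimodal predictor with states 00, 01, 10 and 11. Taken (T) and not-taken (N) outcomes move the counter in opposite directions between adjacent states, saturating at both ends. Half of the states predict taken and half predict not-taken; the states are labeled with a strong or weak bias.]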
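A single saturation counter can be sketched as follows. The sketch assumes the common convention that the upper half of the counter range predicts taken and that the counter starts at zero; neither detail is stated on the slide, and the names are illustrative.

```python
class SaturatingCounter:
    """n-bit counter that saturates at 0 and at 2**n - 1."""

    def __init__(self, bits=2):
        self.max_value = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)  # assumption: upper half predicts taken
        self.value = 0                    # assumption: start at strong not-taken

    def predict(self):
        return self.value >= self.threshold

    def update(self, taken):
        # Step toward the observed outcome without wrapping around.
        if taken:
            self.value = min(self.value + 1, self.max_value)
        else:
            self.value = max(self.value - 1, 0)


def count_mispredictions(counter, outcomes):
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


# Hypothetical loop branch: taken 9 times, then not taken, executed 3 times.
outcomes = ([True] * 9 + [False]) * 3
print(count_mispredictions(SaturatingCounter(2), outcomes))  # prints 5
```

After two warm-up mispredictions, only the loop exit is mispredicted on each pass; a 1-bit last-outcome entry would also mispredict every loop re-entry. That difference is the hysteresis the slide refers to.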


Pattern Predictors

• Near-by branches often correlate

• Looks for patterns in branch history

– Branch History Register (BHR): m most recent branch outcomes

[Figure: two-level predictor. A function f combines the lower n bits of the PC with the m-bit history from the BHR into an n-bit index into a table of 2^n saturation counters.]
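To make the role of the BHR concrete, here is a small sketch of a two-level predictor with a pluggable index function f. The class, the weakly-not-taken initial counters and the history-only f used in the demo are assumptions chosen for illustration.

```python
class TwoLevelPredictor:
    """BHR (first level) plus a table of 2-bit counters (second level)."""

    def __init__(self, index_bits, history_bits, f):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.f = f                                # index function f(pc, bhr)
        self.bhr = 0                              # m most recent outcomes
        self.counters = [1] * (1 << index_bits)   # assumption: weakly not-taken

    def _index(self, pc):
        return self.f(pc, self.bhr) & self.index_mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        # Shift the newest outcome into the history register.
        self.bhr = ((self.bhr << 1) | int(taken)) & self.history_mask


def misses_on_alternating_branch(history_bits, length=20):
    """Count mispredictions on a branch that alternates T, N, T, N, ..."""
    predictor = TwoLevelPredictor(index_bits=1, history_bits=history_bits,
                                  f=lambda pc, bhr: bhr)  # history-only index
    misses = 0
    for i in range(length):
        taken = (i % 2 == 0)
        if predictor.predict(0x40) != taken:
            misses += 1
        predictor.update(0x40, taken)
    return misses


print(misses_on_alternating_branch(history_bits=1))  # prints 1
print(misses_on_alternating_branch(history_bits=0))  # prints 20
```

With one bit of history the alternating pattern is learned after a single miss; with no history the lone counter oscillates and misses every time.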


Tournament Predictor

• No one-size-fits-all predictor

• Dynamically choose among different predictors

[Figure: the PC feeds Predictors A, B and C in parallel; a chooser (metapredictor) selects which prediction to use.]
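A chooser can itself be a table of saturation counters. The sketch below uses two component predictors instead of the three in the figure, and the static stand-in predictors, names and initial chooser state are assumptions made to keep the example short.

```python
class StaticPredictor:
    """Hypothetical stand-in component: always predicts the same direction."""

    def __init__(self, taken):
        self.taken = taken

    def predict(self, pc):
        return self.taken

    def update(self, pc, taken):
        pass  # nothing to learn


class TournamentPredictor:
    """Per-PC 2-bit chooser: values 0-1 favour A, values 2-3 favour B."""

    def __init__(self, a, b, index_bits):
        self.a, self.b = a, b
        self.mask = (1 << index_bits) - 1
        self.choice = [1] * (1 << index_bits)  # assumption: weakly favour A

    def predict(self, pc):
        chosen = self.b if self.choice[pc & self.mask] >= 2 else self.a
        return chosen.predict(pc)

    def update(self, pc, taken):
        pa, pb = self.a.predict(pc), self.b.predict(pc)
        i = pc & self.mask
        if pa != pb:
            # Train the chooser only when the components disagree.
            if pb == taken:
                self.choice[i] = min(self.choice[i] + 1, 3)
            else:
                self.choice[i] = max(self.choice[i] - 1, 0)
        self.a.update(pc, taken)
        self.b.update(pc, taken)


def count_mispredictions(predictor, trace):
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical trace: branch 0x10 is always taken, branch 0x14 never is.
trace = [(0x10, True), (0x14, False)] * 10
tournament = TournamentPredictor(StaticPredictor(False), StaticPredictor(True), 4)
print(count_mispredictions(tournament, trace))  # prints 1
```

Either static component alone would mispredict 10 of the 20 branches; the chooser learns per branch which component to trust.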


What is the best predictor?

[Figure not recovered; surviving labels: "Optimal" and "Better".]


Observations

• Predictor performance depends on history length

• Optimal history length differs between programs

• Predictors with a fixed history length fall short of their potential

• … dynamic history length?

Dynamic History-Length Fitting (DHLF)


Intuition

• Tournament predictor

– Picks best out of many predictors

– Spatial multiplexing

– Area cost …

• DHLF: time multiplexing

– Try different history lengths during execution

– Adapt history length to code

– Hope to find the best one


2-Level Predictor Revisited

• Index = f(PC, BHR)

• gshare, f = xor, m < n

• 2-bit saturation counter

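[Figure: two-level predictor. A function f combines the lower n bits of the PC with the m-bit history from the BHR into an n-bit index into a table of 2^n saturation counters. Annotations: "Predetermined" and "Figure out dynamically".]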
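The gshare index with an adjustable history length can be written as one small function. This sketch assumes the m history bits are XORed into the low-order bits of the PC index and allows any m from 0 to n; the alignment and the function name are assumptions, not details taken from the slides.

```python
def gshare_index(pc, bhr, n, m):
    """Return the n-bit table index using only the m newest history bits."""
    if not 0 <= m <= n:
        raise ValueError("history length m must lie between 0 and n")
    history = bhr & ((1 << m) - 1)        # keep the m most recent outcomes
    return (pc ^ history) & ((1 << n) - 1)


# Same PC and history register, two different history lengths.
print(gshare_index(pc=0b10110110, bhr=0b1101, n=8, m=4))  # prints 187
print(gshare_index(pc=0b10110110, bhr=0b1101, n=8, m=0))  # prints 182
```

Because m is just a mask width, changing the history length at run time needs no change to the table itself, which is what makes it a natural parameter to tune dynamically.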


DHLF Approach

• Current history length

• Best length so far

• Misprediction counter

• Branch counter

• Table of measured misprediction rates per length

– Initialized to zero

• Sampling at fixed intervals (step size)

– Try new length: measure misprediction rate (MR)

– Adjust if worse than best seen before

– Move to a random length if length has not changed for a while

    • Avoids local minima
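The bookkeeping listed above can be sketched as a small controller. This is one possible reading of the bullets, not the authors' exact algorithm: the one-step move toward the best entry, the `patience` parameter standing in for "a while", the initial length of 0 and all names are assumptions.

```python
import random


class DhlfController:
    """Tracks mispredictions per history length and picks the next length."""

    def __init__(self, index_bits, step, patience=8, rng=None):
        self.num_lengths = index_bits + 1       # assumption: lengths 0..index_bits
        self.step = step                        # branches per sampling interval
        self.patience = patience                # hypothetical "a while", in intervals
        self.rng = rng or random.Random(0)
        self.table = [0] * self.num_lengths     # initialized to zero
        self.current = 0                        # assumption: initial history length
        self.branches = 0                       # branch counter
        self.mispredictions = 0                 # misprediction counter
        self.unchanged = 0                      # intervals without a length change

    def best_so_far(self):
        # Untested lengths still hold zero, so they look best and get tried.
        return min(range(self.num_lengths), key=self.table.__getitem__)

    def record(self, mispredicted):
        """Call once per conditional branch; returns the length to use next."""
        self.branches += 1
        self.mispredictions += int(mispredicted)
        if self.branches == self.step:
            self._end_interval()
        return self.current

    def _end_interval(self):
        self.table[self.current] = self.mispredictions
        best = self.best_so_far()
        if self.table[self.current] > self.table[best]:
            # Worse than the best seen: move one step toward the best length.
            self.current += 1 if best > self.current else -1
            self.unchanged = 0
        else:
            self.unchanged += 1
            if self.unchanged >= self.patience:
                # Stuck for a while: jump to a random length.
                self.current = self.rng.randrange(self.num_lengths)
                self.unchanged = 0
        self.branches = 0
        self.mispredictions = 0
```

In use, a predictor would call `record` after each resolved branch and apply the returned length to its index function. Storing raw counts instead of rates works because every interval covers the same number of branches.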


DHLF Examples

[Figure: DHLF examples with index = 12 bits and step = 16K; one label reads "Optimal".]


Experimental Methodology

• SPECint95

• gshare and dhlf-gshare

• Trace-driven simulation

• Simulated up to 200M conditional branches

• Branch history register & pattern history table immediately updated with the true outcome


DHLF Performance

• Area overhead

– Index length = 10; step size = 16K; overhead = 7%

– Index length = 16; step size = 16K; overhead = 0.02%


Optimization Strategies

• Step size

– Small: learns faster

    • Has to be big enough for meaningful misprediction stats

– Big: learns slower

• Change length incrementally

– Test as many lengths as possible

• Warm-up period

– No MR count for 1 interval after length change


Context Switches

• Branch prediction table trashed periodically

• Lower prediction accuracy immediately after a context switch

• Context switch frequency affects optimal history length


Impact on Misprediction Rate

[Figure: gshare, index = 16 bits.]

Context-switch distance: number of branches executed between context switches


Coping with Context Switches

• Upon context switch

– Discard current misprediction counter

– Save current predictor data

    • misprediction table

    • current history length

    • Approx. 221 bits for 16-bit index, step = 16K, 13-bit misprediction counter

• Returning from a context switch

– Warm-up: no MR counter for 1 interval
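The 221-bit figure is consistent with a misprediction table that has one 13-bit entry per history length from 0 to 16. The short calculation below makes that assumption explicit; the function names are illustrative, and the table layout is inferred from the numbers, not stated on the slide.

```python
def misprediction_table_bits(index_bits, counter_bits):
    """Bits in the misprediction table, assuming one entry per length 0..index_bits."""
    entries = index_bits + 1
    return entries * counter_bits


def prediction_table_bits(index_bits, counter_bits=2):
    """Bits in a table of 2**index_bits saturation counters."""
    return (1 << index_bits) * counter_bits


print(misprediction_table_bits(16, 13))  # prints 221
print(misprediction_table_bits(10, 13))  # prints 143
print(prediction_table_bits(10))         # prints 2048
```

Under the same assumption, a 10-bit index gives 143 bits, about 7% of a 2,048-bit table of 2-bit counters, in line with the overhead quoted for that configuration.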


DHLF with Context Switches

[Figure: misprediction rate of dhlf-gshare with step value = 16K (marked x) compared with gshare at all possible history lengths.]

Branch prediction table flushed every 70K instructions to simulate a context switch.


Contributions

• Dynamically finds near-optimal history lengths

• Performs well for programs with different branch behaviours

• Performs well under context switches

• Can be applied to any two-level branch predictor

• Small area overhead

Backup Slides


DHLF Performance: SPECint95

[Figure: dhlf-gshare with step size = 16K, compared with all possible history lengths (no context switches).]


DHLF with Context Switches

[Figure: dhlf-gshare; step size = 16K; context-switch distance = 70K.]


dhlf-gskew

[Figure: dhlf-gskew with step value = 16K, compared with all history lengths for gskew.]


dhlf-gskew with Context Switch

[Figure: dhlf-gskew; step size = 16K; context-switch distance = 70K.]


DHLF Structure

[Figure: DHLF data structure. A misprediction table with entries 0, 1, …, N, a pointer to the entry for the current history length, a pointer to the minimum misprediction count, a branch counter and a misprediction counter. Flow: start from the initial history length; run the next interval of step dynamic branches; test "current misprediction > min achieved?"; on Yes, adjust the history length; on No, leave it unchanged.]


Questions

• Is fixed context switch distance realistic?

• Does updating the PHT with true branch data immediately affect results?

– Previous studies show little impact due to this
