
Analysis of Branch Predictors

Guang Pan, Ming Lu

Dec 12, 2006

Outline

Motivation
Introduction
Previous works
Our works
Simulation results
Conclusion & Recommendations
Future works

Motivation

Branches are very frequent: approximately 20% of all instructions.

Accurate branch prediction improves the performance of superscalar and superpipelined processors.

Decreasing the misprediction rate saves cycles.

Decreasing the misprediction rate saves energy.

Introduction

A branch predictor needs to know two things: whether the branch is taken or not (direction), and the target address if it is taken (target).

Direct jumps, function calls: direction known (always taken), target easy to compute.

Conditional branches (typically PC-relative): direction difficult to predict, target easy to compute.

Indirect jumps, function returns: direction known (always taken), target difficult to predict.

Introduction (cont’)

The framework and traces are based on a branch predictor competition (Championship Branch Prediction).

We focused on conditional branches and evaluated several branch predictors by measurements on real traces from IBS (Instruction Benchmark Set).

We proposed two closely related modifications of global adaptive prediction mechanisms that achieve satisfactory accuracy.

Previous Works

Static predictors

Always predict Not taken (predictor_nottaken): easy to implement, but only 30-40% accuracy.

Always predict Taken (predictor_taken): 60-70% accuracy.
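A minimal sketch of the two static predictors and how their accuracy is measured. The function names and the toy trace are hypothetical (not the framework's predictor_taken/predictor_nottaken code); each trace record is assumed to be a (branch address, taken) pair.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical static predictors: both ignore the branch address and history.
def predict_not_taken(pc: int) -> bool:
    return False

def predict_taken(pc: int) -> bool:
    return True

def accuracy(predict: Callable[[int], bool],
             trace: Sequence[Tuple[int, bool]]) -> float:
    """Fraction of (pc, taken) records whose outcome matches the prediction."""
    hits = sum(1 for pc, taken in trace if predict(pc) == taken)
    return hits / len(trace)

# Invented trace: a loop branch taken three times, then falling through.
trace = [(0x400, True), (0x400, True), (0x400, True), (0x400, False)]
print(accuracy(predict_taken, trace))      # 0.75
print(accuracy(predict_not_taken, trace))  # 0.25
```

On any trace the two static accuracies sum to 1, which is why the Taken and Not taken ranges on the slide mirror each other.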

Previous Works (cont’)

Local 2-bit predictor (with hysteresis)

[Figure: state diagram of the 2-bit predictor: two Predict Taken states and two Predict Not Taken states, with transitions on the branch outcome.]
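A sketch of the 2-bit saturating counter behind this diagram, assuming the usual encoding 0 = strongly not taken up to 3 = strongly taken. The loop trace is invented to show the hysteresis: a single not-taken outcome does not flip a strongly-taken counter.

```python
class TwoBitCounter:
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, state: int = 0) -> None:
        self.state = state  # 0 = strongly not taken ... 3 = strongly taken

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)

# Invented loop branch: taken 9 times, then not taken once, repeated 3 times.
counter = TwoBitCounter(state=3)
outcomes = ([True] * 9 + [False]) * 3
misses = 0
for taken in outcomes:
    if counter.predict() != taken:
        misses += 1
    counter.update(taken)
print(misses)  # 3: only each loop exit is mispredicted
```

After the exit the counter drops only to weakly taken, so the next entry into the loop is still predicted correctly; a 1-bit scheme would also mispredict that first iteration of the next pass.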

Previous Works (cont’)

Bimodal predictor

Uses a table of two-bit entries, indexed with the least significant bits of the instruction address.

Entries typically do not have tags, so a particular counter may be mapped to (shared by) different branch instructions.

Each counter has one of four states:
Strongly not taken
Weakly not taken
Weakly taken
Strongly taken
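A minimal bimodal sketch under stated assumptions: the table size (16 entries), the initial weakly-not-taken state, and indexing directly with the low PC bits (real designs usually drop the alignment bits first) are illustrative choices. It shows the aliasing caused by untagged entries.

```python
class BimodalPredictor:
    """Untagged table of 2-bit counters indexed by the low-order bits of the PC."""

    def __init__(self, index_bits: int) -> None:
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # assumed start: weakly not taken

    def index(self, pc: int) -> int:
        return pc & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)
        else:
            self.table[i] = max(self.table[i] - 1, 0)

bp = BimodalPredictor(index_bits=4)  # 16 entries, illustrative size only
# 0x13 and 0x23 share their low 4 bits (0x3), so they alias to one counter.
print(bp.index(0x13), bp.index(0x23))  # 3 3
bp.update(0x13, True)
bp.update(0x13, True)
print(bp.predict(0x23))  # True: 0x23 sees the state trained by 0x13
```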

Previous Works (cont’)

Correlating predictor

A branch outcome often correlates with the outcomes of other recently executed branches. Use this in our prediction: keep N bits of history of recent outcomes, and use a different predictor table (indexed by M address bits) for each different history.

Note: an N-bit history means 2^N different predictors for each branch.

[Figure: correlating predictor: 4 bits of branch address and a 2-bit global branch history select among 2-bit per-branch local predictors to produce the prediction.]
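A sketch of this correlating scheme with the sizes from the figure (4 address bits, 2-bit global history). The class name, the history-register layout (newest outcome in bit 0), and the two-branch trace are assumptions for illustration: branch B at 0x2 always repeats the outcome of branch A at 0x1, which alternates.

```python
class CorrelatingPredictor:
    """2**N tables of 2-bit counters, one per N-bit global history pattern,
    each indexed by M low-order address bits."""

    def __init__(self, addr_bits: int, hist_bits: int) -> None:
        self.addr_mask = (1 << addr_bits) - 1
        self.hist_mask = (1 << hist_bits) - 1
        self.history = 0  # N most recent outcomes, newest in bit 0
        # tables[h][a]: counter for history pattern h and address bits a
        self.tables = [[1] * (1 << addr_bits) for _ in range(1 << hist_bits)]

    def _slot(self, pc: int):
        return self.tables[self.history], pc & self.addr_mask

    def predict(self, pc: int) -> bool:
        table, a = self._slot(pc)
        return table[a] >= 2

    def update(self, pc: int, taken: bool) -> None:
        table, a = self._slot(pc)
        table[a] = min(table[a] + 1, 3) if taken else max(table[a] - 1, 0)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask

cp = CorrelatingPredictor(addr_bits=4, hist_bits=2)
misses = 0
for i in range(20):
    a_taken = (i % 2 == 0)       # branch A alternates T, N, T, N, ...
    for pc in (0x1, 0x2):        # branch B (0x2) repeats A's outcome
        if cp.predict(pc) != a_taken:
            misses += 1
        cp.update(pc, a_taken)
print(misses)  # 2: both occur while the history tables warm up
```

For comparison, a lone 2-bit counter starting weakly not taken mispredicts every instance of an alternating branch, so the history is what makes these two branches predictable.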

Previous Works (cont’)

Gshare predictor

Correlating predictors are often wasteful: some histories are rare or even impossible, yet a counter is dedicated to each history.

Solution: hashing. Use a single large predictor table, hash the history and the branch address together, and use the hash to index into the table. The hash is just an XOR, so it is fast.

[Figure: gshare: K bits of the branch instruction address are XORed with N bits of global branch history to index a table of 2-bit predictors with 2^max(N,K) entries, which produces the prediction.]
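A minimal gshare sketch following the figure: K low-order address bits XORed with an N-bit global history index one table of 2^max(N,K) two-bit counters. The class name, the initial counter state, and the small sizes are assumptions for illustration.

```python
class GsharePredictor:
    """One table of 2**max(N, K) 2-bit counters, indexed by
    (K low-order PC bits) XOR (N-bit global history)."""

    def __init__(self, addr_bits: int, hist_bits: int) -> None:
        self.addr_mask = (1 << addr_bits) - 1   # K bits of branch address
        self.hist_mask = (1 << hist_bits) - 1   # N bits of global history
        self.table = [1] * (1 << max(addr_bits, hist_bits))  # weakly not taken
        self.history = 0

    def index(self, pc: int) -> int:
        # The XOR of a K-bit and an N-bit value always fits in max(N, K) bits.
        return (pc & self.addr_mask) ^ self.history

    def predict(self, pc: int) -> bool:
        return self.table[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        self.table[i] = min(self.table[i] + 1, 3) if taken else max(self.table[i] - 1, 0)
        # Shift the outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.hist_mask

g = GsharePredictor(addr_bits=4, hist_bits=4)
g.history = 0b0110
print(g.index(0x3))  # 5, i.e. 0b0011 ^ 0b0110
g.history = 0b0000
print(g.index(0x3))  # 3: same branch, different history, different counter
```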

Why is gshare sometimes bad?

It needs many branch instances to train its different 2-bit predictors.

A simple 2-bit predictor has a prediction after it sees one instance of a branch.

The gshare predictor has a prediction only after it sees an instance of that branch with that particular history.

But for the same number of counters, gshare usually gives better prediction accuracy.
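The training cost can be made concrete by counting contexts. A sketch, assuming a hypothetical helper and an invented trace: a per-address 2-bit predictor needs one instance per branch address, while a history-indexed predictor needs one per (address, history) pair (XOR collisions in a real gshare table are ignored here).

```python
def contexts_to_train(trace, hist_bits):
    """Return (#distinct branch addresses, #distinct (address, history) pairs)."""
    pcs, pairs = set(), set()
    history, mask = 0, (1 << hist_bits) - 1
    for pc, taken in trace:
        pcs.add(pc)
        pairs.add((pc, history))        # context seen at prediction time
        history = ((history << 1) | int(taken)) & mask
    return len(pcs), len(pairs)

# Invented trace: one branch repeating the pattern T, T, N three times.
trace = [(0x40, t) for t in [True, True, False] * 3]
print(contexts_to_train(trace, hist_bits=2))  # (1, 4)
```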

Tag-based PPM (Prediction by Partial Matching) predictor

PPM was originally introduced for text compression and has since been applied to branch prediction.

This is a tag-based, global-history predictor derived from PPM. It features five tables, each indexed with a different history length. The prediction is given by the up-down saturating counter associated with the longest matching history.

The “bimodal” table has 4K entries, with 4 bits per entry. Each of the 4 other tables has 1K entries, with 12 bits per entry. The table with the longest history uses 80 global history bits.
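A simplified PPM-style sketch, not the exact published design. Only the table count, the 4K-entry base table, the 1K-entry tagged tables and the 80-bit maximum history come from the slide; the other history lengths (10, 20, 40), the hash functions, the entry layout (8-bit tag, 3-bit counter), the 2-bit base counters and the rule "on a misprediction, allocate one entry in the next longer-history table" are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    tag: int = -1   # -1 marks an empty entry (real tags are 0..255)
    ctr: int = 4    # assumed 3-bit up-down counter; >= 4 means taken

class PPMLikePredictor:
    """Bimodal base table plus tagged tables with increasing history lengths;
    the longest-history table whose tag matches provides the prediction."""

    def __init__(self, hist_lengths=(10, 20, 40, 80), base_bits=12, tagged_bits=10):
        self.hist_lengths = hist_lengths
        self.base = [1] * (1 << base_bits)          # assumed 2-bit counters
        self.base_mask = (1 << base_bits) - 1
        self.tagged_bits = tagged_bits
        self.tables = [[Entry() for _ in range(1 << tagged_bits)] for _ in hist_lengths]
        self.history = 0                            # newest outcome in bit 0

    def _fold(self, value: int, length: int, bits: int) -> int:
        """XOR-fold the low `length` bits of `value` down to `bits` bits."""
        value &= (1 << length) - 1
        folded = 0
        while value:
            folded ^= value & ((1 << bits) - 1)
            value >>= bits
        return folded

    def _index_tag(self, pc: int, i: int):
        length = self.hist_lengths[i]
        index = (pc ^ self._fold(self.history, length, self.tagged_bits)) & ((1 << self.tagged_bits) - 1)
        tag = (pc ^ (pc >> self.tagged_bits) ^ self._fold(self.history, length, 8)) & 0xFF
        return index, tag

    def _lookup(self, pc: int):
        """Return (table number, entry) for the longest match, or (None, None)."""
        for i in reversed(range(len(self.tables))):
            index, tag = self._index_tag(pc, i)
            if self.tables[i][index].tag == tag:
                return i, self.tables[i][index]
        return None, None

    def predict(self, pc: int) -> bool:
        _, entry = self._lookup(pc)
        if entry is not None:
            return entry.ctr >= 4
        return self.base[pc & self.base_mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        provider, entry = self._lookup(pc)
        predicted = self.predict(pc)
        if entry is not None:
            entry.ctr = min(entry.ctr + 1, 7) if taken else max(entry.ctr - 1, 0)
        else:
            b = pc & self.base_mask
            self.base[b] = min(self.base[b] + 1, 3) if taken else max(self.base[b] - 1, 0)
        # Simplified allocation: on a miss, claim an entry one table "longer".
        start = 0 if provider is None else provider + 1
        if predicted != taken and start < len(self.tables):
            index, tag = self._index_tag(pc, start)
            self.tables[start][index] = Entry(tag=tag, ctr=4 if taken else 3)
        self.history = ((self.history << 1) | int(taken)) & ((1 << max(self.hist_lengths)) - 1)

p = PPMLikePredictor()
misses = late_misses = 0
for step in range(100):
    taken = (step % 2 == 0)          # invented alternating branch at 0x40
    if p.predict(0x40) != taken:
        misses += 1
        late_misses += step >= 50
    p.update(0x40, taken)
print(misses, late_misses)  # 11 0
```

The base counter alone mispredicts every instance of this alternating branch; once the 10-bit history is steady, the tagged entries capture both history contexts and the misses stop.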

Our Work

Hybrid_vote predictor: a combination of the bimodal, gshare, and ppm predictors (hardware achievable).

The three predictors predict a conditional branch simultaneously.

A voting mechanism produces the prediction: compare the predictions of the three predictors and choose the final prediction by majority.

Each predictor is updated by its own updating method.
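A sketch of the voting combination, assuming each component exposes hypothetical predict(pc) and update(pc, taken) methods. FixedPredictor is a stand-in used only to exercise the logic; in the real hybrid the components would be the bimodal, gshare and ppm predictors.

```python
from typing import Sequence

def majority_vote(predictions: Sequence[bool]) -> bool:
    """True if more than half of the component predictions say taken."""
    return 2 * sum(predictions) > len(predictions)

class HybridVote:
    """Combine component predictors by majority vote."""

    def __init__(self, components) -> None:
        self.components = components

    def predict(self, pc: int) -> bool:
        return majority_vote([c.predict(pc) for c in self.components])

    def update(self, pc: int, taken: bool) -> None:
        # Every component trains on the real outcome with its own update rule,
        # whether or not it agreed with the final vote.
        for c in self.components:
            c.update(pc, taken)

class FixedPredictor:
    """Hypothetical stand-in component that always predicts one direction."""

    def __init__(self, direction: bool) -> None:
        self.direction = direction
        self.updates = 0

    def predict(self, pc: int) -> bool:
        return self.direction

    def update(self, pc: int, taken: bool) -> None:
        self.updates += 1

hv = HybridVote([FixedPredictor(True), FixedPredictor(False), FixedPredictor(True)])
print(hv.predict(0x40))  # True: two of three components vote taken
hv.update(0x40, False)
print([c.updates for c in hv.components])  # [1, 1, 1]
```

With three components the vote can never tie, so no tie-breaking rule is needed.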

Our Work (Cont’)

Hybrid_select predictor: a combination of the bimodal, gshare, and ppm predictors (hardware achievable).

The three predictors predict a conditional branch simultaneously.

A bimodal selecting mechanism produces the prediction: compare the predictions of the three predictors and choose the final prediction by:
bimodal counter >= 2: choose the ppm prediction;
bimodal counter < 2: choose the gshare prediction.

Each predictor is updated by its own updating method.
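A sketch of the selection rule. The function name is hypothetical, and it assumes the value being compared is the 2-bit bimodal counter (0..3) for the current branch; the slide does not say whether this is the bimodal predictor's own counter or a separate bimodal-style chooser table.

```python
def hybrid_select(bimodal_counter: int, gshare_pred: bool, ppm_pred: bool) -> bool:
    """Choose the final prediction from a 2-bit bimodal counter value (0..3):
    >= 2 selects the ppm prediction, < 2 selects the gshare prediction."""
    return ppm_pred if bimodal_counter >= 2 else gshare_pred

print(hybrid_select(3, gshare_pred=False, ppm_pred=True))  # True (ppm chosen)
print(hybrid_select(1, gshare_pred=False, ppm_pred=True))  # False (gshare chosen)
```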

Simulation Results

[Figure: Benchmark of MPKI (MPKI = Misses per 1000 Instructions) for the Taken, Not taken, Local 2 bits, Local 3 bits, bimodal 2 bits, bimodal 3 bits, Correlating 8x8K, Correlating 4x16K, Gshare-32K, Gshare-64K, PPM, hybrid vote, and hybrid select predictors on 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 201.compress, 202.jess, 205.raytrace, 209.db, 213.javac, 222.mpegaudio, 227.mtrt, 228.jack, 252.eon, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, and 300.twolf.]
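MPKI and miss rate are related through the branch frequency. A sketch with invented counts (10M instructions, 20% of them branches, 5% of branches mispredicted); the function names are hypothetical.

```python
def mpki(mispredictions: int, instructions: int) -> float:
    """Mispredictions per 1000 instructions."""
    return mispredictions * 1000 / instructions

def miss_rate(mispredictions: int, branches: int) -> float:
    """Fraction of conditional branches that were mispredicted."""
    return mispredictions / branches

instructions = 10_000_000
branches = 2_000_000     # 20% of instructions are branches
misses = 100_000         # 5% of branches mispredicted
print(mpki(misses, instructions))   # 10.0
print(miss_rate(misses, branches))  # 0.05
```

Equivalently, MPKI = miss rate × branches per 1000 instructions = 0.05 × 200 = 10, so MPKI also penalizes branch-heavy benchmarks, while miss rate does not.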

Simulation Results (Cont’)

[Figure: three panels comparing the Not taken, Local 2 bits, Local 3 bits, bimodal 2 bits, bimodal 3 bits, Correlating 8x8K, Correlating 4x16K, Gshare-32K, Gshare-64K, hybrid vote, and hybrid select predictors: miss rate (%), MPKI (misses per 1000 instructions), and CPU time (µs).]

Conclusion & Recommendations

Identifying a good branch predictor plays an important role in improving performance.

New hybrid predictors built by combining existing predictors also perform well on the benchmarks.

Different branch prediction schemes perform well on different branch structures.

The benchmark trace files favor Not taken.

Recommendations:
For embedded systems: gshare
For desktop/server systems: ppm, hybrid_vote

Future Works

Research new methods for selecting more accurate predictions among the predictors.

Research new algorithms for implementing more effective and accurate dynamic branch predictors.


Thanks and Questions?
