Analysis of Branch Predictors
Guang Pan, Ming Lu
Dec 12, 2006
Outline
  Motivation
  Introduction
  Previous works
  Our works
  Simulation results
  Conclusion & Recommendations
  Future works
Motivation
  Branches are very frequent: approximately 20% of all instructions.
  Accurate branch prediction improves the performance of a superscalar or superpipelined processor.
  Decreasing the misprediction rate saves cycles.
  Decreasing the misprediction rate saves energy.
Introduction
  Need to know two things:
    Whether the branch is taken or not (direction)
    The target address if it is taken (target)
  Direct jumps, function calls:
    Direction known (always taken), target easy to compute
  Conditional branches (typically PC-relative):
    Direction difficult to predict, target easy to compute
  Indirect jumps, function returns:
    Direction known (always taken), target difficult to predict
Introduction (cont’)
  The framework and traces are based on a branch predictor competition (Championship Branch Prediction).
  We focused on conditional branches and evaluated several branch predictors by measurements on real traces from IBS (Instruction Benchmark Set).
  We proposed two closely related modifications of global adaptive prediction mechanisms that can achieve satisfactory accuracy.
Previous Works
  Static predictor
    Always predict Not taken (predictor_nottaken)
      Easy to implement
      30-40% accuracy … not so good
    Always predict Taken (predictor_taken)
      60-70% accuracy
Previous Works (cont’)
  Local 2-bit predictor (with hysteresis)
  [Figure: state diagram with four states, two labeled "Predict Taken" and two labeled "Predict Not Taken"; transitions are labeled T (taken) and NT (not taken).]
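The behavior of a 2-bit predictor can be sketched in a few lines of Python. This is a minimal sketch that assumes the common saturating-counter formulation (count up on taken, down on not taken); the exact weak-state transitions in the original diagram are not recoverable, so they may differ. The function names and the loop-branch example are illustrative only.

```python
# Assumed encoding of the four states:
#   0 = strongly not taken, 1 = weakly not taken,
#   2 = weakly taken,       3 = strongly taken.

def predict(state: int) -> bool:
    """Predict taken when the counter is in one of the two taken states."""
    return state >= 2

def update(state: int, taken: bool) -> int:
    """Move one step toward the actual outcome, saturating at 0 and 3."""
    if taken:
        return min(state + 1, 3)
    return max(state - 1, 0)

def run(outcomes, state: int = 0) -> int:
    """Return the number of mispredictions over a sequence of outcomes."""
    misses = 0
    for taken in outcomes:
        if predict(state) != taken:
            misses += 1
        state = update(state, taken)
    return misses

# Illustrative loop branch: taken 9 times, then not taken once, 3 times over.
loop = ([True] * 9 + [False]) * 3
print(run(loop, state=3))  # one miss per loop exit when starting strongly taken
```

The hysteresis shows up at the loop exit: a single not-taken outcome moves the counter from strongly taken to weakly taken, so the next loop entry is still predicted taken.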
Previous Works (cont’)
  Bimodal predictor
    Uses a table of two-bit entries, indexed with the least significant bits of the instruction address.
    Entries typically do not have tags.
    A particular counter may therefore be mapped to different branch instructions.
    Each counter has one of four states:
      Strongly not taken
      Weakly not taken
      Weakly taken
      Strongly taken
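A small Python sketch may help show how the untagged table leads to sharing. The class name, the default table size, and the initial counter value (weakly not taken) are assumptions made for illustration; they are not taken from the slides.

```python
class Bimodal:
    """Table of 2-bit saturating counters indexed by low address bits."""

    def __init__(self, index_bits: int = 12):
        self.mask = (1 << index_bits) - 1
        # Assumed initial state: 1 = weakly not taken.
        self.table = [1] * (1 << index_bits)

    def index(self, pc: int) -> int:
        # Least significant bits of the instruction address; no tag is kept.
        return pc & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        counter = self.table[i]
        self.table[i] = min(counter + 1, 3) if taken else max(counter - 1, 0)

# Two hypothetical branches whose low 4 address bits are equal share a counter.
bp = Bimodal(index_bits=4)
bp.update(0x1004, True)
bp.update(0x1004, True)
print(bp.index(0x1004) == bp.index(0x2004))  # same table entry
print(bp.predict(0x2004))                     # inherits the other branch's training
```

Because there is no tag, the second branch cannot tell that the counter was trained by a different instruction.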
Previous Works (cont’)
  Correlating predictor
    A branch outcome correlates with the outcomes of some recently executed branches.
      Use this in our prediction.
    Keep N bits of history of recent outcomes.
    Use a different M-address-bit predictor for each different history.
    Note: an N-bit history means 2^N different predictors for each branch.
  [Figure: the branch address (4 bits) indexes tables of 2-bit per-branch local predictors; a 2-bit global branch history selects which table supplies the prediction.]
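The scheme can be sketched as 2^N small tables, one per history pattern. The following Python sketch uses the figure's sizes (2 history bits, 4 address bits) as defaults; the class name, the initial counter state, and the alternating-branch example are assumptions for illustration.

```python
class Correlating:
    """One table of 2-bit counters per global-history pattern."""

    def __init__(self, history_bits: int = 2, addr_bits: int = 4):
        self.history_mask = (1 << history_bits) - 1
        self.addr_mask = (1 << addr_bits) - 1
        self.history = 0  # most recent outcome in bit 0
        # 2^N tables, each with 2^M counters (assumed to start weakly not taken).
        self.tables = [[1] * (1 << addr_bits) for _ in range(1 << history_bits)]

    def predict(self, pc: int) -> bool:
        return self.tables[self.history][pc & self.addr_mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        row = self.tables[self.history]
        i = pc & self.addr_mask
        row[i] = min(row[i] + 1, 3) if taken else max(row[i] - 1, 0)
        # Shift the new outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask

def count_misses(predictor, pc, outcomes) -> int:
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses

# A branch that strictly alternates defeats a lone 2-bit counter,
# but each history pattern here always precedes the same outcome.
alternating = [True, False] * 20
bp = Correlating()
warmup = count_misses(bp, 0x40, alternating[:20])
steady = count_misses(bp, 0x40, alternating[20:])
print(warmup, steady)
```

Once the counters for the two recurring history patterns are trained, the alternating branch is predicted without further misses.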
Previous Works (cont’)
  Gshare predictor
    Correlating predictors are often wasteful.
      Some histories are rare or even impossible,
      yet we dedicate a counter to each history.
    Solution: hashing
      Use a single large predictor table.
      Hash the history and the branch address together.
      Use the hash to index into the table.
      The hash is just an XOR, so it’s fast.
  [Figure: K bits of the branch instruction address are XORed with N bits of global branch history to form the index into a table of 2-bit predictors with 2^max(N,K) entries, which supplies the prediction.]
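The indexing step can be sketched as follows. This is a minimal sketch in Python, assuming 8 address bits and 8 history bits by default and counters that start weakly not taken; the class name and those sizes are illustrative, not the configurations measured in the slides.

```python
class Gshare:
    """Single table of 2-bit counters indexed by (address XOR global history)."""

    def __init__(self, history_bits: int = 8, addr_bits: int = 8):
        self.history_mask = (1 << history_bits) - 1
        self.addr_mask = (1 << addr_bits) - 1
        self.history = 0  # most recent outcome in bit 0
        # 2^max(N, K) entries, as in the figure.
        self.table = [1] * (1 << max(history_bits, addr_bits))

    def index(self, pc: int) -> int:
        # K address bits XOR N history bits; the result fits in max(N, K) bits.
        return (pc & self.addr_mask) ^ self.history

    def predict(self, pc: int) -> bool:
        return self.table[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self.index(pc)
        counter = self.table[i]
        self.table[i] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask

# Worked index example with hypothetical values:
g = Gshare()
g.history = 0b00001111
print(hex(g.index(0b10101100)))  # 0xAC XOR 0x0F
```

The same branch address maps to different counters under different histories, which is what lets one table stand in for the 2^N separate tables of the correlating scheme.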
Previous Works (cont’)
  Why is gshare bad?
    It needs a lot of branch instances to train the different 2-bit predictors.
    A simple 2-bit predictor
      has a prediction after it sees one instance of a branch.
    The gshare predictor
      has a prediction only after it sees an instance of that branch with that particular history.
    But for the same number of counters, gshare usually gives better prediction accuracy.
Previous Works (cont’)
  Tag-based PPM (Prediction by Partial Matching) predictor
    PPM was originally introduced for text compression and was later used for branch prediction.
    A tag-based, global-history predictor derived from PPM.
    Features five tables, each indexed with a different history length.
    The prediction is given by the up-down saturating counter associated with the longest matching history.
Previous Works (cont’)
The PPM predictor features 5 tables. The “bimodal” table on the left has 4K entries, with 4 bits per entry. Each of the 4 other tables has 1K entries, with 12 bits per entry. The table on the right is the one using the most global history bits (80 bits).
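The table sizes above imply a total storage budget, and the longest-match rule can be sketched in a few lines. In the Python sketch below, only the entry counts, entry widths, and the 80-bit longest history come from the slides. The shorter history lengths (10, 20, 40), the 8-bit tag, the 3-bit counters, the `fold` hash, and the `install` helper are assumptions for illustration, and the update and allocation policy is deliberately omitted.

```python
BIMODAL_ENTRIES, BIMODAL_BITS = 4 * 1024, 4                # from the slides
TAGGED_TABLES, TAGGED_ENTRIES, TAGGED_BITS = 4, 1024, 12   # from the slides
TOTAL_BITS = (BIMODAL_ENTRIES * BIMODAL_BITS
              + TAGGED_TABLES * TAGGED_ENTRIES * TAGGED_BITS)

HISTORY_LENGTHS = [10, 20, 40, 80]  # only 80 is from the slides; the rest are assumed
INDEX_BITS = 10                     # 1K entries per tagged table
TAG_BITS = 8                        # assumed tag width
TAKEN_THRESHOLD = 4                 # assumed 3-bit counters: taken when >= 4

def fold(value: int, bits: int) -> int:
    """Hypothetical hash: XOR-fold an arbitrarily wide integer down to `bits` bits."""
    mask = (1 << bits) - 1
    folded = 0
    while value:
        folded ^= value & mask
        value >>= bits
    return folded

class PPMLike:
    def __init__(self):
        self.history = 0  # global history, newest outcome in bit 0
        self.bimodal = [TAKEN_THRESHOLD - 1] * BIMODAL_ENTRIES
        self.tagged = [[None] * TAGGED_ENTRIES for _ in HISTORY_LENGTHS]

    def _slot(self, pc: int, table: int):
        h = self.history & ((1 << HISTORY_LENGTHS[table]) - 1)
        index = fold(h, INDEX_BITS) ^ (pc & (TAGGED_ENTRIES - 1))
        tag = fold(h, TAG_BITS) ^ ((pc >> INDEX_BITS) & ((1 << TAG_BITS) - 1))
        return index, tag

    def install(self, pc: int, table: int, counter: int) -> None:
        """Illustration helper: place an entry for the current history."""
        index, tag = self._slot(pc, table)
        self.tagged[table][index] = (tag, counter)

    def predict(self, pc: int) -> bool:
        # Search from the longest history to the shortest; first tag hit wins.
        for table in reversed(range(len(HISTORY_LENGTHS))):
            index, tag = self._slot(pc, table)
            entry = self.tagged[table][index]
            if entry is not None and entry[0] == tag:
                return entry[1] >= TAKEN_THRESHOLD
        return self.bimodal[pc & (BIMODAL_ENTRIES - 1)] >= TAKEN_THRESHOLD

print(TOTAL_BITS)  # 16384 + 49152 bits
```

With these sizes the five tables total 65536 bits, that is 64 Kbit.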
Our Work
  Hybrid_vote predictor
    Combination of the bimodal, gshare, and ppm predictors (achievable in hardware).
    The three predictors predict a conditional branch simultaneously.
    A voting mechanism produces the prediction:
      Compare the predictions of the three predictors.
      Choose the final prediction from the majority.
    Update each of the predictors by its own updating method.
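The voting step can be sketched as a thin wrapper around any three component predictors. This Python sketch assumes each component exposes `predict(pc)` and `update(pc, taken)`; that interface, the class names, and the `Constant` stand-in used for the demonstration are hypothetical and not taken from the competition framework.

```python
class HybridVote:
    """Majority vote over three component predictors."""

    def __init__(self, predictors):
        if len(predictors) != 3:
            raise ValueError("majority voting here expects exactly three predictors")
        self.predictors = list(predictors)

    def predict(self, pc: int) -> bool:
        # With three boolean votes, the majority is taken when at least two agree.
        votes = sum(1 for p in self.predictors if p.predict(pc))
        return votes >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Each component is trained by its own update rule.
        for p in self.predictors:
            p.update(pc, taken)

class Constant:
    """Hypothetical stand-in for bimodal, gshare, or ppm in this demo."""

    def __init__(self, value: bool):
        self.value = value
        self.updates = 0

    def predict(self, pc: int) -> bool:
        return self.value

    def update(self, pc: int, taken: bool) -> None:
        self.updates += 1

hybrid = HybridVote([Constant(True), Constant(False), Constant(True)])
print(hybrid.predict(0x400))  # two of three vote taken
```

An odd number of voters means there is never a tie to break.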
Our Work (Cont’)
  Hybrid_select predictor
    Combination of the bimodal, gshare, and ppm predictors (achievable in hardware).
    The three predictors predict a conditional branch simultaneously.
    A bimodal selecting mechanism produces the prediction:
      Compare the predictions of the three predictors.
      Choose the final prediction by:
        Bimodal >= 2: choose the ppm prediction
        Bimodal < 2: choose the gshare prediction
    Update each of the predictors by its own updating method.
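The selection rule can be written as a single function. This Python sketch assumes that "Bimodal" in the rule refers to the value of the bimodal 2-bit counter for the branch (0 to 3); the slides do not spell this out, and the function and parameter names are illustrative.

```python
def hybrid_select(bimodal_counter: int, ppm_taken: bool, gshare_taken: bool) -> bool:
    """Use the bimodal counter as the selector between ppm and gshare."""
    if bimodal_counter >= 2:
        # Bimodal leans taken: trust the ppm prediction.
        return ppm_taken
    # Bimodal leans not taken: trust the gshare prediction.
    return gshare_taken

# The two component predictions disagree; the bimodal counter decides.
for counter in range(4):
    print(counter, hybrid_select(counter, ppm_taken=True, gshare_taken=False))
```

Under this reading the bimodal table never supplies the final direction itself; it only chooses which of the other two predictors is believed.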
Simulation Results
  [Figure: two charts of MPKI (axis 0 to 180) per benchmark, one series per predictor.]
  Benchmarks: 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 201.compress, 202.jess, 205.raytrace, 209.db, 213.javac, 222.mpegaudio, 227.mtrt, 228.jack, 252.eon, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, 300.twolf
  Predictors: Taken, Not taken, Local 2 bits, Local 3 bits, bimodal 2 bits, bimodal 3 bits, Correlating 8x8K, Correlating 4x16K, Gshare-32K, Gshare-64K, PPM, hybrid vote, hybrid select
  Benchmark MPKI (MPKI = misses per 1000 instructions)
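For reference, the MPKI metric and its relation to the miss rate can be computed as below. The counts in the example are made-up round numbers chosen to match the roughly 20% branch frequency mentioned in the Motivation; they are not results from the simulations.

```python
def mpki(mispredictions: int, instructions: int) -> float:
    """Mispredictions per 1000 executed instructions."""
    return 1000.0 * mispredictions / instructions

def miss_rate(mispredictions: int, branches: int) -> float:
    """Fraction of predicted branches that were mispredicted."""
    return mispredictions / branches

# Hypothetical trace: 1,000,000 instructions, 20% of them branches,
# and 10,000 mispredicted branches.
instructions = 1_000_000
branches = 200_000
mispredictions = 10_000

print(mpki(mispredictions, instructions))   # misses per 1000 instructions
print(miss_rate(mispredictions, branches))  # misses per branch
```

MPKI depends on how often branches occur as well as on predictor accuracy, so two traces with the same miss rate can have different MPKI values.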
Simulation Results (Cont’)
  [Figure: miss rate (axis 0% to 70%) for each predictor: Taken, Not taken, Local 2 bits, Local 3 bits, bimodal 2 bits, bimodal 3 bits, Correlating 8x8K, Correlating 4x16K, Gshare-32K, Gshare-64K, PPM, hybrid vote, hybrid select.]
Simulation Results (Cont’)
  [Figure: MPKI (axis 0 to 100) for each predictor: Taken, Not taken, Local 2 bits, Local 3 bits, bimodal 2 bits, bimodal 3 bits, Correlating 8x8K, Correlating 4x16K, Gshare-32K, Gshare-64K, PPM, hybrid vote, hybrid select.]
  MPKI = misses per 1000 instructions
Simulation Results (Cont’)
  [Figure: CPU time in µs (axis 0 to 4) for each predictor: Taken, Not taken, Local 2 bits, Local 3 bits, bimodal 2 bits, bimodal 3 bits, Correlating 8x8K, Correlating 4x16K, Gshare-32K, Gshare-64K, PPM, hybrid vote, hybrid select.]
Conclusion & Recommendations
  Identifying a good branch predictor plays an important role in improving performance.
  By combining current predictors, the new hybrid predictors also perform well in the benchmark.
  Different branch prediction schemes perform well on different branch structures.
  The benchmark trace files favor Not taken.
  Recommendations:
    For embedded systems -> gshare
    For desktop/server systems -> ppm, hybrid_vote
Future Works
Research new methods for selecting more accurate predictions among the predictors.
Research new algorithms for implementing more effective and accurate dynamic branch predictors.
Thanks and Questions?