predicting conditional branches with fusion-based hybrid predictors
![Page 1: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/1.jpg)
Predicting Conditional Branches With Fusion-Based Hybrid Predictors
Gabriel H. Loh, Yale University, Dept. of Computer Science
Dana S. Henry, Yale University, Depts. of Elec. Eng. & Comp. Sci.
This research was funded by NSF Grant MIP-9702281
![Page 2: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/2.jpg)
The Branch Prediction Problem
• 1 out of 5 instructions is a branch
• May require many cycles to resolve
  – P4 has 20 cycle branch resolution pipeline
  – Future pipeline depths likely to increase [Sprangle02]
• Predict branches to keep pipeline full
[Diagram: pipeline stages from PC Compute to Branch resolution]
![Page 3: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/3.jpg)
Bigger Predictors = More Accurate
• Larger predictors tend to yield more accurate predictions
• Faster cycle times force smaller branch predictors
• Overriding predictor couples small, fast predictor with a large, multi-cycle predictor [Jiménez2000]
  – performs close to ideal large-fast predictor
(but bigger predictors = slower)
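The overriding arrangement in the last bullet can be sketched as below. This is a minimal illustration, not the mechanism from [Jiménez2000]: the function name, the 4-cycle latency default, and the bubble-cost model are all assumptions made for the example.

```python
def overriding_predict(fast_pred: bool, slow_pred: bool, slow_latency: int = 4):
    """Return (final_prediction, bubble_cycles) for one branch.

    fast_pred is available after one cycle and is used to keep fetching.
    slow_pred arrives slow_latency cycles later. If the two disagree, the
    slow prediction overrides and the instructions fetched in the meantime
    are squashed, modeled here (an assumption) as slow_latency - 1 bubbles.
    """
    if fast_pred == slow_pred:
        return slow_pred, 0
    return slow_pred, slow_latency - 1
```

When the two predictors agree, the slow predictor costs nothing; the penalty is paid only on a disagreement, which is why such a pair can approach an ideal large, fast predictor.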
![Page 4: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/4.jpg)
Hybrid Predictors
• Wide variety of branch prediction algorithms available
• Hybrid combines more than one “stand-alone” or component predictor [McFarling93]:
[Diagram: P1, P2 and a Meta-Predictor; the Meta-Predictor chooses which component supplies the Final Prediction]
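A two-component selection hybrid of this kind can be sketched as follows. The class name, the table size, and the use of 2-bit saturating counters indexed by the low bits of the branch address are assumptions for illustration; the slide only states that a meta-predictor chooses between P1 and P2.

```python
class SelectionHybrid:
    """Two-component hybrid with a table of 2-bit meta-predictor counters."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        # Counter >= 2 means "trust P2"; start out weakly preferring P1.
        self.meta = [1] * (1 << index_bits)

    def predict(self, pc: int, p1: bool, p2: bool) -> bool:
        """Forward the prediction of whichever component the counter favors."""
        return p2 if self.meta[pc & self.mask] >= 2 else p1

    def update(self, pc: int, p1: bool, p2: bool, taken: bool) -> None:
        """Move the counter toward the component that was right."""
        if p1 == p2:
            return  # both right or both wrong: nothing to learn
        i = pc & self.mask
        if p2 == taken:
            self.meta[i] = min(3, self.meta[i] + 1)
        else:
            self.meta[i] = max(0, self.meta[i] - 1)
```

Note that the final prediction is always a copy of one component's output, which is the limitation that prediction fusion targets.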
![Page 5: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/5.jpg)
Multi-Hybrids
[Diagram, “Multi-Hybrid” [Evers96]: P1, P2, …, Pn → Pr. Encoder → Final Prediction]
[Diagram, “Quad-Hybrid” [Evers00]: P1, P2 with M1; P3, P4 with M2; M3 → Final Prediction]
![Page 6: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/6.jpg)
Our Idea: Prediction Fusion
[Diagram: Prediction Selection over P1, P2, P3, …, Pn]
[Diagram: Prediction Fusion over P1, P2, P3, …, Pn]
![Page 7: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/7.jpg)
Early Attempt from ML
• Weighted Majority algorithm [LW94]
  – Better predictors get assigned larger weights
  – Make final prediction with larger sum
• Predictor with largest weight not always correct
[Diagram: weight sums 0.487 and 0.513 over the groups P2, P6, P7 and P1, P3, P4, P5, P8]
P2, P6 and P7 say “not-taken”
P1, P3, P4, P5 and P8 say “taken”
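The weighted vote on this slide can be reproduced with a short sketch. The per-predictor weights below are made up so that the two sides sum to the slide's 0.487 and 0.513; assigning 0.487 to the “not-taken” side is an assumption based on the order of the labels. The multiplicative penalty `beta` follows the general Weighted Majority scheme, and its value here is illustrative.

```python
def weighted_majority(predictions, weights):
    """Return (prediction, taken_sum, not_taken_sum) for one branch."""
    taken = sum(w for p, w in zip(predictions, weights) if p)
    not_taken = sum(w for p, w in zip(predictions, weights) if not p)
    return taken >= not_taken, taken, not_taken


def update_weights(predictions, weights, outcome, beta=0.5):
    """Scale down the weight of every predictor that was wrong."""
    return [w if p == outcome else w * beta
            for p, w in zip(predictions, weights)]


# Votes of P1..P8 as on the slide: P2, P6 and P7 say not-taken.
votes = [True, False, True, True, True, False, False, True]
# Hypothetical weights: P2 holds the largest single weight yet is outvoted.
weights = [0.110, 0.200, 0.100, 0.100, 0.100, 0.150, 0.137, 0.103]

prediction, taken_sum, not_taken_sum = weighted_majority(votes, weights)
```

With these numbers the final prediction is “taken” even though the most heavily weighted predictor, P2, voted “not-taken”.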
![Page 8: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/8.jpg)
Outline
• COLT Predictor
• Choosing parameters and components
• Performance
• Prediction distributions, component choice
![Page 9: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/9.jpg)
COLT Organization
[Diagram: Branch Address and Branch History select a Mapping Table from the VMT; the predictions of P1, P2, P3, …, Pn (e.g. 1 0 1 0 …) index the selected Mapping Table, which produces the Final Prediction]
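The organization above can be sketched in a few lines. This is a minimal model under stated assumptions, not the authors' implementation: the VMT index is formed by concatenating address and history bits, counters start just below the taken threshold, and P1 is treated as the most significant bit of the mapping-table index. All names are hypothetical.

```python
class COLT:
    """Sketch of a fusion predictor: a vector of mapping tables (VMT)."""

    def __init__(self, n_components, addr_bits, hist_bits, counter_bits=4):
        self.addr_mask = (1 << addr_bits) - 1
        self.hist_bits = hist_bits
        self.hist_mask = (1 << hist_bits) - 1
        self.top = (1 << counter_bits) - 1
        self.threshold = 1 << (counter_bits - 1)
        n_tables = 1 << (addr_bits + hist_bits)
        # Each mapping table has one counter per pattern of component outputs.
        self.vmt = [[self.threshold - 1] * (1 << n_components)
                    for _ in range(n_tables)]

    def _table(self, pc, history):
        # Assumption: address and history bits are concatenated.
        idx = ((pc & self.addr_mask) << self.hist_bits) | (history & self.hist_mask)
        return self.vmt[idx]

    @staticmethod
    def _entry(preds):
        # Pack the component predictions into an index, P1 first (MSB).
        entry = 0
        for p in preds:
            entry = (entry << 1) | int(p)
        return entry

    def predict(self, pc, history, preds):
        return self._table(pc, history)[self._entry(preds)] >= self.threshold

    def update(self, pc, history, preds, taken):
        table, e = self._table(pc, history), self._entry(preds)
        table[e] = min(self.top, table[e] + 1) if taken else max(0, table[e] - 1)
```

Because the final prediction comes from a counter indexed by all component outputs together, it need not equal any single component's prediction.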
![Page 10: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/10.jpg)
Pathological Example
[Diagram: P1, P2, P3 predict 0, 0, 0]
Actual outcome = 1 (taken)
![Page 11: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/11.jpg)
Example (cont’d)
Selection:
[Diagram: P1, P2, P3 predict 0, 0, 0]
Outcome is always wrong
COLT:
[Diagram: P1, P2, P3 predict 0, 0, 0; the mapping table in the VMT outputs 1]
Can recognize and remember this pattern
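The contrast in this example can be checked with a small sketch. It models a single mapping table of saturating counters; the 4-bit width, the initial counter value, and the function names are assumptions for illustration.

```python
def selection_outputs(preds):
    """Every output a selector can produce: it must pick one component."""
    return set(preds)


def train_fusion(preds, outcome, rounds=4, counter_bits=4):
    """Repeatedly look up and update one mapping-table entry.

    Returns the prediction made on each round for the same component
    pattern and the same actual outcome.
    """
    threshold = 1 << (counter_bits - 1)
    top = (1 << counter_bits) - 1
    table = [threshold - 1] * (1 << len(preds))  # start weakly not-taken
    entry = int("".join(str(p) for p in preds), 2)
    guesses = []
    for _ in range(rounds):
        guesses.append(int(table[entry] >= threshold))
        if outcome:
            table[entry] = min(top, table[entry] + 1)
        else:
            table[entry] = max(0, table[entry] - 1)
    return guesses
```

With all components predicting 0 and an actual outcome of 1, a selector can only ever output 0, while the mapping-table entry for pattern 000 flips to 1 after a single update in this sketch.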
![Page 12: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/12.jpg)
COLT Lookup Delay
[Timing diagram along a time axis; labels: P1, P2, …, Pn, component outputs (1 0 0 1 1 …), MT Select, Prediction, critical delay]
![Page 13: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/13.jpg)
Design Choices
• # of branch address bits
• # of branch history bits
  } Determines number of mapping tables
• # of components
  } Determines size of individual MT’s
• Choice of components
  – gshare, PAs, gskewed, …
  – History length, PHT size, …
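How these parameters translate into storage can be sketched as below. The formula assumes the VMT index concatenates the address and history bits, so the number of mapping tables is 2^(address bits + history bits); the slides do not state the exact indexing function. The function name and the sample numbers are illustrative and are not taken from the results table.

```python
def colt_storage(addr_bits, hist_bits, n_components, counter_bits):
    """Return (mapping_tables, entries_per_table, total_bits)."""
    mapping_tables = 2 ** (addr_bits + hist_bits)  # set by address + history bits
    entries_per_table = 2 ** n_components          # one counter per output pattern
    total_bits = mapping_tables * entries_per_table * counter_bits
    return mapping_tables, entries_per_table, total_bits


tables, entries, bits = colt_storage(addr_bits=2, hist_bits=7,
                                     n_components=4, counter_bits=4)
kilobytes = bits / 8 / 1024
```

Each added component doubles the size of every mapping table, which is one reason the number of components has to be chosen carefully.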
![Page 14: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/14.jpg)
Predictor Components
• Global History
  – gshare [McFarling93]
  – Bi-Mode [Lee97]
  – Enhanced gskewed [Michaud97]
  – YAGS [Eden98]
• Local History
  – PAs [Yeh94]
  – pskewed [Evers96]
• Other
  – 2bC (bimodal) [Smith81]
  – Loop [Chang95]
  – alloyed Perceptron [Jiménez02]
} history lengths optimized on test data sets
Total of 59 configurations
Sizes vary up to 64KB
![Page 15: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/15.jpg)
Huge Search Space
• 2^59 ways to choose components, … ways to choose COLT parameters
• We use a genetic search
[Diagram, gene format: … | VMT Size | history length]
bit-k = 0 means don’t include Pk
bit-k = 1 means do include Pk
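The gene format can be sketched as a 59-bit inclusion mask plus the COLT parameters. The tuple layout, the function names, and the mutation and crossover operators below are hypothetical; the slides give only the gene fields, not the genetic operators or field widths used.

```python
import random

N_CANDIDATES = 59  # the 59 component configurations from the slides


def decode(gene):
    """Turn a gene (mask, vmt_size, history_length) into a configuration."""
    mask, vmt_size, history_length = gene
    return {
        # bit-k = 1 means include Pk (components numbered from 1)
        "components": [k for k, bit in enumerate(mask, start=1) if bit],
        "vmt_size": vmt_size,
        "history_length": history_length,
    }


def mutate(gene, rng, rate=0.02):
    """Flip each inclusion bit independently with probability `rate`."""
    mask, vmt_size, history_length = gene
    flipped = [bit ^ 1 if rng.random() < rate else bit for bit in mask]
    return (flipped, vmt_size, history_length)


def crossover(a, b, rng):
    """Single-point crossover on the mask; parameters split between parents."""
    cut = rng.randrange(1, N_CANDIDATES)
    return (a[0][:cut] + b[0][cut:], a[1], b[2])
```

A search loop would decode each gene, simulate the resulting COLT configuration on the tuning traces, and keep the genes with the lowest misprediction rate.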
![Page 16: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/16.jpg)
Methodology
• SPEC2000 integer benchmarks
  – For tuning/optimization: 10M branches from test
  – For evaluation: 500M branches from train
    • Skipped first 100M branches
  – Compiled with cc -arch ev6 -O4 -fast -non_shared
• SimpleScalar simulator
  – sim-safe for trace collection
  – MASE for ILP simulations
![Page 17: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/17.jpg)
Genetic Search COLT Results

| Name / Size (KB) | Components | VMT | Counter width | History length |
|---|---|---|---|---|
| 16 | alpct(34/10), gskewed(12), gshare(8) | 2048 | 4 | 8 |
| 32 | alpct(34/10), gshare(15), gshare(9), PAs(7) | 8192 | 4 | 7 |
| 64 | alpct(40/14), gshare(16), YAGS(11), pskewed(6) | 16384 | 4 | 10 |
| 128 | alpct(40/14), alpct(38/14), gshare(16), gskewed(13), YAGS(12), PAs(8) | 16384 | 4 | 7 |
| 256 | alpct(50/18), alpct(34/10), gshare(18), Bi-Mode(16), gskewed(15), PAs(8) | 32768 | 4 | 4 |
![Page 18: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/18.jpg)
Overall Predictor Performance
![Page 19: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/19.jpg)
Per-Benchmark Performance
![Page 20: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/20.jpg)
ILP Performance
• Simulated CPU:
  – 6-issue
  – 20 cycle pipeline
  – Same functional units, latencies, caches as Intel P4/NetBurst microarchitecture
[Diagram, predictor configurations; labels: 1-cycle 2bC, + 4-cycle OR alpct, + 4-cycle OR COLT, Ideal 1-cycle COLT]
![Page 21: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/21.jpg)
ILP Impact
![Page 22: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/22.jpg)
COLT Parameter Sensitivity
• Mapping table counter widths
• Number of mapping tables
• Number of history bits for VMT index
![Page 23: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/23.jpg)
Counter Width
![Page 24: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/24.jpg)
VMT Size
![Page 25: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/25.jpg)
History Length
![Page 26: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/26.jpg)
Explaining Choice of Components
• Parameter sensitivity results show GA performed well for the COLT parameters
• Why did it choose the component predictors that it did?
![Page 27: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/27.jpg)
Classifying COLT Predictions
• We examined the (32KB) COLT config.
• For each mapping table lookup, we examine the neighboring entries:
[Diagram: P1, P2, P3, P4 predict 1 0 0 1, which indexes mapping table entry 1001]
entry 0001 = NT
entry 1001 = T
entry 1101 = T
![Page 28: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/28.jpg)
Classifying Predictions (cont’d)
32KB COLT: gshare(9), gshare(14), PAs(7), alpct(34/10)
Classes:
easy: all neighboring entries agree
short: only gshare(9) distinguishes
long: only gshare(14) distinguishes
local: only PAs(7) distinguishes
perceptron: only alpct(34/10) distinguishes
multi-length: mix of gshare(9), (14) or alpct
mixed: both global and local components
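The classification can be sketched as follows. A component “distinguishes” a lookup when flipping only that component's bit lands on a neighboring entry with a different prediction. The ordering of components as P1..P4 (P1 as the most significant bit), the treatment of alpct as part of the multi-length group, and all names are assumptions drawn from the slide labels.

```python
COMPONENTS = ["gshare(9)", "gshare(14)", "PAs(7)", "alpct(34/10)"]  # P1..P4, assumed order
SINGLE = {"gshare(9)": "short", "gshare(14)": "long",
          "PAs(7)": "local", "alpct(34/10)": "perceptron"}


def neighbors(entry, n=4):
    """Entries differing from `entry` in exactly one component bit (P1 = MSB)."""
    return [entry ^ (1 << (n - 1 - k)) for k in range(n)]


def classify(table, entry):
    """Classify one lookup; `table[e]` is True when entry e predicts taken."""
    base = table[entry]
    distinguishing = {COMPONENTS[k]
                      for k, nb in enumerate(neighbors(entry))
                      if table[nb] != base}
    if not distinguishing:
        return "easy"
    if len(distinguishing) == 1:
        return SINGLE[next(iter(distinguishing))]
    if "PAs(7)" in distinguishing:
        return "mixed"         # local component plus at least one other
    return "multi-length"      # several of gshare(9), gshare(14), alpct


# Slide example: entry 1001 = T, entry 0001 = NT, entry 1101 = T.
table = [True] * 16
table[0b0001] = False
label = classify(table, 0b1001)
```

In the slide's example only the neighbor reached by flipping P1 disagrees, so under the assumed component order the lookup falls in the “short” class.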
![Page 29: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/29.jpg)
Prediction Classifications
![Page 30: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/30.jpg)
Related Work/Issues
• Alloyed history [Skadron00]
• Variable path history length [Stark98]
• Dynamic history length fitting [Juan98]
• Interference reduction [lots…]
COLT handles all of these cases*
* Doesn’t support partial update policies
![Page 31: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/31.jpg)
Open Research
• Better individual components
• Augment with SBI [Manne99], agree [Sprangle97]
• Better fusion algorithms
• Hybrid fusion/selection algorithms
• Other domains (branch confidence prediction, value prediction, memory dependence prediction, instruction criticality prediction, …)
![Page 32: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/32.jpg)
Summary
• Fusion is more powerful than selection
  – Combines multiple sources of information
• Branch behavior is very varied
  – Need long, short, global and local histories, multiple simultaneous lengths and types of history
• COLT is one possible fusion-based predictor
  – Combines multiple types of information
  – Current “best” purely dynamic predictor*
![Page 33: Predicting Conditional Branches With Fusion-Based Hybrid Predictors](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815de4550346895dcc0bc1/html5/thumbnails/33.jpg)
Questions?