![Page 1: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/1.jpg)
A Low Energy Set-Associative I-Cache with Extended BTB
K. Inoue, V. Moshnyaga, and K. Murakami
![Page 2: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/2.jpg)
Introduction
Cache sizes keep increasing, and on-chip caches consume a large share of total CPU power:
• DEC 21164 CPU*: 25%
• StrongARM SA-110 CPU*: 43%
• Bipolar ECL CPU**: 50%
* Kamble et al., “Analytical Energy Dissipation Models for Low Power Caches,” ISLPED’97
** Jouppi et al., “A 300-MHz 115-W 32-b Bipolar ECL Microprocessor,” IEEE Journal of Solid-State Circuits’93
![Page 3: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/3.jpg)
Problem of Conventional Caches
Conventional Cache: fast access, but high energy consumption. The tags and lines of all ways (way0 to way3) are read in parallel in Cycle 1.
Phased Cache: low energy consumption, but slow access. The tags are checked in Cycle 1, and only the hit way's line is read in Cycle 2.
Parallel search strategy produces unnecessary way activation!
![Page 4: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/4.jpg)
Our Proposal: History-Based Tag-Comparison I-Cache
• Attempts to reduce cache-access energy without performance degradation
• Reuses tag-check results to eliminate unnecessary way activation
• Can achieve a 62% energy reduction with only 0.2% performance degradation
![Page 5: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/5.jpg)
Conventional Tag-Check Scheme
• Programs include a lot of loops! A number of instructions are executed repetitively.
• The cache is updated only when a replacement occurs! Loaded data stays at the same location at least until the next cache miss takes place.
[Figure: on the time axis, Inst. A is referenced N times within one cache-miss interval (Miss! ... Miss!). Every reference performs a tag check, yet the cache is stable, so every check yields completely the same tag-check result.]
![Page 6: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/6.jpg)
History-Based Tag-Comparison (HBTC) Scheme
Attempts to reuse tag-check results produced earlier in the same cache-miss interval! A stored result is reused when:
• the target instruction has been referenced before, and
• no cache miss has occurred since the previous reference.
[Figure: within a cache-miss interval, the first Ref. A performs a tag check and the later references to A reuse its result.]
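The two reuse conditions can be checked in software with a miss counter instead of explicit invalidation. This is a hypothetical illustration (the class `ReuseTracker` and its methods are not from the slides, and it is not the hardware design): each stored tag-check result is stamped with the number of misses seen so far, and it is reusable only if no miss has happened since it was stamped.

```python
from __future__ import annotations


class ReuseTracker:
    """Hypothetical software model of the HBTC reuse conditions (not the hardware)."""

    def __init__(self) -> None:
        self.miss_epoch = 0  # incremented on every cache miss
        self.last_ref: dict[int, tuple[int, int]] = {}  # inst addr -> (epoch, hit way)

    def on_cache_miss(self) -> None:
        # Bumping the epoch makes every stored result stale at once.
        self.miss_epoch += 1

    def record(self, addr: int, hit_way: int) -> None:
        # Save the tag-check result together with the current epoch.
        self.last_ref[addr] = (self.miss_epoch, hit_way)

    def reusable_way(self, addr: int) -> int | None:
        entry = self.last_ref.get(addr)
        if entry is None:  # condition 1 fails: never referenced before
            return None
        epoch, way = entry
        if epoch != self.miss_epoch:  # condition 2 fails: a miss occurred since
            return None
        return way
```

Comparing epochs gives the same answer as clearing every stored result on a miss, which is what the slides describe for the hardware; the counter simply avoids touching every entry.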
![Page 7: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/7.jpg)
Concept of the HBTC Cache
1. Execute an instruction A at time T
• Perform the tag check ([way2] is the hit way)
• Save the tag-check result into the extended BTB
2. If a cache miss occurs, invalidate all the stored tag-check results
3. Execute instruction A again at time T+X
• Reuse the tag-check result to activate only the hit way's data sub-array ([way2])
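The three steps above can be sketched on a toy cache model. This is a simplification under stated assumptions: `HBTCSketch` is hypothetical, it keeps one stored hit way per instruction address in a dictionary (the slides store way pointers per branch in the extended BTB), and it uses LRU replacement. It counts full tag checks and activated data sub-arrays so the effect of reuse is visible.

```python
class HBTCSketch:
    """Toy set-associative I-cache with per-instruction tag-check reuse (hypothetical)."""

    def __init__(self, num_sets: int = 4, ways: int = 4, line_bytes: int = 32) -> None:
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        self.tags = [[None] * ways for _ in range(num_sets)]     # tag held by each way
        self.lru = [list(range(ways)) for _ in range(num_sets)]  # front = LRU way
        self.history = {}          # inst addr -> hit way (stand-in for the extended BTB)
        self.tag_checks = 0        # accesses that read every tag array
        self.data_activations = 0  # data sub-arrays activated in total

    def _split(self, addr: int) -> tuple:
        block = addr // self.line_bytes
        return block % self.num_sets, block // self.num_sets  # (set index, tag)

    def _touch(self, s: int, way: int) -> None:
        self.lru[s].remove(way)
        self.lru[s].append(way)  # mark as most recently used

    def fetch(self, addr: int) -> int:
        s, tag = self._split(addr)
        if addr in self.history:       # step 3: reuse, activate one data sub-array
            way = self.history[addr]
            self.data_activations += 1
        else:                          # step 1: conventional parallel access
            self.tag_checks += 1
            self.data_activations += self.ways
            if tag in self.tags[s]:
                way = self.tags[s].index(tag)
            else:                      # step 2: a miss invalidates all stored results
                self.history.clear()
                way = self.lru[s][0]
                self.tags[s][way] = tag
            self.history[addr] = way   # step 1: save the tag-check result
        self._touch(s, way)
        return way
```

For a 16-instruction loop (addresses 0 to 60 in steps of 4) run 10 times, the sketch performs 24 tag checks instead of 160 and activates 232 data sub-arrays instead of the 640 a conventional 4-way access would. Replacement happens only on a miss, and every miss clears the history, so a reused way is always still correct.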
![Page 8: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/8.jpg)
Conventional vs. Phased vs. HBTC
[Figure: tag and line activation of way0 to way3 per access for the Conventional cache (Cycle 1), the Phased cache (Cycle 1 and Cycle 2), and the HBTC cache (Cycle 1), shown for the Cache Hit / Cache Miss and Reuse / No Reuse cases.]
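The comparison on this slide can be summarized as a per-hit cost under a simple first-order model. This is an assumption for illustration, not the authors' energy model: it counts only tag arrays read, data sub-arrays read, and cycles to deliver the instruction on a cache hit, and the names `AccessCost`, `hit_cost`, and `total` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessCost:
    tag_reads: int   # tag sub-arrays read
    data_reads: int  # data sub-arrays read
    cycles: int      # cycles to deliver the instruction on a hit


def hit_cost(scheme: str, ways: int, reuse: bool = False) -> AccessCost:
    """Per-hit activation counts under a simple first-order model (assumption)."""
    if scheme == "conventional":
        return AccessCost(ways, ways, 1)  # all tags and lines in parallel
    if scheme == "phased":
        return AccessCost(ways, 1, 2)     # tags first, then only the hit line
    if scheme == "hbtc":
        # Reuse skips the tag check and reads only the hit way's line;
        # without reuse, HBTC accesses the cache like the conventional one.
        return AccessCost(0, 1, 1) if reuse else AccessCost(ways, ways, 1)
    raise ValueError(f"unknown scheme: {scheme}")


def total(costs) -> AccessCost:
    costs = list(costs)
    return AccessCost(sum(c.tag_reads for c in costs),
                      sum(c.data_reads for c in costs),
                      sum(c.cycles for c in costs))
```

For 10 hits in a 4-way cache where HBTC reuses 8 of them, the totals are (40, 40, 10) for the conventional cache, (40, 10, 20) for the phased cache, and (8, 16, 10) for HBTC: HBTC saves activations without the phased cache's extra cycle.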
![Page 9: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/9.jpg)
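HBTC SA I-$ Architecture
[Figure: block diagram of the HBTC set-associative I-cache.]
• BTB (Branch Target Buffer): Branch-Inst. Addr. and Target Addr. fields, extended with a WP Table that has Taken and NotTaken entries
• Entry of the WP Table: a valid flag and n way pointers (WPs)
• WP Reg. (WPreg): the way pointers being reused
• WP Record Reg. (WPRreg): the tag-check results from the I-Cache being recorded
• PBAreg: holds the branch-instruction address and prediction result, used as the address for writing into the WP Table
• Mode Controller: selects the mode from the branch prediction result, the WP valid signal, and I-cache misses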
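A WP-table entry as described above can be modeled as a small record. This is a hypothetical layout (`WPEntry` and `entry_bits` are illustrative names): it assumes binary-encoded way pointers of log2(ways) bits each plus one valid flag. A real entry would also need some way to mark unused pointer slots when fewer than n results were recorded, which the slides do not detail.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class WPEntry:
    """One WP-table entry: a valid flag and up to n way pointers (hypothetical layout)."""
    n: int
    valid: bool = False
    wps: list[int] = field(default_factory=list)

    def store(self, hit_ways) -> None:
        # Keep at most n way pointers, as recorded in the WPRreg.
        self.wps = list(hit_ways)[: self.n]
        self.valid = True

    def invalidate(self) -> None:
        self.valid = False
        self.wps = []


def entry_bits(n_wps: int, ways: int) -> int:
    """Bits per entry: one valid flag plus n binary-encoded way pointers (assumption)."""
    return 1 + n_wps * math.ceil(math.log2(ways))
```

With 4 WPs and a 4-way cache, each entry needs 1 + 4 × 2 = 9 bits under this encoding; the storage grows linearly with the number of WPs, which is one reason more WPs cost more BTB energy.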
![Page 10: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/10.jpg)
HBTC I-$ Operation
• Normal Mode (NM): with tag checks
• Omitting Mode (OM): without tag checks (reuse)
• Tracing Mode (TM): with tag checks (tag-check results are preserved in the WPRreg and are stored into the WP table on the next BTB hit)
Mode Transition
• NM → OM: BTB hit, and the WPs are valid
• NM → TM: BTB hit, but the WPs are invalid (the PC and prediction result are saved in the PBAreg)
• OM / TM → NM (GOtoNM): I-cache miss, BTB replacement, RAS access, or branch misprediction
• On an I-cache miss, all WPs are invalidated!
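The mode transitions above can be expressed as a small state machine. This is a sketch under stated assumptions, not the authors' design: `Mode`, `ModeController`, and its methods are hypothetical; the WP table is keyed by (branch address, predicted direction) because the slides show separate T and N fields; after committing the WPRreg on a BTB hit in TM, the sketch returns to NM without looking up the triggering branch, and a BTB hit in OM is looked up as in NM (the slides do not show these two cases). The slides attribute WP invalidations to I-cache misses and BTB replacements, so the caller decides whether a GOtoNM event invalidates all WPs.

```python
from enum import Enum


class Mode(Enum):
    NM = "normal"    # conventional access with tag checks
    TM = "tracing"   # tag checks, results recorded into the WPRreg
    OM = "omitting"  # no tag checks, stored way pointers reused


class ModeController:
    """Hypothetical model of the HBTC mode controller (not the authors' RTL)."""

    def __init__(self, n_wps: int = 4) -> None:
        self.n_wps = n_wps
        self.mode = Mode.NM
        self.wp_table = {}  # (branch addr, predicted taken) -> list of way pointers
        self.pba = None     # PBAreg: key of the WP-table entry being traced
        self.wpr = []       # WPRreg: hit ways recorded in TM
        self.wp = []        # WPreg: way pointers still to be reused in OM

    def btb_hit(self, branch_addr, taken: bool) -> None:
        if self.mode is Mode.TM:
            # Store the WPRreg into the entry pointed to by the PBAreg.
            self.wp_table[self.pba] = list(self.wpr)
            self.mode = Mode.NM  # assumption: the triggering hit is not looked up
            return
        key = (branch_addr, taken)
        wps = self.wp_table.get(key)
        if wps:  # valid WPs detected: go to OM
            self.wp = list(wps)
            self.mode = Mode.OM
        else:    # no valid WPs: go to TM, saving PC and prediction in the PBAreg
            self.pba, self.wpr = key, []
            self.mode = Mode.TM

    def fetch_line(self, tag_check):
        """tag_check() does a conventional access and returns the hit way.
        Returns (way, whether a tag check was performed)."""
        if self.mode is Mode.OM:
            if self.wp:
                return self.wp.pop(0), False
            self.mode = Mode.NM  # WPreg exhausted: back to conventional accesses
        way = tag_check()
        if self.mode is Mode.TM and len(self.wpr) < self.n_wps:
            self.wpr.append(way)
        return way, True

    def go_to_nm(self, invalidate_all: bool = False) -> None:
        """GOtoNM event; pass invalidate_all=True for an I-cache miss."""
        if invalidate_all:
            self.wp_table.clear()  # all WPs are invalidated
        self.pba, self.wpr, self.wp = None, [], []
        self.mode = Mode.NM
```

Driving it with the sequence from the operation example that follows (a BTB hit on A with no valid WPs, three conventional accesses hitting ways 1, 3, and 0, a BTB hit on B, then A again) moves it to TM, stores [1, 3, 0] for (A, T), and then reuses those three way pointers in OM before falling back to a conventional access.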
![Page 11: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/11.jpg)
HBTC I-$ Operation Example
[Figure: the Mode Controller (NM, TM, OM and their transitions), the Branch Target Buffer holding Inst. Addr. A and Inst. Addr. B with their target addresses and a WP Table with T and N fields, the PBAreg, the WPRreg, the WPreg, and a 4-way I-Cache (ways 0 to 3).]
![Page 12: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/12.jpg)
HBTC I-$ Operation Example
PC = A: BTB hit, and the branch is predicted Taken.
![Page 13: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/13.jpg)
HBTC I-$ Operation Example
PC = A: NO valid WPs are detected in the WP Table!
![Page 14: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/14.jpg)
HBTC I-$ Operation Example
NO valid WPs are detected! The PC (A) and the branch prediction result (T) are saved in the PBAreg (NM → TM).
![Page 15: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/15.jpg)
HBTC I-$ Operation Example
Conventional accesses! The tag-comparison result (way 1) is stored into the WPRreg!
![Page 16: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/16.jpg)
HBTC I-$ Operation Example
Conventional accesses! The tag-comparison result (way 3) is stored into the WPRreg!
![Page 17: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/17.jpg)
HBTC I-$ Operation Example
Conventional accesses! The tag-comparison result (way 0) is stored into the WPRreg!
![Page 18: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/18.jpg)
HBTC I-$ Operation Example
BTB hit on B! The WPRreg is stored into the WP-Table entry pointed to by the PBAreg (A, T)!
![Page 19: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/19.jpg)
HBTC I-$ Operation Example
PC = A again: BTB hit, and the branch is predicted Taken.
![Page 20: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/20.jpg)
HBTC I-$ Operation Example
PC = A: valid WPs are detected! (NM → OM)
![Page 21: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/21.jpg)
HBTC I-$ Operation Example
Tag-comparison reuse: WP = 1, so only way 1 is activated.
![Page 22: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/22.jpg)
HBTC I-$ Operation Example
Tag-comparison reuse: WP = 3, so only way 3 is activated.
![Page 23: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/23.jpg)
HBTC I-$ Operation Example
Tag-comparison reuse: WP = 0, so only way 0 is activated.
![Page 24: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/24.jpg)
HBTC I-$ Operation Example
No valid WPs remain in the WPreg!
![Page 25: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/25.jpg)
HBTC I-$ Operation Example
Conventional accesses!
![Page 26: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/26.jpg)
Advantages and Disadvantages
☺ Eliminates unnecessary energy consumption without performance degradation (during OM)!
[Figure: Normal Mode (NM) / Tracing Mode (TM) activate way0 to way3; Omitting Mode (OM) activates only the hit way.]
☹ BTB energy overhead due to WP-table read accesses
☹ BTB access conflict when invalidating all WPs (causes 1 stall cycle)
☹ BTB access conflict when recording WPs (causes 1 stall cycle)
![Page 27: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/27.jpg)
Evaluation – Environment –
• OOO simulation by SimpleScalar: 16 KB 4-way I-cache (32 B line size); default parameters were used for everything else
• Cache energy model based on [Kamble97] (including the WP-table read-energy overhead)
• Assume that the BTB is accessed only when branch or jump instructions are executed (instructions are pre-decoded)
Benchmark Programs
• SPECint95: 099.go, 124.m88ksim, 126.gcc, 129.compress, 130.li, 132.ijpeg
• SPECfp95: 102.swim
• Mediabench: mpeg2encode, mpeg2decode, adpcm_enc, adpcm_dec
[Kamble97] M. B. Kamble and K. Ghose, “Analytical Energy Dissipation Models for Low Power Caches,” ISLPED’97
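For reference, the evaluated I-cache geometry can be worked out with a short helper. The function `cache_geometry` is hypothetical, and the 32-bit address width is an assumption for illustration; the slides do not state it.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int, addr_bits: int = 32) -> dict:
    """Address split for a set-associative cache; sizes must be powers of two."""
    sets = size_bytes // (ways * line_bytes)
    for v in (sets, ways, line_bytes):
        if v <= 0 or v & (v - 1):
            raise ValueError("sizes must be powers of two")
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        # tag bits compared on a conventional access (all ways in parallel)
        "tag_bits_per_access": ways * tag_bits,
    }
```

For the 16 KB, 4-way, 32 B configuration this gives 128 sets, a 5-bit offset, a 7-bit index, and (with 32-bit addresses) a 20-bit tag, so a conventional access compares 4 × 20 = 80 tag bits, all of which are skipped in OM.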
![Page 28: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/28.jpg)
Evaluation – Energy and Performance –
[Figure: normalized energy (Joule) and normalized execution time (cycle) for 099.go, 126.gcc, 130.li, 102.swim, adpcm(d), mpeg2(d), 124.m88ksim, 129.comp., 132.ijpeg, adpcm(e), and mpeg2(e); # of WPs = 4.]
62% reduction in Ecache with only a 0.2% increase in execution time. Even in the worst case, Ecache is reduced by about 20%.
![Page 29: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/29.jpg)
Evaluation – Effect of WP invalidation penalty –
[Figure: normalized execution time (cycle) versus WP invalidation penalty (1, 2, 4, 8, 16, 32 cycles) for 126.gcc, 099.go, mpeg2(d), and 132.ijpeg, with the cache-miss penalty marked.]
If the penalty is 4 clock cycles or less, the performance overhead is trivial.
The performance overhead grows once the penalty exceeds 4 clock cycles.
![Page 30: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/30.jpg)
Evaluation – Effect of the Number of WPs –
Increasing the number of WPs makes it possible to reuse more tag-check results, but it also adds BTB access energy overhead.
[Figure: normalized energy (Joule) for 126.gcc, split into energy for cache access and energy overhead of the BTB, versus # of way pointers (1, 2, 4, 8, 16, 32), w/ pre-decoding and w/o pre-decoding.]
![Page 31: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/31.jpg)
Evaluation – Effect of Cache Associativity –
Conv.: Ecache grows as associativity increases.
HBTC: Ecache decreases as associativity increases up to n = 4, and then starts to increase (n > 4).
[Figure: energy (Joule) broken down into Eothers, Etag, Edata,bl, and Edata,prectl for the Conventional and HBTC caches at associativity 1 to 64, mpeg2decode.]
![Page 32: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/32.jpg)
Conclusions
History-Based Tag-Comparison Instruction Cache
1. Records tag-check results generated by the I-cache into the extended BTB
2. Attempts to reuse them to eliminate unnecessary way activation
3. Achieves a 62% I-cache energy reduction with only 0.2% performance degradation!
Future work
• Analyze energy consumption based on a real chip design.
![Page 33: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/33.jpg)
Backup Slides (History-Based Tag-Comparison Cache)
![Page 34: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/34.jpg)
Evaluation – Comparison with IS Approach –
[Figure: normalized tag-compare count for 099.go, 126.gcc, 130.li, 102.swim, adpcm(d), mpeg2(d), 124.m88ksim, 129.comp., 132.ijpeg, adpcm(e), and mpeg2(e), comparing the Interline Sequential (IS) approach, the History-Based Look-up (HBL) cache, and the combination of IS and HBL.]
![Page 35: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/35.jpg)
Evaluation – Effects of Cache Associativity –
[Figure: energy (Joule) broken down into Eothers, Etag, Edata,bl, and Edata,prectl for the Conventional and HBL caches at associativity 1 to 64, 099.go, 0.8 µm CMOS*,**.]
*) M. B. Kamble and K. Ghose, “Energy-Efficiency of VLSI Caches: A Comparative Study,” 10th Int. Conf. on VLSI Design
**) S. J. E. Wilton and N. P. Jouppi, “An Enhanced Access and Cycle Time Model for On-Chip Caches,” WRL Research Report 93/5
![Page 36: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/36.jpg)
Evaluation – Effects of Cache Associativity –
[Figure: energy (Joule) broken down into Eothers, Etag, Edata,bl, and Edata,prectl for the Conventional and HBL caches at associativity 1 to 64, 126.gcc, 0.8 µm CMOS*,**.]
![Page 37: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/37.jpg)
Evaluation – Effects of Cache Associativity –
[Figure: energy (Joule) broken down into Eothers, Etag, Edata,bl, and Edata,prectl for the Conventional and HBL caches at associativity 1 to 64, 132.ijpeg, 0.8 µm CMOS*,**.]
![Page 38: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/38.jpg)
Evaluation – Effects of Cache Associativity –
[Figure: energy (Joule) broken down into Eothers, Etag, Edata,bl, and Edata,prectl for the Conventional and HBL caches at associativity 1 to 64, mpeg2decode, 0.8 µm CMOS*,**.]
![Page 39: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/39.jpg)
Evaluation – Effects of # of WPs –
[Figure: normalized energy (Joule), split into energy for cache access and energy overhead at the BTB, versus # of way pointers (1 to 32) for 126.gcc and 132.ijpeg; w/ pre-decoding (BTB access occurs only when branch or jump instructions execute).]
![Page 40: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/40.jpg)
Evaluation – Effects of # of WPs –
[Figure: normalized energy (Joule), split into energy for cache access and energy overhead at the BTB, versus # of way pointers (1 to 32) for 126.gcc and 132.ijpeg; w/o pre-decoding (BTB access occurs for all instructions).]
![Page 41: A Low Energy Set-Associative I-Cache with Extended BTB](https://reader036.vdocuments.us/reader036/viewer/2022062520/5681588e550346895dc5ee6b/html5/thumbnails/41.jpg)
Evaluation – Effect of WP invalidation penalty –
[Figure: normalized execution time (cycle) versus WP invalidation penalty (1 to 32 cycles) for 126.gcc, 099.go, mpeg2(d), and 132.ijpeg, with the cache-miss penalty marked; breakdown of WP invalidations (0% to 100%) into BTB replacement and cache miss for 099.go, 126.gcc, 130.li, 102.swim, adpcm(d), mpeg2(d), 124.m88ksim, 129.comp., 132.ijpeg, adpcm(e), and mpeg2(e).]
If the penalty is 4 clock cycles or less, the performance overhead is trivial.
The performance overhead grows once the penalty exceeds 4 clock cycles.