Memory Redundancy Elimination to Improve
Application Energy Efficiency
Keith Cooper and Li Xu
Rice University
October 2003
Techniques to Improve Energy
• Circuit and Architecture Level
– Dynamic Voltage Scaling (DVS)
– Pipeline gating
– Cache partitioning
• Application Level Techniques
– Optimize behavior to improve energy
Approach
• Profile Energy of Application Execution
– Run SPEC2000 and MediaBench
– Correlate Execution to Energy Consumption
• Identify and Evaluate Energy Saving Code Transformations
Energy Profiling: Benchmarks
Energy Profiling: Testing Setup
• Use SimpleScalar and Wattch
– Compiled with SimpleScalar gcc -O4
– Run on out-of-order superscalar simulator with Wattch module
– Configuration
  • Architecture: models Alpha 21264
  • Wattch: 0.35µm, 600MHz, Vdd=2.5V
Dynamic Power of Components
• clock: 28%
• ITLB: 0%
• L1 d-cache: 16%
• DTLB: 1%
• L2 cache: 18%
• L1 i-cache: 8%
• bpred: 5%
• rename: 1%
• instruction window: 6%
• load/store queue: 3%
• register file: 5%
• result bus: 4%
• alu: 5%
Energy of Clock and Caches
[Figure: bar chart of energy share per benchmark, axis 0% to 70%. Benchmarks: adpcm, g721, gsm, epic, pegwit, mpeg, 181.mcf, 164.gzip, 256.bzip2, 175.vpr, 197.parser, 300.twolf, Geo Mean. Series: D-Cache, I-Cache, Clock. Data labels on the chart: 13.2%, 14.4%, 30.4%.]
Dynamic Load and Store Count
[Figure: bar chart of dynamic load and store count per benchmark, axis 0% to 45%. Benchmarks: adpcm, g721, gsm, epic, pegwit, mpeg, 181.mcf, 164.gzip, 256.bzip2, 175.vpr, 197.parser, 300.twolf, Geo Mean. Series: Store, Load. Data labels on the chart: 5.5%, 18.3%.]
Memory Redundancy
• Redundant loads and stores

void foo(X* p){
    …p->field_a…
    …p->field_a…
    …p->field_a…
}

foo_asm:
    ld (p+offset) => r
    ……
    ld (p+offset) => r
    ……
    ld (p+offset) => r
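
The pattern on this slide can be sketched in plain C. This is a minimal illustration, not code from the talk: the struct layout, the `load_field_a` helper (a stand-in for one `ld (p+offset) => r`), and the `load_count` counter are all assumptions introduced to make the number of loads visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type; only the name field_a comes from the slide. */
typedef struct { int field_a; } X;

/* Counts simulated memory loads (illustrative, not a real profiler). */
long load_count = 0;

/* Stand-in for "ld (p+offset) => r": each call models one load. */
int load_field_a(const X *p) {
    assert(p != NULL);
    load_count++;
    return p->field_a;
}

/* Three uses, three loads: the redundant pattern. */
int foo_redundant(const X *p) {
    int a = load_field_a(p);
    int b = load_field_a(p);
    int c = load_field_a(p);
    return a + b + c;
}

/* Same result with one load. This is valid only because nothing
   stores to p->field_a between the three uses. */
int foo_reused(const X *p) {
    int r = load_field_a(p);
    return r + r + r;
}
```

Calling both functions on the same record returns the same value, while `load_count` grows by three for `foo_redundant` and by one for `foo_reused`. Whether a real compiler emits the three loads depends on what it can prove about intervening stores and aliasing.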
Memory Redundancy Elimination to Improve Energy Efficiency
• Reduce execution cycle count
– Save energy in clocking network
• Reduce I-Cache accesses
– Save energy in I-Cache
• Reduce D-Cache accesses
– Save energy in D-Cache
Memory Redundancy Detection
• Want to know P(adr, v) =?= Q(adr’, v’)
• Global value numbering on memory operations [MSP ’02]
• Annotate P,Q with mem state info
• Unified analysis for both scalar and memory redundancy
– Detect more redundancies due to interaction of scalar and memory values
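
The idea of tagging memory operations with memory state can be sketched as follows. This is a deliberately simplified, straight-line illustration and not the algorithm from [MSP ’02]: it assumes every store starts a new memory state that invalidates all earlier load facts, and every name in it (`VNTable`, `vn_load`, `vn_store`) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_FACTS = 64 };

/* One fact: "in memory state `state`, the address with value number
   `addr_vn` holds the value with value number `val_vn`". */
typedef struct { int addr_vn, state, val_vn; } MemFact;

typedef struct {
    MemFact facts[MAX_FACTS];
    size_t  count;
    int     state;    /* current memory state number */
    int     next_vn;  /* next fresh value number */
} VNTable;

void vn_init(VNTable *t, int first_fresh_vn) {
    assert(t != NULL);
    t->count = 0;
    t->state = 0;
    t->next_vn = first_fresh_vn;
}

static void remember(VNTable *t, int addr_vn, int val_vn) {
    assert(t->count < MAX_FACTS);
    t->facts[t->count++] = (MemFact){ addr_vn, t->state, val_vn };
}

/* Value-number a load. *redundant is set to 1 when a fact with the
   same address number and the same memory state already exists. */
int vn_load(VNTable *t, int addr_vn, int *redundant) {
    for (size_t i = 0; i < t->count; i++) {
        const MemFact *f = &t->facts[i];
        if (f->addr_vn == addr_vn && f->state == t->state) {
            *redundant = 1;
            return f->val_vn;
        }
    }
    *redundant = 0;
    int vn = t->next_vn++;
    remember(t, addr_vn, vn);
    return vn;
}

/* A store starts a new memory state, so older facts no longer match.
   The stored value is remembered, so a later load from the same
   address in this state reuses the scalar value number. */
void vn_store(VNTable *t, int addr_vn, int val_vn) {
    t->state++;
    remember(t, addr_vn, val_vn);
}
```

In this sketch a second load from the same address with no store in between is reported as redundant, and a load that follows a store to the same address receives the stored value's number, which is one way scalar and memory values can interact. A less conservative analysis would keep facts alive across stores it can prove do not alias.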
Memory Redundancy Elimination
• Recast scalar CSE (common sub-expression elimination) and PRE (partial redundancy elimination)
• Solve data flow system to remove memory redundancy
– Treat loads the same way as scalar
– Model dependence using mem state info
– Details in paper
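
What PRE adds over CSE when loads are treated like scalar expressions can be shown at source level. The sketch below is illustrative only: the struct `S`, the `load_x` helper (which models one load), and the counter `pre_loads` are assumptions, and a real pass would work on the intermediate representation rather than on C source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type for the illustration. */
typedef struct { int x; } S;

/* Counts simulated loads of p->x. */
long pre_loads = 0;

/* Models one load of p->x. */
int load_x(const S *p) {
    assert(p != NULL);
    pre_loads++;
    return p->x;
}

/* Before: the load after the join is redundant only on the path
   where c != 0, so it is partially redundant and CSE cannot remove it. */
int before_pre(const S *p, int c) {
    int a = 0;
    if (c)
        a = load_x(p);
    return a + load_x(p);
}

/* After: a load is inserted on the else path so the value is
   available on both paths; the load after the join becomes a reuse. */
int after_pre(const S *p, int c) {
    int a = 0, t;
    if (c) {
        t = load_x(p);
        a = t;
    } else {
        t = load_x(p);
    }
    return a + t;
}
```

Both versions return the same value. With `c != 0` the original performs two loads and the transformed version one; with `c == 0` both perform one, so no path gets slower. The transformation is safe here only because no store to `p->x` occurs between the branch and the join.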
Experimental Setup
• Use Rice ILOC compiler
• Backend creates SimpleScalar binaries

[Diagram: Source → Front End c2i → ILOC → Analysis/Transformation Passes on ILOC → ILOC → Back End i2ss → SimpleScalar Executable]
Experimental Setup, Cont’d
• Compare scalar CSE (S-CSE) and scalar PRE (S-PRE) against memory CSE (M-CSE) and PRE (M-PRE)
– Implement as ILOC passes
– Run SimpleScalar and Wattch to collect run-time and energy stats
Result: Dynamic Loads
[Figure: bar chart of dynamic loads relative to S-CSE, axis 60% to 110%. Series: M-CSE/S-CSE, S-PRE/S-CSE, M-PRE/S-CSE.]
Result: Execution Cycles
[Figure: bar chart of execution cycles relative to S-CSE, axis 80% to 110%, with one off-scale bar labeled 145%. Series: M-CSE/S-CSE, S-PRE/S-CSE, M-PRE/S-CSE.]
Result: Clock Energy
[Figure: bar chart of clock energy relative to S-CSE, axis 80% to 110%. Series: M-CSE/S-CSE, S-PRE/S-CSE, M-PRE/S-CSE.]
Result: I-Cache Energy
[Figure: bar chart of I-Cache energy relative to S-CSE, axis 80% to 105%. Series: M-CSE/S-CSE, S-PRE/S-CSE, M-PRE/S-CSE.]
Result: D-Cache Energy
[Figure: bar chart of D-Cache energy relative to S-CSE, axis 70% to 110%, with one off-scale bar labeled 128%. Series: M-CSE/S-CSE, S-PRE/S-CSE, M-PRE/S-CSE.]
Result: Total Energy
[Figure: bar chart of total energy relative to S-CSE, axis 80% to 110%. Series: M-CSE/S-CSE, S-PRE/S-CSE, M-PRE/S-CSE.]
Result: Energy-Delay Product
[Figure: bar chart of energy-delay product relative to S-CSE, axis 60% to 110%, with one off-scale bar labeled 153%. Series: M-CSE/S-CSE, S-PRE/S-CSE, M-PRE/S-CSE.]
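
For readers unfamiliar with the metric, energy-delay product is conventionally energy multiplied by execution time, so a change that lowers both is rewarded twice. The helper below is a minimal sketch of that arithmetic; the function names and the choice of joules and seconds as units are assumptions, not details from the talk.

```c
#include <assert.h>

/* Energy-delay product: energy times execution time.
   Units (joules, seconds) are an assumption for illustration. */
double edp(double energy_j, double delay_s) {
    assert(energy_j >= 0.0 && delay_s >= 0.0);
    return energy_j * delay_s;
}

/* EDP of an optimized run relative to a baseline run.
   1.0 means no change; values below 1.0 are improvements. */
double relative_edp(double e_opt, double d_opt,
                    double e_base, double d_base) {
    assert(e_base > 0.0 && d_base > 0.0);
    return edp(e_opt, d_opt) / edp(e_base, d_base);
}
```

As a made-up example, a run that uses 10% less energy and takes 10% less time than its baseline has a relative EDP of 0.9 × 0.9 = 0.81, which would appear as 81% on a chart normalized to the baseline.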
Result: Application Energy Breakdown
[Figure: two stacked bar charts of energy by component for S-CSE, M-CSE, S-PRE, and M-PRE. Left panel: 256.bzip2, axis 0 to 90,000 mJ. Right panel: 175.vpr, axis 0 to 50,000. Components: regfile, bus, alu, lsq, window, rename, bpred, L2 cache, L1 dcache, L1 icache, clock.]
             Clock      I-Cache    D-Cache    Total
256.bzip2    12%, 15%   8%, 10%    23%, 24%   12%, 15%
175.vpr      13%, 15%   10%, 12%   25%, 26%   14%, 15%
Conclusions
• Application energy profiling
– Top energy consuming components: clocking network and caches
• Memory redundancy elimination to improve energy efficiency
– Reduce energy in clock, I-Cache, D-Cache
– Results: up to 15% reduction in energy, 24% in energy-delay product on test apps