Hardware Architectures for Power and Energy Adaptation
Phillip Stanley-Marbell
Outline
Motivation
Related Research
Architecture
Experimental Evaluation
Extensions
Summary and Future work
Motivation
Power consumption is becoming a limiting factor with scaling of technology to smaller feature sizes
  Mobile/battery-powered computing applications
  Thermal issues in high-end servers
Low-power design is not enough: power- and energy-aware design
  Adapt to non-uniform application behavior
  Only use as many resources as required by the application
This talk: exploit the processor-memory performance gap to save power, with limited performance degradation
Related Research
Reducing power dissipation in on-chip caches
  Reducing instruction cache leakage power dissipation [Powell et al., TVLSI ’01]
  Reducing dynamic power in set-associative caches and on-chip buffer structures [Dropsho et al., PACT ’02]
Reducing power dissipation of CPU core
  Compiler-directed dynamic voltage scaling of CPU core [Hsu, Kremer, Hsiao, ISLPED ’01]
Target Application Class: Memory-Bound Applications
Memory-bound applications
  Limited by memory system performance
[Figure: CPU @ Vdd vs. CPU @ Vdd/2]
Single-issue in-order processors
  Limited overlap of main memory access and computation
Power-Performance Tradeoff
Detect memory-bound execution phases
  Maintain sufficient information to determine compute / stall time ratio
Pros
  Scaling down CPU core voltage yields significant energy savings ($\text{Energy} \propto V_{dd}^2$)
Cons
  Performance hit (delay increases as $V_{dd}$ is lowered)
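To make the tradeoff concrete, the sketch below evaluates both effects for the Vdd/2 case. It relies on first-order models only: dynamic energy proportional to Vdd squared (as stated on the slide) and, as an assumption not stated in the talk, clock frequency proportional to Vdd. The function names are illustrative.

```python
def relative_energy(vdd_scale: float) -> float:
    """Dynamic energy per operation relative to full voltage (Energy ∝ Vdd^2)."""
    return vdd_scale ** 2


def relative_delay(vdd_scale: float) -> float:
    """Compute-time stretch relative to full voltage.

    Assumption (not from the talk): clock frequency scales linearly
    with Vdd, so delay grows as 1 / vdd_scale.
    """
    if vdd_scale <= 0:
        raise ValueError("vdd_scale must be positive")
    return 1.0 / vdd_scale


print(relative_energy(0.5))  # 0.25: a quarter of the energy at Vdd/2
print(relative_delay(0.5))   # 2.0: compute phases take twice as long
```

Under these assumptions, the energy saving applies to compute phases only; time spent stalled on memory is not stretched, which is why memory-bound phases are the attractive target.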
Power Adaptation Unit (PAU)
Maintains information to determine ratio of compute to stall time
Entries allocated for instructions which cause CPU stalls
Intuitively, one table entry required per program loop
[From S-M et al, PACS 2002]
Fields:
  State (I, A, T, V)
  Number of instructions executed (NINSTR)
  Distance between stalls (STRIDE)
  Saturating ‘Quality’ counter (Q)
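A minimal sketch of one PAU table entry, using only the four fields named above. The counter width (`Q_MAX`), the choice of `"I"` as the initial state, and the `bump_quality` helper are assumptions for illustration; the talk does not give them, and the state transitions are deliberately left out.

```python
from dataclasses import dataclass

Q_MAX = 3  # assumed 2-bit saturating counter; the real width is not given


@dataclass
class PAUEntry:
    state: str = "I"  # one of "I", "A", "T", "V"; initial value is an assumption
    ninstr: int = 0   # NINSTR: number of instructions executed
    stride: int = 0   # STRIDE: distance between stalls
    q: int = 0        # Q: saturating 'Quality' counter

    def bump_quality(self, up: bool) -> None:
        """Saturating update: Q always stays within [0, Q_MAX]."""
        if up:
            self.q = min(self.q + 1, Q_MAX)
        else:
            self.q = max(self.q - 1, 0)


entry = PAUEntry()
for _ in range(5):
    entry.bump_quality(up=True)
print(entry.q)  # 3: the counter saturates instead of wrapping around
```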
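PAU Table Entry State Machine
[Figure: state machine; transition annotation: "If CPU at-speed, slow it down"]
Slowdown factor, ∂, for a target 1% performance degradation:
$$\partial = 0.01 \cdot \frac{\mathrm{STRIDE} + \mathrm{NINSTR}}{\mathrm{NINSTR}}$$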
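The slowdown factor can be computed directly from the two counters in a PAU entry. The sketch below transcribes the formula; the function name, the `target` parameter, and the guard against a zero instruction count are illustrative additions.

```python
def slowdown_factor(stride: int, ninstr: int, target: float = 0.01) -> float:
    """Slowdown factor for a target fractional performance degradation.

    Implements target * (STRIDE + NINSTR) / NINSTR, with target = 0.01
    corresponding to the 1% case in the talk.
    """
    if ninstr <= 0:
        raise ValueError("NINSTR must be positive")
    return target * (stride + ninstr) / ninstr


# A loop with few instructions between long stalls tolerates more slowdown.
print(slowdown_factor(stride=90, ninstr=10))   # about 0.1
print(slowdown_factor(stride=990, ninstr=10))  # about 1.0
```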
Example
    for (x = 100;;)
    {
        if (x-- > 0)
            a = i;
        b = *n;
        c = *p++;
    }
PAU table entries created for each assignment
After 100 iterations, assignment to a stops
  Entries for b or c can take over immediately
Experimental Methodology
Simulated PAU as part of a single-issue embedded processor
  Used Myrmigki simulator [S-M et al, ISLPED 2001]
  Models Hitachi SH RISC embedded processor
    5-stage in-order pipeline
    8K unified L1, 100-cycle latency to main memory
    Empirical instruction power model, from SH7708 device
    Voltage scaling penalty of 1024 cycles, 14 µJ
Investigated effect of PAU table size on performance, power
  Intuitively, PAU table entries track program loops with repeated stalls
Effect of Table Size on Energy Savings
Single-entry PAU table provides 27% reduction in energy, on average
Scaling up to a 64-entry PAU table provides only an additional 4%
Effect of Table Size on Performance
Single-entry PAU table incurs 0.75% performance degradation, on average
A larger PAU table leads to more aggressive behavior and an increased penalty
Overall Effect of Table Size: Energy-Delay Product
Considering both performance and power, there is little benefit from larger PAU table sizes
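As a rough illustration of the metric, the sketch below combines the two single-entry averages quoted in the talk (27% energy reduction, 0.75% performance degradation) into a relative energy-delay product. Multiplying two averages is a simplification and will not exactly match a per-benchmark average; the function name is illustrative.

```python
def relative_edp(energy_saving: float, perf_degradation: float) -> float:
    """Energy-delay product relative to the unadapted baseline (1.0)."""
    relative_energy = 1.0 - energy_saving
    relative_delay = 1.0 + perf_degradation
    return relative_energy * relative_delay


# Single-entry PAU table, using the averages reported in the talk.
print(f"{relative_edp(0.27, 0.0075):.3f}")  # 0.735
```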
Extending the PAU structure
Multiprogramming environments
Superscalar architectures
Slowdown factor computation
PAU in Multiprogramming Environments
Only a single entry necessary per application
Amortize memory-bound phase detection
  Would be wasteful to flush PAU at each context switch (~10 ms)
Extend PAU entries with an ID field:
  CURID and IDMASK fields written to by OS
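One plausible way the ID field could be used is sketched below. The matching rule (compare the entry's ID with CURID under IDMASK) is an assumption; the talk names the fields but does not spell out their semantics, and `entry_matches` is a hypothetical helper.

```python
def entry_matches(entry_id: int, curid: int, idmask: int) -> bool:
    """Assumed rule: an entry belongs to the running application when its
    ID equals CURID on every bit position selected by IDMASK."""
    return (entry_id & idmask) == (curid & idmask)


# With a full mask, only entries tagged with the current ID are used,
# so entries of other applications survive a context switch untouched.
print(entry_matches(entry_id=0x2, curid=0x2, idmask=0xF))  # True
print(entry_matches(entry_id=0x3, curid=0x2, idmask=0xF))  # False
# With a zero mask, every entry matches (IDs are effectively ignored).
print(entry_matches(entry_id=0x3, curid=0x2, idmask=0x0))  # True
```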
PAU in Superscalar Architectures
Dependent computations are ‘stretched out’
  FUs with no dependent instructions unduly slowed down
Maintain separate instruction counters per FU
Drawback: requires ability to run FUs in core at different voltages
[Figure: CPU @ Vdd vs. CPU @ Vdd/2]
Slowdown factor computation
Computation only performed on application phase change
  Hardware solution would be wasteful
Solution: computation by software ISR
  Compute ∂, look up discrete Vdd/Freq. by indexing into a lookup table
Similar software handler solution proposed in [Dropsho et al, 2002]
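The software handler could look roughly like the sketch below: compute ∂ from the entry's counters, then pick a discrete operating point from a table. The operating points, the voltages and frequencies in them, and the selection rule (slowest point whose slowdown does not exceed ∂) are all hypothetical; the talk only states that a lookup table is indexed.

```python
from bisect import bisect_right

# Hypothetical operating points, sorted by slowdown relative to full speed.
# Each tuple: (slowdown, Vdd in volts, clock in MHz). Values are made up.
OPERATING_POINTS = [
    (0.00, 3.3, 60.0),
    (0.25, 2.8, 48.0),
    (0.50, 2.5, 40.0),
    (1.00, 2.0, 30.0),
]
SLOWDOWNS = [point[0] for point in OPERATING_POINTS]


def on_phase_change(stride: int, ninstr: int, target: float = 0.01):
    """Handler sketch: compute the slowdown factor, then index the table."""
    if ninstr <= 0:
        raise ValueError("NINSTR must be positive")
    delta = target * (stride + ninstr) / ninstr
    # Assumed rule: pick the slowest point whose slowdown does not exceed delta.
    index = max(bisect_right(SLOWDOWNS, delta) - 1, 0)
    return OPERATING_POINTS[index]


print(on_phase_change(stride=100, ninstr=10))   # (0.0, 3.3, 60.0)
print(on_phase_change(stride=2000, ninstr=10))  # (1.0, 2.0, 30.0)
```

Because the handler runs only on a phase change, a short table search in software costs little compared with the voltage scaling penalty itself.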
Summary & Future Work
PAU: hardware identifies program regions (loops) with compute / memory stall mismatch
Due to the nature of most programs, even a single-entry PAU is effective: it can achieve 27% energy savings with only 0.75% performance degradation
Proposed extensions to PAU architecture
Future work
  Evaluations with smaller miss penalties
  Implementation of proposed extensions
  More extensive evaluation of implications of applications
Questions