![Page 1: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/1.jpg)
Power Management
Lecture notes S. Yalamanchili and S. Mukhopadhyay
![Page 2: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/2.jpg)
(2)
Technology Scaling
• 30% scaling down in dimensions → doubles transistor density
• Power per transistor: Vdd scaling → lower power
• Transistor delay = Cgate · Vdd / ISAT: Cgate and Vdd scaling → lower delay
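[Figure: MOSFET cross-section and top view with the gate, source, drain, and body labeled, along with the gate-oxide thickness tox and the channel length L]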
$P = \alpha C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$
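To make the three terms of the power equation concrete, here is a minimal sketch that evaluates them separately. The function name `total_power`, the reading of I_st as a short-circuit current, and every numeric value are assumptions chosen for illustration; none of them come from the slides.

```python
def total_power(alpha, c_farads, vdd, freq_hz, i_st, i_leak):
    """Return the (dynamic, short-circuit, leakage) power terms in watts."""
    dynamic = alpha * c_farads * vdd ** 2 * freq_hz  # alpha * C * Vdd^2 * f
    short_circuit = vdd * i_st                       # Vdd * I_st
    leakage = vdd * i_leak                           # Vdd * I_leak
    return dynamic, short_circuit, leakage

# Hypothetical operating point, then the same design at 80% of the voltage.
nominal = total_power(0.1, 1e-9, 1.0, 2e9, 0.01, 0.05)
scaled = total_power(0.1, 1e-9, 0.8, 2e9, 0.01, 0.05)

print("nominal total (W):", sum(nominal))
print("dynamic power ratio after Vdd scaling:", scaled[0] / nominal[0])
```

Because the dynamic term depends on the square of Vdd, lowering the voltage to 0.8x cuts that term to roughly 0.64x, while the other two terms fall only linearly in this simple model.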
![Page 3: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/3.jpg)
(3)
Moore’s Law
From wikipedia.org
• Performance scaled with number of transistors
• Dennard scaling*: power scaled with feature size
Goal: Sustain Performance Scaling
*R. Dennard, et al., “Design of ion-implanted MOSFETs with very small physical dimensions,” IEEE Journal of Solid State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct. 1974.
![Page 4: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/4.jpg)
(4)
Parallelism and Power
IBM Power5 (Source: IBM)
AMD Trinity (Source: forwardthinking.pcmag.com)
• How much of the chip area is devoted to compute?
• Run many cores slower. Why does this reduce power?
![Page 5: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/5.jpg)
(5)
The Power Wall
• Power per transistor scales with frequency but also scales with Vdd
  - Lower Vdd can be compensated for with increased pipelining to keep throughput constant
  - Power per transistor is not the same as power per area → power density is the problem!
  - Multiple units can be run at lower frequencies to keep throughput constant, while saving power
$P = \alpha C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$
![Page 6: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/6.jpg)
(6)
What is the Problem?
• Based on scaling using Pentium-class cores
• While Moore’s Law continues, scaling phenomena have changed
• Power densities are increasing with each generation
Source: Mukhopadhyay and Yalamanchili (2009)
![Page 7: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/7.jpg)
(7)
ITRS Roadmap for Logic Devices
From: “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” P. Kogge et al., 2008
![Page 8: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/8.jpg)
Power Management Basics
![Page 9: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/9.jpg)
(9)
What are my Options?
1. Better technology
  - Manufacturing
  - Better devices (FinFET)
  - New devices, non-CMOS? This is the future
2. Be more efficient – activity management
  - Clock gating – dynamic energy/power
  - Power gating – static energy/power
  - Power state management – both
3. Improved architecture
  - Simpler pipelines
4. Parallelism
Not this course
![Page 10: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/10.jpg)
(10)
Activity Management
Clock Gating
• Turn off clock to a block of logic
• Eliminate unnecessary transitions/activity
• Reduces clock distribution power
[Figure: clock gating circuit with clk, cond, and input signals feeding combinational logic]
Power Gating
• Turn off power to a block of logic, e.g., core
• No leakage
[Figure: Core 0 and Core 1 connected to Vdd through a power gate transistor]
![Page 11: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/11.jpg)
(11)
Multiple Voltage Frequency Domains
Intel Sandy Bridge Processor
• Cores and ring in one DVFS domain
• Graphics unit in another DVFS domain
• Cores and portion of cache can be gated off
From E. Rotem et al., HotChips 2011
![Page 12: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/12.jpg)
(12)
Processor Power States
• Performance States – P-states
  - Operate at different voltage/frequencies
    o Recall delay-voltage relationship
  - Lower voltage → lower leakage
  - Lower frequency → lower power (not the same as energy!)
  - Lower frequency → longer execution time
• Idle States – C-states
  - Sleep states
  - Differ in how much state is saved
• SW or HW managed transitions between states!
![Page 13: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/13.jpg)
(13)
Example of P-states
• Software Managed Power States
• Changing Power States is not free
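AMD Trinity A10-5800 APU: 100W TDP

| Category | CPU P-state | Voltage (V) | Freq (MHz) |
|---|---|---|---|
| HW Only (Boost) | Pb0 | 1 | 2400 |
| HW Only (Boost) | Pb1 | 0.875 | 1800 |
| SW-Visible | P0 | 0.825 | 1600 |
| SW-Visible | P1 | 0.812 | 1400 |
| SW-Visible | P2 | 0.787 | 1300 |
| SW-Visible | P3 | 0.762 | 1100 |
| SW-Visible | P4 | 0.75 | 900 |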
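One rough way to compare these P-states is to apply only the dynamic term of the power equation, which scales with V²f. The sketch below is illustrative: it ignores static and short-circuit power, assumes a fixed amount of work measured in cycles, and normalizes everything to Pb0. The names `P_STATES` and `relative_metrics` are assumptions, not part of the slides.

```python
# Voltage (V) and frequency (MHz) pairs copied from the P-state values above.
P_STATES = {
    "Pb0": (1.000, 2400),
    "Pb1": (0.875, 1800),
    "P0": (0.825, 1600),
    "P1": (0.812, 1400),
    "P2": (0.787, 1300),
    "P3": (0.762, 1100),
    "P4": (0.750, 900),
}

def relative_metrics(state, baseline="Pb0"):
    """Return (power, time, energy) of `state` relative to `baseline`.

    Assumes dynamic power only (P ~ V^2 * f) and a fixed cycle count,
    so time ~ 1/f and energy = power * time ~ V^2.
    """
    v, f = P_STATES[state]
    v0, f0 = P_STATES[baseline]
    power = (v / v0) ** 2 * (f / f0)
    time = f0 / f
    return power, time, power * time

for name in P_STATES:
    p, t, e = relative_metrics(name)
    print(f"{name}: power x{p:.2f}, time x{t:.2f}, energy x{e:.2f}")
```

Under these assumptions the lowest state draws far less power but takes proportionally longer, so the energy for a fixed task falls only with V². This is one way to see why lower power is not the same as lower energy.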
![Page 14: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/14.jpg)
(14)
Example of P-states
From: http://www.intel.com/content/www/us/en/processors/core/2nd-gen-core-family-mobile-vol-1-datasheet.html
![Page 15: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/15.jpg)
(15)
Management Knobs
• Each core can be in any one of multiple states
• How do I decide which state to put each core in? Who decides? HW? SW?
• How do I decide when I can turn off a core?
• What am I saving? Static energy or dynamic energy?
![Page 16: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/16.jpg)
(16)
Power Management
• Software controlled power management
  - Optimize power and/or energy
  - Orchestrated by the operating system or application libraries
  - Industry standard interfaces for power management
    o Advanced Configuration and Power Interface (ACPI): https://www.acpica.org/ http://www.acpi.info/
• Hardware power management
  - Optimized power/energy
  - Failsafe operation, e.g., protect against thermal emergencies
![Page 17: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/17.jpg)
(17)
Power Management
[Figure: die temperature vs. time and performance (instructions/cycle) vs. time. The gap between the current die temperature and the max die temp is the thermal headroom. CPU DVFS states are split into HW-only boost states (Pb0, Pb1) and SW-visible states (P0, P1, P2, ..., Pmin).]
• Convert thermal headroom to higher performance through boost
• Performance and energy efficiency depend on effective utilization of power and thermal headroom
![Page 18: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/18.jpg)
(18)
Boosting
• Exploit package physics
  - Temperature changes on the order of milliseconds
• Use the thermal headroom
[Figure: Intel Sandy Bridge power vs. time. A low-power period builds up thermal credits, followed by a turbo boost region between TDP power and max power that lasts 10s of seconds.]
![Page 19: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/19.jpg)
(19)
Power Gating
Intel Sandy Bridge Processor
• Turn off components that are not being used
  - Lose all state information
• Costs of powering down
• Costs of powering up
• Smart shutdown
  - Models to guide decisions
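One simple model that could guide a shutdown decision is a break-even test: gating pays off only if the leakage energy saved during the idle period exceeds the energy spent powering down and back up. The sketch below is an assumption-laden illustration, not the model used by any particular processor. The helper names and all numbers are hypothetical, and wake-up latency and lost state are ignored.

```python
def break_even_idle_s(e_down_j, e_up_j, leakage_w):
    """Idle time (s) beyond which power gating saves energy in this model."""
    return (e_down_j + e_up_j) / leakage_w

def should_gate(predicted_idle_s, e_down_j, e_up_j, leakage_w):
    """Gate only if the predicted idle period exceeds the break-even time."""
    return predicted_idle_s > break_even_idle_s(e_down_j, e_up_j, leakage_w)

# Hypothetical costs: 2 uJ to power down, 3 uJ to power up, 0.5 W of leakage.
threshold = break_even_idle_s(2e-6, 3e-6, 0.5)
print(f"break-even idle time: {threshold * 1e6:.1f} us")
print("gate for 50 us idle:", should_gate(50e-6, 2e-6, 3e-6, 0.5))
print("gate for 1 us idle:", should_gate(1e-6, 2e-6, 3e-6, 0.5))
```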
![Page 20: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/20.jpg)
(20)
Parallelism
• Concurrency + lower frequency → greater energy efficiency
$P = \alpha C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$
[Figure: several cores, each paired with its own cache]
Example:
• 4X #cores
• 0.75x voltage
• 0.5x frequency
• 1X power
• 2X in performance
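The example can be checked against the dynamic term of the power equation. This is a minimal sketch, assuming dynamic power dominates (power scales as cores × V² × f) and that throughput scales as cores × f for a perfectly parallel workload. `scale_design` is a hypothetical helper.

```python
def scale_design(cores, voltage, frequency):
    """Relative (power, performance) for scaling factors vs. a 1-core baseline.

    Assumes dynamic power only and perfectly parallel work.
    """
    power = cores * voltage ** 2 * frequency
    performance = cores * frequency
    return power, performance

power, perf = scale_design(cores=4, voltage=0.75, frequency=0.5)
print(f"power x{power:.3f}, performance x{perf:.1f}")
# power x1.125, performance x2.0
```

Under these assumptions the power comes out at about 1.1x, which the slide rounds to 1X, for twice the performance.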
![Page 21: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/21.jpg)
(21)
Simplify Core Design
AMD Bulldozer Core
ARM A7 Core (arm.com)
• Support for branch prediction, schedulers, etc. consumes more energy per instruction
• Can fit many more simpler cores on a die
![Page 22: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/22.jpg)
(22)
Metrics
• Power efficiency
  - MIPS/watt
  - Ops/watt
• Energy efficiency
  - Joules/instruction
  - Joules/op
• Composite
  - Energy-delay product
  - Energy-delay² product
  - Why are these useful?
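As a hedged illustration of how the composite metrics behave, the sketch below computes the energy-delay product and the energy-delay² product for two made-up design points. The design names and numbers are hypothetical. The point is only that the two metrics can rank the same designs differently, because the second weights delay more heavily.

```python
def energy_delay(energy_j, delay_s, delay_exponent=1):
    """Energy-delay product; use delay_exponent=2 for energy-delay^2."""
    return energy_j * delay_s ** delay_exponent

# Hypothetical design points: (energy in joules, delay in seconds).
designs = {"fast": (2.0, 1.0), "slow": (1.2, 1.5)}

for name, (energy, delay) in designs.items():
    edp = energy_delay(energy, delay)
    ed2p = energy_delay(energy, delay, delay_exponent=2)
    print(f"{name}: EDP = {edp:.2f}, ED2P = {ed2p:.2f}")
```

With these numbers the slower design has the lower energy-delay product, but the faster design wins on energy-delay².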
![Page 23: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/23.jpg)
Modeling
![Page 24: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/24.jpg)
(24)
Microarchitectural Level Models
• How can we study power consumption without building circuits? Use models
• Models are available at multiple levels of abstraction.
  - We are interested in microarchitectural models
![Page 25: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/25.jpg)
(25)
Processor Microarchitecture
[Figure: processor microarchitecture block diagram. Components: Instruction Cache, Instruction TLB, Branch Prediction, Fetch Queue, Instruction Decoder, Instruction Queue, Register Files, ALU, MUL, FPU, LD, ST, L1 Data Cache, Data TLB, L2 Data Cache, NoC Router, On-Chip Network. Stage labels: Fetch, Decode, Execute/Writeback, Memory, Network.]
![Page 26: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/26.jpg)
(26)
Energy/Power Calculation
• How do we calculate energy or power dissipation for a given microarchitecture?
• Energy/Power varies between:
  - Different ISAs: ARM vs Intel x86
  - Different microarchitectures: in-order vs out-of-order
  - Different applications: memory-bound vs compute-bound
  - Different technologies: 90nm vs 22nm technology
  - Different operating conditions: frequency, temperature
![Page 27: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/27.jpg)
(27)
Architecture Activity (1)
[Figure: processor microarchitecture block diagram, same components as on slide 25]
Activity 1: Instruction Fetch
icache.read++; fbuffer.write++;
• Collect activity counts of each architecture component (through simulation or measurement).
• List of components differs between microarchitectures.
• Activity counts at each component differ between applications.
![Page 28: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/28.jpg)
(28)
Architecture Activity (2)
[Figure: processor microarchitecture block diagram, same components as on slide 25]
Activity 2: Instruction Decode
fbuffer.read++; idecoder.logic++;
• Read/write accesses to caches, buffers, etc.
• Logical accesses to logic blocks such as decoder, ALUs, etc.
• Tradeoff of differentiating more access types (accuracy) vs simulation speed (complexity).
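The counter updates shown for fetch and decode can be sketched as a small activity-tracking structure. This is an illustrative assumption about how a simulator might collect counts, not the interface of any specific tool. The class `ActivityCounters` and the functions `fetch` and `decode` are hypothetical names.

```python
from collections import Counter

class ActivityCounters:
    """Tally (component, access_type) events emitted by a simulator."""

    def __init__(self):
        self.counts = Counter()

    def record(self, component, access):
        self.counts[(component, access)] += 1

def fetch(counters):
    # Activity 1: instruction fetch reads the I-cache and writes the fetch buffer.
    counters.record("icache", "read")
    counters.record("fbuffer", "write")

def decode(counters):
    # Activity 2: instruction decode reads the fetch buffer and uses the decoder logic.
    counters.record("fbuffer", "read")
    counters.record("idecoder", "logic")

counters = ActivityCounters()
for _ in range(3):  # pretend three instructions flow through fetch and decode
    fetch(counters)
    decode(counters)

for (component, access), n in sorted(counters.counts.items()):
    print(f"{component}.{access} = {n}")
```

Keying the counts on both component and access type keeps the accuracy versus speed tradeoff visible: distinguishing more access types means more keys to update on every simulated event.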
![Page 29: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/29.jpg)
(29)
Power and Architecture Activity
• For example, at the nth clock cycle, the collected counters for the data cache are:
    o read = 20, write = 12
    o per-read energy = 0.5 nJ; per-write energy = 0.6 nJ
    o Read energy = read × per-read energy = 10 nJ
    o Write energy = write × per-write energy = 7.2 nJ
    o Total activity energy = read energy + write energy = 17.2 nJ
    o If n = 50 clock cycles and clock frequency = 2 GHz, total activity power = energy × clock_freq / n = 688 mW
Note: n/clock_freq is the duration of n clock periods in seconds; power is the time average of energy.
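The arithmetic above can be reproduced with a few lines of code. This is a minimal sketch using the slide's numbers. The function name `activity_power` and the dictionary layout are assumptions made for illustration.

```python
def activity_power(counts, per_access_energy_nj, cycles, clock_hz):
    """Return (average power in W, total energy in nJ) for the given counts."""
    energy_nj = sum(counts[k] * per_access_energy_nj[k] for k in counts)
    elapsed_s = cycles / clock_hz  # n clock periods in seconds
    return energy_nj * 1e-9 / elapsed_s, energy_nj

power_w, energy_nj = activity_power(
    counts={"read": 20, "write": 12},
    per_access_energy_nj={"read": 0.5, "write": 0.6},
    cycles=50,
    clock_hz=2e9,
)
print(f"energy = {energy_nj:.1f} nJ, power = {power_w * 1e3:.0f} mW")
# energy = 17.2 nJ, power = 688 mW
```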
![Page 30: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/30.jpg)
(30)
Things to consider (1)
1. How do we calculate per-read/write energies?
• Per-access energies can be estimated from circuit-level designs and analyses.
• There are various open-source tools for this.
[Figure: an architecture specification and technology parameters feed a circuit-level estimation tool, which produces estimation results: area, energy, timing, etc.]
![Page 31: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/31.jpg)
(31)
Things to consider (2)
2. Is per-access energy always the same?
• Per-access energy in fact depends on:
  - how many bits are switching
  - how they are switching (0→1 or 1→0)
• It is reasonable to assume constant per-access energy over long-term observation (e.g., n = 1M clock cycles), where the number of switching bits averages out (e.g., 50% of bits are switching).
• Most architecture simulators do not capture bit-level details due to simulation complexity.
![Page 32: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/32.jpg)
(32)
Things to consider (3)
3. If a register file didn’t have read/write accesses but held data, what is the energy dissipation?
• Energy (or power) dissipation consists mainly of dynamic and static components.
• Dynamic (or switching) energy refers to energy dissipation due to switching activities.
• Static (or leakage) energy is dissipated simply to keep the circuit powered on.
• In this case, the register file has no dynamic energy dissipation but consumes static energy.
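The idle register file case can be sketched by splitting energy into a dynamic part driven by access counts and a static part driven by elapsed time. The leakage power and per-access energy below are made-up values for illustration, and `component_energy_nj` is a hypothetical helper.

```python
def component_energy_nj(accesses, per_access_nj, leakage_mw, cycles, clock_hz):
    """Return (dynamic, static) energy in nJ over the observation window."""
    elapsed_s = cycles / clock_hz
    dynamic_nj = accesses * per_access_nj            # zero when there are no accesses
    static_nj = leakage_mw * 1e-3 * elapsed_s * 1e9  # leakage power * time, in nJ
    return dynamic_nj, static_nj

# Idle register file: no accesses for 1M cycles at 2 GHz, assumed 10 mW of leakage.
dynamic_nj, static_nj = component_energy_nj(
    accesses=0, per_access_nj=0.2, leakage_mw=10.0, cycles=1_000_000, clock_hz=2e9
)
print(f"dynamic = {dynamic_nj:.1f} nJ, static = {static_nj:.1f} nJ")
```

Even with zero accesses the static term keeps growing with time, and power gating targets exactly that term.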
![Page 33: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/33.jpg)
Thermal Issues
![Page 34: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/34.jpg)
(34)
Thermal Issues
• Heat can cause damage to the chip
  - Need failsafe operation
• Thermal fields change the physical characteristics
  - Leakage current, and therefore power, increases
  - Delay increases
  - Device degradation becomes worse
• Cooling solution determines the permitted power dissipation
![Page 35: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/35.jpg)
(35)
Thermal Design Power (TDP)
• This is the maximum power at which the part is designed to operate
  - Dictates the design of the cooling system
    o Max temperature: Tjmax
  - Typically fixed by worst-case workload
• Parts are typically operating below the TDP
• Opportunities for turbo mode?
AMD Trinity APU
http://ecs.vancouver.wsu.edu/thermofluids-research
![Page 36: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/36.jpg)
(36)
Heat Sink Limits on Performance
• Thermal design power (TDP)
  - Determines the cooling solution & package limits
• Performance depends on effective utilization of this thermal headroom
• Convert thermal headroom to higher performance through boosting
[Figure: workload power (boost power vs. TDP power), die temperature (thermal headroom below the max die temp), and instructions/cycle plotted against time, with HW boost states above the SW-visible states. Heat sink image: www.legitreviews.com]
![Page 37: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/37.jpg)
(37)
Trinity TDP
Source: http://www.anandtech.com/show/6347/amd-a10-5800k-a8-5600k-review-trinity-on-the-desktop-part-2
![Page 38: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/38.jpg)
(38)
Issues
• Cooling chips is now an issue for computer architects!
• Co-design the cooling system and the processor
• Some very “cool” new technologies, e.g., microfluidics!
![Page 39: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/39.jpg)
(39)
Electrical and Fluidic I/Os
• Fluid flow through the microchannels carries heat out to an external heat exchanger (e.g., heat sink)
Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE)
![Page 40: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/40.jpg)
(40)
Fabrication Examples
Electrical and fluidic microbumps, fluidic vias and fine wires
Micropin-fins (150 µm diameter and 225 µm diameter) and vias
Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE)
![Page 41: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/41.jpg)
(41)
Conclusions
• Power/energy is the leading driver of modern architecture design
• Power and energy management is key to scalability
• Need integrated power/energy, performance, thermal management in fielded systems
• What about energy/power efficient algorithms?
![Page 42: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/42.jpg)
(42)
Study Guide
• Explain the difference between energy dissipation and power dissipation
• Distinguish between static power dissipation and dynamic power dissipation
• Explain dynamic voltage frequency scaling
  - What are power states?
  - Why is this an advantage?
  - What is the impact of DVFS on i) energy, ii) execution time, and iii) power?
• Distinguish between clock gating and power gating
![Page 43: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/43.jpg)
(43)
Study Guide (cont.)
• Define thermal design power (TDP)
• Name two schemes for preventing the chip from exceeding TDP. Explain how they achieve this goal
• What does boosting achieve?
• What is the difference between C-states and P-states?
• Name one power management technique that will save static power.
• How does using many slower simpler cores improve power efficiency?
![Page 44: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/44.jpg)
(44)
Study Guide (cont.)
• How is thermal design power (TDP) calculated?
• When using boost algorithms, what determines the duration of the high frequency operation?
• How does a power virus work?
• Describe how throttling works
• Know the power dissipation in some modern processor-memory systems drawn from the embedded, server, and high performance computing segments
![Page 45: Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay](https://reader030.vdocuments.us/reader030/viewer/2022013004/56649de55503460f94adceb1/html5/thumbnails/45.jpg)
(45)
Glossary
• Boosting
• C-states
• Dynamic Power and Energy
• Power Gating
• P-states
• Static Power and Energy
• Time constant
• Thermal Design Power (TDP)
• Throttling