
Power Management for Wireless SOCs ©2001 R. Gupta, ASP-DAC’01

Power Management Considerations for Networked Devices

Rajesh K. Gupta

Center for Embedded Computer Systems
University of California, Irvine
Irvine, CA 92612
[email protected]

with contributions courtesy: M. Srivastava, UCLA, Nikil Dutt, UC Irvine

Outline

- Wireless networked system-on-chip design: I & II
-> Power management in networked SOCs
  - Energy consumption characteristics in networked SOCs
  - Power metrics
  - Power management strategies
    - Circuit-level strategies (not covered here)
    - Architectural and protocol strategies
    - Software strategies
      - compiler techniques
    - OS strategies
      - energy efficiency via shutdown, voltage scaling
  - Power management in 802.11 and Bluetooth
- Design tools for networked system-on-chip

Portability Considerations

- Networked SOCs are finding use far beyond traditional desktop machines
  - Pocket computers, PDAs, wireless pads, wireless sensors, pagers, cell phones
  - Efficient power use is crucial to portability, reliability and thermal management
- Energy and power usage of these devices is markedly different from laptop and notebook computers
  - much wider dynamic range of power demand
  - increasing share of memory, communication and signal processing subsystems (as opposed to disk storage, displays)
  - multiple power use modalities depending upon application:
    - "immortal", "paging-mode RX", "lifeline TX", "mission mode"
-> Design of power-aware higher layer applications & protocols

Where does the Power Go?

[Figure: block diagram of where power goes in a networked device.
 Power supply: battery, DC-DC converter.
 Communication: radio modem, RF transceiver, baseband DSP (signaling protocols, choice of modulation, TX/RX architecture, RF/IF circuits).
 Processing: programmable µPs & DSPs (apps, protocols etc.), memory, ASICs.
 Peripherals: disk, display.]

Example: Power Consumption for a Computer with Wireless NIC

Component       Share of power
Display         36%
CPU/Memory      21%
Wireless LAN    18%
Hard Drive      18%
Other            7%

Example 1: Power Measurements on Rockwell WINS Node

Capabilities: vibration, acoustic, accelerometer, magnetometer, temperature sensing

Processor  Seismic Sensor  Radio            Power (mW)
Active     On              Rx                751.6
Active     On              Idle              727.5
Active     On              Sleep             416.3
Active     On              Removed           383.3
Active     Removed         Removed           360.0
Active     On              Tx (36.3 mW)     1080.5
                           Tx (27.5 mW)     1033.3
                           Tx (19.1 mW)      986.0
                           Tx (13.8 mW)      942.6
                           Tx (10.0 mW)      910.9
                           Tx (3.47 mW)      815.5
                           Tx (2.51 mW)      807.5
                           Tx (1.78 mW)      799.5
                           Tx (1.32 mW)      791.5
                           Tx (0.955 mW)     787.5
                           Tx (0.437 mW)     775.5
                           Tx (0.302 mW)     773.9
                           Tx (0.229 mW)     772.7
                           Tx (0.158 mW)     771.5
                           Tx (0.117 mW)     771.1

Summary
- Processor = 360 mW
  - doing repeated transmit/receive
- Sensor = 23 mW
- Processor : Tx = 1 : 2
- Processor : Rx = 1 : 1
- Total Tx : Rx = 4 : 3 at maximum range

[Figure: node block diagram. Communication subsystem: radio modem, GPS, microcontroller. Rest of the node: CPU, sensor.]
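The summary ratios for the WINS node follow from subtracting rows of the measurement table. The sketch below redoes that arithmetic; the dictionary keys and function name are illustrative choices, and the only data used are the table values themselves. The results are approximate, which is why the slide quotes rounded ratios.

```python
# Total node power in mW, copied from the measurement table.
MEASURED_MW = {
    "rx": 751.6,            # processor active, sensor on, radio receiving
    "tx_max": 1080.5,       # processor active, sensor on, radio Tx at 36.3 mW
    "radio_removed": 383.3, # processor active, sensor on, radio removed
    "cpu_only": 360.0,      # processor active, sensor and radio removed
}


def wins_breakdown(m):
    """Attribute power to components by differencing table rows."""
    processor = m["cpu_only"]
    sensor = m["radio_removed"] - m["cpu_only"]
    radio_rx = m["rx"] - m["radio_removed"]
    radio_tx = m["tx_max"] - m["radio_removed"]
    return {
        "processor_mw": processor,
        "sensor_mw": sensor,
        "radio_rx_mw": radio_rx,
        "radio_tx_mw": radio_tx,
        "tx_to_processor": radio_tx / processor,      # about 2, i.e. 1 : 2
        "rx_to_processor": radio_rx / processor,      # about 1, i.e. 1 : 1
        "total_tx_to_rx": m["tx_max"] / m["rx"],      # roughly 4 : 3
    }


if __name__ == "__main__":
    for name, value in wins_breakdown(MEASURED_MW).items():
        print(f"{name}: {value:.2f}")
```

The takeaway matches the slide: the radio costs about as much as the processor when receiving and about twice as much when transmitting at full power.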

Example 2: Power Consumption for Compaq WRL's Itsy Computer

- System power < 1 W
  - doing nothing (processor 95% idle)
    - 107 mW @ 206 MHz
    - 77 mW @ 59 MHz
    - 62 mW @ 59 MHz, low voltage
  - MPEG-1 with audio
    - 850 mW @ 206 MHz (16% idle)
  - Dictation
    - 775 mW @ 206 MHz (< 0.5% idle)
  - text-to-speech
    - 420 mW @ 206 MHz (53% idle)
    - 365 mW @ 74 MHz, low voltage (< 0.5% idle)
- Processor: 200 mW
  - 42-50% of typical total
- LCD: 30-38 mW
  - 15% of typical total
    - 30-40% in notebooks

Itsy v1
- StrongARM 1100, 59-206 MHz (300 us to switch)
- 2 core voltages (1.5 V, 1.23 V)
- 64M DRAM / 32M FLASH
- Touchscreen & 320x200 LCD
- codec, microphone & speaker
- serial, IrDA

Example 3: Power Consumption for Berkeley's InfoPad Terminal

With Optional Video Display, Total = 9.6 W (with processor at 7% duty cycle)

Component       Share of power
Video Display   40%
DC/DC           25%
Wireless        18%
Misc             7%
LCD              6%
µProc.           6%
I/O              1%

Without Optional Video Display, Total = 6.8 W (with processor at 7% duty cycle)

Component       Share of power
DC/DC           42%
Wireless        29%
Misc            11%
LCD             10%
µProc.           6%
I/O              2%

Power Consumption in Wireless SOCs

- SOCs with Radios
- There are two components of power
  - Instantaneous Power Consumption
    - directly affected by transceiver architecture and RF circuit design
    - a wide variation in power efficiency of RF front-ends
      - paging receivers (930 MHz carrier) with 1 uV signal detection can last for months on a single AAA cell
      - cell phones with essentially similar sensitivity characteristics are about 10X worse.
    - Not covered here.
  - Average Power Consumption
    - affected by communication protocols and power management strategies

Metrics for Power

- Absolute power (mW)
  - sets battery life in hours
  - problem: power ∝ frequency (slow the system!)
- uW/MHz
  - average energy consumed by the system per clock cycle
- Energy per operation
  - fixes obvious problem with the power metric
  - but can cheat by doing stuff that will slow the chip
  - Energy/op = Power * Delay/op
- Metric should capture both energy and performance: e.g. Energy/Op * Delay/Op
  - Energy*Delay = Power * (Delay/Op)^2
- Therefore:
  - uW/MIPS: average energy per instruction
  - uW/MIPS^2: normalizes uW/MIPS with the architectural performance; useful for comparing architectures for power efficiency.
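A short sketch of why absolute power alone is a misleading metric. The two designs and their numbers are hypothetical, chosen only to illustrate the formulas Energy/op = Power * Delay/op and Energy*Delay = Power * (Delay/Op)^2; the function names are likewise made up for this example.

```python
def energy_per_op_nj(power_mw, mips):
    """mW / MIPS = (1e-3 J/s) / (1e6 op/s) = 1e-9 J/op, i.e. nJ per operation."""
    return power_mw / mips


def energy_delay_product(power_mw, mips):
    """Energy/op * Delay/op = Power * (Delay/op)^2, in nJ * us per operation."""
    delay_per_op_us = 1.0 / mips
    return power_mw * delay_per_op_us ** 2


# Hypothetical designs: B draws a quarter of A's power but is five times slower.
designs = {
    "A": {"power_mw": 200.0, "mips": 100.0},
    "B": {"power_mw": 50.0, "mips": 20.0},
}

if __name__ == "__main__":
    for name, d in designs.items():
        e = energy_per_op_nj(d["power_mw"], d["mips"])
        edp = energy_delay_product(d["power_mw"], d["mips"])
        print(f"{name}: {d['power_mw']} mW, {e:.2f} nJ/op, EDP {edp:.4f}")
```

Design B looks better on absolute power, yet it spends more energy per operation and has a much worse energy-delay product, which is the point of the mW/MIPS and mW/MIPS^2 style metrics.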

Power Management

Evaluating a Power Management Strategy

- Know the power bottlenecks in the system
  - effect on system vs. component power consumption
- Be fair
  - compare to current, and not the worst, strategy!
- Maximize work done, and not battery life
  - e.g. impact of system slowdown on user efficiency
- Consider the effect on other components
  - e.g. disk and display will stay on longer if CPU speed is cut in half
- Battery capacity is not a constant
  - depends on the power consumption level and profile.
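The "effect on other components" point can be made concrete with a small calculation. This is a sketch under stated assumptions, not a measurement: the task is assumed to be CPU-bound, CPU power is assumed to scale linearly with clock frequency (no voltage change), and every other component is assumed to stay on at constant power for the whole task. The 21/79 split reuses the CPU/memory share from the wireless-NIC computer example as arbitrary power units.

```python
def task_energy(cpu_power, other_power, task_time_s, slowdown):
    """Energy to finish one task when the CPU clock is divided by `slowdown`.

    Assumes CPU power scales as 1/slowdown, run time scales as slowdown,
    and all other components stay on at constant power for the whole task.
    """
    scaled_cpu_power = cpu_power / slowdown
    scaled_time = task_time_s * slowdown
    return (scaled_cpu_power + other_power) * scaled_time


if __name__ == "__main__":
    full = task_energy(cpu_power=21.0, other_power=79.0, task_time_s=1.0, slowdown=1.0)
    half = task_energy(cpu_power=21.0, other_power=79.0, task_time_s=1.0, slowdown=2.0)
    print(f"full speed: {full:.1f} units, half speed: {half:.1f} units")
```

Under these assumptions the CPU's own energy is unchanged, but the display, disk and radio run twice as long, so the task costs more energy overall. Voltage scaling or shutting down other components changes the picture.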

Where to do the Power Management?

- Choices: H/W, Firmware, OS, Application, User
- Hardware & firmware
  - don't know the global state and lack application-specific knowledge
- Users
  - don't know component characteristics, and can't make frequent decisions
- Applications
  - operate independently
  - and the OS hides machine information from them
- OS is the most reasonable place, but...
  - OS should incorporate application information in power management
  - OS should expose power state and events to applications for them to adapt.

Wireless NES

- Components of power awareness in firmware design
  - signalling protocols, choice of modulation
    - e.g., On-Off Keying and ASK need only a threshold detector, whereas FSK needs a frequency discriminator (but is less susceptible to noise)
  - transceiver architecture
  - RF, IF analog circuits
  - Baseband DSP
    - e.g., max. a posteriori (MAP) estimation, iterative (turbo) channel estimation and coding can reduce TX power by 1/3 to 1/2
- Increased signal processing in BB enables flexibility in RF/IF design
  - e.g., increased noise figure or reduced TX power
- Higher level protocols (e.g., multiple access, link layer) can enable power hungry circuits to be active for as short a time as possible
  - e.g., MAC often includes some variant of TDMA
  - exploit asymmetry between basestation and mobile units (e.g., different duty cycles in paging receivers)

Power Management in Wireless NES

- Power modes: transmit, receive, idle, sleep, off
  - typically idle mode (ready but neither receiving nor transmitting) takes similar power as receive mode
  - transmit power in WLANs is 2x-3x of receive power
    - difference larger in WWANs (RF power dominates)
    - often the RF transmit power can be varied, and with it the NIC's transmit mode power, but at the cost of varying BER
  - transition times are significant
    - HP's HSDL-1001 IR transceiver takes 10 µs to enter sleep mode, and 40 µs to wake up
    - Wavelan takes about 100 ms to wake up
    - Metricom's Ricochet takes about 5 s to wake up
- Shutdown strategies similar to disks and CPUs
  - sleep ↔ wakeup transition times are much shorter than in disks
  - could be done by MAC protocols, e.g. 802.11
- Reduce load in NIC:
  - header compression, stop data transmission during bad channel...

Help from Upper Layers in Power Management of Wireless NES

- Minimizing idle time matters the most
  - other factors, such as the specific protocol, are secondary
- Transport: don't leave the receiver idle while there is congestion in the network
- Data scheduling: coordinate data delivery to receiver in bursts
- S/W control of NI for application-level optimizations

Architectural Strategies

- Gated-clocks and power-shutdown
  - stopping unused hardware
- More efficient algorithms and architectures
  - focus on power under a speed constraint
- Proper I/O interconnect design and packaging
  - include as much of system in a single package
  - recompute data rather than refetch from memory
  - use local memory / cache to minimize I/O
  - coding of data to minimize I/O bus transitions

[Figure: two panels, "Traditional Approach" and "Power-aware Approach". Each shows a node made of a communication subsystem (radio modem, GPS, microcontroller) and the rest of the node (CPU, sensor) handling a multihop packet; the power-aware panel is marked "zZZ".]

Energy Impact of Architecture: Shared-bus vs. Switched

- Router-based approach isolates I/O devices
  - reduces switching capacitance, frequency, voltage, and ultimately energy consumption
  - takes the main CPU out of the datapath
  - allows rapid, low power, LUT-based decision making up through the network layer

[Figure: two block diagrams. Shared-bus: CPU (I/O1), I/O2, I/O3, ... I/ON, Radio. Switched: Router, Link Ctrl (I/O1), I/O2, I/O3, ... I/ON, Packet Processor.]

Energy Efficient Protocol Processing

- Circuit-level low-power techniques address PHY layer implementation
- The problem is that for networked systems, high level protocols often implicitly assume continuous availability of lower layer (especially PHY layer) functions
- Effective power management requires power aware protocol design
  - e.g., 802.11 MAC can reduce power by allowing both PHY transmitter and receiver to be turned off without a station appearing as disconnected from LAN
- In the following we address:
  - Energy efficiency in MAC and Link layers
  - Energy efficient higher layer processing

Energy Efficient MAC

- Reduce time radio is in transmit mode
  - minimize random access collisions and consequent retransmissions
  - use polling, slot reservation
- Reduce time radio is in receive mode
  - minimize listening for packets to arrive
  - broadcast periodic "schedule" telling receivers when to wake up
- Reduce transmit-receive and on-off turn-around
  - maximize contiguous transmission slots from a radio
- Allow mobiles to voluntarily enter into sleep mode
- Reduce MAC signaling traffic

The MAC Level Perspective: Optimized use of PHY Layer

- Higher power consumption at higher symbol rates
  - power hungry equalizer is needed to combat ISI
  - modulation requiring higher Eb/N0
    - e.g. higher order QAM
- Power hungry DSP to combat impairments
  - channel coding
  - RAKE receiver
  - antenna array
- Proper use of PHY layer processing is critical
  - adapt according to conditions

The MAC Level Perspective: Optimized MAC Protocol

- Simple vs. complex protocols
  - computation + communication energy per "useful" bit
  - asymmetric protocols to keep mobile simple
- Frame structure
  - header overhead: e.g. using header compression
  - encoding header for lower energy decoding, and invoking costlier receiver functions on the frame body only if the frame is for the receiver
    - e.g. in HIPERLAN header is at lower rate (no equalizer)
- Adapting frame length

The MAC Level Perspective: Optimized MAC Protocol (contd.)

- Radio state management (active vs. sleep)
  - how to send packets to a receiver that sleeps
  - how to make sure not to miss packets
  - impact on higher layer protocols
- MAC-level error control
  - adapting FEC according to channel conditions
  - channel-state dependent scheduling
  - transmission channel probing during ARQ
    - channel state may be persistent
    - probe impaired channels via short low-power probes instead of blind retransmission of high-power data packets

The MAC Level Perspective: Optimized Network Design

- Cell size
  - reducing cell size means smaller transmit power
    - system capacity also goes up
  - but complexity of mobility management goes up
    - more hand-off events
- Centralized vs. ad hoc network architecture
  - networks with basestations
    - asymmetric processing with complexity kept at BS
    - but, make intra-cell communication less efficient
  - ad hoc networks
    - multi-hop takes less energy than single hop
    - intermediate nodes lose battery energy to other's traffic

Power Aware Routing & Channel Allocation

- Minimize transmit power of mobile nodes to increase lifetime of individual nodes and network
  - minimize total power of n/w, or max power by a node
  - Note: Least-power path != shortest path
- Conventional multihop routing protocols such as DSR, DSDV etc. are power-unaware
  - metrics used: shortest hop, shortest delay, link quality, location stability, message & time overhead
- Metrics to consider
  - Minimize energy consumed / packet
    - large dissipation at selected bottleneck nodes
  - Maximize time to network partition
    - important for sensor networks etc.
    - load balance across the nodes in the cut-set
    - difficult to implement
  - Minimize variance in node power levels
    - no single node is penalized
    - difficult to implement
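A minimal sketch of why the least-power path differs from the shortest path. It runs Dijkstra's algorithm twice over the same toy topology, once counting hops and once using a transmit-energy cost. The three-node graph, the link distances, and the energy model (cost proportional to distance squared) are assumptions made for illustration; they are not taken from the slides or from any particular routing protocol.

```python
import heapq


def cheapest_path(graph, src, dst, link_cost):
    """Dijkstra's algorithm. graph[u][v] is the link distance; link_cost maps it to a weight."""
    best = {src: 0.0}
    parent = {}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, distance in graph.get(node, {}).items():
            new_cost = cost + link_cost(distance)
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if dst not in best:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(parent[path[-1]])
    return path[::-1], best[dst]


# Hypothetical topology: A can reach C directly (100 m) or via B (50 m + 50 m).
GRAPH = {"A": {"B": 50.0, "C": 100.0}, "B": {"C": 50.0}, "C": {}}


def hop_count(distance):
    return 1.0


def tx_energy(distance):
    return distance ** 2  # assumed path-loss model, arbitrary energy units


if __name__ == "__main__":
    print("fewest hops:", cheapest_path(GRAPH, "A", "C", hop_count))
    print("least energy:", cheapest_path(GRAPH, "A", "C", tx_energy))
```

With hop count the direct link wins; with the energy cost the two-hop route is cheaper, at the price of draining node B for someone else's traffic, which is the bottleneck concern listed under the metrics.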

Energy Efficiency at The Transport Layer

- Reported 48-83% power savings in wireless NIC power with additional delays of 0.4-3.1 s
- Idea #1: data and header reduction
  - use compression to reduce communication time
  - communication vs. computation power
- Idea #2: shutdown
  - selectively choose short periods of time to suspend communications and shut down the wireless NIC
  - queue data for future delivery during shutdown
  - decide when to restart
  - trade-off between power consumption and delay

Why do Shutdown at The Transport Layer?

- Lots of challenges
  - a node can only guess when data is destined for it
    - no knowledge as in MAC-layer schemes
  - side effects on other hosts
    - sender may have buffer overflows
    - sender may waste power trying to communicate with sleeping node
  - balancing power savings against delay, and additional consumption in shutdown and restart
    - when to shutdown and when to restart
- But, one crucial advantage:
  - opportunity to incorporate application knowledge to balance power savings and data delay in an end-to-end fashion.

Energy Efficient I/O Encoding

- C of system busses is >> C inside chips
  - large amount of power goes to I/O interfaces
    - 10-15% in uPs, 25-50% in FPGAs, 50-80% in logic
  - encoding bus data can reduce the power significantly
    - but need to handle encoding/decoding cost (power, latency)
- Examples:
  - Gray code and T0 code on address busses
    - addresses usually increment sequentially by 1
  - Compression to remove redundancy
  - Bus-Invert Coding
    - transmit D or invert(D), whichever results in fewer transitions from the previous transmitted code
    - an extra signal indicates polarity
    - works better for small N (25% for N=2, 18.2% for N=8, 14.6% for N=16) ... use k sub-busses with k polarity bits!
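Bus-invert coding as described above can be sketched in a few lines. This is an illustrative software model, not a hardware description: the function names are made up, the bus is assumed to start at all zeros, and the polarity line is counted as one extra wire when tallying transitions.

```python
def bus_invert_encode(words, width):
    """Encode words for a `width`-bit bus; returns (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    previous = 0  # assumed initial bus state
    encoded = []
    for word in words:
        word &= mask
        flips = bin(previous ^ word).count("1")
        if 2 * flips > width:
            # More than half the lines would toggle: send the complement instead.
            bus, invert = (~word) & mask, 1
        else:
            bus, invert = word, 0
        encoded.append((bus, invert))
        previous = bus
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(bus ^ mask) if invert else bus for bus, invert in encoded]


def count_transitions(values):
    """Total number of bit toggles when `values` are driven in sequence from 0."""
    previous, total = 0, 0
    for value in values:
        total += bin(previous ^ value).count("1")
        previous = value
    return total


if __name__ == "__main__":
    width = 8
    data = [0x00, 0xFF, 0x0F, 0xF0]
    coded = bus_invert_encode(data, width)
    raw = count_transitions(data)
    # Treat the polarity line as bit `width` of a (width + 1)-bit bus.
    with_polarity = count_transitions([(inv << width) | bus for bus, inv in coded])
    print(f"transitions without coding: {raw}, with bus-invert: {with_polarity}")
```

The example data is deliberately adversarial; on random data the saving is smaller, in line with the percentages quoted on the slide, and the encoder and decoder themselves cost power and latency.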

Example: UCLA/RSC Protocol Suite

- For use in ad hoc sensor networks.
- A TDMA-like schedule for nodes with no neighbor awareness initially
- Synchronization among neighbors and channel assignment done together
  - Ad hoc network acquisition
    - "superframes" contain a number of invitation frames and the rest are for synchronized communication
    - connections built from node-to-node, node-to-network, network-to-network sequentially such that the fraction of invitation frames decreases
  - Routing protocol
    - build overlapping spanning trees outward from gateways, constructed by messages that keep track of hops they have moved through
    - nodes are hierarchically arranged: tree built from connections from one tier to only a lower level.


Motivation

- High performance increasingly means high power
  - general purpose processors reaching 60 W
  - energy efficiency of computation (mW/MIPS) increasing at a slower rate than application needs
- Portable systems are energy constrained
  - but increasingly need high performance to meet the application and communication needs

Will IC technology alone help?

Processor             MHz   Year   SPECint-95   Watts
P54VRT (Mobile)       150   1996    4.6         3.8
P55VRT (Mobile MMX)   233   1997    7.1         3.9
PowerPC 603e          300   1997    7.4         3.5
PowerPC 604e          350   1997   14.6         8
PowerPC 740 (G3)      300   1998   12.2         3.4
PowerPC 750 (G3)      300   1998   14           3.4
Mobile Celeron        333   1999   13.1         8.6

- Speed power efficiency has indeed gone up
  - 10x / 2.5 years for µPs and DSPs in 1990s
    - degraded before 90s
    - > 100 mW/MIPS to < 1 mW/MIPS since 1990
  - IC processes have provided 10x / 8 years since 1965
  - rest from power conscious IC design in recent years
- Lower power for a given function & performance
  - e.g. 1.6x / year reduction since early 80s for DSPs (source: TI)

But...

- Help from IC technology will slow down
  - e.g. circuit voltage reduction has provided big gains
    - used to be 5 V, now around 1.5-2 V
    - expected to plateau
- Big gains from low-power IC design tricks are behind us
- Strong indications of continued exponential increase in operating frequency and # of functions
  - we all want color displays, multimedia, wireless comm., speech recognition on our PDAs!
  - increase of 10x / 7 yrs in gates, 10x / 9 yrs in frequency
- Need dynamic control of power and performance
  - focus on techniques under software control

Low Power Software: Instruction Scheduling & Code Generation

- High-level operations (e.g. C statement) can be compiled into different instruction sequences
  - different instructions & ordering have different power
- One approach: code-optimizers can use power as a metric, just like speed and code size
- More effective: optimizations targeted at power [Tiwari94], e.g.
  - [Su94]: Cold scheduling attempts to reduce the number of transitions on the instruction bus
  - [Lee97]: exploits instruction packing, operand swapping, and circuit state effects
  - schedule memory accesses to reduce power
    - organize video and DSP data to maximize use of the higher levels (lower power) of the memory hierarchy

Software Power Optimizations

[Figure: system block diagram with CPU, cache, memory, ASIC and I/O]

- Code running on CPU
  - Code optimizations for power
- Code accessing memory objects
  - SW optimizations for memory
- Compiler-supported power management
  - Dynamic power/performance management

Energy and Power

- Physical Definitions

    Pavg = Iavg x Vcc
    E = Pavg x T
    T = N x t
    E = Iavg x Vcc x N x t

    Pavg : Average power
    Iavg : Average current
    Vcc  : Supply voltage
    E    : Energy consumption
    T    : Time taken
    N    : Number of cycles
    t    : Cycle time

Example:

  Sequence 1: Power = 1.15 W, Energy = 8.6 x 10^-8 J

    MOV DX, [BX]
    MOV AX, CX
    MOV AX, DX

  Sequence 2: Power = 0.99 W, Energy = 22.3 x 10^-8 J (14% less power, 158% more energy)

    NOP
    MOV DX, [BX]
    NOP
    NOP
    MOV AX, CX
    NOP
    NOP
    ADD AX, DX
    NOP

- Energy consumption determines battery life

[Source: Tiwari]
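The relationship between the two sequences can be checked directly from E = Pavg x T. The sketch below uses only the four numbers quoted in the example; the helper names are illustrative. Recomputing from the rounded figures gives an energy increase of about 159%, close to the 158% printed on the slide.

```python
def percent_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


def run_time_s(power_w, energy_j):
    """From E = Pavg x T, the run time is T = E / Pavg."""
    return energy_j / power_w


COMPACT = {"power_w": 1.15, "energy_j": 8.6e-8}   # sequence without NOPs
PADDED = {"power_w": 0.99, "energy_j": 22.3e-8}   # sequence padded with NOPs

if __name__ == "__main__":
    dp = percent_change(PADDED["power_w"], COMPACT["power_w"])
    de = percent_change(PADDED["energy_j"], COMPACT["energy_j"])
    t_compact = run_time_s(**COMPACT)
    t_padded = run_time_s(**PADDED)
    print(f"power change: {dp:.1f}%, energy change: {de:.1f}%")
    print(f"run time: {t_compact * 1e9:.0f} ns vs {t_padded * 1e9:.0f} ns")
```

The padded sequence draws less average power only because it takes roughly three times as long, so its energy, and therefore its drain on the battery, is higher.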

Compiler Flow (Front-End)

[Figure: compiler front-end flow.
 Overall flow labels: source code, frontend, IR (CDFG), backend, assembly.
 Front-end stages: program; lexical analysis, semantic analysis; high-level IR; analysis (data dependence; array, pointer, loop); parallelization (task-level, loop-level); high-level IR.
 Power optimization opportunities: memory/power (loop/array optimizations), memory subsystem, multi-processor/multi-threading.]

Compiler Flow (Middle)

[Figure: compiler middle-end flow.
 IR levels: high-level IR, intermediate IR, low-level IR.
 Transformations: lowering of complex expressions and array subscripts.
 Optimizations: tree height reduction, strength reduction, spill code optimization, instruction selection, scheduling, register allocation, software pipelining.
 Pre-scheduling optimizations: dead code removal, induction variable elimination, etc.
 Power optimization opportunities: memory/power (initial memory assignment, data-cache optimizations, loop blocking, skewing, etc.).]

Compiler Flow (Back-End)

[Figure: compiler back-end flow.
 Stages: low-level IR, code generation, object code.
 Post-scheduling optimizations: peephole optimizations, machine specific optimizations.
 Interprocedural: register allocation, call convention implementation.
 Power optimization opportunities: memory/power (I-cache optimizations, final memory assignment).]

Energy-efficient Software Generation

- Compiling for Speed is Good
  - Faster Program => Less Energy (with power mgmt)
  - Dual memory loads; Instruction packing (DSP)
    - Two on-chip memory banks
      - Dual load vs. two single loads
    - 1-cycle packed vs. 2-cycle unpacked
    - Almost 50% reduction in energy
  - Reorder instructions to reduce switching effects
    - Not much impact on large general purpose CPUs
    - Useful in DSPs (~15% benefit) [Lee et al., TVLSI, Dec '96]
  - Swapping multiplication operands (DSP)
    - Put operand with lower weight in B (up to 30% reduction)

[Figure: current (mA) vs. cycles (n, 2n) for 2 MOVs vs. 1 LAB, with levels 25.8 and 33.8; multiplier block diagram with inputs A and B, recoding logic, shift/add array and product. Source: Tiwari]

Instruction Scheduling and Reordering

- Power depends on switching activity, units accessed
- Power-driven scheduling
  - scheduling to reduce pipeline stalls
  - selecting a minimum-power instruction mix for executing an application
  - reducing switching on address/data lines
    - instruction reordering
      - pairs of instructions have different power consumptions
    - operand swapping
  - low-power instruction sets
  - instruction packing
  - shut down unused units

Register Optimizations

- Register file size is increasing for newer processors
  - power consumption of registers is an important component of the overall power
  - still, power per access is less than for cache or memory
    - tradeoffs between register file and cache/memory
  - special techniques for optimizing register file power consumption
- More aggressive register allocation techniques
  - better utilization of registers
- Techniques for minimizing switching activity

Register Optimizations

- Software Energy Optimization [Tiwari, J. VLSI Signal Proc., Aug '96]
  - Reduce Memory Accesses, Make better use of Registers
    - Data for i486
    - register access = 300 mA/cycle
    - memory read (cache hit) = 430 mA/cycle
    - memory write (write-through cache) = 530 mA/cycle
  - can be achieved e.g. by saving the least amount of context during function calls (compiler policies)
  - better utilization of registers
    - optimal register allocation of temporaries
    - global register allocation for the most used variables
      - use register operands as opposed to memory operands

Memory Hierarchy Optimizations

- Memory: generally the most power hungry subsystem
- Study power consumption across the entire hierarchy
- Optimizations for cache
  - Code transformation to improve cache hit rates
    - reordering of memory accesses
    - allocation, blocking and copying of data
  - Partition memory into cached and direct-mapped (scratch-pad) spaces

Memory Hierarchy Optimizations (Cont'd)

- Exploit data locality
  - reducing memory accesses by coalescing narrow memory references into wide ones
  - data transfer and placement
- Memory subsystem
  - memory bank assignment for low energy
- Reduce memory accesses
  - better utilization of registers
    - more aggressive register allocation
    - allocate local and global variables to registers

Memory Hierarchy Optimizations

- Influence of Compiler Optimizations on System Power [Kandemir, DAC2000]
  - linear loop transformations
    - loop interchange: improve data locality
    - some loop transformations may increase the power consumed in the datapath
  - loop tiling (blocking)
    - decrease power in memory, but may increase it in the core
  - loop unrolling
    - fewer memory accesses
    - reduction in power consumed in register file and data buses
  - loop fusion
    - reduce memory power consumption
  - loop fission
    - can increase memory power consumption
  - scalar expansion
    - power increase in core and memory system

Compiler-Controlled Power Mgt

- Understanding power profile is key
  - focus on major sources of power consumption
  - manage power
- Possible approaches
  - reduce average power consumption
  - modulate power consumption
    - over time
    - over resources
  - Examples: Transmeta, UCI COPPER

The COPPER Project

- Compiler-controlled Power-Performance Management
- Develop efficient architectural support and compiler techniques for power management
  - continuously, as an application runs
  - targeted for high performance/VLIW machines
- Coordinated management of multiple techniques
  - reduction in power with little or no loss of performance.
- Develop techniques for dynamic compilation to actively trade off performance and power consumption
- Develop a retargetable, ADL-based, power-aware system simulation capability.

Approach

- Compiler Strategies for Power Management
  - Compiler-directed architectural "configuration"
    - generate "configuration code" embedded in the application
    - code "adapts" to new architectural organization at runtime
      - JIT vs multi-version compilation techniques
      - dynamic, on-demand optimization
  - Code annotation for dynamic compilation
    - trade-off compilation overhead for quality of generated code
  - Power-use Estimation for Compiler Control
    - static analysis to select "optimal" configuration
    - profile-based selection techniques
    - static or dynamic prediction methods

COPPER Framework

[Figure: COPPER framework block diagram. Blocks: application, compiler (gcc), code versions, hardware config, cycle-level performance simulator, cycle-by-cycle hardware access counts, parameterizable power models, power simulator, performance estimate, power estimate, power profiler, power scheduler, available power, chosen code version.]

Step 1: Profile functions for energy use

- Using Code Versions and Annotations

[Figure: Phase 1, PPDB generation. Blocks: source code, GCC, code versions, Code Version Database (CVDB), function address extractor, architecture configuration, architecture & power simulator (code loader, power estimator, power profiler), Function Level Power Profile Database (PPDB).]

Step 2: Use the profiles in the scheduling process

[Figure: Phase 2, simulation. Blocks: CVDB, PPDB, architecture & power simulator (code loader, power estimator, run-time power profiler), power scheduler, External Power Profile Database (EPDB), run-time power profile & statistics database.]

Baseline Architecture

- A MIPS R10K-like processor
  - 4-wide issue, out-of-order (OOO) processor
    - 5-stage pipeline: fetch, dispatch, issue, writeback, commit
  - 32b integers, 64b f.p. numbers
  - register files: 32 integer and 32 FP registers
  - 32K L1 instruction cache, 32K L1 data cache
    - 32B L1 line size
  - 512K L2 unified cache
    - 64B L2 line size
  - 2 int ALUs, 1 FP adder, 1 FP multiplier
  - 512-entry BTB, 2K entry branch predictor

Power/Performance "Knobs" Explored

1. Memory hierarchy
2. Instruction issue logic & issue width for VLIW m/c
3. Dynamic Register File Reconfiguration
4. Frequency and Voltage scaling

Dynamic Register Reconfiguration

- Compiler generates different code versions
  - Code versions have different ILP, register need...
- Power-performance profiling compiler decides on best code version
- Compiler generates function code annotation
  - carrying the chosen code version
  - carrying the number of registers needed for each code version
- At function calls, the run-time scheduler selects code version and adjusts register file size accordingly
  - all based on code annotation information

56

Power Scheduling Heuristic

- Select the code version dissipating below and closest to the limit
- Switch to the selected version and reconfigure the register file
- Invoke every N cycles to continuously track energy use
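The selection step of this heuristic can be sketched as follows. This is a minimal illustration, not the COPPER implementation: the `CodeVersion` fields, the sample versions and their power numbers are all hypothetical, and the behavior when no version fits under the limit is left to the caller because the slides do not specify it.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CodeVersion:
    """One compiled variant of a function (fields are illustrative)."""
    name: str
    power: float     # profiled dissipation, arbitrary units
    registers: int   # register file size this version needs


def select_version(versions: Sequence[CodeVersion],
                   limit: float) -> Optional[CodeVersion]:
    """Return the version dissipating below and closest to the limit.

    Returns None when no version is below the limit; that fallback is an
    assumption of this sketch.
    """
    below = [v for v in versions if v.power < limit]
    return max(below, key=lambda v: v.power) if below else None


# Hypothetical profile data for one function.
VERSIONS = [
    CodeVersion("low_ilp", power=0.6, registers=8),
    CodeVersion("mid_ilp", power=0.9, registers=16),
    CodeVersion("high_ilp", power=1.3, registers=32),
]

chosen = select_version(VERSIONS, limit=1.0)
if chosen is not None:
    # A real scheduler would now switch code version and resize the
    # register file to chosen.registers.
    print(chosen.name, chosen.registers)
```

Re-running `select_version` every N cycles with the current limit gives the periodic tracking described in the last bullet.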


57

Power Management Through DRR

- Register file power management without performance degradation

58

Unconstrained DRR

- Register file power management with performance degradation


59

Frequency and Voltage Scaling

- Code profiled for 4 clock frequency/voltage scaling configurations

60

Power Management by F/V Scaling

- 4 available versions: 600 MHz at 2.2 V, 500 MHz at 2.0 V, 400 MHz at 1.8 V, 300 MHz at 1.6 V
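To get a feel for what the four operating points buy, the sketch below compares them under the usual first-order CMOS model, where dynamic power scales as f·V² and energy per cycle as V². The model is an assumption of this example (it ignores leakage and any change in effective capacitance); only the frequency/voltage pairs come from the slide.

```python
# (frequency in MHz, supply voltage in V), as listed on the slide.
OPERATING_POINTS = [(600, 2.2), (500, 2.0), (400, 1.8), (300, 1.6)]


def relative_power(points):
    """Dynamic power of each point relative to the first, assuming P ~ f * V^2."""
    f0, v0 = points[0]
    return [(f * v * v) / (f0 * v0 * v0) for f, v in points]


def relative_energy_per_cycle(points):
    """Energy per clock cycle relative to the first point, assuming E ~ V^2."""
    _, v0 = points[0]
    return [(v * v) / (v0 * v0) for _, v in points]


for (f, v), p, e in zip(OPERATING_POINTS,
                        relative_power(OPERATING_POINTS),
                        relative_energy_per_cycle(OPERATING_POINTS)):
    print(f"{f} MHz @ {v} V: power x{p:.3f}, energy/cycle x{e:.3f}")
```

Under this model the slowest point draws roughly a quarter of the power of the fastest one, while the energy for a fixed number of cycles drops by roughly half.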


61

Timing Constraints

- We consider timing constraints as bounds on operation intervals
  - upper and lower bounds
  - (determination of optimum interval separation possible statically)
- Time constraints specified via checkpoints
  - User-defined checkpoints are inserted in the source code and time constraints between checkpoints are defined.

62

Constrained Dynamic Frequency & Voltage Scaling

- Specify time and energy constraints
  - energy constraints specified via estimation of the varying power available throughout the whole program execution
- Power-performance profiling compiler
  - estimates max energy/cycle ratio and cycle count between checkpoints
- Run-time scheduler
  - Calculates run-time freq limit based on available power and energy profile between the current checkpoint and all possible next checkpoints
  - Calculates optimal target freq based on both time constraints and run-time freq limit between the current checkpoint and all possible next checkpoints.
  - Final target freq is selected so that the code runs as slowly as possible within the imposed time constraints.
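The final selection step of the run-time scheduler can be sketched as below. This is a simplified illustration under stated assumptions, not the scheduler from the talk: it considers a single next checkpoint, takes the energy-derived frequency limit as an input, and uses hypothetical function names. The frequency set is the four profiled settings; the delay padding mirrors the inserted delay used to honor minimum time constraints.

```python
from typing import Optional, Sequence

FREQS_MHZ = (300, 400, 500, 600)  # the four profiled settings


def run_time_ns(cycles: int, freq_mhz: float) -> float:
    """Execution time of `cycles` cycles at `freq_mhz`, in nanoseconds."""
    return cycles * 1000.0 / freq_mhz


def pick_target_freq(cycles: int, max_time_ns: float, freq_limit_mhz: float,
                     freqs_mhz: Sequence[int] = FREQS_MHZ) -> Optional[int]:
    """Slowest frequency that meets the deadline and the energy-derived limit.

    Returns None when no setting satisfies both constraints.
    """
    for f in sorted(freqs_mhz):
        if f <= freq_limit_mhz and run_time_ns(cycles, f) <= max_time_ns:
            return f
    return None


def padding_ns(cycles: int, freq_mhz: float, min_time_ns: float) -> float:
    """Extra delay needed so the interval is not shorter than min_time_ns."""
    return max(0.0, min_time_ns - run_time_ns(cycles, freq_mhz))


# Hypothetical interval: 2400 cycles, at most 5000 ns, limit of 600 MHz.
print(pick_target_freq(2400, 5000, 600))
```

Scanning from the slowest setting upward makes "as slowly as possible" explicit: the first feasible frequency is the answer.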


63

Combining DRR & F/V Scaling: Three Phases

- Profiling phase 1: DRR
  - Using code annotations for DRR, run the program adjusting register file size and collect energy and cycle count for each function for all code versions
  - Select best code version for each function based on profile and save this info as code annotations.
- Profiling phase 2: Checkpoint verification
  - Using code annotations from phase 1, run the program adjusting register file size and changing code version to be run, and collecting energy/cycle and cycle count between checkpoints
- Scheduling phase: use code annotations from P1, P2
  - At function calls, dynamically change code version and RF size
  - At program checkpoints and change points in the available power profile, dynamically adjust frequency and voltage (choose operating point on speed-power curve).

64

Register usage during execution

[Figure: Register File/Code Versioning and Clock Frequency/Voltage Power Management, paraffins. Code version and number of registers (0 to 35) versus time (0 to 300 ms); series: Reg Num, Code Version; checkpoints 1 to 6 marked.]


65

Results (Preliminary)

[Figure: Register File/Code Versioning and Clock Frequency/Voltage Power Management, paraffins. Frequency (0 to 600 MHz) versus time (0 to 300 ms); series: Frequency Limit, Frequency; checkpoints 1 to 6 marked.]

# start checkp   end checkp   MinTime (ns)   MaxTime (ns)
  1              2            4000           4000
  2              3            8000           8000
  3              4            1000           1000
  4              4            6000           30000
  4              5            40000          60000
  5              6            70000          70000

Time      Power
2000      1.5
27000     1.24
54000     0.98
81000     0.73
109000    0.28
136000    0.72
163000    0.95
190000    1.15
218000    1.38
245000    0.365
272000    0.85
300000    0.95

Freq limit: max. allowed freq using energy constraints
Target frequency chosen based on time and energy constraints

66

(Explanation)

- Green line
  - Maximum allowed freq calculated using energy constraints
- Blue lines
  - Program checkpoints
- Red line
  - Target freq chosen by dynamic scheduler, respecting time constraints and allowing the program to run as slowly as possible to save power
  - Freq value = 0 means extra delay was inserted to satisfy minimum time constraints between checkpoints in the simulation


67

Combined Register Reconfiguration, F&V Scaling

[Figure: Register File/Code Versioning and Clock Frequency/Voltage Power Management, paraffins. Power (0 to 1.8) versus time (0 to 300 ms); series: Power Consumption, Predicted Power Profile; checkpoints 1 to 6 marked.]

68

Compiler Control of Power: Summary

- While average power reduction is important, effective control of dynamic power consumption is essential
  - especially for software management of power and performance
- The hard problem here is
  - identification of effective architectural mechanisms and their deterministic control through software
- COPPER approach
  - use architectural features common to a range of processor architectures
    - memory hierarchy, register files, instruction issue.
  - Coordinate with technology and OS strategies
    - frequency and voltage scaling.


69

Shutdown for Energy Saving

- Shutdown attractive for many wireless applications due to low duty cycle of many subsystems
- Issues:
  - Cost of restarting: latency vs. power trade-off
    - increase in latency (response time)
    - increase in power consumption due to startup
  - When to Shutdown: Optimal vs. Idle Time Threshold vs. Predictive
  - When to Wakeup: Optimal vs. On-demand vs. Predictive
  - Two main approaches (Reactive versus Predictive):
    - “Go to Reduced Power Mode after the user has been idle for a few seconds/minutes, and restart on demand”
    - “Use computation history to predict whether Tblock[i] is large enough (Tblock[i] ≥ Tcost)”

[Figure: timeline alternating between Blocked (“Off”) for Tblock and Active (“On”) for Tactive]

ideal improvement = 1 + Tblock/Tactive
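The ideal-improvement formula and the predictive rule Tblock[i] ≥ Tcost can be tried out with a few lines of Python. The predictor below is an exponentially weighted average of past blocked times; that choice, the `alpha` weight and the sample history are assumptions of this sketch, since the slide only says "use computation history".

```python
from typing import Sequence


def ideal_improvement(t_block: float, t_active: float) -> float:
    """Ideal energy improvement from shutting down whenever blocked."""
    return 1 + t_block / t_active


def predict_next_block(history: Sequence[float], alpha: float = 0.5) -> float:
    """Predict the next blocked time from past ones (exponential average)."""
    prediction = history[0]
    for observed in history[1:]:
        prediction = alpha * observed + (1 - alpha) * prediction
    return prediction


def should_shutdown(predicted_t_block: float, t_cost: float) -> bool:
    """Shut down only if the predicted blocked time covers the restart cost."""
    return predicted_t_block >= t_cost


# Hypothetical numbers: blocked 90 ms for every 10 ms of activity.
print(ideal_improvement(90, 10))

history_ms = [10, 20, 40]          # past blocked intervals
predicted = predict_next_block(history_ms)
print(predicted, should_shutdown(predicted, t_cost=25))
```

A reactive (idle-threshold) policy would instead wait until the observed idle time exceeds a fixed threshold before shutting down, paying for the wait each time.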

70

To Shutdown or Reduce Voltage?

- Observation:
  - better to lower voltage than to shutdown in case of digital logic
- Example: task with 100ms deadline, requires 50ms CPU time at full speed
  - normal system gives 50ms computation, 50ms idle/stopped time
  - half speed/voltage system gives 100ms computation, 0ms idle
  - same number of CPU cycles but energy reduced to 1/4
- Voltage gets dictated by the tightest (critical) timing constraint, both on throughput and latency --> dynamically change voltage
  - Use voltage to control the operating point on the power vs. speed curve
    - i.e., power and clock frequency are functions of voltage
  - Main challenge here is algorithmic:
    - one has to schedule the voltage variation as well!
      - via compiler or OS or hardware
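The 100 ms example can be checked numerically. The sketch assumes the same simplifications the slide makes: switching energy per cycle is (1/2)·C·Vdd², clock speed scales linearly with voltage, and a stopped CPU consumes nothing. Units are normalized so that full speed and full voltage are both 1.0.

```python
def dynamic_energy(cycles: float, vdd: float, c_eff: float = 1.0) -> float:
    """Switching energy for `cycles` cycles: cycles * (1/2) * C * Vdd^2."""
    return cycles * 0.5 * c_eff * vdd ** 2


DEADLINE_MS = 100.0
CPU_TIME_FULL_SPEED_MS = 50.0

# Work expressed in normalized cycles (time at full speed x speed 1.0).
cycles = CPU_TIME_FULL_SPEED_MS * 1.0

# Option A: run at full speed and voltage, then stop for the remaining time.
run_full_ms = cycles / 1.0
e_full = dynamic_energy(cycles, vdd=1.0)

# Option B: halve speed and voltage so the work just fills the deadline.
run_half_ms = cycles / 0.5
e_half = dynamic_energy(cycles, vdd=0.5)

ratio = e_half / e_full
print(run_full_ms, run_half_ms, ratio)
```

Both options execute the same number of cycles; only the per-cycle energy changes, which is why stretching the work to the deadline wins over racing and then shutting down.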


71

Solution: Dynamically Vary Voltage

[Figure: with a fixed supply, each Tframe is split into Active and Idle; with a variable supply, the whole Tframe is Active.]

Fixed supply: $E_{fixed} = \frac{1}{2} C V_{dd}^2$

Variable supply: $E_{var} = \frac{1}{2} C (V_{dd}/2)^2 = \frac{1}{4} E_{fixed}$

[Figure: Normalized Power versus Normalized Workload (both 0 to 1.0) for Fixed Supply and Variable Supply.]

from [Gutnik96] (VLSI Symposium)

72

Voltage Scheduling in General-purpose OSs

- Approach #1: [Weiser94]
  - time divided into 10-50 ms intervals
  - f & V raised or lowered at the beginning of the interval based on CPU utilization during the previous interval
    - 50% savings for a processor in the range 3.3V-5V
    - 70% savings for a processor in the range 2.2V-5V
- Approach #2: [Govil95]
  - predicts CPU cycles needed in the next interval
  - sets f & V accordingly
  - many prediction strategies: some did well, others not
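An interval-based policy in the spirit of Approach #1 can be sketched as a simple governor. This is not the algorithm from [Weiser94]; the utilization thresholds, step size and speed bounds are hypothetical values chosen only to illustrate "raise or lower at the start of each interval based on the previous interval".

```python
from typing import List, Sequence


def next_speed(speed: float, utilization: float,
               low: float = 0.5, high: float = 0.7, step: float = 0.2,
               min_speed: float = 0.2, max_speed: float = 1.0) -> float:
    """Pick the normalized speed for the next interval.

    `utilization` is the busy fraction observed in the previous interval
    at the current speed. Voltage is assumed to follow the chosen speed.
    """
    if utilization > high:
        speed += step       # CPU was too busy: speed up
    elif utilization < low:
        speed -= step       # CPU was mostly idle: slow down
    return min(max_speed, max(min_speed, speed))


def run_governor(utilizations: Sequence[float], speed: float = 1.0) -> List[float]:
    """Apply next_speed over a trace of per-interval utilizations."""
    speeds = []
    for u in utilizations:
        speed = next_speed(speed, u)
        speeds.append(speed)
    return speeds


print(run_governor([0.3, 0.3, 0.9, 0.6]))
```

Approach #2 differs in that it would replace the threshold test with a prediction of the cycles needed in the next interval and set the speed directly from that prediction.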


73

Fixed Priority Preemptive CPU Scheduling in RTOSs

- Consider task set (period, WCET, deadline)
  - {(10, 3, 10), (14, 7, 14)}
- CPU utilization = 3/10 + 7/14 = 80%
- Obvious power management strategies:
  - Shutdown when idle
    - saves 20% power
  - Can we slow CPU by 20% (& reduce V) for more savings?
    - NO, as deadlines will no longer be met
  - However, can slow by a factor of 14/13 and lower voltage to still meet deadlines, and shutdown during idle time
    - saves 22.5% in power
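The 14/13 figure can be checked with standard response-time analysis for fixed-priority preemptive scheduling. The sketch assumes rate-monotonic priorities (the period-10 task has higher priority) and models a slowdown as a factor that stretches every WCET; exact fractions are used so the boundary case is not blurred by floating-point rounding.

```python
import math
from fractions import Fraction

# (period, WCET, deadline), highest priority first (rate-monotonic assumption).
TASKS = [(10, 3, 10), (14, 7, 14)]


def response_time(tasks, i, scale):
    """Worst-case response time of task i when all WCETs are stretched by `scale`."""
    _, wcet_i, deadline_i = tasks[i]
    c_i = wcet_i * scale
    r = c_i
    while True:
        # Interference from every higher-priority task released in [0, r).
        interference = sum(math.ceil(r / tasks[j][0]) * tasks[j][1] * scale
                           for j in range(i))
        r_next = c_i + interference
        if r_next == r or r_next > deadline_i:
            return r_next
        r = r_next


def schedulable(tasks, scale):
    """True if every task meets its deadline at the given slowdown factor."""
    return all(response_time(tasks, i, scale) <= tasks[i][2]
               for i in range(len(tasks)))


print(schedulable(TASKS, Fraction(1)))        # full speed
print(schedulable(TASKS, Fraction(5, 4)))     # running at 80% speed
print(schedulable(TASKS, Fraction(14, 13)))   # slowdown suggested on the slide
```

At a slowdown of 14/13 the second task finishes exactly at its deadline (two preemptions of 3 plus its own 7 time units, all stretched: 13 × 14/13 = 14), which is why no larger slowdown is feasible even though utilization is only 80%.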

Power Management in 802.11 and BT


75

802.11 Review

- CSMA/CA: direct access if medium free for > DIFS, else defer and back-off
  - SIFS = short interframe space
  - PIFS = PCF interframe space
  - DIFS = DCF interframe space
- CSMA/CA + ACK: receiver sends ACK immediately if CRC okay
  - if no ACK, retransmit after a random backoff

76

802.11

- Contention-free Point Coordination Function (PCF) to support low-jitter time bounded traffic
  - optional, resides at the AP


77

802.11 PCF Mode

78

Factors Affecting Power

- Cell size
- physical layer TX rate
- protocol overhead
- receiver sleep modes
- speed of wakeup
- efficiency of modulation (Eb/N0 for a given BER)
- antenna patterns
- acquisition aids
- system frequency uncertainty
- multipath abatement strategies


79

Typical 802.11 Strategies

- TX power control
  - capability to adjust transmitted power at the transmit IF to maintain a constant output power at the antenna port
    - transmit power reading from TX PA to the BB processor
    - allows TX PA to be driven close to its compression point without concern for overdriving it due to manufacturing variations (else backed off at least by 4 dB)
- single SAW filter for both TX and RX
- single oscillator for reference for RF/IF synthesizers, carrier timing and MAC clock
  - carrier and symbol timings are locked in new 802.11
- half duplex radio with unused portions turned off
- sleep modes for fast recovery
- low-power acquisition mode

80

Power States

- NIC Power down
  - all functions are powered down, no power
  - recovery time includes starting and initializing MAC, loading the initial values, starting the radio, synthesizer loading and acquisition, system slot time acquisition, channel scanning, beacon acquisition, system authentication
- Radio Deep Sleep
  - saves: initialization registers, beacon timing, system slot timing, last good channel, system authentication
  - requires MAC to be running while the radio is turned off
  - only synthesizers must reacquire signal on power up
- NIC Sleep mode
  - MAC clocked with a low rate clock (32-kHz watch crystal)
  - saves most things as above except that slot timing is lost
- Radio Receive: all TX functions turned off
- Radio Transmit: all RX functions turned off


81

Power Saving in 802.11

- Mobile nodes in power saving (PS) mode switch off their radios for some period
  - sender nodes meanwhile buffer the frames
  - TX in a WLAN is active less than 2% of the time
  - Most of battery power is used by PHY RX circuitry
    - entire PHY can be turned off when no transfer is taking place even if the node (station) is active
  - Latency increase due to PS can be controlled by reducing timeouts in high layer protocols
- Nodes are synchronized to wake up at the same time when the sender announces buffered frames
  - nodes with frames for them in the announcement stay up until frame is delivered
  - timing synchronization function (TSF)
- Easy to do in PCF, but hard to do in DCF
- In PCF, the basic service set (a set of nodes on a logical network) can reduce consumption by 97.5% when cycling over one-minute intervals, relative to a continuously-on system.

82

Power Saving in the PCF Mode

- AP generates time-stamped beacons and transmits them every beacon interval (~100 ms)
  - beacon transmission is deferred if channel busy
  - nodes wake up before the end of beacon interval and stay up until beacon is received
  - nodes adjust their local timers to the timestamp
- Beacon carries a traffic-indication map (TIM)
  - all unicast packets for nodes in sleep mode are announced in the TIM
  - mobile nodes with entries in TIM request packets from AP
- Broadcast packets are announced by a delivery TIM (DTIM) and sent immediately afterwards
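The station-side behavior described above can be sketched as a small state handler. The `Beacon` and `Station` classes, their fields and the action names are hypothetical and do not follow the 802.11 frame layout; the sketch only mirrors the three steps on the slide: adjust the timer, check the TIM, and stay up for broadcasts after a DTIM.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Beacon:
    """Simplified beacon contents (illustrative, not the real frame format)."""
    timestamp_us: int
    tim: FrozenSet[int]      # station IDs with buffered unicast packets
    is_dtim: bool = False    # True if broadcast packets follow this beacon


class Station:
    def __init__(self, station_id: int) -> None:
        self.station_id = station_id
        self.local_timer_us = 0

    def on_beacon(self, beacon: Beacon) -> List[str]:
        """Handle a received beacon and return what the node does next."""
        # Adjust the local timer to the AP's timestamp.
        self.local_timer_us = beacon.timestamp_us
        actions = []
        if beacon.is_dtim:
            actions.append("stay_awake_for_broadcast")
        if self.station_id in beacon.tim:
            actions.append("request_buffered_packets")
        # Nothing pending: go back to sleep until the next beacon.
        return actions or ["sleep"]


station = Station(station_id=7)
print(station.on_beacon(Beacon(100_000, frozenset({3, 7}))))
print(station.on_beacon(Beacon(200_000, frozenset({3}))))
```

The energy saving comes from the last line of `on_beacon`: a node with no TIM entry and no pending broadcast can switch its radio off for the rest of the beacon interval.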


83

Power Saving in the DCF Mode

- Timers are adjusted in a distributed fashion
  - every node generates beacons
  - all nodes compete to transfer the beacon using DCF
  - the first node to transmit the beacon is the winner
  - other nodes cancel their beacons & adjust timers
- Packets for sleeping nodes are buffered by the sender until the end of beacon interval
  - announced using ad hoc TIMs (ATIMs) sent via DCF
    - transmitted in an ATIM window (~ 4 ms) after the beacon
    - ack’ed by the receiver
    - receiver stays up and waits for the packet

84

Power Saving in the DCF Mode (contd.)

from [Woesner98]


85

Power Saving in the DCF Mode (contd.)

from [Woesner98]

86

Power Saving in MAC via Directories or Schedules

- Scheduling access via a schedule or a directory
  - 802.11’s TIM is like a directory
  - this concept also used in pagers
  - also suggested in the literature for application level
- Several TDMA-based MAC protocols around this idea
  - frame with directory or schedule carrying beacon in the first slot
  - mobile nodes wake up only in the right slots


87

EC-MAC: Energy Conserving MAC [Sivalingam97]

88

Energy Trade-offs at MAC Layer

- Careful control of access to minimize corruption due to interference
- Power management to minimize stand-by power
  - circuit techniques can help
  - even better: a wake-up channel
    - currently radios cannot know if and when another radio wants to talk to them
      - periodic scheduled wake-up
    - wake-up signal using low power or passive components?


89

Optimizing Bluetooth Modules

- Power, size and cost tradeoffs for a range of applications
- A typical module includes:
  - Antenna
  - Power amplifier
  - RF section (2.4GHz ISM)
  - Baseband section (Link controller and manager)
- Integration and optimization:
  - combine host and Bluetooth processing
  - combine RF section and BB processing
  - integrate or eliminate memory system
  - eliminate power amplifier, PLL/VCO integration

[Figure: module block diagram. Blocks: Host Controller, Link Contrl, BMC, Peripherals, Memory, RX Path, LNA, PLL, Power Mangmt., TX Path, Power Amp.]

90

Embedded SW in Bluetooth

- Embedded BT software up to link-layer host controller interface (HCI)
- Rest of the protocol stack can be optimized for size, power
- Baseband processing consumes about 15% of power
  - can be optimized further through sleep/shutdown.

[Figure: Bluetooth protocol stack split between Host and Bluetooth module. Labels: Application, HID, TCP/IP, Audio, RFCOMM, SDP, L2CAP, HCI, Link Manager, BB Proc., RF.]


91

BT Provisions for Low Power

- Robust hopping mechanism
  - master and slave remain synchronized even if no packets are exchanged for hundreds of ms (i.e., no dummy data exchange is needed)
- “Page mode” operation
  - receiver can quickly detect if a packet is present or not through a sliding correlator for an access code that lasts about 70 us after a scan of 100 us for jitter and drift.
- A master can put a slave in HOLD, PARK or SNIFF modes
  - control operation duty cycle while minimizing need for synchronization
- Duplexing through time division
  - no need for separate TX and RX oscillators
  - no need for a duplex filter
  - no cross-talk from TX into the RX path.

92

Summary

- Networked SOC applications present a very wide range of system optimization opportunities for power, size and performance
  - enables package boundaries specific to application needs
  - a much tighter coupling of power reduction strategies across hardware and software is possible due to SOC integration
- Power management entails control of the power profile
  - when balanced against latency, it can be “modulated” based on power source needs
- Effective power management for networked SOCs must be coordinated across the partitioning of hardware, software and layers
  - to ensure functionality delivery within performance constraints.

