
Page 1: LowPowerVLSICircuitsandSystems - Inriapeople.rennes.inria.fr/Olivier.Sentieys/teach/Low-Power_handouts.pdf · SPICE et al. Epic/PowerMill Avant!/ADM Mentor Graphics/Lsim Power Analyst

Estimation and Reduction of Power Consumption   4/21/14

Low-Power VLSI Circuits and Systems
Olivier Sentieys
ENSSAT - Université de Rennes 1, IRISA/INRIA
[email protected]
CAIRN project-team
http://www.irisa.fr/cairn

Power estimation and reduction

1.  Why care about power?
    –  Heat dissipation
    –  Limited energy in portable systems
    –  Watt is the problem?
2.  Where does power go in CMOS chips?
    –  Digital integrated circuits
    –  Microprocessors, DSPs, ...
3.  How to estimate power?
4.  How to reduce power?
    –  Hardware and software
5.  Conclusions


Technological evolutions

•  DEC/Compaq processor family [Herrick99]
•  EV4
   –  200 MHz @ 3.3V
   –  16 gate delays per cycle
   –  30W @ 200 MHz & 3.3V
   –  1.7 million transistors
   –  233 mm²
•  EV5 (21164)
   –  350 MHz @ 3.3V
   –  14 gate delays per cycle
   –  60W @ 350 MHz & 3.3V
   –  9.3 million transistors
   –  298 mm²
•  EV6 (21264)
   –  575 MHz @ 2.2V
   –  12 gate delays per cycle
   –  90W @ 575 MHz & 2.2V
   –  15.2 million transistors
   –  314 mm²
•  EV7 (21364)
   –  > 1000 MHz @ 1.5V
   –  100W
   –  ~100 million transistors
   –  ~350 mm²
•  EV8 (never fabricated…)
   –  > 1-2 GHz (0.125 micron)
   –  <150W
   –  ~250 million transistors
•  Intel Pentium 4 [Intel 2000]
   –  1.5 GHz, 0.18 micron
   –  Power reduced by clock gating and power gating of unused blocks
   –  Thermal sensors embedded on the chip shut the CPU down in case of overheating
   –  55 Watts at 1.5 GHz (instead of 90 Watts)

1.  Heat Dissipation

•  Undesirable effects
   –  Decrease in performance and reliability
      •  MTBF is halved for every +10°C
   –  Increase in cost (cooling)
      •  1€/W when >40W
   –  Increase in volume and weight
      •  heat sink, fan, batteries, …
•  Will technological evolutions solve the problem?


Technological evolutions

[Figure: voltage (V), power per chip (W) and VDD current (A) versus year, 1998 to 2014]

Technological evolutions

•  Projections
   –  ... 2000 Watts, 3000 A!
   –  Chip area, transistor count or frequency must be kept constant to stay below limits of 100-200W and 300-500A

[Figure: power (Watts, log scale) versus year, 1985 to 2010, with Vdd scaling]
[Figure: power supply current Icc (A, log scale) versus year, 1985 to 2010, for 386, 486, Pentium Pro, PII and PIV]


2.  Portability (1)

•  Multimedia
   –  Audio/video encoding
   –  Audio/video decoding
•  Interfaces
   –  Voice recognition
   –  Inertial pen, touch screen
•  Encryption
•  Mobility
   –  LTE, UMTS, EDGE, GSM
   –  Internet Protocol
   –  WiFi

[Figure: terminal functions: Tx. Radio, Rx. Radio, Graphics, Video, Voice, Interface]

Battery technologies

•  Battery performance [P. Senn 2000]

[Figure: energy density by weight (Whr/kg, "Lighter") against energy density by volume (Whr/l, "Smaller") for NiCd, NiMh, Lithium-Ion Liquid, Lithium-Ion Polymer, LTC Lithium-Ion Polymer and LTC Lithium-Alloy Polymer]

Technologies          NiCd     NiMh     Li-ion    Li-poly
Year of production    1956     1990     1992      1996
Voltage (V)           1.2      1.2      3.6       3.7
Thickness (mm)        >6       >6       >6        3
Capacity (Whr/kg)     30-50    60-90    70-140    115-140
Lifetime (cycles)     ~1000    ~1000    500       500

•  Typical example
   –  500 mAh Li-Pol = 1.7 Wh
•  High-capacity example
   –  1400 mAh Li-Pol = 5 Wh
•  For 10-hour autonomy
   –  P < 400 mW
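The autonomy budget above is simple arithmetic, and a short sketch makes it easy to replay with other cells. The function names are made up for this note, and the 3.6 V nominal cell voltage is an assumption borrowed from the Li-ion row of the table.

```python
def energy_wh(capacity_mah: float, nominal_voltage_v: float) -> float:
    """Stored energy in Wh for a cell of the given capacity and nominal voltage."""
    return capacity_mah / 1000.0 * nominal_voltage_v


def power_budget_mw(stored_energy_wh: float, autonomy_h: float) -> float:
    """Average power (mW) that drains the stored energy in exactly autonomy_h hours."""
    return 1000.0 * stored_energy_wh / autonomy_h


# Energy figures quoted on the slide, spread over the 10-hour target.
for label, wh in (("typical cell, 1.7 Wh", 1.7), ("high-capacity cell, 5 Wh", 5.0)):
    print(f"{label}: {power_budget_mw(wh, 10.0):.0f} mW on average")

# ASSUMPTION: 3.6 V nominal voltage, used only to cross-check the 5 Wh figure.
print(f"1400 mAh at 3.6 V = {energy_wh(1400, 3.6):.2f} Wh")
```

With the 5 Wh cell the ideal budget comes out at 500 mW. The slide's bound of P < 400 mW is lower, which would be consistent with keeping some margin, although the slides do not say how that figure was derived.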


3G Terminal

•  Processing
   –  digital baseband, video, graphics, etc.
   –  >10 GOPS (giga operations per second)
•  Battery life: 10h
•  Weight: 100g (batteries)
•  500mW @ 6 GIPS, i.e. 12 GIPS/W @ 6 GIPS
•  With current processors
   –  30 kg or 10 minutes!
   –  ... with tens of DSPs!
•  Dedicated System-on-Chip

[Figure: terminal functions: Tx. Radio, Rx. Radio, Graphics, Video, Voice, Interface]

Conclusion (power)

•  Technology evolution
   –  Increase in transistor density
   –  Increase in clock frequency
•  Power density of ICs is still increasing despite:
   –  Supply voltage decrease
   –  Better design methods
•  Limitations?
   –  Heat dissipation limits
      •  100W/cm² is a hard limit…
   –  Limitations due to applications
      •  Portable computers, smartphones
      •  Embedded systems (e.g. drones, satellites)
      •  Ultra-low power systems (e.g. sensor networks)
      •  Data centers, telecommunication base stations, Internet routers, etc.


Conclusion (energy)

•  Limited battery evolution
   –  10-15% per year
•  Evolution of power/energy of integrated circuits
   –  35-40% per year
•  Consequence
   –  There is a significant gap between battery technologies and the current energy efficiency of electronic chips

Power estimation and reduction

1.  Why care about power?
    –  Heat dissipation
    –  Limited energy in portable systems
    –  Watt is the problem?
2.  Where does power go in CMOS chips?
    –  Recap
    –  Microprocessors, DSPs, ...
3.  How to estimate power?
4.  How to reduce power?
    –  Hardware and software
5.  Conclusions


Recap: Metrics

•  Delay (sec)
   –  Performance metric
•  Energy (Joule)
   –  Efficiency metric: effort to perform a task
•  Power (Watt)
   –  Energy consumed per unit time
•  Power × Delay (Joule)
   –  Mostly a technology parameter: measures the efficiency of performing an operation in a given technology
•  Energy × Delay = Power × Delay² (Joule·sec)
   –  Combined performance and energy metric: figure of merit of a design style
•  Other metrics: Energy × Delay^n (Joule·sec^n)
   –  Increased weight on performance over energy
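To see how these metrics can rank two designs differently, here is a minimal sketch. The two design points (10 mW at 2 ns, 40 mW at 1 ns) are hypothetical numbers chosen for illustration, and the function name is not from the slides.

```python
def design_metrics(power_w: float, delay_s: float, n: int = 2) -> dict:
    """Energy, energy-delay and energy-delay^n products for one design point."""
    energy_j = power_w * delay_s                  # Power x Delay
    return {
        "energy_J": energy_j,
        "energy_delay_Js": energy_j * delay_s,    # = Power x Delay^2
        "energy_delay_n": energy_j * delay_s ** n,
    }


# Hypothetical design points: A is slower but frugal, B is faster but hungrier.
design_a = design_metrics(power_w=10e-3, delay_s=2e-9)
design_b = design_metrics(power_w=40e-3, delay_s=1e-9)

for name, m in (("A", design_a), ("B", design_b)):
    print(name, m)
```

In this example A wins on energy (20 pJ against 40 pJ), the two tie on energy × delay, and B wins on energy × delay², which illustrates the increased weight on performance as n grows.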

Recap: Power Equations in CMOS

$$P = P_{dyn} + P_{sc} + P_s$$

•  Dynamic power: Pdyn
   –  Charge and discharge of circuit capacitance
•  Short-circuit power: Psc
   –  Short-circuit path in static logic cells (Vdd → Vss) during switching
   –  Strongly depends on rise time and on Vth (NMOS/PMOS)
•  Static power: Ps
   –  Sub-threshold leakage current
   –  Source/drain-bulk junction leakage (diodes)

$$P_{dyn} = \alpha \cdot C_l \cdot V_{dd}^2 \cdot f$$
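The dynamic power equation is easy to evaluate numerically. The sketch below assumes illustrative values (activity 0.1, 1 nF of total switched capacitance, 100 MHz) that do not come from the slides; the function name is likewise made up.

```python
def dynamic_power_w(alpha: float, c_load_f: float, vdd_v: float, freq_hz: float) -> float:
    """Pdyn = alpha * Cl * Vdd^2 * f, in watts."""
    return alpha * c_load_f * vdd_v ** 2 * freq_hz


# ASSUMPTION: illustrative operating point, not taken from the slides.
p_nominal = dynamic_power_w(alpha=0.1, c_load_f=1e-9, vdd_v=1.2, freq_hz=100e6)
p_half_vdd = dynamic_power_w(alpha=0.1, c_load_f=1e-9, vdd_v=0.6, freq_hz=100e6)

print(f"Vdd = 1.2 V: {p_nominal * 1e3:.2f} mW")
print(f"Vdd = 0.6 V: {p_half_vdd * 1e3:.2f} mW")
```

Halving Vdd divides the result by four, while α, Cl and f each act only linearly.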


Recap: Power Equations in CMOS

$$P = \alpha f C_L V_{DD}^2 + V_{DD} I_{peak} \left(P_{0\to1} + P_{1\to0}\right) + V_{DD} I_{leak}$$

•  First term: dynamic power (≈ 40-70% today and decreasing relatively)
•  Second term: short-circuit power (≈ 10% today and decreasing absolutely)
•  Third term: leakage power (≈ 20-50% today and increasing)

$$P = \frac{\text{energy}}{\text{operation}} \times \text{rate} + \text{static power}$$

Recap: Activity

•  Probability propagation

[Figure: two-gate circuit with inputs A, B, C, internal node X and output S]

P(A) = 1/2, P(B) = 1/2, P(C) = 1/2

P(X = 1) = 1/4
P(S = 1) = 1/2 · 3/4 = 3/8

αx = P(X=0) · P(X=1)
   = (1 – P(X=1)) · P(X=1)
   = (1 – 1/4) · 1/4
   = 3/16

αs = P(S=0) · P(S=1)
   = (1 – P(S=1)) · P(S=1)
   = (1 – 3/8) · 3/8 = 5/8 · 3/8
   = 15/64
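The propagation above can be replayed by brute-force enumeration. The gate types are not recoverable from the extracted slide, so the sketch below assumes one arrangement that reproduces the slide's numbers: X = A AND B and S = NOR(X, C). All function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def node_probability(fn, n_inputs, p_one=Fraction(1, 2)):
    """P(node = 1), enumerating all patterns of independent inputs."""
    total = Fraction(0)
    for bits in product((0, 1), repeat=n_inputs):
        weight = Fraction(1)
        for bit in bits:
            weight *= p_one if bit else 1 - p_one
        if fn(*bits):
            total += weight
    return total


def activity(p_one):
    """Switching activity alpha = P(node = 0) * P(node = 1)."""
    return (1 - p_one) * p_one


# ASSUMPTION: gate types chosen only because they match the slide's probabilities.
def node_x(a, b, c):
    return a & b


def node_s(a, b, c):
    return 1 - ((a & b) | c)   # NOR(X, C)


p_x = node_probability(node_x, 3)
p_s = node_probability(node_s, 3)
print(p_x, activity(p_x))   # 1/4 3/16
print(p_s, activity(p_s))   # 3/8 15/64
```

Exact fractions are used so that the results can be compared digit for digit with the hand calculation.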


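Recap: Glitches

•  Glitch
   –  Dynamic hazards
   –  Useless transitions
   –  Significant wasted power

[Figure: circuit with inputs A, B, C, internal node X and output S; timing diagram of X and S when ABC switches from 101 to 000]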

Where is power dissipated in CMOS chips?

Operators? Clock? Logic? Memory? RF? LCD, HDD, etc.?


Where is power dissipated?

•  Internal memory
   –  Cache
   –  Scratch pad
•  External memory
   –  DRAM, Flash
   –  Includes power consumed by I/O pads
•  Ex. Itanium 2
   –  60-70% of area is due to caches

Where is power dissipated?

•  Clock
   –  40-50% of dissipated power is due to the clock tree and clock drivers

DIGITAL Corp. Alpha 21164 processor (1995): 2.5M gates, 9.3M transistors, 298 mm², 300 MHz, 64/128 bits, 0.5µ, 60W @ 3.3V

[Figure: Alpha 21164 chip with the clock generator and clock drivers marked]


Where is energy consumed?

•  H263 image coding
   –  1500K operations and 500K memory transfers
   –  Energy(mem. transfer) ~ 33x Energy(operation)
   –  Energy due to memory ~ 10x energy due to processing

[Figure: energy per operation for Add, Mult, RAM Read, RAM Write, I/O and off-chip memory transfer]
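The "~10x" figure follows directly from the two counts and the 33x cost ratio. A minimal check, with an illustrative function name and energies expressed in units of one operation:

```python
def memory_to_processing_ratio(n_operations: int, n_transfers: int,
                               transfer_cost_in_ops: float) -> float:
    """Memory energy divided by processing energy, in units of one operation."""
    return (n_transfers * transfer_cost_in_ops) / n_operations


# Counts and cost ratio quoted on the slide for H263 image coding.
ratio = memory_to_processing_ratio(1_500_000, 500_000, 33)
print(ratio)   # 11.0
```

The exact ratio is 11, which the slide rounds to roughly 10.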

Power Breakdown

•  Portable computer [Source Hitachi]
   –  Total power (Word application): 19.1W

[Pie chart: CPU, LCD, Video, HDD, Logic; slices of 37%, 20%, 3%, 10% and 30%]


Power Breakdown

•  GSM terminal [Source Philips]

                     Paging mode      Speaking mode    Total
                     (80% of time)    (20% of time)
Radio                40%              15%              20%
Power amplifier      0%               50%              40%
Baseband codec       60%              35%              40%

Power estimation and reduction

1.  Why care about power?
2.  Where does power go in CMOS chips?
3.  How to estimate power?
    –  Activity
    –  CAD tools
4.  How to reduce power?
    –  Hardware and software
5.  Conclusions


Activity

•  Flip-flop power: 9.5µW/MHz for each 0-to-1 output transition

[Figure: 8-bit register of D flip-flops with inputs D0-D7, outputs Q0-Q7 and common clock CLK]

Example: Registers

•  Statistical approach
   –  Random signal at the inputs to estimate power
   –  On average 4 flip-flops toggle per clock cycle, 2 of them from 0 to 1
   –  Power: 2 x 9.5 = 19µW/MHz

[Figure: 8-bit register (D0-D7, Q0-Q7, CLK) with white noise applied to the inputs]


Example: State Register

•  Probabilistic approach
   –  Transition probability depends on signal activity
•  Example: binary coding of a state register
   –  Average number of bit toggles per clock cycle: 1 + 1/2 + 1/4 + 1/8 + ... = 2
   –  Half of them are 0-to-1 transitions, so power: 2 x 9.5/2 = 9.5µW/MHz

[Figure: 8-bit state register (D0-D7, Q0-Q7) with binary coding of states]

Example: State Register

•  Probabilistic approach
   –  Transition probability depends on signal activity
•  Example: Gray coding of a state register
   –  Average number of bit toggles per clock cycle: 1/2 + 1/4 + 1/8 + ... = 1
   –  Half of them are 0-to-1 transitions, so power: 9.5/2 = 4.75µW/MHz

[Figure: 8-bit state register (D0-D7, Q0-Q7) with Gray coding of states]
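The binary and Gray estimates can be checked by counting transitions over one full pass of an 8-bit counter. This is a sketch under the assumption that the state register simply steps through all 256 codes in order and wraps around; the helper name is illustrative, and 9.5 µW/MHz is the per-transition figure quoted earlier in the slides.

```python
def rising_and_total_toggles(codes):
    """Count 0->1 transitions and all bit toggles over one cyclic pass."""
    rising = total = 0
    for current, following in zip(codes, codes[1:] + codes[:1]):
        changed = current ^ following
        total += bin(changed).count("1")
        rising += bin(changed & following).count("1")   # changed bits that end at 1
    return rising, total


N_BITS = 8
FF_UW_PER_MHZ = 9.5   # per 0-to-1 output transition, from the slides

binary_codes = list(range(2 ** N_BITS))
gray_codes = [i ^ (i >> 1) for i in binary_codes]   # standard reflected Gray code

for name, codes in (("binary", binary_codes), ("gray", gray_codes)):
    rising, total = rising_and_total_toggles(codes)
    cycles = len(codes)
    print(f"{name}: {total / cycles:.3f} toggles/cycle, "
          f"{FF_UW_PER_MHZ * rising / cycles:.2f} uW/MHz")
```

For 8 bits the binary counter averages just under 2 toggles per cycle (the infinite series on the slide is the limit for a wide register), while the Gray counter toggles exactly one bit per cycle.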


CAD Tools

[Figure: power CAD tools by abstraction level (Algorithmic, Architecture, RTL, Gate, Switch). Axes: accuracy of power estimation, potential for power optimisation, speed of power optimisation. Values marked on the figure: 10%, 20%, 50%, x5-x10. Tools listed, from the lowest level up:
–  SPICE et al., Epic/PowerMill, Avant!/ADM, Mentor Graphics/Lsim Power Analyst, Epic/AMPS
–  Synopsys/DesignPower/PrimePower, Veritools/Power_tool, Sente/WattWatcher Gate, Xpower, Synopsys/Power Compiler
–  Synopsys/DesignPower, Sente/WattWatcher Architect, HLDS, Cadence/Top-Down Design Planner
–  Research]

Transistor-Level CAD Tools

•  Accurate estimation
   –  SPICE (and variants)
   –  PowerMill (Epic/Synopsys), ADM (Avant!), LSIM Power (Mentor)
•  Optimisation tools
   –  Optimisation by transistor sizing: AMPS (Epic/Synopsys)
   –  Reliability: RailMill, Thunder&Lightning
•  Advantages
   –  Highest accuracy
   –  Easy to perform
•  Limitations
   –  Long simulation time
   –  Limited to small blocks (100-10k transistors)

[Figure: power measurement circuit: device under test (input In, load Cl) between Vdd and Vss, with a Vs = 0 source sensing the supply current Is and a βIs source driving Ry and Cy to produce the power reading]


Logic-Level Estimation

•  Two techniques
   –  Statistical estimation
   –  Probabilistic estimation

[Figure: estimation flows built from the blocks Input Stimuli, Gate-level Simulation, Node Activity Monitoring, Input Activity, Activity Propagation, Average, Analysis and Power Report]

Quality of the testbench is crucial!

Two delay models

•  Zero-delay model
•  Real-delay model

[Figure: waveforms under both models, showing glitches / hazards]

•  Typically 20% of power is due to glitches
•  Up to 70% in arithmetic operators (e.g. adders, multipliers)


Logic-Level CAD Tools

•  Numerous examples
   –  PrimePower (Epic/Synopsys), DesignPower (Synopsys)
   –  QuickPower (M. G.), WattWatcher/Gate (Sente)
   –  PowerCompiler (Synopsys): gate-level optimisation
•  Advantages
   –  Faster than transistor-level tools
   –  Rely on existing logic simulators (e.g. ModelSim)
   –  Probabilistic estimation for early and quick estimation
•  Limitations
   –  Interconnection (wire) models
   –  Glitch estimation is limited by simulator precision
   –  Speed and block size are still limited (full-chip estimation is not possible)

DesignPower (Synopsys)

•  Gate-level analysis
•  Estimation of dynamic power (switching power, internal cell power) and static (leakage) power
•  Probabilistic or statistical

[Figure: a gate-level netlist and switching information feed the estimation, through one of two analyses]

•  Probabilistic analysis
   –  Default or user-given probability values
   –  Fast
•  Simulation analysis
   –  Full-timing gate-level simulation
   –  Time consuming
   –  Accurate


DesignPower (Synopsys)

•  Design flow
   –  Power estimation
   –  Power constraints during logic synthesis

[Figure: design flow with blocks HDL Design; Optimization (Timing, Area, Power); Optimization (Timing and Area); Gate-Level Simulation; Physical Design; Power Analysis]

Power Models in DesignPower

•  Switching power

$$P_c = \frac{V_{dd}^2}{2} \sum_{\forall\, nets(i)} \left( C_{load_i} \times TR_i \right)$$

   –  C_load_i: load of net i
   –  TR_i: toggle rate of net i

•  Internal cell power
   –  e.g. gate with 2 inputs A, B and 1 output Z

$$P_{internal\,cell} = f_Z\left(C_{load_Z},\ WeightAvg_{trans}\right) \times TR_Z$$

$$WeightAvg_{trans} = \frac{\sum_{i=A,B} TR_i \cdot Trans_i}{\sum_{i=A,B} TR_i}$$

   –  WeightAvg_trans: weighted average transition time for output Z
   –  TR_i: toggle rate of pin i
   –  Trans_i: transition time of pin i

•  Total dynamic power

$$P_{dynamic} = \sum_{\forall\, cells(i)} P_{internal\,cell_i} + \frac{V_{dd}^2}{2} \sum_{\forall\, nets(i)} \left( C_{load_i} \times TR_i \right)$$

•  Leakage power

$$P_{leakage} = \sum_{\forall\, cells(i)} P_{cell\,leakage_i}$$
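The switching-power sum and the weighted average transition time are straightforward to evaluate once per-net loads and toggle rates are known. The sketch below is not DesignPower's API: the function names, the net list and the pin data are hypothetical, and the internal-power lookup function f_Z is left out because it depends on library tables that the slides do not give.

```python
def switching_power_w(vdd_v, nets):
    """(Vdd^2 / 2) * sum(C_load_i * TR_i); nets holds (farads, toggles per second)."""
    return 0.5 * vdd_v ** 2 * sum(c_load * toggle_rate for c_load, toggle_rate in nets)


def weighted_avg_transition_s(pins):
    """sum(TR_i * Trans_i) / sum(TR_i); pins holds (toggles per second, seconds)."""
    numerator = sum(toggle_rate * trans for toggle_rate, trans in pins)
    denominator = sum(toggle_rate for toggle_rate, _ in pins)
    return numerator / denominator


# ASSUMPTION: two hypothetical nets and two hypothetical input pins A, B.
example_nets = [(10e-15, 50e6), (20e-15, 10e6)]
example_pins = [(30e6, 0.1e-9), (10e6, 0.3e-9)]

print(f"switching power: {switching_power_w(1.2, example_nets) * 1e6:.3f} uW")
print(f"weighted transition: {weighted_avg_transition_s(example_pins) * 1e9:.3f} ns")
```

The weighted average leans towards the transition time of the pin that toggles most often, which is the point of weighting by toggle rate.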


Power estimation and reduction

1.  Why care about power?
2.  Where does power go in CMOS chips?
3.  How to estimate power?
4.  How to reduce power?
    –  Design flow and principles
    –  Architecture-level optimisation
    –  Software estimation and optimisation
    –  System-level optimisation
5.  Conclusions

How to reduce power?

$$P_{dyn} = \alpha \cdot C_l \cdot V_{dd}^2 \cdot f$$

Well… we just need to reduce α, Cl, Vdd and f! How?

•  Reduce Vdd (as low as possible)
•  Minimize effective capacitance Ceff = α Cl
•  Trade off performance against power by playing with clock frequency f
•  And do not forget leakage!


Where?

•  At each abstraction level

[Figure: abstraction levels, illustrated with the statement SUM := A1+B1]

Reducing Vdd

•  Vdd has a quadratic effect on power
•  Propagation delay increases if Vdd is reduced
   –  but the power-delay product still improves (decreases)

[Figure: relative delay td (0 to 6) and relative dynamic power Pdyn (0 to 10) versus supply voltage VDD (0.8 to 2.4). Delay (td) and dynamic power (Pdyn) are functions of VDD]

$$P_{dyn} = \alpha \cdot C_L \cdot V_{dd}^2 \cdot f$$

$$t_d = \frac{C_L \cdot V_{dd}}{I_{ds}} = \frac{C_L \cdot V_{dd}}{k \frac{W}{L} \left(V_{dd} - V_t\right)^2}$$

$$t_d \propto \frac{1}{V_{dd}^{\,x}}, \quad x < 2$$
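The trade-off between delay and dynamic power can be tabulated from the two formulas. In the sketch below the threshold voltage (0.4 V) and the reference supply (2.4 V) are assumptions picked for illustration; CL and k·W/L cancel out because only ratios are computed.

```python
def relative_delay(vdd_v: float, vt_v: float, vdd_ref_v: float) -> float:
    """t_d(vdd) / t_d(vdd_ref) with t_d proportional to Vdd / (Vdd - Vt)^2."""
    if vdd_v <= vt_v or vdd_ref_v <= vt_v:
        raise ValueError("the model only holds for supply voltages above Vt")

    def shape(v):
        return v / (v - vt_v) ** 2

    return shape(vdd_v) / shape(vdd_ref_v)


def relative_dynamic_power(vdd_v: float, vdd_ref_v: float) -> float:
    """Pdyn(vdd) / Pdyn(vdd_ref) at fixed alpha, CL and f."""
    return (vdd_v / vdd_ref_v) ** 2


# ASSUMPTION: Vt = 0.4 V and a 2.4 V reference, for illustration only.
for vdd in (2.4, 1.8, 1.2, 0.8):
    print(f"Vdd = {vdd:.1f} V: delay x{relative_delay(vdd, 0.4, 2.4):.2f}, "
          f"power x{relative_dynamic_power(vdd, 2.4):.2f}")
```

Under these assumptions, halving the supply from 2.4 V to 1.2 V cuts dynamic power to a quarter while delay grows by a factor of about three, and the delay penalty grows quickly as Vdd approaches Vt.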


Power Dissipation and Circuit Delay

[Figure: two surface plots over Vth (-0.4 to 0.8 V) and VDD (1 to 4 V): power (W, scale x 10^-4) and delay (s, scale x 10^-10), each with operating points A and B marked] [Sakurai03]

Multiple Vdd

•  Main idea
   –  Use different supply voltages within the same design
   –  High Vdd for critical parts (high performance needed)
   –  Low Vdd for non-critical parts (only low performance demands)
•  Usually two different VDD (but more are possible)
•  Need for level converters
   –  Necessary when a module at the lower supply drives a gate at the higher supply (step-up)
   –  If a gate supplied with VddL drives a gate supplied with VddH, the PMOS never turns off

[Figure: level shifter between Vin (VDDL) and Vout (VDDH)]


Data Paths

•  Data propagate through different data paths between registers (flip-flops, FF)
•  Paths mostly differ in propagation delay
•  The frequency of the clock signal (CLK) depends on the path with the longest delay → critical path

[Figure: flip-flops (FF) clocked by CLK and connected by combinational paths]

Data Paths: Slack

[Figure: gate G1 (inputs A, B) drives gate G2 (other input C, output Y). Timeline: all inputs of G1 arrived; delay of G1; G1 ready with evaluation; slack for G1; all inputs of G2 arrived]


Multiple VDD in Data Paths

•  Minimum energy consumption when all logic paths are critical (same delay)
•  Possible algorithm: clustered voltage scaling
   –  Each path starts with VddH and switches to VddL when slack is available
   –  Level conversion in flip-flops at the end of paths

[Figure: netlist with gates connected to VDDH and gates connected to VDDL]

Reducing Vdd

•  Compensate for the performance lost to Vdd reduction by architectural optimizations
   –  Example: 16-bit architecture of a Viterbi decoder

[Figure: reference datapath with registers A, B and C clocked at Tclk, an adder (+) and a comparator (><)]

Pref = Cref · Vref² · Fref = 14.7 mW
Cref = α · Σ Ci
Fref = 1/40ns = 25MHz
Vref = 5V
Area = 0.44 mm²


Parallel Architecture

Pparallel = (2.15 Cref) · (0.58 Vref)² · (0.5 Fref) ≈ 0.36 Pref = 5.3 mW
Cparallel = 2.15 Cref
Fparallel = 1/80ns
Vdd parallel = 2.9V
Area = 0.87 mm²

[Figure: two copies of the reference datapath (registers A, B, C, adder, comparator), each clocked at 2·Tclk, merged by a MUX switching at Tclk]

Pipelined Architecture

Ppipeline = (1.15 Cref) · (0.58 Vref)² · Fref ≈ 0.39 Pref = 5.7 mW
Cpipeline = 1.15 Cref (area advantage)
Fpipeline = Fref
Vdd pipeline = 2.9V

[Figure: reference datapath with additional pipeline registers, all clocked at Tclk]

•  Pipelined/parallel architecture
   –  Vdd = 2V, 2.2 Cref, P = 0.2 Pref
   –  Divides power by 5 at the cost of doubling area
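The parallel and pipelined figures all come from scaling the reference power by a capacitance ratio, a squared voltage ratio and a frequency ratio. A minimal sketch follows; the function name is illustrative, and the 0.5 frequency ratio used for the combined pipelined/parallel case is an assumption, since the slide gives only its voltage and capacitance.

```python
P_REF_MW = 14.7   # reference power quoted in the slides


def scaled_power_mw(c_ratio: float, v_ratio: float, f_ratio: float,
                    p_ref_mw: float = P_REF_MW) -> float:
    """P = (c_ratio * Cref) * (v_ratio * Vref)^2 * (f_ratio * Fref)."""
    return c_ratio * v_ratio ** 2 * f_ratio * p_ref_mw


parallel_mw = scaled_power_mw(2.15, 0.58, 0.5)
pipeline_mw = scaled_power_mw(1.15, 0.58, 1.0)
# ASSUMPTION: the combined design runs each unit at half the reference frequency.
combined_mw = scaled_power_mw(2.2, 2.0 / 5.0, 0.5)

print(f"parallel: {parallel_mw:.1f} mW")
print(f"pipeline: {pipeline_mw:.1f} mW")
print(f"combined: {combined_mw / P_REF_MW:.2f} x Pref")
```

Under that assumption the combined design lands at about 0.18 Pref, in line with the slide's rounded 0.2 Pref.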


Summary: Approximate Trend

                   N-parallel proc.        N-stage pipeline proc.
Capacitance        N · Cref                Cref
Voltage            ≈ Vref/N                ≈ Vref/N
Frequency          fref/N                  fref
Dynamic power      Cref Vref² fref / N²    Cref Vref² fref / N²
Chip area          N times                 10-20% increase

Gated Clock

•  D flip-flop

$$P_{FF} = \alpha P_{latches} + P_{clock}$$

[Figure: D flip-flop schematic built from clocked latches, with data input D, output Q and clock CLK]


Gated Clock

•  Remove useless transitions when the register value does not change
•  State-machine modification
•  Gain can be high
   –  depends on activity
•  Careful design needed (not fully synchronous)

[Figure: left, register Reg with data input Din, enable En from the FSM and clock Clk; right, the same register clocked through a gated cell (a latch on the gate signal, combined with Clk) that produces the gated clock]

Conditional Flip-Flop

•  Clock-on-Demand (COD) F/F [Hamada99]
•  Clock is provided to the F/F only when new data arrives

[Figure: COD flip-flop in which a CLK controller with inputs D, Q and CLK generates the internal clocks CKI and CKIB; waveforms of CLK, D, Q and CKI over 0 to 15 ns]


Leakage Power Optimization: Power Gating

•  Objective
   –  Reduce leakage currents by inserting a switch transistor (usually high Vth) into the logic stack (usually low Vth)
      •  Switch transistors change the bias points (VSB) of transistors
•  Most effective for systems with standby operational modes
   –  1 to 3 orders of magnitude leakage reduction possible
   –  But switches add many complications

[Figure: three variants: logic cell connected directly to Vdd; logic cell with a sleep-controlled switch cell creating a virtual ground; logic cell with a sleep-controlled switch cell creating a virtual Vdd]

Leakage Power Optimization: Power Gating

•  Memory is a great source of leakage
•  Switch off memory banks when they are unused

[Figure: memory banks 1 to N on a common Vdd, each with its own switchable Gnd connection]


Leakage Power Optimization: Multi-Vth

•  Trade off positive slack for reduced leakage power
   –  Objective: reduce leakage power where speed is not needed
   –  Optimization performed post-route
   –  Cells along paths with positive slack are replaced with high-Vth cells
      •  Leakage currents reduced where timing margins permit
      •  Re-route not required: new cells have the same footprint as previous cells

[Figure: netlist of low-Vth cells (L: high speed, high leakage) in which some cells are swapped for high-Vth cells (H: reduced speed, low leakage)]

Operator Isolation

•  Activate functional units (FUs) only when necessary

[Figure, before: ADD and MUL units fed from a register, with the instruction register, the FSM enable and a gated cell on Clk. The multiplier consumes energy even if unused]
[Figure, after: a latch is added in front of the multiplier. Multiplier inputs are latched when it is unused]


Pre-computation (1)

•  Principle: avoid using power-hungry blocks when the result can be pre-computed by a less power-hungry one

[Figure: computation block with inputs A, B and output S, with a pre-computation block alongside]

Pre-computation (2)

•  Example: comparator A > B
•  If the 2 MSBs differ, the result can be determined without subtracting A and B:
   –  If A[MSB] != B[MSB] && A[MSB] == 0 then A > B is true (1)
   –  If A[MSB] != B[MSB] && A[MSB] == 1 then A > B is false (0)

[Figure: registers on inputs A and B feeding the A>B comparator, followed by an output register, all clocked by Clk]


Pre-computation (3)

•  Example: comparator A > B
   –  If the 2 MSBs are equal, then subtraction is needed
   –  Otherwise, the result is the MSB of B

[Figure: comparator with pre-computation: A[MSB] and B[MSB] are registered separately; a gated cell clocks the registers holding A and B only when A[MSB] == B[MSB]; a multiplexer selects between the comparator output and B[MSB]]

What is the average gain?
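One way to approach the question is to enumerate all operand pairs and count how often the full comparator is really needed. The sketch assumes two's-complement operands (the interpretation under which "result is the MSB of B" holds) and uniformly distributed inputs; the function names and the 4-bit width are illustrative.

```python
from itertools import product


def to_signed(value: int, n_bits: int) -> int:
    """Interpret an n_bits-wide pattern as a two's-complement integer."""
    return value - (1 << n_bits) if (value >> (n_bits - 1)) & 1 else value


def greater_than(a: int, b: int, n_bits: int):
    """Return (A > B, full_comparator_used) using the MSB pre-computation."""
    msb_a = (a >> (n_bits - 1)) & 1
    msb_b = (b >> (n_bits - 1)) & 1
    if msb_a != msb_b:
        return bool(msb_b), False        # signs differ: the result is B[MSB]
    return to_signed(a, n_bits) > to_signed(b, n_bits), True


def full_comparator_usage(n_bits: int) -> float:
    """Fraction of operand pairs that still need the full comparator."""
    used = 0
    for a, b in product(range(1 << n_bits), repeat=2):
        result, used_full = greater_than(a, b, n_bits)
        assert result == (to_signed(a, n_bits) > to_signed(b, n_bits))
        used += used_full
    return used / (1 << (2 * n_bits))


print(full_comparator_usage(4))   # 0.5
```

For uniform inputs the comparator inputs stay frozen half of the time. The actual power gain would be smaller than 50%, because the pre-computation logic and the gated cell consume power too, and real data are rarely uniform.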

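Post-computing (1)

•  Principle: do not load the state register if the next state is identical to the current state

[Figure: next-state function f feeds the state register Reg (D, Q); a comparator (= ?) between next state and current state drives the gated cell on Clk]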


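Post-computing (2)

•  Example

[Figure: state graph with states E0, E1, E2, E3 and transition conditions A, !A&!B, !A&B; implementation in which a "Post" block on inputs A, B drives the gated cell that clocks the state register]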

State Coding

•  Binary coding: higher activity, lower capacitance (area)
•  Gray coding: lower activity, higher area
•  State encoding depending on transition probability
   –  If Prob(Ck) is high, then E1 and Ek should be given codes with a low Hamming distance

[Figure: state E1 with outgoing transitions of probability Ci, Cj, Ck (Ci + Cj + Ck = 1); the Ck transition leads to state Ek]


Glitch Power Reduction

•  Design a digital circuit for minimum transient energy consumption by eliminating hazards

[Figure: example waveform with total transitions = 6, essential transitions = 2, glitch transitions = 4]

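Differential Path Delay

[Figure: gate with inputs A, B, output C and delay D; the inputs switch at different times, separated by the differential path delay (DPD). Since D < DPD, a hazard or glitch appears on C]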


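Balanced Path Delays

[Figure: gate with inputs A, B, output C and delay D < DPD; a delay buffer is inserted on one input to cancel the differential path delay, so no glitch appears on C]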

Glitch Filtering by Inertia

[Figure: gate with inputs A, B and output C whose delay is increased so that D > DPD; the glitch on C is filtered]


Designing a Glitch-Free Circuit

•  Maintain the specified critical path delay
•  Glitches are suppressed at all gates by
   – Path delay balancing
   – Glitch filtering, by increasing the inertial delay of gates or by inserting delay buffers when necessary

[Figure: gate of delay D fed by two paths of delay d1 and d2.]

Minimum transient energy condition: |d1 – d2| < D
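The condition |d1 – d2| < D can be turned into a simple per-gate check. This is a minimal sketch under the assumption that each gate is described only by the arrival delays of its input paths and its own inertial delay; the function name and the numeric values are hypothetical.

```python
def is_glitch_free(path_delays, gate_delay):
    """Check the minimum transient energy condition at one gate.

    path_delays: arrival delays of all input paths (same time unit as gate_delay).
    The differential path delay is the spread between the latest and the
    earliest input; the gate filters the hazard when that spread is smaller
    than its own delay D.
    """
    differential_path_delay = max(path_delays) - min(path_delays)
    return differential_path_delay < gate_delay


# Hypothetical delays in nanoseconds.
print(is_glitch_free([2.0, 2.5], gate_delay=1.0))  # paths nearly balanced
print(is_glitch_free([1.0, 3.0], gate_delay=1.0))  # spread larger than D
```

When the check fails, the two remedies listed on the slide apply: add a delay buffer on the early path, or increase the inertial delay of the gate.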

68  

Designing a Glitch-Free Circuit

•  Logical path delay balancing
   – Logic synthesis (e.g. Power Compiler)
   – Example
      •  S = a.b.c.d with p(a) = 0.3; p(b) = 0.4; p(c) = 0.7; p(d) = 0.5
      •  AND: Pout = PA.PB

[Figure: two implementations of S with 2-input AND gates. Chain ((a.b).c).d with node probabilities 0.12, 0.084 and 0.042. Balanced tree (a.b).(c.d) with node probabilities 0.12, 0.35 and 0.042. Labels: "Less Activity", "Less Glitches".]
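The node probabilities of the example can be reproduced with the rule Pout = PA.PB. The sketch below also estimates a toggle rate per node as p(1 - p); that formula is an assumption added here (it holds when successive values are independent) and does not appear on the slide.

```python
def and_chain(probs):
    """Node probabilities for a chain ((a.b).c).d of 2-input AND gates."""
    nodes = []
    p = probs[0]
    for q in probs[1:]:
        p = p * q          # AND gate: Pout = PA * PB
        nodes.append(p)
    return nodes


def and_tree(pa, pb, pc, pd):
    """Node probabilities for a balanced tree (a.b).(c.d)."""
    left = pa * pb
    right = pc * pd
    return [left, right, left * right]


def toggle_rate(p):
    """Assumed activity model: probability of a 0 -> 1 transition, p * (1 - p)."""
    return p * (1.0 - p)


chain = and_chain([0.3, 0.4, 0.7, 0.5])
tree = and_tree(0.3, 0.4, 0.7, 0.5)
chain_activity = sum(toggle_rate(p) for p in chain)
tree_activity = sum(toggle_rate(p) for p in tree)
print("chain nodes:", [round(p, 3) for p in chain], "activity:", round(chain_activity, 3))
print("tree nodes: ", [round(p, 3) for p in tree], "activity:", round(tree_activity, 3))
```

Under this model the chain has lower total activity, while the tree has equal input path depths at the last gate, which is what limits glitches.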


69  

Resource Sharing

•  Resource sharing
   – reduces area but increases activity
      •  destroys data correlation
•  Bus Multiplexing
   – Nbt: Number of bus transitions per cycle

[Figure: without sharing, Counter 1 drives Bus 1 (0000, 0001) and Counter 2 drives Bus 2 (1110, 1111). With sharing, Counter 1 and Counter 2 drive a single bus through a MUX (1110, 0000, 1111, 0001).]

Separate buses: Nbt = 2(1+1/2+1/4+...) = 4
Multiplexed bus: Nbt >= 4 (depends on counter skew)
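The transition counts can be reproduced by simulation. This is a minimal sketch assuming 4-bit wrap-around counters and a skew of 14 between them (the 0000 and 1110 start values of the figure); the helper names are hypothetical. For a finite width the separate-bus figure is slightly below the asymptotic value of 4.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")


def transitions(seq):
    """Total number of bit flips along a sequence of bus values."""
    return sum(hamming(x, y) for x, y in zip(seq, seq[1:]))


def counter(start, n_values, width):
    """Successive values of a wrap-around counter of the given bit width."""
    mask = (1 << width) - 1
    return [(start + i) & mask for i in range(n_values)]


WIDTH = 4
CYCLES = 1 << WIDTH                      # one full counter period
c1 = counter(0b0000, CYCLES + 1, WIDTH)
c2 = counter(0b1110, CYCLES + 1, WIDTH)  # assumed skew between the counters

# Two dedicated buses: each bus only sees increments of its own counter.
separate = (transitions(c1) + transitions(c2)) / CYCLES

# One shared bus: the two counters alternate on the same wires.
muxed = [v for pair in zip(c1[:CYCLES], c2[:CYCLES]) for v in pair] + [c1[CYCLES]]
shared = transitions(muxed) / CYCLES

print(f"separate buses: {separate} transitions/cycle")
print(f"shared bus:     {shared} transitions/cycle")
```

The shared bus toggles more because consecutive values now come from different counters, so the correlation between successive values is lost.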

70  

Resource  Sharing  


71  

Activity Reduction

•  Bus encoding to reduce activity
   – e.g. bus between cache memory and processor
   – objective: reduce the number of transitions
      •  e.g. activity of binary > activity of Gray

[Figure: input, encoding logic, bus, decoding logic, output. Input activity > bus activity < output activity.]

72  

Bus Encoding

•  Bus-Invert Coding
   – Take advantage of correlation between successive bus values
   – Choose between sending the true or the complement form of bus values to minimize toggles (based on Hamming distance)
•  Can break the bus into fields and apply bus-invert coding to each field

[Figure: bus-invert encoder built with XOR gates.]
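Bus-invert coding can be sketched in a few lines. The version below assumes the usual decision rule (invert when more than half of the data lines would toggle), an extra invert line sent with each word, and a bus that starts at all zeros; the function names are hypothetical.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return (bus_value, invert_bit) frames for a sequence of data words."""
    mask = (1 << width) - 1
    bus = 0  # assumption: data lines start at all zeros
    frames = []
    for word in words:
        if hamming(bus, word) > width / 2:
            bus, invert = ~word & mask, 1   # send the complement form
        else:
            bus, invert = word, 0           # send the true form
        frames.append((bus, invert))
    return frames


def bus_invert_decode(frames, width):
    """Recover the original words from (bus_value, invert_bit) frames."""
    mask = (1 << width) - 1
    return [(~bus & mask) if invert else bus for bus, invert in frames]


def data_toggles(values, start=0):
    """Bit flips on the data lines, starting from the value `start`."""
    seq = [start] + list(values)
    return sum(hamming(x, y) for x, y in zip(seq, seq[1:]))


words = [0x00, 0xFF, 0x00, 0xFF]
frames = bus_invert_encode(words, width=8)
raw = data_toggles(words)
coded = data_toggles([bus for bus, _ in frames])
print("raw toggles:", raw, "coded data-line toggles:", coded)
```

The invert line itself also toggles, so a complete comparison should add its transitions to the coded total.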


73  

Bus Encoding

•  Gray encoding
•  T0 (binary) code
   – Take advantage of address sequences
   – Add a redundant line to the bus (INC)
      •  INC = 1 if B(t) == B(t-1)+1 (or +K); the bus is kept constant and the receiver increases the value by 1 (or K)
      •  Otherwise INC = 0 and B(t) is transferred normally

[Excerpt, p. 58: State of the art on performance optimization techniques]

Fig. 3.10: Architecture of the encoder and decoder for the Gray code. Example of a 4-bit bus. [Figure: the encoder maps binary bits B1 to B4 to Gray bits G1 to G4 through XOR gates; the decoder maps G1 to G4 back to B1 to B4 through XOR gates.]

– Bn is the value of uncoded bit n;
– Bn+1 is the value of uncoded bit n + 1.

At the decoder, the equation to apply to recover the pure binary code is:

$B_n = G_n \oplus B_{n+1}$ (3.2)

From these formulas, it is straightforward to build the encoder and decoder architecture shown in Figure 3.10.

The experiments carried out in [SD95] show a 33% reduction in activity and a 77% reduction in the energy consumed on the bus.

However, for wide buses (when n is large), the decoder has a long critical path, since the XOR gates are cascaded from the MSBs to the LSBs.
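The Gray encoder and decoder can be sketched as follows. The decoder implements the recurrence B_n = G_n xor B_{n+1} from the MSB down; the encoder equation is not visible in the excerpt, so the standard reflected-binary form G_n = B_n xor B_{n+1} is assumed here. Function names are hypothetical.

```python
def gray_encode(value):
    """Assumed encoder: G_n = B_n xor B_{n+1}, i.e. value xor (value >> 1)."""
    return value ^ (value >> 1)


def gray_decode(gray, width):
    """Decoder: B_n = G_n xor B_{n+1}, evaluated from the MSB to the LSB."""
    binary = 0
    for n in reversed(range(width)):
        upper_bit = (binary >> (n + 1)) & 1      # already decoded bit n + 1
        bit = ((gray >> n) & 1) ^ upper_bit
        binary |= bit << n
    return binary


def transitions(seq):
    """Total number of bit flips along a sequence of bus values."""
    return sum(bin(x ^ y).count("1") for x, y in zip(seq, seq[1:]))


addresses = list(range(16))                       # consecutive 4-bit addresses
binary_toggles = transitions(addresses)
gray_toggles = transitions([gray_encode(a) for a in addresses])
print("binary:", binary_toggles, "Gray:", gray_toggles)
```

The loop in `gray_decode` mirrors the cascaded XOR gates of the hardware decoder: each bit depends on the previously decoded upper bit, which is the long critical path mentioned above.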

T0 code

In [BMM+97], the proposed idea is to add a wire, called INC, that is set to a defined logic level when the accessed addresses are consecutive. To do so, the address value at clock cycle t − 1 is stored and incremented by 1, and the result is compared with the value arriving at clock cycle t. If the two values are identical, the bus state does not change and the extra INC wire is set to a defined logic level. Otherwise, the address value is sent on the bus. At the decoder, the state of the INC wire selects between the value on the bus and the previous value incremented by 1.

This technique reduces the activity to 0 when the accessed addresses are consecutive, which strongly reduces the power consumed on the bus.

An extension of this technique is proposed in [FPSS00], where several increment steps (+step) can be defined for consecutive accesses.

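[Excerpt source: tel-00445791, version 1 - 11 Jan 2010]

[Figure: T0 encoder and decoder. Encoder: the previous value plus K is compared (=) with B(t); the comparison result drives INC and a 0/1 multiplexer. Decoder: a 0/1 multiplexer controlled by INC selects between the bus value and the previous value plus K to produce B(t).]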
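The T0 scheme can be sketched as an encoder/decoder pair. This is a minimal sketch assuming a fixed increment `step` (the K of the slide) and that the first address is always sent explicitly; the function names are hypothetical.

```python
def t0_encode(addresses, step=1):
    """Return (bus_value, inc) frames; the bus is frozen while INC = 1."""
    frames = []
    bus = None
    previous = None
    for address in addresses:
        if previous is not None and address == previous + step:
            frames.append((bus, 1))      # consecutive: keep the bus constant
        else:
            bus = address
            frames.append((bus, 0))      # otherwise send the address itself
        previous = address
    return frames


def t0_decode(frames, step=1):
    """Rebuild the address stream from (bus_value, inc) frames."""
    addresses = []
    previous = None
    for bus, inc in frames:
        value = previous + step if inc else bus
        addresses.append(value)
        previous = value
    return addresses


def transitions(seq):
    """Total number of bit flips along a sequence of bus values."""
    return sum(bin(x ^ y).count("1") for x, y in zip(seq, seq[1:]))


addresses = [100, 101, 102, 103, 200, 201]
frames = t0_encode(addresses)
raw_toggles = transitions(addresses)
coded_toggles = transitions([bus for bus, _ in frames])
print("frames:", frames)
print("raw data-line toggles:", raw_toggles, "T0 data-line toggles:", coded_toggles)
```

As with bus-invert coding, the INC line adds its own transitions, which a full power comparison should include.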

74  

Memory Optimization

•  Place data that are accessed frequently in internal memory or in registers
•  Minimize memory size (for leakage) by maximizing data reuse

[Figure: memory hierarchy ordered by available memory space: CPU registers, scratchpad and cache, external memory.]


75  

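Split Memory Access

[Figure: memory split into two 16K x 32 RAM blocks, each with addr, din, dout, write and noe ports. The 15-bit address addr[14:0] is registered (pre_addr, clocked d/q register); addr[14:1] addresses the RAM blocks and addr[0] selects between them; the data output dout is 32 bits wide.]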

76  

Power estimation and reduction

1.  Why care about power?
2.  Where does power go in CMOS chips?
3.  How to estimate power?
4.  How to reduce power?
   – Design flow and principles
   – Architecture-level optimisation
   – Software estimation and optimization
   – System-level optimisation
5.  Conclusions


77  

Reducing Energy of Software

•  Embedded software determines the power/energy consumed by the processor
   – So why not modify the software to reduce energy?
•  Energy, power or performance?
   – Energy = battery lifetime
   – Power = supply voltage distribution sizing, heat dissipation

Sequence 1 (Power: 1.15 W, Energy: 8.6 × 10^-8 J):
    MOV DX,[BX]
    MOV AX,CX
    MOV AX,DX

Sequence 2 (Power: 0.99 W, 14% less; Energy: 22.3 × 10^-8 J, 158% more):
    NOP
    MOV AX,CX
    MOV DX,[BX]
    NOP
    NOP
    NOP
    NOP
    MOV AX,DX
    NOP
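The difference between the power and energy figures follows from energy = power × time. The short sketch below only reuses the four numbers quoted on the slide; the variable names are hypothetical, and the execution times are derived here rather than taken from the source.

```python
def percent_change(new, old):
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


# Figures quoted for the short sequence and for the NOP-padded sequence.
power_short_w, energy_short_j = 1.15, 8.6e-8
power_padded_w, energy_padded_j = 0.99, 22.3e-8

# Execution time implied by E = P * t.
time_short_s = energy_short_j / power_short_w
time_padded_s = energy_padded_j / power_padded_w

power_change = percent_change(power_padded_w, power_short_w)
energy_change = percent_change(energy_padded_j, energy_short_j)

print(f"power change:  {power_change:+.1f} %")
print(f"energy change: {energy_change:+.1f} %")
print(f"implied times: {time_short_s * 1e9:.0f} ns vs {time_padded_s * 1e9:.0f} ns")
```

With the rounded inputs the energy increase comes out close to, but not exactly at, the 158% quoted on the slide. The padded sequence draws less power yet runs about three times longer, so it drains more energy from the battery.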

78  

Reducing Energy of Software

•  Performance = use of memory bandwidth
•  Energy = use of registers or scratchpad memory, reduce activity

C source:
    int a[1000];
    c = a;
    for (i = 1; i < 100; i++) {
        b += *c;
        b += *(c+7);
        c += 1;
    }

Version 1 (2096 cycles, 19.92 µJ):
    LDR r3,[r2,#0]
    ADD r3,r0,r3
    MOV r0,#28
    LDR r0,[r2,r0]
    ADD r0,r3,r0
    ADD r2,r2,#4
    ADD r1,r1,#1
    CMP r1,#100
    BLT LL3

Version 2 (2231 cycles, 16.47 µJ):
    ADD r3,r0,r2
    MOV r0,#28
    MOV r2,r12
    MOV r12,r11
    MOV r11,r10
    MOV r0,r9
    MOV r9,r8
    MOV r8,r1
    LDR r1,[r4,r0]
    ADD r0,r3,r1
    ADD r4,r4,#4
    ADD r5,r5,#1
    CMP r5,#100
    BLT LL3


79  

Reducing Energy of Software

•  Use of internal registers (20%)
•  Use of internal memory (scratchpad is better than cache) (40%)
•  Transformations to reduce the number of reads/writes to memory: be aware of the code you write
   – Loop permutation, unrolling, tiling, fusion, fission, ...
   – Gain from 40% to x5
•  Compiler: instruction selection, scheduling, ...

Before loop fusion:
    FOR i := 1 TO N DO
        B[i] := f(A[i]) ;
    FOR i := 1 TO N DO
        C[i] := g(B[i]) ;

After loop fusion:
    FOR i := 1 TO N DO
        B[i] := f(A[i]) ;
        C[i] := g(B[i]) ;
    END ;

[Marwedel02]
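The effect of loop fusion on memory traffic can be illustrated by counting array accesses. This is a minimal sketch: `CountingArray`, `f` and `g` are hypothetical stand-ins, and the fused version keeps the freshly computed B[i] in a local variable (a register, in hardware terms) instead of reading it back.

```python
class CountingArray:
    """List wrapper that counts every element read and write."""

    def __init__(self, data):
        self.data = list(data)
        self.accesses = 0

    def __getitem__(self, i):
        self.accesses += 1
        return self.data[i]

    def __setitem__(self, i, value):
        self.accesses += 1
        self.data[i] = value


def f(x):  # hypothetical per-element functions
    return 2 * x


def g(x):
    return x + 1


def separate_loops(a, b, c, n):
    for i in range(n):
        b[i] = f(a[i])
    for i in range(n):
        c[i] = g(b[i])       # B[i] is read back from memory


def fused_loop(a, b, c, n):
    for i in range(n):
        value = f(a[i])      # stays in a local variable
        b[i] = value
        c[i] = g(value)      # no second access to B[i]


N = 8
arrays_1 = [CountingArray(range(N)), CountingArray([0] * N), CountingArray([0] * N)]
arrays_2 = [CountingArray(range(N)), CountingArray([0] * N), CountingArray([0] * N)]
separate_loops(*arrays_1, N)
fused_loop(*arrays_2, N)
accesses_separate = sum(arr.accesses for arr in arrays_1)
accesses_fused = sum(arr.accesses for arr in arrays_2)
print("separate loops:", accesses_separate, "accesses; fused loop:", accesses_fused)
```

Both versions produce the same B and C; the fused loop simply removes one memory read per iteration.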

80  

Software Power Estimation

•  Power estimation: processor instruction power models
•  Methods based on simulation
   – The program is simulated on a low-level (RTL) model of the processor
•  Physical measurements
   – Measure the current of instruction sequences

[Figure: current measurement setup with an ammeter (A) between the power supply and the CPU of the system.]


81  

Software Power Estimation

•  Instruction-level model
•  Measure on instruction sequences: Base_i
   – Instructions in a loop or sequence of instructions

SPARClite:

Instruction   Current (mA)   Cycles   Energy (nJ)
NOP           198            1        3.26
LD            213            1        3.51
ST            346            2        11.40
ADD           199            1        3.28
MULT          198            1        3.26

$\sum_i (Base_i \cdot N_i) + \sum_{i,j} (Overhead_{i,j} \cdot N_{i,j}) + \sum_k Energy_k$
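The instruction-level model can be evaluated directly on an instruction trace. The sketch below uses the SPARClite base energies from the table; the overhead value, the trace and the function name are hypothetical and only illustrate how the three sums combine.

```python
# Base energy per instruction in nJ (SPARClite table).
BASE_NJ = {"NOP": 3.26, "LD": 3.51, "ST": 11.40, "ADD": 3.28, "MULT": 3.26}


def program_energy_nj(trace, base=BASE_NJ, overhead=None, other_effects=()):
    """Sum of base costs, inter-instruction overheads and other effects.

    trace:         executed instructions in order (gives N_i and N_ij)
    overhead:      {(previous, current): nJ} for adjacent instruction pairs
    other_effects: energies in nJ of events such as stalls or cache misses
    """
    overhead = overhead or {}
    energy = sum(base[op] for op in trace)                        # Base_i * N_i
    energy += sum(overhead.get(pair, 0.0)
                  for pair in zip(trace, trace[1:]))              # Overhead_ij * N_ij
    energy += sum(other_effects)                                  # Energy_k
    return energy


trace = ["LD"] * 10 + ["ADD"] * 5 + ["ST"] * 2
base_only = program_energy_nj(trace)
# Hypothetical 0.5 nJ overhead each time an ADD follows an LD.
with_overhead = program_energy_nj(trace, overhead={("LD", "ADD"): 0.5})
print(f"base only: {base_only:.1f} nJ, with overhead: {with_overhead:.1f} nJ")
```

In practice the overhead table and the other-effect energies come from the same kind of current measurements as the base costs.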

82  

Software Power Estimation

•  Inter-instruction effect: Overhead_{i,j}
   – The previous state of the processor influences the energy of the next instruction
   – e.g. 486DX2
      •  XOR BX,1
      •  ADD RX,DX  // overhead: 6.8 mA
•  Pipeline, cache miss: Energy_k

$\sum_i (Base_i \cdot N_i) + \sum_{i,j} (Overhead_{i,j} \cdot N_{i,j}) + \sum_k Energy_k$


83  

Example: TMS 320C54x

•  Application note from TI
   – http://www.ti.com/sc/docs/apps/dsp/tms320c54x.html
•  Measure of current while executing code
   – (a) Instructions are repeated
   – (b) Instructions in loops

(a) straight-line method:
    testloop
        I1
        I1
        I1
        I1
        B testloop

(b) RPT method:
    testloop
        RPT #255
        I1            (256 times)
        B testloop

84  

Example: TMS 320C54x

Instructions/Applications                                               Current (mA per MIPS)   Current at 50 MIPS (mA)   Power at 50 MIPS, 3 V (mW)
IDLE3                                                                   0                       0                         0
IDLE2                                                                   0.03                    1.5                       4.5
IDLE1                                                                   0.12                    6                         18
Repeat NOPs                                                             0.3                     15                        45
Inline NOPs                                                             0.4                     20                        60
Block data transfer in on-chip DARAM using RPT                          0.8                     40                        120
Repeat MAC with changing data (dual-operand addressing)                 1.0                     50                        150
Inline MAC with changing data (dual-operand addressing)                 1.2                     60                        180
Repeat MACD with changing data (single-operand addressing)              0.8                     40                        120
Inline MACD with changing data (single-operand addressing)              1.0                     50                        150
Repeated double-precision arithmetic instructions with changing data    0.9                     45                        135
Inline double-precision arithmetic instructions with changing data      1.1                     55                        165
Repeat FIRS with changing data                                          1.2                     60                        180
Inline FIRS with changing data                                          0.9                     45                        135
FIR filter                                                              0.9                     45                        135
Full-rate GSM vocoder                                                   1.03                    51.5                      154.5
Complex 256-point FFT                                                   1.07                    53.5                      160.5
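The last two columns of the table follow from the first one: the current is the per-MIPS figure times the instruction rate, and the power is that current times the supply voltage. A minimal sketch with hypothetical function names:

```python
def current_ma(ma_per_mips, mips):
    """Supply current in mA at a given instruction rate."""
    return ma_per_mips * mips


def power_mw(ma_per_mips, mips, vdd_v):
    """Power in mW: current (mA) times supply voltage (V)."""
    return current_ma(ma_per_mips, mips) * vdd_v


# Full-rate GSM vocoder row: 1.03 mA per MIPS, 50 MIPS, 3 V supply.
vocoder_current = current_ma(1.03, 50)
vocoder_power = power_mw(1.03, 50, 3.0)
print(f"{vocoder_current:.1f} mA, {vocoder_power:.1f} mW")
```

The same two functions can rescale any row of the table to another clock rate or supply voltage, as long as the per-MIPS current is assumed to stay constant.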


85  

Power estimation and reduction

1.  Why care about power?
2.  Where does power go in CMOS chips?
3.  How to estimate power?
4.  How to reduce power?
   – Design flow and principles
   – Architecture-level optimisation
   – Software estimation and optimization
   – System-level optimisation
5.  Conclusions

86  

StrongArm

•  Intel StrongArm SA-1110 (ARM V4)

[Figures: Compaq/Digital StrongARM; Intel StrongARM]


87  

Power Management (PM)

•  Play with the power modes of processors (systems)
   – Reduce supply voltage Vdd
   – Sleep or Idle modes
      •  Switch off PLLs, clock drivers, peripherals
•  Example: StrongArm SA1100

[Figure: power state machine with states RUN (400 mW), IDLE (50 mW) and SLEEP (160 µW). Transition times shown: 10 µs, 10 µs, 90 µs, 90 µs and 160 ms.]
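The state powers above allow a rough break-even estimate: how long must an inactive period be before entering SLEEP saves energy compared with staying in IDLE? The sketch below rests on two assumptions that are not stated on the slide: that the 160 ms transition is the wake-up from SLEEP to RUN, and that the processor draws RUN power during that wake-up. The shorter transitions are neglected.

```python
P_RUN_W = 0.400      # RUN power from the slide
P_IDLE_W = 0.050     # IDLE power from the slide
P_SLEEP_W = 160e-6   # SLEEP power from the slide
T_WAKE_S = 0.160     # assumption: the 160 ms transition is SLEEP -> RUN


def energy_idle_j(inactive_s):
    """Energy spent waiting in IDLE for the whole inactive period."""
    return P_IDLE_W * inactive_s


def energy_sleep_j(inactive_s):
    """Energy spent in SLEEP plus an assumed wake-up cost at RUN power."""
    return P_SLEEP_W * inactive_s + P_RUN_W * T_WAKE_S


def break_even_s():
    """Inactive time above which SLEEP costs less energy than IDLE."""
    return (P_RUN_W * T_WAKE_S) / (P_IDLE_W - P_SLEEP_W)


t_be = break_even_s()
print(f"break-even inactive time: {t_be:.2f} s")
```

Under these assumptions, SLEEP only pays off for inactive periods longer than about a second, which is why the choice of power mode depends on predicting idle durations.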

88  

Static Power Management (SPM)

•  Different operation modes

[Figure: operation modes Full-On, Normal-On, Standby, Suspend, Hibernation and Off, controlled by an activity monitor.] [IBM]


89  

Dynamic Power Management (DPM)

•  Reduce speed (clock freq.) and Vdd depending on processor activity (and therefore input data)
   – e.g. MPEG4 coder

[Figure: processor speed versus time. Before: full speed followed by IDLE, E = C·VH² + Eidle. After: reduced speed with no idle time, E = C·VL².]

90  

Dynamic  Power  Management  (DPM)

•  Smart DPM of Vdd and Fclock


91  

DPM Example: TransMeta Crusoe

•  Crusoe processor: x86 clone running at 700 MHz max
•  Processor activity detection by HW monitors
•  OS adjusts Fclock and Vdd

Fclock (MHz)   Vdd      Power
700            1.65 V   100%
400            1.4 V    41%
333            1.2 V    25%
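The power column is consistent with the dynamic power relation P ∝ Fclock · Vdd². The sketch below checks this against the three operating points of the table; it assumes that dynamic power dominates and that the switched capacitance and activity are unchanged between operating points.

```python
def relative_power(f_mhz, vdd_v, f_ref_mhz, vdd_ref_v):
    """Dynamic power relative to a reference point, assuming P ~ f * Vdd^2."""
    return (f_mhz / f_ref_mhz) * (vdd_v / vdd_ref_v) ** 2


F_REF_MHZ, VDD_REF_V = 700, 1.65           # 100% operating point
operating_points = [(700, 1.65), (400, 1.4), (333, 1.2)]

percentages = [round(100 * relative_power(f, v, F_REF_MHZ, VDD_REF_V))
               for f, v in operating_points]
for (f, v), pct in zip(operating_points, percentages):
    print(f"{f} MHz at {v} V -> {pct}% of maximum power")
```

Frequency scaling alone would only give a linear reduction; the quadratic Vdd term is what brings 400 MHz down to about 41% rather than 57%.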

92  

DPM Example: TransMeta Crusoe

[Figure: % of max power consumption (0 to 100) versus frequency (300 to 1000 MHz). Operating points: 300 MHz at 0.80 V, 433 MHz at 0.87 V, 533 MHz at 0.95 V, 667 MHz at 1.05 V, 800 MHz at 1.15 V, 900 MHz at 1.25 V, 1000 MHz at 1.30 V. The curve is divided into a typical operating region and a peak performance region.]

Source: Transmeta


93  

DPM  Example:  TransMeta  Crusoe  

Source: Transmeta

94  

DPM  Example:  TransMeta  Crusoe  


95  

Conclusions: Is it still necessary to convince you?

•  New metrics for IC design
   – Power and/or energy become a major constraint

[Figure: design metrics: Flexibility, Power, Energy, Cost, Performance.]

96  

Conclusions

•  Power consumption needs to be estimated and optimized at each abstraction level
•  Reduce supply voltage (Vdd) while keeping performance acceptable
•  Reduce activity of internal and external signals

A smart design will always consume less power.
So design with your brain on!


97  

Perspectives

•  Technologies
   – Multi-Vth
   – SOI, SiGe, ...
•  Architecture-level
   – Parallelism, pipeline, parallelism, pipeline, …
   – Reduce activity
   – Memory hierarchy
•  System-level
   – Dynamic management of Vdd/Vth/Fclk
   – Efficient SW compilation