-
inst.eecs.berkeley.edu/~ee241b
Borivoje Nikolić
EE241B : Advanced Digital Circuits
Lecture 18 – Power-Performance Tradeoffs 2
1EECS241B L18 POWER-PERFORMANCE II
MarketWatch, March 28: Opinion: There’s no returning to regular
schooling as online learning goes mainstream, by Alex Hicks
When in-person education resumes, online learning tools and methods will be entrenched in the system
-
Announcements
• Project midterm reports due today, March 31• Please e-mail me the link to your web page
• Assignment 3 due Thursday, April 2.• Quiz next Tuesday
• Reading – req’d• Rabaey et al, LPDE, Ch. 4
2EECS241B L18 POWER-PERFORMANCE II
-
Outline
• Module 5• Power-performance tradeoffs
3EECS241B L18 POWER-PERFORMANCE II
-
5.B Power-Performance Tradeoffs
4EECS241B L18 POWER-PERFORMANCE II
-
Know Your Enemy
• Where does power go in CMOS?• Switching (dynamic) power
• Charging capacitors• Leakage power
• Transistors are imperfect switches• Short-circuit power
• Both pull-up and pull-down on during transition• Static currents
• Biasing currents
EECS241B L18 POWER-PERFORMANCE II 5
-
Summary of Power Dissipation Sources
• – switching activity• CL – load capacitance• CCS – short-circuit “capacitance”• Vswing – voltage swing• f – frequency
DDLeakDCDDswingCSL VIIfVVCCP ~
IDC – static current Ileak – leakage current
powerstaticrateoperationenergyP
EECS241B L18 POWER-PERFORMANCE II 6
-
CMOS Performance Optimization• Reminder - sizing: Optimal performance with equal fanout per stage
• Extendable to general logic cone through ‘logical effort’• Equal effective fanouts (giCi+1/Ci) per stage• Optimal fanout is around 4
CL
CL
predecoder
3 15
CW
word driver
addrinput
wordline
[Ref: I. Sutherland, Morgan-Kaufman‘98]EECS241B L18 POWER-PERFORMANCE II 7
-
Performance Optimization
Energy
Increasing performanceincreases power!
Delay =1/Performance
EECS241B L18 POWER-PERFORMANCE II 8
-
Performance OptimizationEnergy
Delay = 1/Performance
Mircoarchitecture A
Mircoarchitecture B
EECS241B L18 POWER-PERFORMANCE II 9
-
The Design Abstraction Stack
Logic/RT
(Micro-)Architecture
Software
Circuit
Device
System/Application
A very rich set of design parameters to consider!It helps to consider options in relation to their abstraction layer
sizing, supplies, thresholds
logic family, standard cell versus custom, clocking
Parallel versus pipelined, general purpose versus application specific
Bulk, PDSOI, FDSOI, finFET
Choice of algorithm
Amount of concurrency
EECS241B L18 POWER-PERFORMANCE II 10
-
Achieve the highest performance under the power cap
Delay
Unoptimized design
DmaxDmin
Energy/op
Emin
Emax
Power-Performance Optimization
EECS241B L18 POWER-PERFORMANCE II 11
-
Achieve the highest performance under the power cap
Delay
Unoptimized design
Var1
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
Power-Performance Optimization
EECS241B L18 POWER-PERFORMANCE II 12
-
Achieve the highest performance under the power cap
Delay
Unoptimized design
Var1
Var2
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
Power-Performance Optimization
EECS241B L18 POWER-PERFORMANCE II 13
-
How far away are we from the optimal solution?
Delay
Unoptimized design
Var1
Var2
Var1 + Var2
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
Power-Performance Optimization
EECS241B L18 POWER-PERFORMANCE II 14
-
Global optimum – best performance
Delay
Unoptimized design
Var1
Var2
Var1 + Var2
Global
Energy/op
Designoptimizationcurves
Emax
DmaxDmin
Emin
Power-Performance Optimization
EECS241B L18 POWER-PERFORMANCE II 15
-
Maximize throughput for given energy or
Minimize energy for given throughput
Delay
Unoptimized design
Emax
DmaxDmin
Energy/op
Emin
Power-Performance Optimization
EECS241B L18 POWER-PERFORMANCE II 16
-
topology A
topology BDelay
Ener
gy/o
p
Power-Performance Optimization
• There are many sets of parameters to adjust• Tuning variables• Circuit
(sizing, supply, threshold)
• Logic style(std. cells, custom , …)
• Block topology (adder: CLA, CSA, …)
• Micro-architecture (parallel, pipelined)
EECS241B L18 POWER-PERFORMANCE II 17
-
Power-Performance Optimization
• There are many sets of parameters to adjust• Tuning variables• Circuit
(sizing, supply, threshold)
• Logic style(std. cells, custom , …)
• Block topology (adder: CLA, CSA, …)
• Micro-architecture (parallel, pipelined)
Globally optimal power-performance curve for a given function
EECS241B L18 POWER-PERFORMANCE II 18
topology A
topology BDelay
Ener
gy/o
p
-
f (Anom,B)
Delay
Ener
gy
D*
SA
SB
f (A0,B)
f (A,B0)
(A0,B0)
D0
0AAA AD
AES
Energy-Delay Sensitivity
EECS241B L18 POWER-PERFORMANCE II 19
-
Delay
Ener
gy
f (A0,B)
f (A,B0)
(A0,B0)
f (A1,B)∆D
∆E = SA∙(∆D) + SB∙∆D
D0At the solution point all sensitivities should be equal
Solution: Equal Sensitivities
EECS241B L18 POWER-PERFORMANCE II 20
-
5. C Architectural Optimization
21EECS241B L18 POWER-PERFORMANCE II
-
Optimal Processors
• Processors used to be optimized for performance• Optimal logic depth was found to be 8-11 FO4 delays in superscalar processors• 1.8-3 FO4 in sequentials, rest in combinatorial
• Kunkel, Smith, ISCA’86• Hriskesh, Jouppi, Farkas, Burger, Keckler, Shivakumar, ISCA’02• Harstein, Puzak, ISCA’02• Sprangle, Carmean, ISCA’02
• But those designs are have very high power dissipation• Need to optimize for both performance and power/energy
EECS241B L18 POWER-PERFORMANCE II 22
-
From System View: What is the Optimum?
• How do sensitivities relate to more traditional metrics:• Power per operation (MIPS/W, GOPS/W, TOPS/W)• Energy per operation (Joules per op)• Energy-delay product
• Can be reformatted as a goal of optimizing power x delayn• n = 0 – minimize power per operation• n = 1 – minimize energy per operation• n = 2 – minimize energy-delay product• n = 3 – minimize energy-(delay)2 product
EECS241B L18 POWER-PERFORMANCE II 23
-
Optimization Problem
• Set up optimization problem:• Maximize performance under energy constraints• Minimize energy under performance constraints
• Or minimize a composite function of EnDm• What are the right n and m?
• n = 1, m = 1 is EDP – improves at lower VDD• n = 1, m = 2 is invariant to VDD
• E ~ CVDD2• D ~ 1/VDD
EECS241B L18 POWER-PERFORMANCE II 24
-
Hardware Intesnity
• Introduced by Zyuban and Strenski in 2002.• Measures where is the design on the Energy-Delay curve• Parameter in cost function
optimization
EECS241B L18 POWER-PERFORMANCE II 25
Slope of the optimal E-D curve at the chosen design point
-
Optimum Across Hierarchy Layers
Zyuban et al, TComp’04EECS241B L18 POWER-PERFORMANCE II 26
Optimal logic depth in pipelined processors is ~18FO4Relatively flat in the 16-22FO4 range
-
5.D Circuit-Level Tradeoffs
27EECS241B L18 POWER-PERFORMANCE II
-
iin
iL
ThDD
DDdpi C
CVV
VKt,
,1
iin
iL
ThDD
DDdpi W
WVV
VKtD,
,1
CL
In Out
1 2 N
Alpha-Power Based Delay Model
EECS241B L18 POWER-PERFORMANCE II 28
-
♦ Leakage
♦ Switching 20 1 , int,Sw L i i DDE C C V
DVeIWE DDnV
VV
InLkt
DDTh
0
Energy Models
EECS241B L18 POWER-PERFORMANCE II 29
-
Sizing, Supply, Threshold Optimization
• Transistor sizing can yield large power savings with small delay penalties• Gate sizing• Beta-ratio adjustments• (Stack resizing)
• Supply voltage affects both active and leakage energy• Threshold voltage affects primarily the leakage
EECS241B L18 POWER-PERFORMANCE II 30
-
CL
In Out1 2 N
Unconstrained energy: find min D = tpi
Constrained energy: find min D, under E < EmaxWhere E = ei
11 jginjginjgin CCC ,,, 11 jjj WWW
Apply to Sizing of an Inverter Chain
EECS241B L18 POWER-PERFORMANCE II 31
tp1 tp2 tpN
-
Constrained Optimization
• Find min(D) subject to E = Emax• Constrained function minimization
• E.g. Lagrange multipliers
• Can solve analytically for x = Wj, VDD, VThEECS241B L18 POWER-PERFORMANCE II 32
maxExExDx
0x
maxDDxEx
Or dual:
-
Inverter Chain: Sizing Optimization
EECS241B L18 POWER-PERFORMANCE II 33
-
• Variable taper achieves minimum energy• Reduce number of stages at large dinc
[Ma, Franzon, IEEE JSSC, 9/94]
1 2 3 4 5 6 70
5
10
15
20
25
stage
effe
ctiv
e fa
nout
, f
0%
1%
10%
30%
dinc= 50%nomopt
1 1
11j j
jj
W WW
W
Wnom
DD
SKV
22
1
jW
j j
eS
f f
Stojanovic, ICCAD’02
Inverter Chain: Sizing Optimization
EECS241B L18 POWER-PERFORMANCE II 34
ej – energy per stagefj – fanout per stage
-
• Gate sizing (Wi)
• Supply voltage (Vdd)
xv = (VTh+VTh)/Vdd
Vth Vdd
Vddnom
Sens(Vdd)
0
for equal feff (Dmin) 1
swj j
nom j jj
EW e
D f fW
~-2E/Dv
vsw
DD
DD
sw
xx
DE
VD
VE
1
12
Sensitivity to Sizing and Supply
EECS241B L18 POWER-PERFORMANCE II 35
-
• Threshold voltage (Vth)
Low initial leakage
speedup comes for “free”0 Vth
Vthnom
Sens(Vth)
1
t
ThThDDLk
th
Th
nVVVVP
VD
VE
Sensitivity to Vth
EECS241B L18 POWER-PERFORMANCE II 36
-
Next Lecture
• Low-power design• Lowering supply voltage
EECS241B L18 POWER-PERFORMANCE II 37