CMPEN 411: VLSI Digital Circuits
Spring 2009
Lecture 14: Designing for Low Power
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]
Sp09 CMPEN 411 L14 S.1
Reminders
Next lecture
Dynamic logic
- Reading assignment: Rabaey, et al, 6.3
Review: CMOS Power Equations
$P = C_L V_{DD}^2 f + t_{sc} V_{DD} I_{peak} f + V_{DD} I_{leak}$
- Dynamic power: $C_L V_{DD}^2 f$
- Short-circuit power: $t_{sc} V_{DD} I_{peak} f$
- Leakage power: $V_{DD} I_{leak}$
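The three terms of the power equation can be evaluated separately to see which one dominates. The following is a minimal sketch; the function name and all numeric values are illustrative assumptions, not figures from the lecture.

```python
def cmos_power(c_load, vdd, freq, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts."""
    dynamic = c_load * vdd**2 * freq           # CL * VDD^2 * f
    short_circuit = t_sc * vdd * i_peak * freq  # tsc * VDD * Ipeak * f
    leakage = vdd * i_leak                      # VDD * Ileak
    return dynamic, short_circuit, leakage


# Assumed example values: 10 fF load, 2.5 V supply, 100 MHz switching,
# 50 ps short-circuit window, 100 uA peak current, 1 nA leakage current.
dyn, sc, leak = cmos_power(10e-15, 2.5, 100e6, 50e-12, 100e-6, 1e-9)
print(f"dynamic={dyn:.3e} W, short-circuit={sc:.3e} W, leakage={leak:.3e} W")

# Halving VDD cuts the dynamic term by a factor of four.
dyn_half, _, _ = cmos_power(10e-15, 1.25, 100e6, 50e-12, 100e-6, 1e-9)
```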
Power and Energy Design Space
Columns: Design Time and Non-active Modules (Constant Throughput/Latency); Run Time (Variable Throughput/Latency)
Active (Dynamic) energy
- Design Time: Logic design, Reduced Vdd, TSizing, Multi-Vdd
- Non-active Modules: Clock Gating
- Run Time: DFS, DVS (Dynamic Freq, Voltage Scaling)
Leakage (Standby) energy
- Design Time: Stack effect, Pin ordering, Multi-VT
- Non-active Modules: Sleep Transistors, Multi-Vdd, Variable VT, Input control
- Run Time: Variable VT, Input control
Transistor Sizing for Minimum Energy
Device sizing COMBINED with supply voltage reduction is a very effective way to reduce the energy consumption of a logic network
Device sizing affects dynamic energy consumption
- gain is largest for networks with large overall effective fan-outs (F = CL/Cg,1)
Dynamic Power as a Function of Device Size
Device sizing affects dynamic energy consumption
- gain is largest for networks with large overall effective fan-outs (F = CL/Cg,1)
The optimal gate sizing factor (f) for dynamic energy is smaller than the one for performance, especially for large F's
- e.g., for F=20, fopt(energy) = 3.53 while fopt(performance) = 4.47
If energy is a concern avoid oversizing beyond the optimal
[Figure: normalized energy versus sizing factor f (1 to 7), with curves for F=1, F=2, F=5, F=10 and F=20. From Nikolic, UCB]
Dynamic Power Consumption is Data Dependent
Switching activity, P0→1, has two components
- A static component: function of the logic topology
- A dynamic component: function of the timing behavior (glitching)
Static transition probability
P0→1 = Pout=0 x Pout=1 = P0 x (1-P0)
2-input NOR Gate
A B Out
0 0 1
0 1 0
1 0 0
1 1 0
With input signal probabilities PA=1 = 1/2 and PB=1 = 1/2
NOR static transition probability = 3/4 x 1/4 = 3/16
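The static transition probability can be checked by enumerating the truth table and weighting each row by its input probabilities. This is a minimal sketch that assumes independent inputs; the helper name `static_transition_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def static_transition_probability(gate, input_probs):
    """P(0->1) = P(out=0) * P(out=1), assuming independent inputs."""
    p_one = Fraction(0)
    for bits in product((0, 1), repeat=len(input_probs)):
        # Probability of this truth-table row
        weight = Fraction(1)
        for bit, p in zip(bits, input_probs):
            weight *= p if bit else 1 - p
        if gate(*bits):
            p_one += weight
    return (1 - p_one) * p_one


def nor2(a, b):
    return int(not (a or b))


half = Fraction(1, 2)
nor_activity = static_transition_probability(nor2, [half, half])
print(nor_activity)  # 3/16
```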
NOR Gate Transition Probabilities
Switching activity is a strong function of the input signal statistics
- PA and PB are the probabilities that inputs A and B are one
P0→1 = P0 x P1 = (1-(1-PA)(1-PB)) (1-PA)(1-PB)
[Figure: 2-input NOR gate with inputs A and B driving load CL; plot of P0→1 versus PA and PB, each from 0 to 1]
Transition Probabilities for Some Basic Gates
P0→1 = Pout=0 x Pout=1
- NOR: (1 - (1 - PA)(1 - PB)) x (1 - PA)(1 - PB)
- OR: (1 - PA)(1 - PB) x (1 - (1 - PA)(1 - PB))
- NAND: PAPB x (1 - PAPB)
- AND: (1 - PAPB) x PAPB
- XOR: (1 - (PA + PB - 2PAPB)) x (PA + PB - 2PAPB)
Example network: node X is derived from input A; X and input B (signal probabilities 0.5 each) feed an AND gate with output Z
For X: P0→1 = P0 x P1 = (1-PA) PA = 0.5 x 0.5 = 0.25
For Z: P0→1 = P0 x P1 = (1-PXPB) PXPB = (1 - (0.5 x 0.5)) x (0.5 x 0.5) = 3/16
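The closed-form expressions for the basic gates can be collected in one small lookup. This is a sketch under the assumption of independent inputs; the function names `p_one` and `p_0_to_1` are hypothetical.

```python
def p_one(gate, pa, pb):
    """Probability that a 2-input gate output is 1, for independent inputs."""
    table = {
        "AND": pa * pb,
        "NAND": 1 - pa * pb,
        "OR": 1 - (1 - pa) * (1 - pb),
        "NOR": (1 - pa) * (1 - pb),
        "XOR": pa + pb - 2 * pa * pb,
    }
    return table[gate]


def p_0_to_1(gate, pa, pb):
    """Static transition probability P(out=0) * P(out=1)."""
    p1 = p_one(gate, pa, pb)
    return (1 - p1) * p1


# AND gate with PX = PB = 0.5, as in the example for node Z
z_activity = p_0_to_1("AND", 0.5, 0.5)
print(z_activity)  # 0.1875, i.e. 3/16
```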
Another Example
[Figure: inputs A and B (signal probability 0.5 each) feed an OR gate with output X; X and B feed an AND gate with output Z]
For X: P0→1 = (1-0.5)(1-0.5) x (1-(1-0.5)(1-0.5)) = 3/16
For Z, treating X and B as independent, with PX = 1-(1-0.5)(1-0.5) = 3/4: P0→1 = (1 - 3/4 x 0.5) x (3/4 x 0.5) = 15/64 ≈ 0.234
Inter-signal Correlations
Determining switching activity is complicated by the fact that signals exhibit correlation in space and time
- reconvergent fan-out
In this example B reaches Z both directly and through X (reconvergent fan-out), so X and B are not independent
Have to use conditional probabilities: P(Z=1) = P(B=1) x P(X=1 | B=1)
Notice that Z = (A or B) and B = AB or B = B, so P0→1 for Z should be (and is) 1/2 x 1/2 = 1/4
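The effect of reconvergent fan-out can be seen by comparing the independence-based estimate with an exhaustive enumeration of the primary inputs. This is a minimal sketch for the network X = A or B, Z = X and B with PA = PB = 1/2; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Naive estimate: treat X and B as independent inputs of the AND gate.
p_x = 1 - (1 - half) * (1 - half)    # P(X=1) = 3/4
p_z_naive = p_x * half               # 3/8
naive = (1 - p_z_naive) * p_z_naive  # 15/64

# Exact value: enumerate the primary inputs A and B (each row has weight 1/4).
p_z_exact = Fraction(0)
for a, b in product((0, 1), repeat=2):
    x = a | b
    z = x & b
    if z:
        p_z_exact += half * half
exact = (1 - p_z_exact) * p_z_exact  # 1/4, since Z simply equals B

print(naive, exact)  # 15/64 1/4
```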
Logic Restructuring
Logic restructuring: changing the topology of a logic network to reduce transitions
AND: P0→1 = P0 x P1 = (1 - PAPB) x PAPB
[Figure: F = ABCD built from 2-input AND gates, all input probabilities 0.5]
- Chain implementation ((AB)C)D: P0→1 = (1-0.25) x 0.25 = 3/16 and 7/64 at the internal nodes, 15/256 at F
- Tree implementation (AB)(CD): P0→1 = 3/16 and 3/16 at the internal nodes, 15/256 at F
Chain implementation has a lower overall switching activity than the tree implementation for random inputs
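The chain and tree totals can be reproduced by propagating signal probabilities through the AND gates. This is a sketch assuming independent inputs with probability 1/2; the helper `and_node` and the node variable names are hypothetical labels, not taken from the figure.

```python
from fractions import Fraction


def and_node(p_a, p_b):
    """Return (signal probability, 0->1 transition probability) of a 2-input AND."""
    p1 = p_a * p_b
    return p1, (1 - p1) * p1


p = Fraction(1, 2)

# Chain: ((A.B).C).D
p_n1, t_n1 = and_node(p, p)      # 3/16
p_n2, t_n2 = and_node(p_n1, p)   # 7/64
_, t_chain_out = and_node(p_n2, p)  # 15/256
chain_total = t_n1 + t_n2 + t_chain_out

# Tree: (A.B).(C.D)
p_m1, t_m1 = and_node(p, p)      # 3/16
p_m2, t_m2 = and_node(p, p)      # 3/16
_, t_tree_out = and_node(p_m1, p_m2)  # 15/256
tree_total = t_m1 + t_m2 + t_tree_out

print(chain_total, tree_total)  # 91/256 111/256
```

The output node has the same activity in both cases; the difference comes entirely from the internal nodes.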
Input Ordering
[Figure: F = ABC built from two 2-input AND gates in two different input orders, with PA = 0.5, PB = 0.2, PC = 0.1]
- A and B first (X = AB, then C): P0→1 at X = (1-0.5x0.2) x (0.5x0.2) = 0.09
- B and C first (X = BC, then A): P0→1 at X = (1-0.2x0.1) x (0.2x0.1) = 0.0196
Which is better wrt transition probabilities?
Beneficial to postpone the introduction of signals with a high transition rate (signals with signal probability close to 0.5)
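For a small number of inputs, the best ordering can be found by trying every permutation. This is a sketch that assumes a chain of 2-input AND gates with independent inputs; `and_chain_activity` is a hypothetical helper name.

```python
from itertools import permutations


def and_chain_activity(probs):
    """Sum of 0->1 transition probabilities over all nodes of an AND chain.

    probs[0] and probs[1] feed the first gate; each later input joins one
    gate further down the chain.
    """
    p1 = probs[0]
    total = 0.0
    for p in probs[1:]:
        p1 *= p                    # signal probability of this gate's output
        total += (1 - p1) * p1     # its static transition probability
    return total


signal_probs = {"A": 0.5, "B": 0.2, "C": 0.1}

for order in permutations(signal_probs):
    activity = and_chain_activity([signal_probs[name] for name in order])
    print("".join(order), round(activity, 4))

best = min(
    permutations(signal_probs),
    key=lambda order: and_chain_activity([signal_probs[n] for n in order]),
)
```

With these probabilities, the lowest-activity orderings are the ones that introduce A (the input closest to 0.5) last.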
Glitching in Static CMOS Networks
Gates have a nonzero propagation delay resulting in spurious transitions or glitches (dynamic hazards)
- glitch: node exhibits multiple transitions in a single cycle before settling to the correct logic value
[Figure: two-gate network in which A and B drive node X, and X and C drive output Z; timing diagram of X and Z for ABC changing from 101 to 000 under a unit-delay model]
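A unit-delay simulation makes the glitch visible. The gate types did not survive in the transcript, so this sketch assumes both gates are 2-input NORs (X = NOR(A, B), Z = NOR(X, C)); that choice and the function names are assumptions.

```python
def nor(a, b):
    return int(not (a or b))


def simulate(old_inputs, new_inputs, steps=4):
    """Unit-delay simulation of X = NOR(A, B), Z = NOR(X, C).

    Returns a list of (time, X, Z) tuples. Inputs switch at t = 0.
    """
    a, b, c = old_inputs
    x = nor(a, b)
    z = nor(x, c)              # settled state before the input change
    trace = [(0, x, z)]
    a, b, c = new_inputs
    for t in range(1, steps + 1):
        # Each output at time t is computed from values at time t - 1;
        # the right-hand side is evaluated before either name is rebound.
        x, z = nor(a, b), nor(x, c)
        trace.append((t, x, z))
    return trace


trace = simulate((1, 0, 1), (0, 0, 0))
for t, x, z in trace:
    print(t, x, z)
```

Under these assumptions Z starts and ends at 0 but rises to 1 for one unit delay, because C falls one gate delay before X rises.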
Glitching in an RCA
[Figure: ripple-carry adder with carry input Cin and sum outputs S0 to S15; plot of sum output voltage (V) versus time (ps) showing Cin and S0, S1, S2, S3, S4, S5, S10 and S15]
Balanced Delay Paths to Reduce Glitching
Glitching is due to a mismatch in the path lengths in the logic network; if all input signals of a gate change simultaneously, no glitching occurs
[Figure: two arrangements of gates F1, F2 and F3, annotated with signal arrival times]
So equalize the lengths of timing paths through logic
Dynamic Power as a Function of VDD
Decreasing the VDD decreases dynamic energy consumption (quadratically)
But, increases gate delay (decreases performance)
[Figure: plot against VDD (V), 0.8 V to 2.4 V]
Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other gates, especially those that drive large capacitances (as this yields the largest energy benefits).
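The energy and delay trade-off can be sketched numerically. The quadratic energy scaling follows from the dynamic power term; the delay model below is an alpha-power-law estimate, which is an assumed model with assumed parameters (`vt`, `alpha`), not one given in the lecture.

```python
def relative_energy(vdd, vdd_ref):
    """Dynamic energy per transition scales with VDD squared."""
    return (vdd / vdd_ref) ** 2


def relative_delay(vdd, vdd_ref, vt=0.4, alpha=1.5):
    """Assumed alpha-power-law delay model: delay ~ VDD / (VDD - VT)^alpha."""
    def delay(v):
        return v / (v - vt) ** alpha
    return delay(vdd) / delay(vdd_ref)


for vdd in (2.5, 2.0, 1.5, 1.25):
    print(vdd, round(relative_energy(vdd, 2.5), 3), round(relative_delay(vdd, 2.5), 3))
```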
Multiple VDD Considerations
How many VDD? Two is becoming common
- Many chips already have two supplies (one for core and one for I/O)
When combining multiple supplies, level converters are required whenever a module at the lower supply drives a gate at the higher supply (step-up)
- If a gate supplied with VDDL drives a gate at VDDH, the PMOS never turns off
- The cross-coupled PMOS transistors do the level conversion
- The NMOS transistors operate on a reduced supply
Level converters are not needed for a step-down change in voltage
Overhead of level converters can be mitigated by doing conversions at register boundaries and embedding the level conversion inside the flipflop (see Figure 11.47)
[Figure: level converter circuit with supplies VDDH and VDDL, input Vin and output Vout]
Dual-Supply Inside a Logic Block
Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
Clustered voltage-scaling
- Each path starts with VDDH and switches to VDDL (gray logic gates) when delay slack is available
- Level conversion is done in the flipflops at the end of the paths
Stack Effect
Subthreshold leakage is a function of the circuit topology and the value of the inputs
VT = VT0 + γ(√|-2φF + VSB| - √|-2φF|)
where VT0 is the threshold voltage at VSB = 0; VSB is the source-bulk (substrate) voltage; γ is the body-effect coefficient
[Figure: gate with two stacked transistors driven by inputs A and B, output Out, and intermediate node voltage VX]
Leakage is least when A = B = 0
Leakage reduction due to stacked transistors is called the stack effect
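The body-effect equation can be evaluated directly to see how a raised source voltage on the upper transistor of a stack increases its threshold. This is a sketch; the default values for `vt0`, `gamma` and `phi_f` are assumed illustrative NMOS parameters, not values stated on the slide.

```python
from math import sqrt


def threshold_voltage(v_sb, vt0=0.43, gamma=0.4, phi_f=-0.3):
    """VT = VT0 + gamma * (sqrt(|-2*phi_F + VSB|) - sqrt(|-2*phi_F|))."""
    return vt0 + gamma * (sqrt(abs(-2 * phi_f + v_sb)) - sqrt(abs(-2 * phi_f)))


vt_no_stack = threshold_voltage(0.0)   # source tied to bulk
vt_stacked = threshold_voltage(0.1)    # source raised by an assumed 100 mV
print(round(vt_no_stack, 4), round(vt_stacked, 4))
```

Under these assumed parameters, a 100 mV source-bulk voltage raises VT by roughly 25 mV, which lowers the subthreshold leakage of that transistor.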
Short Channel Factors and Stack Effect
In short-channel devices, the subthreshold leakage current depends on VGS, VBS and VDS. The VT of a short-channel device decreases with increasing VDS due to DIBL (drain-induced barrier lowering).
- Typical values for DIBL are 20 to 150mV change in VT per volt change in VDS, so the stack effect is even more significant for short-channel devices.
- VX reduces the drain-source voltage of the top nfet, increasing its VT and lowering its leakage even more
For our 0.25 micron technology, VX settles to ~100mV in steady state, so VBS = -100mV and VDS = VDD - 100mV. The resulting leakage is 20 times smaller than that of a device with VBS = 0mV and VDS = VDD
Leakage as a Function of Design Time VT
Reducing the VT increases the sub-threshold leakage current (exponentially)
- 90mV reduction in VT increases leakage by an order of magnitude
But, reducing VT decreases gate delay (increases performance)
[Figure: ID (A) versus VGS (V), 0 to 1 V, for VT = 0.4V and VT = 0.1V]
Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control.
- A careful assignment of VT's can reduce the leakage by as much as 80%
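The "90mV per order of magnitude" rule translates into a simple exponential. This is a sketch that takes the 90 mV figure at face value and assumes it holds over the whole range; `leakage_ratio` is a hypothetical helper name.

```python
def leakage_ratio(delta_vt_mv, mv_per_decade=90.0):
    """Factor by which subthreshold leakage grows when VT is lowered by delta_vt_mv."""
    return 10 ** (delta_vt_mv / mv_per_decade)


ratio_90 = leakage_ratio(90)    # one decade
ratio_300 = leakage_ratio(300)  # e.g. lowering VT from 0.4 V to 0.1 V
print(ratio_90, round(ratio_300))
```

Lowering VT from 0.4 V to 0.1 V would, under this rule, raise leakage by a bit more than three orders of magnitude.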
Dual-Thresholds Inside a Logic Block
Minimum energy consumption is achieved if all logic paths are critical (have the same delay)
Use lower threshold on timing-critical paths
- Assignment can be done on a per gate or transistor basis; no clustering of the logic is needed
- No level converters are needed
IBM Cu11/Cu08 Blue Logic Library
ASIC Cu11 (130nm) Library: Dual-vt library
- 2690 total cells in standard cell library
- Nominal Vt level (~300mV)
- Low Vt level (~210mV)
- Low-vt version has same physical footprint
- ~15% improvement in gate delay
- ~10x increase in leakage power
ASIC Cu08 (90nm) Library: Multi-vt library
- 2118 total cells in standard cell library
- Intermediate-vt (AVT) and Low-vt (LVT) version of each cell
- Two more vt levels being planned (very low vt and high vt)
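Per-gate threshold assignment can be illustrated with a simple greedy pass over one timing path, using the ~15% delay and ~10x leakage figures quoted for the low-vt cells. This is only a sketch: the greedy strategy, the single-path model, the gate delays and the function name are all assumptions, not the method used by any particular tool.

```python
LOW_VT_DELAY_SCALE = 0.85  # low-vt cell is ~15% faster
LOW_VT_LEAK_SCALE = 10.0   # low-vt cell leaks ~10x more


def assign_low_vt(path_delays, target):
    """Return indices of gates swapped to low VT so the path meets target.

    Greedy: swap the slowest gates first, since each swap saves 15% of that
    gate's delay. Raises ValueError if the target cannot be met.
    """
    delays = list(path_delays)
    swapped = []
    for i in sorted(range(len(delays)), key=lambda k: -delays[k]):
        if sum(delays) <= target:
            break
        delays[i] *= LOW_VT_DELAY_SCALE
        swapped.append(i)
    if sum(delays) > target:
        raise ValueError("target not reachable with low-VT swaps alone")
    return swapped


# Hypothetical gate delays in ps on a critical path, with a 260 ps budget
critical = assign_low_vt([100, 80, 60, 40], target=260)
# A path with slack keeps all of its gates at the nominal threshold
relaxed = assign_low_vt([50, 50], target=260)

# Path leakage relative to an all-nominal-vt implementation
n_gates = 4
relative_leakage = (len(critical) * LOW_VT_LEAK_SCALE + (n_gates - len(critical))) / n_gates
print(critical, relaxed, relative_leakage)
```

Only the gates that are needed to meet timing pay the leakage penalty, which is the point of assigning thresholds per gate rather than per block.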
An example to summarize all design-time techniques
[Figure: example logic network with the critical path marked]
Design Time Low Power Techniques
[Figure: the example network with gates marked Lower Vdd or Higher Vdd, and a Level Converter]
Design Time Low Power Techniques
[Figure: the example network with gates marked Higher Vth or Lower Vth]
Design Time Low Power Techniques
Stack Forcing
[Figure: gate with input In and output Out, drawn once with transistors of width W and once with each transistor replaced by a stack of transistors of width 1/2 W]
Low Power Techniques – Interaction w/ each other
[Figure: the example network with gates marked Higher Vth or Lower Vth]
Apply high Vth and size-up to recover speed
Next Lecture and Reminders
Next lecture
Dynamic logic
- Reading assignment: Rabaey, et al, 6.3