PATMOS, September 05, Leuven
Traveling the Wild Frontiers of Ultra-Low Voltage Design
Jan M. Rabaey
Director, Gigascale Silicon Research Center
Co-Director Berkeley Wireless Research Center
University of California at Berkeley
2
JMR-Patmos05
Why Ultra-Low Voltage?
• Power and Energy Limiting Integration and Scaling
• Exploring the Bounds and Frontiers of Computation
• The Brave New World of Ubiquitous Electronics
Meso-scale low-cost wireless transceivers for ubiquitous wireless data acquisition that
• are fully integrated
– Size smaller than 1 cm³
• are dirt cheap (“the Dutch treat”)
– At or below $1
• minimize power/energy dissipation
– Limiting power dissipation to 100 µW enables energy scavenging
• and form self-configuring, robust, ad-hoc networks containing 100s to 1000s of nodes
3
Why Worry about Ultra-Low Voltage?
• Maximum integration density ultimately limited by energy dissipated per unit volume.
• Technology scaling leads to linear increase in energy density (for same switching activity and voltage)
• Only options:
– Reduce computational density – lowering frequency and/or activity
– Reduce supply voltage
4
Power and Energy Limiting Integration: The Picture of Old
[Figure: power density p (W/cm², 0.1 to 10000, log scale) versus design rule (µm) and scaling variable κ, with MPU and DSP data points and trend lines ∝ κ³ and ∝ κ^0.7.]
$p = p_{DYNAMIC} + p_{LEAK}$
• Constant V scaling → $p_{DYNAMIC} \propto \kappa^{3}$
• V scaled as $\kappa^{-1}$, with $I_{DS} \propto (V_{GS} - V_{TH})^{1.3}$ → $p_{DYNAMIC} \propto \kappa^{0.7}$
(Sakurai, 2003)
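As a rough numerical check of the two scaling regimes on this slide, the sketch below evaluates how much the dynamic power density grows for a given scaling factor κ. The function name and the choice κ = 10 (roughly a 1 µm to 0.1 µm shrink) are illustrative assumptions, not values taken from the talk.

```python
def dynamic_power_density_growth(kappa: float, exponent: float) -> float:
    """Relative growth of dynamic power density when p_dyn scales as kappa**exponent."""
    return kappa ** exponent

kappa = 10.0  # assumed example: design rule shrinks by a factor of 10

# Constant-voltage scaling: p_dyn grows as kappa^3.
constant_v = dynamic_power_density_growth(kappa, 3.0)
# Voltage scaled as 1/kappa with I_DS ~ (V_GS - V_TH)^1.3: p_dyn grows as kappa^0.7.
scaled_v = dynamic_power_density_growth(kappa, 0.7)

print(f"constant V: x{constant_v:.0f}, scaled V: x{scaled_v:.1f}")
```

Under these assumptions, scaling the voltage turns a thousandfold increase in dynamic power density into roughly a fivefold one, which is the gap between the two trend lines.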
5
Power and Energy Limiting Integration: The Roadmap Perspective
[Figure: projected relative densities, 2002 to 2020 (log scale, 1 to 1000).]
• Compute density: k^3
• Active power density: k^1.7
• Leakage power density: k^3.4
2003 ITRS – Low operating power scenario
6
The Reality May Be Worse!
1st Mega Trend: Slowing VCC / Growing Power
[Figure: technology supply voltage (V, log scale, 1 to 100) versus year, 1970 to 2010. Annotations: 0.7X voltage scaling; voltage scaling slowing after P1262.]
Voltage scaling is slowing / stopping at ~1.0 V
Scott Thompson, TI Fellows meeting 2004.
7
Power and Energy Limiting Integration
Better Option: Slow down or reverse compute density increase
– Use slack to control power (that is, voltage) and leakage
[Figure: projected relative densities, 2002 to 2020 (log scale, 1 to 100).]
• Compute density: k^2
• Active power density: < k^0.7
• Leakage power density: < k^1.4
8
There is Room: Minimum Operational Voltage of Inverter
• Swanson, Meindl (April 1972)
• Further extended in Meindl (Oct 2000)
Limitation: |gain| at midpoint > 1 (that is, gain < −1)
Cfs: fast surface state capacitance
Cox: gate capacitance
Cd: diffusion capacitance
For ideal MOSFET (60 mV/decade slope): $V_{DD,min} = 2\ln(2)\,kT/q \approx 36$ mV at 300 K
9
Gain is the Limiting Factor
Voltages normalized to UT = kT/q
From E. Vittoz, Ch. 16, Low Power Electronics, Ed. C. Piguet, 2005.
10
Confirmed for Current Technologies
[Figure: Min Vdd (inverter), in mV (0 to 100), versus pn ratio (0 to 3).]
[Figure: Min Vdd (NOR), in mV (0 to 120), versus pn ratio (0 to 3); curves for both inputs and for one input.]
90 nm CMOS (simulation – nominal process parameters)
Degradation due to asymmetry
Source: M. Stan, L. Alarcon
For n = 1.6, Vddmin = 1.9 kT/q = 48 mV
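The two minimum-voltage figures quoted in this part of the talk (2 ln 2 in units of kT/q for an ideal device, and 1.9 kT/q for n = 1.6) are both reproduced by the relation Vddmin = 2 UT ln(1 + n). That relation is an assumption on my part, since the slide's own equation did not survive extraction; the sketch below only checks that it matches the quoted numbers. The function names are illustrative.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C

def thermal_voltage(temp_k: float) -> float:
    """UT = kT/q in volts."""
    return BOLTZMANN * temp_k / ELECTRON_CHARGE

def vdd_min_in_ut(n: float) -> float:
    """Assumed relation: Vdd,min / UT = 2 * ln(1 + n), with n the slope factor."""
    return 2.0 * math.log(1.0 + n)

ideal = vdd_min_in_ut(1.0)   # ideal MOSFET, 60 mV/decade
real = vdd_min_in_ut(1.6)    # slope factor quoted for the 90 nm simulation

ut_300k = thermal_voltage(300.0)
print(f"ideal: {ideal:.2f} UT = {ideal * ut_300k * 1e3:.1f} mV at 300 K")
print(f"n=1.6: {real:.2f} UT = {real * ut_300k * 1e3:.1f} mV at 300 K")
```

The 48 mV on the slide corresponds to 1.9 UT with UT of about 25.3 mV, that is, slightly below 300 K, so the value printed at 300 K comes out a little higher.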
11
Minimum Energy per Operation
• Predicted by von Neumann: kT ln(2)
• Based on previous result – moving one electron over Vddmin:
– $E_{min} = Q V_{DD}/2 = q \cdot 2\ln(2)\,(kT/q)/2 = kT\ln(2)$
– Would be approximately three times larger for CMOS inverter with PMOS twice the size of NMOS
– At room temperature (300 K): $E_{min} = 0.29 \times 10^{-20}$ J
• Minimum sized CMOS inverter at 90 nm operating at 1 V
– $E = C V_{DD}^{2} = 0.8 \times 10^{-15}$ J, or 5 orders of magnitude larger!
How Close Can We Get?
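The gap quoted on this slide can be checked with a few lines of arithmetic. The sketch below computes kT ln(2) at 300 K and compares it with the 0.8 fJ figure for the 90 nm inverter; the variable names are illustrative.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K

def kt_ln2(temp_k: float) -> float:
    """Minimum energy per operation, kT ln(2), in joules."""
    return BOLTZMANN * temp_k * math.log(2)

e_min = kt_ln2(300.0)   # theoretical minimum at room temperature
e_inverter = 0.8e-15    # C * Vdd^2 for the 90 nm inverter at 1 V (from the slide)

ratio = e_inverter / e_min
orders_of_magnitude = math.log10(ratio)

print(f"E_min = {e_min:.2e} J, ratio = {ratio:.1e}")
```

The ratio lands between 10^5 and 10^6, consistent with the "5 orders of magnitude" on the slide.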
12
Option 1: Subthreshold Operation
Making Leakage Work for You!
Example: Energy-Aware FFT
A. Wang, A.P. Chandrakasan, "A 180mV FFT Processor Using Subthreshold Circuit Techniques," ISSCC 2004.
13
FFT Energy-Performance Curves
$E_{Switching} = a \cdot C_L \cdot V_{DD}^{2}$
$E_{Leakage} = I_S \cdot 10^{-V_{th}/S} \cdot V_{DD} \cdot T$
(estimated from switching and leakage models for a 0.18 µm process)
[Figure: supply voltage (VDD) versus threshold voltage (Vth), with the optimal (Vdd, Vth) point marked.]
Courtesy: A. Wang, A. Chandrakasan, MIT
Minimum Energy Point @ VDD = 0.35 V and VT = 0.475 V
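To illustrate why a minimum-energy supply voltage exists, here is a toy sweep built on the switching and leakage energy expressions above. Everything beyond those two formulas is an assumption: the time per operation T is modeled as k·C·Vdd/Ion with a subthreshold on-current Ion = Is·10^((Vdd − Vth)/S), and all parameter values are hypothetical. In this simplified model Vth cancels out of the leakage energy, so only Vdd is swept, and the sweep stays below Vth where the delay model is meant to apply.

```python
def switching_energy(a, c_load, vdd):
    """E_Switching = a * C_L * VDD^2."""
    return a * c_load * vdd ** 2

def leakage_energy(i_s, vth, s, vdd, t_op):
    """E_Leakage = I_S * 10^(-Vth/S) * VDD * T."""
    return i_s * 10 ** (-vth / s) * vdd * t_op

def operation_time(k_delay, c_load, vdd, i_s, vth, s):
    """Assumed subthreshold delay model (not from the slide): T = k * C * Vdd / Ion."""
    i_on = i_s * 10 ** ((vdd - vth) / s)
    return k_delay * c_load * vdd / i_on

# Hypothetical parameters, chosen only for illustration (not measured data).
A = 0.1          # switching activity
C_LOAD = 1e-12   # switched capacitance, F
I_S = 1e-6       # current prefactor, A
VTH = 0.475      # threshold voltage, V
S = 0.09         # subthreshold slope, V/decade
K_DELAY = 200.0  # lumps logic depth and the number of idle, leaking gates

def total_energy(vdd):
    t_op = operation_time(K_DELAY, C_LOAD, vdd, I_S, VTH, S)
    return switching_energy(A, C_LOAD, vdd) + leakage_energy(I_S, VTH, S, vdd, t_op)

# Sweep Vdd from 150 mV up to just below Vth in 5 mV steps.
candidates = [mv / 1000 for mv in range(150, 475, 5)]
v_opt = min(candidates, key=total_energy)
print(f"minimum-energy supply in this toy model: {v_opt:.3f} V")
```

The parameter values were picked so that the toy minimum falls in the same 0.3 V to 0.4 V region as the 0.35 V reported on the slide. Below the minimum, the exponentially growing cycle time makes leakage energy dominate; above it, the quadratic switching term takes over.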
14
SubThreshold FFT
Process Details
• 0.18 µm CMOS process
• 6 layer metal
• 628k transistors
[Figure: die photo, 2.6 mm x 2.1 mm, showing data memory, twiddle ROMs, butterfly datapath, and control logic.]
[Figure: measured waveforms for DataReady, DataOutput[1-0], and output clock.]
Operational down to 180 mV (fclock = 164 Hz)
15
Confirmed by Measurements
[Figure: clock frequency (100 Hz to 10 MHz, log scale) versus VDD (200 to 900 mV).]
[Figure: energy (nJ, 0 to 1000) versus VDD (200 to 900 mV), measured and estimated.]
The FFT operates between VDD = 180 mV and 900 mV, at clock frequencies of 164 Hz to 6 MHz. The minimum energy dissipated is 155 nJ/FFT at 350 mV for a 1024-point 16b FFT. The clock frequency is 10 kHz and the FFT processor dissipates 0.6 µW.
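The three numbers quoted at the minimum-energy point (155 nJ per FFT, 0.6 µW, 10 kHz) imply a time and a cycle count per transform. The short sketch below derives them; the derived quantities are my own arithmetic, not figures stated in the talk.

```python
energy_per_fft = 155e-9   # J, at 350 mV (from the slide)
power = 0.6e-6            # W, at 350 mV (from the slide)
f_clock = 10e3            # Hz, at 350 mV (from the slide)

# Energy = power * time, so one FFT takes energy / power seconds.
time_per_fft = energy_per_fft / power
cycles_per_fft = time_per_fft * f_clock

print(f"{time_per_fft:.3f} s per FFT, about {cycles_per_fft:.0f} clock cycles")
```

At roughly a quarter of a second per 1024-point transform, the minimum-energy point is only usable for very low-rate sensing workloads, which is the performance cost discussed on the following slides.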
16
The Subliminal Processor (UMich)
[Figure: processor block diagram with IF, ID, and EX/MEM stages plus control logic. Blocks: I-Mem (8-bit words), ROM (8-bit words), prefetch buffer (32 bits), register file, accumulator (32 bits), shifter (x1), ALU, D-Mem, event scheduler, external interrupts. Width options shown: 8 x 16 bits, 16 x 8 bits, 32 x 4 bits; 8-bit, 16-bit, 32-bit; 8-bit, 16-bit, 32-bit words.]
• Explores Minimum-Energy Processor
– subthreshold operation
– 3pJ per instruction at 350mV operation
– 10X less energy than previously reported
– 41 year operation on 1g Li-ion battery
• Research Areas
– Processor architectural trade-offs
– Low voltage memory design
– Process variation tolerance
Courtesy: D. Blaauw, T. Austin, UMICH
17
Is Sub-threshold The Way to Go?
• Achieves lowest possible energy dissipation
• But … at a dramatic cost in performance
[Figure: tp (µs, 0 to 3.5) versus Vdd (0 to 1 V), 130 nm CMOS.]
[Figure: optimal power-performance tradeoff curve, power versus cycle time.]
18
Option 2: Managing Leakage while Reducing Thresholds
Courtesy: Mircea Stan, Louis Alarcon, UCB/Virginia
Stacked transistors enable aggressive threshold scaling
– Ion/Ioff increases with increasing stack height (leakage suppression)
– More robust to correlated (tune or adapt) and random variations (self-cancel)
– Decreased short channel effect
19
Impact of Stacking Devices
[Figure: stack depth 2; axes VDD and VT; 1 fJ and 1 ns annotations.]
20
Impact of Stacking Devices
[Figure: stack depth 4; axes VDD and VT; 1 fJ and 1 ns annotations.]
21
Impact of Stacking Devices
[Figure: stack depth 6; axes VDD and VT; 1 fJ and 1 ns annotations.]
22
Impact of Stacking Devices
[Figure: stack depth 8; axes VDD and VT; 1 fJ and 1 ns annotations.]
23
Optimal EDP, Energy, Delay vs. Stack
24
Complex Gates: Reducing thresholds while containing leakage
The return of PLAs?
• Regular
• Tunable
• NAND/NAND configuration
[Figure: PLA with inputs In2, In1, In0 and outputs F0, F1.]
Or pass-transistor logic
• Current-steering
• Regular
• Balanced delay
• Programmable
[Figure: pass-transistor network with root input, signals A, B, and S, and output P0 to sense amp.]
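A NAND/NAND PLA computes a sum of products: the first NAND plane produces the complement of each product term, and the second NAND plane combines the selected terms, which by De Morgan's law is the same as OR-ing the products. The sketch below checks this equivalence for a small example. The product terms and the output functions F0 and F1 are hypothetical, chosen only to match the three-input, two-output shape of the figure.

```python
from itertools import product

def nand(bits):
    """NAND of an iterable of 0/1 values."""
    return 0 if all(bits) else 1

# Each product term is a list of (input index, required value) literals.
# Hypothetical programming of the array, for illustration only.
PRODUCT_TERMS = [
    [(2, 1), (1, 1)],   # In2 AND In1
    [(1, 0), (0, 1)],   # (NOT In1) AND In0
    [(2, 0), (0, 0)],   # (NOT In2) AND (NOT In0)
]
OUTPUT_TERMS = {"F0": [0, 1], "F1": [1, 2]}  # which product terms feed each output

def literal(inputs, index, required):
    return inputs[index] if required == 1 else 1 - inputs[index]

def pla_nand_nand(inputs):
    """Evaluate the PLA using two NAND planes. inputs = (In0, In1, In2)."""
    first_plane = [nand(literal(inputs, i, v) for i, v in term) for term in PRODUCT_TERMS]
    return {name: nand(first_plane[t] for t in terms) for name, terms in OUTPUT_TERMS.items()}

def pla_and_or(inputs):
    """Reference sum-of-products evaluation of the same functions."""
    products = [int(all(literal(inputs, i, v) for i, v in term)) for term in PRODUCT_TERMS]
    return {name: int(any(products[t] for t in terms)) for name, terms in OUTPUT_TERMS.items()}

for inputs in product((0, 1), repeat=3):
    assert pla_nand_nand(inputs) == pla_and_or(inputs)
print("NAND/NAND matches AND/OR for all 8 input combinations")
```

Because every output is built from the same two-level NAND structure, all paths see similar series stacks, which is the regularity the slide points to.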
25
Some ULV Challenges: (1) Excessive Timing Variance
[Figure: timing σ/µ (%, 0 to 80) versus Vdd (0 to 1 V).]
• Timing variance increases dramatically with Vdd reduction
• Design for large yield means huge overhead at low voltages:
– Worst case design at 300mV means over 200% overkill
26
Managing Systematic Variations Through Self-Adaptation
[Figure: module and test module exchanging test inputs and responses, with Vdd, Vbb, and Tclock signals.]
Move test onto the chip
Dynamically adjust supply and threshold design parameters to center the design!
Courtesy: K. Cao, Arizona
[Figure: energy-performance trade-off with adaptive tuning; switching energy Eswitching (fJ, 5 to 50) versus path delay (ps, 1.0E+03 to 1.0E+07, log scale). Curves: worst case, w/o Vth tuning; nominal, w/ Vth tuning. Annotation: 10x.]
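The self-adaptation idea can be sketched as a small control loop: an on-chip test reports pass or fail at the target clock period, and a controller searches for the lowest setting that still passes. The sketch below is a deliberately simplified, hypothetical version that tunes only the supply voltage; the slide's scheme also adjusts the body bias Vbb, which is not modeled here. The function names and the toy pass/fail model are assumptions.

```python
from typing import Callable, Optional

def lowest_passing_supply_mv(passes_test: Callable[[int], bool],
                             vdd_min_mv: int,
                             vdd_max_mv: int,
                             step_mv: int) -> Optional[int]:
    """Step the supply upward and return the first voltage (in mV) at which
    the on-chip test passes, or None if it never passes in the given range."""
    for vdd_mv in range(vdd_min_mv, vdd_max_mv + 1, step_mv):
        if passes_test(vdd_mv):
            return vdd_mv
    return None

def toy_module_passes(vdd_mv: int) -> bool:
    """Hypothetical die: meets the clock period only at 320 mV or above."""
    return vdd_mv >= 320

chosen = lowest_passing_supply_mv(toy_module_passes, 200, 500, 25)
print(f"selected supply: {chosen} mV")
```

Each die settles at its own setting instead of the worst-case value chosen at design time, which is where the energy saving in the trade-off plot comes from.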
27
Some ULV Challenges: (2) The Memory Data-Retention Voltage (DRV)
[Figure: SRAM cell schematic with transistors M1 to M6, internal nodes V1 and V2, supply VDD, and leakage currents.]
[Figure: VTC of SRAM cell inverters, V2 (V) versus V1 (V), 0 to 0.4 V, showing VTC1 and VTC2 at VDD = 0.4 V and VDD = 0.18 V.]
When Vdd scales down to DRV, the Voltage Transfer Curves (VTC) of the internal inverters degrade to such a level that the Static Noise Margin (SNM) of the SRAM cell reduces to zero.
DRV Condition:
$\left.\dfrac{\partial V_1}{\partial V_2}\right|_{\text{Left inverter}} = \left.\dfrac{\partial V_1}{\partial V_2}\right|_{\text{Right inverter}}$, when $V_{DD} = DRV$
Source: Huifang Qin, QEDTI 2004
28
The Impact of Process Variations
DRV Spatial Distribution (256*128 Cells)
130 nm CMOS
[Figure: histogram of 32K SRAM cells (count, 0 to 6000) versus DRV (100 to 400 mV).]
29
Reducing the DRV
[Figure: histogram of 32K SRAM cells (count, 0 to 6000) versus DRV (100 to 400 mV), with the SRAM chip DRV marked.]
Solution I: ULV SRAM circuit optimization
Solution II: Error-tolerant SRAM design (with Redundancy and ECC)
Combination of sizing and aggressive ECC reduces DRV to 150 mV
30
How about Mixed-Signal?
• Reduced headroom challenges traditional mixed-signal design
• Process variation makes design centering tough
• Does further scaling help?
Courtesy: R. Rutenbar, CMU
31
The Lure of the Sub-Threshold Region
[Figure: gm/ID (10^-1 to 10^2) and fT (10^6 to 10^11 Hz) versus inversion coefficient (IC), 10^-4 to 10^3, log scales.]
• Greater transconductance (gm) for a given bias current
• Lower fT, but CMOS scaling helps
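The first bullet can be made concrete with the EKV-style interpolation for transconductance efficiency, gm·n·UT/ID = 2 / (1 + sqrt(1 + 4·IC)). This expression is a standard model that I am bringing in as an assumption; it is not printed on the slide, and the function name is illustrative.

```python
import math

def gm_over_id_normalized(ic: float) -> float:
    """Normalized transconductance efficiency gm * n * UT / ID
    as a function of the inversion coefficient IC (EKV-style interpolation)."""
    return 2.0 / (1.0 + math.sqrt(1.0 + 4.0 * ic))

for ic in (1e-4, 1e-2, 1.0, 1e2):
    print(f"IC = {ic:g}: gm*n*UT/ID = {gm_over_id_normalized(ic):.3f}")
```

In deep weak inversion (IC much smaller than 1) the normalized efficiency approaches its maximum of 1, and it falls off roughly as 1/sqrt(IC) in strong inversion, which is why biasing in the sub-threshold region gives the most gm per unit of bias current.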
32
Baseband Processor
Synchronization Header
• Amplitude estimation: 10 bits
• Timing estimation: 25 bits (worst case)
Technology: 0.13 µm CMOS
Chip Area: 2.1 mm x 2.1 mm (pad limited)
Total Power (Analog+Digital): 200 µW
Power Supply: < 1 V
[Figure: measured waveforms for Data Input (30 mV), Reset 1, Integration 1, Data Output (from Analog), and Data Output (from Digital), annotated "Synchronization is done!".]
Courtesy: Yan-Mei Li, UCB, CICC 2005
33
Example: Energy-Efficient Data Conversion
Low-Voltage Low-Power Successive-Approximation A/D
Vdd: 0.5 V
Pd: ~2 µW
Resolution: 6 bits
Fs: 800 kS/s
Simplest architecture wins!
Source: S. Gambini, UCB
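The appeal of the successive-approximation architecture is that a conversion is just a binary search: one comparator decision per bit. The sketch below models an ideal 6-bit converter and derives the energy per sample from the quoted power and sample rate. The reference voltage equal to the 0.5 V supply, the ideal DAC, and the function name are assumptions made for illustration; this is not the circuit from the talk.

```python
def sar_convert(v_in: float, v_ref: float, n_bits: int) -> int:
    """Ideal successive-approximation conversion: binary search from MSB to LSB."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)                 # tentatively set this bit
        v_dac = v_ref * trial / (1 << n_bits)     # ideal DAC output for the trial code
        if v_in >= v_dac:                         # comparator decision
            code = trial                          # keep the bit
    return code

code = sar_convert(0.3, 0.5, 6)
print(f"0.3 V with a 0.5 V reference converts to code {code}")

power = 2e-6          # W, approximate value from the slide
sample_rate = 800e3   # samples per second, from the slide
energy_per_sample = power / sample_rate
print(f"energy per sample: {energy_per_sample * 1e12:.1f} pJ")
```

Only a comparator, a DAC, and a small amount of logic are active per decision, which is consistent with the slide's point that the simplest architecture wins at this power level.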
34
ULV(I) RF?
Absolutely!
• Aggressive use of passives
• Unorthodox architectures to create gain (receiver) or increase efficiency (transmitter) at low voltage/current levels
• Stacking of components often helps (current re-use)
• Efficient oscillators are essential!
35
Extensive Use of (Innovative) Passives
• High Q-factor
• Small form factor
• MEMS/CMOS co-design
• Integration into IC process
Q > 1000
[Figure: FBAR structure labels: Si, Air, AlN, Electrodes, Drive Electrode, Sense Electrode; 100 µm scale bar.]
Ruby et al. (Ultrasonics Symposium 2001)
[Figure: LNA, Mixer, BAW Filter.]
Carpentier et al. (ISSCC 2005)
[Figure: impedance (Ω, 1 to 1000) versus frequency (100 MHz to 10 GHz).]
36
Exploring the Limits: (Almost) Passive 1.9 GHz Receiver
• PRX = 200 nW
• BW-3dB = 4 MHz
• Sensitivity = -38 dBm (12 dB SNR)
• |S11|: -9.3 dB
Courtesy: B. Otis, N. Pletcher
[Figure: chip photo with FBAR, 1 mm scale.]
37
Providing Gain: The Return of Super-Regenerative
• Super-regenerative receiver creates gain at low-current level
• Sub-threshold operation
• No external components (inductors, crystals, capacitors)
• 0.13µm CMOS
Total Rx: 380 µW
[Figure: chip photo with 2 mm and 1 mm dimension labels.]
Otis, Chee, Rabaey, ISSCC 2005
38
SuperRegenerative: Gain at Low Current
[Figure: traces for input levels of no signal, -100 dBm, -90 dBm, -80 dBm, and -70 dBm; fq = 100 kHz.]
39
SuperRegenerative: Gain at Low Current
ST 0.13 µm CMOS
Detector oscillator transient:
• OOK modulation
• -80 dBm, 5 kbps
[Figure: detector oscillator transient for bits 1 and 0, with eye diagram.]
40
Low-Voltage / Energy Oscillators
[Figure: oscillator schematic with FBAR, C1, C2, M1, M2, Rb, and Vdd; FBAR equivalent-circuit labels Rm, Cm, Lm, Co, Ro, Ls, Rs.]
Low power design techniques
• Complementary Gm stages to reduce Ibias
• Large Rb to reduce FBAR loading
• Sub-threshold MOSFET maximizes gm/Id
• Optimal choice of C1 and C2
[Figure: assembly photo showing FBAR, electrodes, bond wires, and CMOS die.]
Y.H. Chee, CICC 2005
41
Low-Energy FBAR Oscillator - Measurements
[Figure: FBAR oscillator phase noise at 90 µW power consumption; phase noise (dBc/Hz, -140 to -90) versus frequency offset (10 kHz to 10 MHz). Markers: -98 dBc/Hz and -120 dBc/Hz; instrument's noise floor indicated.]
[Figure: FBAR oscillator voltage swing (~140 mV 0-pk @ 90 µW); zero-to-peak voltage swing (mV, 100 to 300) versus power consumption (µW, 50 to 300).]
42
Dealing with Variations
• Calibrate LC oscillator with high accuracy reference (FBAR)
• Convert control voltage to digital signal and control oscillator frequency digitally
• Turn off FBAR oscillator and control loop after calibration
High-accuracy (500 ppm) FBAR oscillator (300 µW)
Low-accuracy LC oscillator (<100 µW)
To calibrate over a 200 MHz span to better than 500 kHz accuracy, 400 steps (9 bits) are required
[Figure: calibration loop block diagram: FBAR osc, PD, LPF, ADC, 9-bit control word.]
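The step count and word length quoted for the calibration follow directly from the span and the target accuracy. The sketch below reproduces that arithmetic; the variable names are illustrative.

```python
import math

span_hz = 200e6       # frequency span to calibrate over (from the slide)
accuracy_hz = 500e3   # required frequency accuracy (from the slide)

steps = math.ceil(span_hz / accuracy_hz)   # number of distinct tuning steps needed
bits = math.ceil(math.log2(steps))         # smallest word length that covers them
lsb_hz = span_hz / 2 ** bits               # step size if all codes span the range

print(f"{steps} steps, {bits} bits, LSB = {lsb_hz / 1e3:.1f} kHz")
```

A 9-bit word gives 512 codes, so spreading them over the full 200 MHz span yields a step of about 390 kHz, comfortably inside the 500 kHz target.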
Digitally Tuned VCO
Bondwire oscillator performance
Supply voltage: 0.5 V
Power consumption: 100 µW
Nominal frequency: 1.9 GHz
Tuning Range: 150 MHz
Resolution: ~400 kHz (9 bits)
Phase noise @ 1 MHz offset: -115 dBc/Hz
0.13 µm ST CMOS, (2x2) mm² area
One bondwire and one integrated version implemented for comparison
Courtesy N. Pletcher, UCB, ESSCIRC 2005
44
Low Accuracy LC Oscillator - Results
[Figure: Output Swing vs Bias Current; differential Vout (mV, p-p, 0 to 250) versus core bias current (µA, 100 to 700), bondwire and integrated versions.]
0.5 V supply
Minimum startup conditions:
• Vdd = 0.3 V
• Ibias = 140 µA (42 µW)
[Figure: tuning range (10 bit code).]
45
Innovative Architectures: Injection Locked Transmitter
• Use LC power oscillator instead of a power amplifier
– Self-drive reduces driver power
• Capacitive bank to tune oscillation within locked range
• Reference oscillator to lock the power oscillator to an accurate carrier frequency
[Figure: assembly photo showing input balun, output balun, bond wire inductor, and CMOS die.]
Y.H. Chee et al, CICC 2005
46
TX Performance
• TX consumes an average power of 1.5mW while delivering 1mW OOK signal (32% efficiency).
• Degradation of TX efficiency due to driver stage (FBAR oscillator) is only 1%.
ST 0.13 µm CMOS
[Figure: unlocked output spectrum and locked output spectrum.]
[Figure: transmitter efficiency (%, 20 to 32) versus radiated power (µW, 200 to 1000), for Vdd = 280 mV, 260 mV, 230 mV, and 210 mV.]
47
Perspectives
There is plenty of room at the bottom!
• Further scaling of energy/operation (or current per function) is essential for scaling to produce its maximum impact
• Current digital gates 5 orders of magnitude from minimum
• Exciting opportunities offered by new paradigms in computing
• Innovations at circuit, architecture and system level are essential
• Ample opportunity still to tame some wild horses
The art of ingenuity
H. De Man, ISSCC 05