A 180mV FFT Processor Using Subthreshold Circuit Techniques
Alice Wang and Anantha Chandrakasan
Massachusetts Institute of Technology
Extreme Sensor Networking
Emerging Sensor Applications
Enabler: Self-Powered Sensor System
[Figure: block diagram of a self-powered sensor system: sensor and A/D, sensor-specific cores (e.g., FFT), sensor DSP processor, RF, energy scavenger]
Operating Room of the Future (courtesy of John Guttag)
Machine Monitoring (courtesy of ABB)
Target Tracking & Detection (courtesy of ARL)
System Power < 10µW for Energy Scavenging
Design Considerations
• For emerging low-performance microsensor applications, computing speed is not critical. Energy dissipation per function must be minimized.
• Traditional low-power design is optimized for the worst-case operating scenario.
• Significant diversity in operating scenarios:
  • Operating modes: threshold detection (low-activity), source detection (medium-activity), localization and classification (high-activity)
  • Event statistics
  • User-specified latency and quality
• The node must be energy aware and able to adapt energy consumption over a variety of operating scenarios.
Energy Aware FFT Architecture
• Energy aware FFT architecture scales gracefully from 128 to 1024 point lengths and supports 8b and 16b precision.
[Figure: FFT architecture block diagram.
 Twiddle ROMs: supply $W = e^{-j 2\pi kn/N}$, addressed by W address.
 Butterfly datapath: inputs A, B, W; outputs $X = A + B \cdot W$ and $Y = A - B \cdot W$; clocked by clk.
 Data memory: Bank #1 (parity odd), Bank #2 (parity even), Bank #3 (parity odd), Bank #4 (parity even), selected by MSB=1 / MSB=0 and addressed by A address and B address.
 Control logic: signals clk, enable, FFT length, bit precision, datain, dataout, dataready.]
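The butterfly relations in the block diagram can be illustrated in software. The following is a minimal sketch, not the processor's implementation: it uses floating-point complex numbers instead of 8b/16b fixed point, and the function names (`twiddle`, `butterfly`, `fft_radix2`) are hypothetical. It assumes a textbook in-place radix-2 decimation-in-time ordering, which the slides do not specify.

```python
import cmath

def twiddle(k, n):
    """W = exp(-j*2*pi*k/n), the kind of value a twiddle ROM would hold."""
    return cmath.exp(-2j * cmath.pi * k / n)

def butterfly(a, b, w):
    """Radix-2 butterfly from the slide: X = A + B*W, Y = A - B*W."""
    t = b * w
    return a + t, a - t

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_radix2(x):
    """In-place radix-2 FFT built only from butterfly() (assumed ordering)."""
    n = len(x)
    assert n >= 1 and n & (n - 1) == 0, "length must be a power of two"
    bits = n.bit_length() - 1
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    half = 1
    while half < n:
        step = n // (2 * half)  # twiddle index stride for this stage
        for start in range(0, n, 2 * half):
            for j in range(half):
                a_addr, b_addr = start + j, start + j + half
                w = twiddle(j * step, n)
                data[a_addr], data[b_addr] = butterfly(data[a_addr], data[b_addr], w)
        half *= 2
    return data
```

Each stage reads two operands (A and B) and one twiddle factor (W) per butterfly and writes X and Y back to the same addresses, which is why the data memory needs two simultaneous accesses.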
Bit-scalable Baugh-Wooley Multiplier
[Figure: Baugh-Wooley multiplier array with inputs X{15:0} and Y{15:0}, the 8-bit operands X{7:0} and Y{7:0}, zero-filled inputs, and output Z{31:0}. Annotations: adder used only in 16-bit mode; adder used in 8-bit and 16-bit mode.]
• Fine-grained gating reduces activity factor and achieves energy savings with minimal area overhead.
• Bit-precision scaling architectures are used in the butterfly datapath, data memory and Twiddle ROMs.
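For readers unfamiliar with the Baugh-Wooley scheme, the sketch below multiplies two n-bit two's-complement numbers using only unsigned partial-product additions. It is a bit-level software model under stated assumptions, not the chip's gated array: the function name `baugh_wooley` is hypothetical, and precision scaling is modeled simply by choosing n = 8 or n = 16 rather than by gating part of a 16-bit array.

```python
def baugh_wooley(x, y, n):
    """Signed n-bit x n-bit multiply, returning the signed 2n-bit product.

    Partial products that involve exactly one sign bit are inverted, and
    the constants 2^n and 2^(2n-1) are added to absorb the inversions.
    """
    mask = (1 << n) - 1
    xb = [((x & mask) >> i) & 1 for i in range(n)]
    yb = [((y & mask) >> i) & 1 for i in range(n)]
    acc = 0
    for i in range(n):
        for j in range(n):
            pp = xb[i] & yb[j]
            if (i == n - 1) != (j == n - 1):
                pp ^= 1  # invert terms with exactly one sign bit
            acc += pp << (i + j)
    acc += (1 << n) + (1 << (2 * n - 1))  # correction constants
    acc &= (1 << (2 * n)) - 1             # keep 2n result bits
    if acc >> (2 * n - 1):                # reinterpret as signed
        acc -= 1 << (2 * n)
    return acc
```

Because every term is a plain addition, the array can be built from identical adder cells, which is what makes it practical to disable the adders that only contribute in 16-bit mode.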
Variable FFT Length
• Dedicated memory structure contains an MSB and parity-bit crossbar to avoid read/write hazards.
• The energy aware control logic scales the number of butterflies with FFT length.
[Figure: Memory read. A address and B address pass through an MSB crossbar (MSB=0 / MSB=1) and a parity-bit crossbar to four 128x32b banks (two parity even, two parity odd), producing outputs A and B.]
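One way to see why address parity is a useful bank select: in an in-place radix-2 FFT, the two operand addresses of a butterfly differ in exactly one bit, so they always have opposite parity and can never fall in the same parity bank. The sketch below checks this property. The bank numbering in `bank` (MSB and parity combined into a bank index) is a hypothetical mapping for illustration; the slides do not give the exact crossbar wiring.

```python
def parity(addr):
    """Return 1 if addr has an odd number of set bits, else 0."""
    return bin(addr).count("1") & 1

def butterfly_addresses(n):
    """Yield (a_addr, b_addr) for every butterfly of an in-place radix-2 FFT."""
    half = 1
    while half < n:
        for start in range(0, n, 2 * half):
            for j in range(half):
                yield start + j, start + j + half
        half *= 2

def bank(addr, n):
    """Hypothetical 4-bank mapping: bank index = 2*MSB + parity."""
    msb = addr >> (n.bit_length() - 2)
    return 2 * msb + parity(addr)

def conflict_free(n):
    """True if no butterfly reads both operands from the same bank."""
    return all(bank(a, n) != bank(b, n) for a, b in butterfly_addresses(n))

def butterfly_count(n):
    """Number of radix-2 butterflies: (n/2) * log2(n)."""
    return sum(1 for _ in butterfly_addresses(n))
```

`butterfly_count` also shows how the work scales with FFT length, from 448 butterflies at 128 points to 5120 at 1024 points, which is the quantity the energy aware control logic scales.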
FFT Energy/Performance Contours
$E_{Switching} = a \cdot C_L \cdot V_{DD}^2$
$E_{Leakage} = I_S \cdot 10^{-V_{th}/S} \cdot V_{DD} \cdot T$
• The optimal VDD for the 1024-point, 16b FFT is estimated from switching and leakage models for a 0.18µm process.
[Figure: energy/performance contours over threshold voltage (Vth) and supply voltage (VDD), with the optimal (VDD, Vth) point marked.]
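The two energy terms above trade off against each other, and a short numerical sweep makes the resulting minimum visible. This is a sketch under loud assumptions: the cycle time T is taken as proportional to C_L·VDD/I_on with a subthreshold on-current I_on = I_S·10^((VDD − Vth)/S) at every VDD, so I_S and Vth cancel out of the leakage term; and the values of `a`, `c_l`, `k_leak` and `s` are hypothetical, in arbitrary units, not parameters of the 0.18µm process.

```python
def energy_per_op(vdd, a=0.1, c_l=1.0, k_leak=1000.0, s=0.1):
    """Return (total, switching, leakage) energy in arbitrary units.

    switching = a * C_L * VDD^2
    leakage   = I_S * 10^(-Vth/S) * VDD * T, with the assumed
                T = k_leak * C_L * VDD / (I_S * 10^((VDD - Vth)/S)),
                which simplifies to k_leak * C_L * VDD^2 * 10^(-VDD/S).
    s is the subthreshold slope in volts per decade (hypothetical value).
    """
    e_switch = a * c_l * vdd ** 2
    e_leak = k_leak * c_l * vdd ** 2 * 10 ** (-vdd / s)
    return e_switch + e_leak, e_switch, e_leak

def minimum_energy_point(v_lo=0.15, v_hi=0.9, steps=751):
    """Grid-search the supply voltage that minimizes total energy."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda v: energy_per_op(v)[0])

if __name__ == "__main__":
    v_opt = minimum_energy_point()
    total, e_sw, e_lk = energy_per_op(v_opt)
    print(f"minimum near VDD = {v_opt:.3f} V (switching {e_sw:.4f}, leakage {e_lk:.4f})")
```

Lowering VDD shrinks the switching term quadratically, but the cycle time grows exponentially and the circuit leaks for longer, so the total has an interior minimum. Where that minimum falls depends entirely on the assumed parameters.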
Exploit Subthreshold Operation for Sensor Circuits
[Figure: Optimum Power Supply. Energy (nJ, 0 to 1400) versus VDD (0.2 to 1 V). Estimated minimum energy point at VDD = 400mV for Vth = 450mV.]
• There is a trade-off between leakage and switching energy as frequency, VDD and activity factor are varied.
• The FFT design focuses on achieving supply voltages well below 400mV to investigate the minimum energy point.
Min-Max Sizing Curve
[Figure: Wp (µm, 0 to 60) versus VDD (mV) for an inverter with a minimum-sized Wn, with drive current and leakage current marked. Two panels: typical transistor, showing Wp(max) and Wp(min) for VDD from 50 to 200mV; process corners, showing Wp(max) at the SF corner and Wp(min) at the FS corner for VDD from 100 to 400mV.]
• The minimum supply voltage is limited by the effect of process variations.
• Inverter sizing analysis and minimum supply voltage analysis are performed at the corners.
Tiny XOR at 100mV
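[Figure: tiny XOR schematic (inputs A and B, output Z) with idle-current and drive-current paths marked for A=1, B=0, Z=1. Plot: voltage level at Z (mV, 0 to 100) versus time (0 to 4m) for the input combinations A=1 B=0, A=0 B=1, A=0 B=0, A=1 B=1.]
• Leakage through the parallel devices causes the tiny XOR to fail at 100mV.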
Transmission Gate XOR
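[Figure: transmission gate XOR schematic (inputs A and B, output Z) with idle-current, drive-current, and weak-drive-current paths marked for A=1, B=0, Z=1. Plot: voltage level at Z (mV, 0 to 100) versus time (0 to 4m) for the input combinations A=1 B=0, A=0 B=1, A=0 B=0, A=1 B=1.]
• Balanced number of devices reduces the effects of leakage and process variations.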
Sneak Leakage and Stacked Devices
[Figure: adder Sum circuit (inputs A, B, Cin; internal node P; output Sum) with idle-current and drive-current paths marked for A=0, B=0, Cin=0. Annotations: parallel leakage, sneak leakage path, stacked devices.]
• Traditional circuits suffer from effects such as parallel leakage, stacked devices, and sneak leakage paths.
Subthreshold Library Methodology
[Figure: subthreshold versions of the adder Sum circuit (inputs A, B, Cin; internal node P; output Sum). Annotations: drive device gates, add buffers, reduce parallel devices.]
• Buffering, reducing parallel devices, and driving device gates are methods used in subthreshold standard cell logic design.
Sizing Tradeoffs - SRAM cell
[Figure: WN1/WP1 (0 to 50) versus VDD (mV, 100 to 500), showing max N1 at the FS corner and min N1 at the SF corner.]
[Figure: SRAM cell with two bitlines (BL), wordline WL, storage nodes HI and LO, and devices P1, N1, P2, N2, N3, N4.]
Write condition trade-off:
  WN3/WP1 large: write '0' into HI at the SF corner
  WN2/WP2 small: write '1' into LO at the FS corner
Increasing WN2 prevents the memory cell from being rewritten during a read access.
[Figure annotations. Read: WL=1, both bitlines = 1, HI=1, LO=0 (∆VLO). Write: WL=1, bitlines = 0 and 1, HI=1 (∆VHI), LO=0 (∆VLO).]
Tristate Write Access
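[Figure: tristate latch-based write circuit with write bitline WBL, cell M, and write wordlines (WWL).]
[Figure: Wp (µm, 0 to 60) versus VDD (100 to 500), showing Wp(max) at the SF corner and Wp(min) at the FS corner.]
• Tristate latch-based write access achieves low voltage operation at process corners.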
Read Bitline at 100mV
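[Figure: read bitline voltage (mV, 0 to 100) versus time (0 to 1.5m), showing RBL (output-low) and RBL (output-high).]
Data dependent leakage
  Worst case output-high: M0=0, M1-M127=1
  Worst case output-low: M0=1, M1-M127=0
[Figure: Precharge Read circuit with ϕpre, Wpre, read wordlines RWL0 and RWL1, cells M0 and M1, and read bitline RBL.]
[Figure: Tristate Read circuit with read wordlines RWL0 to RWL2, cells M0 to M2, and read bitline RBL. Plot: RWL0 and RBL (mV, 0 to 100) versus time (0 to 6m), including RBL (output-low).]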
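A back-of-the-envelope estimate shows why data-dependent leakage is so damaging on a shared bitline at 100mV. In subthreshold, current changes by one decade per S millivolts of gate drive, so the on/off current ratio of a device is roughly 10^(VDD/S). The sketch below compares one driving cell against the combined leakage of the unselected cells. It is a first-order model with hypothetical names and an assumed slope of S = 90 mV/decade; it ignores drain-voltage dependence, DIBL, and process variation.

```python
def on_off_ratio(vdd_mv, s_mv_per_decade=90.0):
    """First-order subthreshold on/off current ratio: 10^(VDD / S)."""
    return 10 ** (vdd_mv / s_mv_per_decade)

def drive_to_leakage(vdd_mv, leaking_cells=127, s_mv_per_decade=90.0):
    """Ratio of one selected cell's drive current to the total leakage
    of `leaking_cells` unselected cells sharing the same bitline."""
    return on_off_ratio(vdd_mv, s_mv_per_decade) / leaking_cells

if __name__ == "__main__":
    for cells in (127, 1):
        ratio = drive_to_leakage(100.0, leaking_cells=cells)
        print(f"{cells:3d} leaking cell(s) at 100mV: drive/leakage = {ratio:.2f}")
```

Under these assumptions the single driving cell is outweighed by 127 leaking cells at 100mV, whereas it comfortably wins against one leaking neighbour, which is the situation a 2:1 mux stage presents.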
Hierarchical-Read Bitline
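[Figure: hierarchical-read bitline. Cells M0 to M127 are read in pairs (M0/M1, M2/M3, ..., M126/M127) through mux levels selected by address bits A0 to A6, ending at RBL. Inset: mux with inputs M0 and M1, select A0, output Z. Plot: A0 and RBL (mV, 0 to 100) versus time (0 to 6m) for M0=0, M1-M127=1, A1-A6=0.]
• The hierarchical-read bitline eliminates parallel leakage and stacked devices.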
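The read path in the figure can be modeled functionally as a tree of 2:1 muxes, one level per address bit. The sketch below is a logic-level illustration only, with hypothetical function names; it assumes A0 selects within each cell pair and A6 makes the final selection, which matches the figure's labeling but is not spelled out on the slide.

```python
def mux2(d0, d1, sel):
    """2:1 mux: output d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def hierarchical_read(cells, addr):
    """Read cells[addr] through log2(len(cells)) levels of 2:1 muxes.

    Level 0 uses address bit A0 to pick within each pair, level 1 uses
    A1, and so on, so every node is driven by exactly one mux output.
    """
    assert len(cells) >= 1 and len(cells) & (len(cells) - 1) == 0
    level = list(cells)
    bit = 0
    while len(level) > 1:
        sel = (addr >> bit) & 1
        level = [mux2(level[i], level[i + 1], sel) for i in range(0, len(level), 2)]
        bit += 1
    return level[0]

if __name__ == "__main__":
    # Worst-case output-low pattern from the slide: M0=0, M1-M127=1.
    cells = [0] + [1] * 127
    print(hierarchical_read(cells, 0), hierarchical_read(cells, 1))
```

For 128 cells this takes seven mux levels (A0 to A6), and at each level a node sees only two candidate drivers instead of 128 cells hanging on one wire.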
Latch-Write and Hierarchical-Read Memory
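[Figure: latch-write and hierarchical-read memory. Latches (latch0 to latch5 shown) are written by write wordlines WWL0, WWL1, ..., WWL127 and read through muxes selected by address bits A0 to A6 onto RBL.]
• Muxes are daisy-chained for compact layout area.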
Custom Subthreshold FFT
Process Details
• 0.18µm CMOS process
• 6 layer metal
• 628k transistors
Design Flow
• Custom subthreshold logic cells
• Custom Skill-based memory generators and multipliers
• Skill code place-and-route
[Figure: chip, 2.1 mm by 2.6 mm, showing the data memory, twiddle ROMs, butterfly datapath, and control logic.]
• The FFT processor achieves 180mV operation for 16-bit, 1024-point operation. The clock frequency is 164 Hz.
180 mV Supply Demonstration
[Figure: measured waveforms at 180 mV supply: DataReady, DataOutput[1-0], output clock.]
Energy-Scalability Measurements
[Figure: two panels, 8-bit processing and 16-bit processing. Energy (nJ, log scale from 10^1 to 10^3) versus VDD (mV, 200 to 900) for 128, 256, 512 and 1024 point FFTs.]
• The FFT is able to operate at 128, 256, 512 and 1024-point FFT lengths and 8 and 16b precisions.
• 8b processing leads to operation at a larger minimum VDD due to reduced activity factor.
Energy Estimation
[Figure: energy (nJ, 0 to 1000) versus VDD (mV, 200 to 900) for the 1024-point, 16 bit FFT, measured and estimated.]
[Figure: clock frequency (log scale, 100Hz to 10MHz) versus VDD (mV, 200 to 900).]
• The FFT operates at VDD from 180mV to 900mV and clock frequencies from 164Hz to 6MHz.
• The minimum energy dissipated is 155nJ/FFT at 350 mV for a 1024-point 16b FFT. The clock frequency is 10kHz and the FFT processor dissipates 0.6µW.
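The reported figures at the minimum energy point can be related with simple arithmetic, since energy per FFT equals average power times the time per FFT. The sketch below derives the implied time per FFT and an average energy per butterfly from the slide's numbers. The derived values are not stated in the source and inherit the rounding of the quoted 155nJ and 0.6µW; the butterfly count assumes a radix-2 FFT, and the function names are hypothetical.

```python
def radix2_butterflies(n_points):
    """Butterflies in a radix-2 FFT: (N/2) * log2(N), N a power of two."""
    return (n_points // 2) * (n_points.bit_length() - 1)

def time_per_fft_s(energy_j, power_w):
    """Time per FFT implied by E = P * t."""
    return energy_j / power_w

def energy_per_butterfly_j(energy_j, n_points):
    """Average energy per butterfly, assuming a radix-2 FFT."""
    return energy_j / radix2_butterflies(n_points)

if __name__ == "__main__":
    energy = 155e-9  # J per 1024-point 16b FFT at 350 mV (from the slide)
    power = 0.6e-6   # W at the same operating point (from the slide)
    t = time_per_fft_s(energy, power)
    e_bf = energy_per_butterfly_j(energy, 1024)
    print(f"implied time per FFT: {t * 1e3:.0f} ms")
    print(f"average energy per butterfly: {e_bf * 1e12:.1f} pJ")
```

The same helpers can be applied to the other FFT lengths once their measured energies are read off the energy-scalability plots.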
Conclusions
• Subthreshold operation at the optimal supply voltage and clock frequency is necessary to minimize energy dissipation of digital circuits in wireless sensor applications.
• Process variations limit the minimum supply voltage operation of CMOS circuits.
• Subthreshold logic and memory design methodology minimizes parallel leakage, stacked devices and sneak leakage effects.
• Demonstrated a 180mV FFT Processor using subthreshold circuit techniques.