looking beyond cmos: integrated circuit design with nano ...scaled nems vs. cmos adders for similar...
TRANSCRIPT
![Page 1: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/1.jpg)
Looking Beyond CMOS:
Integrated Circuit Design with
Nano Electro-Mechanical Switches
Vladimir Stojanović (MIT)
in collaboration with
Tsu-Jae King Liu, Elad Alon (UC Berkeley)
Dejan Marković (UCLA)
MIEL 2010
![Page 2: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/2.jpg)
Acknowledgements
Circuit design
Fred Chen, Hossein Fariborzi
Matthew Spencer, Abhinav Gupta
Cheng Wang, Kevin Dwan
Device design
Hei Kam, Rhesa Nathanael, Vincent Pott, Jaeseok Jeon
Sponsors
DARPA NEMS program
FCRP (C2S2, MSD)
MIT CICS
Berkeley Wireless Research Center
NSF
2
![Page 3: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/3.jpg)
CMOS is Scaling, Power Density is Not
Since ~2000 supply voltage (Vdd) stuck at ~1V
Leakage stops you from lowering threshold (Vth)
Vdd and Vt not scaling well power/area not scaling
400480088080
8085
8086
286386
486Pentium® proc
P6
1
10
100
1000
10000
1970 1980 1990 2000 2010Year
Po
we
r D
en
sit
y (
W/cm
2)
Hot Plate
Nuclear Reactor
Rocket Nozzle
S. Borkar, Intel
Sun’s Surface
Power Density Prediction circa 2000
Core 2
3
Performance limited by power, not number of transistors
![Page 4: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/4.jpg)
Parallelism to the Rescue
Parallelism allows slower, more efficient units
Helps scale performance for fixed power
Will this last forever?
4
Lower supply,
balanced Vth
Parallelize to recover performance
IBM CELL processor
4
![Page 5: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/5.jpg)
Subthreshold leakage: Game Over for CMOS
Leakage and sub-threshold slope define minimum
energy/op for CMOS
Parallelism cannot reduce power if already operating at
minimum energy
5
![Page 6: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/6.jpg)
NEM Switches to the Rescue
NEM switches show zero leakage & sharp sub-threshold slope
Could potentially enable reduced E/op with scaling
Measured MEM Switch I-V Curve MEM Switch Energy vs. VDD
6
R. Nathanael et al., “4-Terminal Relay Technology for Complementary Logic,” IEDM 2009
Etotal=Edynamic
![Page 7: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/7.jpg)
NEM Switch Structure and Operation
ON Switch:
|Vgb| > Vpi (pull-in voltage)
OFF Switch:
|Vgb| < Vpo (pull-out voltage)
7
Poly-SiGe Anchor
Poly-SiGe Beam
/Flexure
Tungsten Body
Tungsten Source/Drain
Tungsten Channel Poly-SiGe Gate
![Page 8: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/8.jpg)
NEM Switch as a Logic Element
8
DrainSource Body
Gate
4-terminal design mimics MOSFET operation Electrostatic actuation is ambipolar
Non-inverting logic is possible Actuation independent of source/drain voltages
8
![Page 9: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/9.jpg)
NEM Switch Model
9
Spring k Damper bMass m
Cgb
G
S D
Rcs Rcd
B
Cgc
CdbCsb
Anchor
Mechanical
Model
Rs Rd
Lumped Verilog-A model for circuit design and simulation:
Mechanical dynamics: spring (k), damper (b), mass (m)
Electrical parasitics: non-linear gate-body (Cgb), gate-channel
(Cgc), and source/drain-body cap (Cs,db), contact (Rcs,d)
9
![Page 10: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/10.jpg)
NEM Switch Characteristics
1.E-07
1.E-06
0 5 10 15
tPI[s]
VDD [V]
Model
Experiment
L=50mm 40mm 14mm
Simple electro-mechanical model matches experiment well Mechanical “pull-in” delay scales with geometry and overdrive voltage
“Pull-in” voltage (threshold) scales with switch geometry
Constant E-field scaling Mechanical delay and VPI scale linearly
10
Model
*H. Kam et al., “Design and Reliability of a Micro-Relay Technology…,” IEDM 200910
![Page 11: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/11.jpg)
Digital Circuit Design with NEMS
CMOS: delay set by electrical time constant
Quadratic delay penalty for stacking devices
Buffer & distribute logical/electrical effort over many stages
NEMS: delay dominated by mechanical movement
Can stack ~100-200 devices before td,elec ≈ td,mech
So, want all to switch simultaneously
Implement logic as a single complex gate
NEMS: 12 switches
11
![Page 12: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/12.jpg)
Need to Compare at Block Level
Delay Comparison vs. CMOS
Single mechanical delay vs. several electrical gate delays
For reasonable load, NEMS delay unaffected by fan-out/fan-in
Area Comparison vs. CMOS
Larger individual devices
But often need fewer devices to implement same function
4 gate delays 1 mechanical delay
F. Chen et al., “Integrated Circuit Design with NEM Relays,” ICCAD 2008
NEMS: 12 switches
12
![Page 13: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/13.jpg)
Example: NEMS Adder
Full adder cell:
12 NEMS vs. 24
transistors
XOR “free”
Complementary signals
avoid extra mechanical
delay (to invert)
NEMS all sized minimally
13
![Page 14: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/14.jpg)
N-bit NEMS Adder
Ripple carry configuration
Cascade full adder cells to
create larger complex gate
Stack N NEMS, but still
single mechanical delay
Performance gap to
CMOS shrinks to ~10x for
32-bit add
(10ns delay at 90nm)
14
![Page 15: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/15.jpg)
Scaled NEMS vs. CMOS Adders
For similar area: >9x lower E/op, >10x greater delay
*D. Patil et. al., “Robust Energy-Efficient Adder Topologies,” in Proc. 18th IEEE Symp.
on Computer Arithmetic (ARITH'07).
9x
10x
Energy/op vs. Delay/op across Vdd
30x less capacitance
Lower device Cg, Cd
Fewer devices
2.4x lower Vdd
No leakage energy
Compare vs.
Sklansky CMOS
adder*
F. Chen et al., “Integrated Circuit Design with NEM Relays,” ICCAD 2008
15
![Page 16: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/16.jpg)
Parallelism helps
16
Energy/op vs. Delay/op across Vdd & CL Can extend energy
benefit up to
GOP/s throughput
As long as
parallelism is
available
Area overhead
bounded
CMOS needs to be
parallelized at
some point too
16
![Page 17: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/17.jpg)
Contact Resistance
Low contact R
not critical
Good news for
reliability…
17
Energy/op vs. Delay/op across Vdd & CL
![Page 18: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/18.jpg)
NEMS Circuit Demonstration
Test Devices
Logic
Adders,
Compressors
Timing
Elements
Latches, FFs
Memory
SRAM, DRAM
I/O
ADC, DAC
18
F. Chen et al., “Demonstration of Integrated Micro-Electro-Mechanical Switch Circuits for
VLSI Applications,” ISSCC 2010.
![Page 19: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/19.jpg)
Things We Learned…
Layout Matters
Unbalanced current flow:
flexures burn up!
Parasitic capacitors: can
affect Vpi (DIBL-like effect)
19
![Page 20: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/20.jpg)
Measured Inverter VTC
VTC looks digital, suggests composability
20
![Page 21: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/21.jpg)
NEMS Latch Shows Composability
Designed as if we used MOSFETS, but that is not always optimal…
21
![Page 22: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/22.jpg)
Switch-based Carry Generation
Demonstrates propagate-generate-kill logic as a
single complex gate
22
![Page 23: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/23.jpg)
Measured DRAM Results
Simultaneous read and write
Read latency = tmech,on (decoder) + tmech,off (WLRD switch)
23
![Page 24: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/24.jpg)
Measured DAC Results
Resistive divider based DAC
2-bit thermometer coded output
Code = 0 0 “Vin”
Code = “Vin” 1 1
2424
![Page 25: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/25.jpg)
NEM switch scaling – next design
1um litho
Scaled NEMS size
20um x 20umNEM switch size
120um x 150um
0.25um litho
25
![Page 26: Looking Beyond CMOS: Integrated Circuit Design with Nano ...Scaled NEMS vs. CMOS Adders For similar area: >9x lower E/op, >10x greater delay *D. Patil et. al., “Robust Energy-Efficient](https://reader036.vdocuments.us/reader036/viewer/2022071502/6121c2fe899a3f569b2ec73e/html5/thumbnails/26.jpg)
Conclusions
NEMS unique features enable scaling post-CMOS
Nearly ideal Ion/Ioff
Switching delay largely independent of electrical
Need to adapt circuit design style
Reliability improving
Circuit level insights critical (contact R)
Demonstrated simple circuits
Can start thinking about building more complex systems
Potentially order of magnitude lower E/op than CMOS
Next steps: scaling and improved device design, testing larger
digital blocks
26
t