laboratory for nanointegrated...
© INS-UoU 2015 All rights reserved
University of Utah | P.-E. Gaillardon | 1
IFIP WG10.5 March 29th, 2017 – Lausanne, Switzerland
Laboratory for NanoIntegrated Systems
Pierre-Emmanuel Gaillardon, Department of Electrical and Computer Engineering – University of Utah
Research Vision
My design approach: co-design across the VLSI abstraction layers (tool, system design, device)
[Figure: the traditional approach relies on EDA tools, circuits and CMOS technology; the co-design approach combines emerging EDA tools, advanced transistor technologies and advanced memory technologies into low-power nanotechnology-enabled systems]
Path I: Exploiting emerging memory technologies in low-power computing nanosystems
RRAM: A Low-Power System Enabler
P.-E. Gaillardon et al., VLSI-SoC’12
• Metal-insulator-metal (MIM) structures
– Different switching mechanisms
– Different physical origins
• Back-End-of-Line (BEOL) integration process
• Interesting device properties
– Non-volatile storage (1-bit or multi-bit)
– Properties can be engineered according to the application (thresholds, resistance levels, aging, data retention, …)
– Radiation tolerance
• Target: low-power logic-in-memory applications
[Figure: Al/TiO2/Al device]
Controlling the technology and its CMOS co-integration opens a path towards innovative low-power circuits and systems.
RRAM Technological Developments
• Material innovations: Pt/TaOx/CrOy/Cr/Cu, Pt/Ti/HfO2/Pt
• Structural innovations: fences (better scalability)
• CMOS-RRAM co-integration
[Figure: fence-structured device with bottom electrode (BE) and top electrode (TE); D. Sacchetto et al., CASM’13]
[Figure: ReRAM arrays on a CMOS chip mounted in a notched carrier wafer; J. Sandrini et al., MNE’14, JME’15]
Low-power RRAM-based FPGAs
• FPGAs rely on routing multiplexers
– Multiplexers based on pass-gates
– RRAM = non-volatile switches
– Replacement of all the pass-gates
– Non-volatile routing MUX
– Performance improvement
• Near-VT FPGA operation
– FPGAs are power-hungry circuits
– RRAM-based MUX performance does not degrade with VDD reduction
– Get the near-VT power reductions with no performance compromise
[Figure: two-stage 4:1 routing multiplexer (inputs D0–D3, output Y, select signals S0/SN0 and S1/SN1, 1st and 2nd stage), panels (a) and (b)]
P.-E. Gaillardon et al., VLSI-SoC’12
Area: 12%, delay: 26%, power: 81% (UMC 0.18 µm, VDD = 1.2 V, MCNC benchmarks, VTR flow)
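The routing multiplexer above can be sketched behaviourally. This is a minimal model, assuming ideal switches and a two-level tree (four first-stage switches, two second-stage switches); all function names are hypothetical. In the pass-gate version the switch states are derived from S0/SN0 and S1/SN1 at run time, while in the RRAM version the same ON/OFF states are simply stored in the non-volatile switches, which is the point of replacing the pass-gates.

```python
# Behavioural sketch of a two-stage 4:1 routing multiplexer (D0-D3 -> Y).
# Assumption: every switch is ideal (ON or OFF). Hypothetical names throughout.
from typing import List, Sequence, Tuple


def mux4_two_stage(d: Sequence[int], first_stage_on: Sequence[bool],
                   second_stage_on: Sequence[bool]) -> int:
    """Return Y: input i reaches Y only if its 1st-stage switch and the
    2nd-stage switch of its pair (D0/D1 or D2/D3) are both ON."""
    paths = [value for i, value in enumerate(d)
             if first_stage_on[i] and second_stage_on[i // 2]]
    if len(paths) != 1:
        raise ValueError("exactly one conducting path expected")
    return paths[0]


def pass_gate_controls(s0: int, s1: int) -> Tuple[List[bool], List[bool]]:
    """Pass-gate version: switch states come from select bits and their
    complements SN0 = not S0, SN1 = not S1."""
    sn0, sn1 = 1 - s0, 1 - s1
    first = [bool(sn0), bool(s0), bool(sn0), bool(s0)]  # D0, D1, D2, D3
    second = [bool(sn1), bool(s1)]                      # pair D0/D1, pair D2/D3
    return first, second


def rram_controls(selected: int) -> Tuple[List[bool], List[bool]]:
    """RRAM version: only the switches on the selected path are programmed
    to the low-resistance (ON) state; no select signals at run time."""
    first = [i == selected for i in range(4)]
    second = [p == selected // 2 for p in range(2)]
    return first, second


data = [0, 1, 1, 0]
print(mux4_two_stage(data, *pass_gate_controls(s0=1, s1=0)))  # selects D1
print(mux4_two_stage(data, *rram_controls(selected=1)))       # selects D1
```

Both configurations select input 2·S1 + S0; the difference lies only in where the switch states come from.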
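In-memory Computing
[Figure: RRAM device between top electrode (TE) and bottom electrode (BE), storing the resistive state Z]
• V_TE,BE > Vth → SET operation → Z = 1
• V_TE,BE < −Vth → RESET operation → Z = 0

Initial state Z = 0:
| TE | BE | Z | Zn |
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 0 | 0 | 1 |
→ Zn = TE · BE’

Initial state Z = 1:
| TE | BE | Z | Zn |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 1 |
→ Zn = TE + BE’

Zn = (TE · BE’) · Z’ + (TE + BE’) · Z = MAJ(TE, BE’, Z)
RRAM devices act as MAJ operators! (In-memory computing with native MIG support)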
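The derivation above can be checked exhaustively. The sketch below is a behavioural model only, assuming that a logic 1 on an electrode means a voltage large enough that TE = 1, BE = 0 exceeds Vth (SET) and TE = 0, BE = 1 falls below −Vth (RESET), while equal inputs leave the device untouched. The function names are hypothetical.

```python
# Behavioural check that the RRAM switching rule equals MAJ(TE, BE', Z).
# Assumption: logic levels map to voltages so that TE=1, BE=0 gives a SET and
# TE=0, BE=1 gives a RESET; equal electrode values cause no switching.


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)


def rram_next_state(te: int, be: int, z: int) -> int:
    """Next resistive state Zn of the device, given its current state Z."""
    v = te - be      # sign of the voltage across the device
    if v > 0:        # V_TE,BE > Vth: SET
        return 1
    if v < 0:        # V_TE,BE < -Vth: RESET
        return 0
    return z         # no switching: the stored state is retained


# Compare the model with MAJ(TE, BE', Z) over all eight input combinations.
for te in (0, 1):
    for be in (0, 1):
        for z in (0, 1):
            assert rram_next_state(te, be, z) == maj(te, 1 - be, z)
print("Zn = MAJ(TE, BE', Z) holds for all inputs")
```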
Extend this Path Further
Objectives
1- Bridge the gap between technology and design
• Develop a full-academic technology framework to prototype chips
• Develop design-centric memory stacks (3-terminal RRAMs)
2- Build an ultra-low-power RRAM-based FPGA
3- Build an in-memory computing processor whose operations are performed within the memories
Path II: Exploiting functionality-enhanced transistors in low-power computing nanosystems
Ultimate Devices
• Novel conduction properties: ambipolar conduction, with both n-type and p-type carriers → control it!
[Figure: tri-gate (aka FinFET) device with gate, source and drain, next to a device in which a control gate and a polarity gate sit between source and drain]
Functionality-enhanced Transistors
An Extension to Moore’s Law
[Figure: double-gate (DG) SiNWFET with control gate (CG) and polarity gate (PG) between source (S) and drain (D); PG = 0 gives a p-FET, PG = 1 gives an n-FET]
2-input XOR: DG SiNWFET logic, area = 4; CMOS logic, area = 18
Increase the functionality of the device rather than scaling it!
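The polarity-gate behaviour can be captured in a switch-level sketch. It assumes an idealized device that acts as an n-FET when PG = 1 and as a p-FET when PG = 0, so it conducts exactly when CG and PG carry the same logic value: a single device already computes an XNOR of its two gate inputs. The four-device XOR wiring below, including the assumption that complemented inputs are available, is a hypothetical illustration rather than the exact gate on the slide.

```python
# Switch-level sketch of a controllable-polarity (CP) transistor and a 2-input XOR.
# Assumptions (hypothetical): ideal logic levels; PG=1 -> n-FET (ON when CG=1);
# PG=0 -> p-FET (ON when CG=0); complemented inputs na, nb are available.


def cp_fet_on(cg: int, pg: int) -> bool:
    """An idealized polarity-controllable FET conducts when CG equals PG."""
    return cg == pg


def xor_cp(a: int, b: int) -> int:
    """Static XOR built from four CP devices (illustrative wiring)."""
    na, nb = 1 - a, 1 - b
    # Pull-up network (to VDD): two devices in parallel, both ON when a != b.
    pull_up = cp_fet_on(cg=b, pg=na) or cp_fet_on(cg=nb, pg=a)
    # Pull-down network (to GND): two devices in parallel, both ON when a == b.
    pull_down = cp_fet_on(cg=b, pg=a) or cp_fet_on(cg=nb, pg=na)
    assert pull_up != pull_down, "networks must be complementary"
    return 1 if pull_up else 0


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_cp(a, b))
```

In this model, whenever a parallel pair conducts, one of its devices is in p-mode and the other in n-mode, so each network always contains a device that passes a strong level (p-mode towards VDD, n-mode towards GND).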
High-Performance VT Control
[Figure: drain current ID (µA, 0–40) vs. gate voltage VG (0–1.2 V) at VDS = 1.2 V; solid lines: low-VT configurations, dashed lines: high-VT configurations. The same device is operated as LVT NMOS, LVT PMOS, HVT NMOS and HVT PMOS. Bias conditions shown: VPGS = VPGD = 1.2 V with VCG swept over [0, 1.2 V]; VPGS = VPGD = 0 V with VCG swept over [0, 1.2 V]; VCG = VPGD = 1.2 V with VPGS swept over [0, 1.2 V]; VPGS = VCG = 0 V with VPGD swept over [0, 1.2 V].]
Same ION → limited performance compromise
J. Zhang et al., TED’14
The individual control of the PG regions brings additional knobs!
Super Steep Subthreshold Slope Control
A new steep-SS device exploiting weak impact ionization and positive feedback, obtained with the same device structure by using the extra gates
[Figure (a): drain current (A, 10^-14 to 10^-6) vs. VG (−1.0 to 1.0 V), VSBB = 5 V, Wfin = 40 nm; minimum SS of 3.4, 7.7, 44, 54 and 61 mV/dec at VDS = 5, 4, 3, 2 and 1 V; inset: forward and backward sweeps. (b): subthreshold slope (mV/dec) vs. drain current (A) for VDS = 5, 4 and 3 V, against the 60 mV/dec reference.]
• 6 mV/dec over 5 decades of current
• Down to SS of 3.4 mV/dec
J. Zhang et al., IEDM’14
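The subthreshold slope is SS = dVG / d(log10 ID), and a conventional MOSFET cannot go below ln(10)·kT/q, about 60 mV/dec at room temperature, which is the 60 mV/dec reference on the slide. At 6 mV/dec, five decades of current take about 30 mV of gate swing instead of roughly 300 mV. The sketch below computes both quantities; the sampled I-V points in the usage are hypothetical, not measured data.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # elementary charge (C)


def thermal_ss_limit_mv_per_dec(temperature_k: float = 300.0) -> float:
    """Boltzmann limit of a conventional MOSFET: SS = ln(10) * kT/q, in mV/dec."""
    return 1e3 * math.log(10) * K_B * temperature_k / Q_E


def min_subthreshold_slope(vg, id_):
    """Smallest point-wise SS = dVG / dlog10(ID) in mV/dec, from a sampled
    transfer curve (vg in V, id_ in A); only segments where |ID| grows count."""
    slopes = []
    for k in range(len(vg) - 1):
        dlog = math.log10(abs(id_[k + 1])) - math.log10(abs(id_[k]))
        if dlog > 0:
            slopes.append(1e3 * abs(vg[k + 1] - vg[k]) / dlog)
    return min(slopes)


print(f"{thermal_ss_limit_mv_per_dec():.1f} mV/dec")  # 59.5 mV/dec

# Hypothetical curve: two decades at 60 mV/dec, then one decade in 6 mV.
vg = [0.0, 0.06, 0.12, 0.126]
id_ = [1e-12, 1e-11, 1e-10, 1e-9]
print(min_subthreshold_slope(vg, id_))  # ≈ 6 mV/dec
```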
Implementation test case: 1024-bit polar code decoder, 22-nm technology node

| Design step | Area | Critical path | Power |
|---|---|---|---|
| Reference CMOS | 20.17 µm² | 0.35 ns | 8.58 µW |
| TIG replacement | 28.04 µm² | 0.37 ns | 9.61 µW |
| Compact gate design | 26.66 µm² | 0.29 ns | 9.57 µW |
| Power techniques | 26.66 µm² | 0.29 ns | 8.72 µW |
| Memories | 23.47 µm² | 0.29 ns | 6.98 µW |

[Figure: power, area and critical path (ns) of the CMOS, TIG, compact gates, low-power and memories designs]
TIG FETs are worse than MOSFETs (bigger and slower), BUT they bring advanced functionalities: a technology node ahead!
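To make the trade-offs in the polar decoder table explicit, the short script below computes each design step's signed change relative to the reference CMOS design. It only reuses the table's numbers; the function and variable names are illustrative.

```python
# Signed change of each design step vs. the reference CMOS decoder,
# using the area (µm²), critical path (ns) and power (µW) values from the table.

REFERENCE = ("Reference CMOS", 20.17, 0.35, 8.58)
DESIGNS = [
    ("TIG replacement", 28.04, 0.37, 9.61),
    ("Compact gate design", 26.66, 0.29, 9.57),
    ("Power techniques", 26.66, 0.29, 8.72),
    ("Memories", 23.47, 0.29, 6.98),
]


def relative_change(value: float, reference: float) -> float:
    """Signed change in percent (negative means smaller than the reference)."""
    return 100.0 * (value - reference) / reference


def compare(designs=DESIGNS, reference=REFERENCE):
    """Map each design name to (area %, critical path %, power %), 1 decimal."""
    _, ref_area, ref_delay, ref_power = reference
    return {
        name: (
            round(relative_change(area, ref_area), 1),
            round(relative_change(delay, ref_delay), 1),
            round(relative_change(power, ref_power), 1),
        )
        for name, area, delay, power in designs
    }


for name, (d_area, d_delay, d_power) in compare().items():
    print(f"{name:20s} area {d_area:+.1f}%  delay {d_delay:+.1f}%  power {d_power:+.1f}%")
```

With these numbers, the plain TIG replacement is about 39% larger, 6% slower and 12% more power-hungry than the CMOS reference, while the final "Memories" step ends at roughly +16% area, −17% critical path and −19% power.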
Objectives
1- Identify the best switching primitive
2- Create a universal transistor technology (LP, HP, steep, RF, …)
3- Demonstrate circuits with 1000s of transistors
V-cycle Model Applied to Nanosystems
[Figure: V-cycle spanning application, architectural design, EDA tools, circuit design and device technology (with richer switching functions); exploration EDA tools perform application profiling, identification of the dominant macro functions and identification of the best device switching primitive]
Do we have a guarantee that the selected device technology is good?
Path III: Exploiting novel EDA techniques in low-power computing nanosystems
Logic Synthesis (Optimization) Challenges
• Logic synthesis is a technology supporter
– LS techniques derive from CMOS abilities (NAND/NOR/MUX)
• Many real-life applications contain different types of functions intertwined together (AND/OR, XOR)
– LS heuristics target only one type of function for pragmatic reasons
• Logic synthesis as a design enabler
– Path 1: Model comparator primitives (rather than switches): BBDDs, L. Amarù et al., DATE’13, DATE’14
– Path 2: Exploit more generic data structures: MIG, L. Amarù et al., DAC’14, DAC’15
Majority-Inverter Graphs
• Majority logic is a powerful generalization of AND/OR:
– Ex1: MAJ(a,b,c) = ab + ac + bc
– Ex2: MAJ(a,b,1) = a + b
– Ex3: MAJ(a,b,0) = ab
• Unlocks optimization opportunities not apparent before.
[Figure: a function f of inputs x0–x4 implemented as a network of AND/OR nodes, and the same function implemented with two MAJ nodes]
L. Amarù et al., DAC’14, DAC’15
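The majority examples above, and a few further identities, can be checked by brute force over all input assignments. In the sketch below, the identities beyond Ex1, Ex2 and Ex3 (majority, inverter propagation, distributivity, associativity) are standard Boolean properties of the majority function of the kind MIG optimization rewrites with; they are stated here as an illustration, not as the exact rule set of the cited papers.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)


def inv(x: int) -> int:
    """Logic inversion (the 'I' in MIG)."""
    return 1 - x


def holds(identity, arity: int) -> bool:
    """Exhaustively check a Boolean identity over all 0/1 assignments."""
    return all(identity(*bits) for bits in product((0, 1), repeat=arity))


# Examples from the slide: MAJ generalizes AND and OR.
ex1 = holds(lambda a, b, c: maj(a, b, c) == ((a & b) | (a & c) | (b & c)), 3)
ex2 = holds(lambda a, b: maj(a, b, 1) == (a | b), 2)
ex3 = holds(lambda a, b: maj(a, b, 0) == (a & b), 2)

# Further majority identities useful for rewriting MIGs.
majority = holds(lambda x, z: maj(x, x, z) == x and maj(x, inv(x), z) == z, 2)
inverter_propagation = holds(
    lambda x, y, z: inv(maj(x, y, z)) == maj(inv(x), inv(y), inv(z)), 3)
distributivity = holds(
    lambda x, y, u, v, z: maj(x, y, maj(u, v, z)) == maj(maj(x, y, u), maj(x, y, v), z), 5)
associativity = holds(
    lambda x, u, y, z: maj(x, u, maj(y, u, z)) == maj(z, u, maj(y, u, x)), 4)

print(ex1, ex2, ex3, majority, inverter_propagation, distributivity, associativity)
```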
Superiority of MIG vs. Standard Tech.
MIG advantages are remarkable after technology mapping (ASIC). Logic synthesis results in 22 nm:
• Std.-cell library = {MIN, XOR, XNOR, NAND, NOR, INV}
• −22% delay, −14% area, −11% power w.r.t. AOIG-based synthesis
• Depth minimization + area recovery
• MCNC suite
Novel LS techniques promise to push design efficiency! L. Amarù et al., DAC’14, DAC’15
Biconditional Expansion-based LUTs
Let’s consider a 2-input function:
$f(A,B) = \overline{A \oplus B} \cdot f(B,B) + (A \oplus B) \cdot f(\overline{B},B)$
Each cofactor, f(B,B) and f(B’,B), depends only on B, so it is one of {B, B’, 0, 1}.
[Figure: BBDD node with branches A = B and A ≠ B, the BBDD representation of F, and its hardware support: multiplexers selecting among B, B’, 0, 1, followed by a multiplexer driven by A ⊕ B]
Advantage w.r.t. a standard LUT? Strong power advantage:
• 1st level of MUXes is statically configured
• 2nd level of MUXes has reduced activity thanks to the XORs
• MUX tree is not driven by SRAMs → buffering requirements reduced!
P.-E. Gaillardon et al., FPGA’14, FPGA’15
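The biconditional expansion can be turned into a small behavioural LUT model. Because each cofactor depends on B only, a first-level multiplexer can be configured once to output B, B’, 0 or 1, and a second-level multiplexer driven by A ⊕ B picks the right cofactor at run time. The sketch below, with hypothetical function names, builds that static configuration for every 2-input function and checks it against the truth table.

```python
from itertools import product

# Possible cofactors, i.e. functions of B alone: B, B', constant 0, constant 1.
CHOICES = {
    "B": lambda b: b,
    "B'": lambda b: 1 - b,
    "0": lambda b: 0,
    "1": lambda b: 1,
}


def cofactor_choice(truth, a_equals_b: bool) -> str:
    """Name of the choice equal to f(B, B) (a_equals_b=True) or f(B', B)."""
    for name, g in CHOICES.items():
        if all(truth[(b if a_equals_b else 1 - b, b)] == g(b) for b in (0, 1)):
            return name
    raise AssertionError("unreachable: every 1-input function is in CHOICES")


def configure(truth):
    """Static configuration of the two first-level multiplexers."""
    return cofactor_choice(truth, True), cofactor_choice(truth, False)


def bbdd_lut(config, a: int, b: int) -> int:
    """Second-level MUX, selected by A xor B, outputs the configured cofactor."""
    same, different = config
    return CHOICES[different if a ^ b else same](b)


# Check all 16 two-input functions (truth tables keyed by (A, B)).
for outputs in product((0, 1), repeat=4):
    truth = dict(zip(product((0, 1), repeat=2), outputs))
    cfg = configure(truth)
    assert all(bbdd_lut(cfg, a, b) == truth[(a, b)] for a, b in truth)
print("biconditional LUT model matches all 16 functions")
```

For example, XOR is configured as ("0", "1") and AND as ("B", "0"): in this model only the second-level select A ⊕ B toggles at run time.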
Extend the Approach Further
• From a logic synthesis perspective: What is the best data structure? What are the best standard-cell libraries?
• From an architectural perspective: What is the best elementary block? Would it fit deep learning problems?
Objectives
1- Develop a universal EDA technique (targeting arithmetic and general logic)
2- Create more power-efficient FPGA architectures
3- Create power-efficient systems for deep learning applications
Strong opportunities towards IoT, embedded systems, medical, …
Acknowledgments
Physical Design
EDA Tools
Funding
Pr. Nanni De Micheli
Technology
Modeling
Architecture Design
Dr. Luca Amarù Mr. Winston Haaswijk Ms. Eleonora Testa
Dr. Somayyeh Rahimian Dr. Hassan Ghasemzadeh Mr. Xifan Tang Mr. Gain Kim Mr. Edouard Giacomin
Dr. Michele De Marchi Dr. Jian Zhang Mr. Maxime Thammasack
Mr. Giovanni Resta Mr. Jorge Romero Mr. Tom Becnel
Integrated Nanosystems Research Group Department of Electrical and Computer Engineering
MEB building – University of Utah – Salt Lake City – UT – USA
Thank you for your attention
Questions?