laboratory for nanointegrated...

© INS-UoU 2015 All rights reserved

University of Utah | P.-E. Gaillardon | 1

IFIP WG10.5 March 29th, 2017 – Lausanne, Switzerland

Laboratory for NanoIntegrated Systems

Pierre-Emmanuel Gaillardon
Department of Electrical and Computer Engineering – University of Utah



Research Vision

My design approach: Co-design across the VLSI abstraction layers

Tool

System Design

Device

Traditional approach

EDA tools

CMOS technology

Circuits

Emerging EDA tools

Advanced transistor technologies

Advanced memory technologies

Low-power nanotechnology-enabled systems







Path I: Exploiting emerging memory technologies in low power computing nanosystems



P.-E. Gaillardon et al, VLSI-SoC’12

RRAM: A Low-Power System Enabler

•  MIM structures
   –  Different switching mechanisms
   –  Different physical origins

•  Back-End-of-Line integration process

•  Interesting device properties
   –  Non-volatile storage (1-bit or multi-bit)
   –  The properties can be engineered according to the application (thresholds, resistance levels, aging, data retention, …)
   –  Radiation tolerant

Low-Power Logic-in-memory Applications

Al/TiO2/Al

Controlling the technology and its CMOS co-integration opens a path towards innovative low-power circuits and systems



RRAM Technological Developments

•  Material Innovations: Pt/TaOx/CrOy/Cr/Cu, Pt/Ti/HfO2/Pt
•  Structural Innovations: Fences (better scalability)
•  CMOS-RRAM co-integration

D. Sacchetto et al., CASM’13

[Figure: fence structure with bottom electrode (BE) and top electrode (TE)]

J. Sandrini et al., MNE’14, JME’15

[Figure: ReRAM arrays on a CMOS chip, shown with the carrier wafer and its notch]




Low-power RRAM-based FPGAs

FPGAs rely on Routing Multiplexers
–  Multiplexers based on pass-gates
–  RRAM = Non-Volatile Switches
–  Replacement of all the Pass-Gates
–  Non-Volatile Routing MUX
–  Performance improvement

Near-VT FPGA Operations
–  FPGAs are power-hungry circuits
–  RRAM-based MUX performance does not degrade with VDD reduction
–  Get the near-VT power reductions with no performance compromise

[Figure: two-stage 4:1 routing multiplexer, panels a) and b), with data inputs D0 to D3, select signals S0, SN0, S1, SN1, stages labelled 1st and 2nd, and output Y]

P.-E. Gaillardon et al., VLSI-SoC’12

Area: 12% Delay: 26% Power: 81% UMC 0.18 µm2

VDD=1.2V MCNC Benchmarks

VTR Flow 7



In-memory Computing

[Figure: RRAM cell with top electrode TE, bottom electrode BE, and stored state Z]

VTE,BE > Vth → SET operation → Z = 1
VTE,BE < -Vth → RESET operation → Z = 0

Current state Z = 0:

TE  BE  Z  Zn
0   0   0  0
0   1   0  0
1   1   0  0
1   0   0  1

Zn = TE . BE’

Current state Z = 1:

TE  BE  Z  Zn
0   0   1  1
0   1   1  0
1   1   1  1
1   0   1  1

Zn = TE + BE’

Combining both cases:

Zn = (TE . BE’) . Z’ + (TE + BE’) . Z = MAJ (TE, BE’, Z)

RRAM devices act as MAJ operators!
(In-memory computing with native MIG support)
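The truth tables above can be checked exhaustively. The sketch below is a minimal, idealized model and not a device simulation: it assumes that TE=1, BE=0 always exceeds the SET threshold, that TE=0, BE=1 always exceeds the RESET threshold, and that equal terminal values leave the state untouched. The function names `maj` and `rram_next_state` are illustrative and do not come from the slides.

```python
from itertools import product


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)


def rram_next_state(te: int, be: int, z: int) -> int:
    """Idealized bipolar RRAM update (assumed model).

    TE=1, BE=0 applies a positive voltage across the cell (SET, Z -> 1);
    TE=0, BE=1 applies a negative voltage across it (RESET, Z -> 0);
    equal terminal values leave the stored state unchanged.
    """
    if te == 1 and be == 0:
        return 1
    if te == 0 and be == 1:
        return 0
    return z


def truth_table():
    """Return (TE, BE, Z, Zn) rows for all eight input combinations."""
    return [(te, be, z, rram_next_state(te, be, z))
            for z, te, be in product((0, 1), repeat=3)]


for te, be, z, zn in truth_table():
    # The next state must match MAJ(TE, BE', Z) on every row.
    assert zn == maj(te, 1 - be, z)
    print(te, be, z, zn)
```

Under these assumptions every row satisfies Zn = MAJ(TE, BE’, Z), which is the identity stated on the slide.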



Extend this Path Further

Objectives

1- Bridge the gap between technology and design
   •  Develop a full-academic technology framework to prototype chips
   •  Develop design-centric memory stacks (3-terminal RRAMs)

2- Build an ultra-low-power RRAM-based FPGA

3- Build an in-memory computing processor whose operations are made within the memories







Path II: Exploiting functionality-enhanced transistors in low power computing nanosystems



Ultimate Devices

Novel Conduction Properties

Ambipolar Conduction

n-type and p-type carriers

CONTROL IT

[Figure: device structures. Tri-gate (aka FinFET): gate, source, drain. Polarity-controlled device: source, drain, polarity gate, control gate.]

Functionality-enhanced Transistors



2-input XOR
DG-SiNWFET logic: Area = 4
CMOS logic: Area = 18

[Figure: XOR schematics annotated with transistor sizes 3/2 and 3]

[Figure: device symbol with control gate (CG), polarity gate (PG), source (S), and drain (D)]
PG = 0: p-FET
PG = 1: n-FET

Increase the functionality of the device rather than scaling it!

An Extension to Moore’s Law
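A switch-level sketch shows why a polarity gate gives XOR behavior so cheaply: the device is on exactly when CG and PG agree, so each transistor already compares two signals. The four-device pull-up/pull-down arrangement below is a hypothetical topology chosen for illustration. It is not taken from the slide, and it ignores electrical effects such as threshold drops.

```python
def conducts(cg: int, pg: int) -> bool:
    """Switch-level model of a polarity-controllable transistor.

    PG = 1 configures the device as an n-FET (on when CG = 1);
    PG = 0 configures it as a p-FET (on when CG = 0).
    In both cases the channel conducts exactly when CG == PG.
    """
    return cg == pg


def xor2(a: int, b: int) -> int:
    """2-input XOR built from four modelled devices (assumed topology)."""
    na, nb = 1 - a, 1 - b
    # Hypothetical pull-up network: two parallel devices, on when A != B.
    pull_up = conducts(a, nb) or conducts(na, b)
    # Hypothetical pull-down network: two parallel devices, on when A == B.
    pull_down = conducts(a, b) or conducts(na, nb)
    # Exactly one network may drive the output at any time.
    assert pull_up != pull_down
    return int(pull_up)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor2(a, b))
```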



High-Performance VT Control

[Figure: ID [µA] (0 to 40) versus VG [V] (0 to 1.2) for the LVT NMOS, HVT NMOS, LVT PMOS, and HVT PMOS configurations. Solid lines: low-VT configuration; dashed lines: high-VT configuration; VDS = 1.2 V.]

Bias conditions shown in the panels:
–  VPGS = VPGD = 1.2 V, VCG = [0, 1.2 V]
–  VCG = VPGD = 1.2 V, VPGS = [0, 1.2 V]
–  VPGS = VPGD = 0, VCG = [0, 1.2 V]
–  VPGS = VCG = 0, VPGD = [0, 1.2 V]

Same ION → Limited performance compromise

J. Zhang et al, TED’14

The individual control of the PG regions brings additional knobs!



Super Steep Subthreshold Slope Control

A new steep-SS device exploiting weak-impact ionization and positive feedback
By exploiting the same device structure and using the extra gates

[Figure (a): drain current (A), 10^-14 to 10^-6, versus VG (V), -1.0 to 1.0, forward and backward sweeps, with an inset around VG = -0.46 V; VSBB = 5 V, Wfin = 40 nm]

VDS   SSmin
5V    3.4mV/dec
4V    7.7mV/dec
3V    44mV/dec
2V    54mV/dec
1V    61mV/dec

[Figure (b): subthreshold slope (mV/dec), 0 to 80, versus drain current (A), 10^-13 to 10^-8, for VDS = 5 V, 4 V, and 3 V, with the 60mV/dec level marked; VSBB = 5 V, Wfin = 40 nm]

6 mV/dec over 5 decades of current
Down to SS of 3.4 mV/dec

J. Zhang et al, IEDM’14
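To put the slope figures in perspective, the gate-voltage swing needed to change the current by a given number of decades is simply the slope multiplied by the number of decades. The sketch below compares the reported 6 mV/dec over 5 decades with the 60 mV/dec reference level. It assumes a constant average slope over the range and a temperature of 300 K for the thermionic limit; the function names are illustrative.

```python
import math

BOLTZMANN = 1.380649e-23           # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def thermal_ss_limit_mv(temperature_k: float = 300.0) -> float:
    """Thermionic subthreshold-slope limit ln(10) * kT / q, in mV/dec."""
    return math.log(10) * BOLTZMANN * temperature_k / ELECTRON_CHARGE * 1e3


def gate_swing_mv(ss_mv_per_dec: float, decades: float) -> float:
    """Gate swing (mV) for a given number of current decades,
    assuming the slope is constant over the whole range."""
    return ss_mv_per_dec * decades


print(f"Thermionic limit at 300 K: {thermal_ss_limit_mv():.1f} mV/dec")
print(f"Swing at 60 mV/dec over 5 decades: {gate_swing_mv(60, 5):.0f} mV")
print(f"Swing at 6 mV/dec over 5 decades:  {gate_swing_mv(6, 5):.0f} mV")
```

With these assumptions, five decades of current cost 300 mV of gate swing at 60 mV/dec but only 30 mV at 6 mV/dec.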



A Technology Node Ahead!

Implementation test-case: 1024-bit Polar code decoder, 22-nm tech. node

Design step           Area      Critical path  Power
Reference CMOS        20.17µm²  0.35ns         8.58µW
TIG Replacement       28.04µm²  0.37ns         9.61µW
Compact Gate Design   26.66µm²  0.29ns         9.57µW
Power techniques      26.66µm²  0.29ns         8.72µW
Memories              23.47µm²  0.29ns         6.98µW

[Figure: plot with axes Power (mW), Area (μm²), and Critical path (ns), adding the design points CMOS, TIG, Compact Gates, Low-Power, and Memories one after another]

TIG FETs are worse than MOS FETs (bigger and slower)
BUT they bring advanced functionalities
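The five design points for the polar code decoder are easier to compare as percentage changes against the reference CMOS implementation. The sketch below only re-expresses the numbers reported on the slide. The dictionary layout and the `relative_change` helper are illustrative, and each tuple is read as (area, critical path, power) in the units given on the slide.

```python
# (area, critical path, power) for each design step, as reported on the slide.
DESIGNS = {
    "Reference CMOS": (20.17, 0.35, 8.58),
    "TIG Replacement": (28.04, 0.37, 9.61),
    "Compact Gate Design": (26.66, 0.29, 9.57),
    "Power techniques": (26.66, 0.29, 8.72),
    "Memories": (23.47, 0.29, 6.98),
}


def relative_change(design: str, reference: str = "Reference CMOS"):
    """Percentage change of (area, critical path, power) versus the reference,
    rounded to one decimal place. Negative values are improvements."""
    ref = DESIGNS[reference]
    return tuple(round(100.0 * (value - base) / base, 1)
                 for value, base in zip(DESIGNS[design], ref))


for name in DESIGNS:
    area, delay, power = relative_change(name)
    print(f"{name:20s} area {area:+6.1f}%  delay {delay:+6.1f}%  power {power:+6.1f}%")
```

Read this way, plain TIG replacement costs about 39% area, 6% delay, and 12% power, while the final "Memories" step ends about 16% larger but roughly 17% faster and 19% lower in power than the CMOS reference.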



Objectives

1- Identify the best switching primitive
2- Create a universal transistor technology (LP, HP, Steep, RF, …)
3- 1000s transistor circuit demonstration

V-cycle Model Applied to Nanosystems

[Figure: V-cycle diagram. Labels: Device technology (with richer switching functions), Circuit design, EDA tools, Architectural design, Application, Exploration EDA tools, Application profiling, Identification of the dominant macro functions, Identification of the best device switching primitive]

Do we have a guarantee that the selected device technology is good?



Path III: Exploiting novel EDA techniques in low power computing nanosystems







Logic Synthesis (Optimization) Challenges

•  Logic Synthesis is a technology supporter
   –  LS techniques derive from CMOS abilities (NAND/NOR/MUX)
•  Many real-life applications contain different types of functions (AND/OR, XOR) intertwined together
   –  LS heuristics target only one type of function for pragmatic reasons

•  Logic Synthesis as a design enabler
   –  Path1: Model comparator primitives (rather than switches): BBDDs, L. Amarù et al., DATE’13, DATE’14
   –  Path2: Exploit more generic data structures: MIG, L. Amarù et al., DAC’14, DAC’15



Majority-Inverter Graphs

•  Majority logic is a powerful generalization of AND/ORs.
   Ex1: MAJ(a,b,c)=ab+ac+bc
   Ex2: MAJ(a,b,1)=a+b
   Ex3: MAJ(a,b,0)=ab
•  Unlocks optimization opportunities not apparent before.

[Figure: a function f of inputs x0 to x4 drawn as an AND/OR graph and as a MAJ graph]

L. Amarù et al., DAC’14, DAC’15
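The way a Majority-Inverter Graph subsumes AND/OR logic can be illustrated with a very small data structure: every node is a three-input majority, every edge may be complemented, and fixing one input to a constant yields AND or OR. The sketch below is a toy evaluator written for this purpose. The names `MajNode`, `evaluate`, `and_gate`, and `or_gate` are hypothetical and do not correspond to the authors' tool.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Tuple, Union

# An edge is (source, inverted); a source is an input name,
# a constant 0/1, or another majority node.
Source = Union[str, int, "MajNode"]
Edge = Tuple[Source, bool]


@dataclass(frozen=True)
class MajNode:
    """A three-input majority node with possibly complemented fan-in edges."""
    a: Edge
    b: Edge
    c: Edge


def evaluate(edge: Edge, inputs: Dict[str, int]) -> int:
    """Recursively evaluate an edge for a given input assignment."""
    source, inverted = edge
    if isinstance(source, MajNode):
        values = [evaluate(e, inputs) for e in (source.a, source.b, source.c)]
        value = int(sum(values) >= 2)
    elif isinstance(source, str):
        value = inputs[source]
    else:
        value = source  # constant 0 or 1
    return value ^ int(inverted)


def and_gate(x: Edge, y: Edge) -> Edge:
    """AND as a special case: MAJ(x, y, 0)."""
    return (MajNode(x, y, (0, False)), False)


def or_gate(x: Edge, y: Edge) -> Edge:
    """OR as a special case: MAJ(x, y, 1)."""
    return (MajNode(x, y, (1, False)), False)


A, B, NOT_C = ("a", False), ("b", False), ("c", True)
f = or_gate(and_gate(A, B), NOT_C)  # f = ab + c'

for a, b, c in product((0, 1), repeat=3):
    assert evaluate(f, {"a": a, "b": b, "c": c}) == ((a & b) | (1 - c))
```

Because AND and OR are just majority nodes with a constant input, any AND/OR/inverter network maps directly onto this structure, while unconstrained majority nodes offer additional freedom to an optimizer.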



Superiority of MIG vs. Standard Tech.

MIG advantages are remarkable after Tech. Mapping (ASIC)

Logic Synthesis results in 22-nm, MCNC suite
Std.-cell library = {MIN, XOR, XNOR, NAND, NOR, INV}
Depth minimization + Area recovery

-(22%, 14%, 11%) (delay, area, power) w.r.t. AOIG-based synthesis

Novel LS techniques promising to push design efficiency!
L. Amarù et al., DAC’14, DAC’15



Biconditional Expansion-based LUTs

Let’s consider a 2-input function

f(A,B) = (A⊕B) . f(B’,B) + (A⊕B)’ . f(B,B)

Each cofactor, f(B’,B) for A≠B and f(B,B) for A=B, depends on B only and is therefore one of B, B’, 0, 1.

[Figure: (a) BBDD node and BBDD representation of F(A,B) with branches A=B and A≠B; (b) hardware support]

Advantage w.r.t. a standard LUT? Strong power advantage
•  The 1st level of MUXes is statically configured
•  The activity of the 2nd level of MUXes is reduced thanks to the XORs
•  The MUX tree is not driven by SRAMs → Buffering requirements reduced!

P.-E. Gaillardon et al., FPGA’14, FPGA’15
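The biconditional expansion of a 2-input function can be verified by brute force over all 16 functions of two variables. The sketch below assumes the usual form of the expansion, with the A≠B branch selecting the cofactor f(B’,B) and the A=B branch selecting f(B,B). The helper names are illustrative only.

```python
from itertools import product


def biconditional_expand(f, a: int, b: int) -> int:
    """Evaluate f(a, b) as (A xor B).f(B', B) + (A xor B)'.f(B, B)."""
    differ = a ^ b
    f_neq = f(1 - b, b)  # cofactor used when A != B
    f_eq = f(b, b)       # cofactor used when A == B
    return (differ & f_neq) | ((1 - differ) & f_eq)


def all_two_input_functions():
    """Yield every Boolean function of two inputs as a callable."""
    for bits in product((0, 1), repeat=4):
        table = dict(zip(product((0, 1), repeat=2), bits))
        yield lambda a, b, t=table: t[(a, b)]


def expansion_holds() -> bool:
    """Check the expansion on all 16 functions and all 4 input pairs."""
    return all(biconditional_expand(f, a, b) == f(a, b)
               for f in all_two_input_functions()
               for a, b in product((0, 1), repeat=2))


print(expansion_holds())
```

Both cofactors are functions of B alone, which is why each can only be B, B’, 0, or 1.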



Extend Further the Approach

From a logic synthesis perspective: What is the best data structure?

What are the best standard cells libraries?

From an architectural perspective: What is the best elementary block?

Would that fit Deep Learning problems?

Objectives

1- Develop a universal EDA technique (targeting arithmetic and general logic)
2- Create more power-efficient FPGA architectures
3- Create power-efficient systems for deep learning applications

Strong opportunities towards IoT, embedded systems, medical, …



Acknowledgments

Physical Design

EDA Tools

Funding

Pr. Nanni De Micheli

Technology

Modeling

Architecture Design

Dr. Luca Amarù Mr. Winston Haaswijk Ms. Eleonora Testa

Dr. Somayyeh Rahimian Dr. Hassan Ghasemzadeh Mr. Xifan Tang Mr. Gain Kim Mr. Edouard Giacomin

Dr. Michele De Marchi Dr. Jian Zhang Mr. Maxime Thammasack

Mr. Giovanni Resta Mr. Jorge Romero Mr. Tom Becnel



Integrated Nanosystems Research Group Department of Electrical and Computer Engineering

MEB building – University of Utah – Salt Lake City – UT – USA

Thank you for your attention

Questions?
