robust systems for scaled cmos and...

31
Robust Systems for Scaled CMOS and Beyond Acknowledgment: Students & Collaborators Subhasish Mitra Robust Systems Group Department of EE & Department of CS Stanford University

Upload: ngothuan

Post on 03-Apr-2018

226 views

Category:

Documents


3 download

TRANSCRIPT

Robust Systems

for Scaled CMOS and Beyond

Acknowledgment: Students & Collaborators

Subhasish Mitra

Robust Systems Group

Department of EE & Department of CS

Stanford University

Robust System Design

2

� Complexity: detect & fix design bugs

� CMOS reliability limits: tolerate errors

� Beyond silicon-CMOS: imperfection-immune logic

Perform correctly despite complexity & disturbances

What’s New ?

Traditional Thinking New approach

Design bugs Pre-silicon Post-silicon

Reliability failures Avoid Tolerate at low cost

Beyond silicon-CMOS

Material processingImperfection-immune

design

� Existing approaches: inadequate, expensive

3

Outline

� Introduction

� CMOS reliability limits: tolerate errors

� Beyond silicon-CMOS: imperfection-immune logic

� Conclusion

4

Technology Reliability Challenges

� System soft error rates increasing

� Fatal flip-flop errors

� Early-life failures (ELF)

� Burn-in: difficult, expensive

� Circuit aging & variations

� Worst-case guardbands expensive

Soft error rates

Comb. logic

5

Flip-flop

SRAM (no

ECC)

Circuit Failure Prediction

New failure signature � ultra low-cost

WearoutEarly-life failures (ELF)

Lifetime Time

Fa

ilure

ra

te

Burn-in difficult Iddq

ineffective

Circuit aging Guardbandsexpensive

Soft Error Resilience

BISER + LEAP:

Errors reduced: 2,000X

Software-orchestrated global optimization a MUST

Low-Cost Resilience

6

BISER: Built-In Soft Error Resilience

45nm: up to 1,000X fewer errors vs. D-flip-flop

D

C

D

C

Latch

Redundant Latch (Scan Test & Debug reuse)

Q

Q

Weak keeper

OUT

Combinational logicIN

ClockC-element

A

B

7

Single Error Assumption Inadequate

� Single event multiple upsets increasing

8

2,000X fewer errors vs. D-flip-flop

LEAP: Layout by Error Aware transistor Positioning

Optimized Resilience Essential

Select application-critical

flip-flops

Optimize for cross-layer

resilience

9

20 40 60 80 100

0%

10%

20%

Power cost

% critical flip-flops protected with logic parity, BISER for rest

Power cost

20 40 60 80 100

% flip-flops critical

Chip-level error rate

0.1

1

0.5

Circuit Failure Prediction

New failure signature � ultra low-cost

WearoutEarly-life failures (ELF)

Lifetime Time

Fa

ilure

ra

te

Burn-in difficult Iddq

ineffective

Circuit aging Guardbands expensive

Software-orchestrated global optimization a MUST

Low-Cost Resilience

10

Soft Error Resilience

BISER + LEAP:

Errors reduced: 2,000X

New Gate-Oxide ELF Signature

� Delay fluctuations over time

� Before functional failure

� Demonstrated: 45, 32nm

� 28, 22, 15nm in progress

� Enables

� On-line failure prediction

Stress time

Delay fluctuations

Fu

nc

tio

na

l F

ail

ure

ELF Delay fluctuations

11

On-line Failure Prediction

12

Failure Prediction Error Detection

Before errors appear After errors appear

+ No corruption – Corrupt data & states

+ Low cost – High cost

+ Self-diagnostics – Limited diagnostics

How ?

On-line self-test and diagnostics

On-Line Self-Test and Diagnostics

On-line self-test & diagnostics

CASP

High on-line test coverage

No visible system downtime

1% power, 1% area, 3% performance impact

Ultra low-cost

OpenSPARC T2 SoC

Task1

Task2

TaskN

Task N+1

TaskN+2

TaskM

13

Uncore very important

Outline

� Introduction

� CMOS reliability limits: tolerate errors

� Beyond silicon-CMOS: imperfection-immune logic

� Conclusion

14

Carbon Nanotube (CNT)Diameter (D) : 0.5 - 3 nm

D

S. Iijima

Carbon Nanotube FET (CNFET)

15

Ideal CNFET Inverter

N+ doped

Semiconducting

CNTs

Gates

Input

P+ doped

Semiconducting

CNTs Lithographic pitch

4nm

Output

Vdd

Gnd

16

CNFETs: BIG Promise, BUT

� Major barriers for a decade

� Mis-positioned CNTs

� Metallic CNTs

� Processing alone inadequate

Imperfection-immune

design essential

17Collaborator: Prof. H.-S.P. Wong, Stanford

Wanted: (A+C) (B+D)

Got : B+D

Out

A B

C D

Vdd

A C

B D

Gnd

Wanted: A′C′ + B′D′

Got: A′C′ + B′D′ + A′D′

Mis-positioned CNTs: Incorrect Logic

18

1. Grow CNTs

Mis-positioned-CNT-Immune NAND

19

BA

A

B

Out

1. Grow CNTs

2. Extended gate & contacts

CRUCIAL

20

Mis-positioned-CNT-Immune NAND

Vdd

Gnd

BA

A

B

Out

1. Grow CNTs

2. Extended gate & contacts

3. Etch gate & CNTs

4. Dope P & N regions

Vdd

Gnd

21

Mis-positioned-CNT-Immune NAND

BA

A

B

Out

1. Grow CNTs

2. Extended gate & contacts

3. Etch gate & CNTs

4. Dope P & N regions

Etched region

ESSENTIAL

Vdd

Gnd

22

Mis-positioned-CNT-Immune NAND

BA

A

B

Out

1. Grow CNTs

2. Extended gate & contacts

3. Etch gate & CNTs

4. Dope P & N regions

Etched region

ESSENTIAL

� Graph algorithms

� All possible functions

� VLSI

� Processing & design

Vdd

Gnd

23

Mis-positioned-CNT-Immune NAND

VMR: VLSI Metallic CNT Removal

� Metallic-CNT-immune design

☺ Sufficient: all possible logic designs

☺ VLSI processing & design

24

25

Quartz wafer with catalyst

Aligned CNT growth

First Wafer-Scale Aligned CNT Growth

2 µm 2 µm

Before transfer

Quartz substrate

Quartz wafer

99.5% CNTs aligned

After transfer

SiO2/Si substrate

D-latch

Imperfection-immune circuits Arithmetic & storage

Adder sum

D-latch

VLSI Integration Wafer-scale & monolithic 3D

Conventional via, NOT TSV

First Experimental Demonstrations

26Multi-layer CNFET circuits

CNFET Variations Significant

27

CNFET Ion variations

Yield

Energy penalty

0%

Very

high

High (99%+)

Unique layouts+ Co-optimized processing

Naïve transistor upsizing

Low (0%)

Metallic CNT

induced

Grown CNT

density

OthersNo

design

change

Outline

� Introduction

� CMOS reliability limits: tolerate errors

� Beyond silicon-CMOS: imperfection-immune logic

� Conclusion

28

Thanks to my Research Group

29

Thanks to our Sponsors

30

Photo credits:Burn-in & test socket workshop, H. Dai, NEC, opensparc.net, Stanford

Concluding Remarks

� Derive failure signatures

� Utilize failure signatures

� Validate failure signatures

31

Enable

Nanotechnology Revolutions

&

A TRULY Better Tomorrow