self-repairing and self-calibration: a design/test …vlsi/courses/ee695kr/f...self-repairing and...

Self-Repairing and Self-Calibration: A Design/Test Strategy for Nano-scale

CMOS

Kaushik RoyS. Mukhopadhyay, H. Mahmoodi, A. Raychowdhury,

Chris Kim, S. Ghosh, K. Kang

School of Electrical and Computer Engineering,Purdue University, West Lafayette, IN

Process Variation in Nano-scale Transistors

Inter and Intra-die Variations

Device parameters are no longer deterministic

130nm

30%

5X0.90.9

1.01.0

1.11.1

1.21.2

1.31.3

1.41.4

11 22 33 44 55Norm alized Leakage (N orm alized Leakage ( IsbIsb ))

Nor

mal

ized

Fre

quen

cyN

orm

aliz

ed F

requ

ency

Source: Intel

10

100

1000

10000

1000 500 250 130 65 32

Technology Node (nm)

Mea

n N

umbe

r of D

opan

t Ato

ms

Source: Intel

Random dopant fluctuation

Delay and Leakage Spread

Process Variations: Failures, Test, Self-Calibration, and Self Repair

• Memories (Caches and Register Files)– Failure analysis– Updated March Test for process induced failures– Process/Defect tolerant caches– Self-Repairing SRAM’s– Leakage and delay sensors for self-repair

• Logic– Failure analysis in pipelines and robust pipeline design– Delay sensor to measure critical path delays and integrated

test generation for robust segment delay coverage• Updated scan chain logic

– FLH and FLS (First level hold and scan flip-flops) for low-power (dynamic and leakage) and efficient delay testing

• Temperature control to prevent thermal runaway during burn-in

SRAM Memory Cell Failure

AF: Access time failureRF: Read failure WF: Write failureHF: Hold failure

RF: Read failure

VR

VL

Flipping

WF: Write failure

Nowriting

VR

VL

Process variation:Device miss-match → Cell failure

Primary source:Random dopant fluctuation

→ Vth miss-match

Results in three types of failures

TACCESS > TMAXAF: Access time failure

Mechanisms of Parametric Failures

WLVR

VL

Volta

ge

Time ->

WL

VR

VL

Volta

ge

Time -> Time ->

VR VL

Volta

ge VDDH

Read Failure

Write Failure Hold Failure

Time ->

ΔMIN

WL

BL

BR

Access Failure

Volta

ge

Read Failure (RF)

( )TRIPRDREADRF VVPP >=

dlinNRdsatAXR II =),,(),,( DDTRIPRDTRIPRDdsatPLTRIPRDTRIPRDdsatNL VVVIgndVVI ≈

Solve for VREAD

Solve for VTRIPRD

( )[ ]222 and where,

)0(10

TRIPRDREADTRIPREAD VVZVVZ

ZTRIPRDREADRF FVVZPP

σσσηηη +=−=

−=>−≡=

0.08 0.1 0.12 0.14 0.16 0.18 0.2 0.22 0.240

100

200

300

400

500

600

700

800

900

VREAD

[V]

# of

occ

uran

ce

Monte−CarloGaussian Model

Distribution of VREAD

V R = ‘ 0 ’

VDD

V L = ‘ 1 ’

BL

NL NR

PL PR

BR

WL

AXRAXL

Subthreshold leakage ( I sub)Gate leakage ( I gd )

Junction leakage ( I jn )

Write Failure (WF)

0 10 20 30 40 50 600

50

100

150

200

250

300

350

400

450

Twrite [ps]

# of

occ

uran

ce

Monte−CarloGaussian ModelNon−Central F Model

The write failures mostly comes from the long tail of the distribution. Non−Central F distribution matches the tail of the distribution better.

( )WLWRITEWF TTPP >=

dsAXLLoutdsPLLin

TRIPWRWR

TRIPWRWR

V

V LLoutLLin

LLL

WRITE

IIIIVVif

VVifVIVI

dVVCT

TRIP

DD

L ofout current , L intocurrent

)( ;

)(;)()(

)(

)()(

)()(

≈=≈=

⎪⎪⎩

⎪⎪⎨

⎧

≥∞

<−= ∫

)(1)()( WLWRTt

WRWRWRWF TFtdtfPWLWR

−== ∫∞

=

V R = ‘ 0 ’

VDD

V L = ‘ 1 ’

BL

NL NR

PL PR

BR

WL

AXRAXL

Subthreshold leakage ( I sub)Gate leakage ( I gd )

Junction leakage ( I jn )

TWL

Hold Failure (HF)

0 0.05 0.1 0.15 0.20.3

0.4

0.5

0.6

0.7

0.8

0.9

|ΔVt| [V]

VD

DM

in a

t HO

LD (

VH

OLD

[V])

MODELMEDICI

Variation in δVtNR

, δVt

NL, δVt

PL, δVt

PR

Variation in δVtNL

, δVtPL

Variation in δVtNR

, δVtPR

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

100

200

300

400

500

600

700

800

900

1000

VDDHmin

[V]

# of

occ

uran

ce


Variation of and Distribution of VDDHmin

( )HOLDDDHminHF VVPP >=

( ) ( )NRPRDDHminTRIPNLPLDDHminL VtVtVVVtVtVV δδδδ ,,,, = Solve for VDDHmin

)(1)()( HOLDVDDHminV

DDHminDDHminVDDHminHF VFvdvfPHOLD

−== ∫∞

VHOLD

Access Time Failure (AF)

−0.1 −0.05 0 0.05 0.135

40

45

50

55

60

65

70

75

|ΔVt| [V]

TA

CC

ES

S [p

s]

MODELMEDICI

ΔVt applied to AXR

only (δVt

AXR)

ΔVt applied to NR

only (δVt

NR)

20 30 40 50 60 70 800

100

200

300

400

500

600

700

800

900

1000

Taccess [ps]

# of

occ

uran

ce


Variation and distribution of TACCESS with variation in δVt

)( MAXACCESSAF TTPP >=

Distribution of TACCESS

∑=

−Δ

=−Δ

=

NiisubAXLdsatAXR

MINB

BLBRBRBL

MINBLBRACCESS II

CICIC

CCT

,..,1)(

)(1)()( MAXTACCESSTt

ACCESSACCESSTACCESSAF TFtdtfPMAXACCESS

−== ∫∞

=

TMAX

Basic Modeling Approach• Estimation of the mean and the variance of a function several independent

normal random variable

– Expand ‘f’ in Taylor series with respect to Vt1,…,Vt6 around their mean and consider up to 2nd order terms.

– Estimate the mean and the variance as:

• Estimation of the probability distribution function of Y – Assume a normal pdf with the estimated mean and variance.

• The δVt of each transistors are assumed to be independent normal random variables.

( )

( )∑

∑

=

=

∂∂=

∂∂+=

6,...,1

22

6,...,1

22261

)(

21),...,()(

iVtitiACCESS

iVtitiVtVtACCESS

VfTVariance

VffTMean

σ

σηη

( )LWVt 1ασ ∂

....))(/(),...,,(),...,,(

6,..,1621

621

+−∂∂+=

=

∑=

Vtii

titiVtVtVt

tttACCESS

VVffVVVfT

ηηηη

Estimation of Overall SRAM Cell Failure Probability (PF)

[ ] [ ]FFFFF HWRAPFailPP +++==

AF

WF

RF

HF

U

PASS Fail

PF1-PF

Estimation of Memory Failure Probability (PMEM)

• PCOL: Probability that any of the cells in a column fails• PMEM: Probability that more than NRC (# of redundant

columns) fail

( )∑+=

−−⎟⎟⎠

⎞⎜⎜⎝

⎛=−−=

COL

RC

COL

N

Ni

iNCOL

iCOL

COLMEM

NFCOL PP

iN

PPP1

1 and )1(1

PF

PCOL

PMEM Redundant Columns

Estimation of Yield

( )⎟⎟

⎠

⎞

⎜⎜

⎝

⎛−= ∑ INTER

INTERINTERINTERINTERMEM NVtWLPYield ,,1

NINTER : the total number of inter-die Monte-Carlo simulations (i.e. total number of chips)

Select inter-die parameters(L, W, Vt)

Find PF

Find PCOL

Find PMEM

Find Yield

NINTER times

Snapshot: Transistor Sizing and Yield

Proposed failure analysis assists SRAM designers to achieve maximum memory yield

WN

WA27%

76%

87%

65%

0%

20%

40%

60%

80%

100%

0.4 0.6 0.8 1

WA / WN

Yiel

d

Parametric Failures in SRAM Cell

σVt ≈ 30mv, using BPTM 45nm technology

Yield ≈ 33%

0

50

100

150

200

250

300

350

0 52 105

157

210

262

315

367

419

472

524

577

629

682

734

786

839

890

944

996

1049

Chi

p Cou

ntChi

p Cou

nt Fault statistics

Number of faulty cells (Number of faulty cells (NNFaultyFaulty--CellsCells))

BL BR

WL

NR

PR

NL

PLAXRAXL

High-Vt Low-Vt

• Transistor mismatch => Parametric failures– Read failure– Write failure – Access failure– Hold failure

Parametric Failures => Yield degradation

SRAM Failure Mechanisms and Logic Fault Models

• Deceptive read destructive faults are overlooked in conventional test sequences

• Hold failures not detectable in conventional test sequences

Random Dopant Fluctuations; W, L, Tox Variations

Instability in SRAM cells

Vt mismatch in Sense-amplifiers

Delay variations in address decoders

Hold Failure

Access Failure

Sense-amp Functional Failure

Write Failure

Flipping Read

Failure

Low Supply Data Retention Fault

Data Retention Fault

Random Read Fault

Incorrect Read Fault

Transition Fault Read

Destructive Fault

Deceptive Read

Destructive Fault

Logic Fault Models

Physical Failure Mechanisms

Circuit Level Deviations

Process Variations

Efficient Testing of SRAM*C

ontr

ibut

ions

Con

trib

utio

nsC

halle

nges

Cha

lleng

es

Fault Coverage ofFault Coverage ofExisting Test SequencesExisting Test Sequences

Test Time ofTest Time ofExisting Test SequencesExisting Test Sequences

Optimized Test SequencesOptimized Test Sequencesto improve fault coverageto improve fault coverage

Novel DFT CircuitNovel DFT Circuitto reduce test timeto reduce test time

Failures due to Process VariationsFailures due to Process Variations

** IEEE VLSI Test SymposiumIEEE VLSI Test Symposium, May 2005, May 2005

Test: Optimized March Test Sequence

+ Good fault coverage- Test time increases

( W0) ( R0 W1) ( R1 W0) ( R0 W1) (HOLD)

( R1 R1 W0) (HOLD) ( R0 R0)

⇑ ⇑ ⇓

⇓

1. Optimized March C-

+ Reduce the test time + Cover all the fault models induced by process variations- Not able to detect Transition Coupling Fault and Address Decoder

Fault

2. March Q ( W 0) (H O LD ) ( R 0 W 0 W 1 R 1)(H O LD ) ( R 1 W 1 W 0 R 0) ( R 0)⇓ ⇑

⇑ ⇓

March test sequence comparisonLogic Fault Models Conventional Test Sequences Proposed Sequences

March C- March B March SR Opt. March C- March Q

Address Decoder Fault + + - + -

Data Retention Fault - - + + +

Low Supply Data Retention Fault - - - + +

Stuck-at Fault + + + + +

Transition Fault + + + + +

Random Read Fault +- +- +- +- +-

Read Destructive Fault + + + + +

Deceptive Read Destructive Fault - - + + +

Incorrect Read Fault + + + + +

State Coupling Fault + - + + +

Disturb Coupling Fault + - + + +

Incorrect Read Coupling Fault + - + + +

Read Destructive Coupling Fault + - + + +

Transition Coupling Fault + +- + + +-

Test Time 10N 17N 14N 12N 10N

0

100m

200m

300m

400m

500m

600m

700m

800m

900m

WL

VR

VL

SE

BL

BR

Read Access Timing

Double Sensing: A Novel DFT Circuit to Reduce Test Time

• In order to reduce test time, double sensing is used to detect Deceptive Read Destructive Fault in one cycle

• The content of the cell flips after the WL is activated.

• No time left to detect the flipping since the WL is disabled soon after that.

• The WL can be extended so as for the flipping to show itself on bitlines

• Another parallel sense-amp is fired at the time the WL is dis-activated.

0

100m

200m

300m

400m

500m

600m

700m

800m

900m

WL

VR

VL

BR

SE1

SE2

BL

WL extended

The flipping

is sensed .

• However, the WL can NOT be extended too long for test since in normal mode the memory should still operates at the designed frequency.

0

100m

200m

300m

400m

500m

600m

700m

800m

900m

WL

VR

VL

BR

SE1

SE2

BL

Already discharged

bitline is pulled up

• Disturbance is generated during the test to accelerate the development of the bitline differential voltage.

• Then the required WL extension time is minimized.

Parametric Failures and Yield

Std. dev. of intraStd. dev. of intra--die variationdie variation

Failu

re P

roba

bilit

yFa

ilure

Pro

babi

lity

Access Failure

Hold Failure

Write FailureRead

Failure

PF

PCOL

PMEM Redundant Columns

Intra-die Variation => Cell, Column and Memory failure

( - )MEMP f Inter die Variation=

Inter-die Variation

SRAM die with a global shift in process parameter

# - #

Non Faulty DieYieldTotal Die

=

What is the effect of Inter-die variation on Parametric Failures?

Inter-die Variation and Cell Failure

• Inter-die shift in process parameter amplifies the failure due to intra-die variations.

Read (RF)

Hold (HF)

Write (WF)

Access (AF)RF/HF high AF/WF high

LVT HVTNom. Vt

Inter-die Variation and Memory Failure

Reg. AHigh RF/HF

Reg. BLow Failures

Reg. CHigh AF/WF

Col. Fail. Prob.

Cell. Fail. Prob.

Mem. Fail. Prob.

• Memory failure probabilities are high at high when inter-die (global) shift in process is high.

LVT HVTNom. Vt

How can we improve yield considering both inter-die and

intra-die variations?

Adaptive Repairing of SRAM Array

• Reduce the dominant failures at different inter-die corners to increase width of low failure region.

Region A Region B Region C

PMEM ≈1 PMEM ≈1

PMEM ≈0

Mem. Fail. Probability

Region C HVT Corner

Access & Write failures dominate

Reduce AF & WF

Region A LVT Corner

Read & Hold failures dominate

Reduce RF & HF

LVT HVTNom. Vt

How can we reduce the dominant failures at different

inter-die corners?

Body Bias and Parametric Failures

Write (WF)

Access (AF)

Hold (HF)

Read (HF)

Overall Cell Fail. Prob

• Proper body bias can reduce parametric failures–Forward bias reduces Access & Write failures –Reverse bias reduces Read & Hold failures

BLBL BRBR

WLWL

VDDVDD

GNDGNDVBBVBB

Adaptive Repair using Body Bias

• Reduce the dominant failures at different inter-die corners to increase width of low failure region.

Region A Region B Region C

PMEM ≈0

Mem. Fail. Probability

Region C HVT Corner

Access & Write failures dominate

Apply FBB

Region A LVT Corner

Read & Hold failures dominate

Apply RBB

LVT HVTNom. Vt

RBB FBBZBB

Self-Repair Technique in SRAM

Enhanced YieldEnhanced Yield

PostPost--Silicon Adaptive Repair Silicon Adaptive Repair

PrePre--Silicon Design of Circuit and ArchitectureSilicon Design of Circuit and Architecture

SRAM ArraySRAM Array

Separation of interSeparation of inter--die process corners die process corners –– Vt BinningVt Binning

Adaptive Repair using Body Bias Adaptive Repair using Body Bias

Self-Repairing SRAM

How we can identify the inter-die Vtcorner under a large random intra-

die variation ?

• Monitor circuit parameters e.g. delay and leakage–Effect of inter-die variation can be masked by that of intra-die variation

• Adding a large number of random variables reduces the effect of intra-die variation

1

2 2 2

1 1

1 & =>

n

ii

n nY X

Y Xi X Y Xi Xi i Y X

Y X

N NN

σ σμ μ μ σ σ σ

μ μ

=

= =

=

=> = = = = =

∑

∑ ∑

Vt Binning by Leakage Monitoring

ΔVt Inter = 100mV

ΔVt Inter = 0mV

ΔVt Inter = -100mV

ΔVt Inter = -100mV

ΔVt Inter = 0mV

ΔVt Inter = -100mV

Nom. Vt Low VtHigh Vt

BLBL BRBR

WLWL

VDDVDD

GNDGND

BypassSwitch

On-chipLeakageMonitor

CalibrateSignal

VDD

SRAMArray

Comparator

Bodybias Body-Bias

Generator

VoutV

REF1VREF2

Self-Repair using Leakage Monitoring

SRAM ARRAY

VOUT

VREF1

VREF2

HVT Nom. Vt

Nom

. Vt

LVT

LVT

Current Monitor Circuit

• On-chip monitoring of leakage of entire array

• Body-bias is generated based on leakage monitor output

• Leakage monitored is bypassed in normal operating mode

265KB SRAM with No Body-Bias

265KB Self-RepairingSRAM

Yield Enhancement using Self-Repair

• Self-Repairing SRAM using body-bias can significantly improve design yield.

Self-Repairing SRAM: Die Photo

64KBHigh-VtSRAM

64KBLow-VtSRAMSelf-Repair Circuit

36

Schematic and measurement of self-repair mechanisms

FBB signal

Body voltage (Forward Bias)

Body voltage (Zero bias

Measured waveform of body voltage for Forward bias with On-chip FBB generator

Measured waveform of body voltage for Reverse bias with On-chip RBB generator

Body voltage (Zero bias Body voltage

(Reverse Bias)

RBB signal

Bodybias generation logic (MUX switches are designed with level converter for negative bias, not shown here)

Schematic of the Self-repairing SRAM

37

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.60

10

20

30

40

50

60

70

80

90

Sensor Output [V]

# o

f o

ccu

ran

ce

Proposed design in HVT

Conv. design with diode load in HVT

Conv. design with diode load in LVT

Proposeddesign in LVT

0 10 20 30 40 50 60 700.6

0.8

1

1.2

1.4

1.6

Leakage Current [uA] −−−−>

Sen

sor

Ou

tpu

t [V

] −−

−>

Different symbols representdata from different chips

Current Sensor Circuit Measured current sensor output attached to the 64KB LVT array

sensor output with temperature and Vt change (simulation 0.13μm)

Effect of intra-die variation on sensor output (simulation 0.13μm)

0 10 20 30 40 50 60 70 80

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

Temperature [C]S

enso

r O

utp

ut

[V]

Proposed design with LVT

Conv. design with diode load in LVT

Proposed design with HVT

Conv. design with diode load in HVT

Design with diode load => Lower change in Vout with process change

Proposed design => Higher changre in Vout with process change

Design and Measurement of Current Sensor

Failure Probability in SRAM Memory Cells

10-4

10-3

10-2

10-1

1.0

10 20 30 40 50 60 70

Prob

abili

ty o

f Fai

lure

PAFPRFPWFPFault

σVth( mV)

MONTE CARLO simulationusing BPTM 45nm tech.

Intrinsic Fluctuation of Vth due to random dopant effectPFault = PAF U PRF U PWFIn 45nm technology σVth≈ 30mV → PFault > 1.0x10-3

Large number of faulty cells in nano-scale SRAM under process variation

Fault Statistics in 32K Cache

0

50

100

150

200

250

300

350

0 26 52 79 105

131

157

184

210

236

262

288

315

341

367

393

419

446

472

498

524

Chi

p C

ount

(Nch

ip)

Fault statistics

NFaulty-Cells


NFaulty-Cells = PFault X NCells (total number of cells in a cache)

Conv. Yield≈ 33.4%

Conventional 32K cache results in only 33.4% yieldNeed a process/fault-tolerant mechanisms to improve

the yield in memory

MONTE CARLO simulation of 1000 chips

Basic Cache Architecture

Index = Row Address + Column Address Multiple cache blocks are stored in a single row

Minimize delay, area, routing complexityColumn MUX selects one block

Basic Cache Architecture

# of Block in a Row 1 Block 2 Blocks 4 Blocks

Decoder Delay 0.086ns 0.085ns 0.084

Wordline Delay 0.069ns 0.075ns 0.128ns

Bitline to Q Delay 0.452ns 0.355ns 0.313ns

Total Delay 0.608ns 0.515ns 0.525ns

Energy 0.166nJ 0.181nJ 0.195nJ

For 32K cache best # of cache blocks/row = 2 We choose 4 blocks in a row for our design

Results in higher yield – 16.25% increase2% cache access penalty7% energy overhead

BPTM 45nm technology, 32KByte direct mapped cache

Fault-Tolerant Cache Architecture

BIST detects the faulty blocksConfig Storage stores the fault information

Idea is to resize the cache to avoid faulty blocks during regular operation

Faulty

Resizing the Cache

Force the column MUX to select a non-faulty blockin the same row if the accessed block is faulty

Controller alters thecolumn address

Config Storage is accessedin parallel with cache

Feeds the fault informationto controller

“00”

Handle large number of faults without significantly reducing the cache size

Mapping Issue

More than one INDEXare mapped to same

block

Include column address bits into

TAG bits

Resizing is transparent to processor → same memory address

Config Storage

One bit fault information per cache blockBits are determined by BIST at the time of testingAccessed using row address part of INDEXProvides the fault information of all the blocks in a cache row to controller

Cache32KByte

ConfigStorage

1Kbit

Blocks per row 4 x

Block Size 32Byte 4bit

4 bit fault information about 4 blocks stored in a single cache row

Accessed in parallel with cache

ControllerColumn address selection based on fault location

Accessed Column Address00 01 10 11

Forced Column Address↓ ↓ ↓ ↓

None 0000 00 01 10 113rd Block 0010 00 01 00 112nd &3rd Block 0110 00 00 11 111st, 2nd & 3rdBlock 1110 11 11 11 11All four Blocks 1111 NA NA NA NA

Faulty Blocks in Accessed Row

Fault Information by Config Storage

Based on 4 bits read from Config Storage controller alters the column address







One block in a row is faulty

Selects the first available non-faulty block e.g 3rd block → 1st block







Two blocks in a row is faulty

Selects two non-faulty blocks respectivelye.g 2nd block → 1st block

3rd block → 4th block







Three blocks in a row is faultyAll the blocks are mapped to non-faulty block, e.g 4th block

One non-faulty block in each row, this architecture can correct any number of faults

Energy, Performance, and Area Overhead of Config Storage and Controller

BPTM 45nm technology, 32KByte Cache, 1Kbit Config Storage

Energy and Performance

32KBCache

Config Storage &Controller

Delay (ns) 0.45 0.22

Area overhead NA 0.5%

Energy overhead NA 1.8%

Controller changes the column address before data reaches at column MUXDoes not affect the cache access timeNegligible energy and area overhead (excluding BIST)

Results: Pop (ECC, Redundancy and Proposed Scheme)

Pop: Probability that a chip with NFaulty-cells can bemade operational

Faults are randomly distributed across chipYield is defined as:

Each scheme add some extra storage spacePop includes the probability of having faults in these blocksTo consider area, yield is redefined as:

)(*)(1__

_

CellFaultychipCellFaultyN

opTot

chip NNNPN

YCellFaulty

∑=

schemetolerantfaultwithchip ____

schemeanywithoutchipchip

effchip A

AYY ___=

0.0

0.2

0.4

0.6

0.8

1.0

0 52 105 157 210 262 315 367 419 472 524

Pop

Redundency R = 32ECC 1bitProposed Architecture

NFaulty-Cells

105 Faulty cells

Prop. 65%

ECC6%

% of the chips with 105 faulty cells which can be saved by

Proposed scheme ~ 65% (high fault tolerant capability)ECC ~ 6% Redundancy ~ 0%

Results: Pop

Results: Pop

0.0

0.2

0.4

0.6

0.8

1.0

0 52 105 157 210 262 315 367 419 472 524

P op

r=0

r=1

r=2

r=3

r=4

Adding redundant rows (r) in config storage

NFaulty-Cells

Pop improves with redundant rows in config storage

r = 2 is optimum for 32K cache with 1Kbit configstorage

Results: Pop

0.0

0.2

0.4

0.6

0.8

1.0

0 52 105 157 210 262 315 367 419 472 524

P op

R=0 r=2R=8 r=2R=16 r=2

Proposed architecture

with redundancy

NFaulty-Cells

Adding redundant rows (R) in cache in proposed scheme improves the Pop further

(optimum is R =8 for 32K cache)

Effective Yield of 32K Cache

898989

77

87

0

20

40

60

80

100

0 1 2 3 4Redundent Rows in Config

Storage (r)

% Y

ield

Proposed Architecture

54

8989919389

77 77 77 77

4938

333333

0

20

40

60

80

100

0 8 16 24 32

Redundent Rows in Cache (R)%

Yie

ld

Proposed Architecture with r = 2ECCRedundency

ECC + Redundancy yield ~ 77%Proposed architecture + Redundancy yield ~ 93%

(with 2 blocks in a cache row yield ~ 80%)

Conv. Yield

Proposed Arch. Yield without

any Redundancy

Optimum r = 2

Fault Tolerant Capability

0

50

100

150

200

250

300

350

0 52 105 157 210 262 315 367 419 472 524

Chi

p C

ount

(Nch

ip) Fault statistics

Chips saved by the proposed + redundancy (R=8, r=2)

Chips saved by ECC + redundancy ( R=8)

NFaulty-Cells

More number of saved chipsas compare to ECC ECC fails to save

any chips

Proposed architecture can handle more number offaulty cells than ECC, as high as 419 faulty cellsSaves more number of chips than ECC for a given NFaulty-Cells

0

50

100

150

200

250

300

350

0 52 105

157

210

262

315

367

419

472

524

577

629

682

734

786

839

890

944

996

1049

Chi

p C

ount

(Nch

ip)

Fault statistics

NFaulty-Cells

Conv. Yield≈ 33.4%

Process Tolerance: Fault Statistics in 64K Cache


NFaulty-Cells = PFault X NCells (total number of cells in a cache)Conventional 64K cache results in only 33.4% yield

Need a process/fault-tolerant mechanisms to improve the yield in memory

Process-Tolerant Cache Architecture

BIST detects the faulty blocksConfig Storage stores the fault information

Resize the cache to avoid faulty blocks during regular operation

Fault Tolerant Capability

0

50

100

150

200

250

300

350

0 105 210 315 419 524 629 734 839 944 1049

Chi

p C

ount

(Nch

ip)

Fault statistics

Chips saved by the proposed + redundancy (R=8, r=3)

Chips saved by ECC + redundancy ( R=16)

NFaulty-Cells

More number of saved chipsas compare to ECC

ECC fails to save any chips

Proposed architecture can handle more number of faultycells than ECC, as high as 890 faulty cells with marginal perf loss

CPU Performance Loss

0.0

0.5

1.0

1.5

2.0

2.5

0 105 210 315 419 524 629 734 839

% C

PU P

erfo

rman

ce L

oss For a 64K cache

averaged over SPEC 2000 benchmarks

NFaulty-Cells

Increase in miss rate due to downsizing of cache

Average CPU performance loss over all SPEC 2000benchmarks for a cache with 890 faulty cells is ~ 2%

Register File: Self-Calibration using Leakage Sensing

C. Kim, R. Krishnamurthy, & K. Roy

Process Compensating Dynamic Circuit Technology

clk

. . .RS0 RS7

D0 D7

RS1

D1

LBL0

LBL1

N0

Keeper upsizing degrades average performance

Conventional Static Keeper

Process Compensating Dynamic Circuit Technology

3-bit programmable keeper

clk

. . .RS0 RS7

D0 D7

RS1

D1

LBL0

LBL1

N0

b[2:0]

W 2W 4Ws s s

Opportunistic speedup via keeper downsizing

C. Kim et al. , VLSI Circuits Symp. ‘03

5X reduction in robustness failing dies

0

50

100

150

200

250

0.7 0.8 0.9 1.0 1.1 1.2Normalized DC robustness

Num

ber o

f die

s

ConventionalThis work

Noise floor

saved dies

Robustness Squeeze

10% opportunistic speedup

0

50

100

150

200

250

300

0.8 0.9 1.0 1.1 1.2Normalized delay

Num

ber o

f die

s

ConventionalThis work

PCD μ = 0.90Conv. μ = 1.00

μ : avg. delay

Delay Squeeze

Process detection

Self-Contained Process CompensationFab

Assembly

Wafer test

Burn inPackage testCustomer

Leakage measurement

On-die leakage sensor

Program PCD using fuses

On-Die Leakage Sensor For Measuring Process Variation

C. Kim et al. , VLSI Circuits Symp. ‘04

83μm

73μ

m

current reference

comparators

current m

irrors

VBIASgen.

NMOS device

test interface

High leakage sensing gain Compact analog design sharing bias generators

68

Leakage Current Sensing Circuits

Susceptible to P/N skew and supply fluctuationLarge area due to multiple analog bias circuitsLimited leakage sensing gain

+- d0

IREFd0VBIAS

+-

VSEN

VDD/2

VSEN

T. Kuroda et al., JSSC, Nov. 1996 M. Griffin et al., JSSC, Nov. 1998

69

Single Channel Leakage Sensing Circuit

Basic principle: Drain induced barrier loweringLow sensitivity to P/N skew and supply fluctuation

IREF

M1 (saturation)

M2 (subthreshold)

+-

VBIAS+- d0

VREF

VSEN

70

PV Insensitive Current Reference (IREF)

• Sub-1V process, voltage compensated MOS current generation concept

• Reference voltage, external resistor not required• Scalable, low cost, flexible solution

2/0.4

2/0.4

6/0.4

6/0.4

1/1.6

18/0.4

18/0.4

18/0.4

6/0.4 2/0.4

6/0.8 6/0.8 4/0.8 4/0.8

IREF

Vt generation circuit Subtraction circuit

S. Narendra et al., VLSI Circuits Symp. 2001

71

PV Insensitive Bias Voltage (VBIAS)

8/0.4 1/0.4

96/0.2

2/0.2IREF

VBIAS )(2

1

WWlog=

qkT

(W1=96μm, W2=2μm)

E. Vittoz et al., JSSC, June 1979

• PTAT containing no resistive dividers• Based on weak inversion MOS characteristics• Desired output voltage achieved via sizing

72

Comparator

• 2-stage differential amplifier• Already designed IREF is used for bias current

4/0.4 4/0.4

8/0.48/0.4 8/0.4

4/0.44/0.4

+- =

(+)(-)

output

IREF

Subtraction circuit

73

PV Sensitivity of Designed IREF, VBIAS

• IREF variation < 4%, VBIAS variation < 2%• Under realistic process skews, ±100mV supply

voltage fluctuations

1.2V, 90nm CMOS, 80˚C

VDD (V)

1.00 0.991.000.99 1.00 1.00

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.1 1.2 1.3

IREFVBIAS

1.011.000.99 1.000.99 1.01

0.0

0.2

0.4

0.6

0.8

1.0

1.2

fast typical slow

Process skew

IREFVBIAS

74

Proposed Leakage Current Sensing

IREF

M1 (saturation)

M2 (sub-threshold)

+-

VBIAS+- d0

VREF

VSEN

PMOS M1

0 0.2 0.4 0.6 0.8 1 1.2

VSEN (V)

I ds

fasttypicalslow

0 0.2 0.4 0.6 0.8 1 1.2

VSEN (V)

I ds

fasttypicalslow

NMOS M2

1.2V, 90nm CMOS, 80˚C

75

Superimposed I-V Curves

• 1.9-10.2X higher VSEN swing than prior-art• Process-voltage insensitive design

0 0.2 0.4 0.6 0.8 1 1.2

VSEN (V)

I ds

slowtypicalfast

1.2V, 90nm CMOS, 80˚C

∆VSEN=0.93V

76

6-Channel Leakage Sensor Test Chip

IREF+-

VBIAS

-+

WP 2WP 3WP 4WP 6WP 9WP

WN

VSEN

1

VREF

Bubblerejection

circuit

V1 V2 V3 V4 V5 V6

V1V2V3

V1V4V5

V2V4V6

OUT[2]

OUT[1]

OUT[0]

WN WN WN WN WN

VSEN

2

VSEN

3

VSEN

4

VSEN

5

VSEN

6

Incremental mirroring ratio for multi-bit resolution leakage sensingShared bias generators compact designProcess-voltage insensitive IREF, VBIAS gen.

-+ -+ -+ -+ -+

77

Multi-Bit Resolution Leakage Sensing

Leakage level determined by comparing VSEN1 through VSEN6 with VREF6-channel leakage sensor gives 7 level resolution

0

0.2

0.4

0.6

0.8

1

1.2

fast typical slow

Volta

ge (V

) VSEN6VSEN5VSEN4VSEN3VSEN2VSEN1VREF

Process skew

1.2V, 90nm CMOS, 80˚C

78

Example: Operation at Fast Process Corner

IREF+-

VBIAS


WN

VSEN

1

VREF

Bubblerejection

circuit

V1 V2 V3 V4 V5 V6

V1V2V3V1V4V5V2V4V6

OUT[2]

OUT[1]

OUT[0]

WN WN WN WN WN

VSEN

2

VSEN

3

VSEN

4

VSEN

5

VSEN

6

1 1 1 1 0 0

1 1 1 1 0 1

00.20.40.60.8

11.2

fast typical slow

Volta

ge (V

)

VREF

1

0

1

-+ -+ -+ -+ -+ -+

Fast corner: output code ‘101’

79

Example: Operation at Typical Process Corner

IREF+-

VBIAS


WN

VSEN

1

VREF

Bubblerejection

circuit

V1 V2 V3 V4 V5 V6

V1V2V3V1V4V5V2V4V6

OUT[2]

OUT[1]

OUT[0]

WN WN WN WN WN

VSEN

2

VSEN

3

VSEN

4

VSEN

5

VSEN

6

1 0 0 0 0 0

1 0 1 1 1 1

00.20.40.60.8

11.2

fast typical slow

Volta

ge (V

)

VREF

0

1

0

-+ -+ -+ -+ -+ -+

Typical corner: output code ‘010’

current reference

comparators

current m

irrors

VBIASgen.

NMOS devices

test interface

Technology 90nm dual Vt CMOSVDD 1.2V

Resolution 7 levelsPower consumption 0.66 mW @80Cº

Dimensions 83 X 73 μm2

On-Die Leakage Sensor Test Chip

Output codes from leakage sensor

001 010 011 100 101 110 111

Leakage Binning Results

ConclusionStatistical Failure Analysis Helps Enhance Yield

Post Silicon Tuning/Calibration is Becoming Promising for Si Nano systems

Built-In Leakage/Delay Sensors Provide Information on Intra-Die Process Variations

self-repairing and self-calibration: a design/test …vlsi/courses/ee695kr/f...self-repairing and...

Documents