self-repairing and self-calibration: a design/test …vlsi/courses/ee695kr/f...self-repairing and...
TRANSCRIPT
Self-Repairing and Self-Calibration: A Design/Test Strategy for Nano-scale
CMOS
Kaushik RoyS. Mukhopadhyay, H. Mahmoodi, A. Raychowdhury,
Chris Kim, S. Ghosh, K. Kang
School of Electrical and Computer Engineering,Purdue University, West Lafayette, IN
Process Variation in Nano-scale Transistors
Inter and Intra-die Variations
Device parameters are no longer deterministic
130nm
30%
5X0.90.9
1.01.0
1.11.1
1.21.2
1.31.3
1.41.4
11 22 33 44 55Norm alized Leakage (N orm alized Leakage ( IsbIsb ))
Nor
mal
ized
Fre
quen
cyN
orm
aliz
ed F
requ
ency
Source: Intel
10
100
1000
10000
1000 500 250 130 65 32
Technology Node (nm)
Mea
n N
umbe
r of D
opan
t Ato
ms
Source: Intel
Random dopant fluctuation
Delay and Leakage Spread
Process Variations: Failures, Test, Self-Calibration, and Self Repair
• Memories (Caches and Register Files)– Failure analysis– Updated March Test for process induced failures– Process/Defect tolerant caches– Self-Repairing SRAM’s– Leakage and delay sensors for self-repair
• Logic– Failure analysis in pipelines and robust pipeline design– Delay sensor to measure critical path delays and integrated
test generation for robust segment delay coverage• Updated scan chain logic
– FLH and FLS (First level hold and scan flip-flops) for low-power (dynamic and leakage) and efficient delay testing
• Temperature control to prevent thermal runaway during burn-in
SRAM Memory Cell Failure
AF: Access time failureRF: Read failure WF: Write failureHF: Hold failure
RF: Read failure
VR
VL
Flipping
WF: Write failure
Nowriting
VR
VL
Process variation:Device miss-match → Cell failure
Primary source:Random dopant fluctuation
→ Vth miss-match
Results in three types of failures
TACCESS > TMAXAF: Access time failure
Mechanisms of Parametric Failures
WLVR
VL
Volta
ge
Time ->
WL
VR
VL
Volta
ge
Time -> Time ->
VR VL
Volta
ge VDDH
Read Failure
Write Failure Hold Failure
Time ->
ΔMIN
WL
BL
BR
Access Failure
Volta
ge
Read Failure (RF)
( )TRIPRDREADRF VVPP >=
dlinNRdsatAXR II =),,(),,( DDTRIPRDTRIPRDdsatPLTRIPRDTRIPRDdsatNL VVVIgndVVI ≈
Solve for VREAD
Solve for VTRIPRD
( )[ ]222 and where,
)0(10
TRIPRDREADTRIPREAD VVZVVZ
ZTRIPRDREADRF FVVZPP
σσσηηη +=−=
−=>−≡=
0.08 0.1 0.12 0.14 0.16 0.18 0.2 0.22 0.240
100
200
300
400
500
600
700
800
900
VREAD
[V]
# of
occ
uran
ce
Monte−CarloGaussian Model
Distribution of VREAD
V R = ‘ 0 ’
VDD
V L = ‘ 1 ’
BL
NL NR
PL PR
BR
WL
AXRAXL
Subthreshold leakage ( I sub)Gate leakage ( I gd )
Junction leakage ( I jn )
Write Failure (WF)
0 10 20 30 40 50 600
50
100
150
200
250
300
350
400
450
Twrite [ps]
# of
occ
uran
ce
Monte−CarloGaussian ModelNon−Central F Model
The write failures mostly comes from the long tail of the distribution. Non−Central F distribution matches the tail of the distribution better.
( )WLWRITEWF TTPP >=
dsAXLLoutdsPLLin
TRIPWRWR
TRIPWRWR
V
V LLoutLLin
LLL
WRITE
IIIIVVif
VVifVIVI
dVVCT
TRIP
DD
L ofout current , L intocurrent
)( ;
)(;)()(
)(
)()(
)()(
≈=≈=
⎪⎪⎩
⎪⎪⎨
⎧
≥∞
<−= ∫
)(1)()( WLWRTt
WRWRWRWF TFtdtfPWLWR
−== ∫∞
=
V R = ‘ 0 ’
VDD
V L = ‘ 1 ’
BL
NL NR
PL PR
BR
WL
AXRAXL
Subthreshold leakage ( I sub)Gate leakage ( I gd )
Junction leakage ( I jn )
TWL
Hold Failure (HF)
0 0.05 0.1 0.15 0.20.3
0.4
0.5
0.6
0.7
0.8
0.9
|ΔVt| [V]
VD
DM
in a
t HO
LD (
VH
OLD
[V])
MODELMEDICI
Variation in δVtNR
, δVt
NL, δVt
PL, δVt
PR
Variation in δVtNL
, δVtPL
Variation in δVtNR
, δVtPR
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90
100
200
300
400
500
600
700
800
900
1000
VDDHmin
[V]
# of
occ
uran
ce
Monte−CarloGaussian Model
Variation of and Distribution of VDDHmin
( )HOLDDDHminHF VVPP >=
( ) ( )NRPRDDHminTRIPNLPLDDHminL VtVtVVVtVtVV δδδδ ,,,, = Solve for VDDHmin
)(1)()( HOLDVDDHminV
DDHminDDHminVDDHminHF VFvdvfPHOLD
−== ∫∞
VHOLD
Access Time Failure (AF)
−0.1 −0.05 0 0.05 0.135
40
45
50
55
60
65
70
75
|ΔVt| [V]
TA
CC
ES
S [p
s]
MODELMEDICI
ΔVt applied to AXR
only (δVt
AXR)
ΔVt applied to NR
only (δVt
NR)
20 30 40 50 60 70 800
100
200
300
400
500
600
700
800
900
1000
Taccess [ps]
# of
occ
uran
ce
Monte−CarloGaussian Model
Variation and distribution of TACCESS with variation in δVt
)( MAXACCESSAF TTPP >=
Distribution of TACCESS
∑=
−Δ
=−Δ
=
NiisubAXLdsatAXR
MINB
BLBRBRBL
MINBLBRACCESS II
CICIC
CCT
,..,1)(
)(1)()( MAXTACCESSTt
ACCESSACCESSTACCESSAF TFtdtfPMAXACCESS
−== ∫∞
=
TMAX
Basic Modeling Approach• Estimation of the mean and the variance of a function several independent
normal random variable
– Expand ‘f’ in Taylor series with respect to Vt1,…,Vt6 around their mean and consider up to 2nd order terms.
– Estimate the mean and the variance as:
• Estimation of the probability distribution function of Y – Assume a normal pdf with the estimated mean and variance.
• The δVt of each transistors are assumed to be independent normal random variables.
( )
( )∑
∑
=
=
∂∂=
∂∂+=
6,...,1
22
6,...,1
22261
)(
21),...,()(
iVtitiACCESS
iVtitiVtVtACCESS
VfTVariance
VffTMean
σ
σηη
( )LWVt 1ασ ∂
....))(/(),...,,(),...,,(
6,..,1621
621
+−∂∂+=
=
∑=
Vtii
titiVtVtVt
tttACCESS
VVffVVVfT
ηηηη
Estimation of Overall SRAM Cell Failure Probability (PF)
[ ] [ ]FFFFF HWRAPFailPP +++==
AF
WF
RF
HF
U
PASS Fail
PF1-PF
Estimation of Memory Failure Probability (PMEM)
• PCOL: Probability that any of the cells in a column fails• PMEM: Probability that more than NRC (# of redundant
columns) fail
( )∑+=
−−⎟⎟⎠
⎞⎜⎜⎝
⎛=−−=
COL
RC
COL
N
Ni
iNCOL
iCOL
COLMEM
NFCOL PP
iN
PPP1
1 and )1(1
PF
PCOL
PMEM Redundant Columns
Estimation of Yield
( )⎟⎟
⎠
⎞
⎜⎜
⎝
⎛−= ∑ INTER
INTERINTERINTERINTERMEM NVtWLPYield ,,1
NINTER : the total number of inter-die Monte-Carlo simulations (i.e. total number of chips)
Select inter-die parameters(L, W, Vt)
Find PF
Find PCOL
Find PMEM
Find Yield
NINTER times
Snapshot: Transistor Sizing and Yield
Proposed failure analysis assists SRAM designers to achieve maximum memory yield
WN
WA27%
76%
87%
65%
0%
20%
40%
60%
80%
100%
0.4 0.6 0.8 1
WA / WN
Yiel
d
Parametric Failures in SRAM Cell
σVt ≈ 30mv, using BPTM 45nm technology
Yield ≈ 33%
0
50
100
150
200
250
300
350
0 52 105
157
210
262
315
367
419
472
524
577
629
682
734
786
839
890
944
996
1049
Chi
p Cou
ntChi
p Cou
nt Fault statistics
Number of faulty cells (Number of faulty cells (NNFaultyFaulty--CellsCells))
BL BR
WL
NR
PR
NL
PLAXRAXL
High-Vt Low-Vt
• Transistor mismatch => Parametric failures– Read failure– Write failure – Access failure– Hold failure
Parametric Failures => Yield degradation
SRAM Failure Mechanisms and Logic Fault Models
• Deceptive read destructive faults are overlooked in conventional test sequences
• Hold failures not detectable in conventional test sequences
Random Dopant Fluctuations; W, L, Tox Variations
Instability in SRAM cells
Vt mismatch in Sense-amplifiers
Delay variations in address decoders
Hold Failure
Access Failure
Sense-amp Functional Failure
Write Failure
Flipping Read
Failure
Low Supply Data Retention Fault
Data Retention Fault
Random Read Fault
Incorrect Read Fault
Transition Fault Read
Destructive Fault
Deceptive Read
Destructive Fault
Logic Fault Models
Physical Failure Mechanisms
Circuit Level Deviations
Process Variations
Efficient Testing of SRAM*C
ontr
ibut
ions
Con
trib
utio
nsC
halle
nges
Cha
lleng
es
Fault Coverage ofFault Coverage ofExisting Test SequencesExisting Test Sequences
Test Time ofTest Time ofExisting Test SequencesExisting Test Sequences
Optimized Test SequencesOptimized Test Sequencesto improve fault coverageto improve fault coverage
Novel DFT CircuitNovel DFT Circuitto reduce test timeto reduce test time
Failures due to Process VariationsFailures due to Process Variations
** IEEE VLSI Test SymposiumIEEE VLSI Test Symposium, May 2005, May 2005
Test: Optimized March Test Sequence
+ Good fault coverage- Test time increases
( W0) ( R0 W1) ( R1 W0) ( R0 W1) (HOLD)
( R1 R1 W0) (HOLD) ( R0 R0)
⇑ ⇑ ⇓
⇓
1. Optimized March C-
+ Reduce the test time + Cover all the fault models induced by process variations- Not able to detect Transition Coupling Fault and Address Decoder
Fault
2. March Q ( W 0) (H O LD ) ( R 0 W 0 W 1 R 1)(H O LD ) ( R 1 W 1 W 0 R 0) ( R 0)⇓ ⇑
⇑ ⇓
March test sequence comparisonLogic Fault Models Conventional Test Sequences Proposed Sequences
March C- March B March SR Opt. March C- March Q
Address Decoder Fault + + - + -
Data Retention Fault - - + + +
Low Supply Data Retention Fault - - - + +
Stuck-at Fault + + + + +
Transition Fault + + + + +
Random Read Fault +- +- +- +- +-
Read Destructive Fault + + + + +
Deceptive Read Destructive Fault - - + + +
Incorrect Read Fault + + + + +
State Coupling Fault + - + + +
Disturb Coupling Fault + - + + +
Incorrect Read Coupling Fault + - + + +
Read Destructive Coupling Fault + - + + +
Transition Coupling Fault + +- + + +-
Test Time 10N 17N 14N 12N 10N
0
100m
200m
300m
400m
500m
600m
700m
800m
900m
WL
VR
VL
SE
BL
BR
Read Access Timing
Double Sensing: A Novel DFT Circuit to Reduce Test Time
• In order to reduce test time, double sensing is used to detect Deceptive Read Destructive Fault in one cycle
• The content of the cell flips after the WL is activated.
• No time left to detect the flipping since the WL is disabled soon after that.
• The WL can be extended so as for the flipping to show itself on bitlines
• Another parallel sense-amp is fired at the time the WL is dis-activated.
0
100m
200m
300m
400m
500m
600m
700m
800m
900m
WL
VR
VL
BR
SE1
SE2
BL
WL extended
The flipping
is sensed .
• However, the WL can NOT be extended too long for test since in normal mode the memory should still operates at the designed frequency.
0
100m
200m
300m
400m
500m
600m
700m
800m
900m
WL
VR
VL
BR
SE1
SE2
BL
Already discharged
bitline is pulled up
• Disturbance is generated during the test to accelerate the development of the bitline differential voltage.
• Then the required WL extension time is minimized.
Parametric Failures and Yield
Std. dev. of intraStd. dev. of intra--die variationdie variation
Failu
re P
roba
bilit
yFa
ilure
Pro
babi
lity
Access Failure
Hold Failure
Write FailureRead
Failure
PF
PCOL
PMEM Redundant Columns
Intra-die Variation => Cell, Column and Memory failure
( - )MEMP f Inter die Variation=
Inter-die Variation
SRAM die with a global shift in process parameter
# - #
Non Faulty DieYieldTotal Die
=
What is the effect of Inter-die variation on Parametric Failures?
Inter-die Variation and Cell Failure
• Inter-die shift in process parameter amplifies the failure due to intra-die variations.
Read (RF)
Hold (HF)
Write (WF)
Access (AF)RF/HF high AF/WF high
LVT HVTNom. Vt
Inter-die Variation and Memory Failure
Reg. AHigh RF/HF
Reg. BLow Failures
Reg. CHigh AF/WF
Col. Fail. Prob.
Cell. Fail. Prob.
Mem. Fail. Prob.
• Memory failure probabilities are high at high when inter-die (global) shift in process is high.
LVT HVTNom. Vt
How can we improve yield considering both inter-die and
intra-die variations?
Adaptive Repairing of SRAM Array
• Reduce the dominant failures at different inter-die corners to increase width of low failure region.
Region A Region B Region C
PMEM ≈1 PMEM ≈1
PMEM ≈0
Mem. Fail. Probability
Region C HVT Corner
Access & Write failures dominate
Reduce AF & WF
Region A LVT Corner
Read & Hold failures dominate
Reduce RF & HF
LVT HVTNom. Vt
How can we reduce the dominant failures at different
inter-die corners?
Body Bias and Parametric Failures
Write (WF)
Access (AF)
Hold (HF)
Read (HF)
Overall Cell Fail. Prob
• Proper body bias can reduce parametric failures–Forward bias reduces Access & Write failures –Reverse bias reduces Read & Hold failures
BLBL BRBR
WLWL
VDDVDD
GNDGNDVBBVBB
Adaptive Repair using Body Bias
• Reduce the dominant failures at different inter-die corners to increase width of low failure region.
Region A Region B Region C
PMEM ≈0
Mem. Fail. Probability
Region C HVT Corner
Access & Write failures dominate
Apply FBB
Region A LVT Corner
Read & Hold failures dominate
Apply RBB
LVT HVTNom. Vt
RBB FBBZBB
Self-Repair Technique in SRAM
Enhanced YieldEnhanced Yield
PostPost--Silicon Adaptive Repair Silicon Adaptive Repair
PrePre--Silicon Design of Circuit and ArchitectureSilicon Design of Circuit and Architecture
SRAM ArraySRAM Array
Separation of interSeparation of inter--die process corners die process corners –– Vt BinningVt Binning
Adaptive Repair using Body Bias Adaptive Repair using Body Bias
Self-Repairing SRAM
How we can identify the inter-die Vtcorner under a large random intra-
die variation ?
• Monitor circuit parameters e.g. delay and leakage–Effect of inter-die variation can be masked by that of intra-die variation
• Adding a large number of random variables reduces the effect of intra-die variation
1
2 2 2
1 1
1 & =>
n
ii
n nY X
Y Xi X Y Xi Xi i Y X
Y X
N NN
σ σμ μ μ σ σ σ
μ μ
=
= =
=
=> = = = = =
∑
∑ ∑
Vt Binning by Leakage Monitoring
ΔVt Inter = 100mV
ΔVt Inter = 0mV
ΔVt Inter = -100mV
ΔVt Inter = -100mV
ΔVt Inter = 0mV
ΔVt Inter = -100mV
Nom. Vt Low VtHigh Vt
BLBL BRBR
WLWL
VDDVDD
GNDGND
BypassSwitch
On-chipLeakageMonitor
CalibrateSignal
VDD
SRAMArray
Comparator
Bodybias Body-Bias
Generator
VoutV
REF1VREF2
Self-Repair using Leakage Monitoring
SRAM ARRAY
VOUT
VREF1
VREF2
HVT Nom. Vt
Nom
. Vt
LVT
LVT
Current Monitor Circuit
• On-chip monitoring of leakage of entire array
• Body-bias is generated based on leakage monitor output
• Leakage monitored is bypassed in normal operating mode
265KB SRAM with No Body-Bias
265KB Self-RepairingSRAM
Yield Enhancement using Self-Repair
• Self-Repairing SRAM using body-bias can significantly improve design yield.
Self-Repairing SRAM: Die Photo
64KBHigh-VtSRAM
64KBLow-VtSRAMSelf-Repair Circuit
36
Schematic and measurement of self-repair mechanisms
FBB signal
Body voltage (Forward Bias)
Body voltage (Zero bias
Measured waveform of body voltage for Forward bias with On-chip FBB generator
Measured waveform of body voltage for Reverse bias with On-chip RBB generator
Body voltage (Zero bias Body voltage
(Reverse Bias)
RBB signal
Bodybias generation logic (MUX switches are designed with level converter for negative bias, not shown here)
Schematic of the Self-repairing SRAM
37
0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.60
10
20
30
40
50
60
70
80
90
Sensor Output [V]
# o
f o
ccu
ran
ce
Proposed design in HVT
Conv. design with diode load in HVT
Conv. design with diode load in LVT
Proposeddesign in LVT
0 10 20 30 40 50 60 700.6
0.8
1
1.2
1.4
1.6
Leakage Current [uA] −−−−>
Sen
sor
Ou
tpu
t [V
] −−
−>
Different symbols representdata from different chips
Current Sensor Circuit Measured current sensor output attached to the 64KB LVT array
sensor output with temperature and Vt change (simulation 0.13μm)
Effect of intra-die variation on sensor output (simulation 0.13μm)
0 10 20 30 40 50 60 70 80
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2
Temperature [C]S
enso
r O
utp
ut
[V]
Proposed design with LVT
Conv. design with diode load in LVT
Proposed design with HVT
Conv. design with diode load in HVT
Design with diode load => Lower change in Vout with process change
Proposed design => Higher changre in Vout with process change
Design and Measurement of Current Sensor
Failure Probability in SRAM Memory Cells
10-4
10-3
10-2
10-1
1.0
10 20 30 40 50 60 70
Prob
abili
ty o
f Fai
lure
PAFPRFPWFPFault
σVth( mV)
MONTE CARLO simulationusing BPTM 45nm tech.
Intrinsic Fluctuation of Vth due to random dopant effectPFault = PAF U PRF U PWFIn 45nm technology σVth≈ 30mV → PFault > 1.0x10-3
Large number of faulty cells in nano-scale SRAM under process variation
Fault Statistics in 32K Cache
0
50
100
150
200
250
300
350
0 26 52 79 105
131
157
184
210
236
262
288
315
341
367
393
419
446
472
498
524
Chi
p C
ount
(Nch
ip)
Fault statistics
NFaulty-Cells
σVt ≈ 30mv, using BPTM 45nm technology
NFaulty-Cells = PFault X NCells (total number of cells in a cache)
Conv. Yield≈ 33.4%
Conventional 32K cache results in only 33.4% yieldNeed a process/fault-tolerant mechanisms to improve
the yield in memory
MONTE CARLO simulation of 1000 chips
Basic Cache Architecture
Index = Row Address + Column Address Multiple cache blocks are stored in a single row
Minimize delay, area, routing complexityColumn MUX selects one block
Basic Cache Architecture
# of Block in a Row 1 Block 2 Blocks 4 Blocks
Decoder Delay 0.086ns 0.085ns 0.084
Wordline Delay 0.069ns 0.075ns 0.128ns
Bitline to Q Delay 0.452ns 0.355ns 0.313ns
Total Delay 0.608ns 0.515ns 0.525ns
Energy 0.166nJ 0.181nJ 0.195nJ
For 32K cache best # of cache blocks/row = 2 We choose 4 blocks in a row for our design
Results in higher yield – 16.25% increase2% cache access penalty7% energy overhead
BPTM 45nm technology, 32KByte direct mapped cache
Fault-Tolerant Cache Architecture
BIST detects the faulty blocksConfig Storage stores the fault information
Idea is to resize the cache to avoid faulty blocks during regular operation
Faulty
Resizing the Cache
Force the column MUX to select a non-faulty blockin the same row if the accessed block is faulty
Controller alters thecolumn address
Config Storage is accessedin parallel with cache
Feeds the fault informationto controller
“00”
Handle large number of faults without significantly reducing the cache size
Mapping Issue
More than one INDEXare mapped to same
block
Include column address bits into
TAG bits
Resizing is transparent to processor → same memory address
Config Storage
One bit fault information per cache blockBits are determined by BIST at the time of testingAccessed using row address part of INDEXProvides the fault information of all the blocks in a cache row to controller
Cache32KByte
ConfigStorage
1Kbit
Blocks per row 4 x
Block Size 32Byte 4bit
4 bit fault information about 4 blocks stored in a single cache row
Accessed in parallel with cache
ControllerColumn address selection based on fault location
Accessed Column Address00 01 10 11
Forced Column Address↓ ↓ ↓ ↓
None 0000 00 01 10 113rd Block 0010 00 01 00 112nd &3rd Block 0110 00 00 11 111st, 2nd & 3rdBlock 1110 11 11 11 11All four Blocks 1111 NA NA NA NA
Faulty Blocks in Accessed Row
Fault Information by Config Storage
Based on 4 bits read from Config Storage controller alters the column address
ControllerColumn address selection based on fault location
Accessed Column Address00 01 10 11
Forced Column Address↓ ↓ ↓ ↓
None 0000 00 01 10 113rd Block 0010 00 01 00 112nd &3rd Block 0110 00 00 11 111st, 2nd & 3rdBlock 1110 11 11 11 11All four Blocks 1111 NA NA NA NA
Faulty Blocks in Accessed Row
Fault Information by Config Storage
One block in a row is faulty
Selects the first available non-faulty block e.g 3rd block → 1st block
ControllerColumn address selection based on fault location
Accessed Column Address00 01 10 11
Forced Column Address↓ ↓ ↓ ↓
None 0000 00 01 10 113rd Block 0010 00 01 00 112nd &3rd Block 0110 00 00 11 111st, 2nd & 3rdBlock 1110 11 11 11 11All four Blocks 1111 NA NA NA NA
Faulty Blocks in Accessed Row
Fault Information by Config Storage
Two blocks in a row is faulty
Selects two non-faulty blocks respectivelye.g 2nd block → 1st block
3rd block → 4th block
ControllerColumn address selection based on fault location
Accessed Column Address00 01 10 11
Forced Column Address↓ ↓ ↓ ↓
None 0000 00 01 10 113rd Block 0010 00 01 00 112nd &3rd Block 0110 00 00 11 111st, 2nd & 3rdBlock 1110 11 11 11 11All four Blocks 1111 NA NA NA NA
Faulty Blocks in Accessed Row
Fault Information by Config Storage
Three blocks in a row is faultyAll the blocks are mapped to non-faulty block, e.g 4th block
One non-faulty block in each row, this architecture can correct any number of faults
Energy, Performance, and Area Overhead of Config Storage and Controller
BPTM 45nm technology, 32KByte Cache, 1Kbit Config Storage
Energy and Performance
32KBCache
Config Storage &Controller
Delay (ns) 0.45 0.22
Area overhead NA 0.5%
Energy overhead NA 1.8%
Controller changes the column address before data reaches at column MUXDoes not affect the cache access timeNegligible energy and area overhead (excluding BIST)
Results: Pop (ECC, Redundancy and Proposed Scheme)
Pop: Probability that a chip with NFaulty-cells can bemade operational
Faults are randomly distributed across chipYield is defined as:
Each scheme add some extra storage spacePop includes the probability of having faults in these blocksTo consider area, yield is redefined as:
)(*)(1__
_
CellFaultychipCellFaultyN
opTot
chip NNNPN
YCellFaulty
∑=
schemetolerantfaultwithchip ____
schemeanywithoutchipchip
effchip A
AYY ___=
0.0
0.2
0.4
0.6
0.8
1.0
0 52 105 157 210 262 315 367 419 472 524
Pop
Redundency R = 32ECC 1bitProposed Architecture
NFaulty-Cells
105 Faulty cells
Prop. 65%
ECC6%
% of the chips with 105 faulty cells which can be saved by
Proposed scheme ~ 65% (high fault tolerant capability)ECC ~ 6% Redundancy ~ 0%
Results: Pop
Results: Pop
0.0
0.2
0.4
0.6
0.8
1.0
0 52 105 157 210 262 315 367 419 472 524
P op
r=0
r=1
r=2
r=3
r=4
Adding redundant rows (r) in config storage
NFaulty-Cells
Pop improves with redundant rows in config storage
r = 2 is optimum for 32K cache with 1Kbit configstorage
Results: Pop
0.0
0.2
0.4
0.6
0.8
1.0
0 52 105 157 210 262 315 367 419 472 524
P op
R=0 r=2R=8 r=2R=16 r=2
Proposed architecture
with redundancy
NFaulty-Cells
Adding redundant rows (R) in cache in proposed scheme improves the Pop further
(optimum is R =8 for 32K cache)
Effective Yield of 32K Cache
898989
77
87
0
20
40
60
80
100
0 1 2 3 4Redundent Rows in Config
Storage (r)
% Y
ield
Proposed Architecture
54
8989919389
77 77 77 77
4938
333333
0
20
40
60
80
100
0 8 16 24 32
Redundent Rows in Cache (R)%
Yie
ld
Proposed Architecture with r = 2ECCRedundency
ECC + Redundancy yield ~ 77%Proposed architecture + Redundancy yield ~ 93%
(with 2 blocks in a cache row yield ~ 80%)
Conv. Yield
Proposed Arch. Yield without
any Redundancy
Optimum r = 2
Fault Tolerant Capability
0
50
100
150
200
250
300
350
0 52 105 157 210 262 315 367 419 472 524
Chi
p C
ount
(Nch
ip) Fault statistics
Chips saved by the proposed + redundancy (R=8, r=2)
Chips saved by ECC + redundancy ( R=8)
NFaulty-Cells
More number of saved chipsas compare to ECC ECC fails to save
any chips
Proposed architecture can handle more number offaulty cells than ECC, as high as 419 faulty cellsSaves more number of chips than ECC for a given NFaulty-Cells
0
50
100
150
200
250
300
350
0 52 105
157
210
262
315
367
419
472
524
577
629
682
734
786
839
890
944
996
1049
Chi
p C
ount
(Nch
ip)
Fault statistics
NFaulty-Cells
Conv. Yield≈ 33.4%
Process Tolerance: Fault Statistics in 64K Cache
σVt ≈ 30mv, using BPTM 45nm technology
NFaulty-Cells = PFault X NCells (total number of cells in a cache)Conventional 64K cache results in only 33.4% yield
Need a process/fault-tolerant mechanisms to improve the yield in memory
Process-Tolerant Cache Architecture
BIST detects the faulty blocksConfig Storage stores the fault information
Resize the cache to avoid faulty blocks during regular operation
Fault Tolerant Capability
0
50
100
150
200
250
300
350
0 105 210 315 419 524 629 734 839 944 1049
Chi
p C
ount
(Nch
ip)
Fault statistics
Chips saved by the proposed + redundancy (R=8, r=3)
Chips saved by ECC + redundancy ( R=16)
NFaulty-Cells
More number of saved chipsas compare to ECC
ECC fails to save any chips
Proposed architecture can handle more number of faultycells than ECC, as high as 890 faulty cells with marginal perf loss
CPU Performance Loss
0.0
0.5
1.0
1.5
2.0
2.5
0 105 210 315 419 524 629 734 839
% C
PU P
erfo
rman
ce L
oss For a 64K cache
averaged over SPEC 2000 benchmarks
NFaulty-Cells
Increase in miss rate due to downsizing of cache
Average CPU performance loss over all SPEC 2000benchmarks for a cache with 890 faulty cells is ~ 2%
Register File: Self-Calibration using Leakage Sensing
C. Kim, R. Krishnamurthy, & K. Roy
Process Compensating Dynamic Circuit Technology
clk
. . .RS0 RS7
D0 D7
RS1
D1
LBL0
LBL1
N0
Keeper upsizing degrades average performance
Conventional Static Keeper
Process Compensating Dynamic Circuit Technology
3-bit programmable keeper
clk
. . .RS0 RS7
D0 D7
RS1
D1
LBL0
LBL1
N0
b[2:0]
W 2W 4Ws s s
Opportunistic speedup via keeper downsizing
C. Kim et al. , VLSI Circuits Symp. ‘03
5X reduction in robustness failing dies
0
50
100
150
200
250
0.7 0.8 0.9 1.0 1.1 1.2Normalized DC robustness
Num
ber o
f die
s
ConventionalThis work
Noise floor
saved dies
Robustness Squeeze
10% opportunistic speedup
0
50
100
150
200
250
300
0.8 0.9 1.0 1.1 1.2Normalized delay
Num
ber o
f die
s
ConventionalThis work
PCD μ = 0.90Conv. μ = 1.00
μ : avg. delay
Delay Squeeze
Process detection
Self-Contained Process CompensationFab
Assembly
Wafer test
Burn inPackage testCustomer
Leakage measurement
On-die leakage sensor
Program PCD using fuses
On-Die Leakage Sensor For Measuring Process Variation
C. Kim et al. , VLSI Circuits Symp. ‘04
83μm
73μ
m
current reference
comparators
current m
irrors
VBIASgen.
NMOS device
test interface
High leakage sensing gain Compact analog design sharing bias generators
68
Leakage Current Sensing Circuits
Susceptible to P/N skew and supply fluctuationLarge area due to multiple analog bias circuitsLimited leakage sensing gain
+- d0
IREFd0VBIAS
+-
VSEN
VDD/2
VSEN
T. Kuroda et al., JSSC, Nov. 1996 M. Griffin et al., JSSC, Nov. 1998
69
Single Channel Leakage Sensing Circuit
Basic principle: Drain induced barrier loweringLow sensitivity to P/N skew and supply fluctuation
IREF
M1 (saturation)
M2 (subthreshold)
+-
VBIAS+- d0
VREF
VSEN
70
PV Insensitive Current Reference (IREF)
• Sub-1V process, voltage compensated MOS current generation concept
• Reference voltage, external resistor not required• Scalable, low cost, flexible solution
2/0.4
2/0.4
6/0.4
6/0.4
1/1.6
18/0.4
18/0.4
18/0.4
6/0.4 2/0.4
6/0.8 6/0.8 4/0.8 4/0.8
IREF
Vt generation circuit Subtraction circuit
S. Narendra et al., VLSI Circuits Symp. 2001
71
PV Insensitive Bias Voltage (VBIAS)
8/0.4 1/0.4
96/0.2
2/0.2IREF
VBIAS )(2
1
WWlog=
qkT
(W1=96μm, W2=2μm)
E. Vittoz et al., JSSC, June 1979
• PTAT containing no resistive dividers• Based on weak inversion MOS characteristics• Desired output voltage achieved via sizing
72
Comparator
• 2-stage differential amplifier• Already designed IREF is used for bias current
4/0.4 4/0.4
8/0.48/0.4 8/0.4
4/0.44/0.4
+- =
(+)(-)
output
IREF
Subtraction circuit
73
PV Sensitivity of Designed IREF, VBIAS
• IREF variation < 4%, VBIAS variation < 2%• Under realistic process skews, ±100mV supply
voltage fluctuations
1.2V, 90nm CMOS, 80˚C
VDD (V)
1.00 0.991.000.99 1.00 1.00
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.1 1.2 1.3
IREFVBIAS
1.011.000.99 1.000.99 1.01
0.0
0.2
0.4
0.6
0.8
1.0
1.2
fast typical slow
Process skew
IREFVBIAS
74
Proposed Leakage Current Sensing
IREF
M1 (saturation)
M2 (sub-threshold)
+-
VBIAS+- d0
VREF
VSEN
PMOS M1
0 0.2 0.4 0.6 0.8 1 1.2
VSEN (V)
I ds
fasttypicalslow
0 0.2 0.4 0.6 0.8 1 1.2
VSEN (V)
I ds
fasttypicalslow
NMOS M2
1.2V, 90nm CMOS, 80˚C
75
Superimposed I-V Curves
• 1.9-10.2X higher VSEN swing than prior-art• Process-voltage insensitive design
0 0.2 0.4 0.6 0.8 1 1.2
VSEN (V)
I ds
slowtypicalfast
1.2V, 90nm CMOS, 80˚C
∆VSEN=0.93V
76
6-Channel Leakage Sensor Test Chip
IREF+-
VBIAS
-+
WP 2WP 3WP 4WP 6WP 9WP
WN
VSEN
1
VREF
Bubblerejection
circuit
V1 V2 V3 V4 V5 V6
V1V2V3
V1V4V5
V2V4V6
OUT[2]
OUT[1]
OUT[0]
WN WN WN WN WN
VSEN
2
VSEN
3
VSEN
4
VSEN
5
VSEN
6
Incremental mirroring ratio for multi-bit resolution leakage sensingShared bias generators compact designProcess-voltage insensitive IREF, VBIAS gen.
-+ -+ -+ -+ -+
77
Multi-Bit Resolution Leakage Sensing
Leakage level determined by comparing VSEN1 through VSEN6 with VREF6-channel leakage sensor gives 7 level resolution
0
0.2
0.4
0.6
0.8
1
1.2
fast typical slow
Volta
ge (V
) VSEN6VSEN5VSEN4VSEN3VSEN2VSEN1VREF
Process skew
1.2V, 90nm CMOS, 80˚C
78
Example: Operation at Fast Process Corner
IREF+-
VBIAS
WP 2WP 3WP 4WP 6WP 9WP
WN
VSEN
1
VREF
Bubblerejection
circuit
V1 V2 V3 V4 V5 V6
V1V2V3V1V4V5V2V4V6
OUT[2]
OUT[1]
OUT[0]
WN WN WN WN WN
VSEN
2
VSEN
3
VSEN
4
VSEN
5
VSEN
6
1 1 1 1 0 0
1 1 1 1 0 1
00.20.40.60.8
11.2
fast typical slow
Volta
ge (V
)
VREF
1
0
1
-+ -+ -+ -+ -+ -+
Fast corner: output code ‘101’
79
Example: Operation at Typical Process Corner
IREF+-
VBIAS
WP 2WP 3WP 4WP 6WP 9WP
WN
VSEN
1
VREF
Bubblerejection
circuit
V1 V2 V3 V4 V5 V6
V1V2V3V1V4V5V2V4V6
OUT[2]
OUT[1]
OUT[0]
WN WN WN WN WN
VSEN
2
VSEN
3
VSEN
4
VSEN
5
VSEN
6
1 0 0 0 0 0
1 0 1 1 1 1
00.20.40.60.8
11.2
fast typical slow
Volta
ge (V
)
VREF
0
1
0
-+ -+ -+ -+ -+ -+
Typical corner: output code ‘010’
current reference
comparators
current m
irrors
VBIASgen.
NMOS devices
test interface
Technology 90nm dual Vt CMOSVDD 1.2V
Resolution 7 levelsPower consumption 0.66 mW @80Cº
Dimensions 83 X 73 μm2
On-Die Leakage Sensor Test Chip
Output codes from leakage sensor
001 010 011 100 101 110 111
Leakage Binning Results
ConclusionStatistical Failure Analysis Helps Enhance Yield
Post Silicon Tuning/Calibration is Becoming Promising for Si Nano systems
Built-In Leakage/Delay Sensors Provide Information on Intra-Die Process Variations