fault tolerant systemseuler.ecs.umass.edu/ece655/pdf/part20-ch8-vlsi2.pdf · 2016-10-17 · memory...
TRANSCRIPT
Part.20.1 Copyright 2007 Koren & Krishna, Morgan-Kaufman
FAULT TOLERANT SYSTEMS
http://www.ecs.umass.edu/ece/koren/FaultTolerantSystems
Part 20 – VLSI 2
Chapter 8 – Defect Tolerance in VLSI Circuits
Opportunities for Yield Enhancement
♦ The yield of a chip can be enhanced through
• Architecture choice (redundancy – including spare components in the design)
» the chip can still be operational in the presence of some faults
• Decreasing the critical area, and consequently λ, at the design stage during
» compaction
» routing
» placement
» floorplanning
• Decreasing the defect density
♦ We concentrate on the first two options
Yield Enhancement through Redundancy
♦ In many ICs, identical blocks of circuits (also called modules) are replicated
• Example: memory chips
♦ If the whole chip is expected to be fault-free, the yield will be very low
♦ Adding spare modules increases the yield
• Example: large memory chips, where spare rows, spare columns, or both are added
♦ Adding spare modules also increases the chip area
♦ This results in fewer chips per wafer
♦ Even with a higher yield, we may end up with fewer operational chips per wafer
Effective Yield
♦ Area of chip increases as spares are added ⇒ fewer chips on wafer
♦ Yield by itself may not be the right measure for circuits with redundancy
♦ Effective Yield takes into account the increase in chip area - measures the real benefits of redundancy
♦ Number of spares is selected so that the effective yield is maximized
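A minimal sketch of this trade-off, assuming the usual definition of effective yield as the chip yield scaled by the ratio of the chip area without spares to the chip area with spares. The function name and all numbers are hypothetical and only illustrate the arithmetic.

```python
def effective_yield(chip_yield, area_without_spares, area_with_spares):
    """Yield discounted by the area growth caused by the spares (assumed definition)."""
    return chip_yield * area_without_spares / area_with_spares


# Hypothetical numbers: spares raise the yield from 0.40 to 0.70
# while enlarging the chip by 20%.
y_no_spares = effective_yield(0.40, 1.0, 1.0)
y_spares = effective_yield(0.70, 1.0, 1.2)
print(y_no_spares, y_spares)
```

With these made-up inputs the spares still pay off after the area penalty; a larger area increase or a smaller yield gain would reverse the comparison, which is why the number of spares has to be chosen by maximizing the effective yield rather than the raw yield.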
Effective Yield - Example
♦ Yield vs. Effective Yield for a circuit with N=10 modules and R spares
♦ Negative Binomial distribution used
♦ λ=0.1 ; α=1 ; R=0,1,2,...,10
♦ Optimal number of spares: R=2
♦ How were these yields calculated?
Basic Model of Redundancy - Replicated Circuits
♦ M modules needed for proper operation of chip
♦ R spare modules are added
♦ All N=M+R modules are identical
♦ At least M out of N modules must be fault-free
♦ Average number of faults per module is denoted by λ_m
♦ If Poisson distribution is used: λ_m = λ/N

$$Y_{chip} = \sum_{i=M}^{N} \binom{N}{i} \left(e^{-\lambda/N}\right)^{i} \left(1 - e^{-\lambda/N}\right)^{N-i}$$
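The M-out-of-N Poisson yield can be evaluated directly. The following is a minimal sketch, assuming λ is the average number of faults on the whole chip of N modules (so each module sees λ/N); the function name `yield_poisson` is hypothetical.

```python
from math import comb, exp


def yield_poisson(M, N, lam):
    """Probability that at least M of N identical modules are fault-free.

    lam is the average number of faults per chip; each module is
    fault-free with probability exp(-lam / N) under the Poisson model.
    """
    p = exp(-lam / N)
    return sum(comb(N, i) * p**i * (1 - p) ** (N - i) for i in range(M, N + 1))


# Hypothetical example: 8 modules required, 2 spares, one fault per chip on average.
print(yield_poisson(8, 10, 1.0))
# Same chip area with no tolerance for a faulty module.
print(yield_poisson(10, 10, 1.0))
```

When M = N the sum collapses to a single term, exp(-λ), which is the familiar Poisson yield of a chip without redundancy.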
Redundancy – Negative Binomial Model
♦ An equivalent expression when using the Poisson distribution:

$$Y_{chip} = \sum_{i=M}^{N} \sum_{k=0}^{N-i} (-1)^{k} \binom{N}{i} \binom{N-i}{k} e^{-(i+k)\lambda/N}$$

♦ Compounding this expression with the Gamma distribution as compounder -

$$Y_{chip} = \sum_{i=M}^{N} \sum_{k=0}^{N-i} (-1)^{k} \binom{N}{i} \binom{N-i}{k} \left(1 + \frac{(i+k)\lambda}{N\alpha}\right)^{-\alpha}$$

♦ Example: The yield of a chip with N=10 modules; λ selected so that the yield with no redundancy is 0.1
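The negative binomial expression is a double sum that is easy to evaluate numerically. The sketch below assumes λ is the average number of faults per chip and α is the clustering parameter; the function name `yield_negbin` and the sample parameters are hypothetical.

```python
from math import comb


def yield_negbin(M, N, lam, alpha):
    """M-out-of-N yield with Gamma-compounded (negative binomial) faults."""
    total = 0.0
    for i in range(M, N + 1):
        for k in range(N - i + 1):
            weight = (-1) ** k * comb(N, i) * comb(N - i, k)
            total += weight * (1 + (i + k) * lam / (N * alpha)) ** (-alpha)
    return total


# Hypothetical example: 8 of 10 modules required, strong clustering (alpha = 1).
print(yield_negbin(8, 10, 1.0, 1.0))
```

Two sanity checks follow from the formula itself: with M = N only the term i = N, k = 0 survives, giving (1 + λ/α)^(-α), and with M = 0 the sum equals 1.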
More Complex Designs
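♦ Two (or more) different types of modules
♦ Support circuitry with no redundancy
♦ Average number of faults per module: λ_{m_i}
♦ For support circuitry (ck – chip kill): λ_{ck}

$$Y_{chip} = \sum_{i_1=M_1}^{N_1} \sum_{i_2=M_2}^{N_2} \sum_{k_1=0}^{N_1-i_1} \sum_{k_2=0}^{N_2-i_2} (-1)^{k_1} (-1)^{k_2} \binom{N_1}{i_1} \binom{N_1-i_1}{k_1} \binom{N_2}{i_2} \binom{N_2-i_2}{k_2} \left(1 + \frac{(i_1+k_1)\lambda_{m_1} + (i_2+k_2)\lambda_{m_2} + \lambda_{ck}}{\alpha}\right)^{-\alpha}$$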
Memory Arrays with Redundancy
♦ Memory arrays – highly regular
♦ Simplifies incorporating redundancy into their design
♦ Defect-tolerance techniques successfully applied to memories since late 1970's
♦ Simplest technique - spare rows and columns (word lines and bit lines)
♦ Another technique – using error-correcting codes
♦ Yield increases 30-fold in early prototypes
♦ 1.5-to-3-fold increases in yield of mature processes
Defect Tolerant Memories
♦ One of the earliest designs: IBM’s 16K bit chip
♦ Six redundant bit lines, four redundant word lines
♦ Added area of 7%
♦ Word and bit line failures + individual cell failures
♦ Decoders are “programmed” using fusible links or laser
♦ A row containing one or more defective memory cells is disconnected by blowing a fusible link
♦ Disconnected row replaced by spare row with a programmable decoder (fusible links) - can replace any defective row
Which Rows/Columns to Replace?
♦ 1. Identify faulty cells through testing, e.g., Built-In Self Testing (BIST)
♦ 2. Determine which rows/columns to replace
• More complex – a single faulty cell can be repaired by replacing either its row or its column
♦ Example: 6×6 array, 2+2 spares:
♦ Use Row First assignment:
♦ Use all available rows first
♦ Replace rows R0 and R1
♦ 4 defective cells left
♦ Array cannot be repaired
♦ Can another algorithm do better?
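A minimal sketch of a Row First assignment. The original figure is not available, so the defect map below is hypothetical, chosen only to be consistent with the statements on the slides (6×6 array, 2 spare rows, 2 spare columns). The function name and the choice to take defective rows in index order are also assumptions.

```python
def row_first_repair(defects, spare_rows, spare_cols):
    """Spend spare rows on the lowest-numbered defective rows, then use spare columns."""
    defective_rows = sorted({r for r, _ in defects})
    used_rows = defective_rows[:spare_rows]
    after_rows = {(r, c) for r, c in defects if r not in used_rows}

    defective_cols = sorted({c for _, c in after_rows})
    used_cols = defective_cols[:spare_cols]
    unrepaired = {(r, c) for r, c in after_rows if c not in used_cols}
    return used_rows, after_rows, used_cols, unrepaired


# Hypothetical defect map as (row, column) pairs.
DEFECTS = {(0, 2), (0, 4), (1, 2), (4, 2), (3, 0), (5, 1), (5, 3)}

used_rows, after_rows, used_cols, unrepaired = row_first_repair(DEFECTS, 2, 2)
print("rows replaced:", used_rows)
print("defects left after rows:", len(after_rows))
print("repairable:", not unrepaired)
```

On this map the two spare rows go to R0 and R1, four defective cells remain in four different columns, and two spare columns cannot cover them.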
Bipartite Graph
♦ Two sets of vertices – corresponding to rows and columns
♦ An edge connects Ri to Cj if the cell at their intersection is defective
♦ Select the minimum number of vertices to cover all edges
• For each edge at least one incident vertex must be selected
♦ Example:
• Select C2 and R5
• Select one out of C0 and R3
• Select one out of C4 and R0
♦ Bipartite graph edge covering with a limited number of spare rows and spare columns is NP-complete
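For a small array the covering problem can be solved by exhaustive search. The sketch below tries every choice of at most `spare_rows` defective rows and checks whether the remaining defects fit into `spare_cols` columns. The defect map is hypothetical (the original figure is not available), and the function name is an assumption. The running time grows exponentially with the number of defective rows, so this is only an illustration, not a practical repair algorithm.

```python
from itertools import combinations


def exhaustive_repair(defects, spare_rows, spare_cols):
    """Return (rows, cols) covering all defects with the fewest lines, or None."""
    defective_rows = sorted({r for r, _ in defects})
    best = None
    for n_rows in range(min(spare_rows, len(defective_rows)) + 1):
        for chosen in combinations(defective_rows, n_rows):
            # Columns still needed once the chosen rows are replaced.
            cols = {c for r, c in defects if r not in chosen}
            if len(cols) > spare_cols:
                continue
            size = n_rows + len(cols)
            if best is None or size < len(best[0]) + len(best[1]):
                best = (set(chosen), cols)
    return best


# Hypothetical defect map as (row, column) pairs: 6x6 array, 2+2 spares.
DEFECTS = {(0, 2), (0, 4), (1, 2), (4, 2), (3, 0), (5, 1), (5, 3)}

solution = exhaustive_repair(DEFECTS, 2, 2)
print(solution)
```

On this map the search finds a cover that uses R5 and C2 plus one vertex for each of the two isolated edges, which mirrors the selection listed in the example.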
Heuristics
♦ Should we restrict repair to spare rows (columns) only?
• Two defects in same column (row)
• A complete column (row) can be defective
♦ Two-step algorithm:
• 1. Replace Must-Repair rows (and columns)
» Must-Repair row: # of defects > # available spare columns
» After this, other rows (columns) may become must-repair
• 2. Simple heuristic like Row-First to deal with remaining few defects
♦ Example:
• C2 must-repair column
• Then – R5 becomes must-repair row
• Finally, replace R0 and C0
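The two-step algorithm can be sketched as follows. The defect map is hypothetical (the original figure is not available) and was chosen so that the run follows the same sequence as the example: C2, then R5, then R0 and C0. The function name and the tie-breaking order are assumptions.

```python
from collections import Counter


def must_repair_then_row_first(defects, spare_rows, spare_cols):
    """Step 1: replace must-repair lines; step 2: Row-First on what remains."""
    remaining = set(defects)
    rows_used, cols_used = [], []

    # Step 1: a row is must-repair if it holds more defects than there are
    # spare columns left (and symmetrically for columns). Repeat until stable.
    while True:
        rows_left = spare_rows - len(rows_used)
        cols_left = spare_cols - len(cols_used)
        row_count = Counter(r for r, _ in remaining)
        col_count = Counter(c for _, c in remaining)
        must_rows = sorted(r for r, n in row_count.items() if n > cols_left)
        must_cols = sorted(c for c, n in col_count.items() if n > rows_left)
        if must_rows and rows_left > 0:
            rows_used.append(must_rows[0])
            remaining = {(r, c) for r, c in remaining if r != must_rows[0]}
        elif must_cols and cols_left > 0:
            cols_used.append(must_cols[0])
            remaining = {(r, c) for r, c in remaining if c != must_cols[0]}
        else:
            break

    # Step 2: Row-First on the few remaining defects.
    for row in sorted({r for r, _ in remaining}):
        if len(rows_used) == spare_rows:
            break
        rows_used.append(row)
        remaining = {(r, c) for r, c in remaining if r != row}
    for col in sorted({c for _, c in remaining}):
        if len(cols_used) == spare_cols:
            break
        cols_used.append(col)
        remaining = {(r, c) for r, c in remaining if c != col}
    return rows_used, cols_used, remaining


# Hypothetical defect map as (row, column) pairs: 6x6 array, 2+2 spares.
DEFECTS = {(0, 2), (0, 4), (1, 2), (4, 2), (3, 0), (5, 1), (5, 3)}

rows_used, cols_used, unrepaired = must_repair_then_row_first(DEFECTS, 2, 2)
print(rows_used, cols_used, unrepaired)
```

If step 1 runs out of spares while must-repair lines still exist, the function returns a non-empty `unrepaired` set, meaning the array cannot be repaired with the given spares.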
IBM's 16Mb DRAM with ECC
♦ Spare rows/columns and error-correcting code (ECC)
♦ 4 independent quadrants with 16 redundant bit lines and 24 redundant word lines per quadrant
♦ For every 137 bits, 9 check bits allow correction of any single bit error
♦ Every 8 adjacent bits assigned to 8 separate words – lower probability of 2 or more faults in same word
♦ Write includes: (1) Fetch, (2) Write back
♦ Benefit of combined strategy larger than sum of expected benefits:
• ECC effective against individual cell failures
• redundancy effective against failures in same row/column
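The check-bit count can be related to the Hamming bound for single-error correction, which needs r check bits with 2^r ≥ k + r + 1 for k data bits. The sketch below assumes that the 137-bit word consists of 128 data bits plus the 9 check bits; that split is an assumption, since the slide does not spell it out, and the function name is hypothetical.

```python
def min_sec_check_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2**r < data_bits + r + 1:
        r += 1
    return r


data_bits = 128  # assumed split: 137 = 128 data bits + 9 check bits
sec_bits = min_sec_check_bits(data_bits)
print("check bits needed for single-error correction:", sec_bits)
print("word length with one additional parity bit:", data_bits + sec_bits + 1)
```

Under that assumption the bound is met with 8 check bits, so a 9-bit check field leaves one bit beyond what single-error correction strictly requires, for example an overall parity bit.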
Reliability vs Yield
♦ ECC commonly used in memory to protect against transient faults during operation - increases reliability
♦ Reliability improvement due to ECC only slightly affected by its use to correct defective cells
♦ Still, redundant rows and columns most commonly used
♦ Incorporated also in large cache units
♦ Benefit of redundant rows/columns is especially significant in early stages of production when yield is low
• earlier introduction of new products into the market
New Defect-Tolerant Memories
♦ Memory ICs have become very large
♦ Conventional redundancy of rows and columns - not sufficient
♦ Partitioning into sub-arrays is a must
• Decrease the current
• Shorten bit and word lines to reduce access time
♦ Disadvantages of conventional techniques
• Inefficient use of redundant lines
• Unable to deal with chip-kill defects
♦ New defect tolerance techniques are necessary
Memory with Redundant Blocks
♦ 1 Gb DRAM constructed out of 4 256 Mb subarrays
• Each subarray can become a part of up to 4 different ICs
• 16 subarrays (marked) would ordinarily not be fabricated
• Resulting in a 2% increase in area – only column redundancy
• No improvement in yield if Poisson model followed, considerable improvement if negative binomial model used
Memory with Redundant Blocks
♦ 1 Gb DRAM is partitioned into 8 128 Mb mats
• 512 basic arrays of size 256 Kbit (32 x 16 matrix)
• 32 spare rows and 32 spare columns
• 4 spare rows are allocated to a 16 Mbit portion of the mat
• 8 spare columns are allocated to a 32 Mbit portion of the mat
♦ 8 redundant blocks of size 1 Mbit each
• 4 basic 256 Kbit arrays
• 8 spare rows + 4 spare columns
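The capacities listed above are mutually consistent, which a few lines of arithmetic confirm. This is only a bookkeeping sketch; the variable names are hypothetical, and the overhead figure counts only the capacity of the redundant blocks, not the area of spare rows, spare columns, or switching circuitry.

```python
KBIT, MBIT, GBIT = 2**10, 2**20, 2**30

array_bits = 256 * KBIT            # one basic array
arrays_per_mat = 32 * 16           # arrays arranged as a 32 x 16 matrix
mat_bits = arrays_per_mat * array_bits
chip_bits = 8 * mat_bits           # 8 mats per chip
block_bits = 4 * array_bits        # one redundant block = 4 basic arrays

spare_rows_per_mat = (mat_bits // (16 * MBIT)) * 4   # 4 spare rows per 16 Mbit
spare_cols_per_mat = (mat_bits // (32 * MBIT)) * 8   # 8 spare columns per 32 Mbit
block_overhead = 8 * block_bits / chip_bits          # 8 redundant blocks per chip

print(mat_bits // MBIT, "Mbit per mat")
print(chip_bits // GBIT, "Gbit per chip")
print(spare_rows_per_mat, "spare rows and", spare_cols_per_mat, "spare columns per mat")
print(f"redundant-block capacity overhead: {block_overhead:.2%}")
```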
128Mb mat (32 x 16 256Kbit arrays)
Block Diagram
8 mats (128 Mbit each) + 8 redundant blocks (1 Mbit each)
♦ A redundant block includes four 256 Kbit arrays, 8 redundant rows and 4 redundant columns
Yield Comparison (for half chip)
[Figure: yield plotted against λ (1/cm²)]
Defect-Tolerant Microprocessor
♦ ESPRIT Project
♦ 16-bit processor core
♦ Data Path - 1 spare bit-slice
♦ Control Path - PLAs with spare product terms
♦ Area overhead < 25%
Data path and control redundancy
♦ Effective Yield
The 3D Computer (Hughes Labs)
♦ A massively parallel array (SIMD) for image processing
♦ 32 x 32 array (5-wafer stack)
♦ 128 x 128 array (15-wafer stack) - 100 BOPS
♦ Redundancy is a must
The 3D Computer - Defect Tolerance
♦ "Conventional" Wafer Stack Wafer-scale Design
The 3D Computer - Interstitial Redundancy
♦ (2,4) redundancy
♦ Local redundancy uniformly distributed
♦ Simple switches and short interconnections
Effect of Floorplanning on Yield
♦ VLSI designers rarely consider yield issues when selecting a floorplan for a newly designed chip
♦ This is justified for chips which are small relative to defect clusters
♦ Floorplanning can affect yield under the following conditions:
• Area of chip is very large
• Defects are clustered
• Defect clusters are medium-sized compared to chip
• Chip has modules with different sensitivities to defects, or
• Chip has incorporated redundancy
Effect of Floorplanning - Simple Chip
♦ Example:
♦ A chip consists of nine equal-area functional units
♦ Fault densities
♦ Fault clusters are medium-size (2x2 or 2x3)
♦ Two floorplans:
Floorplanning For High Yield - Recommendations
♦ Floorplan (b) has a higher yield
♦ Module with highest fault density is placed in center
♦ Modules with lowest fault densities are placed in corners
♦ Distance from center - inversely related to sensitivity to defects
♦ Intuitive explanation - likelihood of one defect cluster killing two or more chips is reduced
♦ May have a negative impact on wiring length
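The placement rule can be sketched as a simple greedy assignment: sort the modules by fault density and fill grid cells in order of increasing distance from the chip center. The module names and fault densities below are hypothetical (the values from the original figure are not available), and the function ignores wiring length entirely, which the last recommendation warns about.

```python
def place_by_fault_density(densities, rows=3, cols=3):
    """Map module name -> (row, col), densest module nearest the center."""
    if len(densities) != rows * cols:
        raise ValueError("need exactly one module per grid cell")
    center_r, center_c = (rows - 1) / 2, (cols - 1) / 2
    cells = sorted(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: (rc[0] - center_r) ** 2 + (rc[1] - center_c) ** 2,
    )
    modules = sorted(densities, key=densities.get, reverse=True)
    return dict(zip(modules, cells))


# Hypothetical fault densities for nine equal-area functional units.
DENSITIES = {
    "A": 0.90, "B": 0.50, "C": 0.40, "D": 0.30, "E": 0.25,
    "F": 0.10, "G": 0.08, "H": 0.05, "I": 0.02,
}

placement = place_by_fault_density(DENSITIES)
for name, cell in placement.items():
    print(name, cell)
```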
Original and Alternate Floorplans of Matsushita's Adenart µprocessor
Yield and Wiring Cost of 4 Floorplans
Floorplanning - Redundancy
♦ Example: A chip with 4 modules M1, M2, S1, S2
♦ S1 - a spare for M1; S2 - a spare for M2
♦ Floorplan (c) has the highest yield
♦ Guarantees separation between primary modules and their spares for any size and shape of defect clusters
♦ It is less likely that the same cluster will hit both the module and its spare, thus killing the chip
Example - Floorplan of the 3-D Computer
♦ A (2,4) structure - every spare unit adjacent to the four primary units that it can replace
♦ Short interconnection links between a spare and a primary
♦ Advantage: Performance degradation upon a failure is minimal
♦ Disadvantage: Proximity of primary units and spare results in a low yield in the presence of clustered faults
[Figure: original floorplan]
An Alternate Floorplan
♦ Spare is placed farther apart from the primary units it can replace
♦ 128 x 128 array
♦ Medium-area Negative Binomial
♦ defect block size of two rows (α=2)