Lehrstuhl Technische Informatik - Computer Engineering
Brandenburgische Technische Universität Cottbus
Architectures and Diagnosis Methods for Self Repairing Logic
H. T. Vierhaus
BTU Cottbus
Computer Engineering
Outline
1. Parameters for Self Repair Functions
2. Self Repair Based on FPGAs
3. PLAs and CPLDs
4. Duplication and Switched Logic Blocks
5. Fault Diagnosis and Fault Administration
6. Test and Fault Diagnosis
7. Some Parameters in Comparison
8. Summary and Conclusions
Basic Parameters for BISR
Fault densities that can be managed
Overhead (chip area, time, dissipated power)
Types of faults that can / cannot be repaired
Compatibility with standard CMOS processes
Applicability to BISR in a production-test environment or in the field of application
Repair Granularity and Fault Density
[Figure: repair granularity in transistors (log scale from 10^0 to 10^6; transistor, gate, RT macro, cores, CPU, with the FPGA block marked) versus expected fault density (1 out of ...). Labelled regions: macro replacement (CPU etc.), block replacement (ALU etc.), and logic / gate-level replacement, which is marked as hardly explored for logic.]
Repair Overhead versus Element Loss
[Figure: repair procedure overhead and functioning elements lost, plotted over the size of replaced blocks (granularity, 1 to 10M). Labelled regions: prohibitive overhead, prohibitive fault density, and new methods and architectures.]
Block Structure of FPGAs
[Figure: array of CLBs with programmable interconnects, showing functionally used CLBs, a row with a faulty CLB, and a backup row.]
FPGA Experiences
FPGA repair schemes that discard a whole row or column of CLBs are simple to implement but inefficient, as they lose many functional CLBs for a single fault.
FPGA schemes that reserve single CLBs in the matrix for backup and repair by single-CLB replacement are much more difficult to implement because of the necessary irregular-wiring process.
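The efficiency argument can be made concrete with a small count. The following is only a sketch: the function names, the coordinate convention and the 4-column array are assumptions chosen for illustration, not part of any FPGA tool.

```python
def row_replacement_cost(faulty, cols):
    """Cost of discarding every row that contains at least one faulty CLB.

    faulty: iterable of (row, col) positions of faulty CLBs (hypothetical layout).
    cols:   number of CLBs per row.
    """
    faulty = set(faulty)
    bad_rows = {row for row, _ in faulty}
    discarded = len(bad_rows) * cols
    return {
        "spare_clbs_needed": discarded,
        "functional_clbs_lost": discarded - len(faulty),
    }


def single_replacement_cost(faulty):
    """Cost of replacing each faulty CLB individually by a reserved spare."""
    return {"spare_clbs_needed": len(set(faulty)), "functional_clbs_lost": 0}


# One faulty CLB in a row of four: the row scheme gives up three good CLBs.
row_cost = row_replacement_cost([(2, 1)], cols=4)
single_cost = single_replacement_cost([(2, 1)])
```

The count ignores the wiring effort, which is exactly where the single-CLB scheme pays for its better efficiency.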
FPGA with Irregular Repair Scheme
[Figure: array of CLBs with programmable interconnects, showing functionally used CLBs, a row with a faulty CLB, and a reserved backup block that is used for replacement.]
BISR by Standard FPGAs?
Configurable logic blocks (CLBs) are rather large (5000-10 000 transistors, estimated).
FPGAs are heterogeneous by nature:
- memory-like lookup tables
- logic elements (selectors, decoders, flip-flops, embedded arithmetic units)
- local and global programmable interconnects with additional elements for programmability
- embedded CPUs.
For fault densities of about 1 in 10 000 transistors or worse, repair must go into the CLBs or slices!
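A back-of-the-envelope calculation shows why whole-CLB replacement stops working at this density. The sketch assumes that transistor faults are independent and equally likely, which is a simplification introduced here; the function name is illustrative.

```python
def fault_free_probability(transistors: int, fault_density: float) -> float:
    """Probability that a block of `transistors` devices contains no fault,
    assuming independent faults with per-transistor probability `fault_density`."""
    return (1.0 - fault_density) ** transistors


# CLB sizes from the estimate above, fault density of 1 in 10 000 transistors.
p_small_clb = fault_free_probability(5_000, 1e-4)
p_large_clb = fault_free_probability(10_000, 1e-4)
```

Under these assumptions only about 61 % of the 5000-transistor CLBs and about 37 % of the 10 000-transistor CLBs would be fault-free, so a spare CLB is itself likely to be defective.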
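Structure of a CLB Slice
[Figure: CLB slice with a logic field (logic in, program in, logic out) extended by a redundant row, followed by multiplexers (MUX), flip-flops (FF, with FF in) and SRAM cells driving the outputs.]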
Look-up Table
[Figure: look-up table with input decoder, programming input, and output select; a faulty cell and its backup cell are marked.]
Self Repair within FPGA Basic Blocks
Heterogeneous repair strategies required (memory, logic)
Logic blocks may use methods known from memory BISR
Additional repair strategies are necessary for logic elements.
The basic overhead of FPGAs versus standard logic (a factor of about 10) increases further.
Repair strategies for logic may use some features already used in FPGAs (e.g. switched interconnects).
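Flip-Flop Backup Scheme
[Figure: look-up tables LUT 1 ... LUT n (logic in, storage in) feed, through a multiplexer, a selector block with flip-flops FF1 ... FFk and one spare flip-flop (FF backup); signals labelled ID and S and an output selector (OutSel) drive the output.]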
PLA-like Structures
[Figure: PLA layout with inputs A, B, C, an AND array and an OR array with outputs Z1-Z4, supplied from VDD; legend: metal, poly-Si, n-diffusion, contact; back-up elements are marked.]
PLA Repair Scheme
[Figure: the PLA layout (inputs A, B, C; AND array; OR array with outputs Z1-Z4; VDD; legend: metal, poly-Si, n-diffusion, contact; back-up elements) extended by switching units.]
Specific programming of cross points!
FPGA / CPLD Repair
Looks pretty easy at first glance because of the regular architecture
Requires lines / columns of switches for configuration at the inputs and between the AND / OR matrices
Requires additional programmability of cross-points by double-gate transistors as in EEPROMs or Flash memory
Not fully compatible with standard CMOS
Limited number of (re-) configurations
Floating gate (FAMOS) transistors are fault-sensitive!
Double-Gate Transistors
[Figure: cross-section of a double-gate transistor on a p-substrate, with isolated gate, control gate, tunnel oxide, and select gate.]
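Cell Duplication
[Figure: two identical gates (Gate 1, Gate 2) with inputs in1, in2, supplied through a VDD switch with switch control from separate rails VDD1 / VDD2 against GND; their outputs Out 1 and Out 2 pass an output control that drives Out.]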
Cell Duplication
Simple scheme involving a VDD off / on switch
Inherent duplication of efforts
VDD separation of faulty cells
Extra effort for output isolation of faulty cells is necessary.
Input isolation (input gate shorts) is not easily possible.
Relatively large overhead for managing repair states and redundancy (re-)organisation.
Fully CMOS compatible
Block Organization in Random Logic
[Figure: four logic cells and one backup cell placed between an input switch and an output switch that connect them to the block inputs and outputs.]
Logic Cluster Architecture
A number of equal-type logic gates makes a cluster.
The cluster contains one or more spare gates.
A spare gate may replace a normal device; the modification is done via sets of input / output selectors / de-selectors.
Problems:
An input gate short of a "normal" device is not fully isolated.
For n gates alternatively mapped to a single backup device, there are (n+1) control states.
Switching elements are complex and not fault tolerant by themselves.
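The (n+1) control states can be enumerated directly: one state without repair plus one state per gate that the spare may replace. The sketch below is illustrative only; the state labels and the binary encoding of the state register are assumptions, not something the slides specify.

```python
def control_states(n_gates: int) -> list[str]:
    """All control states for n gates that share a single backup gate."""
    states = ["no repair"]
    states += [f"spare replaces gate {i}" for i in range(1, n_gates + 1)]
    return states


def control_bits(n_gates: int) -> int:
    """Bits needed for a binary-encoded state register (assumed encoding).

    n + 1 states need ceil(log2(n + 1)) bits, which equals n.bit_length()
    for n >= 1.
    """
    return n_gates.bit_length()


states = control_states(4)   # 5 states for a cluster of 4 gates
bits = control_bits(4)       # 3 bits under the assumed encoding
```

Every cluster needs such a state register plus the selectors it drives, which is part of the administrative overhead listed under the problems.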
Modified Cluster Architecture
[Figure: four logic cells and one backup logic cell between an input switch and an output switch, both driven by a select control.]
Can possibly isolate a specific gate, but still requires lots of administrative overhead.
Reconfiguration by Permutation Schemes
[Figure: cluster of eight units between two switch stages at the inputs and outputs: six functional units (FU) and two backup units (FUB). Each 2-way switch has a state 0 and a state 1; the inputs / outputs of a "faulty" unit are grounded.]
Specific Features
Only 4 logic states for permutation in a cluster of 8 logic blocks, including 2 for backup.
All single failed blocks plus some double failures can be compensated.
Failed components are isolated and their inputs / outputs grounded.
Input gate shorts can be handled.
Internal blocks may have different complexities depending on the anticipated fault density.
Simple switching devices, fully CMOS compatible.
Fault tolerant switching devices need extra effort!
Fault Tolerant Switch
[Figure: four switching elements (s) arranged between in and out.]
Switching elements can be made fault-tolerant by themselves, both for on- and off-type faults!
... but at the cost of extra delays !
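One way to read the four-element switch is as two series pairs connected in parallel. That arrangement is an assumption made for this sketch, since the slide only shows four elements labelled s. Under that assumption, the check below confirms that any single stuck-on or stuck-off element is masked.

```python
from itertools import product


def quad_switch_conducts(states):
    """Series-parallel quad: (s0 in series with s1) in parallel with (s2, s3).

    states: four booleans, True meaning the element is closed (conducting).
    """
    s0, s1, s2, s3 = states
    return (s0 and s1) or (s2 and s3)


def with_single_fault(command, index, stuck_closed):
    """All elements follow `command`, except element `index`, which is stuck."""
    states = [command] * 4
    states[index] = stuck_closed
    return states


def tolerates_all_single_faults():
    """True if every single stuck-on / stuck-off fault leaves the function intact."""
    for command, index, stuck in product([True, False], range(4), [True, False]):
        if quad_switch_conducts(with_single_fault(command, index, stuck)) != command:
            return False
    return True
```

In this reading the signal always passes two elements in series, which is consistent with the extra delay mentioned above.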
Test, Diagnosis, Fault Administration
For self repair in the field of application, fault diagnosis must identify faulty elements that can be replaced.
The granularity of fault diagnosis therefore depends on the granularity of replacement (gates, RT-elements, CPUs).
Conventional fault diagnosis in scan-based test is limited to the respective position in a scan chain.
As scan chains are often allocated in a random manner without a strict reference to RT-level architectures, diagnosis methods used with production test are not a real solution.
A system that has redundant elements and self-repair functions must restore a "working" status after power-off periods by:
- storage and re-assembly of the previous status of repair, or by
- self test, fault diagnosis and re-configuration after start-up.
Test and Redundancy Administration
[Figure: system with BISR capability containing redundant elements, a CPU, a repair status memory, and a status control.]
... makes a significant overhead beyond redundancy provision!
Test and Diagnostic Resolution (1)
[Figure: network of gates G1-G13 between the scan-in and scan-out elements.]
Scan test can only identify the faulty scan-out location!
Test and Diagnostic Resolution (2)
[Figure: the same network of gates G1-G13 between the scan-in and scan-out elements, with a non-resolvable fault marked.]
Scan test can only identify the faulty scan-out location!
Non-resolvable fault!
Further resolution by multiple test patterns!
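Narrowing the fault location with several test patterns amounts to intersecting the suspect sets of all failing patterns. The sketch assumes a single fault and that each failing pattern yields the set of gates it sensitises towards the failing scan-out position. The gate names follow the figure, but the suspect sets are invented for illustration.

```python
def narrow_candidates(suspects_per_failing_pattern):
    """Intersect the suspect gate sets of all failing patterns (single-fault assumption)."""
    sets = [set(suspects) for suspects in suspects_per_failing_pattern]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical suspect sets for three failing patterns.
failing = [
    {"G1", "G2", "G7"},
    {"G2", "G7", "G8"},
    {"G7", "G12"},
]
candidates = narrow_candidates(failing)   # only G7 appears in every set
```

If several gates remain in the intersection after all patterns, the fault stays non-resolvable at gate level and the repair granularity has to cover the whole remaining set.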
Production Test with Diagnosis
[Figure: flow with on-line and off-line parts: test, fault detection, diagnosis (scan-path no., bit no.), fault simulation, layout, chip analysis.]
... is not available in the field!
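Diagnosis by "Tentative" Repair
[Figure: flowchart. From Start, the test process takes vectors from a test pattern list. If a fault is detected, diagnosis produces a diagnosis / fault list and a single repair function performs a tentative repair with repair annotation; otherwise a new test vector is applied until the test is complete. The repair process then performs the final repair, under test and status monitoring.]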
Tentative Repair
Switch-off of faulty elements and power separation are often done by "fuses".
Once a fuse is blown, it cannot be re-installed!
Reconfiguration schemes based on "fuse" or "antifuse" switching elements cannot be used in conjunction with "tentative repair".
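The need for reversible switching becomes visible when diagnosis by tentative repair is written down as a loop. The following is a toy model under stated assumptions: `ToyCluster`, its single test, and the one-unit bypass are hypothetical stand-ins for the real test and switching hardware.

```python
class ToyCluster:
    """Toy cluster: the test passes only if the faulty unit (if any) is bypassed."""

    def __init__(self, n_units, faulty_unit=None):
        self.n_units = n_units
        self.faulty_unit = faulty_unit
        self.bypassed = None          # unit currently replaced by the spare

    def run_test(self):
        return self.faulty_unit is None or self.bypassed == self.faulty_unit


def tentative_repair(cluster):
    """Return the unit whose replacement makes the test pass, or None if fault-free."""
    if cluster.run_test():
        return None                   # no fault detected, nothing to repair
    for unit in range(cluster.n_units):
        cluster.bypassed = unit       # tentative repair
        if cluster.run_test():
            return unit               # keep this configuration as the final repair
        cluster.bypassed = None       # undo: impossible once a fuse has been blown
    raise RuntimeError("repair resources exhausted")
```

The undo step in the loop is the operation that fuse or antifuse elements cannot provide.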
Enhanced Logic Cluster
[Figure: cluster of six functional units (FU) and two backup units (FUB) between input and output switches, with scan elements forming a chain from scan in to scan out, and extra scan-outs at the extra blocks.]
Diagnostic Test
Consider a bundle of 8 blocks with 2 extra outputs.
By going through the 4 logic states of (re-)configuration, each block is once connected to the "spare" inputs and outputs.
If a test pattern is applied to 4 units of the same type by going through the 4 states, the faulty unit can be identified.
The "false" output detection can be used locally to set a status of re-configuration.
With multiple units of the same type tested in parallel, time and overhead are reasonable.
If tests are short and reliable, an initial test process can be performed after every power-down. Keeping configurations in a memory is then not necessary.
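The state-by-state diagnosis can be sketched as a loop over the four configurations. Several details here are assumptions made for illustration: that state s routes blocks s and s+4 to the two extra outputs, that all blocks are of the same type, and that a fault-free reference output is available for comparison.

```python
def nand2(pattern):
    """Reference behaviour of one block: a 2-input NAND (hypothetical block type)."""
    a, b = pattern
    return int(not (a and b))


def stuck_at_1(pattern):
    """Faulty block whose output is stuck at 1."""
    return 1


def locate_faulty_blocks(blocks, reference, pattern, n_states=4):
    """Cycle through the configuration states and compare the observed blocks.

    In state s, blocks s and s + n_states are assumed to be connected to the
    two extra outputs, so each of the 2 * n_states blocks is observed once.
    """
    faulty = []
    for state in range(n_states):
        for index in (state, state + n_states):
            if blocks[index](pattern) != reference(pattern):
                faulty.append(index)
    return sorted(faulty)


blocks = [nand2] * 8
blocks[5] = stuck_at_1
found = locate_faulty_blocks(blocks, nand2, pattern=(1, 1))   # block 5 differs
```

A single pattern only exposes faults it happens to excite; the pattern (1, 1) is used here because it drives the NAND output to 0 and thereby reveals the stuck-at-1 block.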
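Local Test and Reconfiguration
[Figure: the enhanced logic cluster (six FU, two FUB, input / output switches, scan chain from scan in to scan out) with a comparison of the extra outputs against a reference output (Ref. out); the resulting fault-detect signal drives a switch control FSM.]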
Integrated Test & Repair
[Figure: six logic blocks with redundancy (R), each with its own BIST & repair unit, under a global control that issues BIST start and monitors "repair resources exhausted" conditions.]
Comparison
                           FPGAs    PLAs     cell duplication  cell cluster
Repair by single method    no       no       yes               yes
Overhead for replacement   10-20%   20-30%   100%              10-20%
Overhead for organisation  high     medium   20-30%            30-50%
CMOS-logic compatible      yes      no       yes               yes
Reconfig. non-volatile     no       partial  no                no
Gate-short repair          yes *    yes      difficult         yes
Diagnosis by trial/error   -        +        ++                +
VDD separation             no       no       yes               no
In-field repair            +        +        ++                ++
*
Summary
Several types of logic (FPGAs, CPLDs) require an inhomogeneous replacement process based on different types of redundant elements.
Repair schemes that need special devices (e.g. floating gate transistors) are not attractive.
Schemes that provide a high level of fault isolation for short-type faults are most attractive.
Architectures that also provide excellent local (self-)test coupled to locally organized self repair are possible.