FPGA Self-Repair using an Organic Embedded System Architecture
Kening Zhang, Jaafar Alghazo and Ronald F. DeMara
University of Central Florida
06 December 2007
Reconfigurable Hardware with Self-Healing, based on SRAM FPGA platform
Organic Computing (OC): biologically-inspired computing with “self-x” properties
System Property | Self-x Characteristics
Composed of large collection of autonomous systems | Self-organization, Self-configuration, Self-optimization
Autonomous systems own sensors and actuators | Self-healing, Self-protection, Self-explaining
Communication networks among autonomous systems | Context-awareness, Self-synchronization
Technical Objective:
OC Approach: addresses system controllability with increasing complexity
Example Relevance: How to achieve sustainable presence in NASA’s Moon, Mars & Beyond objective?
Reliability, Availability, Sustainability: support long-lifetime missions with multiple failure occurrences
Research Focus:
Sponsors: NASA (FPGA platform and Genetic Algorithm research); DARPA (OC approach and SOAR Longevity Platform)
Goal: Autonomous FPGA Refurbishment
Criterion | Redundancy | Refurbishment
Overhead from Unutilized Spares (weight, size, power) | increases with amount of spare capacity | weakly-related to number recovery capacity
Granularity of Fault Coverage (resolution where fault handled) | restricted at design-time | variable at recovery-time
Fault-Resolution Latency (availability via downtime required to handle fault) | based on time required to select spare resource | based on time required to find suitable recovery
Quality of Repair (likelihood and completeness) | determined by adequacy of spares available (?) | affected by multiple characteristics (+ or -)
Autonomous Operation (fix without outside intervention) | yes | yes
Increase availability without carrying pre-configured spares …
Fault-Handling Techniques for SRAM-based FPGAs
[Figure: taxonomy of device-failure handling, organized by Duration, Target, Detection, Isolation, Diagnosis and Recovery.
Duration: Transient (SEU); Permanent (SEL, Oxide Breakdown, Electron Migration, LPD).
Target: Device Configuration; Processing Datapath.
Approaches and methods shown: Scrubbing, TMR, BIST, CED, STARS, Vigander, Evolutionary, OC.
Entries shown: Bitwise Comparison; Reload Bitstream / Invert Bit Value; Ignore Discrepancy; Majority Vote; Supplementary Testbench; Cartesian Intersection; Worst-case Clock Period Dilation; Replicate in Spare Resource; Duplex Output Comparison; Duplex/Triplex Output Comparison; Fast Run-time Location; Select Spare Resource; Population-based GA using Extrinsic Fitness Evaluation; Evolutionary Algorithm using Intrinsic Fitness Evaluation; Autonomous Supervisor (AS); Autonomous Element (AE); (not addressed); unnecessary.]
Autonomous System-on-a-Chip (ASoC) Architecture
Dual-layer ASoC proposed by Lipsa et al. [Lipsa 05]
• Functional Layer
  • Functional Elements (FEs), e.g. CPU, RAM, Network interface
• Autonomic Layer
  • Autonomic Elements (AEs)
    • Monitor
    • Actuator
    • Communication interface
  • Autonomic Supervisor (AS)
UCF Approach for fault coverage of the Functional Layer & Autonomic Layer
• achieved by assessing consensus among elements
  1. first, to realize failure detection
  2. consensus provides an organic method for fitness evaluation of competing alternatives during evolution, providing a self-regulating approach to fault resolution
EHW Environments
• Evolvable Hardware (EHW) environments enable experimental methods to research soft-computing intelligent search techniques
• EHW operates by repetitive reprogramming of real-world physical devices using an iterative refinement process
Two modes of Evolvable Hardware:
• Extrinsic Evolution: Genetic Algorithm with a software model (simulation in the loop); device “design-time” refinement. Done? Build it.
• Intrinsic Evolution: Genetic Algorithm with hardware in the loop; device “run-time” refinement; new approach to Autonomous Repair of failed devices
Application
Deep Space Satellite:
• >100 FPGAs onboard
• hostile environment: radiation, thermal stress
• How to achieve reliability to avoid mission failure?
Genetic Algorithms (GAs)
Mechanism coarsely modeled after neo-Darwinism (natural selection + genetics)
[Figure: GA cycle. Labels shown: start; population of candidate solutions; evaluate fitness of individuals (fitness function); goal reached; selection of parents; parents; crossover; mutation; offspring; replacement]
Genetic Mechanisms
• Guided trial-and-error search techniques using principles of Darwinian evolution
  • iterative selection, “survival of the fittest”
  • genetic operators: mutation, crossover, …
  • implementor must define fitness function
• GAs frequently use strings of 1s and 0s to represent candidate solutions
  • Genotype chromosomes of GA operation: if 100101 is better than 010001, it will have more chance to breed and influence future population
• Genotype changes during evolution must adhere to the Xilinx-defined bitstream format
• To prevent undesirable conditions that may damage the FPGA, such as a mutation that ties two logic outputs together, a logical genotype is used for evolution and mapped to a physical phenotype
  • Logic # = functional logic index number for LUT
  • Row/Column = physical location of LUT in FPGA
• Can invoke Elitism Operator (E=1, E=2 …), which guarantees that the fitness of the best individual never decreases from one generation to the next
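The GA cycle and the Elitism Operator can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy fitness function (count of 1s), tournament selection, single-point crossover, and the parameter values are not taken from the slides, and the bit strings do not model a Xilinx bitstream.

```python
import random

def fitness(bits):
    """Toy fitness function (an assumption): number of 1s in the string."""
    return sum(bits)

def evolve(pop_size=20, length=16, generations=30, elite=1,
           mutation_rate=0.05, seed=0):
    """Run a small GA and return the best fitness seen in each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best_history.append(fitness(pop[0]))
        # Elitism: copy the E best individuals unchanged into the next generation.
        next_pop = [ind[:] for ind in pop[:elite]]
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (hypothetical choice).
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, length)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-bit probability.
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        pop = next_pop
    return best_history

history = evolve()
```

Because the elite individuals are carried over unchanged, each entry of `history` is at least as large as the one before it, which is the property the Elitism Operator is meant to provide.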
Loosely Coupled Solution on Xilinx Virtex II Pro & Virtex 4
[Figure: Avnet FPGA Development Board with PCI interface, Virtex-II Pro FPGA and off-chip RAM; control hosted on PC; signals shown: bit file, input data, FPGA output]
• The entire system operates on a 32-bit basis
• The Virtex II Pro/4 is mounted on a development board which can then be interfaced with a workstation running Xilinx EDK and ISE.
Organic Embedded System (OES) Architecture
One-dimensional, column-oriented OES based on Xilinx Virtex II Pro FPGA platform
• FEs and AEs reside on two distinct layers with an interconnection structure between them
• AEs and FEs can each be realized in hardware, software, or co-design
• AE layer supervises functionality of FE elements while requiring no application-specific algorithms on the AE layer
• Observer/Controller architecture includes an AS element, which has no counterpart to evaluate whether the AS is fault-free; the proposed approach addresses this by minimizing AS complexity
• Utilize Xilinx partial reconfiguration technology to manipulate relocatable bitstreams
OES AE Component Design
AEs decentralize Observer/Controller functionality:
• Concurrent Error Detection (CED) unit collects 2 FE outputs for discrepancy identification
• A Checksum unit for AE fault detection, whose values are checked against Stored Checksum values
• Evaluator of outputs from 2 FEs against checksum, and Actuator which initiates recovery phase
• An important architectural property is that all AE components are identical in structure, even though they monitor different types of FEs
• Homogeneous characteristics deliver a uniform-behavior property that is leveraged for the consensus-based evaluation fault-handling methodology
• OC Concept: although AE components add complexity to the design, they ease the fault-handling integration difficulties inherent in current commercial IP cores
Consensus-Based Evaluation (CBE)
• Uses a Relative Fitness Measure
  • Pairwise discrepancy checking yields relative fitness measure
  • Broad temporal consensus in the population used to determine fitness metric
  • Transition between Fitness States occurs in the population
  • Provides graceful degradation in presence of changing environments, applications and inputs, since this is a moving measure
• Test Inputs = Normal Inputs for Data Throughput
  • CBE does not utilize additional functional or resource test vectors
  • Potential for higher availability as regeneration is integrated with normal operation
Genetic Operators: Mutation
[Figure: Mutation on genotype chromosomes and on phenotype chromosomes]
• original functionality is F = F1·(F3 + F4), with input F2 left unassigned by the synthesis tool
• mutation operator moves the signal on input F4 to the unused input F2, giving F = F1·(F3 + F2) and leaving F4 unused
• shading shows changed input and LUT contents
• offers some opportunity to work around an input stuck-at fault or a LUT-content stuck-at fault
• functionalities of LUTs remain undistorted while search space explored
Typical Approach: bit inversion of LUT functionality
Selected Approach: input interconnection of LUTs mutated
Rearrange input interconnection to search unused LUT resources which occlude faulty resource
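The input-interconnection mutation can be sketched on a single 4-input LUT. This is an illustrative model only: the 16-entry truth-table encoding, the pin numbering (pins 0..3 standing for F1..F4), and the helper names `build_table`, `mutate_input` and `lut_output` are assumptions, not the authors' genotype format.

```python
from itertools import product

N_PINS = 4  # 4-input LUT; pins 0..3 stand for F1..F4

def lut_output(table, pin_values):
    """Look up the LUT output; pin i contributes bit i of the address."""
    addr = sum(v << i for i, v in enumerate(pin_values))
    return table[addr]

def build_table(func):
    """Fill a 16-entry truth table from a Python function of the pin values."""
    return [func(*[(a >> i) & 1 for i in range(N_PINS)])
            for a in range(2 ** N_PINS)]

def mutate_input(table, src, dst):
    """Move the signal from pin `src` to the unused pin `dst`, rewriting the
    LUT contents so the logic function of the signals is unchanged."""
    new_table = []
    for addr in range(2 ** N_PINS):
        bit = (addr >> dst) & 1
        # Read the old table as if pin `src` carried what pin `dst` carries now.
        old_addr = (addr & ~(1 << src)) | (bit << src)
        new_table.append(table[old_addr])
    return new_table

# Original: F = F1 . (F3 + F4), with F2 (pin 1) unused.
original = build_table(lambda f1, f2, f3, f4: f1 & (f3 | f4))
# Mutated: the signal formerly on F4 (pin 3) now arrives on F2 (pin 1).
mutated = mutate_input(original, src=3, dst=1)

def same_function_despite_stuck_pin():
    """Check every signal assignment with the vacated pin stuck at 0 and at 1."""
    for a, b, c in product((0, 1), repeat=3):
        before = lut_output(original, [a, 0, b, c])
        for stuck in (0, 1):
            if lut_output(mutated, [a, c, b, stuck]) != before:
                return False
    return True
```

In this model the mutated LUT computes the same function of its signals whether the vacated F4 pin is stuck at 0 or at 1, which is the sense in which a rearranged interconnection can occlude a faulty input.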
Genetic Operators: Cell Swapping
[Figure: Cell-Swap operation on genotype chromosomes and on phenotype chromosomes]
• interchanges two distinct LUT blocks while maintaining correct logic order and functionalities in genotype
• exchanges all LUT input interconnections, LUT content and physical 2-tuple (Col#, Row#), as well as the logic sequence
Genetic Operators: PMX Operator
Partial Match Crossover (PMX) maintains crossover information as well as order information
• two genotype configuration streams are aligned at LUT boundary
• crossover site selected at random along LUT boundary
• this crossover point defines a left/right partition used to effect crossover through LUT-by-LUT exchange
• suppose crossover point at position 4 of the LUT vector:
  • first step is to map configuration B to configuration A by exchanging the following aligned LUTs: {(4,7), (5,2), (6,1), (7,5)}
  • applying PMX results in two new configurations A’ and B’
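A swap-based, single-cut-point variant of PMX on permutations of LUT indices is sketched here. The parent vectors are hypothetical and do not reproduce the slide's example; the function name `pmx_single_cut` and the choice to exchange the partition to the right of the cut are assumptions.

```python
def pmx_single_cut(parent_a, parent_b, cut):
    """Return a child that matches parent_b position-by-position from `cut`
    onward and stays a valid permutation of parent_a's LUT indices."""
    child = list(parent_a)
    for i in range(cut, len(child)):
        wanted = parent_b[i]
        j = child.index(wanted)                   # where that LUT sits now
        child[i], child[j] = child[j], child[i]   # LUT-by-LUT exchange
    return child

# Hypothetical parent configurations (LUT indices 1..8), cut after position 4.
config_a = [1, 2, 3, 4, 5, 6, 7, 8]
config_b = [3, 7, 5, 1, 6, 8, 2, 4]
child_a = pmx_single_cut(config_a, config_b, cut=4)
child_b = pmx_single_cut(config_b, config_a, cut=4)
```

Each exchange is a swap inside the child, so no LUT index is duplicated or lost; this is the property that lets PMX keep every logic function present exactly once after crossover.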
Illustrative Example: Gate-Level Design of OES
• Experiment circuit: 1-bit Full-adder
• Fault-free model: Duplex
• Fault-impact model: TMR
• Fault-detect model: CBE
• Fault recovery strategy: GA operation
• Experimental setup:
  • Hardware prototype implemented in Xilinx Virtex-II Pro FPGA
  • VHDL implementation
  • Using the GNAT library along with the MRRA framework and JTAG reconfiguration interface
MCNC-91 Benchmark Case Studies
Circuit Name | Circuit Function | Inputs | Outputs | Approximate Gates
z4ml | 2-bit Add | 7 | 4 | 20
cm85a | Logic | 11 | 3 | 38
cm138a | Logic | 6 | 8 | 17
System Availability under Multiple Faults
• Fc = number of correct behaviors of FE observed during evolutionary recovery phase
• Fe = number of errant or discrepant behaviors
• 1 = exactly one output required to detect the fault during the original CED configuration
• 2 = number of reconfigurations required, i.e. one from CED to TMR, and one back from TMR to CED
• Fc1 & Fe1 = number of correct and faulty outputs of the FE during the AE repair period
• Fc2 & Fe2 = number of correct and faulty outputs during the FE repair period
• n = number of reconfigurations of the FE
• β = ratio of reconfiguration time to computation time
Experimental Results
Redundancy for both FE (RFE) and AE (RAE) = ratio of unused LUT inputs to total number of LUT inputs
Fc = number of correct behaviors of FE observed during evolutionary recovery phase
Fe = number of errant or discrepant behaviors
n = number of reconfigurations of the FE
β represents reconfiguration to computation time ratio
• Fault Free arrangement: CED FEs with cold standby FE
• Inject a stuck-at-zero or stuck-at-one fault at one of the FE’s LUT input pins
• CED -> TMR to identify faulty FE or AE
• CBE used to resolve faulty AE
Conclusion
• A self-adapting and self-healing OES architecture was developed for autonomic operation without human intervention.
• The OES architecture is capable of handling many single-fault scenarios and several multiple-fault scenarios for small digital logic designs.
• Experimental results support our design objectives: availability during the repair phase averaged 75.05%, 82.21%, and 65.21% for the z4ml, cm85a, and cm138a circuits respectively under stated conditions.
• Reconfiguration time ratio (β) is the key factor limiting availability during AE repair
• Future work: evaluate extensions of the OES architecture addressing scalability in terms of pipelined stages
Backup Slides
• On following pages …
Isolation of a single faulty individual with 1-out-of-64 impact
• Outliers are identified after EW iterations have elapsed
• Expected D.V. = (1/64)*600 = 9.375 from individual impacted by fault
• Isolated faulty individual’s DV differs from the average DV by 33 after 1 or more observation intervals of length EW
[Figure: Sliding Window. Instantaneous DV (point values) for a sample individual in population, and population oracles (solid lines)]
Future Work: Development Board to Self-Contained FPGA
[Figure: roadmap over Year 1, Year 2 and Year 3, from the Avnet FPGA Development Board (PCI interface, Virtex-II Pro FPGA, off-chip RAM, control hosted on PC; bit file, input data, output) to CRR on a Chip (Xilinx Virtex-II Pro): control via on-chip PowerPC, configurations in on-chip RAM blocks, functional CLBs, ICAP; signals shown: re-config request, config data, bit file, data, output, device fault]
Qualitative Analysis of CRR model
• Number of iterations and completeness of regeneration repair
• Percentage of time the device remains online despite physical resource fault (availability)
Hardware Resource Management
• Optimization of hardware profile for Xilinx Virtex II Pro
Field Testing on SRAM-based FPGA in a CubeSat mission
OES Integrated FE and AE Failure Detection Procedure
• System Initialization
  • FE Initialization step
  • Compute Checksum step
• FE Fault Detection/Recovery
  • AE-CED fault detection
  • FE fault-recovery
• AE fault detection Phase
  • A fault may exist in the CED, Actuator, or Evaluator,
  • A fault may exist in the Check Sum component, or
  • A fault may exist in the Stored CheckSum-LUT.
Runtime inputs to the FE are applied to both active instances under a CED strategy. After allowing for FE input propagation time through the AE, the expected output is supplied to the AE-CED for fault detection. The outputs of the FEs are then compared in the AE-CED module, and any discrepancy between the two values indicates that a fault has occurred in either one of the FEs or the AE-CED itself. Further detection is required to distinguish which of the two is faulty. If the AE component is identified as innocent, then the fault must have occurred in an FE; this output is discarded and control branches to a fault identification phase, which wakes up the cold standby FE and constructs a temporary TMR system that can articulate the faulty FE under the newly supplied external input. Furthermore, as described in Section 3.3, the actuator initiates a repair cycle, which may require automatic evolutionary repair of the identified faulty FE. That FE is set as standby-under-repair, and the AE-CED returns to receiving inputs from the remaining two active FEs. The decision-making procedure causes at least one throughput-delay penalty.
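The detection procedure above can be sketched as a single decision step. This is a simplified model under stated assumptions: the AE self-check result is passed in as a boolean `ae_ok` rather than computed from checksums, the FE names are hypothetical, and the "unresolved" outcome for a three-way disagreement is added here for completeness and is not described in the slides.

```python
from collections import Counter

def ced_step(outputs, active, standby, ae_ok=True):
    """One detection step.
    outputs: dict mapping FE name -> output for the current input (the standby
             entry is only consulted if the standby FE is woken up).
    active:  the two FE names currently compared by the AE-CED.
    standby: name of the cold-standby FE.
    ae_ok:   result of the AE self-check (assumed to be supplied).
    Returns (result, faulty), where faulty is None, "AE", "unresolved",
    or the name of the FE identified as faulty."""
    a, b = active
    if outputs[a] == outputs[b]:
        return outputs[a], None        # no discrepancy: normal throughput
    if not ae_ok:
        return None, "AE"              # discrepancy attributed to the AE
    # AE innocent: discard the output, wake the standby, form a temporary TMR.
    votes = Counter(outputs[name] for name in (a, b, standby))
    majority, count = votes.most_common(1)[0]
    if count < 2:
        return None, "unresolved"      # no majority among the three FEs
    faulty = next(name for name in (a, b) if outputs[name] != majority)
    return majority, faulty

# Hypothetical usage: FE1 produces a wrong bit, FE2 is the cold standby.
result, faulty = ced_step({"FE0": 1, "FE1": 0, "FE2": 1},
                          active=("FE0", "FE1"), standby="FE2")
```

After such a step the identified FE would be marked standby-under-repair and the CED would resume with the two remaining FEs; that bookkeeping is left out of the sketch.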
Previous Work
Detection Characteristics of FPGA Fault-Handling Schemes
Approach | Fault Handling Method | Fault Detection: Latency | Fault Detection: Distinguish Transients | Resource Coverage: Logic | Resource Coverage: Interconnect | Resource Coverage: Comparator | Fault Isolation: Granularity
TMR | Spatial voting | Negligible | No | Yes | Yes | No | Voting element
[Vigander01] | Spatial voting & offline evolutionary regeneration | Negligible | No | Yes | No | No | Voting element
[Lohn, Larchev, DeMara03] | Offline evolutionary regeneration | Negligible | No | Yes | Yes | No | Unnecessary
[Lach98] | Static-capability tile reconfiguration | Relies on independent fault detection mechanism
STARS [Abramovici01] | Online BIST | Up to 8.5M erroneous outputs | Test pattern transients | Yes | Yes | No | LUT function
[Keymeulen, Stoica, Zebulum00] | Population-based fault insensitive design | Design-time prevention emphasis | No | Yes | Yes | No | Not addressed at runtime
CRR | Competing configurations with temporal voting and online regeneration | Negligible | Transients are attenuated automatically | Yes | Yes | Yes | Unnecessary, but can isolate functional components
… Strategy #1) Evolve redundancy into design before the anticipated failure or …
Previous Work
Fault Recovery Characteristics of Selected Approaches
Approach | Online Recovery | Basis for Recovery | Quality of Recovery | Availability | Externally-supplied Elements | Resource Recycling | Pre-determined Limits | Power Consumption
TMR | Yes | Requires that 2 datapaths are operational | Either complete or none | 100% for single fault, 0% thereafter | 2-of-3 Majority Voter | No | Single datapath | 3n
[Vigander01] | No | Design complexity | Non-deterministic | Non-deterministic | GA Controller, function test vectors | Yes | None | 3n+r
[Lohn, Larchev, DeMara03] | No | Design complexity | Non-deterministic | Non-deterministic | GA Controller, function test vectors | Yes | None | 2n+r
[Lach98] | No | Available spares | Either complete or none | Either complete or none | Device test vectors and controller | No | Only one faulty CLB per tile | 2n+i+r
STARS [Abramovici01] | Yes | Available spares | Restricted by non-optimizable re-routing | Only ~93% regardless of fault occurrence | Test Reconfiguration Controller + device test vectors | Yes | Available spares within routing chokepoints | s • (c+m+b)
[Keymeulen, Stoica, Zebulum00] | No | Depends on characteristics at design time | Non-deterministic | Non-deterministic | None at runtime | No | Depends on redundancy during design | n • (1 + f(g))
CRR | Yes | Recovery complexity | Optimized by second-order fitness metric | Adaptable | Optional RAM. RAM coverage is intrinsic. No test vectors. | Yes | None | 2n+r
… Strategy #2) Evolve recovery from specific failure after (and if) it occurs or …
CRR Arrangement in SRAM FPGA
Configurations in Population
• $C = C_L \cup C_R$
• $C_L$ = subset of left-half configurations
• $C_R$ = subset of right-half configurations
• $|C_L| = |C_R| = |C|/2$
Discrepancy Operator
• Baseline Discrepancy Operator is a dyadic operator with binary output:
$$C_i^L \otimes C_i^R = \begin{cases} 0 & \text{if } Z(C_i^L) = Z(C_i^R) \\ 1 & \text{otherwise} \end{cases}$$
• $Z(C_i)$ is the FPGA data throughput output of configuration $C_i$
• Each half-configuration is evaluated using an embedded checker (XNOR gate) within each individual
• Any fault in the checker lowers that individual’s fitness, so that individual is no longer preferred and eventually undergoes repair
Reconfiguration Algorithm
[Figure: SRAM-based FPGA holding an L half-configuration and an R half-configuration, each with Function Logic and a Discrepancy Check; signals shown: configuration bit stream, input data, data output, feedback, control. Off-chip EEPROM (NOTE: a non-volatile memory is already required to boot any SRAM FPGA from cold start ... this is not an additional chip). Discrepancy checks labeled RS (Hamming Distance), formed from the EOR of the L and R outputs $C_{i,j}^L$ and $C_{i,j}^R$, and WTA (Equivalence)]
Terminology and Characteristics
Here $\otimes$ denotes the discrepancy operator applied to the left and right half-configuration outputs.
Pristine Pool $C_P$: any $C_i \in C$ is a member of $C_P$ at generation $G$ if and only if
$$\sum_{K=1}^{G} C_K^L \otimes C_K^R = 0$$
Suspect Pool $C_S$: any $C_i \in C$ is a member of $C_S$ at generation $G$ if and only if at least one of
$$C_K^L \otimes C_K^R \neq 0 \quad (1 \le K \le G)$$
Under Repair Pool $C_U$: any $C_i \in C$ is a member of $C_U$ at generation $G$ if and only if a condition comparing $\sum_{K=1}^{G} C_K^L \otimes C_K^R$ with 1 holds (relational operator not recoverable from this transcript)
Refurbished Pool $C_R$: after a Genetic Operator is applied, the newly generated individual is a member of $C_R$ at generation $G$ if and only if a condition comparing $\sum_{K=1}^{G} C_K^L \otimes C_K^R$ with 0 holds (relational operator not recoverable from this transcript)
$E_D$ is the Discrepancy Count of $C_i$ and $E_C$ is the Correctness Count of $C_i$
Length of Evaluation Fitness Window: $E_W = E_D + E_C$
Fitness Metric: $f(C_i) = E_C / E_W$
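As a small worked illustration of the fitness metric, the sketch below counts discrepancies and agreements over one evaluation window and forms f = EC / EW. The window contents are hypothetical data chosen for the example, and the function names are assumptions.

```python
def discrepancy(z_left, z_right):
    """Baseline discrepancy operator: 0 if the two outputs agree, 1 otherwise."""
    return 0 if z_left == z_right else 1

def fitness(pairs):
    """pairs: iterable of (left output, right output) observed over one window.
    Returns (E_D, E_C, f) with f = E_C / E_W and E_W = E_D + E_C."""
    e_d = e_c = 0
    for z_l, z_r in pairs:
        if discrepancy(z_l, z_r):
            e_d += 1
        else:
            e_c += 1
    e_w = e_d + e_c
    return e_d, e_c, e_c / e_w

# Hypothetical window of 600 evaluations containing 6 discrepant outputs.
window = [(k, k) for k in range(594)] + [(k, k + 1) for k in range(6)]
e_d, e_c, f = fitness(window)
```

With 6 discrepancies in a window of 600, the metric evaluates to 594/600, i.e. 99%.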
Sketch of CRR Approach
Premise: Recovery Complexity << Design Complexity
1. Initialization
  • Population P of functionally-identical yet physically-distinct configurations
  • Partition P into sub-populations that use supersets of physically-distinct resources, e.g. size |P|/2 to designate physical FPGA left-half or right-half resource utilization
2. Fitness Assessment
  • Discrepancy Operator is some function of bitwise agreement between each half’s output
  • Four Fitness States defined for Configurations as {CP, CS, CU, CR} with transitions, respectively: Pristine → Suspect → Under Repair → Refurbished
  • Fitness Evaluation Window W determines comparison interval
  • fitness assessment via pairwise discrepancy (temporal voting vs. spatial voting)
3. Regeneration
  • Genetic Operators used to recover from fault based on Reintroduction Rate
  • Operators only applied once, then offspring returned to “service” without concern about increasing fitness
Configuration Health States
State Transitions during lifetime of the i-th Half-Configuration
[Figure: state diagram with states primordial, pristine, suspect, under repair and refurbished; 11 numbered transitions labeled by the comparison of L and R outputs (L = R or L ≠ R) and by comparisons of the fitness f_i with the thresholds f_OT and f_RT; "partial repair" and "complete repair" transitions; regions labeled COMPETITION and EVOLUTION]
Procedural Flow under Competitive Runtime Reconfiguration
[Figure: flowchart of the primary loop, with these boxes]
• Initialization: population partitioned into functionally-identical yet physically-distinct half-configurations
• Selection: choose FPGA configuration(s) labeled L and R
• Detection: apply functional inputs to compute FPGA outputs using L, R
• Are L, R results discrepancy free? (YES: L = R; NO: L ≠ R)
• Fitness Adjustment: update fitness of only L and R based on detection results
• Is either L's or R's fitness < Repair Threshold? If so, invoke Genetic Operators only once and only on L or R
• Adjust Controls: detection mode, overlap interval, ...
Integrates all fault handling stages using EC strategy
• Detects faults by the occurrence of discrepancy
• Isolates faults by accumulation of discrepancies
• Failure-specific refurbishment using Genetic Operators: Intra-Module-Crossover, Inter-Module-Crossover, Intra-Module-Mutation
Realize online device refurbishment
• Refurbished online without additional function or resource test vectors
• Repair during the normal data throughput process
Fitness Evaluation Window
• Fitness Evaluation Window: W denotes number of iterations used to evaluate fitness before the state of an individual is determined
• Determination of W for 3x3 multiplier
  • 6 input pins articulating $2^6 = 64$ possible inputs
  • W should be selected so that all possible inputs appear
  • More formally, let rand(X) return some $x_i \in X$ at random; seek W such that $\bigcup_{i=1}^{W} [\,\mathrm{rand}(X)\,] = X$ with high probability
W Determination
$$x_K = K^D - \sum_{m=1}^{K-1} \binom{K}{m}\, x_m, \qquad P_K = \frac{x_K}{K^D}$$
• $x_K$ = distinct orderings of K inputs showing in D trials
• if D constant, can calculate $P_{K>1}$ successively
• probability $P_K$ of K inputs showing after D trials is ratio of $x_K / K^D$
When K=64: [Figure: plot not recoverable from this transcript]
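The successive calculation of P_K can be sketched with exact integer arithmetic. The sketch assumes the K inputs are drawn independently and uniformly at random, and it uses the recurrence x_K = K^D - sum over m < K of C(K, m)·x_m, which counts the length-D input sequences in which all K inputs appear; the function name `coverage_probability` is an assumption.

```python
from fractions import Fraction
from math import comb

def coverage_probability(k, d):
    """P_K: probability that all k equally likely inputs appear in d trials."""
    # x[n] = number of length-d sequences over n given inputs that use all n.
    x = [0] * (k + 1)
    for n in range(1, k + 1):
        x[n] = n ** d - sum(comb(n, m) * x[m] for m in range(1, n))
    return Fraction(x[k], k ** d)

# 3x3 multiplier case: K = 64 possible inputs, window of D = 600 trials.
p_64_600 = float(coverage_probability(64, 600))
```

Calling the function for a range of D values gives the probability of full input coverage as a function of window length, which is one way to choose W for a required confidence level.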
Integer Multiplier Case Study
• 3-bit x 3-bit unsigned multiplier automated design:
  – Building blocks
    Half-Adder: 18 templates created
    Full-Adder: 24 templates
    Parallel-And: 1 template created
  – Randomly select templates for instantiation in modules
• GA operators
  External-Module-Crossover
  Internal-Module-Crossover
  Internal-Module-Mutation
• GA parameters
  Population size: 20 individuals
  Crossover rate: 5%
  Mutation rate: up to 80% per bit
• Experimental Evaluation
  Xilinx Virtex II Pro on Avnet PCI board
Experiments Demonstrate …
• Objective fitness function replaced by the Consensus-based Evaluation Approach and Relative Fitness
• Elimination of additional test vectors
• Temporal Assessment process
Template Fault Coverage
[Figure: Half-Adder Template A and Half-Adder Template B]
Template A
– Gate3 is an AND gate
– Will lose correctness if a Stuck-At-Zero fault occurs in the second input line of Gate3
Template B
– Gate3 is a NOT gate and only uses the first input line
– Will work correctly even if the second input line is stuck at Zero or One
Regeneration Performance
Parameters:
• Difference (vs. Hamming Distance)
• Evaluation Window: Ew = 600
• Suspect Threshold: S = 1-6/600 = 99%
• Repair Threshold: R = 1-4/600 = 99.3%
• Re-introduction rate: r = 0.1
Repairs evolved in-situ, in real-time, without additional test vectors, while allowing device to remain partially online.
Isolation of a single faulty individual with 1-out-of-64 impact
• Outliers are identified after W iterations elapsed
• E.V. = (1/64)*600 = 9.375 from minimum impact faulty individual
• Isolated individual’s f differs from the average DV by 33 after 1 or more observation intervals of length W
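The expected value quoted above can be checked, and a single observation window simulated, with a short sketch. It assumes uniformly random inputs and a fault that is visible on exactly one of the 64 input combinations; the function names and the choice of faulty input are hypothetical.

```python
import random

N_INPUTS = 64   # 2**6 input combinations of the 3x3 multiplier
WINDOW = 600    # evaluation window length used in the slides

def expected_discrepancies(impacted_inputs=1, n_inputs=N_INPUTS, window=WINDOW):
    """Expected discrepancy count for a fault visible on `impacted_inputs` inputs."""
    return impacted_inputs / n_inputs * window

def simulate(faulty_input=None, seed=0):
    """Count discrepancies over one window of uniformly random inputs.
    A fault-free individual (faulty_input=None) never shows a discrepancy."""
    rng = random.Random(seed)
    return sum(1 for _ in range(WINDOW)
               if rng.randrange(N_INPUTS) == faulty_input)

expected = expected_discrepancies()   # (1/64) * 600
observed = simulate(faulty_input=5)   # one hypothetical faulty individual
clean = simulate(faulty_input=None)   # fault-free individual
```

The observed count for a single window varies around the expected value, which is why isolation relies on one or more full observation intervals rather than on individual discrepancies.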