Part 3: Unified HW+SW Reliability Insertion & Estimation
Part 1/3: Reliability Insertion & Estimation for HW-SW Systems (High-Level X Gate-Level Fault Injection)
Part 2/3: Reliability-Oriented HW-SW Partitioning (Controlling system functions mapping into HW-SW to optimize reliability)
Part 3/3: Testability-Oriented HW-SW Partitioning (Controlling system functions mapping into HW-SW to optimize testability)
Design Methodology Details (flowchart)
Step I: Initial System Specification and Partitioning Step
- Initial Description (SystemC)
- Partitioning and Communication Channels Generation (focusing on real-time applications)
- Outputs: HW Part (VHDL), SW Part (C/C++), Communication Channels
Step II: System Reliability Insertion Step
- HW/SW-Level Fault-Tolerance Generator
- Outputs: Fault-Tolerant HW Part (VHDL), Fault-Tolerant SW Part (C/C++), Fault-Tolerant Communication Channels
Step III: Real-Time Response and System Reliability Estimation Steps (Pre-Implementation)
- Simulation of Critical System Functions, producing System-Level Behavior Reports
- Desired Real-Time Response? No: repartition the system differently. If necessary, divide the HW-SW parts into two or more blocks (moving from the concept of a centralized to a distributed system) to meet real-time constraints.
- Mutation Constraints; Fault Injection and Simulation (Mutation Analysis), producing System-Level Fault Coverage & Reliability Reports
- Desired reliability level? No: modify the implemented fault-tolerant functions or add new ones to the design. Yes: C/C++ Code Compilation and HW Synthesis.
Step IV: Laboratory Verification Step (Post-Implementation Reliability)
- Electromagnetic Compatibility (EMC) Tests, producing System-Level Fault Coverage & Reliability Reports
- Desired reliability level? Yes: End Design Process.
Part 1/3: Reliability Insertion & Estimation for HW-SW Systems (High-Level X Gate-Level Fault Injection)
Summary
1. Introduction: overview of previous work
2. The Methodology
2.1. Initial System Reliability Estimation
2.2. System Reliability Insertion & Estimation
3. Prototyping Environments
4. Case Studies
5. Conclusions
1. Introduction: overview of previous work
Previous works suggest how to design systems to optimize:
monetary cost, performance, communication rates, power consumption, silicon area, testability, memory size.
None of these strategies suggests how to partition the design and estimate the reliability of a system on a common HW/SW basis.
For embedded systems that are safety-critical and particularly complex to design, integrating reliability constraints during HW-SW partitioning may pay off well.
To address this gap, design teams have developed approaches suited to adding reliability during the design phase of embedded safety-critical computing systems:
First, partition the system and estimate its reliability as it is.
Next, if necessary, insert reliability functions into the HW and SW parts of the system.
Then, estimate the reliability of the modified system.
2. The Methodology
Initial Partitioning and Reliability Estimation
Fig. 1. Reliability estimation of the system as it is.
[Flowchart boxes: System Description in C; Partitioning; SW Part (C description); HW Part (Handel-C description); High-Reliability Functions Library; Generation of the High-Reliability HW; High-Reliability HW Part; Generation of the Com. Channels with Error Detection and/or Correction Code; Mutation Constraints; Weak Mutants Generator; System Reliability Verification; System-Level Fault Coverage & Reliability Reports; Desired Reliability Level? NO: try to partition the system differently. YES: C Code Compilation and HW Synthesis.]
Added-Reliability Estimation and Repartitioning
Fig. 2. Reliability insertion into the HW and SW parts, and reliability estimation of the modified system.
[Flowchart boxes: System Description in C; Partitioning; SW Part (C description); HW Part (Handel-C description); High-Reliability Functions Library; Generation of the High-Reliability HW; High-Reliability HW Part; Generation of the Com. Channels with Error Detection and/or Correction Code; Mutation Constraints; Weak Mutants Generator; System Reliability Verification; System-Level Fault Coverage & Reliability Reports; Desired Reliability Level? NO: try to partition the system differently. YES: C Code Compilation and HW Synthesis.]
Embedded Reliability Functions:
HW:
Built-In Self Test (BIST)
Triplication with Voter (Fault Masking)
Duplication with Comparator (Performance Degradation: stop & go!)
Error Detection & Correction Codes (Minimized, Combined Performance/Area Degradation)
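As a rough illustration of two of the HW entries (triplication with voter, duplication with comparator), the C sketch below models both at the bit level. The function names `tmr_vote` and `dwc_mismatch` and the 8-bit width are assumptions made for the example; they do not come from the slides.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bitwise 2-out-of-3 majority vote over three
 * replicas of the same 8-bit result. A fault that corrupts a single
 * replica is masked, because the other two agree on every bit. */
uint8_t tmr_vote(uint8_t a, uint8_t b, uint8_t c)
{
    return (uint8_t)((a & b) | (a & c) | (b & c));
}

/* Hypothetical helper: duplication with comparator. It can only flag
 * a disagreement (so the system must stop and retry); it cannot tell
 * which of the two copies is the faulty one. */
bool dwc_mismatch(uint8_t a, uint8_t b)
{
    return a != b;
}
```

The voter masks a fault at the price of three copies of the logic; the comparator needs only two copies but trades that for the "stop & go" behavior noted in the list.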
Embedded Reliability Functions:
SW: Robustness
Acceptance Tests (Specification & Placement)
Capability Check (Checks for System Capabilities at a Given Time)
Recovery Blocks (Primary & Alternate Programs)
Stress Testing (Abnormal Situations)
Performance Testing (Real-Time Applications)
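The recovery block and acceptance test entries can be sketched in C as follows. This is only a minimal illustration under assumptions: the integer square root routines, the deliberately injected fault in the primary, and all function names are hypothetical and not taken from the slides.

```c
#include <stdbool.h>

typedef int  (*variant_fn)(int input);
typedef bool (*acceptance_fn)(int input, int output);

/* Recovery block: run the primary program; if its result fails the
 * acceptance test, fall back to the alternate program. Returns false
 * when neither result is acceptable. */
bool recovery_block(int input, variant_fn primary, variant_fn alternate,
                    acceptance_fn accept, int *result)
{
    int out = primary(input);
    if (accept(input, out)) { *result = out; return true; }
    out = alternate(input);
    if (accept(input, out)) { *result = out; return true; }
    return false;
}

/* Acceptance test (assumed example): r is the integer square root of n. */
bool isqrt_accept(int n, int r)
{
    return r >= 0 && r * r <= n && (r + 1) * (r + 1) > n;
}

/* Primary program: binary search, with a simulated fault for n == 16. */
int isqrt_primary(int n)
{
    if (n == 16) return 5;              /* injected fault, illustration only */
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo + 1) / 2;
        if ((long)mid * mid <= n) lo = mid; else hi = mid - 1;
    }
    return lo;
}

/* Alternate program: slower linear search, independent of the primary. */
int isqrt_alternate(int n)
{
    int r = 0;
    while ((r + 1) * (r + 1) <= n) r++;
    return r;
}
```

For input 16 the primary's result is rejected by the acceptance test and the alternate's result is used instead; for other non-negative inputs the primary's result is accepted directly.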
The goal: partition the description and estimate the reliability of the system (at a high-level description) against transient or permanent faults.
The solution: an adaptation of the Mutation Analysis approach, originally proposed for software testing in 1978 by DeMillo. Its goal was to develop a criterion for the selection of test vectors: the idea was to apply a set of test vectors to the original program and to its mutated versions in order to determine which vectors distinguish the program from its mutated versions.
Mutation analysis injects faults by generating small syntactic changes in the original code and then determining which test vectors are able to detect the mutated versions of the code.
Change of paradigm: use it as a criterion for fault-coverage estimation, i.e., system reliability verification at a high-level description.
Note: it must be shown that the stuck-at fault coverage at the gate level is greater than or equal to (≥) the coverage obtained by means of mutation analysis on a VHDL HW description at the system level.
The measurement of the fault coverage:
If a program P has M mutants, E of which are equivalent, and a test set T kills K mutants, the mutation score is defined to be:
MS(P,T) = K / (M - E)
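Reading the definition as the ratio of killed mutants to non-equivalent mutants, K / (M - E), the score can be computed as in the C sketch below. The function name and the convention of returning a negative value when no non-equivalent mutant exists are assumptions made for the example.

```c
/* Mutation score MS(P,T) = K / (M - E), where
 *   killed     = K, mutants killed by the test set T
 *   mutants    = M, mutants generated for program P
 *   equivalent = E, mutants that behave exactly like P
 * Returns a value in [0, 1], or -1.0 when M - E is not positive
 * (the score is undefined in that case). */
double mutation_score(unsigned killed, unsigned mutants, unsigned equivalent)
{
    if (equivalent >= mutants)
        return -1.0;
    return (double)killed / (double)(mutants - equivalent);
}
```

With hypothetical counts M = 100, E = 10 and K = 81, the score is 81 / 90 = 0.9; equivalent mutants are excluded from the denominator because no test vector can ever kill them.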
M faults injected, E faults equivalent, K faults detected.
Fig. 2. Mutant Data Structure (MDS) for the weak mutation generation procedure.
The MDS for a given program consists of two parts: an array I representing the program input test vectors and, for each element of I, a pointer to an array C containing the name and state of all comparator outputs in the program during program execution.
[Figure: Input Test Vectors Array (i1, i2, ..., ik); k arrays storing the states of the n program checker outputs, each holding the pairs (C1, St1), (C2, St2), ..., (Cn, Stn).]
If any of the comparator outputs is set to "1", then the injected fault (i.e., the mutated statement in the code) is detected; otherwise the fault is classified as redundant, or as undetectable by the additional HW blocks/SW routines (thus lowering system reliability).
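A minimal C sketch of the Mutant Data Structure and of the detection rule above might look as follows. The type names, field names and the fixed name length are assumptions made for illustration; the slides only state that each test vector points to an array of comparator names and states.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NAME 16   /* assumed maximum length of a comparator name */

/* One element of array C: a comparator (checker) output. */
struct comparator_state {
    char name[MAX_NAME];
    int  state;                         /* 1 = error flagged, 0 = quiet */
};

/* One element of array I: a test vector and its array C. */
struct mds_entry {
    unsigned input_vector;              /* applied test vector */
    const struct comparator_state *c;   /* array C for this vector */
    size_t n;                           /* number of comparators in C */
};

/* A mutant counts as detected when, for at least one test vector,
 * at least one comparator output is set to 1. */
bool mutant_detected(const struct mds_entry *mds, size_t k)
{
    for (size_t i = 0; i < k; i++)
        for (size_t j = 0; j < mds[i].n; j++)
            if (mds[i].c[j].state == 1)
                return true;
    return false;
}
```

A mutant for which `mutant_detected` returns false over the whole test set is the case the text calls redundant or undetectable.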
Table 1. Mutation operator set for VHDL/C++ functional descriptions.
Type  Description
AOR   Arithmetic Operator Replacement
ABS   Absolute Value Insertion
CR    Constant Replacement
CVR   Constant for Variable Replacement
LOR   Logical Operator Replacement
ROR   Relational Operator Replacement
ODR   Operation for Delay Replacement
OSR   Operation for Skip Replacement
VCR   Variable for Constant Replacement
VR    Variable Replacement
UOI   Unary Operator Insertion
BOR   Bit Operator Replacement
library IEEE;
use IEEE.STD_LOGIC_1164.all;

entity CRYPT is
  port (
    entrada_info  : in  integer range 0 to 3;
    entrada_chave : in  integer range 0 to 100;
    saida         : out integer range 0 to 100
  );
end CRYPT;

architecture ARCH_NAME of CRYPT is
begin
  process (entrada_info, entrada_chave, saida)
    variable temp1 : integer range 0 to 6;
    variable temp2 : integer range 0 to 18;
    variable temp3 : integer range 0 to 18;
    variable temp4 : integer range 0 to 118;
    constant sum_const : integer := 3;
    constant mul_const : integer := 2;
    constant sub_const : integer := 1;
  begin
    temp1 := entrada_info + sum_const;
    -- Mutant 1 (AOR): temp1 := entrada_info - sum_const;
    -- Mutant 2 (CVR): temp1 := entrada_info + temp2;
    temp2 := temp1 * mul_const;
    -- Mutant 3 (CR):  temp2 := temp1 * sum_const;
    -- Mutant 4 (ODR): delay;
    temp3 := temp2 - sub_const;
    -- Mutant 5 (VCR): temp3 := sum_const - sub_const;
    -- Mutant 6 (VR):  temp3 := temp3 - sub_const;
    temp4 := entrada_chave + temp3;
    -- Mutant 7 (OSR): skip;
    saida <= temp4;
  end process;
end ARCH_NAME;
Fig. 3. Example of fault injection in a VHDL description. (Each mutated statement is listed as a comment under the statement it replaces.)
3. Prototyping Environments
Photo 1. Altera UP1 + Texas TMS320C67 DSP uProcessor.
Photo 2. Altera Excalibur + SOPC
HW
4. Case Study
Fault coverage comparison: stuck-at faults X mutation analysis

Table 2. Stuck-at fault testing summary for the 6 Multiplier Circuit operand widths.
Circuit          Gates  Test vectors generated  Detectable stuck-at faults  Stuck-at faults detected
Multiplier 2x2   19     9                       116                         100 %
Multiplier 4x3   110    15                      622                         100 %
Multiplier 6x6   431    30                      2420                        99.50 %
Multiplier 8x4   353    23                      1996                        99.30 %
Multiplier 8x6   565    25                      3211                        99.78 %
Multiplier 8x8   809    36                      4548                        99.74 %

Table 3. Mutation analysis summary for the 6 Multiplier Circuit operand widths.
Circuit          Gates  Test vectors generated  Generated mutants  Mutants killed
Multiplier 2x2   19     9                       22                 95.45 %
Multiplier 4x3   110    15                      106                94.34 %
Multiplier 6x6   431    30                      432                88.19 %
Multiplier 8x4   353    23                      364                88.88 %
Multiplier 8x6   565    25                      600                88.67 %
Multiplier 8x8   809    36                      832                88.34 %
Table 4. Stuck-at fault testing summary for the 4 ALU operand widths.
Circuit         Gates  Test vectors generated  Detectable stuck-at faults  Stuck-at faults detected
ALU - 4 bits    71     18                      452                         99.55 %
ALU - 8 bits    155    22                      980                         98.98 %
ALU - 12 bits   239    21                      1508                        98.80 %
ALU - 16 bits   323    21                      1908                        98.53 %

Table 5. Mutation analysis summary for the 4 ALU operand widths.
Circuit         Gates  Test vectors generated  Generated mutants  Mutants killed
ALU - 4 bits    71     18                      92                 94.56 %
ALU - 8 bits    155    22                      204                92.15 %
ALU - 12 bits   239    21                      316                92.40 %
ALU - 16 bits   323    21                      428                87.38 %
Table 6. Stuck-at fault testing summary for the 3 Adder Circuit architectures.
Adder Architecture (4 bits)  Gates  Test vectors generated  Detectable stuck-at faults  Stuck-at faults detected
Simple Adder                 49     10                      296                         100 %
Manchester                   76     12                      343                         99.56 %
Carry Lookahead              64     10                      372                         99.73 %

Table 7. Weak mutation analysis summary for the 3 Adder Circuit architectures.
Adder Architecture (4 bits)  Gates  Test vectors generated  Generated mutants  Mutants killed
Simple Adder                 49     10                      46                 86.95 %
Manchester                   76     12                      57                 83.37 %
Carry Lookahead              64     10                      62                 82.26 %
5. Conclusions
A unified fault injection campaign in the HW + SW parts of systems specified in VHDL/C languages may reduce design cycle time and produce trustworthy results that help designers make reliability-related decisions at the very early steps of the design process.
Part 2/3: Reliability-Oriented HW-SW Partitioning (Controlling system functions mapping into HW-SW to optimize reliability)
We could think of partitioning the system into HW + SW parts and using the unified fault injection methodology described previously to verify which is the most reliable configuration.
Summary
1. The Methodology/Example
Add fault tolerance (FT) to the HW part and check it by means of the mutation analysis technique. The final goal is to derive a methodology that helps the designer partition the system into HW and SW parts according to FT criteria.
2. Case Study
3. Conclusions
(a)
void crypt() {
    tocrypt = info ^ xor_const;              /* tocrypt := info XOR constant "xor_const" */
    tocrypt = tocrypt + add_const;           /* tocrypt := tocrypt + constant "add_const" */
    tocrypt = (tocrypt * mult_const)<-8;     /* tocrypt := tocrypt * low-byte of constant "mult_const" */
    tocrypt = tocrypt + key;                 /* tocrypt := tocrypt + variable "key" */
}   /* end of routine crypt */

(b) Statements listed along the execution time axis, each followed by its check:
void crypt() {
    tocrypt = info ^ xor_const;
    if(tocrypt == info ^ xor_const); else error ! 0; stop;
    tocrypt = tocrypt + add_const;
    if(residue(tocrypt + add_const) == residue(tocrypt) + residue(add_const)); else error ! 0; stop;
    tocrypt = (tocrypt * mult_const)<-8;
    if(residue(tocrypt * mult_const) == residue(tocrypt) * residue(mult_const))<-8; else error ! 0; stop;
    tocrypt = tocrypt + key;
    if(residue(tocrypt + key) == residue(tocrypt) + residue(key)); else error ! 0; stop;
}

Fig. 1. Translating the user Handel-C code into a reliable version: (a) original routine crypt; (b) reliable version of this routine.
1. The Methodology
Hereafter we add fault tolerance to the HW part and estimate the obtained result by means of mutation analysis at the VHDL HW description level.
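The residue checks used in the reliable version of crypt can be sketched in plain C as below. This is an illustration under assumptions, not the Handel-C implementation from the slides: the modulus 3, the 32-bit operands and the function names are all chosen for the example, and the right-hand side is reduced again so that both sides of the comparison stay in the range of the modulus.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESIDUE_MOD 3u   /* assumed check modulus; the slides do not name one */

/* Residue of x with respect to the check modulus. */
uint32_t residue(uint32_t x)
{
    return x % RESIDUE_MOD;
}

/* Check a sum produced by the (possibly faulty) datapath:
 * residue(a + b) must equal residue(residue(a) + residue(b)).
 * Assumes a + b did not overflow 32 bits. */
bool add_check(uint32_t a, uint32_t b, uint32_t sum)
{
    return residue(sum) == residue(residue(a) + residue(b));
}

/* Same idea for a product; assumes a * b did not overflow 32 bits. */
bool mul_check(uint32_t a, uint32_t b, uint32_t product)
{
    return residue(product) == residue(residue(a) * residue(b));
}
```

A result that is off by a multiple of the modulus passes these checks, so a residue check trades some coverage for a much cheaper comparison than recomputing the full operation.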
Program "Cryptography": C code, total length 30 lines, consisting mainly of three routines: is_valid, crypt, set_bit.
void crypt() {
    tocrypt = info ^ xor_const;
    /* MUTANT 1: tocrypt = info & xor_const; */
    /* MUTANT 2: tocrypt = info ^ key; */
    tocrypt = tocrypt + add_const;
    /* MUTANT 3: tocrypt = tocrypt + xor_const; */
    /* MUTANT 4: delay; */
    tocrypt = (tocrypt * mult_const)<-8;
    /* MUTANT 5: tocrypt = (tocrypt - mult_const)<-8; */
    /* MUTANT 6: tocrypt = (tocrypt * mult_const)\\8; */
    tocrypt = tocrypt + key;
    /* MUTANT 7: tocrypt = tocrypt + xor_const; */
    /* MUTANT 8: skip; */
}
Fig. 2. Example of fault injection in a Handel-C description. (Each mutated statement is listed as a comment under the statement it replaces.)
2. Case Study
After running this example ...
Table 1. System partitioning possibilities and resulting reliability. (S = software, H = hardware)
System Partitioning            System Reliability   Number of Mutants Generated
is_valid  crypt  set_bit       (%)                  Detected  Not Detected  Total
S         S      S             00.00                0         47            47
S         H      S             85.26                324       56            380
H         S      S             91.58                511       47            558
S         S      H             84.66                320       58            378
H         H      S             92.93                539       41            580
H         S      H             92.90                563       43            606
S         H      H             87.00                348       52            400
H         H      H             93.83                563       37            600
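The reliability figures of Table 1 are consistent with the ratio of detected to total mutants, so the choice of the most reliable partition can be sketched in C as below. The row contents are transcribed from the table, reading the three partitioning columns as is_valid, crypt, set_bit (an assumption about the flattened layout); the type and function names are hypothetical.

```c
#include <stddef.h>

struct partition_result {
    const char *mapping;    /* is_valid, crypt, set_bit: 'S' or 'H' */
    unsigned detected;      /* mutants detected */
    unsigned total;         /* mutants generated */
};

/* Rows transcribed from Table 1 (column order is an assumption). */
const struct partition_result table1[] = {
    {"SSS",   0,  47}, {"SHS", 324, 380}, {"HSS", 511, 558}, {"SSH", 320, 378},
    {"HHS", 539, 580}, {"HSH", 563, 606}, {"SHH", 348, 400}, {"HHH", 563, 600},
};
const size_t table1_rows = sizeof table1 / sizeof table1[0];

/* System reliability in percent: detected mutants over generated mutants. */
double reliability_percent(const struct partition_result *r)
{
    return r->total ? 100.0 * (double)r->detected / (double)r->total : 0.0;
}

/* Index of the row with the highest reliability (first one on ties). */
size_t most_reliable(const struct partition_result *rows, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (reliability_percent(&rows[i]) > reliability_percent(&rows[best]))
            best = i;
    return best;
}
```

On these rows the all-hardware mapping comes out on top, with the mapping that keeps only set_bit in software close behind; in practice cost or timing constraints would also enter the choice.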
3. Conclusions
For critical applications, partitioning the system into HW + SW parts at early design steps according to reliability constraints may be of interest (reduction of design cycle time).
Use the unified fault injection methodology in the HW + SW parts to help estimate the most reliable configuration for the system.
Part 3/3: Testability-Oriented HW-SW Partitioning (Controlling system functions mapping into HW-SW to optimize testability)
We could think of partitioning the system into HW + SW parts and using the unified fault injection methodology described previously to verify which is the most testable configuration.
1. Methodology
Yves Le Traon, Ghassan Al Hayek, Chantal Robach [ITC'96]: Testability-Oriented Hardware-Software Partitioning
Test-based HW/SW partitioning approach for a co-design specification.
Depending on the HW or SW implementation choice for each unit-level component, the test cost for the system is evaluated.
The unit test costs are estimated by means of mutation-based analysis with respect to the implementation choices.
The number of test vectors used for testing the SW implementation (Nsoft) and the number of test vectors used for testing the HW implementation (Nhard) are computed for each unit-level component (process) and fed to an algorithm that evaluates the testing effort for the global system.
To perform testing cost evaluation, the algorithm is based on a flowgraph which represents the control flow structure of the system.
Each node represents a process (unit-level component).
The Testing Cost (TC) of the whole specification is the sum of all costs necessary to test each path of the specification graph.
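The path-based cost evaluation can be sketched in C as follows. Note the assumptions, which go beyond what the slides state: the cost of a path is taken to be the sum of the unit costs (Nsoft or Nhard, depending on the implementation choice) of the processes on that path, and the paths are supplied explicitly rather than derived from the flowgraph. All names are hypothetical.

```c
#include <stddef.h>

enum impl { SW, HW };

/* Unit-level component (process) with its two unit test costs. */
struct process_node {
    unsigned n_soft;   /* Nsoft: test vectors for the SW implementation */
    unsigned n_hard;   /* Nhard: test vectors for the HW implementation */
};

/* A path through the flowgraph, given as a list of process indices. */
struct path {
    const size_t *nodes;
    size_t length;
};

/* Unit cost of one process under the chosen implementation. */
unsigned node_cost(const struct process_node *p, enum impl choice)
{
    return choice == HW ? p->n_hard : p->n_soft;
}

/* Testing Cost (TC): sum over all paths of the cost of testing each
 * path. Assumption: a path costs the sum of its processes' unit costs. */
unsigned testing_cost(const struct process_node *procs, const enum impl *choice,
                      const struct path *paths, size_t n_paths)
{
    unsigned tc = 0;
    for (size_t i = 0; i < n_paths; i++)
        for (size_t j = 0; j < paths[i].length; j++) {
            size_t id = paths[i].nodes[j];
            tc += node_cost(&procs[id], choice[id]);
        }
    return tc;
}
```

Evaluating `testing_cost` once per candidate HW/SW assignment gives one cost per partitioning choice, which is the kind of comparison the case study table reports.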
Fig 1. Prime flowgraphs: If, If then/else, Interruptions, Case Struc.
2. Case Study
Application: robot to collect precious objects in deep waters
Equipment:
- Frontal sensor, to detect obstacles and objects
- Boxes to place fragile objects
- Boxes for non-fragile objects
- Hand to pick up objects and an electric battery to provide energy
Routines:
- Turn-right, turn-left, turn-back, advance, object analysis
After running this example ...
Table 1. Robot testing costs by implementation choices.
HW/SW Implementation                                            System Testing Cost
Turn-right  Turn-left  Turn-back  Advance  Object-Analysis
S           X          S          S        S                    84
S           X          H          S        S                    85
H           X          S          S        S                    87
H           X          H          S        S                    88
S           X          X          H        S                    163
H           X          X          H        S                    164
S           X          S          X        H                    498
S           X          H          X        H                    499
H           X          S          X        H                    501
H           X          H          X        H                    502
3. Conclusions
For critical applications, partitioning the system into HW + SW parts at early design steps according to test effort constraints may be of interest (reduction of design cycle time).
Use the unified fault injection methodology in the HW + SW parts to help estimate the lowest test effort for the system.