Part 3: Unified HW+SW Reliability Insertion & Estimation


Upload: aron-phelps

Post on 26-Dec-2015


Part 1/3: Reliability Insertion & Estimation for HW-SW Systems (High-Level X Gate-Level Fault Injection)

Part 2/3: Reliability-Oriented HW-SW Partitioning (controlling the mapping of system functions into HW and SW to optimize reliability)

Part 3/3: Testability-Oriented HW-SW Partitioning (controlling the mapping of system functions into HW and SW to optimize testability)

Design Methodology Details

[Figure: flowchart of the design methodology, divided into four steps labeled Step I to Step IV. Recoverable box labels, grouped by step name:]

- Initial System Specification and Partitioning Step: Initial Description (SystemC); Partitioning and Communication Channels Generation (focusing on real-time applications); HW Part (VHDL); SW Part (C/C++); Communication Channels.
- System Reliability Insertion Step: HW/SW-Level Fault-Tolerance Generator; Fault-Tolerant HW Part (VHDL); Fault-Tolerant SW Part (C/C++); Fault-Tolerant Communication Channels.
- Real-Time Response and System Reliability Estimation Steps (Pre-Implementation): Simulation of Critical System Functions; System-Level Behavior Reports; decision "Desired Real-Time Response?" (No: repartition the system differently; if necessary, divide the HW-SW parts into two or more blocks, moving from a centralized to a distributed system, to meet real-time constraints). Mutation Constraints; Fault Injection and Simulation (Mutation Analysis); System-Level Fault Coverage & Reliability Reports; decision "Desired reliability level?" (No: modify the implemented fault-tolerant functions or add new ones to the design; Yes: C/C++ Code Compilation and HW Synthesis).
- Laboratory Verification Step (Post-Implementation Reliability): Electromagnetic Compatibility (EMC) Tests; System-Level Fault Coverage & Reliability Reports; decision "Desired reliability level?"; End Design Process.

Part 1/3: Reliability Insertion & Estimation for HW-SW Systems (High-Level X Gate-Level Fault Injection)

Summary

1. Introduction: overview on previous works

2. The Methodology

2.1. Initial System Reliability Estimation

2.2. System Reliability Insertion & Estimation

3. Prototyping Environments

4. Case Studies

5. Conclusions


1. Introduction: overview on previous works

Previous works suggest how to design systems with respect to monetary cost, performance, communication rates, power consumption, silicon area, testability, and memory size.

None of these strategies suggests how to partition the design and estimate the reliability of a system on a common HW/SW basis.

For embedded systems that are safety-critical and particularly complex to design, integrating reliability constraints during HW-SW partitioning may pay off well.


To address this gap, design teams have developed approaches for adding reliability during the design phase of embedded safety-critical computing systems:

- First, partition the system and estimate its reliability as it is.
- Next, if necessary, insert reliability functions into the HW and SW parts of the system.
- Then, estimate the reliability of the modified system.


2. The Methodology

Initial Partitioning and Reliability Estimation

Fig. 1. Reliability estimation of the system as it is.

[Figure: flowchart. Box labels: System Description in C; Partitioning; SW Part (C description); HW Part (Handel-C description); High-Reliability Functions Library; Generation of the High-Reliability HW; High-Reliability HW Part; Generation of the Com. Channels with Error Detection and/or Correction Code; Mutation Constraints; Weak Mutants Generator; System Reliability Verification; System-Level Fault Coverage & Reliability Reports; decision "Desired Reliability Level?" (NO: try to partition the system differently; YES: C Code Compilation and HW Synthesis).]

Added-Reliability Estimation and Repartitioning

Fig. 2. Reliability insertion into the HW - SW parts / Modified system reliability estimation.

[Figure: flowchart. Box labels: System Description in C; Partitioning; SW Part (C description); HW Part (Handel-C description); High-Reliability Functions Library; Generation of the High-Reliability HW; High-Reliability HW Part; Generation of the Com. Channels with Error Detection and/or Correction Code; Mutation Constraints; Weak Mutants Generator; System Reliability Verification; System-Level Fault Coverage & Reliability Reports; decision "Desired Reliability Level?" (NO: try to partition the system differently; YES: C Code Compilation and HW Synthesis).]

Embedded Reliability Functions, HW:

- Built-In Self Test (BIST)
- Triplication with Voter (Fault Masking)
- Duplication with Comparator (Performance Degradation: stop & go!)
- Error Detection & Correction Codes (Minimized, Combined Performance/Area Degradation)
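Two of the HW techniques above, triplication with a voter and duplication with a comparator, can be illustrated with a small behavioral sketch. This is only a software model, not the generator used in the methodology; the functions `good` and `faulty` and the stuck-at-1 fault on bit 0 are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def tmr_vote(outputs: Sequence[int]) -> int:
    """Majority vote over three replica outputs (fault masking)."""
    if len(outputs) != 3:
        raise ValueError("triplication expects exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one replica disagrees")
    return value


def duplicate_and_compare(unit: Callable[[int], int],
                          shadow: Callable[[int], int],
                          x: int) -> Tuple[int, bool]:
    """Run two copies and return (result, error_flag).

    The flag only detects a mismatch; it does not mask it, so the
    system has to stop and retry ("stop & go").
    """
    a, b = unit(x), shadow(x)
    return a, a != b


def good(x: int) -> int:
    return (x * 2) & 0xFF


def faulty(x: int) -> int:
    # Hypothetical fault: output bit 0 stuck at 1.
    return ((x * 2) & 0xFF) | 0x01


masked = tmr_vote([good(5), faulty(5), good(5)])      # one bad replica is outvoted
result, error = duplicate_and_compare(good, faulty, 5)  # mismatch raises the flag
```

The voter hides a single faulty replica at the cost of three copies, whereas the comparator needs only two copies but can merely signal the error.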


Embedded Reliability Functions, SW: Robustness

- Acceptance Tests (Specification & Placement)
- Capability Check (Checks for System Capabilities at a Given Time)
- Recovery Blocks (Primary & Alternate Programs)
- Stress Testing (Abnormal Situations)
- Performance Testing (Real-Time Applications)
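A recovery block combines two of the SW items above: a primary and an alternate program, guarded by an acceptance test. The sketch below is a generic illustration under assumed names (`primary_sqrt`, `alternate_sqrt`, `sqrt_acceptance` are hypothetical routines, and the primary is deliberately wrong for most inputs).

```python
from typing import Callable, Iterable


def recovery_block(x: float,
                   alternates: Iterable[Callable[[float], float]],
                   acceptance_test: Callable[[float, float], bool]) -> float:
    """Try each routine in order and return the first accepted result."""
    for routine in alternates:
        try:
            result = routine(x)
        except Exception:
            continue  # a crashing routine counts as a failed alternate
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def primary_sqrt(x: float) -> float:
    return x / 2.0  # hypothetical buggy primary: correct only for x == 4 or 0


def alternate_sqrt(x: float) -> float:
    return x ** 0.5


def sqrt_acceptance(x: float, r: float) -> bool:
    # Acceptance test: the square of the result must reproduce the input.
    return abs(r * r - x) <= 1e-9 * max(1.0, x)


value = recovery_block(9.0, [primary_sqrt, alternate_sqrt], sqrt_acceptance)
```

Where the acceptance test is placed and how strict it is decide which faults are caught, which is why the slide lists "Specification & Placement" next to it.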


The goal: partition the description and estimate the reliability of the system (at a high-level description) against transient or permanent faults.

The solution: an adaptation of the Mutation Analysis approach, originally proposed for software testing in 1978 by DeMillo. Its original goal was to develop a criterion for selecting test vectors: the idea was to apply a set of test vectors to the original program and to its mutated versions in order to determine which vectors distinguish the program from its mutated versions.


The original approach injected faults by generating small syntactic changes in the original code and determined which test vectors were able to detect the mutated versions of the code.

Change of paradigm: here mutation analysis is used as a criterion for fault-coverage estimation, i.e., system reliability verification at a high-level description.

Note: it must be shown that the stuck-at fault coverage at the gate level is greater than or equal to the coverage obtained by means of mutation analysis on a VHDL HW description at the system level.
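For reference, the gate-level side of this comparison can be sketched as a single stuck-at fault simulation. The example below is an assumption-laden illustration: the circuit is a hypothetical one-bit full adder, faults are placed on whole nets (fanout branches are not modeled separately), and a fault counts as detected when any test vector changes a primary output.

```python
from itertools import product

# (output net, operation, input nets), listed in topological order
GATES = [
    ("x1", "xor", ("a", "b")),
    ("s", "xor", ("x1", "cin")),
    ("a1", "and", ("a", "b")),
    ("a2", "and", ("x1", "cin")),
    ("cout", "or", ("a1", "a2")),
]
INPUTS = ("a", "b", "cin")
OUTPUTS = ("s", "cout")
OPS = {
    "xor": lambda p, q: p ^ q,
    "and": lambda p, q: p & q,
    "or": lambda p, q: p | q,
}


def simulate(vector, fault=None):
    """Evaluate the netlist; fault is (net, stuck_value) or None."""
    nets = {}

    def drive(net, value):
        # A stuck net ignores the value its driver tries to put on it.
        nets[net] = fault[1] if fault and fault[0] == net else value

    for name, value in zip(INPUTS, vector):
        drive(name, value)
    for out, op, (p, q) in GATES:
        drive(out, OPS[op](nets[p], nets[q]))
    return tuple(nets[o] for o in OUTPUTS)


def stuck_at_coverage(test_vectors):
    """Return (detected faults, total faults) for stuck-at-0/1 on every net."""
    all_nets = list(INPUTS) + [gate[0] for gate in GATES]
    faults = [(net, value) for net in all_nets for value in (0, 1)]
    detected = [f for f in faults
                if any(simulate(t, f) != simulate(t) for t in test_vectors)]
    return len(detected), len(faults)


exhaustive = list(product((0, 1), repeat=3))
full = stuck_at_coverage(exhaustive)
partial = stuck_at_coverage([(0, 0, 0), (1, 1, 1)])
```

Running the same test vectors against the mutants of the high-level description gives the second number in the comparison.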


The measurement of the fault coverage:

If a program P has M mutants, E of which are equivalent, and a test set T kills K mutants, the mutation score is defined as:

$$ MS(P, T) = \frac{K}{M - E} $$

where M is the number of faults injected, E the number of equivalent faults, and K the number of faults detected.
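As a quick arithmetic check of this definition, the following sketch computes the score for hypothetical counts (the values of K, M and E below are illustrative, not taken from a specific experiment).

```python
def mutation_score(killed: int, mutants: int, equivalent: int = 0) -> float:
    """MS(P, T) = K / (M - E)."""
    if not 0 <= equivalent < mutants:
        raise ValueError("need 0 <= E < M")
    if not 0 <= killed <= mutants - equivalent:
        raise ValueError("need 0 <= K <= M - E")
    return killed / (mutants - equivalent)


# Hypothetical campaign: 22 mutants injected, none equivalent, 21 killed.
score = mutation_score(killed=21, mutants=22)

# Equivalent mutants are removed from the denominator, so they cannot
# lower the score: 18 killed out of 20 with 2 equivalent is a full score.
full_score = mutation_score(killed=18, mutants=20, equivalent=2)
```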


Fig. 2. Mutant Data Structure (MDS) for the weak mutation generation procedure.

The MDS for a given program consists of two parts: an array I representing the program input test vectors, in which each element points to an array C containing the name and state of all comparator outputs in the program during program execution.

[Figure: the Input Test Vectors Array (i1, i2, ..., ik), each entry pointing to one of k arrays that store the states (St1, St2, ..., Stn) of the program checker outputs (C1, C2, ..., Cn).]
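One possible in-memory layout for the MDS is sketched below. The class and method names are hypothetical; only the two-part shape (array I of test vectors, each entry pointing to an array C of comparator names and states) comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MutantDataStructure:
    """Array I of input test vectors; entry k points to its array C."""
    inputs: List[Tuple[int, ...]] = field(default_factory=list)
    checker_states: List[List[Tuple[str, int]]] = field(default_factory=list)

    def record(self, vector, states: Dict[str, int]) -> None:
        """Store one test vector together with the comparator states it produced."""
        self.inputs.append(tuple(vector))
        self.checker_states.append(sorted(states.items()))

    def states_for(self, k: int) -> List[Tuple[str, int]]:
        """Return the (name, state) pairs recorded for test vector k."""
        return self.checker_states[k]


mds = MutantDataStructure()
mds.record((0, 1), {"C1": 0, "C2": 0})
mds.record((1, 1), {"C1": 0, "C2": 1})
```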


If any of the comparator outputs is set to "1", the injected fault (i.e., the mutated statement in the code) is detected. Otherwise, the fault is classified as redundant, or as undetectable by the additional HW blocks/SW routines (thus lowering system reliability).
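The detection rule can be sketched as follows. The checked routine and its single comparator `C1` are hypothetical: the comparator recomputes the expected value independently and outputs 1 on a mismatch, which stands in for the added HW blocks/SW routines.

```python
def checked_add3(x, main_op):
    """Compute x + 3 with main_op and compare against a redundant recomputation."""
    result = main_op(x)
    comparator = int(result != x + 3)  # 1 signals a mismatch
    return result, {"C1": comparator}


def classify(main_op, test_vectors):
    """Apply the rule: any comparator output at 1 means the fault is detected."""
    for x in test_vectors:
        _, comparators = checked_add3(x, main_op)
        if any(state == 1 for state in comparators.values()):
            return "detected"
    return "not detected"


def mutant_aor(x):
    return x - 3        # arithmetic operator replaced: always differs


def mutant_equivalent(x):
    return 3 + x        # behaves like the original: can never be flagged


def mutant_abs(x):
    return abs(x) + 3   # differs only for negative inputs


vectors = [0, 1, 2]
verdicts = {
    "AOR": classify(mutant_aor, vectors),
    "equivalent": classify(mutant_equivalent, vectors),
    "ABS": classify(mutant_abs, vectors),
}
```

The ABS mutant shows why a "not detected" verdict is ambiguous: it escapes the non-negative test vectors but is caught as soon as a negative input is added.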


Table 1. Mutation operators set for VHDL/C++ functional descriptions.

Type   Description
AOR    Arithmetic Operator Replacement
ABS    Absolute Value Insertion
CR     Constant Replacement
CVR    Constant for Variable Replacement
LOR    Logical Operator Replacement
ROR    Relational Operator Replacement
ODR    Operation for Delay Replacement
OSR    Operation for Skip Replacement
VCR    Variable for Constant Replacement
VR     Variable Replacement
UOI    Unary Operator Insertion
BOR    Bit Operator Replacement
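To show how one operator from Table 1 can be automated, here is a minimal sketch of AOR (Arithmetic Operator Replacement) applied to Python source through the standard `ast` module. It is not the weak mutants generator of the methodology, which targets VHDL/C++; the helper names are hypothetical and `ast.unparse` assumes Python 3.9 or later.

```python
import ast


class ReplaceOperator(ast.NodeTransformer):
    """Replace the n-th binary operator of class `old` with `new`."""

    def __init__(self, old, new, target_index):
        self.old, self.new, self.target_index = old, new, target_index
        self.seen = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)  # visit operands first (post-order)
        if isinstance(node.op, self.old):
            if self.seen == self.target_index:
                node.op = self.new()
            self.seen += 1
        return node


def aor_mutants(source, old=ast.Add, new=ast.Sub):
    """Return one mutated source string per occurrence of `old`."""
    count = sum(isinstance(n, ast.BinOp) and isinstance(n.op, old)
                for n in ast.walk(ast.parse(source)))
    mutants = []
    for index in range(count):
        tree = ReplaceOperator(old, new, index).visit(ast.parse(source))
        mutants.append(ast.unparse(ast.fix_missing_locations(tree)))
    return mutants


SOURCE = "def f(a, b, c):\n    return a + b * c + 1\n"
mutants = aor_mutants(SOURCE)
```

Each mutant contains exactly one syntactic change, matching the "small syntactic changes" idea of mutation analysis; other operators such as ROR would follow the same pattern on `ast.Compare` nodes.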


    library IEEE;
    use IEEE.STD_LOGIC_1164.all;

    -- Port names are Portuguese: entrada_info = info input,
    -- entrada_chave = key input, saida = output.
    entity CRYPT is
      port (
        entrada_info  : in  integer range 0 to 3;
        entrada_chave : in  integer range 0 to 100;
        saida         : out integer range 0 to 100
      );
    end CRYPT;

    architecture ARCH_NAME of CRYPT is
    begin
      process (entrada_info, entrada_chave, saida)
        variable temp1 : integer range 0 to 6;
        variable temp2 : integer range 0 to 18;
        variable temp3 : integer range 0 to 18;
        variable temp4 : integer range 0 to 118;
        constant sum_const : integer := 3;
        constant mul_const : integer := 2;
        constant sub_const : integer := 1;
      begin
        temp1 := entrada_info + sum_const;
        -- temp1 := entrada_info - sum_const;   -- Mutant 1: AOR
        -- temp1 := entrada_info + temp2;       -- Mutant 2: CVR

        temp2 := temp1 * mul_const;
        -- temp2 := temp1 * sum_const;          -- Mutant 3: CR
        -- delay;                               -- Mutant 4: ODR

        temp3 := temp2 - sub_const;
        -- temp3 := sum_const - sub_const;      -- Mutant 5: VCR
        -- temp3 := temp3 - sub_const;          -- Mutant 6: VR

        temp4 := entrada_chave + temp3;
        -- skip;                                -- Mutant 7: OSR

        saida <= temp4;
      end process;
    end ARCH_NAME;

Fig. 3. Example of fault injection in a VHDL description. (Mutated statements are shown as commented-out lines labeled "Mutant n"; each one replaces the statement above it.)
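The seven mutants of Fig. 3 can be replayed in a behavioral Python port of the process. This is a sketch under stated assumptions: all process variables start at 0, the process body runs once per test vector, VHDL range checks are ignored, and a mutant is counted as killed when the output differs from the original (an output comparison, not the comparator-based weak mutation of the methodology).

```python
SUM_CONST, MUL_CONST, SUB_CONST = 3, 2, 1


def crypt(info, key, mutant=0):
    """Behavioral port of process CRYPT; mutant=0 is the original code."""
    temp1 = temp2 = temp3 = temp4 = 0   # assumed initial variable values

    if mutant == 1:
        temp1 = info - SUM_CONST        # Mutant 1: AOR
    elif mutant == 2:
        temp1 = info + temp2            # Mutant 2: CVR
    else:
        temp1 = info + SUM_CONST

    if mutant == 3:
        temp2 = temp1 * SUM_CONST       # Mutant 3: CR
    elif mutant == 4:
        pass                            # Mutant 4: ODR (statement replaced by a delay)
    else:
        temp2 = temp1 * MUL_CONST

    if mutant == 5:
        temp3 = SUM_CONST - SUB_CONST   # Mutant 5: VCR
    elif mutant == 6:
        temp3 = temp3 - SUB_CONST       # Mutant 6: VR
    else:
        temp3 = temp2 - SUB_CONST

    if mutant != 7:                     # Mutant 7: OSR (statement skipped)
        temp4 = key + temp3
    return temp4


def killed_mutants(test_vectors):
    """Return the mutant numbers whose output differs for some test vector."""
    return [m for m in range(1, 8)
            if any(crypt(i, k, m) != crypt(i, k) for i, k in test_vectors)]


killed = killed_mutants([(0, 0)])
```

Under these assumptions a single vector already distinguishes every mutant, because each mutated statement feeds directly into the output.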


3. Prototyping Environments

Photo 1. Altera UP1 + Texas TMS320C67 DSP uProcessor.

Photo 2. Altera Excalibur + SOPC HW.

4. Case Study

Fault coverage comparison: stuck-at faults X mutation analysis

Table 2. Stuck-at fault testing summary for the 6 multiplier circuit operand widths.

Circuit          Gates   Test vectors generated   Detectable stuck-at faults   Stuck-at faults detected
Multiplier 2x2      19                        9                          116   100 %
Multiplier 4x3     110                       15                          622   100 %
Multiplier 6x6     431                       30                         2420   99.50 %
Multiplier 8x4     353                       23                         1996   99.30 %
Multiplier 8x6     565                       25                         3211   99.78 %
Multiplier 8x8     809                       36                         4548   99.74 %

Table 3. Mutation analysis summary for the 6 multiplier circuit operand widths.

Circuit          Gates   Test vectors generated   Generated mutants   Mutants killed
Multiplier 2x2      19                        9                  22   95.45 %
Multiplier 4x3     110                       15                 106   94.34 %
Multiplier 6x6     431                       30                 432   88.19 %
Multiplier 8x4     353                       23                 364   88.88 %
Multiplier 8x6     565                       25                 600   88.67 %
Multiplier 8x8     809                       36                 832   88.34 %

Table 4. Stuck-at fault testing summary for the 4 ALU operand widths.

Circuit         Gates   Test vectors generated   Detectable stuck-at faults   Stuck-at faults detected
ALU - 4 bits       71                       18                          452   99.55 %
ALU - 8 bits      155                       22                          980   98.98 %
ALU - 12 bits     239                       21                         1508   98.80 %
ALU - 16 bits     323                       21                         1908   98.53 %

Table 5. Mutation analysis summary for the 4 ALU operand widths.

Circuit         Gates   Test vectors generated   Generated mutants   Mutants killed
ALU - 4 bits       71                       18                  92   94.56 %
ALU - 8 bits      155                       22                 204   92.15 %
ALU - 12 bits     239                       21                 316   92.40 %
ALU - 16 bits     323                       21                 428   87.38 %

Table 6. Stuck-at fault testing summary for the 3 adder circuit architectures.

Adder architecture (4 bits)   Gates   Test vectors generated   Detectable stuck-at faults   Stuck-at faults detected
Simple Adder                     49                       10                          296   100 %
Manchester                       76                       12                          343   99.56 %
Carry Lookahead                  64                       10                          372   99.73 %

Table 7. Weak mutation analysis summary for the 3 adder circuit architectures.

Adder architecture (4 bits)   Gates   Test vectors generated   Generated mutants   Mutants killed
Simple Adder                     49                       10                  46   86.95 %
Manchester                       76                       12                  57   83.37 %
Carry Lookahead                  64                       10                  62   82.26 %
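The percentages in Tables 2 to 7 can be checked against the requirement that gate-level stuck-at coverage is at least as high as the mutation score. The short script below only re-reads the figures from the tables; the dictionary layout and variable names are illustrative.

```python
# circuit: (stuck-at faults detected in %, mutants killed in %), from Tables 2 to 7
RESULTS = {
    "Multiplier 2x2": (100.00, 95.45),
    "Multiplier 4x3": (100.00, 94.34),
    "Multiplier 6x6": (99.50, 88.19),
    "Multiplier 8x4": (99.30, 88.88),
    "Multiplier 8x6": (99.78, 88.67),
    "Multiplier 8x8": (99.74, 88.34),
    "ALU 4 bits": (99.55, 94.56),
    "ALU 8 bits": (98.98, 92.15),
    "ALU 12 bits": (98.80, 92.40),
    "ALU 16 bits": (98.53, 87.38),
    "Simple Adder": (100.00, 86.95),
    "Manchester": (99.56, 83.37),
    "Carry Lookahead": (99.73, 82.26),
}

# Gap in percentage points between the two coverage figures.
gaps = {name: round(stuck_at - mutation, 2)
        for name, (stuck_at, mutation) in RESULTS.items()}

bound_holds = all(gap >= 0 for gap in gaps.values())
closest = min(gaps, key=gaps.get)   # circuit where both figures are nearest
widest = max(gaps, key=gaps.get)    # circuit where mutation analysis is most pessimistic
```

For every circuit in the case study the mutation score stays below the stuck-at coverage, so in these experiments it acts as a pessimistic estimate of gate-level fault coverage.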


5. Conclusions

A unified fault injection campaign in the HW + SW parts of systems specified in VHDL/C may reduce design cycle time and produce trustworthy results that help designers make reliability-related decisions at the very early steps of the design process.

Part 2/3: Reliability-Oriented HW-SW Partitioning (controlling the mapping of system functions into HW and SW to optimize reliability)

We could think of partitioning the system into HW + SW parts and using the unified fault injection methodology described previously to verify which is the most reliable configuration.

Summary

1. The Methodology/Example

Add fault tolerance (FT) to the HW part and check it by means of the mutation analysis technique. The final goal is to derive a methodology that helps the designer partition the system into HW and SW parts according to FT criteria.

2. Case Study

3. Conclusions

1. The Methodology

Hereafter we add fault tolerance (FT) to the HW part and estimate the obtained result by means of mutation analysis at the VHDL HW description level.

(a) Original routine crypt:

    void crypt() {
      tocrypt = info ^ xor_const;             /* tocrypt ← info XOR constant "xor_const" */
      tocrypt = tocrypt + add_const;          /* tocrypt ← tocrypt + constant "add_const" */
      tocrypt = (tocrypt * mult_const) <- 8;  /* tocrypt ← tocrypt * low-byte of constant "mult_const" */
      tocrypt = tocrypt + key;                /* tocrypt ← tocrypt + variable "key" */
    } /* end of routine crypt */

(b) Reliable version, drawn along the execution-time axis, with each statement paired with its check:

    void crypt()
      tocrypt = info ^ xor_const;
      if (tocrypt == (info ^ xor_const)); else error ! 0; stop;

      tocrypt = tocrypt + add_const;
      if (residue(tocrypt + add_const) == residue(tocrypt) + residue(add_const)); else error ! 0; stop;

      tocrypt = (tocrypt * mult_const) <- 8;
      if (residue(tocrypt * mult_const) == residue(tocrypt) * residue(mult_const)) <- 8; else error ! 0; stop;

      tocrypt = tocrypt + key;
      if (residue(tocrypt + key) == residue(tocrypt) + residue(key)); else error ! 0; stop;

Fig. 1. Translating the user Handel-C code into a reliable version: (a) original routine crypt; (b) reliable version of this routine.
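The residue checks in the reliable version rely on the fact that residues are preserved by addition and multiplication. A minimal sketch follows, assuming a check base of 3 (the slides do not state the modulus) and reducing the right-hand side again, since the sum of two residues can exceed the base. The faulty adders are hypothetical.

```python
MODULUS = 3  # assumed check base; not given in the slides


def residue(x: int) -> int:
    return x % MODULUS


def checked_add(a, b, adder=lambda p, q: p + q):
    """Return (sum, ok); ok is False when the residue check fails."""
    total = adder(a, b)
    ok = residue(total) == residue(residue(a) + residue(b))
    return total, ok


def checked_mul(a, b, multiplier=lambda p, q: p * q):
    """Return (product, ok) using the multiplicative residue property."""
    product = multiplier(a, b)
    ok = residue(product) == residue(residue(a) * residue(b))
    return product, ok


def bit_flip_adder(p, q):
    return (p + q) ^ 0x04   # hypothetical fault: bit 2 of the sum flipped


def offset_adder(p, q):
    return p + q + MODULUS  # hypothetical fault: error is a multiple of the base


fault_free = checked_add(10, 7)
caught = checked_add(10, 7, bit_flip_adder)
escaped = checked_add(10, 7, offset_adder)
```

The last case shows the limit of the technique: an error that is a multiple of the check base leaves the residue unchanged and goes undetected, which is one reason detection coverage stays below 100 %.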

Program "Cryptography": C code, total length 30 lines, consisting mainly of three routines: is_valid, crypt, set_bit.

    void crypt() {
      tocrypt = info ^ xor_const;
          tocrypt = info & xor_const;             /* MUTANT 1 */
          tocrypt = info ^ key;                   /* MUTANT 2 */

      tocrypt = tocrypt + add_const;
          tocrypt = tocrypt + xor_const;          /* MUTANT 3 */
          delay;                                  /* MUTANT 4 */

      tocrypt = (tocrypt * mult_const) <- 8;
          tocrypt = (tocrypt - mult_const) <- 8;  /* MUTANT 5 */
          tocrypt = (tocrypt * mult_const) \\ 8;  /* MUTANT 6 */

      tocrypt = tocrypt + key;
          tocrypt = tocrypt + xor_const;          /* MUTANT 7 */
          skip;                                   /* MUTANT 8 */
    }

Fig. 2. Example of fault injection in a Handel-C description. (Indented lines labeled MUTANT n are mutated alternatives to the statement above them.)

2. Case Study

After running this example:

System Partitioning            Number of Mutants Generated        System Reliability (%)
is_valid   crypt   set_bit     Detected   Not Detected   Total
S          S       S                  0             47      47    00.00
S          H       S                324             56     380    85.26
H          S       S                511             47     558    91.58
S          S       H                320             58     378    84.66
H          H       S                539             41     580    92.93
H          S       H                563             43     606    92.90
S          H       H                348             52     400    87.00
H          H       H                563             37     600    93.83

Table 1. System partitioning possibilities and resulting reliability (S = software, H = hardware).
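The reliability column of Table 1 is the ratio of detected to total mutants, so the most reliable partitioning can be selected mechanically. The sketch below assumes the row layout reconstructed from the slide (which routine is S or H in each row is an interpretation of the flattened table); the tuple format and names are illustrative.

```python
# (is_valid, crypt, set_bit, detected mutants, total mutants)
TABLE = [
    ("S", "S", "S", 0, 47),
    ("S", "H", "S", 324, 380),
    ("H", "S", "S", 511, 558),
    ("S", "S", "H", 320, 378),
    ("H", "H", "S", 539, 580),
    ("H", "S", "H", 563, 606),
    ("S", "H", "H", 348, 400),
    ("H", "H", "H", 563, 600),
]


def reliability(detected: int, total: int) -> float:
    """Percentage of injected mutants that were detected."""
    return 100.0 * detected / total


# Rank the partitionings from most to least reliable.
ranked = sorted(TABLE, key=lambda row: reliability(row[3], row[4]), reverse=True)
best = ranked[0]
percentages = [round(reliability(d, t), 2) for _, _, _, d, t in TABLE]
```

In this reading the all-hardware mapping comes first, and placing only is_valid in hardware already recovers most of the gain.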

3. Conclusions

For critical applications, partitioning the system into HW + SW parts at early design steps according to reliability constraints may be of interest (reduction of design cycle time).

Using the unified fault injection methodology in the HW + SW parts helps estimate the most reliable configuration for the system.

Part 3/3: Testability-Oriented HW-SW Partitioning (controlling the mapping of system functions into HW and SW to optimize testability)

We could think of partitioning the system into HW + SW parts and using the unified fault injection methodology described previously to verify which is the most testable configuration.

1. Methodology

Yves Le Traon, Ghassan Al Hayek, Chantal Robach [ITC'96]: Testability-Oriented Hardware-Software Partitioning

- Test-based HW/SW partitioning approach for a co-design specification.
- Depending on the HW or SW implementation choice for each unit-level component, the test cost for the system is evaluated.
- The unit test costs are estimated by means of mutation-based analysis with respect to the implementation choices.

The number of test vectors used for testing the SW implementation (Nsoft) and the number of test vectors for testing the HW implementation (Nhard) are computed for each unit-level component (process) and used by an algorithm that evaluates the testing effort for the global system.

To evaluate the testing cost, the algorithm relies on a flowgraph that represents the control-flow structure of the system.

Each node represents a process (unit-level component).

The Testing Cost (TC) of the whole specification is the sum of all costs necessary to test each path of the specification graph.

Fig. 1. Prime flowgraphs: If, If-then-else, Interruptions, Case structure.

[Figure: four flowgraph diagrams; node and edge labels are not recoverable from the transcript.]

2. Case Study

Application: robot to collect precious objects in deep waters.

Equipment:
- Frontal sensor, to detect obstacles and objects
- Boxes to place fragile objects
- Boxes for non-fragile objects
- Hand to pick up objects and an electric battery to provide energy

Routines:
- Turn-right, turn-left, turn-back, advance, object analysis

After running this example:

HW/SW Implementation                                              System Testing Cost
Turn-right   Turn-left   Turn-back   Advance   Object-Analysis
S            X           S           S         S                   84
S            X           H           S         S                   85
H            X           S           S         S                   87
H            X           H           S         S                   88
S            X           X           H         S                  163
H            X           X           H         S                  164
S            X           S           X         H                  498
S            X           H           X         H                  499
H            X           S           X         H                  501
H            X           H           X         H                  502

Table 1. Robot testing costs by implementation choices.
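The table can be queried directly to find the cheapest implementation and to see which routine dominates the testing cost. The sketch below copies the rows as printed; the entry "X" is kept verbatim because its meaning is not stated on the slide, and the helper name `min_cost_with` is hypothetical.

```python
ROUTINES = ("Turn-right", "Turn-left", "Turn-back", "Advance", "Object-Analysis")

# (implementation choice per routine, system testing cost), as printed in Table 1
TABLE = [
    (("S", "X", "S", "S", "S"), 84),
    (("S", "X", "H", "S", "S"), 85),
    (("H", "X", "S", "S", "S"), 87),
    (("H", "X", "H", "S", "S"), 88),
    (("S", "X", "X", "H", "S"), 163),
    (("H", "X", "X", "H", "S"), 164),
    (("S", "X", "S", "X", "H"), 498),
    (("S", "X", "H", "X", "H"), 499),
    (("H", "X", "S", "X", "H"), 501),
    (("H", "X", "H", "X", "H"), 502),
]

cheapest_choice, cheapest_cost = min(TABLE, key=lambda row: row[1])


def min_cost_with(routine, implementation):
    """Lowest testing cost among rows where `routine` has `implementation`."""
    column = ROUTINES.index(routine)
    costs = [cost for choice, cost in TABLE if choice[column] == implementation]
    return min(costs) if costs else None


advance_in_hw = min_cost_with("Advance", "H")
analysis_in_hw = min_cost_with("Object-Analysis", "H")
```

The jump in cost when Object-Analysis moves to hardware is far larger than for any turn routine, which is the kind of information a testability-oriented partitioning would act on.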


3. Conclusions

For critical applications, partitioning the system into HW + SW parts at early design steps according to test effort constraints may be of interest (reduction of design cycle time).

Using the unified fault injection methodology in the HW + SW parts helps estimate the lowest test effort for the system.