A Fault Injection Technique for VHDL Behavioral-Level Models


TODD A. DELONG and BARRY W. JOHNSON, University of Virginia

JOSEPH A. PROFETA III, Union Switch and Signal, Inc.

AS EARLY AS THE 1960s, designers recognized the importance of incorporating fault tolerance into microelectronic designs. However, they often performed this task late in the process, when the design was near completion. As computer systems become more complex, designers must consider fault tolerance throughout the design process to allow early estimation of reliability and fault coverage.

Designers usually perform dependability parameter estimation (DPE) at a high level of abstraction, using stochastic Petri net or queuing models. However, as specifications become more demanding, designers must go to increasingly lower levels of modeled detail to achieve the desired results. One such level of modeling is the instruction set architecture (ISA), where a modeled processing element executes machine instructions. Historically, designers have achieved the ISA level of detail with a gate- or device-level model of the processor.

Fault injection is an important technique for the evaluation of design metrics such as reliability, safety, and fault coverage. The process involves inserting faults into a system and monitoring the system to determine how it behaves in response. Researchers have made several efforts to develop techniques for injecting faults into a system prototype or model; most of these fall into three categories: hardware-, software-, and simulation-based fault injection.

Practitioners have accomplished hardware-based fault injection by such methods as bombarding the hardware with heavy-ion radiation [1], injecting voltage sags on the hardware's power rails [1], and corrupting logic values at the pin level [2]. An example of software-based fault injection, known as fault injection-based automated testing (FIAT), injects faults into the actual source code in a real-time environment [3]. These hardware- and software-based approaches do produce a dependability analysis of the system. However, they do not lend themselves easily to DPE during the design process, because they require a physical working system for the fault injection experiments. Consequently, with these methods, DPE takes place toward the end of the design process rather than throughout it.

Currently, however, researchers are realizing the advantages of simulation as a means to perform DPE. Typically, they perform simulations at the gate level, where signal values in the simulation are stuck at logic 1 or 0, or at the device level [6], where current or voltage values are fixed and a fault propagates to the gate level.

Simulation approaches lend themselves nicely to the design process, but they do have a couple of shortcomings. First, we often need an implementation of the device before detailed simulations are possible. But the implementation is usually not available until the end of the design process, if at all. For example, a gate- or transistor-level description of a commercial processor is seldom available to a designer. Thus, we cannot perform DPE at the gate or device level early in the design process, where the real advantages of quantitative design comparison are apparent.

24    0740-7475/96/$05.00 © 1996 IEEE    IEEE DESIGN & TEST OF COMPUTERS

Also, gate- and device-level simulations are computationally intensive. Thus, we can perform only a small number of simulations in a reasonable amount of time. This becomes a problem when a design's fault tolerance specifications require a large number of experiments for the results to have the desired level of statistical significance.

Researchers have made several attempts to perform DPE using hardware description languages for fault simulation at the behavioral level [7-10]. Previous methods have not been entirely satisfactory; see the DPE box for outlines of these techniques.

Our approach to fault injection

We have developed a fault injection technique for DPE that allows the designer to inject faults at the ISA level, where a behavioral model of a processor written in VHDL executes actual machine code. This methodology provides the detail needed to meet the system dependability and performance specifications. At the same time, since this model is functionally and temporally complete, we need no gate- or device-level model of the hardware at this stage of the evaluation. Thus, our method reduces simulation time, because it reduces the level of modeled detail. This allows designers to perform a large number of simulations on readily available workstations, improving the results by increasing the sample size.

Also, designers can use this approach during the design process as they develop hardware and software. The method can provide detailed information on faulty system behavior, but does not require a system prototype. This type of modeling should increase the quality of the system's fault tolerance attributes while reducing product time to market.

Our technique has several advantages over others. First, because it makes fault-free and faulty behavior independent, it can use existing models with minimal changes to the existing code. Also, designers can use the same model to simulate the fault-free and faulty behavior without creating separate models or recompiling the model for each fault. In addition, the technique uses standard VHDL types to perform the fault injection. This means that there is no need to create new functions to operate on these types (such as logical functions and file operation routines).

Finally, our technique is simulator independent because it accomplishes fault injection with standard VHDL features. Thus, it does not require the special features of a particular simulator. Even the fault injection experiment information reaches the simulation through standard text file I/O. Designers can thus perform a large number of simulations by executing them in a background mode.

Figure 1. VHDL code for data types used in fault injection.

In this article, we discuss our fault injection technique and briefly describe the example architecture we used as a proof of concept for some of the theory mentioned earlier. We also present the implementation details associated with applying our technique to an embedded control system providing fail-safe operation in the railway industry. Finally, we provide the results we obtained from evaluating the modeled system.

The method

Before we present our fault injection technique, we need to address bus resolution functions (BRFs). A BRF is associated with a signal type. When two different sources try to update a signal of that type at the same simulation time, the BRF resolves the value. For instance, suppose that two sources try to update a signal of type Bit; one source tries to assign a 0, and the other, a 1. A simple BRF might always assign the signal a value of 1 when two or more sources try to update it and one tries to update it with a 1; otherwise, it would assign the signal a value of 0. In this case, the BRF would be the OR function.
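Here is a minimal Python model of such a resolution function, to make the OR example concrete. It is an illustration only, not VHDL. The list argument stands in for the array of driver values that a VHDL resolution function receives, and '0'/'1' strings stand in for Bit literals. The name resolve_or is hypothetical.

```python
def resolve_or(drivers: list[str]) -> str:
    """Model of an OR-style bus resolution function for a Bit signal.

    `drivers` holds the '0'/'1' values that every source assigns to the
    signal in the same simulation cycle. The resolved value is '1' if any
    source drives '1', and '0' otherwise.
    """
    for value in drivers:
        if value not in ("0", "1"):
            raise ValueError(f"not a Bit value: {value!r}")
    # Assumption: with no active drivers, fall back to Bit's default value, '0'.
    return "1" if "1" in drivers else "0"


print(resolve_or(["0", "1"]))  # sources disagree -> 1
print(resolve_or(["0", "0"]))  # -> 0
```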

The basic idea underlying the method we developed for injecting faults at the ISA level is the ability to communicate to the BRF when to inject a fault. This allows the BRF to corrupt the new value assigned to the signal. The communication occurs via a user-defined VHDL data type called fault8 (or fault16), which represents 8 bits of data (fault16 represents 16 bits of data) and the appropriate fault information. Figure 1 shows the VHDL code for fault8 (fault16).

WINTER 1996 25


The data type is a record consisting of three fields: data, cntl, and mask. The data field contains the signal's fault-free value. The cntl field indicates whether the signal value is being accessed (read from) or updated (written to). The third field, mask, indicates where the fault is to be injected (that is, in which bit position(s) the fault will appear) and whether the bit(s) will be stuck-at-1, stuck-at-0, or x, which indicates that the bit remains unchanged. We implemented the mask in this fashion so that designers can simulate single- or multiple-bit faults with the same data type.

Figure 2. Possible scenarios considered by the BRF for signal R0 with a current value of 11111111: R0 is updated at the same time a fault is injected (a); a fault is injected, but R0 is not updated (b); and R0 is updated, but no fault is injected (c). In (a), the new fault-free data 01010101 combined with the mask 11xxxx00 gives R0 = 11010100. In (b), the current data 11111111 with the same mask gives R0 = 11111100. In (c), R0 simply takes the new value 01010101.
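The mask semantics can be sketched in Python as follows. The sketch assumes, as Figure 2 suggests, that mask characters line up with data bits from left to right. The Fault8 class mirrors the three record fields. The 'r'/'w' encoding for cntl and the helper name corrupt are hypothetical, since Figure 1's actual VHDL declarations are not reproduced here.

```python
from dataclasses import dataclass

FAULT_FREE_MASK = "xxxxxxxx"  # all x's: no bit is forced


@dataclass(frozen=True)
class Fault8:
    """Python stand-in for the fault8 record: data, cntl, and mask fields."""
    data: str  # eight '0'/'1' characters: the signal's fault-free value
    cntl: str  # access kind; 'r' (read) / 'w' (write) is a hypothetical encoding
    mask: str  # eight characters, each '1' (stuck-at-1), '0' (stuck-at-0), or 'x'

    def __post_init__(self) -> None:
        if len(self.data) != 8 or set(self.data) - set("01"):
            raise ValueError(f"bad data field: {self.data!r}")
        if len(self.mask) != 8 or set(self.mask) - set("01x"):
            raise ValueError(f"bad mask field: {self.mask!r}")


def corrupt(data: str, mask: str) -> str:
    """Force bits where the mask holds '1' or '0'; keep bits where it holds 'x'."""
    return "".join(d if m == "x" else m for d, m in zip(data, mask))


r0 = Fault8(data="01010101", cntl="w", mask="11xxxx00")
print(corrupt(r0.data, r0.mask))             # 11010100 (Figure 2a)
print(corrupt("11111111", r0.mask))          # 11111100 (Figure 2b)
print(corrupt("01010101", FAULT_FREE_MASK))  # 01010101 (Figure 2c)
```

A single mask string can force any subset of bits. The same type therefore covers both single- and multiple-bit faults, which is the design choice the text describes.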

This fault injection technique covers three scenarios (see Figure 2). First, a fault is injected at the same time the signal is updated with a new value. In this case, the fault information (that is, the mask value) comes from the faulty value, but the new data for the signal comes from the fault-free value. This ensures that the signal receives the most recent value assigned to it, with the fault injected into this new value rather than the current value.

Imagine that R0 in Figure 2 is a signal in a processor model. One process statement in the model (the main process) assigns the fault-free value to R0. A second process statement (the fault process) assigns the faulty value to R0. In the first scenario, the main and fault processes try to update the value of R0 simultaneously. The main process assigns the value 01010101 to R0's data field; the fault process assigns the value 11xxxx00 to R0's mask field. The BRF determines that a fault is being injected, because the mask value from the fault process assignment is not xxxxxxxx. So, it uses the data value from the main process and the mask value from the fault process to determine R0's new value, 11010100. Figure 2a shows this situation.

The second scenario (Figure 2b) occurs when a fault is injected, but the signal is not updated. In this case, all the information can come from the faulty value, since the faulty value contains the signal's current (and most recent) value. Finally, our BRF covers the case of the signal being updated without a fault being injected (Figure 2c). In this case, the signal receives the fault-free value.

Figure 3 shows the algorithm for bit8_fi (bit16_fi), the BRF. The algorithm begins by determining if any faulty values are being assigned to the signal. It does this by searching through all the values being assigned to the signal to see if any of them has a fault mask value not equal to the fault-free mask value (see Figure 3, Block A).

Next, the algorithm considers the various possible scenarios. The three scenarios shown in Figures 2a, 2b, and 2c correspond to Blocks B, C, and D in Figure 3. The BRF determines when a fault is injected into a given signal and assigns the signal the "correct" value: the most recent value of the signal, corrupted accordingly.
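A compact Python model of the Block A to D logic may help when reading Figure 3. It is a sketch under stated assumptions, not the authors' code. Drivers are passed as a list, and the mask is applied position by position from the left. The cntl values are hypothetical, and raising an error for an empty driver list is an added choice.

```python
from dataclasses import dataclass
from typing import Optional

FAULT_FREE = "xxxxxxxx"


@dataclass(frozen=True)
class Fault8:
    data: str  # fault-free value, e.g. "01010101"
    cntl: str  # hypothetical access flag: "r" or "w"
    mask: str  # per-bit '1', '0', or 'x'


def apply_mask(data: str, mask: str) -> str:
    return "".join(d if m == "x" else m for d, m in zip(data, mask))


def bit8_fi(drivers: list[Fault8]) -> Fault8:
    """Python model of the Figure 3 resolution logic (Blocks A to D)."""
    # Block A: locate a faulty driver and a fault-free driver
    # (the last match wins, as in the VHDL loop).
    fault_num: Optional[int] = None
    no_fault_num: Optional[int] = None
    for i, drv in enumerate(drivers):
        if drv.mask != FAULT_FREE:
            fault_num = i
        else:
            no_fault_num = i

    if fault_num is not None and no_fault_num is not None:
        # Block B: fault injected during an update. New data and cntl come
        # from the fault-free driver; the mask comes from the faulty one.
        new, bad = drivers[no_fault_num], drivers[fault_num]
        return Fault8(apply_mask(new.data, bad.mask), new.cntl, bad.mask)
    if fault_num is not None:
        # Block C: fault injected without an update. The faulty driver
        # already carries the signal's current value.
        bad = drivers[fault_num]
        return Fault8(apply_mask(bad.data, bad.mask), bad.cntl, bad.mask)
    if no_fault_num is not None:
        # Block D: ordinary update, no fault.
        return drivers[no_fault_num]
    raise ValueError("no drivers to resolve")


main = Fault8("01010101", "w", FAULT_FREE)   # main process: new fault-free value
fault = Fault8("11111111", "r", "11xxxx00")  # fault process: current value plus mask
print(bit8_fi([main, fault]).data)  # 11010100 (Figure 2a)
print(bit8_fi([fault]).data)        # 11111100 (Figure 2b)
print(bit8_fi([main]).data)         # 01010101 (Figure 2c)
```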

This technique has several important characteristics. First, we need not rework the logical operations that can be performed using Bit types, because our technique does not change this signal type in any way. For instance, the implicit VHDL "and" operator could still perform an AND on R0, because R0's data field is still of type Bit.

Also, we can automate our technique, since it uses standard VHDL types (that is, Bit and String) to assign the faulty value to the signal. The simulator can read fault information for a given experiment from a text file using the standard read and write functions for these types instead of newly created read and write functions for augmented types.

In addition, using this technique does not require extensive changes to an existing model to accommodate the fault injection. For instance, Figure 4 shows an example of a change to an existing model. Figure 4a shows the original signal definition for R0 and a signal assignment to R0, defined as type bit8 (see Figure 1). In Figure 4b, we redefine R0 as an alias of the data field for signal R0_reg, which is of type bit8_res (see Figure 1). Now, any references to R0 still correspond to references to a type-bit8 signal. So, we do not need to change the existing VHDL code that references R0 (such as the signal assignment).

-- Block A: find out if a fault is being injected.
for i in input'low to input'high loop
  if (input(i).mask /= fault_free) then
    fault_present := true;
    fault_num := i;
  else
    no_fault := true;
    no_fault_num := i;
  end if;
end loop;
-- ...
-- Block B: if a fault is being injected at the same time the signal is
-- getting updated, then assign the data from the non-faulty signal so as
-- to be sure to get the most recent value for the signal, and use the
-- fault and mask from the faulty signal to inject the fault.
if fault_present and no_fault then
  temp_fault.mask := input(fault_num).mask;
  temp_fault.data := input(no_fault_num).data;
  temp_fault.cntl := input(no_fault_num).cntl;
  for j in temp_fault.data'high downto temp_fault.data'low loop
    if (input(fault_num).mask(j+1) = '1') then
      temp_fault.data(j) := '1';
    elsif (input(fault_num).mask(j+1) = '0') then
      temp_fault.data(j) := '0';
    end if;
  end loop;
-- ...
-- Block C: if the fault is being injected while the signal is not getting
-- updated, then use the information in the faulty signal, because it
-- contains the current value of the data field for the signal.
elsif fault_present then
  temp_fault.mask := input(fault_num).mask;
  temp_fault.data := input(fault_num).data;
  temp_fault.cntl := input(fault_num).cntl;
  for j in temp_fault.data'high downto temp_fault.data'low loop
    if (input(fault_num).mask(j+1) = '1') then
      temp_fault.data(j) := '1';
    elsif (input(fault_num).mask(j+1) = '0') then
      temp_fault.data(j) := '0';
    end if;
  end loop;
-- Block D: the signal is updated, but no fault is injected.
elsif no_fault then
  temp_fault := input(no_fault_num);
end if;
return temp_fault;

Figure 3. VHDL listing of the BRF algorithm.

Figure 4. Example of a change made to existing models to implement our fault injection technique: the original signal definition for R0 and a signal assignment to R0 (a); the new definition for R0 (b). R0 is now an alias for the data field of signal R0_reg, which is of type bit8_res. So, the same assignment can be made.

Example application

As a proof of concept for our fault injection technique, we selected and modeled a system architecture. Our sample application is an interlocking control system (ICS) that has several million hours of continuous safe operation. The ICS is used in the railway industry to control wayside train functions such as track switching, light signaling, and train occupancy. Figure 5 shows a block diagram of the architecture, which contains components typically found in a simple uniprocessor system. Namely, the system includes a Motorola MC6809 microprocessor; slave and master asynchronous communication interface adapters (ACIAs); peripheral interface adapters (PIAs); system timers (programmable timers that can function as watchdogs, for instance); and memory, which includes RAM and programmable read-only memory (PROM). Also, we developed a model that captures the functionality of the system input/output cards. We validated the entire system model by comparing the model's fault-free operation with that of the real system.

Figure 5. Block diagram of the system architecture.

The example system is a uniprocessor, safety-critical system that contains no hardware redundancy. Instead, it relies on software, through diversity, to provide the system's fault tolerance. The interrupt-driven system responds to an external hardware interrupt that one of the timers in the system timers module generates every millisecond. The system uses this interrupt to schedule its 19 main executive software routines that perform the diagnostic testing, logic equation evaluation, and serial communication.

Of these 19 routines, 11 provide diagnostic capabilities and logic equation evaluation for the ICS unit. The remaining eight routines function in conjunction with the master and slave ACIAs, which the system uses to communicate with other ICS units. Since this analysis was for a single ICS unit, we did not consider these eight routines in the performance and safety analysis. Table 1 describes the 11 routines we considered and the ERRORS routine.

Table 1. Description of executive software for the ICS (in order of priority).

Routine   Description
FIRQ      Runs every time an FIRQ interrupt occurs
TQMR      Schedules the other routines based on how many times each routine has already executed and the priority of each routine
POPTST    Diagnostic routine that tests the outputs
GUARD     Diagnostic routine that tests the data and address buses, input and output boards, processor, and so forth, each time the outputs are processed and delivered
TRACE     Diagnostic routine that checks whether the software routines are running in the correct order
PINHND    Reads the inputs
PINPRO    Processes the inputs and determines which Boolean equations must be evaluated
INTPR     Evaluates the Boolean equations; executes once for each equation
OUPPRO    Delivers the outputs
SHTDIA    Set of six diagnostic routines, one of which executes each time SHTDIA executes
LNGDIA    Set of diagnostic routines that thoroughly test the outputs, processor, and memory
ERRORS    Performs a safe shutdown of the system whenever one of the 11 executive routines listed above detects an error

The ICS operates as follows. Every millisecond, one of the system timers generates an FIRQ interrupt. Every tenth time this happens, FIRQ schedules TQMR. Every twentieth time TQMR executes, it schedules PINHND to read the inputs. Once PINHND has executed, it schedules PINPRO to execute once for every input board in the system. If any of the inputs have changed since the last time they were read, PINPRO indicates which equations must be evaluated because of the input changes and schedules INTPR once for each of these equations. INTPR then schedules OUPPRO, which delivers the outputs. After OUPPRO delivers the outputs, it schedules GUARD five times to check the hardware that generated the outputs just delivered.

Thus, the ICS is event driven as opposed to cyclic, because which routines execute depends on the input data rather than on a fixed cycle. For instance, if the inputs did not change from one read cycle to the next, INTPR, OUPPRO, and GUARD would not execute.

Fault injection into the ICS

A fault injection controller (FIC) module controls fault injection for the entire model. The FIC requires one file to describe the desired fault injection experiments. This file contains information such as the fault location (that is, where to inject the fault: which processor register or which memory location), the fault mask to use, when to start the fault injection, and when to end the fault injection. It also contains a field that indicates whether to inject the fault into the CPU or memory; the FIC uses this field to decode the rest of the information about where to inject the fault. Figure 6 illustrates the FIC with the system model.

Figure 6. System model with our fault injection module.

Figure 7. Algorithm for the fault injection controller.

Figure 7 shows an algorithm for the FIC written in pseudo-VHDL. (The figure shows pseudocode because the actual VHDL code could not fit on a single page.) At the start of the simulation, the FIC reads any required decoding information. This could be information that initiates fault injection anywhere in memory (such as address ranges in the memory map) or information for injecting faults into the processor registers (such as a mapping scheme between an integer and the corresponding register in the CPU).
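The article does not give the layout of the experiment file. The following Python sketch therefore assumes a hypothetical whitespace-separated format with one experiment per line: target, location, mask, start time, and end time. It only illustrates how such records could be parsed and validated before the FIC turns them into fault control tokens. The field names, the "cpu"/"mem" keywords, the nanosecond units, and the sample addresses are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    target: str    # "cpu" or "mem": which module's fault injector gets the token
    location: str  # register name for "cpu", address range for "mem"
    mask: str      # per-bit '1', '0', or 'x', e.g. "11xxxx00"
    start_ns: int  # when to start injecting
    end_ns: int    # when to stop injecting


def parse_experiments(text: str) -> list[Experiment]:
    """Parse a hypothetical experiment file: one experiment per line with the
    fields target, location, mask, start, end. Blank lines and '#' lines are skipped."""
    experiments = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 fields, got {len(fields)}")
        target, location, mask, start, end = fields
        if target not in ("cpu", "mem"):
            raise ValueError(f"line {lineno}: unknown target {target!r}")
        if set(mask) - set("01x"):
            raise ValueError(f"line {lineno}: bad mask {mask!r}")
        start_ns, end_ns = int(start), int(end)
        if end_ns < start_ns:
            raise ValueError(f"line {lineno}: end precedes start")
        experiments.append(Experiment(target, location, mask, start_ns, end_ns))
    return experiments


SAMPLE = """\
# target  location       mask      start_ns  end_ns
cpu       A              xx1xxxxx  1000      5000
mem       0xE000-0xE0FF  xxxxxxx0  2000      9000
"""

for exp in parse_experiments(SAMPLE):
    print(exp)
```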

After the simulation begins, the FIC reads the fault experiment information and generates a fault control signal. This signal, also called a fault control token, contains all the necessary fault information for the given fault injection experiment. The FIC sends the token to the appropriate module in the system model. For instance, for fault injection into the processor, the FIC sends the processor a token that initiates the fault injection as described by the information contained in the signal.

After the FIC generates the fault control token, it waits (without requiring any simulation cycles) until the processor acknowledges the signal. This does not occur until the fault injection experiment ends. After acknowledging the fault control token, the processor waits (again without requiring any simulation cycles) for another fault control token. The FIC continues to generate fault control tokens until it reaches the end of the fault information file or some other module shuts down the simulation.

Figure 8. General-purpose algorithm for the fault injectors.
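The handshake between the FIC and a module's fault injector can be mimicked with Python threads and queues, which here stand in for VHDL processes and their wait statements. This is a simplified sketch that assumes a single fault injector and a hypothetical Token type. The FIC issues one token, blocks until it is acknowledged, and only then issues the next. A None sentinel plays the role of reaching the end of the experiment file.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical fault control token; the fields are illustrative."""
    location: str
    mask: str


def fault_injector(tokens: queue.Queue, acks: queue.Queue, log: list) -> None:
    """Module side: block until a token arrives, run the experiment, acknowledge."""
    while True:
        token: Optional[Token] = tokens.get()  # like waiting until a token is present
        if token is None:                      # no more experiments
            return
        log.append(f"inject {token.mask} into {token.location}")
        acks.put(token)                        # acknowledge only when the experiment ends


def fault_injection_controller(experiments, tokens, acks, log) -> None:
    """FIC side: issue one token at a time and wait for its acknowledgment."""
    for token in experiments:
        tokens.put(token)
        acked = acks.get()                     # blocks until the module acknowledges
        log.append(f"ack {acked.location}")
    tokens.put(None)                           # end of the experiment file


experiments = [Token("A", "1xxxxxxx"), Token("PC", "xxxxxxxxxxxxxxx0")]
tokens: queue.Queue = queue.Queue()
acks: queue.Queue = queue.Queue()
log: list = []
injector = threading.Thread(target=fault_injector, args=(tokens, acks, log))
injector.start()
fault_injection_controller(experiments, tokens, acks, log)
injector.join()
print(log)
```

The controller blocks on the acknowledgment queue, so the log always alternates between an injection and its acknowledgment. This mirrors the one-experiment-at-a-time behavior described above.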

In addition to the FIC, we must augment each regular module in the system model with an extra process statement to perform fault injection for that module. We tailor each of these fault injection processes, called fault injectors (FIs), to the individual module. For instance, the FI for the processor module must contain a section of code that decodes the fault information to determine which register to corrupt. The FIs require similar specialized code for each module, but the general algorithm for the FIs is the same. Note that the FI is independent of the fault-free module operation; we can add it without changing any of the existing module code. Figure 8 shows the pseudocode for the FI algorithm.

The FI begins by waiting for a fault control token from the FIC if one is not already present. Once it receives a token, it retrieves the end time for the fault injection experiment, the fault injection location (for the processor, a register; for the memory map, an address range), and the fault mask. Next, it enters a loop in which the fault injection takes place. First, the FI waits for the fault injection location to become active. Once this occurs, the FI checks the current time against the end time for the experiment. If the end time is past, the FI is deactivated and exits the loop. Otherwise, it injects the fault. Once it exits the loop, the FI acknowledges the fault control token and waits for the next one.

Figure 9. General-purpose algorithm to monitor a location where a fault is injected.

Figure 10. Block diagram of the MC6809 module (programmer's model).
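The FI loop can be sketched in Python by treating each activation of the target location as an event with a time stamp and a value. In this hypothetical model, run_injection_loop corrupts the value at every activation up to the experiment's end time and stops at the first activation after it. The model leaves out how the corrupted value is driven back into the signal through the BRF.

```python
from typing import Iterable


def apply_mask(value: str, mask: str) -> str:
    """Force bits where the mask has '1'/'0'; keep bits where it has 'x'."""
    return "".join(v if m == "x" else m for v, m in zip(value, mask))


def run_injection_loop(accesses: Iterable[tuple[int, str]],
                       end_time: int, mask: str) -> list[tuple[int, str]]:
    """Model of the FI loop for one token.

    `accesses` lists, in time order, each moment the target location becomes
    active and the value it holds then. The loop corrupts the value at each
    activation until an activation arrives after `end_time`.
    """
    injected = []
    for time, value in accesses:   # "wait for the location to become active"
        if time > end_time:        # end time is past: deactivate and leave the loop
            break
        injected.append((time, apply_mask(value, mask)))
    return injected                # the caller then acknowledges the token


accesses = [(100, "01010101"), (300, "11111111"), (700, "00000000")]
print(run_injection_loop(accesses, end_time=500, mask="11xxxx00"))
# [(100, '11010100'), (300, '11111100')]
```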

Each module also contains a separate process (an observer) that monitors the location where the fault is injected. For each access to this location (read or write), the observer writes the value to a trace file. We can then use this trace file to determine with absolute certainty whether the fault produced an error. Again, the observer is independent of the fault-free module operation.

Figure 9 shows the generic process for the observer. This algorithm is very similar in structure to that for injecting faults (Figure 8). The only real difference is that rather than injecting a fault, this algorithm outputs the value of the fault injection location to the trace file when the location becomes active. Note that the location becomes active when it is read from as well as when it is written to, because the cntl field is updated for both of these operations.

To determine when a fault causes an error, we compare the trace file from the fault injection experiment with the trace file from the fault-free case. The trace file contains the simulation time index; the data value of the fault injection location; and the cntl field value, which indicates whether the location is being read or written. We use this information to determine if and when an error occurs. We create the fault-free trace file by running the same experiment with the mask field set to the fault-free value (that is, all x's).
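A minimal Python sketch of the comparison follows. It assumes each trace is a list of (time, data, cntl) entries in the order the observer wrote them. The article does not spell out the exact matching rule, so treating the first mismatching entry as the point where the fault produced an error is an assumption. The names TraceEntry and first_divergence are likewise assumptions.

```python
from typing import NamedTuple, Optional


class TraceEntry(NamedTuple):
    time_ns: int  # simulation time index
    data: str     # value of the fault injection location
    cntl: str     # "r" (read) or "w" (write); hypothetical encoding


def first_divergence(golden: list[TraceEntry],
                     faulty: list[TraceEntry]) -> Optional[int]:
    """Index of the first entry where the faulty trace departs from the
    fault-free (golden) trace, or None if the two traces are identical.
    A trace that stops early or runs longer also counts as a departure."""
    for i, (g, f) in enumerate(zip(golden, faulty)):
        if g != f:
            return i
    if len(golden) != len(faulty):
        return min(len(golden), len(faulty))
    return None


golden = [TraceEntry(100, "01010101", "w"), TraceEntry(250, "01010101", "r")]
faulty = [TraceEntry(100, "01010101", "w"), TraceEntry(250, "11010101", "r")]
idx = first_divergence(golden, faulty)
print(idx, faulty[idx])  # 1 TraceEntry(time_ns=250, data='11010101', cntl='r')
```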

Fault injection results

We performed a series of permanent fault injection experiments to estimate the probability of detection for 11 of the executive routines, as well as the fault detection latency times. (Other work presents a complete discussion of the results of the fault injection experiments, including the effect of transient faults on the system.) For this group of experiments, we injected single stuck-at-1 and stuck-at-0 faults into each bit position of all the 8-bit (A, B, DP, CC, and IR) and 16-bit (X, Y, S, U, and PC) registers of the processor shown in Figure 10. The time to start (and stop) injecting the faults was arbitrary. For convenience, we chose the beginning (and ending) of each executive routine's execution. Thus, the fault was effectively permanent for each routine. This yielded a total of 2,640 unique faults, since the time of occurrence is also a fault attribute. (SA1 and SA0 faults in five 8-bit and five 16-bit registers come to 2 × (5 × 8 + 5 × 16) = 240 unique faults for any given time; for 11 routines, this equals 2,640 total faults.) We injected the faults at the beginning of the first execution of each routine.
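The fault count can be checked by enumerating the fault list directly. This Python sketch uses the register and routine names from the text. The mask format follows the x/0/1 convention described earlier, and taking bit 0 as the rightmost mask character is an assumption.

```python
from itertools import product

REGS_8 = ["A", "B", "DP", "CC", "IR"]
REGS_16 = ["X", "Y", "S", "U", "PC"]
ROUTINES = ["FIRQ", "TQMR", "POPTST", "GUARD", "TRACE", "PINPRO",
            "PINHND", "OUPPRO", "INTPR", "SHTDIA", "LNGDIA"]


def stuck_at_mask(width: int, bit: int, value: str) -> str:
    """Mask forcing one bit to `value` ('0' or '1'); bit 0 is the rightmost
    character (an assumed bit-ordering convention)."""
    mask = ["x"] * width
    mask[width - 1 - bit] = value
    return "".join(mask)


def fault_list() -> list[tuple[str, str, str]]:
    """(routine, register, mask) for every single stuck-at fault."""
    faults = []
    for routine in ROUTINES:
        for regs, width in ((REGS_8, 8), (REGS_16, 16)):
            for reg, bit, value in product(regs, range(width), "10"):
                faults.append((routine, reg, stuck_at_mask(width, bit, value)))
    return faults


faults = fault_list()
per_routine = len(faults) // len(ROUTINES)
print(per_routine, len(faults))  # 240 2640
```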

We allowed the fault injection simulations to continue until one of three things happened: an executive routine called the ERRORS routine (see Table 1), the watchdog timer detected the fault, or the routine completed. If either of the first two events occurred, we considered the fault detected. If, however, the third event occurred, we had to compare the trace files for the fault-free and faulty simulations to determine whether the fault produced an error.
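The stopping rules translate into a small classification function. In this Python sketch, the three boolean inputs are hypothetical summaries of a run. The label for a completed run whose traces differ ("undetected error") is our naming, not the article's.

```python
from enum import Enum


class Outcome(Enum):
    DETECTED = "detected"                  # ERRORS was called or the watchdog fired
    NO_ERROR = "no error"                  # routine completed; traces match
    UNDETECTED_ERROR = "undetected error"  # routine completed; traces differ


def classify(errors_called: bool, watchdog_fired: bool,
             traces_match: bool) -> Outcome:
    """Classify one fault injection run by the stopping rules in the text."""
    if errors_called or watchdog_fired:
        return Outcome.DETECTED
    # The routine completed: only the trace comparison can reveal an error.
    return Outcome.NO_ERROR if traces_match else Outcome.UNDETECTED_ERROR


print(classify(errors_called=False, watchdog_fired=True, traces_match=False).value)
# detected
```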


Table 2. Detection latency for internal processor faults (in ms).

Routine   Minimum   Maximum   Average
FIRQ      0.04      13.64     2.26
TQMR      0.04      12.23     3.92
POPTST    0.03      16.10     2.78
GUARD     0.04      12.22     3.20
TRACE     0.04       9.27     2.89
PINPRO    0.03      15.01     4.51
PINHND    0.03      17.34     2.51
OUPPRO    0.05      21.00     5.90
INTPR     0.06       8.89     2.10
SHTDIA    0.05      20.12     5.25
LNGDIA    0.04      15.50     4.31

Overall, the system detected 2,629 out of 2,640 faults when the system was allowed to run for a complete diagnostic cycle. The 11 remaining undetected faults corresponded to a stuck-at-1 fault in bit position 5 of the CC register (one such fault per routine). This bit is the half-carry bit, which none of the executive software routines use. Thus, this fault will not affect the system's safety. Therefore, the system successfully detected 100% of the malicious internal processor faults.
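The two coverage figures follow from simple arithmetic. This Python sketch computes both. The first is raw coverage over all 2,640 faults. The second is coverage after excluding the 11 half-carry faults that the text argues cannot affect safety. The function name coverage is hypothetical.

```python
def coverage(detected: int, total: int, benign_undetected: int = 0) -> float:
    """Fraction of faults detected, optionally leaving out undetected faults
    judged unable to affect safety."""
    considered = total - benign_undetected
    if considered <= 0 or not 0 <= detected <= considered:
        raise ValueError("inconsistent fault counts")
    return detected / considered


TOTAL, DETECTED = 2640, 2629
HALF_CARRY_SA1 = 11  # one stuck-at-1 fault in CC bit 5 per routine, 11 routines

print(f"{coverage(DETECTED, TOTAL):.2%}")                  # 99.58%
print(f"{coverage(DETECTED, TOTAL, HALF_CARRY_SA1):.0%}")  # 100%
```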

We also calculated the detection latency for each routine, that is, the time between the fault's injection and its detection. Table 2 shows the minimum, maximum, and average detection latencies for each routine. Figure 11 shows the distribution of detection latencies for all the internal processor faults detected by all the routines.
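The statistics in Table 2 and the distribution in Figure 11 can be derived from per-fault injection and detection times. The sketch below assumes those times are available in milliseconds for each detected fault. The 1-ms histogram bin width is an illustrative choice, not one stated in the text, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def latency_stats(events):
    """Return (minimum, maximum, average) detection latency in ms.

    `events` is a list of (injection_time_ms, detection_time_ms) pairs,
    one per detected fault.
    """
    latencies = [detect - inject for inject, detect in events]
    return min(latencies), max(latencies), mean(latencies)


def latency_histogram(latencies, bin_ms=1.0):
    """Count latencies per bin of width `bin_ms`, keyed by bin start."""
    counts = Counter(int(lat // bin_ms) for lat in latencies)
    return {b * bin_ms: counts[b] for b in sorted(counts)}
```

Applying `latency_stats` to the events of one routine gives one row of Table 2. Applying `latency_histogram` to the latencies of all routines gives data of the kind plotted in Figure 11.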

WITH OUR TECHNIQUE, we can perform fault injection experiments at the ISA level using a behavioral model of a system written in VHDL. Our modeling method significantly reduces simulation time compared with a gate- or device-level model. Because this allows more simulations in a reasonable amount of time, our technique improves results by increasing the sample size. Since the model does not require a gate- or device-level description of the system, designers can evaluate the system via fault injection during the design process rather than on a system prototype. At the same time, the technique can meet the increasing demands of fault-tolerant specifications by manipulating data at the ISA level (for instance, by executing real machine instructions).

Finally, we implemented our technique using the standard types available with VHDL, so this fault injection technique does not require users to rewrite any of the implicit logical functions or existing code. In fact, we can readily apply this fault injection technique to any existing VHDL functional model with minimal effort. For example, we added the fault injection ability to the MC6809 ISA model after the model's validation; the addition took less than a day. Our application of the technique to an example ICS both tested the diagnostic capabilities of the system, to estimate the system fault coverage, and evaluated the feasibility of this ISA fault injection technique.

[Figure 11: histogram of detection latencies; horizontal axis Time (ms), ticks at 0, 10, 20; vertical axis ticks at 0, 100, 200.]

Figure 11. Distribution of detection latencies for faults in the processor.

In the future, we will focus this research on two fault simulation topics. First, we will address the primary shortcoming of the technique presented here: simulation speed. Our technique is a serial fault simulation technique. For any significant number of fault simulations, the time required to perform them serially is usually too long to be of real use in the design process. Our goal is to devise a scheme to parallelize our technique to reduce the total time required for the simulations. In addition, we are investigating ways to make each fault simulation execute more efficiently, and thus reduce the time required for a single fault simulation.
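Because each serial fault simulation is independent of the others, one straightforward way to parallelize a campaign is to dispatch the simulations to a pool of workers. The sketch below assumes a hypothetical `simulate(fault)` callable that runs one faulty simulation, for example by launching a separate simulator process, and returns its outcome. It illustrates the general idea the text proposes, not the scheme the authors eventually devised.

```python
from concurrent.futures import ThreadPoolExecutor


def run_campaign(faults, simulate, workers=4):
    """Run one simulation per fault on a pool of workers.

    Threads suffice when `simulate` mostly waits on an external
    simulator process; results are returned keyed by fault, in
    the original fault order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(faults, pool.map(simulate, faults)))
```

With enough processor cores, total wall-clock time falls roughly in proportion to the number of workers. This does nothing for the per-simulation cost, which the text treats as a separate problem.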

Our second focus area for future research will be the nature of faults and errors at the behavioral level. We obtained the simulation results presented here by applying a gate-level fault model (in this case, the stuck-at fault model) to a behavioral-level model. A more desirable approach would be to apply a fault/error model appropriate to the given level of design abstraction (such as the behavioral level).

Acknowledgments

We thank Union Switch and Signal not only for funding this research but also for providing a real-world ICS to model and evaluate.



References

1. J. Karlsson, U. Gunneflo, and P. Liden, "Two Fault Injection Techniques for Test of Fault Handling Mechanisms," Proc. Int'l Test Conf., IEEE Computer Soc. Press, Los Alamitos, Calif., 1991, pp. 140-149.

2. J. McGough, D. Mulcare, and W.E. Larsen, "A Method of Measuring Fault Latency in a Digital Flight Control System," Proc. IEEE/AIAA 8th Digital Avionics Systems Conf., IEEE, Piscataway, N.J., 1988, pp. 64-71.

3. Z. Segall et al., "FIAT: Fault Injection Based Automated Testing Environment," Proc. 18th Int'l Symp. Fault-Tolerant Computing, IEEE CS Press, 1988, pp. 102-107.

4. J.G. McGough, F.L. Swern, and S. Bavuso, "New Results in Fault Latency Modeling," Proc. IEEE Eascon Conf., IEEE, 1983, pp. 299-306.

5. R.L. Baker and L.S. Mangum, "A Simulation-Based Fault Injection Experiment to Evaluate Self-Test Diagnostics for a Fault-Tolerant Computer," Proc. IEEE/AIAA Eighth Digital Avionics Systems Conf., IEEE, 1988, pp. 220-226.

6. S.G. Choi and R.K. Iyer, "FOCUS: An Experimental Environment for Fault Sensitivity Analysis," IEEE Trans. Computers, Vol. 41, No. 12, Dec. 1992, pp. 1515-1526.

7. T.A. DeLong, "A Performance and Safety Analysis of a Microprocessor-Based Embedded Control System Using VHDL," master's thesis, Dept. Electrical Engineering, Univ. of Virginia, 1994.

8. D.T. Smith, "A Malicious Fault List Generation Algorithm for the Evaluation of System Coverage," doctoral thesis, Dept. Electrical Engineering, Univ. of Virginia, 1993.

9. D.T. Smith et al., "A Method to Determine Equivalent Fault Classes for Permanent and Transient Faults," Proc. Ann. Reliability and Maintainability Symp., IEEE, 1995, pp. 418-424.

10. D.T. Smith et al., "A Fault-List Generation Algorithm for the Evaluation of System Coverage," Proc. Ann. Reliability and Maintainability Symp., IEEE, 1995, pp. 425-432.

Barry W. Johnson is a professor of electrical engineering at the University of Virginia and director of its Center for Semicustom Integrated Systems, which he cofounded. He has both industrial and academic experience in the design and analysis of fault-tolerant and easily testable systems. His specific areas of expertise include concurrent error detection, fault modeling, reconfiguration techniques, reliability and safety analysis, multiprocessor architectures, and design for testability. Prior to joining academia, Johnson worked for the Government Aerospace Systems Division of Harris Corporation, where he participated in research and development projects on fault-tolerant multiprocessor architectures for defense aerospace applications. He authored the textbook Design and Analysis of Fault Tolerant Digital Systems. Johnson received BS, ME, and PhD degrees, all in electrical engineering, from the University of Virginia. He is a fellow of the IEEE and President-Elect of the Computer Society.

Joseph A. Profeta III is currently director of the Advanced Technology Group at Union Switch and Signal, Pittsburgh, Pa. The ATG is a group of approximately 25 researchers devoted to the development of new technology to advance the state of the art in railway signaling and switching systems. His current research is in several areas relating to railway systems, including modeling and simulation of train dynamics, wheel-to-rail interaction, speed regulation, station stopping, safety and reliability, Petri net theory, and VHDL modeling and simulation. Profeta received BS, MS, and PhD degrees, all in electrical engineering, from the University of Pittsburgh.

Todd A. DeLong is currently a research scientist at the University of Virginia. Formerly, he worked as an operations research analyst for the Marathon Oil Company, where he designed and maintained software systems. His current research interests are fault-tolerant system design, analytical modeling of safety-critical systems, VHDL fault injection techniques, and the functional modeling of safety-critical systems with VHDL. DeLong received degrees in physics and computer science from Marietta College and the MS degree in electrical engineering from the University of Virginia. He is a member of the IEEE and the Computer Society.

Direct questions concerning this article to Todd A. DeLong, Dept. of Electrical Engineering, Thornton Hall, University of Virginia, Charlottesville, VA 22903-2442; [email protected].
