fault tolerant analysis of automotive systemsthe existing approach uses two fault analysis methods...

Fault Tolerant Analysis ofAutomotive Systems

Thesis submitted in partial fulfillmentof the requirements for the degree of

Bachelor of Technologyin

Computer Science and Engineeringby

Aurosish Mishra06CS3027

Under the supervision of:Prof. Partha Pratim Chakrabarti

Department of Computer Science and EngineeringIndian Institute of Technology, Kharagpur

West Bengal, India, 721302

May 10, 2010

CERTIFICATE

This is to certify that the thesis entitled “Fault Tolerant Analysis ofAutomotive Systems” is a bonafide record of authentic work carried outby Aurosish Mishra under my guidance and supervision for thefulfillment of the requirement for the award of the Bachelor of Technology(Honours) degree in Computer Science and Engineering at IndianInstitute of Technology, Kharagpur. The work incorporated in this, has notbeen, to the best of my knowldege, submitted to any other University orInstitute for the award of any degree or diploma.

Prof. Partha Pratim ChakrabartiDepartment of Computer Science and EngineeringIIT Kharagpur

1

ACKNOWLEDGEMENT

I would like to express my gratitude towards Prof. Partha Pratim Chakrabarti,Department of Computer Science and Engineering, IIT Kharagpur, for ex-tending all the necessary support throughout the duration of the projectand for being a constant source of inspiration. Taking time out of his busyschedule, he ensured that the project work was carried out methodically andmeticulously. I am greatly indebted to him for all his creative ideas and in-spirations which gave us an impetus to come out with constructive creations.

I would also like to express my gratitude to Mr. Satya Gautam Vadlamudifor his intellectual guidance, constant attention and unceasing encourage-ment during the research and development of this project.

I would also like to avail this opportunity to express my sincere gratitudeto my parents and my brother who have been an endless source of encour-agement for me.

Aurosish Mishra06CS3027Department of Computer Science and EngineeringIndian Institute of Technology KharagpurMay 10, 2010

2

Contents

1 Introduction 61.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.4 Summary of Work Done . . . . . . . . . . . . . . . . . . . . . 111.5 Organization of Thesis . . . . . . . . . . . . . . . . . . . . . . 11

2 Current Approach for Fault Tolerant Analysis 132.1 Operation Level Models . . . . . . . . . . . . . . . . . . . . . 132.2 Modelling Errors at Operational Level . . . . . . . . . . . . . 152.3 Simulation Based Analysis . . . . . . . . . . . . . . . . . . . . 18

2.3.1 Inputs for Fault Tolerant Analysis . . . . . . . . . . . . 182.3.2 Fault Simulation Based Analysis . . . . . . . . . . . . . 202.3.3 Complexity of Exhaustive Fault Simulation . . . . . . . 22

2.4 Look-Up Table Based Analysis of Operational Models . . . . . 232.4.1 Characterizing Operations . . . . . . . . . . . . . . . . 232.4.2 Analysis using LUTs . . . . . . . . . . . . . . . . . . . 25

2.5 Quality Fault Tolerance Analysis Flow . . . . . . . . . . . . . 27

3 Modified Approach to Fault Tolerant Analysis 293.1 Need for Modifications . . . . . . . . . . . . . . . . . . . . . . 29

3.1.1 Existing Approach: Interesting Observations . . . . . . 293.2 Proposed Approach . . . . . . . . . . . . . . . . . . . . . . . . 313.3 Characterization of Components . . . . . . . . . . . . . . . . . 32

3.3.1 Alternate Representation of Residue . . . . . . . . . . 333.4 Compaction of LUTs . . . . . . . . . . . . . . . . . . . . . . . 353.5 Static Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 363.6 Dynamic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 39

3

4 Work Done 404.1 Fault Tolerant Fuel Controller . . . . . . . . . . . . . . . . . . 404.2 Characterization . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.2.1 Study of LUTs . . . . . . . . . . . . . . . . . . . . . . 434.2.2 K-L Divergence using Alternate Representation . . . . 44

4.3 Behaviour of Certain Components . . . . . . . . . . . . . . . . 474.3.1 Adder Module . . . . . . . . . . . . . . . . . . . . . . . 474.3.2 Oxygen Correction Module . . . . . . . . . . . . . . . 51

4.4 Anti Lock Braking System . . . . . . . . . . . . . . . . . . . . 554.4.1 Slip fcn Mux unit . . . . . . . . . . . . . . . . . . . . . 56

4.5 Dynamic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 61

5 Conclusions and Action Plan 625.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625.2 Action Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

4

Abstract

Over the years, Fault Tolerance has become the most sought-after require-ment in high-availability or life-critical systems. Fault-tolerant electronicsub-systems are becoming a standard requirement in the automotive indus-trial sector as electronics becomes pervasive in present cars. The existingfault-tolerance analysis approach can handle quality faults emanating fromloss of precision in signal values of automotive systems, in addition to logicalfaults. This analysis can aid the designer to identify where and how preci-sion losses occur in both software and hardware components. This analysisis performed on operation-level (Simulink) models, and hence provides earlydetection of weak spots in the system.

The existing approach uses two fault analysis methods in tandem: a fault-injection and simulation method is used which evaluates the quality degra-dation of output signals of the system, under specific testcases and fault-scenarios; and a look-up table based approach, in which quality degradationof the complete system is obtained by a series of table lookups, which storethe characterized quality degradation behavior of individual components. TheLUT based method performs a quick but approximate analysis and yields bet-ter coverage with respect to quality behavior.

The exhaustive analysis of quality fault-tolerance is a computationally in-tensive problem. So, we focus on improving the LUT based approach for per-forming fault tolerant analysis. In this thesis, we present a novel techniqueto perform characterization of components using multiple functional testcasesthat are quite varied and hence capture most of the component behaviour invarious faulty scenarios. Improving the characterization methodology, leadsto improved quality LUTs for each component, using which dynamic analy-sis produces much better results. We use our approach to characterize twosystems, namely a Fault Tolerant Fuel Controller and an Anti-lock BrakingSystem. Apart from this, we also suggest techniques for performing com-paction of LUTs to satisfy time and storage requirements.

5

Chapter 1

Introduction

Fault-tolerant electronic sub-systems are becoming a standard requirementin the automotive industrial sector as electronics becomes pervasive in presentcars. Fault-tolerance or graceful degradation is the property that enables asystem (often computer-based) to continue operating properly in the eventof the failure of (or one or more faults within) some of its components. If itsoperating quality decreases at all, the decrease is proportional to the severityof the failure, as compared to a navely-designed system in which even a smallfailure can cause total breakdown. Fault-tolerance is particularly sought-afterin high-availability or life-critical systems.

The basic characteristics of fault tolerance require the following:

1. No single point of repair

2. Fault isolation to the failing component

3. Fault containment to prevent propagation of the failure

4. Availability of reversion modes

In addition, fault tolerant systems are characterized in terms of bothplanned service outages and unplanned service outages. These are usuallymeasured at the application level and not just at a hardware level. Thefigure of merit is called availability and is expressed as a percentage. Forexample, a five nines system would statistically provide 99.999% availability.Fault-tolerant systems are typically based on the concept of redundancy.

6

1.1 Motivation

With the large scale proliferation of electronics and software as buildingblocks of automotive systems, fault- tolerance has emerged as a first classdesign requirement. It is of paramount importance to develop systems whichpreserve functionality in-spite of failures and errors in electronic, communi-cation and processing components. This requirement stems from the factthat unlike mechanical components, which have graceful degradation in caseof failures, failure of electronic components causes sharp changes in systembehavior. For example, a defective mechanical steering column may causesome variations in the steering torque provided by it, but a stuck-at-faultin a processor output providing signals to a steer-by-wire system may causedramatic increase or decrease in steering torque. Additionally, automotivesystems are safety-critical systems which must conform to stringent indus-try safety norms for fault-tolerance. Hence, it is important to design systemswhich are provably resilient to faults which may emanate from various sourcesand causes.

Failure of electrical and electronic components due to defects and agingis a well documented phenomenon [16]. Chips, sensors, power supplies, andelectromechanical actuators are known to fail either permanently, or for shorttime durations (transient failures), or by losing precision (behavior of measur-able quantities like output voltage varies from fault-free behavior). Addition-ally, ’bugs’ in software and even in the design of the hardware multiprocessorplatform, may cause transient and permanent failures. These failures man-ifest themselves as errors in the output of the control system, and then theactuator. These errors follow a feedback loop to the sensors of the automo-tive system by propagating across the mechanical components of the plant.This may further deviate the behavior of the automotive system from thefault-free case.

The analysis of precision loss of automotive signals (analog, digital anddata in software) is an important but insufficiently addressed issue in thedesign-flow of automotive control systems. Several automotive system com-ponents, like sensors, software and hardware blocks, introduce sporadic qual-ity faults ranging from shift in signal trajectory (for example bias faults insensors), to erroneous outputs for a short time duration (missing data from a

7

camera, or software bugs). These quality faults do not necessarily render theaffected signals useless, but cause a loss of precision. The precision loss intro-duced by one component propagates across other fault-free as well as faultycomponents resulting in aggregation of precision losses with each propaga-tion, and with the elapse of time. In order to tackle these problems, severalfault-tolerance (FT) mechanisms are employed by designers. An analysisframework, for systematically evaluating different fault-tolerant automotivesystem design options, for quality faults is an important ingredient of anyautomotive system design-flow. This research strives to address the qualityfault analysis problem for automotive systems.

1.2 Background

In order to design fault-tolerant systems, a gamut of techniques and designartifacts are incorporated in various components of the system. These designfeatures enable the functioning of the system in the presence of environmentinduced failures and aging failures. The most common technique employedacross software and hardware components of the automotive control system,as well as in sensors and actuators (with multiple windings), is replication[10]. A study of techniques for developing fault-tolerant processing-nodesusing replication in different ways is presented in [2]. Similarly, in somecases software components are replicated on multiple processing-nodes toensure that some replica of the software component is operational even incase of failure of a subset of processing-nodes [11]. Apart from replication,another technique for providing fault-tolerance in the software layer is em-bedding non-identical alternate ways for computing the value of a signal.Often look-up tables and time delay neural networks (TDNNs) [9] are usedto approximate the value of certain signals. These allow the computation ofthe same signal by multiple sets of software functions. Another techniqueproviding fault-tolerance to transient errors (like soft-errors [15] and statedependent software and hardware bugs) is rescheduling, wherein a task isre-executed if an error is detected [10].

Aside from fault-tolerant design techniques, two essential ingredients forachieving fault-tolerance are a diagnosis component and a reconfigurationor selection component. Diagnosis is required in order to detect the faultycomponents of the system and inform the reconfiguration mechanism of the

8

same. Most diagnosis methods operate by recording the values of certainsignals and checking values of these signals against pre-defined assertions orgoldenized models. Reconfiguration consists of changing the topology of thesoftware or hardware systems so as to overcome the failure of single or multi-ple components. In the simplest form reconfiguration involves switching thesource of a particular signal from one component to another. More complexreconfiguration methods which employ different algorithms in response tothe set of available resources (computing, sensory and actuators) [12] havealso been explored by researchers.

While designing a fault-tolerant automotive system, the designer may wantto perform a ’what if ?’ analysis for different fault-tolerance strategies. Ad-ditionally, once a final design point is reached, the designer must analyzethe design to assure that certain fault-tolerance requirements are satisfied.Clearly, a framework for analysis of fault-tolerance is an important require-ment for fault-tolerant automotive system design.

Fault-tolerance analysis methods in the literature can be broadly classifiedas per the fault-tolerance requirements which must be satisfied by the system:

1. Logical Analysis: The analysis of logical system failures [10], and re-silience under specific fault-scenarios (a sequence of occurrence of faults).Logical faults analyzed by these methods completely degrade the out-put of components, such that these output values cannot be utilized. Alogical fault naturally arises due to hardware failures (failure of powersupply, stuck-at-faults, aging) and bugs in the software.

2. Quality Analysis: The analysis of loss of precision in various compo-nents and the propagation of precision errors. There are several sourcesof quality faults, for example bugs in the software which are invoked ina few execution time-slots, resulting in erroneous outputs in only thoseexecution time-slots. Moreover, quality faults may arise in hardwaredue to shift in supply voltage to sensors, noise in sensors and actua-tor components, and missing output data from sensors (like cameras).While analyzing quality faults, the propagation of quality degradationacross fault-free as well as faulty, software, hardware and mechanicalcomponents must be studied. Additionally, it may be noted that insome cases diagnostic mechanisms monitor the quality degradation of

9

a subset of signals, and notify a warning when the tolerance limits arebreached. In such cases, it is often of interest to analyze the extentof quality degradation of un-monitored signals when certain diagnosticmechanisms do not issue a warning.

3. Reliability Analysis: The analysis of mean time to failure (MTTF) forthe system, given the MTTF of individual components [5]. This isespecially useful for analyzing aging faults.

Currently, various dynamic (simulation-based) and static analysis methodsare employed at various stages of the design-flow for fault-tolerance anal-ysis. Simulation-based methods are employed on operation-level, as wellas implementation-level models for logical and quality analysis. Typicaloperation-level models used are Simulink models [8], while C/C++ processorsimulators and emulation setups executing automotive software are employedfor implementation-level analysis. Model checking based methods may beemployed for logical analysis of designs, while graph-based algorithms areemployed to perform reliability analysis of automotive systems [5].

1.3 Objectives

The basic objective of the project is to:

Analyze the accuracy and coverage of quality-LUTs, and makenecessary improvements, with a view to developing an elegantmethod for performing fault tolerant analysis of Automotive Sys-tems.

The major part of the project focuses on developing a novel technique forcharacterizing components of an automotive demonstration,“Fault TolerantFuel Controller”, provided by Simulink. The existing characterization ap-proach leads to incomplete and somewhat inaccurate Look-up tables(LUTs).Ideally, we want our quality LUTs to be as accurate and as complete as pos-sible so as to capture almost all faulty scenarios. Using the improved LUTs,we aim at performing dynamic and static analysis in order to obtain thecounterexamples that violate the fault tolerance requirements. We also wantto perform compaction of the LUTs to reduce simulation time and storage

10

space. The proposed methodology is also tested on another model called the“Anti-lock Braking System“ so as to examine the advantages provided bythis method.

1.4 Summary of Work Done

We analyze an existing approach which uses lookup table-based and fault-simulation-based methods in tandem to obtain good coverage, while main-taining the accuracy of analysis. We try to rectify the shortcomings of theexisiting approach to come up with a generic fault tolerant analysis approach.

We present a modified technique for improving the characterization ofcomponents of Automotive Systems. This involves developing an alternaterepresentation to capture the behaviour of signals with minimal storage re-quirements. Improving the characterization methodology, leads to improvedquality LUTs for each component, using which dynamic analysis producesmuch better results. We elucidate the advantages of the new methodology, byapplying it to characterize two systems, provided by the Simulink automotivedemonstrations:

� A Fault Tolerant Fuel Controller

� An Anti-Lock Braking System

The varied behaviour of different components makes it a challenging prob-lem to characterize all components accurately and completely. The suggestedmethod keeps the various fault scenarios that a component is susceptible to,under consideration. The functional testcases chosen for characterizing com-ponents have a wide variation in behaviour so as to capture as much faultybehaviour as possible. Apart from this, we also suggest a technique forperforming compaction across the obtained LUTs so as to satisfy time andstorage constraints.

1.5 Organization of Thesis

The remainder of the thesis is organized into several chapters as follows:

11

� Chapter 2 presents the exisiting approach for performing fault toler-ant analysis, which involves using a lookup table-based and a fault-simulation-based method in tandem. This chapter introduces the basicconcepts of an operation level model, the used technique for character-ization of components and the advantages of the Look-Up table basedanalysis method.

� Chapter 3 presents the modified approach to characterize componentsof an automotive system, leading to generation of better quality LUTs.It also explains the needs for modifying the existing approach. Apartfrom this, we also present a technique for compaction of LUTs and theapproaches for performing static and dynamic analysis.

� Chapter 4 presents an account of all the work done, along with the ob-servations accompanying the new characterization method. We presentthe characterization of two components of the Fault Tolerant Fuel Con-troller and one component of the Anti Lock Braking system to estab-lish the efficacy of our method. We also present the improved dynamicanalysis results obtained using the modified quality LUTs.

� Chapter 5 outlines the conclusions of the project and prepares an actionplan to be followed in the near future.

12

Chapter 2

Current Approach for FaultTolerant Analysis

2.1 Operation Level Models

Operation-level modeling languages, for example the Simulink modelinglanguage [8], provide a versatile framework for modeling various aspects andabstractions of automotive systems. These models are capable of represent-ing not only function-level models of the automotive system, but also somedetails of the architectural platform on which the automotive system is exe-cuting (mapping to same processor, buffers and buses etc).

An operation-level model(Mop) [17] consists of operations(Op) which havea set of input(InOp) and output ports(OutOp), and signals(Sig) connectingdifferent ports of these operations. Each operation corresponds to a func-tional component of the system, which may range from a software code-blockto an analog component, or even a sensor. A discrete-event semantics is con-sidered in this work, which is more pertinent since several operations aremapped to software components (operating on sampled signals in discretesteps). Each signal denotes the value which is updated by the ’source’ opera-tion in every time-slot. It may be noted that a signal is a virtual connectionbetween operations and may correspond to physical quantities, like outputvoltage generated by a filter, or may correspond to a data value generatedby a software block.

13

Figure 2.1: A Schematic Representation of an Operation-Level Model

The schematic of an operation-level model can be represented as a directedgraph with operations corresponding to nodes and signals corresponding toedges (Sig ⊆ Op×Op). It broadly consists of five sets of operations, namely,sensor operations, control operations, actuator operations, plant operations,and environment operations (Figure 2.1). Additionally, we note that signalsconnect sensors to the control, control to the actuators, actuators to theplant,and plant to the sensors, in a cyclic manner. Environment operations,for example, the ’time- trigger’ and the ’user-input’ in Figure 2.1, modelthe behavior of the driver and the environment, and do not have incomingsignal edges. It is notable that removing signals between the plant and sen-sors (or actuators and the plant) causes the underlying graph structure tobe a directed acyclic graph (DAG). We call this modified operation-levelmodel, the open operation-level model, and the DAG structure is repre-sented as Mopen(Op, Sigopen), where Sigopen ⊂ Sig. Loops in the controlcomponent may be logically treated as being connected to a virtual actua-tor, an operation-free signal in the plant, and a virtual sensor (Figure 2.2).In this case, when the signals between the plant and sensors are removed,the graph structure of the operation-level model is a DAG.

14

Figure 2.2: Handling loops in the automotive control as a virtual signal paththrough the plant. (a) Original loop in the control, (b) Loop through avirtual sensor, and virtual actuator.

An example schematic representation of an operation-level model, broadlysimilar to schematic representations of several operation-level models of au-tomotive systems, is presented in Figure 2.1. Each operation in this modelcorresponds to either: a task of the control application, or a sensor opera-tion (obtains value of certain parameters and user inputs), or an actuator,or components of the plant (consisting of mechanical parts). Each operationis represented by either a logical or arithmetic function, or a state-machinelike a FSM , or hybrid I/O automata [13]. In most automotive systems ofinterest, the control system is almost entirely software based, thereby sen-sor outputs are immediately converted to data items provided as inputs tocontrol software components. Moreover, many control (software) componentsmay be time triggered, such that they start/resume execution at specific timeinstances. For example, the operation OP 4 executes only when 5 ms elapsesfrom the start of execution, even if inputs are available earlier. Edges be-tween operations denote virtual functional connection between them, whichmap the output trajectory of a source operation as the input trajectory forthe destination operation.

2.2 Modelling Errors at Operational Level

One of the most important requirements of an operation-level fault-simulationframework is the modeling of various quality faults at operation-level ab-straction. The origin of faults typically lie in circuit-level or assignment-level

15

details of an implementation. For example, a soft error causes a transientbit-flip in a memory cell or register, or a software bug may cause incorrectvalues. It is important to abstract these faults to the level of operation-levelmodels, while preserving the essence of the manner in which a fault affectsa signal value. In this work we propose to abstract the effects of qualityfaults as either noise, shift, or spike faults (Figure 2.3). These can be uti-lized to abstract the effects of the following types of faults (among others) inoperation-level models:

1. sensor faults leading to added noise on output and shift in output signaltrajectory (noise and shift faults).

2. missing data from sensors leading to arbitrary sensor outputs (spikes)in certain time-slots (for example missing data from a camera).

3. software bugs and hardware errors which do not manifest themselves inevery execution run can be viewed as spike faults in certain time-slotsin which they are exercised. These spike faults overapproximate the de-fect introduced by the bug/error by producing the maximum/minimumvalue possible for the signal in the said time-slot.

4. precision losses in software components are modeled as trajectory shifts.These shifts can occur due to typecasting bugs, and due to porting con-trol software to embedded automotive platforms which may not supportmany high-precision and floating-point operations.

5. hardware recovered soft-errors and temporary logical faults manifest asspikes (sudden and short change) on signals(spike fault).

It is assumed that permanent logical faults are detected by appropriatecomponents in the hardware layer such that fault-silence can be implemented[2]. Additionally, operations are assumed to be instrumented such thatthey are fault-silent, in case any essential input signal indicates fault-silence.Hence, propagating the effects of permanent logical faults by fault-simulationis trivial. Such being the case, we shall henceforth focus only on quality faultsin this work, while stating that the proposed methodologies are capable ofhandling logical faults as well.

16

Figure 2.3: Modelling of Errors at the Operation Level

It may be noted that some of the aforementioned faults do not originate insoftware control components of the automotive system. However, due to faultpropagation, their effects are still observed on the outputs of various softwarecontrol components (among others). Hence it is imperative for any fault-tolerance analysis method to address the propagation of the aforementionedfaults across different types of software, hardware, and mechanical (from theplant) components.

For modeling quality degradations, we associate each type of quality degra-dation with an appropriate measure for denoting the extent of quality degra-dation (the error). Measures of quality degradation for the three types ofquality faults introduced earlier are presented next:

17

1. Shift: in signal trajectory, for which quality degradation is denoted bythe maximum deviation in signal value among all time instances withinthe simulation window.

2. Noise: for which quality degradation is described by amplitude of theoverlaying additive random noise signal

3. Spikes: for which the quality degradation is denoted by the number ofspikes. The peak value of these spikes is bound by the upper limit ofthe operation-range or data-type (in case the signal is a digital data).

2.3 Simulation Based Analysis

2.3.1 Inputs for Fault Tolerant Analysis

There are three inputs to the simulation-based fault-tolerance analysisframework, namely the testcase, the fault-scenario and the fault-tolerancerequirement specifications.

Testcases typically describe a set of typical as well as critical maneuvers(job sequences) which are performed by the automotive system. Addition-ally, testcases may also be generated to perform directed analysis of theautomotive system. Usually, the sensor inputs which come from the user aremodeled by these testcases. In case only a part of the system is being tested,certain signals from the ’plant’, which were generated in response to controlsignals from some other sub-system, are also included in the test-suite.

The second input to the simulation-based fault-injection framework is thedescription of the fault-scenario under which the system must be analyzed.Fault-scenarios may be described explicitly by stating a set of faults whichmust occur. In the case of quality faults, in addition to the information ofwhich faults occur, the measure of quality degradation must also be stated.Hence a fault-scenario is a set of triplets of the form: <location, fault −type,measure> , where location denotes the signal which is afflicted by thefault, while fault-type and measure denote the type and measure of error(measure is irrelevant for logical faults). For example, a fault-scenario mayspecify that ’at most 5 spikes (type and measure) may be introduced by all(location) software components’.

18

It must be noted that all types of faults cannot occur on every signal inthe model. In this work we allow noise, shift and spike errors on sensoroutputs, shift and spike errors on outputs of software components, spikeerrors on outputs of hardware components, and no errors on outputs of plantcomponents (we argue that sensor errors can account for plant errors).

While specifying fault-scenarios as inputs to the analysis framework, it isimportant to account for the correlation between various faults. For examplea common power supply induces several correlated noise and bias (shift)faults in all sensors it supplies power to. In this work, we assume that fault-scenarios are external inputs to the proposed analysis framework, and aredefined by the designer by taking effects of correlation into consideration.

The third set of inputs to the simulation-based fault analysis framework isthe set of fault-tolerance requirements which must be satisfied by the system.Logical and timing properties specifying the design intent [3] of the systemcan be used as specification for the fault-tolerance analysis step. Addition-ally, specific properties for checking bounds on quality degradation may bespecified (for example an upper bound on amount of superimposed noise).Besides this, more involved properties may be written where acceptable qual-ity degradation is a function of time [4].

In this work, we consider quality fault-tolerance requirements to be prop-erties of the form

∧i ψi ⇒ φ, where ψi is an antecedent proposition and

φ is the consequent proposition. Both, the antecedent and the consequentpropositions, are either True, False, or are propositions denoted by tuplesof the form: <Sig, fault − type, limit>. Here each tuple specifies that thesum of quality degradations of type ’fault-type’, at all the signals location ∈Sig, must be upper-bounded by ’limit’. However, there is a crucial differencebetween the antecedent and consequent propositions. While the antecedentpropositions reason about the additive quality degradations (like in case offault-scenarios), the consequent proposition reasons about the quality degra-dation of the signal as compared to the golden (fault-free) signal.

19

Figure 2.4: A Schematic Representation of Fault Injection

2.3.2 Fault Simulation Based Analysis

Given the three inputs to the fault-tolerance analysis framework, the simulation-based fault-tolerance mechanism consists of fault-injection, simulation of theoperation-level model, and checking assertions corresponding to the fault-tolerance requirement properties. For this, first we instrument the nativeoperation-level model of the automotive system with ’fault-injection’ opera-tions. These operations superimpose errors on the native input signal (whichmay be error free or may itself contain some prior acquired errors) of suchoperations. Here it may be noted that all the fault types proposed in thiswork are amenable to signal superposition.

An example fault-injection operation, implemented in Simulink, is illus-trated in Figure 2.5. This operation can superimpose noise, shift, and spikeerrors (at most one spike) on the native signal. The amount of noise and shift,as well as the number of spikes (0 or 1 spike in this example), are fixed byassigning values to the parameters ’noise-amplitude’, ’shift-amplitude’ and’spikes0/1’. For example, ’shift-amplitude’ set to 0 indicates the absence ofa shift fault. The values which must be assigned to these parameters areobtained from elements of the fault-scenario, corresponding to the given sig-

20

Figure 2.5: A Fault Injection mechanism for introducing errors on a signal

nal. Other parameters, like the height of a spike and the position (along thetime-axis) of introduction of a spike are set, either randomly, or by the userafter a study of the signal waveform.

Once the operation-level model of the automotive system has been in-strumented with the ’fault-injection’ operations, fault-simulation consists ofchecking fault-tolerance requirements for all relevant fault-scenarios, for eachfunctional testcase. In each fault-simulation, monitored signals, which mustconform to fault-tolerance requirements, are compared against logged sig-nal trajectories of the corresponding fault-free run, for the given functionaltestcase. The difference in trajectory between the fault-injected and fault-free runs for each monitored signal is computed and logged. Thereafter,the logged difference trajectory is appropriately processed for checking shift,noise, and spikes causing deviation from the fault-free trajectory.

The difference between the fault-injected and fault-free (golden) signaltrajectories are computed as the difference in amplitudes of the signals indifferent time slots. Once the difference trajectory is computed, the numberof spikes is obtained as the number of time-slots in which the difference is

21

greater than a limit (spike-limit) which depends on the native trajectory. Forcomputing the amount of shift and noise errors, we consider windows of acertain number of time-slots starting at each time-slot within the simulationwindow. The amount of shift is then computed as the maximum, over allwindows, of the average value in each difference trajectory window-segment(while ignoring spikes). This essentially tries to mimic the behavior of acolumn filter. The amount of noise is computed as the maximum amplitudedifference within a window, amongst all windows.

Hence, we obtain a set of quality degradation tuples of the form:<msignal, vnoise, vshift, vspikes> , where vnoise , vshift , and vspikes are the qual-ity degradations of different types (noise, shift, and spikes) for the monitoredsignal, msignal. Fault-tolerance requirements are checked by comparing thequality degradation tuples of the monitored signals, with their limits in thefault-tolerance requirement queries.

2.3.3 Complexity of Exhaustive Fault Simulation

An important aspect of the quality fault analysis framework is the notionof a complete set of fault-scenarios for the automotive system (for a givenfunctional testcase). Since the measurement of quality degradation for noiseand shift faults involves real numbers (with infinitely many possible valueswithin a bounded interval), a finite representative set of real numbers may beselected by the designer for defining a complete set of fault-scenarios. Suchbeing the case, let Πnoise(l) and Πshift(l) be the finite sets of values of degra-dation for the additional noise and shift (respectively) introduced on signall. Now consider a system where Ss, Sc , and Sh are sets of sensor, software,and hardware output signals (with ns = |Ss|, nc = |Sc|, andnh = |Sh|). Ad-ditionally, let us assume that all queries have an antecedent proposition ofthe form <Ss ∪ Sc ∪ Sh, spike, nspikes>, restricting the total number of in-jected spikes to at most nspikes. Clearly, there are

∑i=0...nspikes

Cins+nc+nh+i−1

fault-scenarios arising out of different ways in which spikes may be injected(selecting i signals from ns + nc + nh with repetition). Additionally, assum-ing that |Πnoise(l)| = |Πshift(l)| = nquant, the number of possible ways ofoccurrence of shift and noise faults at sensor outputs is n2ns

quant. Also, thenumber of possible ways of occurrence of shift faults at software componentoutputs is: nquant

nc . Hence, the total number of fault-scenarios possible foreach testcase is:(

∑i=0...nspikes

Cins+nc+nh+i−1)nquant

2ns+nc . For the automotive

22

system case study introduced later,ns = 4, nc = 20 and nh = 3. The numberof fault-scenarios for nquant = 3 and nspike = 5, is approximately 4.6× 1019.

2.4 Look-Up Table Based Analysis of Opera-

tional Models

The analysis of quality degradation of automotive systems is a computa-tionally intensive problem, often requiring a very large number of simulationruns, as illustrated in Section 2.3. Hence, it is paramount to devise meth-ods to reduce the amount of computation required for each simulation run.In this work we characterize the quality response of components, and thenutilize this result to perform the analysis for the complete system. Char-acterization is conducted on individual operations, or a user defined set ofoperations, to analyze their response to input signal quality degradation, aswell as spikes on inputs. In the proposed lookup table-based method, theresults of characterization are tabulated, and quality degradation of a systemis computed by a series of table lookups.

2.4.1 Characterizing Operations

Let us consider an operation OP , which has k input signals i0, i1, ..., ik−1

and m output signals o0, o1, ..., om−1. The first step for characterization in-volves logging the input and output signal trajectories for fault-free behavior,for each testcase tci ∈ TC. For the illustrated setup (in Simulink) for charac-terization shown in Figure 2.6, ’engine speed.mat’ and ’manifold pressure.mat’are log files for the fault-free input signals, while ’pumping constant output.mat’is the log file of the output signal, for one testcase.

Characterization proceeds exactly like fault-injection and simulation as de-scribed earlier in Section 2.2. The fault-scenarios for characterization involveinvoking noise, shift and spike faults on the input signals. Even though sev-eral types of errors are not injected on many input signals during full-systemfault-simulation, characterization requires that all types of errors be injectedon each input signal. This is because errors injected at input signals mustnot only model errors introduced by the immediate source operations, butalso model propagated errors which originate beyond the source of the signal,from operations which could introduce any type of error (like sensors).

23

Figure 2.6: A setup for characterizing a two-input component

For an exhaustive characterization, all combinations of values of noise andshift faults at the user defined quantization-level (nquant points each), andspike faults with an upper bound on the number of spikes (nspikes), are

invoked. Hence (n2quant)

k, fault-scenarios iterating over nquant noise, and

nquant shift values for each of the k inputs must be considered. Additionally,0...nspike spikes are spread across the k inputs, contributing to a factor of∑

i=0...nspikesCi

k+i−1. Hence the total number of fault-scenarios for exhaus-

tive characterization is n2kquant(

∑i=0...nspikes

Cik+i−1). It is notable that the

number of fault-scenarios is not as large as that for full-functional fault-simulation. For nquant = 3 and nspikes = 5 (as earlier in Section 2.3), thenumber of scenarios for a two input operation is 1701. In our studies wehave found that two-input operations are the most commonly encounteredoperations in the automotive domain when the system is partitioned at areasonable granularity (20-30 operations).

The results of characterization are tabulated in a quality lookup tablemapping the quality degradation on the inputs, to the quality degradation ofthe outputs. Hence it stores a mapping LUT : Π(i0)× Π(i1)...× Π(ik−1)→χ(o0) × χ(o1)... × χ(om−1). Here Π(i) ⊆ Πnoise(i) × Πshift(i) × Πspike(i),and Πnoise(i),Πshift(i),Πspike(i) are the sets of noise, shift and spike valuesfor signal i which are invoked during characterization. The values χ(i) ⊂R+ × R × Z+, denote the values of output noise (positive real), shift (real)and spikes (natural number) for signal i.

24

2.4.2 Analysis using LUTs

Given tuples of quality degradations for all input signals of an operation op,the quality degradation tuples for the output signals can be approximatelyestimated from the quality lookup table for the operation Op. We use thenotation V (il) = <vnoise

in(il), vshiftin(il), vspikes

in(il)>, to denote the qualitydegradation tuple for input il, and V (oj) = <vnoise

out(oj), vshiftout(oj), vspikes

out(oj)>,to denote the tuple of quality degradation for output oj. If the input degra-dations do not coincide with quality values from characterization, i.e. V (i0)×V (i1)... × V (ik−1) /∈ Π(i0) × Π(i1)... × Π(ik−1), then one of several interpo-lation/extrapolation techniques may be used to estimate the output qualitydegradations. In this case, the output quality degradations are approxi-mations of the actual quality degradation values. It may be noted thatthe quality lookup table approximately captures the propagation of qualitydegradations across an operation, arising from the execution semantics ofthat operation. Additionally, the operation may itself add to the qualitydegradation of the signal (due to errors in its own execution), which is mani-fested as injected faults on the output signals of the operation. Hence, besidesthe lookup table an ’addition’ operation is needed to model the quality re-sponse of the operation. This ’lookup table-addition’ pair for each operationcorrespond to the ’operation - fault-injector’ pair from the fault-simulationsetup described earlier in Section 2.2. There is exactly one lookup table (andaddition operations) for each operation in the native model.

It is notable that the lookup table-based analysis is a quality centric anal-ysis. It does not reason about the trajectory of signals, but solely reasonsabout the quality degradations of signals. Hence it has obvious gains interms of computational effort needed for analysis. Moreover, the lookuptable-based method performs exhaustive local analysis at each operation inthe characterization step. This helps the lookup table approach to attainhigher coverage for quality-centric analysis.

A lookup table-based estimation of quality degradation is performed bysuccessive table-lookup and addition operations, on a network of lookup ta-bles and addition operations. Fault-scenarios being analyzed are used to addto the quality degradation, just like fault-injection was performed for fault-simulation. For example, consider the partial network of lookup tables andaddition operation in Figure 2.7.

25

Figure 2.7: Performing Quality-Centric Analysis using LUTs

We illustrate the lookup table analysis for the fault-scenario:<SIGNAL12, 0.2, 0.2, 1>, indicating that the signal ’SIGNAL12’ undergoesan additional quality degradation of 0.2 units for noise, 0.2 units for shift, and1 additional spike. The initial quality degradations for signals ’INPUT1’ and’INPUT2 are <0.3,−0.1, 1> and <1.4, 0.0, 2> respectively. This is closestto the lookup table entry ’<0.4,−0.3, 1> , <1.4,−0.3, 3>’, as circled in thefigure. Hence the propagated value of degradation across the operation iscomputed to be <1.0, 0.2, 1>. Additional quality degradation is added asper the fault-scenario by using the addition operation. This results in thequality degradation at ′SIGNAL−12′ to be <1.2, 0.4, 2>. A lookup in table′QualityLUT2′, for another operation OP2, yields the quality degradation ofthe output signal of OP2 to be <1.0,−0.2, 0>.

In most designs, like the one presented in Section 2, there is a feedbackloop from the outputs of the plant to the inputs of the sensors. This feedbackloop is removed for the quality centric analysis, since quality degradations aredefined over the complete simulation window (time duration for which simu-lation is performed) and a lookup table-based analysis without any feedbackcovers analysis for the simulation window. However, removing the feedbackresults in the lookup table-based analysis not being able to accurately cap-ture the effects of feedback errors. Additionally, we note that the lookuptable-based approach is oblivious to the variations of errors over time, andinstead reasons only about the maximum values of errors. In this setup,feedback errors can be modeled by additional sensor errors. However, ourexperiments indicate that even without feedback error modeling at sensors,the lookup table-based approach tends to overapproximate errors.

26

Figure 2.8: Quality Fault Tolerance Analysis Flow

2.5 Quality Fault Tolerance Analysis Flow

In this work we propose an analysis-flow which appropriately melds thelookup table andfault-simulation-based methodologies to leverage both quicksimulation and high coverage of the lookup table approach, while maintainingthe accuracy of the fault-simulation method. The analysis flow is as follows:

The first step in the analysis-flow is a functional simulation of the operation-level model, in order to obtain golden trajectories at various signals, for eachfunctional testcase. Thereafter, the user partitions the operation-level modelinto operation blocks, and these blocks are characterized. These character-

27

izing simulations involve fault-injection and simulation of individual blocks,comparing the outputs against stored golden trajectories, and storing sig-nal quality degradation in lookup tables. These lookup tables consist of theinput versus output quality of signals for each functional testcase. Oncethe lookup tables have been obtained, a network of lookup tables and ad-dition operations is constructed, and fault-simulation of the complete sys-tem is performed by a series of table lookups. Fault-tolerance requirementsare checked on the results obtained by simulation of the lookup table net-work. If a fault-tolerance requirement is violated, then the counterexamplefault-scenario which violates the fault-tolerance requirement is re-examinedin the operation-level model of the system by fault-simulation. Hence, thebulk of the fault-tolerance analysis simulations are performed by the lookuptable-based approach, while only the counterexample fault-scenarios are ex-amined by the simulation of the operation-level model. Additionally, if thereis a significant difference in the values of quality degradations obtained byoperation-level fault-simulation and the lookup table-based approaches, thenthe lookup tables are augmented with quality degradation inputs and out-puts at individual operation blocks, as obtained during the fault-simulation.Here, if I(i0)...I(ik−1) are the quality tuples for the input signals of an op-eration Op, as obtained during fault-simulation, and O(o0)...O(om−1) arethe quality tuples for the output signals, then a new lookup table entryI(i0), I(i1), ...I(ik−1), O(o0), O(o1), ...O(om−1) is added for the operation Op.

28

Chapter 3

Modified Approach to FaultTolerant Analysis

3.1 Need for Modifications

The existing method has several salient features, among them are theability to handle quality faults, and performing analysis at the operation-level abstraction. Also, it has sufficient coverage and a good deal of ac-curacy. However, there are several shortcomings of this approach as well.We try to propose certain modifications in the approach, while still keepingthe operation-level simulation method and the lookup table-based analysismethod as the basic framework of our methodology.

3.1.1 Existing Approach: Interesting Observations

The existing approach has several adavantages as well as several limita-tions which are worth analyzing. Some interesting observations are as follows:

1. Representation of Signals.

The existing approach uses a combination of noise, shift and spike tocapture the behaviour of residue signals. Ideally, this approach tendsto have several difficulties. If the spike limit is quite low, then it iseasy to confuse between noise and spikes. Also, if the entire signal isshifted by an amount greater than the spike limit in some contiguous

29

time slots(increasing function), then it would be perceived as multiplespikes in the range, whereas actually it should be considered as a shiftfault. Fortunately for the system being analysed, the spike limits aremuch larger in magnitude than the maximum noise, and almost nosignals have the output residue as a monotonically increasing functionof time. Whatever residues have such behaviour, they are very smallin amplitude and hence can be captured by a simple noise term. Thus,this representation is sufficiently accurate for our analysis.

2. Granularity.

Granularity refers to the user-defined quantization levels for noise,shift and spike faults. This indicates the various magnitudes of noiseand shift faults that can be added to the golden input signals. Mostof the analysis of the existing approach was performed using a granu-larity of 3 for noise and shift faults. The basic assumption of havinga certain granularity is that any value taken from a range, other thanthe representative value of that range, is assumed to yield the sameoutput residue as done by the representative value. This, of course,leads to a certain amount of inaccuracy in the analysis of components.Nonetheless, having higher granularity increases the simulation time aswell as the space required to store the LUTs by a large amount. So,there is a distinct trade-off between simulation time and storage spacevs the accuracy of simulation.

3. Functional Testcase.

The functional testcase for a component consists of a set of goldeninput signals and its corresponding golden output signal based on whichthe fault injection is performed. The existing approach has only onefunctional testcase for each component, on which perturbations areadded and characterization is performed. Clearly, this may not coverall the possible faulty input scenarios. So, the LUTs generated byusing just a single functional testcase per component, are somewhatincomplete and the component is thus not completely characterized.We would be much better off, if we had multiple functional test cases,to form the basis for our characterization.

30

3.2 Proposed Approach

After careful study of the shortcomings of the existing approach, a modifiedmethodology was proposed for performing Fault Tolerant Analysis of Auto-motive Systems. The entire analysis was done on the Fault Tolerant FuelController Automotive demonstration provided by Simulink.The procedurein brief is as follows:

1. Characterization of Components. Each component of the automotivesystem is characterized using multiple functional test-cases. This en-sures that a wider variety of faulty scenarios are considered, leading tomore complete LUTs. The granularity of analysis is also changed asdeeemed suitable to obtain better quality LUTs.

2. Alternate Representation of Residue. Although, the existing classifica-tion of signals based on noise, shift and spike faults was quite effective,an alternate representation has been suggested to facilitate comparisionbetween output residues.

3. Gray Box Approach. The existing approach analyzed each componentas a complete black box, using only the input-output residue pairs tocharacterize its behaviour. But, it is observed that any knowledge ofthe actual behaviour of the component, i.e. a white box analysis, canhelp in characterizing the components better. This combination ofwhite box and black box is called as a gray box for every component.So, we try to generate quality LUTs using the gray box approach andstudy the analysis performed using such LUTs.

4. Compaction of LUTs. The more detailed characterization means thatthe size of the LUTs generated are quite huge and hence, compactionbecomes more important than before. An algorithm has been devel-oped to perform compaction of LUTs: both inside an LUT as well asacross various LUTs obtained for different functional testcases for acomponent.

5. Analysis Using the LUTs. Using the improved quality LUTs generatedfor each component, a system level static as well as dynamic analysis isperformed, to obtain the number of counter-examples that violate thefault tolerant requirements for the system.

31

6. Other Models.The proposed methodology is used to analyze other au-tomotive systems such as an ’Anti-Lock Braking System’ with a viewto finding problems,if any, that are inherent to the analysis.

3.3 Characterization of Components

In order to perform any fault tolerant analysis on an automotive system,it is of utmost importance that each component should be characterized asaccurately and as completely as possible. Characterization proceeds exactlylike fault-injection and simulation as in the previous approach. The fault-scenarios for characterization are generated by adding noise, shift and spikefaults on the golden input signals. Even though several types of errors arenot injected on many input signals during full-system fault-simulation, char-acterization requires that all types of errors be injected on each input signal.This is because errors injected at input signals must not only model errorsintroduced by the immediate source operations, but also model propagatederrors which originate beyond the source of the signal, from operations whichcould introduce any type of error (like sensors). The following steps were fol-lowed to characterize the components of the system at hand.

1. Multiple Functional Testcases. For each component, we used 5 func-tional testcases to serve as the golden input-output sets. These wereobtained from the golden inputs to these components used in otherSimulink models. If those were not available, then random functionaltestcases(with random spikes and noise) were generated. It was ensuredthat all the functional testcases had wide variation, so as to cover asmuch behaviour of the component as possible.

2. Axes Values and Other Thresholds. Previously, since only one func-tional testcase was used, the representative values for the noise andshift axes, and the spike limits, were set by observing the behaviour ofthat single golden input. In the modified approach, we use the globalmaximum and minimum among all the signals to set the noise and shiftaxes values, and to fix the spike upper and lower limits. This ensuresthat the magnitude of fault added for a given set of noise, shift andspike values is the same irrespective of which functional testcase theperturbation is added on. This makes the input and output residuesamenable to comparision.

32

3. Modifying Granularity. Most of the analysis is performed using a gran-ularity of 3, but we do modify the granularity to assist in distinguish-ing between components. For a component that shows similar outputresidues to within a specified error limit, for same input residue addedacross different functional testcases, we do not tinker with the graular-ity and simply use the LUTs generated at a granularity of 3. But,if a component shows different behaviour, for the same input residue,added to different functional testcases, then we look into changing thegranularity to distinguish the components further.

In such a case, we first reduce the granularity from 3 to 2 and generatethe LUTs. If a component still shows different behaviour for differentfunctional testcases, then we cannot combine them in any way, andso, we use the LUTs generated with granularity 3, as they are moreaccurate. However, if any component generates almost similar LUTs atthis reduced level of granularity, then we increase the granularity from 2to 4 and generate the LUTs. This ensures that we capture more detailedbehaviour of such components that showed different behaviour to startoff, i.e. we obtain more accurate quality LUTs for such components.

Using this technique more complete LUTs were obtained for each compo-nent. Obviously, the size of LUTs increased a lot, so suitable compactiontechniques were developed.

3.3.1 Alternate Representation of Residue

An alternate representation was developed to perform comparision be-tween the output residues obtained from different functional testcases. Thenew chosen representation did not involve classifying the signal in terms ofnoise,shift and spike, but to store the nature of the signal( its approximatetrajectory) using only a small number of data points. Basically, we tried tocapture the behaviour of the signal by sampling at specific points in varioustime slots. We could perform the comparision better by storing the entiresignal, but due to space constraints, we resort to the following algorithm.

Algorithm.

33

1. Time-slots. Divide the entire signal into several intervals whose widthis user-specified. The width is so chosen that it captures almost theentire signal, keeping in mind the memory available for storing eachsignal. A width of 100 was found suitable enough as per our memoryconstraints.

2. Encoding Scheme. In each interval, find the maximum and minimumamplitudes of the signal. Store them as (Max-value, Min-value) or(Min-value, Max-value) pairs depending on their order of occurrencein each interval. So, for every interval we store two signal points. Forthe entire signal store the first and last data points just for the sake ofsimplification.

3. Decoding Scheme. Reconstruct the original signal by plotting thesepairs of values at quarter and three-quarters of the interval, in theorder in which they were stored. Plot the first and last data points ofthe encoded signal at their original positions.

In Matlab, each signal is stored using 3001 data points. Using an intervalwidth of 100, we have 30 intervals. For each interval, we store 2 data points,making a total of 60 data points. Taking into account the first and last points,we have encoded a signal using 62 data points. For decoding the signal, itwas found that plotting the points at a distance of half the interval lengthprovided the most accurate representation. Since, we no longer representedthe signal in terms of noise, shift and spikes, there was no chance of wronginterpretation during comparision.

K-L Divergence.

The Kullback-Leibler divergence, more popularly known as the K-L Diver-gence is a non-symmetric measure of the difference between two distributionsP and Q. This was used to analyze how much different,the output residuesof two different functional testcases, were, for the same set of noise,shiftand spike input faults. The discrete version of the Divergence is given by :

DKL(P,Q) =∑

i P (i) logP (i)

Q(i).

34

This usually is used to compare two probability distributions. So, to applythis to any ordinary signal, an appropriate normalization constant was used.The normalization constant for each signal was the sum of the amplitudes atthe encoding points of the alternate representation. Using those 62 pointsfor each signal, K-L Divergence was computed for 5 randomly chosen pairs ofoutput residues. These were averaged out to give the K-L Divergence valuefor that component. These K-L Divergence values provided a technique forcategorizing the various components of the automotive system.

3.4 Compaction of LUTs

The LUTs as seen from the calculations above can contain large numberof entries if the component is susceptible to all the three types of qualityfaults. Storing such large LUTs may be a problem. So, one of our aims isto compact the LUTs. The compaction of LUTs can be performed insideevery LUT individually, or across LUTs for different functional testcases fora component. While compacting, we basically look for entries that can bemerged by virtue of their similar nature, yielding a smaller LUT.

Across a single functional testcase. For each functional testcase, weadd perturbations in the form of noise, shift and spike faults and createvarious faulty input signals (say m in number). Now for any pair of the minput signals, if the output residues are same within a specified error limit,then we can combine the rows for these two signals to obtain a single row inthe LUT. For performing such compaction, we use a technique very similarto circuit packing methods, in which we try to form ’solid groups’ comprisingblocks of LUT portions that have similar values. The amount of similarity,needed for being a member of a solid group, depends on the specified errorlimit δ. For those solid groups, we combine the rows of LUTs, to generatea range of noise, shift or spike axes values that have almost similar LUTentries. Thus, we can compact across a single functional testcase. Thesecompacted LUTs are in the format necessary to perform static analysis andcan improve its efficieny to a large extent.

Across multiple functional testcases. For each of the n functional test-cases, we obtain m signals as before. Choose pairs of signals across the test-cases. Depending on how much fault tolerant the system is, some or most

35

of such pairs will have similar output residues within a certain error limit.For components that generate almost similar output residues to within aspecified error limit, for same input residue added across different functionaltestcases, we average out the LUT entries for all such pairs to obtain only asingle quality LUT for such components. Thus, we have compacted acrossfunctional testcases.

The need for compaction increases in the proposed approach, as for certaincomponents, we generate the LUTs using a higher granularity of 4. Thismeans that the tables will be of larger size. Also, we are characterizingthe components based on multiple functional testcases, meaning that wewill have more number of LUTs for each component. Hence, compaction ofLUTs, both inside an individual LUT and across LUTs for a component isvery much an essential step in our analysis.

3.5 Static Analysis

It has been observed that logical faults can be effectively analyzed usinglogic-reasoning methods, and methods like fault-tree analysis. However, thesame is not true for quality faults. The analysis of quality faults is usuallydone by fault-injection and simulation, which is incapable of providing suffi-cient coverage. As an example, consider a system which has 20 components,each of which can introduce some bounded quality degradation. In this case,even if we were to study a simplistic quality-fault model where a qualityfault with a certain amplitude may either occur or not occur, we have to testabout one million testcases. The alternative to testing a very large numberof testcases is to formulate and solve the quality fault-tolerance problem asa static analysis, or an optimization problem as done in some robustnessanalysis methods.

In this project, we investigated a static analysis method for performingquality fault-tolerance analysis. The advantage of static analysis over simu-lation, is that it can store and use partial simulation-run results to build-upcomplete simulation-runs, without having to redo simulations from the start.Moreover, if any partial-result of one simulation run matches with that of apreviously investigated run, then this run is not explored any further (statematching). Additionally, recent developments in static analysis techniques,

36

notably SAT and SMT (Satisfiability Modulo Theory) solving, allows theuse of effective search heuristics to address large problems. However, the ap-plicability of static methods is usually limited by available computer RAMmemory. In particular, the memory requirements for analyzing operation-level models, like C-code or Simulink/Stateflow models, is such that scalabil-ity to handle large programs remains a challenge. Such being the case, it isimportant to appropriately abstract the software control system behavior, soas to facilitate the use of static analysis, while retaining traceability betweenresults identified by this analysis and the actual system behavior.

We propose a static-analysis-based framework for quality fault-toleranceanalysis of operation-level models like Matlab/Simulink. Due to the trendtowards designing automotive software in operation-level languages like Mat-lab/Simulink, and with the intention of identifying issues early in the design-flow, it is important to perform this analysis at the operation-level. Theinputs to the analysis framework are: a functional specification of the fault-hypothesis and quality fault-tolerance requirements, in a language proposedin this work; the set of functional testcases; and the operation-level model ofthe application. The following algorithm is used to perform static analysisfor automotive systems with quality faults.

Algorithm.

1. Partitioning the complete operation-level system model into sub-systemscomponents.

2. This is a functional testing step, which creates golden simulation tracesfor the quality fault-tolerance analysis, for a suite of functional test-cases. The input and output traces for each sub-system component,for each functional-testcase, is recorded in this step.

3. The characterization step wherein for each sub-system component, andeach functional-testcase, several characterizing simulation runs are per-formed. Each characterization run consists of perturbing inputs (corre-sponding to a functional testcase) in a systematic manner and recordingthe output perturbations. The output of this step is a set of lookup ta-bles (LUTs), one for each sub-system component, and each functionaltestcase. The characterization step is where the abstraction occurs.

37

Figure 3.1: Static Analysis : An Overview

Here the detailed dynamics of the system are hidden, and abstractedas quality lookup tables mapping input quality degradation to outputquality degradation of individual components.

4. A SMT instance is created by replacing each sub-system operation bya LUT and an addition operation (to model propagation of errors, andintroduction of errors by the component respectively), and adding con-straints to bind input and output values of sub-system componentsappropriately. The variables modeling injected faults, and those mod-eling the error at the outputs are constrained in accordance with thefault-hypothesis and fault-tolerance requirements respectively.

5. Thereafter we use a SMT solver to obtain a fault-scenario which violatesthe fault-tolerance requirements. This counterexample fault-scenarioindicates a hot-spot in the design (abstraction in the characterizationstep implies that the results of static analysis are approximate).

6. Finally, this counterexample fault-scenario is reproduced in the operation-level model, and checked if it is spurious or an actual fault-scenario.

38

Spurious counterexamples may also be used for improving the accuracyof characterization.

It is notable that one may also consider the aforesaid approach to be a di-rected fault-simulation, where the results of the SMT solver provide directedtestcases to be simulated in the operation-level model.

3.6 Dynamic Analysis

Any analysis that is simulation-based comes under the category of dynamicanalysis. These simulation-based methods are employed on operation-level,as well as implementation-level models for logical and quality analysis. Typ-ical operation-level models used are Simulink models [8], while C/C++ pro-cessor simulators, and emulation setups executing automotive software, areemployed for implementation-level analysis.

The entire fault simulation based method used in the existing approachis based on dynamic analysis. In a simple sense, dynamic analysis meansanalyzing a system using LUTs. Basically, we construct the system as anetwork of LUTs, in which components are connected to each other. Weperform simulation on the entire system using the LUT values and analyzethe results. The dynamic analyis gives us an idea about the number ofcounterexamples produced in a set of simulation runs at the operation level,that violate the given fault tolerance requirements. These counterexamplesare then run at the system level to verify if they actually violate the systemrequirements.

39

Chapter 4

Work Done

The entire fault tolerant analysis was done on the ’Fault-tolerant fuel con-troller’ (ft fuelsys model) automotive system from the Simulink automotivedemonstrations. First, a brief overview of the selected automotive model ispresented, followed by the observed experimental results.

4.1 Fault Tolerant Fuel Controller

The ’Fault-tolerant fuel controller’ model is an engine-block application,which computes the fuel flow rate for the engine, dependent on various inputs.The Simulink model of the ’Fault-tolerant fuel controller’ example consistsof clearly demarcated control (Fuel-Rate-Controller) and plant (Engine-Gas-Dynamics) sub-systems. There are two external inputs to the model, namelythe throttle and the engine speed. Additionally, the controller has inputswhich indicate the oxygen flow (EGO) and the manifold pressure (MAP),connected via a feedback loop from the plant. All controller inputs are ob-tained from appropriate sensors. The studied output of the system (themonitored signal) is the fuel-flow rate calculated by the control.

We perform an analysis to study the effects of sensor and software com-ponent failures on the output (fuel-flow rate) of the control. Sensors sufferfrom noise, shift, and spike errors, while software components suffer shiftand spike errors (only spike errors in case of boolean output), and hardwarecomponents suffer from spike errors. A spike error in the sensor is treated asa temporary logical error by the system and the fault-resilience framework

40

Figure 4.1: A Schematic Operation-Level Model of “The Fault Tolerant FuelController“

(the ’sensor fault correction’ block in Figure 4.2) inbuilt into the system triesto correct the errors. However, the recovery usually leads to shift errors onoutputs, in lieu of spike errors.

The Simulink model of the ft fuelsys system consists of two main compo-nents, namely the controller (Fuel-Rate- Controller) and the plant (Engine-Gas-Dynamics). The control, consists of four main components, as shownin Figure 4.2. These are the control logic, the sensor fault correction unit,the air-flow calculation unit, and the fuel calculation unit. The Fuel-Rate-Controller is assumed to be implemented in software, and a fixed-step solver(with a time-step of 0.01 s) is employed for simulations. We identify 20 op-erations in the control component (Fuel-Rate- Controller) of the ft fuelsyssystem. Among these components, 15 have two floating-point inputs, 4 haveone floating-point and one boolean input, while 1 has one-floating point in-put.

The fault-tolerance requirement studied for this automotive system is:Antecedent:’The noise injected at sensor outputs is less than 10 of the maxi-mum value of thesignaltrajectory, and the shift injected at sensor outputs is

41

Figure 4.2: The actual controller module

between -5 to 5 of the maximum value of the signal trajectory, and the shiftinjected at software component outputs is less than 1 of the maximum valueof the signal trajectory, and the total number of spikes injected is less thanor equal to 5.’Consequent:’The noise and shift at the fuel-flow rate signal is less than 5 ofmaximum value of the signal trajectory.’

A random fault-scenario generator for generating fault-scenarios consistentwith the antecedent of the fault-tolerance requirements. We utilize thesefault-scenarios in the developed fault-simulation framework for quality fault-tolerance analysis. A framework for exhaustive characterization has alsobeen developed. This characterization framework tests individual operationsfor all fault-scenarios(using multiple functional testcases) at a user definedgranularity for shift and noise faults (number of points between 0-10 and -5to 5 fault-levels respectively), and up to 5 (maximum number allowed bythe antecedent) spikes on the input signals. We develop a network of lookuptables and addition operations for fault-simulation using lookup tables.

42

4.2 Characterization

4.2.1 Study of LUTs

The components of the fault tolerant fuel controller were characterized usingmultiple functional testcases as mentioned before. Based on the results ofcomparision of the output residues, the components could be divided intotwo broad categories:

Type 1. For these components, the nature of the output residues for sim-ilar input residue, added to various functional testcases, was same within asmall error limit. So, the LUT entries had almost similar values for perturba-tions added to different golden input sets. Also, these had low K-L divergencevalues for various output residues. This means that our set of functional test-cases was sufficient to almost completely characterize such components. Theadder, multiplier, TAM MAP, TAM MAR were the two-input componentsthat showed this type of behavior.

Type 2. For these components, the nature of the output residues for sim-ilar input residue, added to various functional testcases, was quite varied.The LUT entries had widely different values for different functional test-cases, indicating that the output residue was somehow dependent on thegolden input values. A white box approach might give useful insights re-garding such components. These components also have high K-L diver-gence values for various output residues. This means that our set of func-tional testcases is not sufficient to completely characterize such components.Some examples of such components are MC o2, MC AFR, oxygen correction,pumping constant, map estimate, throttle estimate,etc.

In order to characterize such components better, we might be tempted touse more number of functional testcases, thereby obtaining more number ofLUTs. But, there are several limitations to generating large amount of LUTs.First of all, there is a time constraint. Simulation for multiple functionaltestcases require a large amount of time; so we cannot go on generatingLUTs forever. Secondly, the larger the number of LUTs we obtain, the moredifficult it becomes to store and process such LUTs. Greater amount ofcompaction then becomes a major requirement.

43

Figure 4.3: Original Signal

4.2.2 K-L Divergence using Alternate Representation

The comparision of two output residues, obtained from same input residueadded to different functional testcases, is done using the metric of K-L Diver-gence. We cannot compare the residues,simply on the basis of LUT values,because sometimes, the spikes may be slightly lower than the specified spikelimits and hence may be misinterpreted as noise. This would throw off theLUT values, but the divergence value may still be very small. For obtain-ing K-L Divergence, we use an alternate representation to capture the basicbehaviour of the signal.

Sample signals using Alternate Representation.

K-L Divergence. As the table 4.1 shows, the components of type 1 havea much lower K-L Divergence value than components of type 2. For type 2components, we can perform a further distinction, based on the faults addedto each of the two golden inputs, as follows:

1. Type 2a. In such components, both the inputs are susceptible to allthree types of faults, viz. noise, shift and spike faults. Examples of

44

Figure 4.4: Alternate Signal Representation

Figure 4.5: Original Signal

45

Figure 4.6: Alternate Signal Representation

such components are Oxygen correction,Pumping constant, MC AFR,etc.

2. Type 2b. For such components, only one input is susceptible to allthree kinds of faults while the other is susceptible to only spike faults.Examples of such components are the integrator module, low mode,rich mode and ramp rate.

It is observed that the Divergence value of Type 2b components is lessthan the Divergence value of Type 2a components. This is possibly due tothe presence of lesser variety of faults in one input for type 2b components.As seen from the table, the K-L Divergence value of the integrator module,a type 2b component, is 23.47 while that of the pumping constant module, atype 2a component, has a much higher divergence value of 52.51, reinforcingour conclusion.

Faults Other than the Quantized Values. Fault injections were donechoosing noise and shift faults other than the representative values inside asingle level of granularity. The LUTs generated had somewhat different val-

46

Table 4.1: K-L Divergence for components

Component K-L Divergence Component Type

Adder 0.51 1Multiplier 0.76 1

TAM MAP 1.23 1TAM MAR 1.45 1Integrator 23.47 2

Oxygen correction 32.45 2MC o2 37.78 2

MC AFR 41.38 2Pumping constant 52.51 2

ues than that of the ones obtained by using the values of the quantized levelsof faults. Nonetheless, the same pattern followed for all type 1 components,i.e. even these generated almost similar output residues for different func-tional testcases. Hpwever, the type 2 components continued to show widevariation in the output residues, generating different signatures that were de-pendant on the functional testcase. Again, a white box approach may proveuseful to performing such analysis.

4.3 Behaviour of Certain Components

In this section, we present the behaviour of both type 1 and type 2 compo-nents, in terms of their LUT values as well as the output residue signals. Wechoose the simplest type 1 and type 2 components, namely the adder andthe oxygen correction module respectively.

4.3.1 Adder Module

The adder is a simple component, which has two inputs, both of which aresusceptible to all three types of faults, and the output signal is simply thesum of the two input signals.

Consider the following case:

� Granularity = 3 for noise and shift faults and 5 for spike faults

47

Figure 4.7: The Adder Module

� Input 1: <noise, shift, spike> = <2, 2, 3>


� Input 1 Axes: noise = [0,367,734], shift = [-367,0,367](scaling factor =10−4 for both shift and noise faults), spike = [0,1,2,3,4,5]

� Input 2 Axes: noise = [0,35,70], shift = [-35,0,35](scaling factor = 10−4

for both shift and noise faults), spike = [0,1,2,3,4,5]

Fig. 4.8 shows the injected input residues, while Fig. 4.9 and 4.10 showthe behaviour of the output residues for two different functional testcases.

The two output residues are obtained from the same input residue beingadded to two different functional testcases. As evident from the figures, boththese signals are quite similar, with a very low K-L Divergence value of 0.57.The LUT tables corresponding to the above output residues are as follows:

Noise LUTs.

LUT1,noise(:, :, 2, 2, 3, 2) =

9483 9512 94979412 9565 95139383 9625 9539

48

Figure 4.8: Injected Faults into the Golden Inputs

Figure 4.9: Output Residue 1

49


LUT2,noise(:, :, 2, 2, 3, 2) =

9461 9533 94499407 9579 94879349 9658 9515

Shift LUTs.

LUT1,shift(:, :, 2, 2, 3, 2) =

4867 4878 48394833 4895 48614821 4902 4893

LUT2,shift(:, :, 2, 2, 3, 2) =

4861 4867 48444817 4879 48694809 4893 4881

Spike LUTs.

LUT1,spike(:, :, 2, 2, 3, 2) =

1 1 11 1 11 1 1

50

Figure 4.11: The oxygen correction module

LUT2,spike(:, :, 2, 2, 3, 2) =

1 1 11 1 11 1 1

Discussion. The numbers in bold indicate the LUT values correspondingto the considered faulty scenario. As one can clearly see, both the signalshave near about the same LUT values. This kind of behaviour was observedfor all pairs of output residues obtained by using all the 5 functional testcases.Thus, the adder has been completely characterized using our set of testcases.

4.3.2 Oxygen Correction Module

The Oxygen correction module is a simple two input component that con-trols the amount of oxygen needed for various fuel consumption requirements.Both its inputs are susceptible to all three types of faults, and the outputsignal represents the oxygen needed for different fuel consumption require-ments.




51

Figure 4.12: Injected Faults into the golden inputs


� Input 1 Axes: noise = [0,500,1000], shift = [-500,0,500](scaling factor= 10−4 for both shift and noise faults), spike = [0,1,2,3,4,5]

� Input 2 Axes: noise = [0,98,196], shift = [-98,0,98](scaling factor =10−4 for both shift and noise faults), spike = [0,1,2,3,4,5]

Fig. 4.12 shows the injected input residues, while Fig. 4.13 and 4.14 showthe behaviour of the output residues for two different functional testcases.

The two output residues are obtained from the same input residue beingadded to two different functional testcases. As evident from the figures, boththese signals are quite different, with a high K-L Divergence value of 33.17.The LUT tables corresponding to the above output residues are as follows:

52



53

Noise LUTs.

LUT1,noise(:, :, 2, 2, 3, 2) =

7984 7872 78917921 7963 79127975 7944 7935

LUT2,noise(:, :, 2, 2, 3, 2) =

5716 5677 55485892 5783 56725613 5881 5469

Shift LUTs.

LUT1,shift(:, :, 2, 2, 3, 2) =

−2085 −2123 −2129−2146 -2135 −2097−2131 −2098 −2123

LUT2,shift(:, :, 2, 2, 3, 2) =

752 735 729861 745 737719 787 827

Spike LUTs.

LUT1,spike(:, :, 2, 2, 3, 2) =

1 1 11 1 11 1 1

LUT2,spike(:, :, 2, 2, 3, 2) =

1 1 12 2 11 2 1

Discussion. The numbers in bold indicate the LUT values correspondingto the considered faulty scenario. As one can clearly see, both the signalshave radically different LUT values. The magnitude of the spike limit forthe considered set of functional testcases is set at 0.601. The discernablenegative spike has a magnitude of 0.595 in the first residue and a magnitudeof 0.607 in the second residue. So, it is not counted as a spike in the firstresidue but is counted as such in the second residue. Hence, a difference of1 appears in the spike LUTs. Also, in the first residue, since the negativespike is actually not counted as a spike, it contributes to the noise and shiftfaults, resulting in higher noise amplitude(LUT values) and a negative shift

54

value. On the other hand, in the second residue, as the negative spike isclassified as a spike, the noise is limited only to the observable crests andtroughs resulting in lower noise values and the shift also has a positive value.The LUTs of the second residue capture the signal much better than that ofthe first residue.

This kind of behaviour was observed for all pairs of output residues ob-tained by using all the 5 functional testcases. Thus, we can clearly see thatour characterization is by no means complete, and the LUTs cannot be com-bined in any way. So, for type 2 components, we need more functionaltestcases or a different technique for characterization.

4.4 Anti Lock Braking System

An anti-lock braking system, or ABS is a safety system which preventsthe wheels on a motor vehicle from locking up (or ceasing to rotate) whilebraking. It simulates the dynamic behavior of a vehicle under hard brakingconditions. The model represents a single wheel, which may be replicateda number of times to create a model for a multi-wheel vehicle. The wheelrotates with an initial angular speed that corresponds to the vehicle speedbefore the brakes are applied. We used separate integrators to compute wheelangular speed and vehicle speed. We use two speeds to calculate slip, whichis determined by:

ωv =Vv

Rr

slip = 1− ωw

ωvwhere:

� ωv = vehicle speed divided by wheel radius

� Vv = vehicle linear velocity

� Rr = wheel radius

� ωw = wheel angular velocity

55

Figure 4.15: The Anti Lock Braking System

From these expressions, we see that slip is zero when wheel speed andvehicle speed are equal, and slip equals one when the wheel is locked. Adesirable slip value is 0.2, which means that the number of wheel revolutionsequals 0.8 times the number of revolutions under non-braking conditions withthe same vehicle velocity. This maximizes the adhesion between the tire androad and minimizes the stopping distance with the available friction. In thismodel, we used an ideal anti-lock braking controller, that uses ’bang-bang’control based upon the error between actual slip and desired slip. We setthe desired slip to the value of slip at which the mu-slip curve reaches a peakvalue, this being the optimum value for minimum braking distance.

4.4.1 Slip fcn Mux unit

This module is actually a combination of a MUX and a simple mathemat-ical module that computes the relative slip. It behaves exactly like a type1 component of the previous system. In fact, except the integrator module,all other components of the ABS are of type 1, i.e. they have similar outputresidues for same input residues, irrespective of the golden input set.




56

Figure 4.16: The slip fcn mux model

57

Figure 4.17: Injected Faults into the golden inputs


� Input 1 Axes: noise = [0,351943,703885], shift = [-351943,0,351943](scalingfactor = 10−4 for both shift and noise faults), spike = [0,1,2,3,4,5]

� Input 2 Axes: noise = [0,351999,351999], shift = [-351999,0,351999](scalingfactor = 10−4 for both shift and noise faults), spike = [0,1,2,3,4,5]

The two output residues are obtained from the same input residue beingadded to two different functional testcases. As evident from the figures, boththese signals are quite similar, with a very low K-L Divergence value of 1.24.The LUT tables corresponding to the above output residues are as follows:

58



59

Noise LUTs.

LUT1,noise(:, :, 2, 2, 3, 2) =

680567 681134 682119679154 678715 679226680013 679718 682043

LUT2,noise(:, :, 2, 2, 3, 2) =

698436 684751 687822683155 680561 681138689342 680145 682249

Shift LUTs.

LUT1,shift(:, :, 2, 2, 3, 2) =

64267 64378 6383963833 64195 6436163921 64102 63893

LUT2,shift(:, :, 2, 2, 3, 2) =

64161 64267 6404463817 64179 6480964209 64393 64081

Spike LUTs.

LUT1,spike(:, :, 2, 2, 3, 2) =

3 3 33 3 33 3 3

LUT2,spike(:, :, 2, 2, 3, 2) =

3 3 33 3 33 3 3

Discussion. The numbers in bold indicate the LUT values correspondingto the considered faulty scenario. As one can clearly see, both the signals havenear about the same LUT values. This kind of behaviour was observed forall pairs of output residues obtained by using all the 5 functional testcases.Thus, the slip fcn mux has been completely characterized using our set oftestcases.

60

4.5 Dynamic Analysis

Using the modified characterized data, i.e. the new axes values and new LUTsfor each component, the dynamic analysis was performed and compared withthe existing approach. The results are tabulated as follows:

Table 4.2: Dynamic Analysis: Using old and new LUTs

Fuel Fault Rate(%) Counterexamples generated Counterexamples generated

using old LUTs using new LUTs(No. of Potential Candidates) (No. of Potential Candidates)

5 1164(1171) 1143(1147)6 1127(1171) 1124(1147)7 1053(1171) 1089(1147)8 972(1171) 1014(1147)9 887(1171) 944(1147)10 799(1171) 872(1147)

The dynamic analysis was performed on a high speed processor for one hourin both cases. The algorithm generates the number of cases that violate faultbarriers starting from 5% to 10%. The numbers in the bracket indicate the no.of faulty cases that the dynamic analysis algorithm expects to generate. Theother numbers indicate how many counterexamples(cases that violate thefault tolerance requirements) were actually produced. It can be clearly seenthat the percentage of counterexamples generated using the modified LUTsis more than that generated using the exisiting LUTs. This indicates thatthe modified characterization technique is actually effective and generatesbetter quality LUTs.

61

Chapter 5

Conclusions and Action Plan

5.1 Conclusions

The existing method for fault-tolerance analysis of automotive systemshas several salient features, including the ability to handle quality faults,and performing analysis at the operation-level abstraction. This method canidentify certain types of accumulated loss of precision (or quality) in auto-motive systems, which may lead to failures. Additionally, it can abstract themanifestation of automotive system failures to the operation-level, therebyenabling all analysis steps to be performed on operation-level models likeSimulink. This facilitates quicker analysis, as well as system simulation forlonger time durations wherein quality errors accumulating over time may beunearthed.

Two methods work in tandem for providing fault-tolerance analysis, namelyan operation-level simulation method and a lookup table-based analysis method.The operation-level simulation method is based on fault-injection and simulation-trace analysis. On the other hand, the lookup table analysis method quicklyanalyzes a larger number of fault scenarios by ignoring details of operationbehavior and concentrating on quality analysis. This method is based oncharacterizing individual sub-systems, and storing the variation in outputquality versus input quality as a lookup table. Through experiments on anautomotive case study, the efficacy of using lookup tables to abstract thequality response of operations was studied. It was found that the number ofsimulation runs needed for exhaustive fault-analysis is very large; the lookup

62

table-based method helps in providing a quick, but approximate answer, andsignificantly higher coverage. However, the completeness and accuracy ofcharacterization play a major role in the usefulness of quality LUTs.

We retain the same basic approach of using the lookup table-based andfault-simulation-based methods in tandem, as it ensures good coverage, whilemaintaining the accuracy of analysis. However, keeping the importance ofLUTs in mind, in order to improve their accuracy, a new technique for char-acterizing components was developed. It characterized certain componentsvery accurately and almost completely while others had to compromise withonly slightly improved versions of LUTs. Nonetheless, the improved LUTswere used to perform dynamic analysis on the system. The new LUTs pro-vided very encouraging results with the number of counterexamples gener-ated about 6% more than that of the old LUTs. However, their is still scopefor improvement in the dynamic analysis technique. Currently, we have notyet examined whether the counterexamples produced are actually very closeto each other or far apart. Ideally, we would like them to be as distinct aspossible to cover as much faulty cases as possible.

5.2 Action Plan

The modified techniques have certainly been a step closer towards getting ageneric, robust method for performing fault tolerant analysis of automotivesystems. But the scope for further analysis is huge. We take a look at someof the steps that we would like to follow in the near future.

Static Analysis. Exhaustive analysis of quality fault-tolerance is a com-putationally intensive problem. A method has been proposed which suitablyabstracts the behavior of operation-level models, to discrete lookup tables,and then performs static analysis to identify hotspots in the design. Thesehotspots are then analyzed in the operation-level model for checking theirvalidity. The proposed static analysis-based flow may also be treated as adirected testing mechanism. A tool-chain has been developed to perform thisanalysis. As a first step, we would like to use the modified LUTs to performthis static analysis and compare the results with those obtained from usingthe old LUTs.

63

Compacted LUTs. Two algorithms for LUT compaction within the LUThave been developed. One of them is a very fast algorithm that provides25-35% compaction. It is highly interesting to utilize these compacted LUTsto perform static and dynamic analysis, to examine the accuracy of LUTspreserved under the compaction procedure. Meanwhile, other compactiontechniques are also being looked into.

Gray Box Approach Analyzing an automotive system component, bothin terms of a white box as well as a black box can provide useful insightsto understanding the residue behaviour of the component. So, a gray boxanalysis of the system at hand is to be performed to obtain any additionalinformation about the components, that could prove useful in modifying theLUTs for better static and dynamic analysis.

Fault Injections Currently, the fault injections for dynamic analysis arenot very elegant. For, example a single fault value is chosen from a very widerange at random. We would like to quantize these ranges into more usefulintervals, in order to ensure that the counterexamples produced are actuallyvery distinct from each other. Also, we would like to limit the number offault injection points for better handling of the analysis.

Increasing Granularity Given that all the compaction techniques workproperly, we can look into increasing the granularity of analysis as it will ob-viously improve the accuracy of the LUTs. We hope that using the modifiedcharacterization technique at a high granularity, we can come up with muchimproved LUTs for performing fault tolerant analysis.

64

Bibliography

[1] Rajeev Alur, Aditya Kanade, S. Ramesh, and K. C. Shashidhar. Symbolicanalysis for improving simulation coverage of simulink/stateflow models.In EMSOFT 08: Proceedings of the 8th ACM international conference onEmbedded software, pages 8998, New York, NY, USA, 2008. ACM.

[2] M. Baleani, A. Ferrari, L. Mangeruca, A. Sangiovanni-Vincentelli, Mau-rizio Peri, and Saverio Pezzini. Fault-tolerant platforms for automotivesafety-critical applications. In CASES 03: Proceedings of the 2003 inter-national conference on Compilers, architecture and synthesis for embeddedsystems, pages 170177, New York, NY, USA, 2003. ACM.

[3] P. Bellini, R. Mattolini, and P. Nesi. Temporal logics for real-time systemspecification,ACM Comput. Surv., 32(1):1242, 2000.

[4] Ashok Chandy, Mark Phillip Colosky, and Mark Dennis Kushion. Robustfault detection method. In US Patent 6,389,338 B1, May 14, 2002.

[5] Krishnendu Chatterjee, Arkadeb Ghosal, Thomas A. Henzinger, DanielIercan, Christoph M. Kirsch, Claudio Pinello, and Alberto Sangiovanni-Vincentelli. Logical reliability of interacting real-time tasks. In DATE 08:Proceedings of the conference on Design, automation and test in Europe,pages 909914, New York, NY, USA, 2008. ACM.

[6] F. Corno, P. Gabrielli, and S. Tosato. Relating vehicle-level and network-level reliability through high-level fault injection. In HLDVT 03: Proceed-ings of the Eighth IEEE International Workshop on High-Level DesignValidation and Test Workshop, page 71, Washington, DC, USA, 2003.IEEE Computer Society.

65

[7] Dipankar Das, P. P. Chakrabarti, and Rajeev Kumar. Functional ver-ification of task partitioning for multiprocessor embedded systems. ACMTrans. Des. Autom. Electron. Syst., 12(4):44, 2007.

[8] Jon Friedman. Matlab/simulink for automotive systems design. In DATE06: Proceedings of the conference on Design, automation and test in Eu-rope, pages 8788, Leuven, Belgium, Belgium, 2006. European Design andAutomation Association.

[9] R. Hoseinnezhad and A. Bab-Hadiashar. Missing data compensation forsafety-critical components in a drive-by-wire system. Vehicular Technology,IEEE Transactions on, 54(4):13041311, July 2005.

[10] Viacheslav Izosimov, Paul Pop, Petru Eles, and Zebo Peng. Design op-timization of time and cost-constrained fault-tolerant distributed embeddedsystems. In DATE 05: Proceedings of the conference on Design, Automa-tion and Test in Europe, pages 864869, Washington, DC, USA, 2005. IEEEComputer Society.

[11] Mark John Jordan, Mark Maiolani, and Andreas Both. Fault-tolerantelectronic braking system. In US Patent 6,540,309 B1, April 1, 2003.

[12] Stefan Kueperkoch, Wolfgang Grimm, Aleksandar Kojic, Jasim Ahmed,Jean-Pierre Hathout, and Christopher Martin. Fault-tolerant vehicle sta-bility control. In US Patent 7,245,995 B2, July 17, 2007.

[13] Nancy Lynch, Roberto Segala, and Frits Vaandrager. Hybrid i/o au-tomata. Inf. Comput.,185(1):105157, 2003.

[14] M. Mahmoud, J. Jiang, and Y. Zhang. Stochastic stability of fault toler-ant control systems in the presence of noise. American Control Conference,2000. Proceedings of the 2000, 6:42944298 vol.6, 2000.

[15] P. Ramachandran, P. Kudva, J. Kellington, J. Schumann, and P. Sanda.Statistical fault injection. Dependable Systems and Networks With FTCSand DCC, 2008. DSN 2008. IEEE International Conference, pages 122127,June 2008.

[16] Jeonghee Shin, Victor V. Zyuban, Zhigang Hu, Jude A. Rivers, andPradip Bose. A framework for architecture-level lifetime reliability model-ing. In DSN, pages 534543, 2007.

66

[17] Dipankar Das, P. P. Chakrabarti, and Purnendu Sinha. Operation-levelanalysis of quality degradation faults using characterized quality behaviorsof components. GM-IITKGP CRL, September 2009

67

fault tolerant analysis of automotive systemsthe existing approach uses two fault analysis methods...

Documents