
Software reliability engineering

Alfs T. Berztiss (1,2)

(1) University of Pittsburgh, Pittsburgh PA 15260, USA
email: [email protected]

(2) SYSLAB, University of Stockholm, Sweden

Abstract. The primary purpose of this brief nontechnical overview of software reliability engineering (SRE) is to clear up some misconceptions regarding SRE. We discuss several uses of SRE, and emphasize the importance of operational profiles in SRE. An SRE process is outlined in which several SRE models are applied simultaneously to failure data, and several versions of key components may be used to improve reliability.

1 Introduction

Software reliability engineering (SRE) is an adaptation of hardware reliability engineering (HRE). One area in which there is much interaction between hardware and software is telecommunications, and much of the early research on software reliability was carried out by telecommunications engineers. The first major text on SRE by Musa et al. [1] was published in 1987; for a later (1996) survey by Musa and Ehrlich see [2]. A more recent (1998) survey is by Tian [3]. A handbook of SRE [4] appeared in 1996. The handbook is a comprehensive survey of all aspects of SRE. We shall therefore limit ourselves to a brief nontechnical overview of the basic principles of SRE.

The SRE process has five main steps. (1) An operational profile is derived. This is a usage distribution, i.e., the extent to which each operation of the software system is expected to be exercised. (2) Test cases are selected in accordance with the operational profile. This is to ensure that system usage during testing is as close as can be to the usage patterns expected after release of the system. (3) Test cases are run. For each test case that results in failure the time of failure is noted, in terms of execution time, and the execution time clock is stopped. After repair the clock is started again, and testing resumes. (4) An appropriate statistical model is periodically applied to the distribution of the failure points over time, and an estimate of the reliability is determined. If an acceptable reliability level has been reached, the system is released. (5) Failure data continue to be gathered after system release.

Reliability is usually expressed in one of two ways: as a probability R, which is the probability that the system will not fail in the next t units of execution time, or as MTTF, which stands for mean time to failure, and is an estimate of the length of time that will elapse before the next failure of the system occurs. Related to reliability are testability and trustability. Testability [5] is the probability that a program will fail if it contains at least one fault; trustability [6] is a measure of our confidence that a program is free of faults.
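The two expressions are related: under the common simplifying assumption of a constant failure intensity (an exponential failure model, which the text does not spell out), R = exp(-lambda * t) and MTTF = 1/lambda. A minimal sketch of ours, with an illustrative failure intensity:

```python
import math

def reliability(failure_intensity, t):
    """Probability that the system does not fail in the next t units
    of execution time, assuming a constant failure intensity."""
    return math.exp(-failure_intensity * t)

def mttf(failure_intensity):
    """Mean time to failure under the same exponential assumption."""
    return 1.0 / failure_intensity

lam = 0.002                       # illustrative: failures per CPU hour
r100 = reliability(lam, 100)      # chance of surviving 100 CPU hours
print(round(r100, 3), mttf(lam))  # prints: 0.819 500.0
```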

Section 2 contrasts hardware and software reliabilities, and defines several uses for SRE. Section 3 deals with the construction of operational profiles, and Section 4 outlines the SRE process. In Section 5 we discuss miscellaneous topics not covered earlier.

2 Development and uses of SRE

Numerous differences between hardware and software affect reliability. Hardware failures result initially from failures of components or assemblies due to manufacturing faults, are relatively few after this initial period, but rise when wear-and-tear sets in. A plot of failure probability over time has a "bathtub" shape. With software, reliability continues to increase throughout the lifetime of the system: there is no wear-and-tear. Hence the statistical models used in SRE differ substantially from the models of HRE, and are called reliability growth models. It is true that early assembly language programs deteriorated very badly toward the end of their lifetimes because corrective patches destroyed whatever structure the programs might have had to begin with, but today the importance of well-structured software is fully understood, so there is no software deterioration with time. Under adaptive and perfective maintenance deterioration can occur, but the system resulting from such maintenance efforts can be regarded as a new system whose reliability estimation is to be started from scratch.

Another major difference is that hardware deteriorates even when not in use. For example, the reliability of an automobile with moderate use is higher than that of an automobile with no use at all. Software failures occur only when the software is being executed. Hence SRE models are based on execution time instead of calendar time. Moreover, hardware is for the most part built from standardized interchangeable parts, so that corrective action after a failure can consist of no more than the substitution of a new component for the failed one. With software it is often difficult to locate the fault that caused a failure, and removal of the fault may introduce new faults. In SRE this is accounted for by a fault reduction factor B, which is 1.0 if fault removal is perfect. In practice B has been found to have a value around 0.95 [1], but it can be much worse for distributed systems, in which fault location is very difficult because it may be impossible to recreate the state in which failure occurred.

The final difference relates to the hazard rate z(t): z(t)dt is the probability that a system that has survived to time t will fail in the interval between t and t + dt. Software failures can be traced back to individual faults, which has led to the use of per-fault hazard rates instead of the global hazard rate of HRE.

To summarize, the HRE approach has been modified by the introduction of different reliability growth models, and the use of execution time, fault reduction factors, and per-fault hazard rates in these models. These modifications have made SRE a practicable tool of software engineering, but one important difference remains. Hardware failures are for the most part determined by the physical properties of the materials used in the hardware. Software failures arise because of human error, and the unpredictability of human behavior should invalidate statistical methods of prediction. Still, the methods work. We may have to apply more than one reliability growth model to a given set of failure data, but one of the models can be expected to represent the data quite well. There is still one proviso: statistical estimation requires fairly large numbers of failures, which means that SRE can be applied only to large systems.

The techniques of SRE have two main uses. First, SRE can give a quantitative estimate of the reliability of a software system at the time it is released. As a corollary, given a reliability goal, testing can stop when this goal has been reached. Second, SRE can reduce system development cost. The reliability of a system can be computed from the reliabilities of its components. Suppose a system has three components. Total reliability is the product of the three component reliabilities: R = R1 R2 R3. If one of the Ri is increased, another may be decreased without changing the value of R. Under system balancing [4, p. 261], given a required system reliability R, high reliability is aimed for in components in which this can be achieved at low cost, and lower reliability in high-cost components. Cost reduction can also be achieved by the use of COTS (Commercial Off-The-Shelf) software. However, if it is to be used in an SRE context, it has to be provided with reliability estimates. Berman and Cutler [7] discuss the reliability of a system that uses components from a reuse library.
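The balancing argument is a few lines of arithmetic. In this sketch the component reliabilities are invented for illustration: raising the first component's reliability lets the second drop by the same factor while the product R = R1 R2 R3 is unchanged:

```python
from math import prod

def system_reliability(rs):
    """Series system: the product of the component reliabilities."""
    return prod(rs)

base = system_reliability([0.99, 0.95, 0.98])
# Raise the cheap component to 0.999 and relax the expensive one
# by the same factor, leaving the system reliability R unchanged.
rebalanced = system_reliability([0.999, 0.95 * (0.99 / 0.999), 0.98])
```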

If only the total system reliability is of interest, system balancing can be applied to hardware as well as software components. There is a misconception that software failures are more likely than hardware failures. Actually hardware failures may be more likely [8], but locating faults and removing them is usually easier in hardware than in software.

3 The operational profile

A reliability estimate established by the SRE process is meaningful only if the conditions under which the software is tested closely correspond to the conditions under which it will be used in the field. This means that test cases are to have the same distribution according to type as the inputs after the system has been released. This distribution is called the operational profile. For example, if 20% of inputs to an information system are data updates and 80% are queries, then the test cases should have a 20-80 distribution in favor of queries. The construction of an operational profile is discussed in detail by Musa [9]; a later survey by Musa et al. forms Chapter 5 of [4]. The development of an operational profile may take five steps, as follows.

Definition of client types. This relates to product software, i.e., to software that is developed for a number of different clients. Consider a generic automated enquiry system that responds to telephone calls by customers. The system consists of prompts to the caller, call routing, canned responses, the possibility of speaking to a customer service representative, and wait lines. This structure is the same for all systems; only the content of the prompts and the canned responses differs from system to system. The systems for airlines will be more similar to each other than to a system that handles health insurance enquiries. Airlines and health insurance companies thus form two client types.

Definition of user types. A user type consists of persons (or other software, as in automatic business-to-business e-commerce) who will use the system in the same way. In the case of the enquiry system, some users will update the canned response files, and this will happen more often for airlines than for health insurance companies. Other users will listen to responses, and still others will want to talk to customer representatives.

Definition of system modes. These are the major operational modes of a system, such as system initialization, reboot after a failure, overload due to excessive wait line build-up, gathering of system usage statistics, monitoring of customer service representatives, and selection of the language in which the customer will interact with the system.

Definition of functional profile. In this step each system mode is broken down into the procedures or transactions that it will need, and the probability of use of each such component is estimated. The functional profile is developed during the design phase of the software process. (For a discussion of procedural and transactional computation see [10].)

Definition of operational profile. The functional profile relates to the abstractions produced as part of software design. The abstractions of the functional profile are now converted into actual operations, and the probabilities of the functional profile are converted into probabilities of the operations.
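Once the operational profile exists, Step 2 of the SRE process (selecting test cases in accordance with the profile) amounts to weighted random sampling. A sketch of ours, using the 20-80 update/query profile from the example above:

```python
import random

# Operational profile from the example: operation -> probability of use.
profile = {"query": 0.80, "update": 0.20}

def draw_test_operations(profile, n, seed=42):
    """Draw n test-case operations so that usage during testing
    matches the usage distribution expected in the field."""
    rng = random.Random(seed)
    ops = list(profile)
    weights = [profile[op] for op in ops]
    return rng.choices(ops, weights=weights, k=n)

suite = draw_test_operations(profile, 1000)
# roughly 800 of the 1000 test cases will be queries
```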

The five steps are actually part of any software development process. The operations have to be defined sooner or later in any case, and a process based on the five steps can be an effective way of defining operations. For example, user types can be identified with actors, and system modes with use cases, in use case diagrams of UML [11]. Identification of user types and system modes by means of use case diagrams can take place very early in the software process. What is specific to SRE, and what is difficult, is the association of probabilities with the operations. Some of the difficulties are pointed out in [4, pp. 535-537]. They include the uncertainty of how a new software system will be used. Many systems are used in ways totally different from how the original developers envisaged their use; the Internet is the most famous example. Another problem is that requirements keep changing well after the initial design stage.

Fortunately, it has been shown that reliability models are rather insensitive to errors in the operational profile [12, 13]. Note that reliability is usually presented as a value together with a confidence interval associated with the value. Adams [14] shows how to calculate confidence intervals that take into account uncertainties associated with the operational profile; note that Adams defines reliability as the probability that the software will not fail for a test case randomly chosen in accordance with the operational profile.

One way of arriving at a more realistic operational profile is to combine expert opinions by the Delphi method [15]. Briefly, Delphi starts with a moderator sending out a problem description to a group of experts. The experts suggest probabilities for operations, giving their reasons for these suggestions. The moderator collects this information, collates it, and sends it out again. When the experts see the reasoning for estimates different from their own, they tend to adjust their estimates. The objective is to arrive at a consensus after several iterations. The early Delphi method took a very long time. The time scale has been compressed by basing Delphi on modern distributed group decision systems (see, e.g., [16]). Such systems allow the experts to interact asynchronously from different locations.

4 The SRE process

In Section 1 it was noted that the SRE process is to have five steps. Step 1 was discussed in Section 3, and Steps 2 and 3 are straightforward. Here our concern will be Steps 4 and 5, but there is a question relating to software architecture that we have to look at first. In HRE very high reliability can be obtained by placing identical components in parallel: if there are n such components and n - 1 fail, one is still functioning. For n parallel components, each with reliability r, the total reliability of the arrangement is

    R = 1 - (1 - r)^n.

If r = 0.9 and n = 3, then R = 0.999.
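The parallel-redundancy formula can be checked in one line (the function is a sketch of ours, not from the text):

```python
def parallel_reliability(r, n):
    """n identical parallel components: the arrangement fails
    only if all n components fail."""
    return 1 - (1 - r) ** n

print(round(parallel_reliability(0.9, 3), 6))  # prints: 0.999
```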

Reliability improvement of software by means of redundancy is problematic. First, if we use identical implementations of the same operation, they will all fail identically, something that is very unlikely to happen with hardware. The solution is to create different implementations based on the same requirements. A second problem relates to common cause failures. If all the components were identical, all failures would be common cause. When components are not identical, we have n-version programming. The nature of failures in n-version programming has been examined by numerous authors [17-21]. Multiple versions have been found to give better reliability than a single "good" version [22, 23]. But with multiple versions their cost becomes a serious consideration; for an investigation of this see [24].

Because of the cost, n-version programming is to be considered only for those components of a mission-critical system whose failure would have serious consequences. Suppose that each of the n components has been tested until its reliability has become r. This reliability has two factors: r_s, which relates to failures specific to this version, and r_c, which is the reliability with respect to common cause failures. Then r = r_s r_c = r^(1-α) r^α, where the parameter α (0 ≤ α ≤ 1) indicates the strength of the common cause effect. When α = 0, there are no common cause failures; when α = 1, all failures are common cause. The reliability of the entire n-version critical component is

    R = r^α [1 - (1 - r^(1-α))^n].

Figure 1 shows the arrangement for n = 3.

Fig. 1. Parallel components and common cause failures: the n = 3 version-specific components, each with reliability r_s = r^(1-α), are in parallel, and this arrangement is in series with a common cause component of reliability r_c = r^α.

This analysis is based on the assumption that in case of a common cause failure all n versions will fail rather than just n - k of them. Justification for the assumption comes from the observation that practically all common cause failures in code written by experienced programmers familiar with the domain of application are due to faulty requirements [18], which, of course, are the same for all versions. A further assumption is that the coordination mechanism for the different versions is fault-free, or that r_c absorbs the effect of failures of this mechanism. A major problem is that the critical components are likely to be too small to give enough failures for a meaningful reliability estimation for the replicated components. Techniques for dealing with situations in which testing results in no failures can help in such a case [25].

A complete software process that includes SRE is Cleanroom [26, 27]. It has three phases after software design has taken place. First, each design unit is coded, and the coder demonstrates by means of an informal mathematical proof that the code satisfies the requirements for this unit. There is no unit testing at all. Proofs are not documented, in order to keep down costs. Second, the individual units are integrated. Third, an independent team tests the integrated system using test cases selected on the basis of an operational profile, and testing continues until a required value of MTTF has been reached. Under this approach an appreciable percentage of statements in the software system will never have been executed before release of the software, which has led to criticism of Cleanroom [28]. However, it has been shown that the quality of Cleanroom products is higher than the industry norm [29]. Still, it is difficult to understand why unit testing is banned; intuitively one suspects that fewer system test cases would be needed to reach the required MTTF if unit testing had been performed. Therefore, unless this has already been done, it would be instructive to analyze failure data to determine which of the failures would have been avoided by unit testing, and on the basis of this establish whether unit testing would have lowered total testing costs.

More serious than the construction of a truly representative operational profile is the selection of the appropriate reliability growth model. Experience shows that no model is appropriate for all software systems, even when the systems are similar and have been developed under similar conditions. An exception is provided by successive releases of the same system: the model that has worked for the first release works also for later releases [30]. Otherwise multiple models have to be used. For a full explanation of the problem and of the multiple-model approach see [31]. The multiple-model approach has been made fully practicable by the very high speeds of modern computers. Moreover, the user of this approach does not have to be a professional statistician. Various SRE tools are available (see, e.g., Appendix A of [4]). A particularly interesting approach has been taken in the design of the Sorel tool [32]. This tool has two modules: one evaluates reliability on the basis of four reliability growth models, the other carries out a trend analysis. The latter draws attention to unexpected problems, such as an extended period during which reliability does not increase.

As discussed in [31], the main problem with reliability growth models is that reliability estimates are extrapolations of past experience into the future. Such extrapolations are not particularly safe even when the distribution of the data on which they are based is fairly smooth: the confidence range, i.e., the range in which an estimated value is expected to lie with probability 0.95, say, gets wider and wider the further out we extrapolate, until it covers the entire range of possible values. This problem is made worse by the lack of smoothness in software failure data. There is an upward trend in the time between failures as testing progresses, but the pattern of the intervals between failures is very irregular. This has a negative effect on confidence intervals.
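To make the fitting step concrete, here is a sketch of ours that fits one well-known reliability growth model, the Goel-Okumoto model mu(t) = a(1 - exp(-b t)), to invented weekly failure counts by a crude grid search; a real SRE tool would use maximum-likelihood estimation and would compare several competing models:

```python
import math

# Invented cumulative failure counts at the end of test weeks 1..8.
times = [1, 2, 3, 4, 5, 6, 7, 8]
failures = [12, 21, 28, 33, 37, 40, 42, 44]

def mu(t, a, b):
    """Expected cumulative failures under the Goel-Okumoto model."""
    return a * (1 - math.exp(-b * t))

def fit(times, failures):
    """Least-squares fit over a coarse grid of (a, b) candidates."""
    grid = ((a, b / 100) for a in range(40, 81) for b in range(5, 101))
    def sse(params):
        a, b = params
        return sum((mu(t, a, b) - y) ** 2 for t, y in zip(times, failures))
    return min(grid, key=sse)

a, b = fit(times, failures)
remaining = a - failures[-1]  # expected failures still latent in the system
```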

There have been claims that SRE can improve the accuracy of software process cost and schedule estimates. Advance estimates of the time required for testing, and of the expected number of failures coupled to estimates of the time required for repairs, would give fairly accurate estimates of both the cost and the duration of the test phase. Unfortunately the extrapolation problem makes the accuracy of early reliability prediction highly questionable. The use of Bayesian techniques for early reliability estimation holds some promise. This will be discussed in Section 5.

Matters are quite different after the release of a software system, particularly in the case of product software that is distributed as a large number of copies. Operational use obviously follows a perfect operational profile, and, if there is no repair following failures, reliability is constant, which means that no complicated model is required for its estimation. It is common practice not to carry out repairs after failures, but to save up all repairs for the next "minor" release, i.e., a release that has fault removal as its only purpose. The constant reliability value for this release is calculated, and so forth.

For an operating system running 24 hours a day at approximately 5000 installations, a failure intensity of no more than 0.9 failures per 1000 years of operation has been established [1, pp. 205-207]. Very high reliability has also been achieved in consumer products [33]; this can be attributed to the use of read-only memory. For a single-copy system the determination of ultra-high reliability would require an impossibly long testing phase [34]. It is therefore surprising that regulatory agencies impose such reliability requirements on safety-critical software, requirements that cannot possibly be verified. Schneidewind [35] combines risk analysis with SRE predictions to reduce the risk of failures of safety-critical systems. When there exists a software system with very high reliability, and such reliability is a major requirement, the best approach is not to replace the system or to carry out major changes on it.

A question that now arises is what level of reliability is reasonable. This is a social rather than a technical issue. It is customary to assume that the greatest risk is threat to human life. This is so in principle, but cost and convenience are also important. Otherwise automobiles, which constantly threaten human lives, would have been banned by consumer protection agencies. It has been estimated that the direct cost of the famous AT&T nine-hour breakdown was 60-70 million dollars [36]. Human life is rarely given this high a valuation in legal settlements. However, as pointed out in [37], a cost-benefit ratio, if made public, must not outrage most members of the public. It is therefore important not to become cynical in a discussion on how to reduce testing costs.

5 Additional topics

As noted above, a system tested according to the principles of SRE can contain code that has never been executed. Mitchell and Zeil [38] introduce an approach that combines representative and directed testing. Representative testing is based on an operational profile; directed testing uses a variety of approaches and criteria to uncover faults efficiently. One popular criterion for directed testing is statement coverage, under which an attempt is made to execute as many statements of a program as possible. Combination of the two modes of testing is the subject matter of [39] as well. It should be noted that full 100% coverage is impossible to achieve. This is so because of some theoretical undecidability results, e.g., the non-existence of an algorithm for solving two non-linear inequalities. Hence, it is not always possible to find an input that will force control along a particular path in the software system. Actual statement coverage in system tests may be only 50-60%, although testers may guess it to be 90% or better [40]. If 100% statement coverage were possible, then, in principle, test cases could be generated automatically to achieve such coverage. This is impossible for even very simple cases. It should be realized that undecidability results indicate the non-existence of general algorithms. In unit testing full coverage is for the most part achievable by examination of the structure of the program unit.

An interesting development in SRE has been the use of Bayesian approaches to reliability prediction. In most cases statistical estimation is based on observations alone; this is a frequentist approach. However, if prior knowledge and beliefs are also taken into account in the estimation process, we have what is known as a Bayesian or subjectivist approach. Earlier work on this approach in SRE is surveyed in [4]. Some more recent research [41, 42] suggests that Bayesian techniques can give reasonable estimates of the cost of testing by analysis of early test data; see also [43, 44].

Nearly all the SRE literature deals with software composed of procedures written in languages such as Fortran or Pascal. The SRE approach can be applied also to logic programs [45], and to AI software [46-48]. We have used SRE techniques to estimate the level of completeness of data base schemas [49]. The approach is as follows. A set of queries likely to be put to a data base under construction is collected from prospective users, together with usage ratings. A usage rating is an integer between 1 and 5 that indicates the relative frequency with which a particular query will be put. A test suite is constructed in which each query has as many copies as its usage rating, and this collection of queries is randomized. The queries are put to the data base one after another, and if the data base schema has to be extended to deal with a query, this is registered as a failure. The failure data are analyzed using SRE techniques, thus providing a quantitative estimate of the completeness of the schema.
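The suite construction described above can be sketched in a few lines; the queries and usage ratings are invented for illustration:

```python
import random

# Usage rating: 1 (rare) .. 5 (very frequent), as described in the text.
ratings = {"flights on a date": 5, "fare for a route": 4, "crew roster": 1}

def build_suite(ratings, seed=7):
    """Replicate each query as many times as its usage rating,
    then randomize the order of the resulting test suite."""
    suite = [q for q, r in ratings.items() for _ in range(r)]
    random.Random(seed).shuffle(suite)
    return suite

suite = build_suite(ratings)
# len(suite) == 10, and "flights on a date" appears 5 times
```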

References

1. J.D. Musa, A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction, Application, McGraw-Hill, 1987.
2. J.D. Musa and W. Ehrlich, Advances in software reliability engineering, Advances in Computers 42, 1996, 77-117.
3. J. Tian, Reliability measurement, analysis, and improvement for large systems, Advances in Computers 46, 1998, 159-235.
4. M.R. Lyu (Ed), Handbook of Software Reliability Engineering, McGraw-Hill, 1996.
5. A. Bertolino and L. Strigini, On the use of testability measures for dependability assessment, IEEE Trans. Software Eng. 22, 1996, 97-108.
6. W.E. Howden and Y. Huang, Software trustability analysis, ACM Trans. Software Eng. Methodology 4, 1995, 36-64.
7. O. Berman and M. Cutler, Choosing an optimal set of libraries, IEEE Trans. Reliability 45, 1996, 303-307.
8. D.R. Kuhn, Sources of failure in the public switched telephone network, Computer 30: 4, April 1997, 31-36.
9. J.D. Musa, Operational profiles in software-reliability engineering, IEEE Software 10: 2, March 1993, 14-32.
10. A.T. Berztiss, Transactional computation, in Database and Expert Systems Applications (Lecture Notes in Computer Science 1677), 521-530, Springer, 1999.
11. G. Booch, J. Rumbaugh, and I. Jacobson, The Unified Modeling Language User Guide, Addison-Wesley, 1999.
12. A. Avritzer and B. Larson, Load testing software using deterministic state testing, in Proc. 1993 Internat. Symp. Software Testing and Analysis (published as Software Engineering Notes 18: 3, July 1993), 82-88.
13. A. Pasquini, A.N. Crespo, and P. Matrella, Sensitivity of reliability-growth models to operational profile errors vs testing accuracy, IEEE Trans. Reliability 45, 1996, 531-540 (errata ibid 46, 1997, 68).
14. T. Adams, Total variance approach to software reliability estimation, IEEE Trans. Software Eng. 22, 1996, 687-688.
15. O. Helmer, Social Technology, Basic Books, 1966.
16. D. Kenis and L. Verhaegen, The MacPolicy project: developing a group decision support system based on the Delphi method, in Decision Support in Public Administration, North-Holland, 1993, 159-170.

17. J.C. Knight and N.G. Leveson, An experimental evaluation of the assumption of independence in multiversion programming, IEEE Trans. Software Eng. SE-12, 1986, 96-109.
18. P.G. Bishop, D.G. Esp, M. Barnes, P. Humphreys, G. Dahll, and J. Lahti, PODS: a project on diverse software, IEEE Trans. Software Eng. SE-12, 1986, 929-940.
19. S.S. Brilliant, J.C. Knight, and N.G. Leveson, The consistent comparison problem in N-version software, IEEE Trans. Software Eng. 15, 1989, 1481-1485.
20. S.S. Brilliant, J.C. Knight, and N.G. Leveson, Analysis of faults in an N-version software experiment, IEEE Trans. Software Eng. 16, 1990, 238-247.
21. D.E. Eckhardt, A.K. Caglayan, J.C. Knight, L.D. Lee, D.F. McAllister, M.A. Vouk, and J.P.J. Kelly, An experimental evaluation of software redundancy as a strategy for improving reliability, IEEE Trans. Software Eng. 17, 1991, 692-702.
22. J.P.J. Kelly, T.I. McVittie, and W.I. Yamamoto, Implementing design diversity to achieve fault tolerance, IEEE Software 8: 4, July 1991, 61-71.
23. L. Hatton, N-version design versus one good version, IEEE Software 14: 6, Nov/Dec 1997, 71-76.
24. R.K. Scott and D.F. McAllister, Cost modeling of N-version fault-tolerant software systems for large N, IEEE Trans. Reliability 45, 1996, 297-302.
25. K.W. Miller, L.J. Morell, R.E. Noonan, S.K. Park, D.M. Nicol, B.W. Murrill, and J.M. Voas, Estimating the probability of failure when testing reveals no failures, IEEE Trans. Software Eng. 18, 1992, 33-42.
26. M. Dyer, The Cleanroom Approach to Quality Software Development, Wiley, 1992.
27. H.D. Mills, Zero defect software: Cleanroom engineering, Advances in Computers 36, 1993, 1-41.
28. B. Beizer, Cleanroom process model: a critical examination, IEEE Software 14: 2, March/April 1997, 14-16.
29. R.C. Linger, Cleanroom software engineering for zero-defect software, in Proc. 15th Int. Conf. Software Eng., pp. 2-13, 1993.
30. A. Wood, Predicting software reliability, Computer 29: 11, Nov 1996, 69-77.
31. S. Brocklehurst and B. Littlewood, Techniques for prediction analysis and recalibration, in [4], pp. 119-166.
32. K. Kanoun, M. Kaâniche, and J.-C. Laprie, Qualitative and quantitative reliability assessment, IEEE Software 14: 2, March/April 1997,
33. J. Rooijmans, H. Aerts, and M. v. Genuchten, Software quality in consumer electronics products, IEEE Software 13: 1, Jan 1996, 55-64.
34. R.W. Butler and G.B. Finelli, The infeasibility of experimental quantification of life-critical software reliability, IEEE Trans. Software Eng. 19, 1993, 3-12.
35. N.F. Schneidewind, Reliability modeling for safety-critical software, IEEE Trans. Reliability 46, 1997, 88-98.
36. T.C.K. Chou, Beyond fault tolerance, Computer 30: 4, April 1997, 47-49.
37. W.R. Collins, K.W. Miller, B.J. Spielman, and P. Wherry, How good is good enough? An ethical analysis of software construction and use, Comm. ACM 37: 1, Jan 1994, 81-91.
38. B. Mitchell and S.J. Zeil, A reliability model combining representative and directed testing, in Proc. 18th Int. Conf. Software Eng., pp. 506-514, 1996.
39. J.R. Horgan and A.P. Mathur, Software testing and reliability, in [4], pp. 531-566.
40. P. Piwowarski, M. Ohba, and J. Caruso, Coverage measurement experience during function test, in Proc. 15th Int. Conf. Software Eng., pp. 287-301, 1993.
41. J.M. Caruso and D.W. Desormeau, Integrating prior knowledge with a software reliability growth model, in Proc. 13th Int. Conf. Software Eng., pp. 238-245, 1991.

42. C. Smidts, M. Stutzke, and R.W. Stoddard, Software reliability modeling: an approach to early reliability prediction, IEEE Trans. Reliability 47, 1998, 268-278.
43. S. Campodonico and N.D. Singpurwalla, A Bayesian analysis of the logarithmic-Poisson execution time model based on expert opinion and failure data, IEEE Trans. Software Eng. 20, 1994, 677-683.
44. M.-A. El Aroui and J.-L. Soler, A Bayes nonparametric framework for software-reliability analysis, IEEE Trans. Reliability 45, 1996, 652-660.
45. A. Azem, Software Reliability Determination for Conventional and Logic Programming, Walter de Gruyter, 1995.
46. F.B. Bastani, I.-R. Chen, and T.-W. Tsao, A software reliability model for artificial intelligence programs, Int. J. Software Eng. Knowledge Eng. 3, 1993, 99-114.
47. F.B. Bastani and I.-R. Chen, The reliability of embedded AI systems, IEEE Expert 8: 2, April 1993, 72-78.
48. I.-R. Chen, F.B. Bastani, and T.-W. Tsao, On the reliability of AI planning software in real-time applications, IEEE Trans. Knowledge and Data Eng. 7, 1995, 4-13.
49. A.T. Berztiss and K.J. Matjasko, Queries and the incremental construction of conceptual models, in Information Modelling and Knowledge Bases VI, pp. 175-185, IOS Press, 1995.