software reliability engineering
DESCRIPTION
Software reliability engineeringTRANSCRIPT
-
Software reliability engineering
Alfs T. Berztiss
1;2
1
University of Pittsburgh, Pittsburgh PA 15260, USA
email: [email protected]
2
SYSLAB, University of Stockholm, Sweden
Abstract. The primary purpose of this brief nontechnical overview of
software reliability engineering (SRE) is to clear up some misconceptions
regarding SRE. We discuss several uses of SRE, and emphasize the im-
portance of operational proles in SRE. An SRE process is outlined in
which several SRE models are applied simultaneously to failure data, and
several versions of key components may be used to improve reliability.
1 Introduction
Software reliability engineering (SRE) is an adaptation of hardware reliability
engineering (HRE). One area in which there is much interaction between hard-
ware and software is telecommunications, and much of the early research on
software reliability was carried out by telecommunications engineers. The rst
major text on SRE by Musa et al. [1] was published in 1987; for a later (1996)
survey by Musa and Ehrlich see [2]. A more recent (1998) survey is by Tian [3]. A
handbook of SRE [4] appeared in 1996. The handbook is a comprehensive survey
of all aspects of SRE. We shall therefore limit ourselves to a brief nontechnical
overview of the basic principles of SRE.
The SRE process has ve main steps. (1) An operational prole is derived.
This is a a usage distribution, i.e., the extent to which each operation of the
software system is expected to be exercised. (2) Test cases are selected in accor-
dance with the operational prole. This is to ensure that system usage during
testing is as close as can be to the usage patterns expected after release of the
system. (3) Test cases are run. For each test case that results in failure the time
of failure is noted, in terms of execution time, and the execution time clock is
stopped. After repair the clock is started again, and testing resumes. (4) An
appropriate statistical model is periodically applied to the distribution of the
failure points over time, and an estimate of the reliability is determined. If an
acceptable reliability level has been reached, the system is released. (5) Failure
data continue to be gathered after system release.
Reliability is usually expressed in one of two ways, as a probability R, which
is the probability that the system will not fail in the next t units of execution
time, or as MTTF, which stands for mean time to failure, and is an estimate of
the length of time that will elapse before the next failure of the system occurs.
Related to reliability are testability and trustability. Testability [5] is the prob-
ability that a program will fail if it contains at least one fault; trustability [6] is
a measure of our condence that a program is free of faults.
-
Section 2 contrasts hardware and software reliabilities, and denes several
uses for SRE. Section 3 deals with the construction of operational proles, and
Section 4 outlines the SRE process. In Section 5 we discuss miscellaneous topics
not covered earlier.
2 Development and uses of SRE
Numerous dierences between hardware and software aect reliability. Hard-
ware failures result initially from failures of components or assemblies due to
manufacturing faults, are relatively few after this initial period, but rise when
wear-and-tear sets in. A plot of failure probability over time has a \bathtub"
shape. With software, reliability continues to increase throughout the lifetime
of the system | there is no wear-and-tear. Hence the statistical models used in
SRE dier substantially from the models of HRE, and are called reliability growth
models. It is true that early assembly language programs deteriorated very badly
toward the end of their lifetimes because corrective patches destroyed whatever
structure the programs might have had to begin with, but today the importance
of well-structured software is fully understood, and then there is no software
deterioration with time. Under adaptive and perfective maintenance there can
occur deterioration, but the system resulting from such maintenance eorts can
be regarded as a new system whose reliability estimation is to be started from
scratch.
Another major dierence is that hardware deteriorates even when not in
use. For example, the reliability of an automobile with moderate use is higher
than that of an automobile with no use at all. Software failures occur only when
the software is being executed. Hence SRE models are based on execution time
instead of calendar time. Moreover, hardware is for the most part built from
standardized interchangeable parts so that corrective action after a failure can
consist of no more than the substitution of a new component for the failed one.
With software it is often dicult to locate the fault that caused a failure, and
removal of the fault may introduce new faults. In SRE this is accounted for by
a fault reduction factor B, which is 1.0 if fault removal is perfect. In practice B
has been found to have a value around 0.95 [1], but it can be much worse for
distributed systems in which fault location is very dicult because it may be
impossible to recreate the state in which failure occurred.
The nal dierence relates to the hazard rate z(t) | z(t)dt is the probability
that the system that has survived to time t will fail in the interval between t and
t + dt. Software failures can be traced back to individual faults, which has led
to the use of per-fault hazard rates instead of the global hazard rate of HRE.
To summarize, the HRE approach has been modied by the introduction
of dierent reliability growth models, and the use of execution time, fault re-
duction factors, and per-fault hazard rates in these models. These modications
have made SRE a practicable tool of software engineering, but one important dif-
ference remains. Hardware failures are for most part determined by the physical
properties of the materials used in the hardware. Software failures arise because
-
of human error, and the unpredictability of human behavior should invalidate
statistical methods of prediction. Still, the methods work. We may have to ap-
ply more than one reliability growth model to a given set of failure data, but
one of the models can be expected to represent the data quite well. There is still
one proviso: statistical estimation requires fairly large numbers of failures, which
means that SRE can be applied only to large systems.
The techniques of SRE have two main uses. First, SRE can give a quantitative
estimate of the reliability of a software system at the time it is released. As
a corollary, given a reliability goal, testing can stop when this goal has been
reached. Second, SRE can reduce system development cost. The reliability of
a system can be computed from the reliabilities of its components. Suppose
a system has three components. Total reliability is the product of the three
component reliabilities: R = R
1
R
2
R
3
. If one of the R
i
is increased, another
may be decreased without changing the value of R. Under system balancing
[4, p.261], given a required system reliability R, high reliability is aimed for in
components in which this can be achieved at low cost, and lower reliability in
high-cost components. Cost reduction can also be achieved by the use of COTS
(Commercial O-The-Shelf) software. However, if it is to be used in a SRE
context, it has to be provided with reliability estimates. Berman and Cutler [7]
discuss reliability of a system that uses components from a reuse library.
If only the total system reliability is of interest, system balancing can be
applied to hardware as well as software components. There is a misconception
that software failures are more likely than hardware failures. Actually hardware
failures may be more likely [8], but locating faults and removing them is usually
easier in hardware than in software.
3 The operational prole
A reliability estimate established by the SRE process is meaningful only if the
conditions under which the software is tested closely correspond to the condi-
tions under which it will be used in the eld. This means that test cases are to
have the same distribution according to type as the inputs after the system has
been released. This distribution is called the operational prole. For example, if
20% of inputs to an information system are data updates and 80% are queries,
then the test cases should have a 20-80 distribution in favor of queries. The
construction of an operational prole is discussed in detail by Musa [9]; a later
survey by Musa et al. forms Chapter 5 of [4]. The development of an operational
prole may take ve steps, as follows.
Denition of client types. This relates to product software, i.e., to software
that is developed for a number of dierent clients. Consider a generic automated
enquiry system that responds to telephone calls by customers. The system con-
sists of prompts to the caller, call routing, canned responses, the possibility of
speaking to a customer service representative, and wait lines. This structure is
the same for all systems, only the content of the prompts and the canned re-
sponses diers from system to system. The systems for airlines will be more
-
similar to each other than to a system that handles health insurance enquiries.
Airlines and health insurance companies thus form two client types.
Denition of user types. A user type consists of persons (or other software, as
in automatic business-to-business e-commerce) who will use the system in the
same way. In the case of the enquiry system, some users will update the canned
response les, and this will happen more often for airlines than for health insur-
ance companies. Other users will listen to responses, and still others will want
to talk to customer representatives.
Denition of system modes. These are the major operational modes of a sys-
tem, such as system initialization, reboot after a failure, overload due to excessive
wait line build-up, gathering of system usage statistics, monitoring of customer
service representatives, selection of the language in which the customer will in-
teract with the system.
Denition of functional prole. In this step each system mode is broken down
into the procedures or transactions that it will need, and the probability of use
of each such component is estimated. The functional prole is developed during
the design phase of the software process. (For a discussion of procedural and
transactional computation see [10].)
Denition of operational prole. The functional prole relates to the abstrac-
tions produced as part of software design. The abstractions of the functional
prole are now converted into actual operations, and the probabilities of the
functional prole are converted into probabilities of the operations.
The ve steps are actually part of any software development process. The
operations have to be dened sooner or later in any case, and a process based on
the ve steps can be an eective way of dening operations. For example, user
types can be identied with actors and system modes with use cases in use case
diagrams of UML [11]. Identication of user types and system modes by means
of use case diagrams can take place very early in the software process. What
is specic to SRE and what is dicult is the association of probabilities with
the operations. Some of the diculties are pointed out in [4, pp. 535-537]. They
include the uncertainty of how a new software system will be used. Many systems
are used in ways totally dierent from how the original developers envisaged
their use | the Internet is the most famous example. Another problem is that
requirements keep changing well after the initial design stage.
Fortunately, it has been shown that reliability models are rather insensitive
to errors in the operational prole [12, 13]. Note that reliability is usually pre-
sented as a value together with a condence interval associated with the value.
Adams [14] shows how to calculate condence intervals that take into account
uncertainties associated with the operational prole | note that Adams de-
nes reliability as the probability that the software will not fail for a test case
randomly chosen in accordance with the operational prole.
One way of arriving at a more realistic operational prole is to combine ex-
pert opinions by the Delphi method [15]. Briey, Delphi starts with a moderator
sending out a problem description to a group of experts. The experts suggest
probabilities for operations, giving their reasons for this. The moderator collects
-
this information, collates it, and sends it out again. When the experts see the
reasoning for estimates dierent from their own, they tend to adjust their es-
timates. The objective is to arrive at a consensus after several iterations. The
early Delphi method took a very long time. The time scale has been compressed
by basing Delphi on modern distributed group decision systems (see, e.g., [16]).
Such systems allow the experts to interact asynchronously from dierent loca-
tions.
4 The SRE process
In Section 1 it was noted that the SRE process is to have ve steps. Step 1 was
discussed in Section 3, and Steps 2 and 3 are straightforward. Here our concern
will be Steps 4 and 5, but there is a question relating to software architecture
that we have to look at rst. In HRE very high reliability can be obtained by
placing identical components in parallel | if there are n such components and
n1 fail, one is still functioning. For n parallel components, each with reliability
r, the total reliability of the arrangement is
R = 1 (1 r)
n
:
If r = 0.9 and n = 3, then R = 0.999.
Reliability improvement of software by means of redundancy is problematic.
First, if we use identical implementations of the same operation, they will all
fail identically, something that is very unlikely to happen with hardware. The
solution is to create dierent implementations based on the same requirements.
A second problem relates to common cause failures. If all the components were
identical, all failures would be common cause. When components are not iden-
tical, we have n-version programming. The nature of failures in n-version pro-
gramming has been examined by numerous authors [17{21]. Multiple versions
have been found to give better reliability than a single \good" version [22, 23].
But with multiple versions their cost becomes a serious consideration | for an
investigation of this see [24].
Because of the cost, n-version programming is to be considered only for those
components of a mission-critical system whose failure would have serious con-
sequences. Suppose that each of the n components has been tested until its
reliability has become r. This has two components, r
s
, which relates to failures
specic to this version, and r
c
, which is the reliability with respect to common
cause failures. Then r = r
s
r
c
= r
1
r
, where the parameter (0 1)
indicates the strength of the common cause eect. When = 0, there are no
common cause failures; when = 1, all failures are common cause. The reliability
of the entire n-version critical component is
R = r
[1 (1 r
1
)
n
]:
Figure 1 shows the arrangement for n = 3.
-
rs
= r
1
r
s
= r
1
r
s
= r
1
r
c
= r
Fig. 1 | Parallel components and common cause failures
This analysis is based on the assumption that in case of a common cause
failure all n versions will fail rather than just n k of them. Justication for
the assumption comes from the observation that practically all common cause
failures in code written by experienced programmers familiar with the domain
of application are due to faulty requirements [18], which, of course, are the same
for all versions. A further assumption is that the coordination mechanism for
the dierent versions is fault-free, or that r
c
absorbs the eect of failures of this
mechanism. A major problem is that the critical components are likely to be
too small to give enough failures for a meaningful reliability estimation for the
replicated components. Techniques for dealing with situations in which testing
results in no failures can help in such a case [25].
A complete software process that includes SRE is Cleanroom [26, 27]. It
has three phases after software design has taken place. First, each design unit
is coded, and the coder demonstrates by means of an informal mathematical
proof that the code satises the requirements for this unit. There is no unit
testing at all. Proofs are not documented in order to keep down costs. Second,
the individual units are integrated. Third, an independent team tests the inte-
grated system using test cases selected on the basis of an operational prole,
and testing continues until a required value of MTTF has been reached. Under
this approach an appreciable percentage of statements in the software system
will never have been executed before release of the software, which has led to
criticism of Cleanroom [28]. However, it has been shown that the quality of
Cleanroom products is higher than the industry norm [29]. Still, it is dicult
to understand why unit testing is banned | intuitively one suspects that fewer
system test cases would be needed to reach the required MTTF if unit testing
had been performed. Therefore, unless this has already been done, it would be
instructive to analyze failure data to determine which of the failures would have
been avoided by unit testing, and on the basis of this establish whether unit
testing would have lowered total testing costs.
More serious than the construction of a truly representative operational pro-
le is the selection of the appropriate reliability growth model. Experience shows
that no model is appropriate for all software systems, even when the systems
-
are similar and have been developed under similar conditions. An exception is
provided by successive releases of the same system: the model that has worked
for the rst release works also for later releases [30]. Otherwise multiple models
have to be used. For a full explanation of the problem and of the multiple-model
approach see [31]. The multiple-model approach has been made fully practica-
ble by the very high speeds of modern computers. Moreover, the user of this
approach does not have to be a professional statistician. Various SRE tools are
available (see, e.g., Appendix A of [4]). A particularly interesting approach has
been taken in the design of the Sorel tool [32]. This tool has two modules: one
evaluates reliability on the basis of four reliability growth models, the other car-
ries out a trend analysis. The latter draws attention to unexpected problems,
such as an extended period during which reliability does not increase.
As discussed in [31], the main problem with reliability growth models is that
reliability estimates are extrapolations of past experience into the future. Such
extrapolations are not particularly safe even when the distribution of the data
on which they are based is fairly smooth: the condence range, i.e., the range
in which an estimated value is expected to lie with probability 0.95, say, gets
wider and wider the further out we extrapolate, until it covers the entire range
of possible values. This problem is made worse by the lack of smoothness in
software failure data. There is an upward trend in the time between failures
as testing progresses, but the pattern of the intervals between failures is very
irregular. This has a negative eect on condence intervals.
There have been claims that SRE can improve the accuracy of software pro-
cess cost and schedule estimates. Advance estimates of the time required for
testing, and of the expected number of failures coupled to estimates of the time
required for repairs would give fairly accurate estimates of both cost and du-
ration of the test phase. Unfortunately the extrapolation problem makes the
accuracy of early reliability prediction highly questionable. The use of Bayesian
techniques for early reliability estimation holds some promise. This will be dis-
cussed in Section 5.
Matters are quite dierent after the release of a software system, particularly
in the case of product software that is distributed as a large number of copies.
Operational use obviously follows a perfect operational prole, and, if there is no
repair following failures, reliability is constant, which means that no complicated
model is required for its estimation. It is commonpractice not to carry out repairs
after failures, but save up all repairs for the next \minor" release, i.e., a release
that has fault removal as its only purpose. The constant reliability value for this
release is calculated, and so forth.
For an operating system running 24 hours a day at approximately 5000 instal-
lations, a failure intensity of no more than 0.9 failures per 1000 years of operation
has been established [1, pp. 205{207]. Very high reliability has also been achieved
in consumer products [33]; this can be attributed to the use of read-only mem-
ory. For a single-copy system the determination of ultra-high reliability would
require an impossibly long testing phase [34]. It is therefore surprising that reg-
ulatory agencies impose such reliability requirements on safety-critical software
-
| requirements that cannot possibly be veried. Schneidewind [35] combines
risk analysis with SRE predictions to reduce the risk of failures of safety-critical
systems. When there exists a software system with very high reliability, and such
reliability is a major requirement, the best approach is not to replace the system
or to carry out major changes on it.
A question that now arises is what level of reliability is reasonable. This is a
social rather than a technical issue. It is customary to assume that the greatest
risk is threat to human life. This is so in principle, but cost and convenience are
also important. Otherwise automobiles, which constantly threaten human lives,
would have been banned by consumer protection agencies. It has been estimated
that the direct cost of the famous AT&T nine-hour breakdown was 60-70 million
dollars [36]. Human life is rarely given this high a valuation in legal settlements.
However, as pointed out in [37], a cost-benet ratio, if made public, is not to
outrage most members of the public. It is therefore important not to become
cynical in a discussion on how to reduce testing costs.
5 Additional topics
As noted above, a system tested according to the principles of SRE can con-
tain code that has never been executed. Mitchell and Zeil [38] introduce an ap-
proach that combines representative and directed testing. Representative testing
is based on on an operational prole, directed testing uses a variety of approaches
and criteria to uncover faults eciently. One popular criterion for directed test-
ing is statement coverage under which an attempt is made to execute as many
statements of a program as possible. Combination of the two modes of testing
is the subject matter of [39] as well. It should be noted that full 100% coverage
is impossible to achieve. This is so because of some theoretical undecidability
results, e.g., the non-existence of an algorithm for solving two non-linear inequal-
ities. Hence, it is not always possible to nd an input that will force control along
a particular path in the software system. Actual statement coverage in system
tests may be only 50{60%, although testers may guess it to be 90% or better [40].
If 100% statement coverage were possible, then, in principle, test cases could be
generated automatically to achieve such coverage. This is impossible for even
very simple cases. It should be realized that undecidability results indicate the
non-existence of general algorithms. In unit testing full coverage is for the most
part achievable by examination of the structure of the program unit.
An interesting development in SRE has been the use of Bayesian approaches
to reliability prediction. In most cases statistical estimation is based on obser-
vations alone | this is a frequentist approach. However, if prior knowledge and
beliefs are also taken into account in the estimation process, we have what is
known as a Bayesian or subjectivist approach. Earlier work on this approach in
SRE is surveyed in [4]. Some more recent research [41, 42] suggests that Bayesian
techniques can give reasonable estimates of the cost of testing by analysis of early
test data | see also [43, 44].
-
Nearly all the SRE literature deals with software composed of procedures
written in languages such as Fortran or Pascal. The SRE approach can be ap-
plied also to logic programs [45], and to AI software [46-48]. We have used SRE
techniques to estimate the level of completeness of data base schemas [49]. The
approach is as follows. A set of queries likely to be put to a data base under
construction is collected from prospective users, together with usage ratings. A
usage rating is an integer between 1 and 5 that indicates the relative frequency
with which a particular query will be put. A test suite is constructed in which
each query has as many copies as its usage rating, and this collection of queries
is randomized. The queries are put to the data base one after another, and if the
data base base schema has to be extended to deal with a query, this is registered
as a failure. The failure data are analyzed using SRE techniques, thus providing
a quantitative estimate of the completeness of the schema.
References
1. J.D. Musa, A. Iannino, and K. Okumoto, Software Reliability | Measurement,
Prediction, Application, McGraw-Hill, 1987.
2. J.D. Musa and W. Ehrlich, Advances is software reliability engineering, Advances
in Computers 42, 1996, 77{117.
3. J. Tian, Reliability measurement, analysis, and improvement for large systems,
Advances in Computers 46, 1998, 159{235.
4. M.R. Lyu (Ed), Handbook of Software Reliability Engineering, McGraw-Hill, 1996.
5. A. Bertolino and L. Strigini, On the use of testability measures for dependability
assessment, IEEE Trans. Software Eng. 22, 1996, 97{108.
6. W.E. Howden and Y. Huang, Software trustability analysis, ACM Trans. Software
Eng. Methodology 4, 1995, 36{64.
7. O. Berman and M. Cutler, Choosing an optimal set of libraries, IEEE Trans.
Reliability 45, 1996, 303{307.
8. D.R. Kuhn, Sources of failure in the public switched telephone network, Computer
30: 4, April 1997, 31{36.
9. J.D. Musa, Operational proles in software-reliability engineering, IEEE Software
10: 2, March 1993, 14{32.
10. A.T. Berztiss, Transactional computation, in Database and Expert Systems Ap-
plications (Lecture Notes in Computer Science 1677), 521{530, Springer, 1999.
11. G. Booch, J. Rumbaugh, and I. Jacobson, The Unied Modeling Language User
Guide, Addison-Wesley, 1999.
12. A. Avritzer and B. Larson, Load testing software using deterministic state testing,
in Proc. 1993 Internat. Symp. Software Testing and Analysis [published as Software
Engineering Notes 18: 3, July 1993)], 82{88.
13. A. Pasquini, A.N. Crespo, and P. Matrella, Sensitivity of reliability-growth models
to operational prole errors vs testing accuracy, IEEE Trans. Reliability 45, 1996,
531{540 (errata ibid 46, 1997, 68).
14. T. Adams, Total variance approach to software reliability estimation, IEEE Trans.
Software Eng. 22, 1996, 687{688.
15. O. Helmer, Social Technology, Basic Books, 1966.
16. D. Kenis and L. Verhaegen, The MacPolicy project: developing a group decision
support system based on the Delphi method, in Decision Support in Public Ad-
ministration, North-Holland, 1993, 159{170.
-
17. J.C. Knight and N.G. Leveson, An experimental evaluation of the assumption of
independence in multiversion programming, IEEE Trans. Software Eng. SE-12,
1986, 96-109.
18. P.G. Bishop, D.G. Esp, M. Barnes, P. Humphreys, G. Dahll, and J. Lahti, PODS
| a project on diverse software, IEEE Trans. Software Eng. SE-12, 1986, 929{940.
19. S.S. Brilliant, J.C. Knight, and N.G. Leveson, The consistent comparison problem
in N-version software, IEEE Trans. Software Eng. 15, 1989, 1481{1485.
20. S.S. Brilliant, J.C. Knight, and N.G. Leveson, Analysis of faults in an N-version
software experiment, IEEE Trans. Software Eng. 16, 1990, 238{247.
21. D.E. Eckhardt, A.K. Caglayan, J.C. Knight, L.D. Lee, D.F. McAllister, M.A. Vouk,
and J.P.J. Kelly, An experimental evaluation of software redundancy as a strategy
for improving reliability, IEEE Trans. Software Eng. 17, 1991, 692{702.
22. J.P.J. Kelly, T.I. McVittie, and W.I. Yamamoto, Implementing design diversity to
achieve fault tolerance, IEEE Software 8: 4, July 1991, 61{71.
23. L. Hatton, N-version design versus one good version, IEEE Software 14: 6, Nov/Dec
1997, 71{76.
24. R.K. Scott and D.F. McAllister, Cost modeling of N-version fault-tolerant software
systems for large N, IEEE Trans. Reliability 45, 1996, 297{302.
25. K.W. Miller, L.J. Morell, R.E. Noonan, S.K. Park, D.M. Nicol, B.W. Murrill, and
J.M. Voas, Estimating the probability of failure when testing reveals no failures,
IEEE Trans. Software Eng. 18, 1992, 33{42.
26. M. Dyer, The Cleanroom Approach to Quality Software Development, Wiley, 1992.
27. H.D. Mills, Zero defect software: Cleanroom engineering, Advances in Computers
36, 1993, 1{41.
28. B. Beizer, Cleanroom process model: a critical examination. IEEE Software 14: 2,
March/April 1997, 14{16.
29. R.C. Linger, Cleanroom software engineering for zero-defect software, in Proc. 15th
Int. Conf. Software Eng., pp. 2{13, 1993.
30. A. Wood, Predicting software reliability, Computer 29: 11, Nov 1996, 69{77.
31. S. Brocklehurst and B. Littlewood, Techniques for prediction analysis and recali-
bration, in [4], pp. 119{166.
32. K. Kanoun, M. Kaa^niche, and J.-C. Laprie, Qualitative and quantitative reliability
assessment, IEEE Software 14: 2, March/April 1997,
33. J. Rooijmans, H. Aerts, and M. v. Genuchten, Software quality in consumer elec-
tronics products, IEEE Software 13: 1, Jan 1996, 55{64.
34. R.W. Butler and G.B. Finelli, The infeasibility of experimental quantication of
life-critical software reliability, IEEE Trans. Software Eng. 19, 1993, 3{12.
35. N.F. Schneidewind, Reliability modeling for safety-critical software, IEEE Trans.
Reliability 46, 1997, 88{98.
36. T.C.K. Chou, Beyond fault tolerance, Computer 30: 4, April 1997, 47{49.
37. W.R. Collins, K.W. Miller, B.J. Spielman, and P. Wherry, How good is good
enough? An ethical analysis of software construction and use, Comm. ACM 37: 1,
Jan 1994, 81{91.
38. B. Mitchell and S.J. Zeil, A reliability model combining representative and directed
testing, in Proc. 18th Int. Conf. Software Eng., pp. 506{514, 1996.
39. J.R. Horgan and A.P. Mathur, Software testing and reliability, in [4], pp. 531{566.
40. P. Piwowarski, M. Ohba, and J. Caruso, Coverage measurement experience during
function test, in Proc. 15th Int. Conf. Software Eng., pp. 287{301, 1993.
41. J.M. Caruso and D.W. Desormeau, Integrating prior knowledge with a software
reliability growth model, in Proc. 13th Int. Conf. Software Eng., pp. 238{245, 1991.
-
42. C. Smidts, M. Stutzke, and R.W. Stoddard, Software reliability modeling: an ap-
proach to early reliability prediction, IEEE Trans. Reliability 47, 1998, 268{278.
43. S. Campodonico and N.D. Singpurwalla, A Bayesian analysis of the logarithmic-
Poisson execution time model based on expert opinion and failure data, IEEE
Trans. Software Eng. 20, 1994, 677{683.
44. M.-A. El Aroui and J.-L. Soler, A Bayes nonparametric framework for software-
reliability analysis, IEEE Trans. Reliability 45, 1996, 652{660.
45. A. Azem, Software Reliability Determination for Conventional and Logic Program-
ming, Walter de Gruyter, 1995.
46. F.B. Bastani, I.-R. Chen, and T.-W. Tsao, A software reliability model for articial
intelligence programs, Int. J. Software Eng. Knowledge Eng. 3, 1993, 99{114.
47. F.B. Bastani and I.-R. Chen, The reliability of embedded AI systems, IEEE Expert
8: 2, April 1993, 72{78.
48. I.-R. Chen, F.B. Bastani, and T.-W. Tsao, On the reliability of AI planning soft-
ware in real-time applications, IEEE Trans. Knowledge and Data Eng. 7, 1995,
4{13.
49. A.T. Berztiss and K.J. Matjasko, Queries and the incremental construction of
conceptual models, in Information Modelling and Knowledge Bases VI, pp. 175{
185, IOS Press, 1995.