
International Conference on Computer Systems and Technologies - CompSysTech’10

How do we collect data for software reliability estimation?

Aleksandar Dimov, Senthil Kumar Chandran and Sasikumar Punnekkat

Abstract: Together with the increasing influence of software systems on all aspects of everyday life, there is also a need to focus on their non-functional characteristics. Reliability is one important software quality characteristic, defined as the continuity of correct service. Reasoning and modelling are necessary in order to achieve the desired levels of reliability, both during design and during usage of software systems. The usefulness of reliability models depends on the input data we provide to them, which influences the accuracy of the estimations we perform. There exist different techniques for gathering data for software reliability estimation, and the aim of this paper is to provide an overview of them. As software testing is the most widely applied and researched technique among them, we also briefly present the current state of the art in the application of different testing methods for the collection of data to be used for reliability estimation.

Key words: Software reliability, Software testing

1. INTRODUCTION

Non-functional requirements of software-intensive systems are becoming more and more important these days. Additionally, in the past decade there has been a continuing trend towards increasing size and complexity of such systems in all application areas, and the embedded systems domain is no exception to this rule. In this respect, it is crucial to provide methods for modelling and reasoning about non-functional requirements, in order to be able to adequately design systems that satisfy them. Non-functional requirements are also referred to as quality attributes or quality characteristics, and they introduce restrictions on how the system should be designed or executed. One such important quality parameter of software systems is dependability [1], which is defined as the ability of a computing system to deliver services that can justifiably be trusted. Dependability is characterized by a number of attributes that comprise reliability, availability, safety, confidentiality, integrity and maintainability. In this work we focus our attention on reliability. It is defined as the continuity of correct service, i.e. the belief that a software system will behave as expected over a given period of time and under a given operational environment. Reliability is usually modelled as a stochastic value and may be expressed by a number of different measures, among them the following (a simple relation between these measures is sketched after the list):

• The probability that the software system behaves according to its specification over a given period of time, given that it was operating correctly at the beginning of that period

• Mean time between two subsequent system failures
• Failure rate
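For illustration (an addition to the original text), these measures are directly related under the simplifying assumption of a constant failure rate λ, an assumption that the reliability models discussed below do not generally require:

$$R(t) = e^{-\lambda t}, \qquad \mathrm{MTBF} = \frac{1}{\lambda}$$

so that, for example, a failure rate of 0.01 failures per hour corresponds to an MTBF of 100 hours and to a reliability of e^{-0.1} ≈ 0.90 over a 10-hour period.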

Here, by the term model for software reliability estimation we mean a formal approach to the calculation of a concrete value for a reliability measure, given some amount of data. The other side of the coin is represented by the nature of these data and the way they have been collected. Currently there exist many models for the estimation of software reliability, and most of them are based on some statistical processing of a system failure dataset [6]. However, there exists neither a commonly established reliability estimation model nor a standard method for the collection of failure data. Moreover, it has been shown that the results of reliability models are sensitive to different factors of the testing method, such as test coverage and time between failures [2]. These factors may introduce uncertainty into reliability estimations, which may compromise the application of reliability models.


CompSysTech'10, June 17–18, 2010, Sofia, Bulgaria. Copyright © 2010 ACM 978-1-4503-0243-2/10/06...$10.00.




Under these conditions there is a need to systematize the different approaches to the collection of data aimed at reliability estimation and, further, to evaluate what uncertainty they introduce into software reliability values. This paper makes the first of these two steps – a study of the different testing methods applied in gathering data for reliability estimation.

The structure of the paper is as follows: Section 2 presents a brief overview of different approaches towards estimation of software reliability; Section 3 surveys different testing methods for software reliability estimation; and finally Section 4 concludes the paper.

2. METHODS FOR COLLECTION OF FAILURE DATA

In this section we describe where testing traditionally fits into the overall framework of software reliability estimation. Currently, the most common approaches followed for the identification of reliability parameters are:

• Software testing
• Simulation
• Users' feedback

Software testing [14, 17] is a critical element of software quality assurance and encompasses the review of specification, design and coding. Appropriate models, applied to testing data, help to identify the failure density of the software. The so-called black-box reliability models [6] are commonly used when test data or past failure information are available. Black-box reliability models are often referred to as reliability-growth models. Reliability-growth models assume extensive testing of the software system, with observation of failures and of the time that has passed between two subsequent failures. When a failure is detected, the fault that caused it is removed and the process continues, under the assumption that correction of the fault did not introduce new errors into the code.
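As an illustration (our sketch, not a model prescribed by the paper), the classical Jelinski-Moranda model is one reliability-growth model of this kind and can be fitted to a list of observed inter-failure times. It assumes the i-th inter-failure time is exponentially distributed with rate φ(N − i + 1), where N is the unknown initial number of faults and φ a per-fault failure rate; the failure data below are hypothetical.

```python
# Sketch (our addition): grid-search maximum-likelihood fit of the
# Jelinski-Moranda reliability-growth model to inter-failure times.
import math

def fit_jelinski_moranda(inter_failure_times, max_extra_faults=1000):
    """Return (N_hat, phi_hat): estimated initial fault count and per-fault rate."""
    n = len(inter_failure_times)
    best_n, best_phi, best_ll = None, None, -math.inf
    # Search integer values of N >= n; phi has a closed-form MLE for a fixed N.
    for N in range(n, n + max_extra_faults + 1):
        weighted = sum((N - i) * t for i, t in enumerate(inter_failure_times))
        phi = n / weighted
        ll = sum(math.log(phi * (N - i)) - phi * (N - i) * t
                 for i, t in enumerate(inter_failure_times))
        if ll > best_ll:
            best_n, best_phi, best_ll = N, phi, ll
    return best_n, best_phi

# Hypothetical inter-failure times (hours between two subsequent failures).
times = [9, 12, 11, 4, 7, 2, 5, 8, 5, 7, 1, 6, 1, 9, 4, 1, 3, 3, 6, 1]
N_hat, phi_hat = fit_jelinski_moranda(times)
# Estimated failure rate after the last observed fault has been corrected.
current_rate = phi_hat * (N_hat - len(times))
print(N_hat, phi_hat, current_rate)
```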

The simulation [7] approach to reliability estimation of a software system takes into account that reliability does not depend only on the structure of the software but also on runtime information, such as the frequency of component reuse, the execution time spent, the interactions between the components, etc. The design requirements and code of the application are reviewed for the software structure, while simulation of the executions of the software provides an indication of its runtime performance. During the simulation process, when a fault is identified, it is corrected and the components experience reliability growth. It should be noted that faults are discovered only if the portion of the software where the fault lies is executed. If there is a fault that prevents the execution of some portion of the software until it is removed, the faults downstream are not identified. A limitation of this approach is the inability to use a time-dependent rate for event occurrences.
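A minimal sketch of this idea (our addition; the component structure, transition probabilities and per-visit failure probabilities below are hypothetical, and the actual simulation procedure of [7] is more elaborate) is a Monte Carlo simulation of control transfer between components, where the fraction of simulated executions that complete without a failure estimates the reliability of one execution.

```python
# Sketch (our addition): Monte Carlo simulation of a hypothetical component-based
# application to estimate the reliability of a single execution.
import random

# Hypothetical architecture: component -> [(next component, transition probability)].
# "END" denotes successful termination of one execution.
transitions = {
    "A": [("B", 0.6), ("C", 0.4)],
    "B": [("C", 0.7), ("END", 0.3)],
    "C": [("A", 0.2), ("END", 0.8)],
}
# Hypothetical probability that a single visit to a component fails.
failure_prob = {"A": 0.001, "B": 0.005, "C": 0.002}

def simulate_once(start="A"):
    comp = start
    while comp != "END":
        if random.random() < failure_prob[comp]:
            return False                      # failure observed during this execution
        successors, probs = zip(*transitions[comp])
        comp = random.choices(successors, weights=probs, k=1)[0]
    return True                               # execution completed successfully

runs = 100_000
reliability = sum(simulate_once() for _ in range(runs)) / runs
print(f"estimated reliability of one execution: {reliability:.4f}")
```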

Users' feedback is a technique for obtaining information about the software reliability parameters of a system by gathering data after it has been shipped to the market and during its real usage [13]. In that case the idea of reliability prediction is similar to the testing approach, except that it considers failure data based on the real usage of the system. This way, failures are caused by unpredicted user behaviour rather than by preliminarily planned testing or by intentionally introduced faults (fault injection). Data about system failures are gathered from bug reports submitted by users. Bug reports may be classified according to specific levels of severity. Then, the reliability of a software system may be calculated as the frequency of failures (system crashes, interruptions, etc.) occurring in the system. The authors of [13] take into account that not all system failures are independent in time and that failures may occur in groups. The notion of the Annual Rate of Events (i.e. failures) is introduced as a measure for software reliability. One limitation of this approach is that it requires a completely developed and functioning system and, therefore, is not applicable in the early stages of the software development process.
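The following sketch (our addition; the report data, the one-hour grouping window and the observation period are hypothetical, and [13] defines its measure in more detail) illustrates the basic calculation: bug reports that arrive in a burst on the same installation are merged into a single event, and the number of events is normalised by the length of the observation period.

```python
# Sketch (our addition): a simple rate-of-events calculation from field bug reports.
from datetime import datetime, timedelta

# Hypothetical bug reports: (installation id, timestamp, severity).
reports = [
    ("site-1", datetime(2010, 1, 3, 9, 0), "crash"),
    ("site-1", datetime(2010, 1, 3, 9, 20), "crash"),         # same burst -> one event
    ("site-2", datetime(2010, 2, 11, 14, 5), "interruption"),
    ("site-1", datetime(2010, 5, 30, 22, 40), "crash"),
]

def count_events(reports, window=timedelta(hours=1)):
    """Merge reports on the same installation that fall within `window` into one event."""
    events = 0
    last_seen = {}
    for site, ts, _severity in sorted(reports, key=lambda r: r[1]):
        if site not in last_seen or ts - last_seen[site] > window:
            events += 1
        last_seen[site] = ts
    return events

observation_years = 0.5           # hypothetical length of the observation period
annual_rate_of_events = count_events(reports) / observation_years
print(annual_rate_of_events)      # 3 events over half a year -> 6.0 events per year
```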




Of all the methods listed above, testing is probably the most widely applied one, as simulation and users' feedback are usable only on particular occasions, where specific conditions are available (simulators or ready systems).

3. SOFTWARE TESTING METHODS FOR RELIABILITY ESTIMATION

Software testing is an enormous research area and there exist many successful works in it. Therefore, the goal of this paper is to survey the current state of the art in software testing aimed only at the collection of failure data intended for software reliability estimation. First we introduce the notion of testing from both the academic and the industrial point of view1, and then survey its application in reliability estimation approaches.

3.1. TESTING AND RELIABILITY

From an industrial point of view, testing is usually classified according to the first two phases of the testing process: modelling of the software's environment and test selection [17]. According to the software design phase, it can be unit, integration or system testing. According to the test-case selection criteria, the industrial software engineering community differentiates between functional (also called black-box) and structural (also called white-box) testing. In functional testing, test cases and test scenarios are selected according to the functionality of the software system, while in structural testing the organization of the source code is also taken into account.

However, as mentioned in the introduction, software testing is just one activity in the application of a software reliability model. Therefore, the terms white-box and black-box testing should not be confused with the notions of white-box and black-box software reliability models [6, 8]. Black-box reliability models regard the software system as a monolithic whole, while white-box models study the reliability of software systems with respect to the composition of system modules and their architectural configuration. Note that this is only a high-level abstract view of the system design and does not take into account the source code of the modules, as is the case with white-box testing. In general, testing can be envisaged as one step in the application of reliability models, with the difference that black-box models rely only on information collected during the testing process (no matter whether it is white-box or black-box testing), while white-box models also take into account the architecture of the software system.

In contrast, from an academic point of view there exists a plethora of testing approaches, including functional, regression, integration, product, unit, coverage, user-oriented, mutation, embedded testing, etc. Some of these techniques are very similar to, or the same as, one another or as what industry considers testing. The goal of the next subsection is to give an overview of the current state of the art in the application of software testing approaches aimed at reliability estimation.

3.2. SPECIFIC APPLICATION OF DIFFERENT TESTING METHODS FOR RELIABILITY ESTIMATION

According to the definition of software reliability given in the introduction of this paper, reliability is guaranteed only under an exact operational environment of the software system. Therefore, the notion of the so-called operational profile is of paramount importance. An operational profile is a frequency distribution that gives the relative probability that a specific function of the program will be executed.
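A minimal sketch of this idea (our addition; the functions and probabilities below are hypothetical) represents the operational profile as a frequency distribution and draws test invocations according to it, so that the mix of exercised functions matches expected field usage.

```python
# Sketch (our addition): sampling test invocations from a hypothetical operational profile.
import random

# Hypothetical operational profile: relative probability that each function is invoked.
operational_profile = {
    "open_document": 0.55,
    "save_document": 0.30,
    "export_pdf":    0.10,
    "print":         0.05,
}

def draw_test_cases(profile, n):
    """Draw n function invocations with frequencies matching the operational profile."""
    functions = list(profile)
    weights = [profile[f] for f in functions]
    return random.choices(functions, weights=weights, k=n)

# 1000 test invocations whose mix of operations reflects expected usage.
print(draw_test_cases(operational_profile, 1000)[:10])
```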

With respect to reliability estimation, similarly to what has been said in the previous section, we distinguish two broad categories of testing approaches: black-box and white-box testing.

1 The terms academic and industrial testing should not be taken as absolute notions. We use them in this paper only to differentiate between the most common approaches in software engineering practice and what has been done by researchers in the area.




White-box testing may also be referred to as directed [12] or knowledge-based testing and encompasses a wide variety of different methods, including decision coverage testing, data-flow coverage testing, mutation testing, boundary and special-values testing, representative testing, etc. In contrast, black-box testing for reliability is mainly covered by random testing.

Boundary and special-values testing [3] are basically variations of functional testing, where some knowledge about the application domain or the restrictions of the software system is employed. In this kind of testing the test cases are selected in order to inspect the behaviour of the program in those cases where faults are likely to occur. For example, these may be the boundary values of primitive data types (or of user-specified ranges), and also division and multiplication by zero in the case of special-values testing. Although relatively easy to apply, boundary and special-values testing are not so significant for reliability estimation, as they provide only a limited amount of data.
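As a small illustration (our addition; the function under test and its input range are hypothetical), the boundary values of a specified range and typical special values such as a zero divisor can be enumerated directly as test inputs, and any behaviour that deviates from the specification becomes a candidate entry in the failure data set.

```python
# Sketch (our addition): boundary and special-values test inputs for a hypothetical function.
def scale_reading(reading: int, factor: float) -> float:
    """Hypothetical function under test: readings must lie in [-1000, 1000]."""
    if not -1000 <= reading <= 1000:
        raise ValueError("reading out of range")
    return reading / factor

# Boundary values of the specified input range, plus special values for the divisor.
boundary_readings = [-1001, -1000, -999, 0, 999, 1000, 1001]
special_factors = [0.0, -1.0, 1.0, float("inf"), float("nan")]

for r in boundary_readings:
    for f in special_factors:
        try:
            result = scale_reading(r, f)
            print(f"reading={r}, factor={f}: {result}")
        except Exception as exc:
            # Behaviour that deviates from the specification (e.g. an unhandled
            # ZeroDivisionError) is recorded as a failure observation.
            print(f"reading={r}, factor={f}: {type(exc).__name__}")
```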

Decision coverage testing [3] is a method where test cases are selected in such a way that each condition operator within the source code is covered at least once. Data-flow testing [3, 11] leads testers to select test cases in such a way that all paths in the code where a variable is first defined and then used within a statement are covered. If that statement is a computational expression, it is called a c-use, and if it is a predicate, it is called a p-use. Thus, c-use and p-use testing are two variants of data-flow testing that are also possible ways of reasoning about software reliability.
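A small illustration of decision coverage (our addition; the function is hypothetical): the selected test cases make every decision in the code evaluate to both true and false at least once.

```python
# Sketch (our addition): test cases selected for decision coverage of a hypothetical function.
def classify(temperature: float, pressure: float) -> str:
    if temperature > 90:          # decision 1
        return "overheat"
    if pressure > 8:              # decision 2
        return "overpressure"
    return "nominal"

# Three cases suffice to drive both decisions to true and to false.
decision_coverage_cases = [
    (95, 5),   # decision 1 true
    (50, 9),   # decision 1 false, decision 2 true
    (50, 5),   # decision 1 false, decision 2 false
]
for t, p in decision_coverage_cases:
    print(t, p, classify(t, p))
```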

Another variation of the so-called coverage testing methods is block testing [4]. A sequence of program statements is called a block, and the goal of block testing is to select test cases in such a manner that 100% block coverage is achieved. The group of coverage testing methods may provide a much larger amount of data, but is somewhat more difficult to apply in practice than the boundary testing methods.

Probably the two contrary and, in the reliability community, most often compared methods for software testing are partition and random testing [4, 15]. Partition testing is a method where the input domain of a software system is partitioned into several sub-domains, and test cases are then picked from these sub-domains. In contrast, random testing may be considered a statistical sampling experiment, where test cases are randomly selected from the whole input domain of the system. Both partition and random testing may provide a relatively large amount of data; however, they are quite difficult to apply. The reasons are that division of the input domain of the software into disjoint sub-domains is not always possible, and that selection of random and non-correlated test cases is not a trivial task when we want to ensure high levels of reliability.
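The contrast can be illustrated with a toy experiment (our addition; the input domain, the location of the fault and the number of test cases are hypothetical): the input domain is split into equal sub-domains for partition testing, random testing samples the whole domain uniformly, and the probability of detecting the fault at least once is estimated for both.

```python
# Sketch (our addition): partition testing versus random testing on a toy program.
import random

DOMAIN = range(0, 10_000)

def program_fails(x: int) -> bool:
    # Hypothetical fault: the program fails only for a narrow band of inputs.
    return 4_990 <= x <= 5_009

def partition_test(num_partitions=100):
    """Split the domain into equal sub-domains and draw one case from each."""
    size = len(DOMAIN) // num_partitions
    cases = [random.randrange(i * size, (i + 1) * size) for i in range(num_partitions)]
    return sum(program_fails(x) for x in cases)

def random_test(num_cases=100):
    """Draw the same number of cases uniformly from the whole domain."""
    return sum(program_fails(random.choice(DOMAIN)) for _ in range(num_cases))

trials = 1_000
print("partition, fault detected:", sum(partition_test() > 0 for _ in range(trials)) / trials)
print("random,    fault detected:", sum(random_test() > 0 for _ in range(trials)) / trials)
```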

The notion of representative testing is introduced in [12]; it is based on a known operational profile of the system in use and on the selection of test cases according to that knowledge. As seen from the definition, this method is essentially the same as partition testing.

Mutation testing [18] is one of the oldest methods to employ so-called fault injection [5] for the testing of software systems. Simulated faults, called mutants, are injected into the system according to some predefined rules. The hypothesis is that test cases that detect mutants are more likely to detect real faults. A mutant is said to be killed when a test case reveals the fault. By executing test cases, selected at random, on the original and on the mutated code, reliability-growth data can be obtained that provide an estimate of reliability. In practice, mutation testing is easy to apply with appropriate tool support and may provide large data sets for reliability estimation. Experimental results have shown that it provides a good way to select test cases for random testing of a program [3].
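A minimal mutation-testing experiment can be sketched as follows (our addition; the function, the single mutant and the test-selection scheme are hypothetical): one relational-operator mutant is injected into a small function, randomly selected test cases are run against the original and the mutated version, and a test case kills the mutant when the two outputs differ.

```python
# Sketch (our addition): a toy mutation-testing experiment with one injected mutant.
import random

def original(x: int) -> str:
    return "high" if x >= 100 else "low"

def mutant(x: int) -> str:
    return "high" if x > 100 else "low"      # injected fault: '>=' mutated to '>'

def run_experiment(num_tests=50):
    """Run randomly selected test cases; report whether any of them kills the mutant."""
    killed = False
    for _ in range(num_tests):
        x = random.randint(0, 200)            # random test-case selection
        if original(x) != mutant(x):          # differing outputs -> mutant killed
            killed = True
    return killed

trials = 1_000
kill_ratio = sum(run_experiment() for _ in range(trials)) / trials
print(f"fraction of random test suites that kill the mutant: {kill_ratio:.2f}")
```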

Research in the area [3, 4, 10] reveals that there is significant uncertainty in the reliability values obtained when only one testing method is applied. This uncertainty is caused by the so-called saturation effect, which results in reliability overestimation, because every testing method tends to reach a limit in its ability to reveal faults in a given program.




To overcome this, more than one testing method should be used to gather data for reliability estimation, and coverage information should also be taken into account when estimating software reliability.

It has also been shown that random testing gives better results than many white-box approaches [4, 9].

A brief comparison of these testing techniques is shown in Table 1. An extensive quantitative case study evaluating the different methods presented above is not shown here due to space limits. However, our research has shown that with mutation testing one may obtain results very similar to those of the more powerful random testing, with less effort.

Table 1: Comparison of testing methods with respect to their application in reliability estimation

Testing method              White/Black-box approach   Amount of data provided   Application difficulty
Boundary values testing     White box                  Small                     Low
Special values testing      White box                  Small                     Low
Decision coverage testing   White box                  Medium                    Medium
Data-flow testing           White box                  Medium                    Medium
Partition testing           White box                  Medium/Large              High
Random testing              Black box                  Large                     High
Representative testing      White box                  Medium                    High
Mutation testing            White box                  Large                     Easy

4. CONCLUSIONS AND FUTURE WORK

In this paper we systematize the current state of the art in gathering data for reliability estimation. There exist three main groups of methods – testing, simulation and users' feedback. Testing is the most scrutinized approach in different research works, and this article makes a further study of the different testing methods aimed at the estimation of reliability.

The survey has shown that mutation and random testing appear to be attractive approaches for application in reliability estimation. In this respect, our further research work targets an empirical validation of the feasibility of mutation testing in the evaluation of uncertainty in software reliability estimates.

ACKNOWLEDGEMENT

The work presented in this paper was partially supported by grants from the National Science Fund, part of the Ministry of Education and Science in Bulgaria, under the PD01-0106 (ARECS) and MU-01-143 (ADEESS) projects; the EURECA Project (funded by the European Commission under the Erasmus Mundus External Cooperation Window); and the Progress Project at Mälardalen University, Sweden. Acknowledgments are also due to AERB-Safety Research Institute, India (the parental organization of the second author).

REFERENCES

[1] Avižienis, A., J-C. Laprie and B. Randell, Basic concepts and taxonomy of dependable and secure computing, IEEE Transactions on Dependable and Secure Computing, Vol. 1, Issue 1, Jan-March 2004.




[2] Chandran, S. K., A. Dimov and S. Punnekkat, Modeling uncertainties in the estimation of software reliability – a pragmatic approach, accepted for publication in Proceedings of the Fourth IEEE International Conference on Secure Software Integration and Reliability Improvement (SSIRI 2010), Singapore, June 2010.

[3] Chen, M. et al., A Time/Structure Based Model for Estimating Software Reliability, Purdue University Technical Report, SERC-TR-117-P, December, 1992.

[4] Chen, M., A. Mathur and V. Rego, Effect of testing techniques on software reliability estimates obtained using time-domain models, IEEE Transactions on Reliability, Vol. 44, No. 1, Mar. 1995, pp. 97-103.

[5] Clark, J., Pradhan, D., Fault injection, Computer, vol.28, no.6, pp.47-56, Jun 1995

[6] Farr, W., Software reliability modeling survey, in: M.R. Lyu (Ed.), Handbook of Software Reliability Engineering, McGraw-Hill, New York, 1996, pp. 71–117.

[7] Gokhale, S. and M. Lyu, A simulation approach to structure-based software reliability analysis, IEEE Transactions on Software Engineering, 31(8), Aug. 2005.

[8] Goseva-Popstojanova, K. and K. S. Trivedi, Architecture-based approach to reliability assessment of software systems, Performance Evaluation, Vol. 45/2-3, June 2001.

[9] Hamlet, D., and R. Taylor, Partition Testing Does Not Inspire Confidence, IEEE Transactions on Software Engineering, vol. 16, no. 12, Dec. 1990, pp. 1402-1411.

[10] Horgan, J. and A. Mathur, Software Testing and Reliability, in: M.R. Lyu (Ed.), Handbook of Software Reliability Engineering, McGraw-Hill, New York, 1996, pp. 531-566.

[11] Horgan, J., and S. London, "Dataflow Coverage and the C Language," Proceedings of the Fourth Annual Symposium on Testing, Analysis, and Verification, Victoria, British Columbia, Canada, October 1991, pp. 87-97.

[12] Mitchell, B. and S. Zeil, A reliability model combining representative and directed testing, in Proceedings of the 18th International Conference on Software Engineering, Berlin, Germany, March 1996, pp. 506-514.

[13] Murphy, B and Gent, T., Measuring system and software reliability using an automated data collection process, In Quality and Reliability Engineering International, 11(5), 341-353, 1995.

[14] Myers, G., et al., The Art of Software Testing, John Wiley & Sons, New York, 2004.

[15] Ntafos, S., On Comparisons of Random, Partition, and Proportional Partition Testing, IEEE Transactions on Software Engineering, Vol. 27(10), October 2001, pp. 949-960.

[16] Whittaker, J., What Is Software Testing? And Why Is It So Hard?, In IEEE Software, vol. 17(1), January/February 2000, pp. 70-79.

[17] Whittaker J. and Jeffrey Voas, Toward a more reliable theory of software reliability. Computer, 33(12):36–42, 2000.

[18] Jia, Y. and M. Harman, An analysis and survey of the development of mutation testing, Technical Report, CREST Centre, King's College London, 2009.

ABOUT THE AUTHORS

Aleksandar Dimov, PhD is an assistant professor at the Department of Software Technologies, Faculty of Mathematics and Informatics, Sofia University "St. Kliment Ohridski", Phone: +359 2 971 35 09, e-mail: [email protected]

Senthil Kumar Chandran, PhD is a post-doc researcher at the School of IDE, Mälardalen University, Västerås, Sweden, e-mail: [email protected]

Sasikumar Punnekkat, PhD is a professor at the School of IDE, Mälardalen University, Västerås, Sweden, e-mail: [email protected]


