
J. SYSTEMS SOFTWARE 1993; 20:107-114

Distribution-Based Statistical Sampling: An Approach to Software Functional Test

Software testing uses both structural and functional strategies to ensure the delivery of quality products. Structural testing checks the implementation of the software design, whereas functional testing validates the correct implementation of the specified software requirements. Exhaustive functional testing for software of any reasonable complexity is recognized as being impractical; therefore, sampling strategies must be introduced to prioritize and optimize the requirements validation. Recent work with statistically based sampling techniques shows a potential for the systematic validation of requirements with the use of representative inputs from the software's planned operating environments. This article discusses the need for sampling strategies in functional testing, the characteristics of and experience with a statistical sampling approach, and the impact of a statistical approach on the functional test process.

INTRODUCTION

Testing is performed at several points in the software development life cycle by software developers, software testers, and software customers. A hierarchy of tests is common practice; these provide opportunities to remove software errors and confirm that the software correctly performs its stated objectives. Structural testing [1] is used primarily by the software developers to verify that the implemented code matches its design. It is typically performed in the three steps of unit, string, and integration testing. Functional testing [2] is used primarily by testers and customers to validate that the software design and implementation deliver the required capability. It is typically performed in some number of qualification and acceptance tests.

The focus of this article is software functional testing and, in particular, the sampling strategies that must be introduced to create test materials. Functional testing methods treat the software as a black box and are primarily concerned with the software externals (i.e., checking the correct mapping of software inputs to expected outputs). The internal structure of the software typically is not considered since that should have been the focus of the developer-conducted structural tests. The requirements addressed through functional testing are the functional capability of the software, its performance, reliability, installability, and various other considerations. Essentially any requirement identified in a specification and of interest to the customer is considered a candidate for a functional test.

Address correspondence to Michael Dyer, IBM Federal Systems Company, 9211 Corporate Boulevard, Rockville, Maryland 20850.

A functional test is generally organized to validate the implementation of the requirements defined within a specific software or system specification, where the same specification was used to design and implement the software. There should be as many functional tests as there are specifications defining the software product and the system in which the software is embedded. The number and hierarchy of specifications for a given project depend on the size and complexity of the particular development.

Functional testing is generally performed in a systematic process of identifying the requirements to be validated, creating the necessary test materials, executing the software against selected test materials, and reporting the results of the tests. A test plan is created to address a specific software specification and identify which requirements within the specification are to be validated and by what validation method (inspection, analysis, demonstration, or test). The plan also identifies the schedule, resource, tool, and support requirements for a particular functional test. The details of the test plan are worked out with test procedures (cases), which identify the steps to be followed in the test, the discrete data values to be used, the expected output values from the software execution, and the pass-fail criteria to be used in deciding whether the software execution was correct. These test procedures are run against the software in a predefined environment. The procedure executions are recorded and results summarized to certify the completeness and thoroughness of the requirements validation.

© Elsevier Science Publishing Co., Inc., 655 Avenue of the Americas, New York, NY 10010

REQUIREMENTS COVERAGE VS. TESTING COST

For software functional testing, it is impractical to attempt to validate every specified requirement under every possible operational condition and combination of input data. The effort and time required to define, execute, and analyze the results from that number of tests is prohibitive in most software developments. There are a few exceptions to that rule, such as the testing of language compilers [3], where the closed syntax and semantics allow the automatic generation and execution of extremely large numbers of self-checking tests. But these are the exceptions; the more common question facing the tester is how to best select or sample the requirements, input data values, and operating conditions of some software package to validate its implementation with some level of confidence.

PROBLEM WITH NUMBERS OF REQUIREMENTS

The requirements with which the tester must deal come initially from a customer in some form of a statement of work, which typically identifies the functional, performance, and operational capabilities that the customer wants in the delivered software. In a contractor's responding proposal, a refinement or elaboration is performed to document the contractor's interpretation of those requirements. Since proposal preparation time is generally limited, the total set of requirements is typically not completed until after the work is awarded to the contractor. When more opportunity for requirements elaboration is afforded before contract award, there is less customer and contractor risk in ensuring that a complete set of requirements has been defined for development. This is the rationale behind the phased or multi-step approach to contracting, where the initial step is devoted to elaborating a complete set of requirements before starting any development.

To illustrate a typical buildup of requirements, we use data from an aircraft system development [4]. For the aircraft, the customer identified the requirement for an avionics subsystem to provide the aircraft's electronics capability, which would be composed of software and hardware components. The flight software, identified as one software component, was to provide the interface between the aircraft systems and the flight crew. Within the flight software, requirements were identified for a navigation subcomponent to control the aircraft's flight path.

This is a reasonable and fairly common level of detail (subcomponent within a component of a subsystem within a system) used by a customer to specify requirements. At this level of detail, the accumulation of requirements for a total system is generally numbered in the several hundreds or even thousands. Table 1 shows that 16 discrete requirements were identified for the example navigation subcomponent within the flight software component. In this particular case, some 150 requirements were identified for the total flight software, which had a dozen other subcomponents in addition to navigation. A need for the customer to identify 1,000-2,000 requirements for all the components in the avionics subsystem and for all the subsystems within the aircraft would not be unexpected. In the contractor's response to this proposal, a further refinement of the navigation subcomponent requirements was made. As shown in Table 2, the original 16 requirements (entries in bold text) grew to 250 requirements. These derived requirements were documented in a combination of the contractor's technical proposal, the requirements specification for the aircraft system, and the software requirements specification for the flight software.

A second refinement of the navigation subcomponent requirements occurred during the finalization of the software requirements specification for the flight software. Additional requirements were derived to clarify ambiguities uncovered during high-level software design; as shown in Table 3, this resulted in a doubling of the 250 requirements (entries in bold text) to some 476 requirements. The principal cause for this growth was the further definition in the sensor alignment, navigation initialization, and position-keeping parameter areas.

Table 1. Requirements from Statement of Work

Specification Paragraph    Paragraph Content
3.2.6                      Navigation
3.2.6.1                    Navigation Insertion
3.2.6.2                    Sensor Alignment
3.2.6.3                    Mode Control
3.2.6.4                    Air Computations
3.2.6.5                    Navigation Support
3.2.6.6                    Position Keeping
Total requirements: 16

Table 2. Requirements from Technical Proposal

Specification Paragraph    Paragraph Content
3.2.6                      Navigation
3.2.6.1                    Navigation Insertion
3.2.6.1.1                  Present Position Insertion
3.2.6.1.2                  Ground Elevation
3.2.6.1.3                  Barometric Pressure
3.2.6.1.4                  Data Override Insertion
3.2.6.2                    Sensor Alignment
3.2.6.2.1                  Ground Alignment Control
3.2.6.2.2                  Air Alignment
3.2.6.2.3                  Alignment Status
3.2.6.2.4                  Navigation Initialization
3.2.6.2.5                  INS Sensor Mode
3.2.6.2.6                  AHRS Sensor Mode
3.2.6.2.7                  AHRS Flux Valve
3.2.6.2.8                  Doppler Land/Sea
3.2.6.3                    Mode Control
3.2.6.3.1                  Data Reasonableness
3.2.6.3.2                  Mode Availability
3.2.6.3.3                  Mode Recommendation
3.2.6.3.4                  Mode Activation
3.2.6.4                    Air Computations
3.2.6.4.1                  True Air Speed
3.2.6.4.2                  Wind Computation
3.2.6.5                    Navigation Support
3.2.6.5.1                  Earth Radii
3.2.6.5.2                  Doppler/INS
3.2.6.5.3                  Doppler/AHRS
3.2.6.5.4                  Altitude
3.2.6.5.5                  Barometric Calculation
3.2.6.6                    Position Keeping
3.2.6.6.1                  Insertion, Mode, Update
3.2.6.6.2                  Miscellaneous Rates
3.2.6.6.3                  Air Alignment and Doppler
3.2.6.6.4                  In-Flight Alignment
3.2.6.6.5                  Doppler/INS Navigation
3.2.6.6.6                  INS Correction
3.2.6.6.7                  Inertial Navigation
3.2.6.6.8                  Doppler/AHRS or AHRS
3.2.6.6.9                  Air Data
3.2.6.6.10                 Doppler/INS Heading
3.2.6.6.11                 Other Parameters
Total requirements: 250

PROBLEM WITH COMPREHENSIVE REQUIREMENTS VALIDATION

Test preparation is further aggravated when the effort involved in performing a single test is considered, since the accumulated time generally averages some number of hours per test procedure. It is not that the tests will take some average number of hours to execute, since the execution is generally performed in seconds or fractions of seconds. Rather, it is that the accumulated effort required to design and organize a test procedure, to execute that procedure one or more times, and to analyze the execution results against pass/fail criteria generally averages some number of hours per procedure. Depending on the complexity of the particular requirement being validated, the time may be less than the average (e.g., navigating through a display panel) or significantly greater than the average (e.g., computing a satellite orbit). For functional testing, viewing test procedure effort in terms of an average number of hours per procedure is a useful planning rule both for commercial and public sector software developments, where the complexity of the application will dictate whether the average is one or four hours. Note that the same rule is not applicable to the planning of software structural testing, where informality in set-up and reporting and allowed software modification for data insertion and recording generally produce averages in the minutes per test.

To validate a specified requirement, a test procedure is typically defined to exercise the software that implements that requirement. For each procedure, the steps to be followed, the problem inputs to be used, and the criteria for judging success or failure are defined. Because of the range of input data values from which to choose, the identification of representative and comprehensive problem inputs is a challenge for the tester. Understanding which problem inputs (operator, hardware, other software) trigger the execution of the software that implements a requirement is the first step. Understanding the data domains from which discrete values for those inputs can be selected is a second and more difficult step. The input analysis must distinguish between legal and illegal data values, identify subsets of most likely data values, and sort out relationships and dependencies between inputs and data values.

The goal of the input analysis is to determine the variation needed in input selections and their discrete data values to ensure adequate requirements testing and gain confidence in the requirements validation. In many cases, the complex data relationships within and between inputs and the size of the value domains quickly introduce combinatorial problems in test data selection.

In planning the functional testing of software with requirements similar to the navigation software shown in Table 3, the effort for just a single test of each requirement might account for some 250-500 hours (roughly 6-12 weeks) of effort. This would assume 30 minutes to one hour per test procedure on average, which might be the most optimistic case and not an attractive situation when planning the functional testing of a system in which the navigation software is a very small subcomponent. This situation is further aggravated when one considers the general rule of thumb for reasonably validating software requirements, which calls for a few nominal test executions, executions with boundary values,


Table 3. Requirements from Software Requirements Specification

Specification Paragraph    Paragraph Content
3.2.6                      Navigation
3.2.6.1                    Navigation Insertion
3.2.6.1.1                  Present Position Insertion
3.2.6.1.2                  Ground Elevation
3.2.6.1.3                  Barometric Pressure
3.2.6.1.4                  Data Override Insertion
3.2.6.2                    Sensor Alignment
3.2.6.2.1                  Ground Alignment Control
3.2.6.2.1.1                Ground Alignment Insertion
3.2.6.2.1.2                INS Test Alignment
3.2.6.2.1.3                INS Normal Alignment
3.2.6.2.1.4                INS Fast Alignment
3.2.6.2.1.5                AHRS Normal Alignment
3.2.6.2.1.6                AHRS Fast Alignment
3.2.6.2.2                  Air Alignment
3.2.6.2.3                  Alignment Status
3.2.6.2.4                  Navigation Initialization
3.2.6.2.4.1                INS Initialization
3.2.6.2.4.2                AHRS Initialization
3.2.6.2.5                  INS Sensor Mode
3.2.6.2.6                  AHRS Sensor Mode
3.2.6.2.7                  AHRS Flux Valve
3.2.6.2.7.1                Flux Valve Display
3.2.6.2.7.2                Flux Valve Calculation
3.2.6.2.7.3                Check Flux Valve Calculation
3.2.6.2.8                  Doppler Land/Sea
3.2.6.3                    Mode Control
3.2.6.3.1                  Data Reasonableness
3.2.6.3.2                  Mode Availability
3.2.6.3.3                  Mode Recommendation
3.2.6.3.4                  Mode Activation
3.2.6.4                    Air Computations
3.2.6.4.1                  True Air Speed
3.2.6.4.2                  Wind Computation
3.2.6.5                    Navigation Support
3.2.6.5.1                  Earth Radii
3.2.6.5.2                  Doppler/INS
3.2.6.5.3                  Doppler/AHRS
3.2.6.5.4                  Altitude
3.2.6.5.5                  Barometric Calculation
3.2.6.6                    Position Keeping
3.2.6.6.1                  Insertion, Mode, Update
3.2.6.6.1.1                Insertion and Mode
3.2.6.6.1.2                Update
3.2.6.6.2                  Miscellaneous Rates
3.2.6.6.3                  Air Alignment and Doppler
3.2.6.6.4                  In-Flight Alignment
3.2.6.6.5                  Doppler/INS Navigation
3.2.6.6.6                  INS Correction
3.2.6.6.7                  Inertial Navigation
3.2.6.6.8                  Doppler/AHRS or AHRS
3.2.6.6.9                  Air Data
3.2.6.6.10                 Doppler/INS Heading
3.2.6.6.11                 Other Parameters
3.2.6.6.11.1               True Heading
3.2.6.6.11.2               Magnetic Variance
3.2.6.6.11.3               Ground Track
3.2.6.6.11.4               Angle of Attack
3.2.6.6.11.5               System Rates
3.2.6.6.11.6               Turn Rate
3.2.6.6.11.7               Body Accelerations
3.2.6.6.11.8               Frame Accelerations
3.2.6.6.11.9               Height Above Ground
Total requirements: 476

and executions that stress or stretch the implementation of a requirement. Clearly, defining an additional three or four tests for each requirement of the navigation software (Table 3) would stretch this functional testing effort beyond the practical range.


This simple case illustrates the dilemma facing the software functional tester who feels obligated to validate all requirements with some representative set of tests for each requirement. Clearly, there is a need to select or sample, from the total set of software requirements and from the associated input values, a subset for which testing will give sufficient confidence in the validation process. Different strategies have been defined and used in software functional testing that attempt to optimize the requirements and data selection to maximize the requirements validation.
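The arithmetic behind this dilemma is easy to sketch. A minimal calculation using the figures above (476 requirements, 30 minutes to one hour per procedure); the factor of five for the additional nominal, boundary, and stress executions is an illustrative assumption, not a figure from the text:

```python
# Back-of-the-envelope test-effort estimate using the planning rule of thumb.
requirements = 476                  # requirement count from Table 3
hours_per_procedure = (0.5, 1.0)    # optimistic planning range per procedure

# One test procedure per requirement:
single_pass = tuple(requirements * h for h in hours_per_procedure)
print(single_pass)  # (238.0, 476.0) hours, roughly the 250-500 hours cited

# Assume roughly four additional executions per requirement (nominal cases,
# boundary values, stress cases), i.e., five procedures per requirement:
full_pass = tuple(requirements * h * 5 for h in hours_per_procedure)
print(full_pass)    # (1190.0, 2380.0) hours, well beyond a practical budget
```

For a navigation subcomponent that is itself a small fraction of the total system, these totals make exhaustive per-requirement testing untenable, which is exactly the argument for sampling.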

SAMPLING STRATEGIES FOR FUNCTIONAL TESTING

Few techniques are reported in the technical literature for creating test samples for functional testing. Those that are reported rely on testing with subsets of the requirements and input values to achieve requirements validation. The approach used for selecting what goes into the subset distinguishes one method from another. Some of these techniques are highlighted in the following paragraphs with references to original sources, though surveys of testing methods [5, 6] provide more numerous and recent references to work in this area.

The equivalence partitioning method [7] attempts to subdivide the domain of input values into classes for which equivalent results could be expected from the execution of any representative value in the class. The approach subdivides a total input domain into a finite number of subsets and then relies on sampling from the subsets to restrict the functional test input. The boundary value analysis method [7] is an extension of partitioning which, rather than selecting any value as representative, focuses on selecting values along the edges or boundaries of the classes. These boundary values are viewed as triggers for the more error-prone areas of the software. The category partition method [8] is another partitioning technique which uses an analysis of formal software specifications and automated tools to organize a minimum set of tests to provide full coverage of the specification.
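The two partitioning ideas can be sketched together. A minimal example for a hypothetical altitude input accepted over 0 to 50,000 ft; the classes and limits are invented for illustration:

```python
import random

# Equivalence classes for a hypothetical altitude input (feet).
partitions = {
    "below_range": (-1000, -1),     # illegal: negative altitude
    "legal":       (0, 50000),      # legal operating range
    "above_range": (50001, 99999),  # illegal: beyond the accepted limit
}

def representative(lo, hi, rng=random):
    # Equivalence partitioning: any one value stands in for its whole class.
    return rng.randint(lo, hi)

def boundary_values(lo, hi):
    # Boundary value analysis: the class edges, where faults tend to cluster.
    return [lo, lo + 1, hi - 1, hi]

test_inputs = []
for lo, hi in partitions.values():
    test_inputs.append(representative(lo, hi))
    test_inputs.extend(boundary_values(lo, hi))

print(len(test_inputs))  # 3 representatives + 12 boundary values = 15
```

Three classes thus reduce an input domain of roughly 100,000 discrete values to a handful of test inputs, which is the whole point of the method.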

As the name implies, the cause-effect graphing method [9] traces relationships between the different software inputs to systematically identify the input combinations needed for all or maximum requirements coverage. The adaptive perturbation technique [10] has a different focus and uses executable assertions embedded in the software to identify the reasonable ranges for input/output data values. The method attempts to maximize the effectiveness of a test suite in detecting assertion violations. The error-guessing method [7] relies on tester experience in isolating areas with more potential for causing software failures. Similarities with previous software developments, the functional complexity of the software, and the first-time implementation of software functions are clues used in this method to isolate problem areas.

Statistical techniques [11, 12] have been discussed for software testing but generally focus on structural testing. The thrust of these techniques is on generating test data by the random sampling of a uniformly distributed set of input values. In the reported experience, random test case generation has been favorably compared with the branch and path testing methods of structural testing in terms of its coverage of program structure. Statistical techniques [13, 14] have also been introduced as functional testing methods and are the focus of this article. These techniques rely on the definition of probability distributions for the software functions and inputs to control the selection of representative test inputs. These distribution-based methods introduce a mathematical basis (statistics) for realizing objectivity and operational realism in the requirements validation process.

PERSPECTIVES ON SAMPLING STRATEGIES

The nonstatistical methods for functional testing [7-10] have been used almost exclusively and have proven very effective in the validation of software for most applications. If there are concerns with these methods, they generally revolve around a lack of rigor in prioritizing requirements and too much subjectivity in the selection of input values. Too often, the tester makes the sole decision on the ranking of software requirements and bases that decision on the complexity of the implementation, the implementer's programming ability, perceived customer interest, or the tester's personal experience with this or similar functions. While this experience-based intuition has many benefits, it forces a reliance on individual capabilities which may or may not be available in a given testing situation.

A lack of problem realism is also seen in the selection of input values, where the focus is often on ease of manipulation (e.g., changing one parameter at a time, relying on powers of 10, etc.). The selected values allow the tester to easily track intermediate results during test execution but may not represent reasonable or representative inputs and may only validate nonessential aspects of a requirement. This is another instance where the capability of the test personnel plays too dominant a role. Since the requirements validation is too often driven by schedule and staffing constraints within a project, less reliance on the experience and capability of the particular test personnel would be preferred to build confidence in the level of requirements validation.

Statistical methods [13, 14], while not widely used for software functional testing, may provide a more formal and objective basis for building the confidence levels needed for completion and risk assessment. Any concerns with the use of statistical methods have generally stemmed from their perceived technical complexity and from an apprehension about their effectiveness in requirements coverage. The complexity concern arises from the need to define probability distributions for requirements and input values and to compute the expected results from randomly generated test executions.

Identifying probability distributions is definitely a new task for testers and something that cannot be performed without support from other engineering disciplines. Currently in many software developments, performance and loading analyses are performed with models (prototypes) built to assess the adequacy of the proposed problem solution. The inputs on required function execution, input traffic, operational scenarios, etc., that drive these analyses are exactly the probability data needed to drive the functional testing. Testers must work with systems engineers, operational analysts, and the customer to develop representative distributions that reflect the planned use for the software. This analysis requires additional test and project resources but should not introduce a new level of complexity, since it is an extension of the analysis performed to assess the proposed solution.

The tester must define expected outputs for each test procedure in order to determine the success of the test execution. Computing these outputs may require various levels of analyses, from simple pencil-and-paper calculations to complicated simulation model executions. With nonstatistical methods, the tester has the liberty of selecting input values that should give known output values or are at least easy to manipulate in computing expected outputs. Since statistically generated test procedures will use input values randomly selected based on the value domains and probability distributions of the problem, the calculations may require more effort. Computing results for a statistically representative set of software inputs is a price that must be paid for introducing realism in test procedures. The impact of these calculations will depend on the problem complexity and what is needed with expected outputs to make a pass/fail determination (i.e., an on/off light condition or calculating the coordinates of a satellite to multi-decimal-point precision).
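One practical consequence is that pass/fail checks against a computed expected output usually need an explicit tolerance. A minimal sketch, with hypothetical values and tolerances:

```python
import math

# With statistically generated inputs, the expected output typically comes
# from an independent calculation or a simulation model, so the comparison
# uses a tolerance rather than exact equality against a hand-picked value.

def passes(actual, expected, rel_tol=1e-4, abs_tol=1e-9):
    """Pass/fail criterion: actual output within tolerance of expected."""
    return math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol)

print(passes(1013.2501, 1013.25))  # within tolerance: pass
print(passes(1013.9, 1013.25))     # outside tolerance: fail
```

The tolerance itself is part of the pass/fail criteria the test procedure must record, and it will differ between an on/off indication and a precision coordinate calculation.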

The coverage issue arises from a misconception of how statistical methods are being applied to software functional testing. It is not the case that test inputs are randomly selected based on a uniform distribution (each test input is as likely to occur as any other), which will produce haphazard, large, and costly test samples. It is the case that the statistical methods [13, 14] for software functional testing define their requirements and input value selection as directly controlled by the probability distributions that reflect the best project understanding of how the software is used operationally. A problem with requirements coverage has not materialized in the current testing experience [4] with statistical methods.

A related concern with probability distributions is that requirements and input values with the higher probabilities will most likely be selected, which is true. The concern is that critical requirements and input values that are statistically insignificant may not get selected and be overlooked. It is the case that items with a low probability are less likely to be selected for test, but where these items are critical in the problem sense, special statistical handling must be defined. In the aircraft example [4], many potential conditions could be life threatening, such as certain structural damage, loss of fuel, etc. While the probability of these events occurring is minuscule, it would be foolhardy not to test these conditions before delivery. A fault tree analysis could be performed to characterize all the life-threatening conditions and their likelihoods of occurring, from which a separate distribution for a subset of the requirements and input values could be built and used in a separate set of tests. A similar approach could be taken to focus on the performance drivers within the software or to focus on any other concern of interest. Driving this special focused testing with probability distributions ensures objectivity and rigor in the testing of the special interest areas.
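The special handling amounts to stratified sampling: rare but critical conditions get their own distribution and their own test series. A minimal sketch in which all condition names and probabilities are hypothetical:

```python
import random

# Everyday operational usage drives the main test series.
usage_profile = {
    "position_keeping": 0.55,
    "mode_control":     0.30,
    "sensor_alignment": 0.15,
}

# Relative likelihoods among rare, life-threatening conditions, as a fault
# tree analysis might characterize them. Sampled as a separate series, so
# every critical area is exercised deliberately rather than left to chance.
critical_profile = {
    "structural_damage":    0.40,
    "fuel_loss":            0.35,
    "total_sensor_failure": 0.25,
}

def draw_cases(profile, n, rng):
    """Draw n test cases weighted by the given probability distribution."""
    names = list(profile)
    weights = [profile[name] for name in names]
    return rng.choices(names, weights=weights, k=n)

rng = random.Random(7)
main_series = draw_cases(usage_profile, 100, rng)       # routine validation
critical_series = draw_cases(critical_profile, 20, rng)  # focused validation
print(sorted(set(critical_series)))
```

Because the critical series is driven by its own distribution, the low absolute probability of these events in operation no longer determines whether they appear in the test sample.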

STATISTICAL SAMPLING APPROACH

One approach to statistical sampling for software functional testing is defined as part of the Cleanroom software development methodology [13]. A three-step process 1) defines probability distributions for the software functions and inputs based on an analysis of operational requirements, 2) encodes the distributional data into a data base, and 3) generates test case samples from the data base. The last two steps imply the availability of a test case generator tool [15]; this type of automation is needed for functional testing software of any reasonable complexity.
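The three steps can be sketched in miniature. The usage model below is invented for illustration (the function names are drawn from the navigation tables, but the probabilities are assumptions); a real model would come from the operational analysis described later, and the referenced generator tool is far more capable than this:

```python
import random

# Step 2: distributional data encoded as a simple table
# (function name, probability of invocation in operation).
usage_model = [
    ("present_position_insertion", 0.40),
    ("mode_activation",            0.35),
    ("wind_computation",           0.20),
    ("data_override_insertion",    0.05),
]

def generate_test_cases(model, n, seed=None):
    """Step 3: draw a statistically representative sequence of invocations."""
    rng = random.Random(seed)
    names = [name for name, _ in model]
    weights = [p for _, p in model]
    return rng.choices(names, weights=weights, k=n)

sample = generate_test_cases(usage_model, 1000, seed=1)
# High-usage functions dominate the sample, mirroring planned operation.
print(sample.count("present_position_insertion"),
      sample.count("data_override_insertion"))
```

Each drawn name would then be expanded into a full test procedure by sampling that function's input distributions in the same way.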

The first step in this statistical test process would start with the software specification for a product, which identifies the requirements to be implemented and therefore to be validated in the functional testing. An analysis of the specification would examine the total set of requirements and all the constraints, dependencies, relationships, and interactions between and among those requirements. For software products embedded in larger systems, other considerations, such as the different modes of the system operation, the system reconfiguration strategies for failure work-arounds, and the system philosophy on operation with partial capabilities, must also be addressed in the analysis. An examination of software requirements is performed regardless of whether a statistical or nonstatistical approach to testing is adopted. However, the next part of the requirements analysis is unique to statistical approaches, where the probability distributions for each requirement and software input must also be identified.

If all functions and data are equally likely to occur, then a uniform distribution is defined for all cases. This will lead to a strictly random selection of tests, which is the case that is generally misunderstood to represent all forms of statistical testing. When something other than a random selection of data is required for functional testing, the goals of the testing effort must be clearly defined, since they will dictate the approach to defining the probability distributions. In the Cleanroom work [13], the focus was on computing the reliability of software products, so probability distributions were defined to reflect the operational usage of the software product. In other cases, the focus might be on the quality of the software for safety-critical applications, on the performance drivers' impact on the software in a particular application, on the software's handling of customer-defined critical functions, or on some other perspective. In many cases, several different focuses might need to be addressed in software functional testing, which would require the definition of different families of probability distributions to accommodate each focus. The information needed to develop probability distributions is generally not contained completely within the software specification but must be located in other project documents which address the software's operating environments.

The second step in this statistical test process involves the encoding of the distributional data on the software functions and input data into a form that is acceptable to a generator tool [15]. Based on

Table 4. Functional Test Coverage of Requirements

[Table 4 lists, for each navigation specification paragraph from 3.2.6 through 3.2.6.6.11.9 and its paragraph content (Navigation Insertion, Sensor Alignment, Navigation Initialization, Flux Valve, Doppler, Air Computations, Navigation Support, Position Keeping, Other Parameters, System Rates, and their subparagraphs), the requirement count and the test case count. Note 1: paragraph covered by lower-level tests.]

the Cleanroom experience, this typically requires the definition of a list notation for describing the function composition and a minimal set of terminal, macro-, and pseudocommands for facilitating test procedure generation. The pseudocommands allow operations on members of lists, the macros allow navigation through sets of lists, and the terminals permit the formatting of test procedures in application-specific terms. The goal of the generator tool is to organize sets of data sequences which could constitute the steps in a test procedure that would perform a functional test. The function and distributional data are encoded to simplify and speed up this construction process.
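The actual list notation and command set of the generator tool [15] are not published, so the following Python sketch only suggests the flavor of such an encoding; every list name, value, and probability is invented for illustration.

```python
# Distributional data encoded as named lists, each pairing a member with
# its selection probability; a function composition names the lists to be
# walked through, in order, to build one test-procedure step.
# All names and probabilities below are illustrative only.
lists = {
    "SENSOR":    [("INS", 0.6), ("AHRS", 0.4)],
    "ALIGNMENT": [("ground", 0.7), ("air", 0.2), ("fast", 0.1)],
}
composition = ["SENSOR", "ALIGNMENT"]

# Sanity checks an encoding step might perform before the data base is
# accepted: every composed list exists, and its probabilities sum to one.
for name in composition:
    assert name in lists, f"composition references unknown list {name}"
for name, members in lists.items():
    total = sum(p for _, p in members)
    assert abs(total - 1.0) < 1e-9, f"probabilities for {name} sum to {total}"

print("encoding accepted")
```

Validating the distributions at encoding time keeps malformed data out of the generation step that follows.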

The third step in this statistical test process is the building of actual test procedures by the generator tool. The automated approach is based on sequencing through a previously created data base for a software product, creating some defined number of test procedures, and filling out the contents of each test procedure with a representative set of operational data randomly selected from the data base.
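A minimal sketch of this generation step, assuming the distributional data have already been encoded as weighted lists (all names, values, and probabilities are invented, not taken from the actual tool):

```python
import random

# Encoded data base: each named list pairs input values with selection
# probabilities; the composition defines the structure of one step.
# All names and probabilities are illustrative only.
data_base = {
    "SENSOR":    [("INS", 0.6), ("AHRS", 0.4)],
    "ALIGNMENT": [("ground", 0.7), ("air", 0.2), ("fast", 0.1)],
}
composition = ["SENSOR", "ALIGNMENT"]

def generate_procedures(count, steps_per_procedure, seed=0):
    """Build `count` test procedures, each a sequence of randomly
    selected steps drawn according to the encoded distributions."""
    rng = random.Random(seed)
    procedures = []
    for _ in range(count):
        procedure = []
        for _ in range(steps_per_procedure):
            step = {}
            for name in composition:
                values, weights = zip(*data_base[name])
                step[name] = rng.choices(values, weights=weights)[0]
            procedure.append(step)
        procedures.append(procedure)
    return procedures

procedures = generate_procedures(count=3, steps_per_procedure=2)
print(len(procedures))  # three procedures, each with two steps
```

Seeding the random number generator makes a generated sample reproducible, which is useful when a failing test procedure must be regenerated and rerun.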

APPLICATION EXPERIENCE

To illustrate the use of this statistical test process, again consider the navigation software subcomponent [4] discussed earlier. A separate operational analysis was performed involving test, system, operational, and customer personnel to define the probability distributions for the discrete functions identified in Table 3. The operational environment and the probabilities for the input values were also defined, and a data base was created from which representative tests could be selected for the functional testing of the navigation software.

In the particular software development situation, a total of 121 discrete tests were selected for the functional testing. These tests addressed all of the specification paragraphs in Table 1, all but four of the specification paragraphs in Table 2, and all but an additional six specification subparagraphs in Table 3. Table 4 provides a distribution of the tests against specification paragraphs to show that there was not a simple one-to-one mapping of tests to specification paragraphs nor to requirements within paragraphs. Tests were not generated for certain paragraphs and requirements identified with bold text in Table 4 because the associated probabilities were too small for selection.

The selected tests were reasonably robust as testing vehicles since, on average, only one or two tests were required to cover each specification paragraph and each test addressed close to four discrete requirements. Overall, the tests addressed 89.1% of the total requirements as specified in Table 3, which is a high mark for an automated approach. If this 90% level were sufficient for the functional testing of the navigation software, no further tests would be generated and run. This could be the appropriate decision, depending on the criticality of the missed requirements and whether software functional testing is just one step in a multistep system test process. If a higher level of coverage is desired, additional statistical samples or specially formulated tests must be executed to improve that coverage percentage.
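The coverage bookkeeping behind such percentages can be sketched as follows; the requirement IDs and test-to-requirement mappings here are invented (the real figures come from Tables 3 and 4):

```python
# Given a mapping from generated tests to the requirements each one
# exercises, compute the fraction of specified requirements covered.
# All data below are illustrative only.
specified = set(range(1, 11))        # ten hypothetical requirement IDs
tests = {
    "T1": {1, 2, 3, 4},
    "T2": {2, 5, 6},
    "T3": {7, 8},
}

covered = set().union(*tests.values())
coverage = len(covered & specified) / len(specified)
print(f"{coverage:.1%}")  # 80.0%

# The uncovered set identifies the candidates for additional statistical
# samples or specially formulated tests.
uncovered = specified - covered
print(sorted(uncovered))  # [9, 10]
```

Reporting the uncovered requirements alongside the percentage supports the decision described above: whether the missed items are critical enough to warrant further tests.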


The statistical test approach [13] and its generator tool [15] have been used to test some dozen different applications, including the functional testing of the generator tool itself. In every instance, the test personnel were able to identify the requirements to be validated, the relevant software inputs with their data domains, and the probability distributions for both the software functions and inputs. From this information, which was encoded for the generator's data base, the test personnel were able to create realistic tests to drive the functional testing. Embedded software for avionic, surveillance, and space applications presented more of a challenge because of the functional complexities of such systems and the need to understand the constraints imposed by the application environment and the human operator interactions. Statistical ideas were also successfully applied in these application areas and provided early exposure to realistic and representative software usage as the software was being developed.

REFERENCES

1. B. Beizer, Software Testing Techniques, Van Nostrand Reinhold Co., New York, 1983.
2. W. E. Howden, A functional approach to program testing, IEEE Trans. Software Eng. 12 (1986).
3. D. Bird and C. Munoz, Automatic generation of random self-checking test cases, IBM Syst. J. 22 (1983).
4. H. D. Mills, M. Dyer, and R. C. Linger, Cleanroom software engineering, IEEE Software (September 1987).
5. W. Hetzel, The Complete Guide to Software Testing, QED Information Sciences, Inc., Wellesley, MA, 1984.
6. P. D. Coward, Software Testing Techniques: The Software Life Cycle, Butterworths, London, 1990.
7. G. J. Myers, The Art of Software Testing, John Wiley & Sons, New York, 1979.
8. T. J. Ostrand and M. J. Balcer, The category-partition method for functional test, Commun. ACM 31 (1988).
9. W. R. Elmendorf, Cause-Effect Graphs in Functional Testing, IBM Technical Report 00.2487, 1973.
10. D. W. Cooper, Adaptive testing, Proceedings of the Conference on Software Engineering, 1976.
11. J. W. Duran and S. C. Ntafos, An evaluation of random testing, IEEE Trans. Software Eng. SE-10 (1984).
12. D. Ince and S. Hekmatpoor, An Evaluation of Some Black Box Testing Methods, UK Open University Technical Report 84/7, 1984.
13. M. Dyer, The Cleanroom Approach to Quality Software Development, John Wiley & Sons, Inc., New York, 1992.
14. W. K. Ehrlich, T. J. Emerson, and J. Musa, Effect of test strategy on software reliability measurement, Proc. Intl. Research Conf. on Reliability, 1988.
15. J. J. Gerber, Cleanroom test case generator, IBM Technical Report 86.0008, 1986.