Test Case-Aware Combinatorial Interaction Testing

Cemal Yilmaz

Abstract: The configuration spaces of modern software systems are too large to test exhaustively. Combinatorial interaction testing (CIT) approaches, such as covering arrays, systematically sample the configuration space and test only the selected configurations by using a battery of test cases. Traditional covering arrays, while taking system-wide interoption constraints into account, do not provide a systematic way of handling test case-specific interoption constraints. The basic justification for t-way covering arrays is that they can cost-effectively exercise all system behaviors caused by the settings of t or fewer options. In this paper, we hypothesize, however, that in the presence of test case-specific interoption constraints, many such behaviors may not be tested due to masking effects caused by the overlooked test case-specific constraints. For example, if a test case refuses to run in a configuration due to an unsatisfied test case-specific constraint, none of the valid option setting combinations appearing in the configuration will be tested by that test case. To account for test case-specific constraints, we introduce a new combinatorial object, called a test case-aware covering array. A t-way test case-aware covering array is not just a set of configurations, as is the case in traditional covering arrays, but a set of configurations, each of which is associated with a set of test cases such that all test case-specific constraints are satisfied and that, for each test case, each valid combination of option settings for every combination of t options appears at least once in the set of configurations that the test case is associated with. We furthermore present three algorithms to compute test case-aware covering arrays. Two of the algorithms aim to minimize the number of configurations required (one is fast but produces larger arrays, the other is slower but produces smaller arrays), whereas the remaining algorithm aims to minimize the number of test runs required. The results of our empirical studies conducted on two widely used highly configurable software systems suggest that test case-specific constraints do exist in practice, that traditional covering arrays suffer from masking effects caused by ignorance of such constraints, and that test case-aware covering arrays are better than other approaches in handling test case-specific constraints, thus avoiding masking effects.

Index Terms: Software quality assurance, combinatorial interaction testing, covering arrays


1 INTRODUCTION

General-purpose, one-size-fits-all software solutions are not acceptable in many application domains. For example, web servers (e.g., Apache), databases (e.g., MySQL), and application servers (e.g., Tomcat) are required to be customizable to adapt to particular runtime contexts and application scenarios. One way to support software customization is to provide configuration options through which the behavior of the system can be controlled.

While having a configurable system promotes customization, it creates many system configurations, each of which may need extensive QA to validate. Since the number of configurations grows exponentially with the number of configuration options, exhaustive testing of all configurations, if feasible at all, does not scale well.

Combinatorial interaction testing (CIT) approaches systematically sample the configuration space and test only the selected configurations [3], [9], [14], [22], [34]. These approaches take as input a configuration space model. The model includes a set of configuration options, their possible settings, and a set of system-wide interoption constraints that explicitly or implicitly invalidate some configurations, i.e., not all configurations may be valid. Given a configuration space model, CIT approaches compute a small set of valid configurations, called a t-way covering array, in which each valid combination of option settings for every combination of t options appears at least once [9].

Once a covering array is generated, the system is then tested by running its test cases in all the selected configurations. By doing so, traditional covering arrays assume that all test cases can run in all the selected configurations. In this paper, we argue that the test cases of configurable systems are likely to have assumptions about the underlying configurations. That is, not all test cases may run in all configurations even if those configurations satisfy the system-wide constraints. For example, in a study conducted on Apache, a highly configurable HTTP server, and MySQL, a highly configurable database management system, we observed that 378 out of 3,789 Apache test cases and 337 out of 738 MySQL test cases had test case-specific interoption constraints. While the system-wide constraints determined the set of valid ways the system under test could be configured, a test case-specific constraint determined the set of configurations in which the respective test case could run. These test case-specific constraints were encoded in the test oracles of our subject applications, indicating that they were known to the developers. When a test case-specific constraint of a test case was not satisfied by a configuration, the test case simply skipped the configuration, i.e., the test case refused to run in the configuration. For instance, a number of Apache test cases were specifically designed to test the distributed authoring and versioning (DAV) feature of the Apache server, a feature that needs to be explicitly configured into the system. Therefore, these test cases assumed that the server was already configured with the DAV feature. If not, they simply refused to run. Other test cases in the test suite were unaffected by this aspect of server configuration and ran regardless of the setting of this configuration option.

684 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 39, NO. 5, MAY 2013

The author is with the Faculty of Engineering and Natural Sciences, Sabanci University, Tuzla, Istanbul 34956, Turkey. E-mail: [email protected].

Manuscript received 26 Mar. 2012; revised 14 July 2012; accepted 13 Sept. 2012; published online 21 Sept. 2012. Recommended for acceptance by J. Grundy. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TSE-2012-03-0071. Digital Object Identifier no. 10.1109/TSE.2012.65.

0098-5589/13/$31.00 © 2013 IEEE. Published by the IEEE Computer Society

It is important to note that running configuration-dependent test cases in an ad hoc manner only in configurations supporting the required features, such as DAV, is not necessarily sufficient, as the interactions between the required features and the rest of the configurable features may still need to be tested. Therefore, we still need a systematic way of testing the interactions, such as the one provided by covering arrays.

On the other hand, traditional covering arrays, while handling system-wide constraints, do not provide a systematic way of handling test case-specific constraints. Assuming the existence of a well-constructed test suite, the basic justification for traditional t-way covering arrays is that they can cost-effectively exercise all system behaviors caused by the settings of t or fewer options. We hypothesize, however, that, in the presence of test case-specific constraints, many such behaviors are not actually tested because of masking effects [15] caused by the overlooked test case-specific constraints. For instance, when a test case skips a configuration due to an unsatisfied test case-specific constraint, none of the option setting combinations appearing in the configuration will be tested by that test case.

Fig. 1a illustrates masking effects in a hypothetical scenario. In this scenario, the system under test has four configuration options (o1, o2, o3, and o4), each of which takes a Boolean value (0 or 1). The test suite contains three test cases (t1, t2, and t3). There are no system-wide constraints. However, test cases t1 and t2 have some test case-specific constraints: t1 can run only in configurations in which o1 = 0, and t2 can run only in configurations in which o1 = 1. Test case t3, on the other hand, has no test case-specific constraints. In our hypothetical scenario, by mimicking the way traditional covering arrays are used in practice, a 3-way covering array is created and then all the test cases are executed in all the selected configurations (Fig. 1a). Literals P and S indicate a test success and a test skip, respectively.

There are 20 valid 3-way combinations of option settings to be tested by each of t1 and t2, and 32 valid combinations for t3. Consider the test case t1. Since t1 skipped the first four configurations, the 3-way option setting combinations for options o2, o3, and o4 that appear in the first four configurations were actually not tested by t1. As these four combinations appear nowhere else in the covering array, t1 never had a chance to test them. Similarly, t2 never had a chance to test the four valid 3-way combinations, which happened to appear only in the last four configurations skipped by t2. As a result, 8 out of 72 (11 percent) valid 3-way option setting combination-test case pairs were not tested at all, i.e., masked, due to the test skips caused by the overlooked test case-specific constraints.
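The counts in this scenario can be reproduced with a small script. The sketch below is only an illustration: since the contents of Fig. 1a are not reproduced here, it assumes that the traditional 3-way covering array consists of the eight even-parity configurations (one possible 3-way covering array for four Boolean options), and all function names are hypothetical.

```python
from itertools import combinations, product

N_OPTIONS = 4  # o1..o4, each Boolean; index 0 is o1

# Test case-specific constraints of the hypothetical scenario.
CONSTRAINTS = {
    "t1": lambda cfg: cfg[0] == 0,  # t1 runs only when o1 = 0
    "t2": lambda cfg: cfg[0] == 1,  # t2 runs only when o1 = 1
    "t3": lambda cfg: True,         # t3 has no constraint
}

def tuples_in(cfg, t):
    """All t-way option setting combinations appearing in a configuration."""
    return {tuple((i, cfg[i]) for i in idx)
            for idx in combinations(range(N_OPTIONS), t)}

def valid_tuples(constraint, t):
    """Combinations appearing in at least one configuration the test can run in."""
    result = set()
    for cfg in product((0, 1), repeat=N_OPTIONS):
        if constraint(cfg):
            result |= tuples_in(cfg, t)
    return result

# Assumed stand-in for Fig. 1a: the 8 even-parity configurations.
covering_array = [c for c in product((0, 1), repeat=N_OPTIONS) if sum(c) % 2 == 0]

def masking_report(array, t=3):
    """Map each test case to (number of valid t-tuples, number of masked t-tuples)."""
    report = {}
    for name, constraint in CONSTRAINTS.items():
        required = valid_tuples(constraint, t)
        tested = set()
        for cfg in array:
            if constraint(cfg):  # otherwise the test case skips cfg
                tested |= tuples_in(cfg, t)
        report[name] = (len(required), len(required - tested))
    return report

report = masking_report(covering_array)
total = sum(required for required, _ in report.values())
masked = sum(missing for _, missing in report.values())
print(report, f"{masked} of {total} pairs masked")
```

Under this assumed array, the script reports 20, 20, and 32 valid combinations for t1, t2, and t3, with four combinations masked for each of t1 and t2.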

In this paper, to account for test case-specific constraints and avoid the harmful consequences of masking effects caused by them, we introduce a new combinatorial object, called a test case-aware covering array. Test case-aware covering arrays take as input a traditional configuration space model augmented with a set of test cases, each of which can have a test case-specific constraint. Given a configuration space model, a t-way test case-aware covering array is not just a set of configurations, as is the case in traditional covering arrays, but a set of configurations, each of which is associated with a set of test cases, indicating the test cases scheduled to be executed in the configuration. Fig. 1b, as an example, presents a 3-way test case-aware covering array constructed for our hypothetical scenario. A t-way test case-aware covering array has the following properties: 1) for each test case, every valid t-way combination of option settings occurs at least once in the set of configurations in which the test case is scheduled to be executed, and 2) no test case is scheduled to be executed in a configuration which violates the test case-specific constraints of the test case or the system-wide constraints.

We furthermore present three algorithms to compute test case-aware covering arrays. Two of the algorithms aim to minimize the number of configurations required (one is fast but produces larger arrays, the other is slower but produces smaller arrays), whereas the remaining algorithm aims to minimize the number of test runs required. The results of our empirical studies conducted on two widely used highly configurable software systems, namely, Apache and MySQL, strongly suggest that 1) test case-specific interoption constraints do exist in practice, 2) traditional covering arrays suffer from masking effects caused by the ignorance of such constraints, and 3) test case-aware covering arrays are better than other approaches in handling test case-specific constraints, thus avoiding masking effects.

Fig. 1. (a) A traditional 3-way covering array versus (b) a 3-way test case-aware covering array.

The remainder of this paper is organized as follows. Section 2 provides background information, Section 3 introduces test case-aware covering arrays, Section 4 presents three algorithms to compute them, Section 5 describes the empirical studies, Section 6 discusses the potential external threats to validity, Section 7 discusses the related work, and Section 8 presents concluding remarks and possible directions for future work.

2 BACKGROUND INFORMATION

In this section, we provide background information on traditional covering arrays and masking effects.

2.1 Traditional Covering Arrays

CIT approaches take as input a configuration space model $M = \langle O, V, Q \rangle$. The model includes a set of configuration options $O = \{o_1, o_2, \ldots, o_n\}$, their possible settings $V = \{V_1, V_2, \ldots, V_n\}$, and a system-wide interoption constraint $Q$ (if any). In effect, the configuration space model implicitly defines a valid configuration space for testing.

Each option $o_i$ ($1 \le i \le n$) in the configuration space model takes a value from a finite set of $k_i$ distinct values $V_i = \{v_1, v_2, \ldots, v_{k_i}\}$ ($k_i = |V_i|$). Let $s_{ij}$ be an option-value tuple of the form $\langle o_i, v_j \rangle$, indicating that option $o_i$ is set to value $v_j \in V_i$. Furthermore, let $S_i$ be the set of all possible option-value tuples for option $o_i$, i.e., $S_i = \{\langle o_i, v_j \rangle : v_j \in V_i\}$.

Definition 1. A t-tuple $\phi_t = \{s_{i_1 j_1}, s_{i_2 j_2}, \ldots, s_{i_t j_t}\}$ is a set of option-value tuples for a combination of t distinct options, such that $1 \le t \le n$, $1 \le i_1 < i_2 < \cdots < i_t \le n$, and $s_{i_p j_p} \in S_{i_p}$ for $p = 1, 2, \ldots, t$.

Let $\Phi_t$ be the set of all t-tuples for some $1 \le t \le n$. Not all the t-tuples in $\Phi_t$ may be valid due to the system-wide constraint $Q$. Assume a deterministic function $\mathit{valid}(\phi_t, Q)$ such that $\mathit{valid}(\phi_t, Q)$ is true if and only if $\phi_t$ is a valid t-tuple under constraint $Q$. Otherwise, $\mathit{valid}(\phi_t, Q)$ is false. The set of all valid t-tuples $\Psi_t$ under constraint $Q$ is then defined as $\Psi_t = \{\phi_t : \phi_t \in \Phi_t \wedge \mathit{valid}(\phi_t, Q)\}$.

Definition 2. Given a configuration space model $M = \langle O, V, Q \rangle$, a valid configuration $c$ is a valid n-tuple, i.e., $c \in \Psi_n$, where $n = |O|$.

Note that, in a valid configuration, each option defined in the configuration space model takes a valid value and the configuration (i.e., n-tuple) does not violate $Q$.

Definition 3. Given a configuration space model $M = \langle O, V, Q \rangle$, the valid configuration space $C$ is the set of all valid configurations, i.e., $C = \{c : c \in \Psi_n\}$.

CIT approaches systematically sample the valid configuration space and test only the selected configurations. The sampling is carried out by computing a t-way covering array [9], where t is often referred to as the strength of the covering array.

Definition 4. A t-way covering array $CA(t, M = \langle O, V, Q \rangle)$ is a set of valid configurations in which each valid t-tuple appears at least once, i.e., $CA(t, M = \langle O, V, Q \rangle) = \{c_1, c_2, \ldots, c_N\}$ such that $\forall \phi_t \in \Psi_t \; \exists c_i \supseteq \phi_t$, where $c_i \in C$ for $i = 1, 2, \ldots, N$.
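Definition 4 can be turned into a direct check. The following is a minimal sketch, assuming that a t-tuple is valid exactly when it appears in at least one valid configuration (one possible realization of the validity function, not necessarily the one used in the paper). The model with three Boolean options and the constraint forbidding o1 = 1 together with o2 = 1 is made up for illustration.

```python
from itertools import combinations, product

def tuples_in(cfg, t):
    """All t-tuples (sets of option-value pairs) contained in a configuration."""
    return {tuple((i, cfg[i]) for i in idx)
            for idx in combinations(range(len(cfg)), t)}

def valid_configs(values, q):
    """The valid configuration space C: all n-tuples that satisfy Q."""
    return [c for c in product(*values) if q(c)]

def valid_t_tuples(values, q, t):
    """Assumption: a t-tuple is valid iff some valid configuration contains it."""
    result = set()
    for cfg in valid_configs(values, q):
        result |= tuples_in(cfg, t)
    return result

def is_covering_array(array, values, q, t):
    """True iff every row is a valid configuration and every valid t-tuple is covered."""
    if not all(q(cfg) for cfg in array):
        return False
    covered = set()
    for cfg in array:
        covered |= tuples_in(cfg, t)
    return valid_t_tuples(values, q, t) <= covered

# Hypothetical model: three Boolean options, Q forbids o1 = 1 together with o2 = 1.
values = [(0, 1)] * 3
q = lambda c: not (c[0] == 1 and c[1] == 1)

array = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 0, 0), (0, 1, 0)]
print(is_covering_array(array, values, q, t=2))       # all 11 valid 2-tuples are covered
print(is_covering_array(array[:-1], values, q, t=2))  # (o2 = 1, o3 = 0) is no longer covered
```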

Once a covering array is computed, the system under test is validated by running its test suite in all the selected configurations. Since the amount of resources required for testing is a function of the covering array size (i.e., N), covering arrays are constructed so that all valid t-tuples are covered in a "minimum" number of configurations.

Furthermore, it is also of practical interest to guarantee the inclusion of certain configurations in covering arrays for testing. A widely used mechanism for this purpose is the seeding mechanism [9], [13], [18]. Given a seed, which is typically a set of configurations, the covering array is constructed around the seed. Conceptually, all the t-tuples included in the seed are considered to be already covered and a new set of configurations is generated only to cover the rest of the valid t-tuples.

2.2 Masking Effects

We first introduced the concept of masking effects in one of our earlier works [15].

Definition 5. A masking effect is an effect that prevents a test case from testing all valid combinations of option settings appearing in a configuration which the test case is normally expected to test.

A harmful consequence of masking effects is that they cause developers to develop false confidence in their test processes, believing them to have tested certain option setting combinations when they in fact have not. One simple example of a masking effect (besides the ones caused by overlooked test case-specific constraints) would be an error that crashes a program early in the program's execution. The crash then prevents some configuration-dependent behaviors that would normally occur later in the program's execution from being exercised. Unless the combinations controlling those behaviors are tested in a different configuration or unless the error is fixed and the faulty configuration is retested, we cannot conclude that those configuration-dependent behaviors have been tested.

Masking effects can be caused by many factors. System failures, unaccounted control dependences among configuration options (i.e., option setting combinations that effectively cancel other option setting combinations), and incomplete or incorrect interoption constraints can all perturb program executions in ways that prevent other configuration-dependent behaviors from being tested.

In our previous work [15], we focused on masking effects caused by system failures and developed a feedback-driven adaptive combinatorial testing approach to reduce their harmful consequences. At each iteration of the aforementioned approach, we detect potential masking effects, isolate their likely causes, and then schedule the t-tuples that are being masked for testing in the subsequent iteration. The iterations end when, for each test case, all valid t-tuples have been "tested."

This work differs from our previous work in two respects. First, it addresses masking effects caused by overlooked test case-specific constraints rather than the ones caused by system failures. Second, as these constraints are known a priori, we compute static interaction test suites rather than the dynamic ones produced by the previous approach.

3 TEST CASE-AWARE COVERING ARRAYS

In this paper, we consider an interoption constraint to be a constraint among option settings which explicitly or implicitly invalidates some combinations of option settings. System-wide interoption constraints apply to all test cases and define the set of valid ways the system under test can be configured. A test case-specific constraint, on the other hand, applies only to the test case that it is associated with and determines the configurations in which the test case can run.

It is important to note that expressing test case-specific constraints as system-wide constraints and then generating traditional covering arrays is not an adequate solution for handling test case-specific constraints. One reason is that constraints for different test cases may conflict with each other, in which case no solution will be found. For example, in our hypothetical scenario discussed in Section 1, the constraints of t1 and t2 conflict; t1 cannot run when the binary option o1 has one setting and t2 cannot run when the same option has the other setting. Globally enforcing these conflicting constraints will not generate any covering arrays. Another reason is that, even if the test case-specific constraints do not conflict, enforcing them on all test cases can prevent the test cases from exercising some valid option setting combinations that are invalidated by other test cases. For example, enforcing the constraint of t1 on t3 prevents t3 from testing any combinations with o1 = 1, which are valid for t3.
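Both failure modes of global enforcement can be observed on the scenario of Section 1. The sketch below is illustrative only, with hypothetical helper names: it enumerates the configurations satisfying the constraints of t1 and t2 together, and counts the 3-way combinations that t3 loses when the constraint of t1 is imposed on it.

```python
from itertools import combinations, product

N = 4  # Boolean options o1..o4; index 0 is o1

q_t1 = lambda cfg: cfg[0] == 0  # t1 runs only when o1 = 0
q_t2 = lambda cfg: cfg[0] == 1  # t2 runs only when o1 = 1
q_t3 = lambda cfg: True         # t3 has no test case-specific constraint

def configs(constraint):
    """All configurations satisfying the given constraint."""
    return [c for c in product((0, 1), repeat=N) if constraint(c)]

def valid_tuples(constraint, t=3):
    """t-way combinations appearing in at least one satisfying configuration."""
    return {tuple((i, c[i]) for i in idx)
            for c in configs(constraint)
            for idx in combinations(range(N), t)}

# Reason 1: enforcing both constraints globally leaves no configuration at all.
globally_valid = configs(lambda c: q_t1(c) and q_t2(c))

# Reason 2: enforcing t1's constraint on t3 removes combinations valid for t3.
lost_for_t3 = valid_tuples(q_t3) - valid_tuples(q_t1)

print(len(globally_valid), len(lost_for_t3))
```

Every combination in `lost_for_t3` contains the setting o1 = 1, which matches the example in the text.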

In this paper, to account for test case-specific constraints, we introduce test case-aware covering arrays. As is the case with traditional covering arrays, test case-aware covering arrays take as input a configuration space model $M$. The model contains a set of configuration options $O = \{o_1, \ldots, o_n\}$, their settings $V = \{V_1, \ldots, V_n\}$, and a system-wide interoption constraint $Q_s$. Unlike traditional covering arrays, the configuration space model of test case-aware covering arrays additionally includes a set of test cases $T = \{\tau_1, \tau_2, \ldots\}$. Each test case $\tau \in T$, in addition to implicitly inheriting the system-wide constraint $Q_s$, can have a test case-specific constraint $Q_\tau$. In the remainder of the paper, the collection of all test case-specific constraints is referred to as $Q_T$.

In the presence of test case-specific constraints, we define the set of valid t-tuples on a per-test case basis since a valid t-tuple for a test case may be invalid for another test case. Let $\Psi^{\tau}_t = \{\phi_t : \phi_t \in \Phi_t \wedge \mathit{valid}(\phi_t, Q_s \wedge Q_\tau)\}$ be the set of all valid t-tuples for a test case $\tau$.

Definition 6. A valid configuration $c_\tau$ for a test case $\tau \in T$ is a valid n-tuple for $\tau$, i.e., $c_\tau \in \Psi^{\tau}_n$, where $n = |O|$.

Definition 7. The valid configuration space $C_\tau$ for a test case $\tau \in T$ is the set of all valid configurations for $\tau$, i.e., $C_\tau = \{c : c \in \Psi^{\tau}_n\}$.

Test case-aware covering arrays aim to ensure that each test case has a fair chance to test all of its valid t-tuples. To this end, each test case is scheduled to be executed only in configurations which are valid for the test case so that no masking effects can occur.

Definition 8. A t-pair is a pair of the form $\lambda_t = \langle \phi_t, \tau \rangle$ such that $\phi_t \in \Psi^{\tau}_t$ and $\tau \in T$.

Definition 9. A t-way test case-aware covering array $TCA(t, M = \langle O, V, T, Q_s, Q_T \rangle) = \{\langle c_1, T_1 \rangle, \ldots, \langle c_N, T_N \rangle\}$ is a set of rows of the form $\langle c_i, T_i \rangle$, where $c_i \in C$ and $T_i \subseteq T$ for $i = 1, 2, \ldots, N$, such that each valid t-pair appears at least once, i.e., $\forall \tau \in T \wedge \phi_t \in \Psi^{\tau}_t \; \exists \langle c_i, T_i \rangle : \phi_t \subseteq c_i \wedge \tau \in T_i$, and $\tau \in T_i \rightarrow c_i \in C_\tau$.

In other words, for a given configuration space model, a t-way test case-aware covering array is a set of configurations, each of which is associated with a set of test cases, indicating the test cases scheduled to be executed in the configuration such that 1) none of the selected configurations violate the system-wide constraint, 2) no test case is scheduled to be executed in a configuration that violates the test case-specific constraint of the test case, and 3) for each test case, every valid t-tuple appears at least once in the set of configurations in which the test case is scheduled to be executed. Fig. 1b, as an example, presents a 3-way test case-aware covering array created for our hypothetical scenario depicted in Fig. 1a. Since none of the test case-specific constraints are violated in this covering array, each test case has a chance to test all of its valid 3-tuples; no masking effects caused by test skips can occur.
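The three properties above can be checked mechanically. The sketch below is a hedged illustration rather than a reproduction of Fig. 1b, whose contents are not available here. It assumes an array made of all 16 configurations, with t1 scheduled where o1 = 0, t2 where o1 = 1, and t3 in the eight even-parity configurations. The checker name and data layout are hypothetical.

```python
from itertools import combinations, product

N = 4                 # Boolean options o1..o4; index 0 is o1
Q_S = lambda c: True  # no system-wide constraint in the scenario
Q_T = {
    "t1": lambda c: c[0] == 0,
    "t2": lambda c: c[0] == 1,
    "t3": lambda c: True,
}

def tuples_in(cfg, t):
    return {tuple((i, cfg[i]) for i in idx) for idx in combinations(range(N), t)}

def valid_tuples(name, t):
    """Valid t-tuples for a test case: those in some configuration it can run in."""
    return set().union(*(tuples_in(c, t) for c in product((0, 1), repeat=N)
                         if Q_S(c) and Q_T[name](c)))

def is_test_case_aware_ca(rows, t):
    """rows is a list of (configuration, set of scheduled test cases)."""
    for cfg, tests in rows:
        if not Q_S(cfg):                                # property 1
            return False
        if any(not Q_T[name](cfg) for name in tests):   # property 2
            return False
    for name in Q_T:                                    # property 3
        covered = set()
        for cfg, tests in rows:
            if name in tests:
                covered |= tuples_in(cfg, t)
        if not valid_tuples(name, t) <= covered:
            return False
    return True

# Assumed stand-in for Fig. 1b.
rows = []
for cfg in product((0, 1), repeat=N):
    tests = {"t1"} if cfg[0] == 0 else {"t2"}
    if sum(cfg) % 2 == 0:
        tests.add("t3")
    rows.append((cfg, tests))

# Traditional usage: every test case scheduled in every row of an 8-row array.
traditional = [(cfg, set(Q_T)) for cfg, _ in rows if sum(cfg) % 2 == 0]

test_runs = sum(len(tests) for _, tests in rows)
print(is_test_case_aware_ca(rows, 3), len(rows), test_runs)
print(is_test_case_aware_ca(traditional, 3))
```

With this assumed array, the check passes with 16 configurations and 24 test runs, whereas scheduling all three test cases in every row of the 8-row traditional array fails property 2.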

Compared to traditional t-way covering arrays, handling test case-specific constraints is likely to increase the number of configurations required as the t-tuples being masked in traditional arrays may need to be covered in additional configurations. However, this does not necessarily imply an increase in the number of test runs required as the test cases are executed only in configurations that contribute to the coverage. For example, comparing the 3-way test case-aware covering array in Fig. 1b to the traditional 3-way covering array in Fig. 1a, we observe that, while the number of configurations doubles, the number of test runs stays the same as each array requires a total of 24 test runs. In this paper, an execution of a test case in a configuration is considered to be a test run.

Therefore, when the goal is to test all valid t-pairs, traditional t-way covering arrays will not guarantee the coverage in the presence of test case-specific constraints, whereas t-way test case-aware covering arrays, while guaranteeing full coverage, will do so at the possible cost of an increased number of configurations, but not necessarily an increased number of test runs. For example, the 3-way test case-aware covering array in Fig. 1b is optimal for our hypothetical scenario; no other array requiring fewer configurations or fewer test runs exists. Therefore, testing all 3-pairs in this scenario is not possible without (at least) doubling the number of configurations. Thus, developers must decide between increasing the number of configurations to remove masking effects or accepting their harmful consequences. On the other hand, if the cost of configuring the system is negligible, then the 3-way test case-aware covering array in Fig. 1b will remove all masking effects at no additional cost compared to the traditional 3-way covering array in Fig. 1a; both arrays require the same number of test runs.

4 COMPUTING TEST CASE-AWARE COVERING ARRAYS

In this section, we present three algorithms to compute test case-aware covering arrays. A valuable observation we make is that there is often a tradeoff between minimizing the number of configurations and minimizing the number of test runs in test case-aware covering arrays. An attempt to reduce one count often increases the other count. This tradeoff plays an important role in minimizing the total cost of testing, especially when there is a profound practical difference between the cost of configuring the system and the cost of running the test cases.

The first two algorithms presented in this section, namely, Algorithms 1 and 2, assume a cost model in which the cost of running the test cases is negligible compared to that of configuring the system and the configuration cost is the same for all configurations. Thus, Algorithms 1 and 2 aim to minimize the number of configurations required. On the other hand, the remaining algorithm, namely, Algorithm 3, assumes an opposite cost model in which the cost of configuring the system is negligible compared to that of running the test cases and the execution cost is the same for all test runs. Thus, Algorithm 3 aims to minimize the number of test runs required.

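Algorithm 1. Computing a t-way test case-aware covering array $\Omega^{M}_{t}$ by maintaining a separate configuration space submodel $M_\tau$ for each test case $\tau$, where $S_\tau$ and $\Omega^{M_\tau}_{t}$ are the seed and the traditional t-way covering array created for $\tau$, respectively.

Input $M = \langle O, V, T, Q_s, Q_T \rangle$: Config. space model

Input $t$: Covering array strength

1: $\Omega^{M}_{t} \leftarrow \emptyset$

2: for each test case $\tau$ in $T$ do

3:   $S_\tau \leftarrow \{c : c \in \Omega^{M}_{t} \wedge c \in C_\tau\}$

4:   $\Omega^{M_\tau}_{t} \leftarrow \Pi(t, M_\tau, S_\tau)$

5:   $\Omega^{M}_{t} \leftarrow \Omega^{M}_{t} \oplus \Omega^{M_\tau}_{t}$

6: end for

7: return $\Omega^{M}_{t}$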

4.1 Algorithm 1: Maintaining a Separate Configuration Space Model for Each Test Case

A large number of algorithms have been proposed for constructing traditional covering arrays [26]. In our first algorithm, whose roots stem from one of our earlier works [15], we use traditional covering array construction as a computational primitive to generate test case-aware covering arrays. At a high level, we generate a separate traditional covering array for each test case, and, while doing so, we force the test cases to share configurations. The generated traditional covering arrays are then merged to construct the test case-aware covering array.

In particular, we assume a generator $\Pi(t, M, S)$ that constructs a traditional t-way covering array around the seed $S$ for the configuration space model $M$. Such covering array generators are commonplace [1].

Given a system-wide configuration space model $M = \langle O, V, T, Q_s, Q_T \rangle$, we first create a separate configuration space submodel $M_\tau$ for each test case $\tau \in T$. The submodel $M_\tau$, in addition to inheriting the configuration options $O$, their settings $V$, and the system-wide constraint $Q_s$ from $M$, includes the test case-specific constraint of $\tau$ as a system-wide constraint. Since the submodels are maintained on a per-test case basis, conflicting test case-specific constraints do not pose any issues.

Once the configuration space submodels are created, we feed these models to Algorithm 1. The t-way test case-aware covering array $\Omega^{M}_{t}$ to be computed is initially empty (line 1). For each test case $\tau$, we first compute a seed $S_\tau$ (line 3). The seed, out of all the configurations which have been so far included in the test case-aware covering array, contains only those configurations that do not violate the test case-specific constraints of $\tau$. We then feed the seed $S_\tau$ and the configuration space submodel $M_\tau$ to the generator $\Pi$. The output is a traditional t-way covering array $\Omega^{M_\tau}_{t}$ that covers all the valid t-tuples for the test case $\tau$ (line 4). The test case $\tau$ is then scheduled to be executed in all the selected configurations in $\Omega^{M_\tau}_{t}$, assuming that noncontributing configurations that might be present in the seed $S_\tau$ are eliminated by the generator.

We use the seeding mechanism to reduce the number of configurations required. Since $\Pi$ marks all the t-tuples included in the seed as already covered and generates new configurations only to cover the rest of the valid t-tuples, having the seed forces the test cases to share configurations, i.e., test cases are scheduled to be executed in the same configurations as much as possible, reducing the total number of configurations required.

Once the traditional covering array for the test case $\tau$ is created, we merge it with the current test case-aware covering array $\Omega^{M}_{t}$ (line 5). The merge operation is performed in such a way that test cases that are scheduled to be executed in the same configuration are collected in a set and then the newly constructed set is associated with the configuration. Finally, after processing all the test cases, we output the generated test case-aware covering array (line 7).
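The steps of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used in the paper. Options are restricted to Boolean values, and the function `generate` is a hypothetical stand-in for the seeded covering array generator: it enumerates the configuration space exhaustively and picks rows greedily, which only scales to tiny models. The test case constraints follow the example used to illustrate Algorithm 1 (t3 skips configurations in which o1 = 0 and o4 = 0, or o2 = 0 and o3 = 0).

```python
from itertools import combinations, product

def tuples_in(cfg, t):
    return {tuple((i, cfg[i]) for i in idx)
            for idx in combinations(range(len(cfg)), t)}

def generate(t, n, constraint, seed):
    """Hypothetical seeded generator: returns a t-way covering array for the
    submodel given by `constraint`, reusing contributing seed rows first."""
    candidates = [c for c in product((0, 1), repeat=n) if constraint(c)]
    uncovered = set().union(*(tuples_in(c, t) for c in candidates))
    array = []
    for cfg in seed:                        # drop noncontributing seed rows
        if tuples_in(cfg, t) & uncovered:
            array.append(cfg)
            uncovered -= tuples_in(cfg, t)
    while uncovered:                        # greedy: most new t-tuples first
        best = max(candidates, key=lambda c: len(tuples_in(c, t) & uncovered))
        array.append(best)
        uncovered -= tuples_in(best, t)
    return array

def algorithm1(t, n, q_s, q_t):
    """q_t maps each test case name to its test case-specific constraint."""
    tca = {}                                # configuration -> scheduled test cases
    for name, q_tau in q_t.items():
        sub = lambda c, q=q_tau: q_s(c) and q(c)        # submodel constraint
        seed = [c for c in tca if sub(c)]               # line 3
        for cfg in generate(t, n, sub, seed):           # line 4
            tca.setdefault(cfg, set()).add(name)        # line 5 (merge)
    return tca

q_s = lambda c: True
q_t = {
    "t1": lambda c: True,
    "t2": lambda c: True,
    "t3": lambda c: not ((c[0] == 0 and c[3] == 0) or (c[1] == 0 and c[2] == 0)),
}
tca = algorithm1(2, 4, q_s, q_t)
print(len(tca), sum(len(tests) for tests in tca.values()))  # configurations, test runs
```

Because the stand-in generator differs from the one used in the paper, the resulting array sizes need not match the figures reported for the example.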

Fig. 2 illustrates Algorithm 1 in an example. In the example, we have a configuration space model that contains four binary options, namely, o1, o2, o3, and o4. The system is to be tested with three test cases: t1, t2, and t3. Test cases t1 and t2 do not have any test case-specific constraints, whereas t3 skips all configurations in which o1 = 0 ∧ o4 = 0 or o2 = 0 ∧ o3 = 0. There are no system-wide constraints. Furthermore, the system is required to be tested with a 2-way test case-aware covering array.

To generate the test case-aware covering array, first a separate configuration space submodel is created for each test case (Figs. 2a, 2b, and 2c). Then, a traditional 2-way covering array is created for each test case in the arbitrary order of t1, t2, and then t3. For t1, as no configurations have been scheduled for testing yet, an empty seed is used and a traditional 2-way covering array of size 5 is created (Fig. 2a). The newly generated covering array is then used as the seed for t2. Since both t1 and t2 require covering exactly the same set of 2-tuples, no additional configurations are needed for t2 (Fig. 2b). In the figure, configurations that are reused from the previous iterations are marked with the character "*". Next, out of the five configurations that have been scheduled so far for testing, the first three configurations which do not violate the test case-specific constraint of t3 are used as the seed for t3. Fig. 2c presents the traditional 2-way covering array computed for t3; all of the three configurations in the seed happen to be reused by t3. Finally, the traditional covering arrays are merged to generate the 2-way test case-aware covering array given in Fig. 2d. The resulting array requires eight configurations and 16 test runs.

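Algorithm 2. Computing a t-way test case-aware covering array $\Omega^{M}_{t}$ by maintaining a single configuration space model $M$ for the software under test, where $\Lambda^{T}_{t}$ is the set of all t-pairs yet to be covered, $\Lambda^{\tau}_{t}$ is the set of t-pairs yet to be covered for test case $\tau$, and $\langle c, T' \rangle$ is the best row picked at the current iteration.

1: $\Omega^{M}_{t} \leftarrow \emptyset$

2: $\Lambda^{T}_{t} \leftarrow \bigcup_{\tau \in T} \Lambda^{\tau}_{t}$

3: while $\Lambda^{T}_{t} \ne \emptyset$ do

4:   $\langle c, T' \rangle \leftarrow \mathit{bestRow}(\Lambda^{T}_{t})$

5:   $\Lambda^{T}_{t} \leftarrow \Lambda^{T}_{t} \setminus \Lambda^{\langle c, T' \rangle}_{t}$

6:   $\Omega^{M}_{t} \leftarrow \Omega^{M}_{t} \cup \{\langle c, T' \rangle\}$

7: end while

8: return $\Omega^{M}_{t}$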

4.2 Algorithm 2: Maintaining a Single Configuration Space Model

An attractive side of Algorithm 1 is that it can readily and quickly be implemented by using the existing covering array generators that support system-wide constraints and a seeding mechanism. One downside of the approach, though, is that the optimization is carried out on a per-test case basis. While the problem is being solved for a test case, the coverage requirements of the remaining test cases waiting to be processed are not taken into account. This leads to a loss of opportunity for further reducing the number of configurations. For example, changing the processing order of the test cases in Fig. 2 can produce a smaller array.

In this section, to alleviate the shortcomings of Algorithm 1, we present a greedy algorithm that keeps a global view of all valid t-pairs to be covered. To this end, we maintain a single configuration space model for the system under test rather than a separate model for each test case.

Our algorithm operates in an iterative manner. At eachiteration, we select the best configuration and the set ofassociated test cases which cover the greatest number ofpreviously uncovered t-pairs. The selection is then includedin the test case-aware covering array and the t-pairsappearing in the selection are marked as covered. Theiterations end when every valid t-pair is covered at least once.

Fig. 2. Illustrating Algorithm 1 in an example.

Algorithm 2 presents the main loop of the proposed approach. We maintain a pool $\Phi^{T}_{t}$ that keeps track of the valid t-pairs yet to be covered. The pool initially contains all valid t-pairs (line 2), where $\Phi^{\tau}_{t}$ refers to the set of valid t-pairs for the test case $\tau$. As t-pairs are covered, they are removed from the pool. At each iteration, we pick the best row $\langle c, T' \rangle$ such that the configuration c is a valid configuration for all the test cases in $T' \subseteq T$, and that there is no other row covering more previously uncovered t-pairs (line 4). The selected row is then included in the test case-aware covering array $\Lambda^{M}_{t}$ (line 6) and the newly covered t-pairs $\Phi^{\langle c, T' \rangle}_{t}$ in the row are removed from the pool (line 5). When the pool is empty, the iterations terminate (line 3) and the computed test case-aware covering array is returned (line 8).
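The main loop of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: bestRow is replaced by an exhaustive search over all configurations, which is feasible only for tiny models, and the model itself (the names OPTIONS and TESTS, four binary options, three test cases, no system-wide constraints) is hypothetical. A test case is scheduled in a row only if it covers at least one previously uncovered t-pair there.

```python
from itertools import combinations, product

# Hypothetical model: four binary options and three test cases; t1 refuses
# to run when o1 = 1, t2 refuses to run when o1 = 0, t3 always runs.
OPTIONS = {"o1": (0, 1), "o2": (0, 1), "o3": (0, 1), "o4": (0, 1)}
TESTS = {
    "t1": lambda c: c["o1"] != 1,
    "t2": lambda c: c["o1"] != 0,
    "t3": lambda c: True,
}


def all_configs(options):
    """Enumerate every configuration (no system-wide constraints assumed)."""
    names = sorted(options)
    for values in product(*(options[n] for n in names)):
        yield dict(zip(names, values))


def tuples_in(config, t):
    """All t-tuples of (option, value) settings appearing in a configuration."""
    return set(combinations(sorted(config.items()), t))


def valid_tpairs(options, tests, t):
    """Pool of (t-tuple, test case) pairs occurring in some valid configuration."""
    pool = set()
    for config in all_configs(options):
        for name, is_valid in tests.items():
            if is_valid(config):
                pool |= {(tup, name) for tup in tuples_in(config, t)}
    return pool


def best_row(pool, options, tests, t):
    """Exhaustive stand-in for bestRow: the row covering most uncovered t-pairs."""
    best, best_gain = None, set()
    for config in all_configs(options):
        tups = tuples_in(config, t)
        gain, scheduled = set(), []
        for name, is_valid in sorted(tests.items()):
            newly = {(tup, name) for tup in tups} & pool if is_valid(config) else set()
            if newly:  # schedule a test case only if it covers something new
                scheduled.append(name)
                gain |= newly
        if len(gain) > len(best_gain):
            best, best_gain = (config, scheduled), gain
    return best, best_gain


def greedy_array(options, tests, t):
    array = []                                         # line 1
    pool = valid_tpairs(options, tests, t)             # line 2
    while pool:                                        # line 3
        row, gain = best_row(pool, options, tests, t)  # line 4
        pool -= gain                                   # line 5
        array.append(row)                              # line 6
    return array                                       # line 8
```

Calling greedy_array(OPTIONS, TESTS, 2) returns a list of (configuration, scheduled test cases) rows. Because every t-pair in the pool occurs in at least one configuration that is valid for its test case, each iteration covers at least one new t-pair and the loop terminates.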

An integral part of this approach is choosing the best row at each iteration (line 4). In this paper, as a proof of concept, we implement this functionality by using answer set programming (ASP).

ASP is a declarative programming paradigm which represents a computational problem as a “program” whose models, called “answer sets,” correspond to the solutions [24], [27]. ASP solvers are then used to find the answer sets for the program.

Fig. 3 provides an example ASP encoding that picks the best rows in the computation of a 2-way test case-aware covering array for a hypothetical configuration space model. The model contains four binary options, (o1,o2,o3,o4), and three test cases, (t1,t2,t3). Below, we explain the encoding in a nutshell with no intention of introducing ASP. For more details about ASP, the interested reader may refer to an introduction [16] or a book [2].

In a configuration, every configuration option must have exactly one valid setting:

1 {cfg(Opt,Val) : val(Opt,Val)} 1 :- opt(Opt).

A configuration is a valid configuration for a test case T, if it is not an invalid configuration for the test case:

validCfg(T) :- test(T), not invalidCfg(T).

As an example, the encoding provides two test case-specific constraints. Configurations in which o1 = 1 are invalid configurations for the test case t1, and the ones in which o1 = 0 are invalid for the test case t2:

invalidCfg(t1) :- cfg(o1, 1).

invalidCfg(t2) :- cfg(o1, 0).

ASP is expressive enough to encode any type of constraint. The constraints in the example encoding are, however, kept simple for the sake of the discussion.

The set of valid t-pairs yet to be covered, i.e., $\Phi^{T}_{t}$ in Algorithm 2, is expressed as a set of tpair facts. For example, the following fact in the encoding indicates that the 2-pair <<o1 = 0, o2 = 1>, t1> is a valid 2-pair yet to be covered for the test case t1:

tpair(o1, 0, o2, 1, t1).

A 2-pair <<O1=V1,O2=V2>,T>, where O1, O2, V1, V2, and T are variables that will be grounded by the ASP solver, is considered to be covered when 1) the 2-tuple <O1=V1,O2=V2> appears in a configuration, 2) the configuration is a valid configuration for the test case T and T is scheduled to be executed in the configuration, and 3) the 2-tuple has not yet been covered for T:

covered(O1,V1,O2,V2,T) :- cfg(O1,V1),
                          cfg(O2,V2),
                          validCfg(T),
                          tpair(O1,V1,O2,V2,T).

Finally, the following ASP directive ensures that the best row that covers the greatest number of previously uncovered 2-pairs is chosen:

#maximize {covered(O1,V1,O2,V2,T)}.

Adapting the ASP encoding given in Fig. 3 to compute test case-aware covering arrays of arbitrary strength for any given configuration space model is straightforward. To conduct the experiments discussed in Section 5, for example, we developed a simple script which automatically generated the ASP encoding for a given configuration space model and a value of t.
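As an illustration of what such a generator script might look like, the following Python sketch emits an encoding in the style of the fragments above for an arbitrary strength t. It is not the script used in the experiments. The function name asp_encoding and its parameters are hypothetical, the invalidCfg rules are assumed to be supplied as ready-made ASP text, and the pool of t-pairs still to be covered is assumed to be computed elsewhere and passed in.

```python
def asp_encoding(options, tests, invalid_rules, tpairs, t):
    """Return ASP text in the style of Fig. 3 for one best-row computation.

    options:       {option name: iterable of values}
    tests:         iterable of test case names
    invalid_rules: ready-made ASP rules defining invalidCfg/1 (passed through)
    tpairs:        iterable of (((opt, val), ...), test) still to be covered
    """
    lines = []
    for opt, values in sorted(options.items()):
        lines.append(f"opt({opt}).")
        lines.extend(f"val({opt},{v})." for v in values)
    lines.extend(f"test({name})." for name in tests)
    # Exactly one setting per option, and validity as the absence of invalidity.
    lines.append("1 {cfg(Opt,Val) : val(Opt,Val)} 1 :- opt(Opt).")
    lines.append("validCfg(T) :- test(T), not invalidCfg(T).")
    lines.extend(invalid_rules)
    for settings, name in sorted(tpairs):
        args = ",".join(f"{o},{v}" for o, v in settings)
        lines.append(f"tpair({args},{name}).")
    # covered/(2t+1): generalizes the 2-way rule to t (option, value) pairs.
    head = ",".join(f"O{i},V{i}" for i in range(1, t + 1))
    body = ", ".join(f"cfg(O{i},V{i})" for i in range(1, t + 1))
    lines.append(f"covered({head},T) :- {body}, validCfg(T), tpair({head},T).")
    lines.append(f"#maximize {{covered({head},T)}}.")
    return "\n".join(lines)
```

Under these assumptions, the encoding would be regenerated at every iteration of Algorithm 2 with the remaining pool of t-pairs, so that the solver always maximizes over the t-pairs that are still uncovered.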

Fig. 3. An example ASP encoding for computing the “best” row at a given iteration.

Fig. 4 illustrates Algorithm 2 in our running example introduced in Section 4.1. Unlike Algorithm 1, Algorithm 2 maintains a single configuration space model for t1, t2, and t3. Furthermore, Algorithm 2 keeps a global view of all the 2-pairs to be covered and proceeds by selecting the best row such that the number of newly covered 2-pairs is maximized at each iteration. The 2-way test case-aware covering array generated by Algorithm 2 as well as the number of newly covered 2-pairs by each row included in the array can be found in the figure. The first row of the array covers the maximum number of 2-pairs that can be covered by a row (i.e., $\binom{4}{2} \times 3 = 18$: one 2-tuple for each of the six pairs of options, for each of the three test cases). As new configurations are included in the array, since the number of remaining 2-pairs to be covered decreases, the number of newly covered 2-pairs monotonically decreases. In the end, the computed 2-way test case-aware covering array requires six configurations and 17 test runs.

Comparing the test case-aware covering arrays created by Algorithms 1 and 2, we observe that Algorithm 2 increased the number of configurations shared by the test cases. For example, in the test case-aware covering array created by Algorithm 2 (Fig. 4), five configurations were shared by all three of the test cases, whereas in the test case-aware covering array created by Algorithm 1 (Fig. 2), only three configurations were shared by all the test cases. This, in turn, reduced the total number of configurations required from 8 to 6 (i.e., by 25 percent).

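Algorithm 3. Minimizing the number of test runs required, where $\Lambda^{M}_{t}$ is the t-way test case-aware covering array to be computed for the configuration space model M, and $S_{\tau}$ and $\Lambda^{M_{\tau}}_{t}$ are the seed and the traditional t-way covering array created for test case $\tau$, respectively.

1: $\Lambda^{M}_{t} \leftarrow \emptyset$
2: for each test case $\tau$ in T do
3:   $S_{\tau} \leftarrow \emptyset$
4:   $\Lambda^{M_{\tau}}_{t} \leftarrow Q(t, M_{\tau}, S_{\tau})$
5:   $\Lambda^{M}_{t} \leftarrow \mathrm{merge}(\Lambda^{M}_{t}, \Lambda^{M_{\tau}}_{t})$
6: end for
7: return $\Lambda^{M}_{t}$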

4.3 Algorithm 3: Minimizing Number of Test Runs

Algorithms 1 and 2 aim to minimize the number of configurations included in test case-aware covering arrays. To do that, both algorithms force the test cases to share configurations. However, reducing the number of configurations does not necessarily reduce the number of test runs required. In fact, there is often a tradeoff between minimizing the number of configurations and minimizing the number of test runs. The reason behind this tradeoff is a simple one. Forcing test cases to share configurations reduces the number of configurations, but it can cause a test case to execute in more configurations than it needs: a set of t-tuples that the test case could have covered in a smaller number of configurations of its own choosing, and thus in fewer test runs, ends up spread over a larger number of shared configurations. On the other hand, if configurations are not shared among the test cases, the number of configurations required will naturally increase.

For instance, the brute-force search of the configuration space used in our running example depicted in Figs. 2 and 4 proved that a minimum of six configurations is needed to satisfy the coverage requirements of all the test cases. The brute-force search also revealed that when the number of configurations is fixed at 6 (i.e., at the minimum), a minimum of 17 test runs is required. From this perspective, the 2-way test case-aware covering array computed by Algorithm 2 is optimal. On the other hand, we know that a minimum of five configurations is needed for each test case to satisfy the coverage requirement of the test case. This indicates that the minimum number of total test runs needed is 15 (3 × 5 = 15). However, both Algorithms 1 and 2 required more test runs; Algorithm 1 required 16 test runs, whereas Algorithm 2 required 17 test runs.

The algorithm presented in this section, namely, Algorithm 3, aims to minimize the number of test runs. To this end, we slightly modify Algorithm 1 such that, instead of creating a seed for each test case, thus forcing the test cases to share configurations, we use an empty seed (line 3). This provides each test case with the freedom to select its own configurations (line 4). The rest of the algorithm is the same as Algorithm 1.

At each iteration of the algorithm, since the traditional covering array generator Q “minimizes” the number of configurations required for a test case, the number of test runs required for each test case would be minimized, and so would the total number of test runs.
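The structure of Algorithm 3 can be sketched in Python as below. The sketch makes two assumptions that are not part of the paper: the traditional generator Q is replaced by a simple greedy routine (greedy_generator, a hypothetical stand-in that does not guarantee minimum-size arrays), and merging is taken to mean that identical configurations chosen for different test cases become a single shared row.

```python
from itertools import combinations, product


def tuples_in(config, t):
    """All t-tuples of (option, value) settings appearing in a configuration."""
    return set(combinations(sorted(config.items()), t))


def greedy_generator(t, valid_configs, seed):
    """Hypothetical stand-in for the generator Q: a greedy t-way covering
    array over the given valid configurations, starting from the seed."""
    needed = set().union(*(tuples_in(c, t) for c in valid_configs))
    array = []
    for c in seed:
        array.append(c)
        needed -= tuples_in(c, t)
    while needed:
        best = max(valid_configs, key=lambda c: len(tuples_in(c, t) & needed))
        array.append(best)
        needed -= tuples_in(best, t)
    return array


def algorithm3(options, tests, t):
    names = sorted(options)
    space = [dict(zip(names, v)) for v in product(*(options[n] for n in names))]
    merged = {}                                            # line 1
    for name, can_run in tests.items():                    # line 2
        submodel = [c for c in space if can_run(c)]        # submodel of this test
        per_test = greedy_generator(t, submodel, seed=[])  # lines 3-4: empty seed
        for c in per_test:                                 # line 5: merge
            key = tuple(sorted(c.items()))
            merged.setdefault(key, []).append(name)
    return [(dict(key), scheduled) for key, scheduled in merged.items()]  # line 7
```

In this sketch the number of test runs equals the sum of the sizes of the per-test case arrays, while the number of configurations depends on how many rows happen to coincide across test cases.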

Fig. 5 illustrates Algorithm 3 in our running example. As is the case with Algorithm 1, a separate configuration space submodel is maintained for each test case, a traditional 2-way covering array is computed for each submodel, and the newly computed covering arrays are merged to construct the 2-way test case-aware covering array. However, unlike Algorithm 1, the seeds are always empty. As Figs. 5a, 5b, and 5c indicate, the size of the traditional covering array created for each test case is 5 (i.e., at the minimum). Although no attempt is made to share configurations, three configurations happen to be shared. In the end, the computed test case-aware covering array (Fig. 5d) requires 12 configurations and 15 test runs.

Note that, had the test cases t1 and t2 shared exactly the same set of configurations, the number of configurations would have been reduced to 8 without changing the number of test runs required. However, we opted not to share any configurations between t1 and t2 to demonstrate potential scenarios that could arise when nondeterministic traditional covering array generators are used.

In any case, brute-force search proved that when the number of test runs is fixed at 15 (i.e., at the minimum), a minimum of seven configurations is required. That is, for our running example, minimizing the number of configurations requires a minimum of six configurations and 17 test runs, whereas minimizing the number of test runs requires a minimum of seven configurations and 15 test runs, demonstrating the potential tradeoff between the two optimization criteria.

5 EXPERIMENTS

We conducted a series of empirical studies to 1) gain more insight into test case-specific constraints occurring in practice, 2) quantify the extent to which traditional covering arrays suffer from ignoring such constraints, and 3) evaluate the efficiency and effectiveness of the test case-aware covering arrays.

In these studies, we used two widely used highly configurable software systems as our subject applications: Apache v2.3.11-beta and MySQL v5.1. Apache is an HTTP server. MySQL is a database management system.

We chose these systems for several reasons. First, they share the key characteristics common to configurable software systems. They are highly configurable with dozens of configuration options supporting a wide variety of features. They have a large code base and substantial test code. Both systems enjoy a large developer community that actively updates and tests the systems. Second, like many configurable software systems, developers of these systems cannot exhaustively test the entire configuration space; the number of possible configurations is far beyond the resources available to run the test cases in a timely manner, e.g., for regression testing.

All the experiments were performed on an AMD 64 Athlon machine with 4 GB of RAM, running the Ubuntu 10.10 operating system.

5.1 Study 1: Test Case-Specific Constraints in Practice

In the first study, our goal was to gain more insight into test case-specific constraints occurring in practice.

5.1.1 Study Setup

To conduct the study, we studied the test cases that came with the source code distribution of our subject applications. Each test case in both subject applications had its own test oracle, which determines whether each test case execution “passed,” “failed,” or was “skipped.” Successful test cases simply emit pass. Failed test cases emit fail. A test case returns skipped when it determines that it cannot run in a given configuration, indicating that the test case has a test case-specific constraint.

5.1.2 Data and Analysis

We first observed that the test case-specific constraints were encoded in the test oracles of our subject applications, indicating that they were known to the developers. This is important because the more accurate and complete the test case-specific constraints are, the better the test case-aware covering arrays perform in avoiding masking effects.

We then determined that, out of 3,789 and 738 test cases examined for Apache and MySQL, respectively, 378 Apache test cases and 337 MySQL test cases had some test case-specific constraints. This is a good indicator that these systems (and potentially other configurable systems) can benefit from test case-aware covering arrays in practice.

To identify the actual test case-specific constraints for these configuration-dependent test cases, we studied the test cases and their oracles, read the user manuals, and conducted experiments as needed. It turned out that the test case-specific constraints together with the system-wide constraints needed to build the systems involved a total of 13 and 12 unique configuration options for Apache and MySQL, respectively. These configuration options and their settings together with the system-wide constraints can be found in Tables 1 and 2.

We observed that the configuration-dependent test cases formed clusters with respect to their test case-specific constraints. In other words, we had clusters of test cases having exactly the same constraints. We identified 17 clusters for Apache and 30 clusters for MySQL. Tables 3 and 4 present the distribution of the test cases over the clusters, each of which is identified by a unique test case-specific constraint. The constraints in these tables express the conditions that must be satisfied for the test cases to run. For example, the first row in Table 4 depicts that 86 MySQL test cases share exactly the same constraint:

log-bin = enable ∧ sql-mode ≠ ansi,

indicating that these test cases skip all configurations in which log-bin ≠ enable or sql-mode = ansi.

An in-depth analysis revealed that these clusters were formed due to two factors. First, there were some test cases specifically designed to test features that need to be explicitly configured into the system. For example, all 16 of the test cases in cluster 6 (Table 3) were designed to test the distributed authoring and versioning (DAV) feature of the Apache server. Therefore, these test cases simply assume that DAV is already configured into the system (i.e., dav = enable). If not, they refuse to run.

The second factor was that some test cases originally designed to test a wide range of features were dependent on the same configurable feature. For example, a number of test cases in MySQL used the SQL 'LIKE' operator in their database queries with wildcard characters that are not supported by the ANSI SQL standard. Consequently, these test cases refuse to run when the system is configured with the ANSI SQL query engine (i.e., sql-mode = ansi).

5.1.3 Discussion

Considering the nature of the factors leading to having

clusters of test cases sharing exactly the same constraints, it

is likely that other configurable systems exhibit the same

tendency. This is a valuable observation toward improving

the scalability of test case-aware covering array generators

in practice. One test case can be chosen from each cluster

and the test case-aware covering array can be created only

for the selected test cases rather than for all test cases. Once

the test case-aware covering array is computed, each test

case in the array can then be replaced with all the test cases

in the respective cluster. As the number of t-pairs is

proportional to the number of test cases, the proposed

approach can considerably reduce the number of t-pairs

that need to be dealt with at runtime, thus improving

scalability. We employed this approach to compute the test

case-aware covering arrays studied in Section 5.3.
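The clustering shortcut described above can be sketched as follows. The sketch assumes that constraints are available as normalized text so that identical constraints compare equal; the function names and the option name ssl in the usage example are hypothetical.

```python
from collections import defaultdict


def cluster_by_constraint(constraints):
    """Group test case names by their (identical) constraint text."""
    clusters = defaultdict(list)
    for test, constraint in constraints.items():
        clusters[constraint].append(test)
    # The first member of each cluster acts as its representative.
    return {members[0]: members for members in clusters.values()}


def expand_rows(rows, clusters):
    """Replace each representative in a computed array with its whole cluster."""
    return [
        (config, [t for rep in reps for t in clusters[rep]])
        for config, reps in rows
    ]


# Usage: a1 and a2 share a constraint, so only a1 takes part in the computation.
constraints = {"a1": "dav = enable", "a2": "dav = enable", "b1": "ssl = enable"}
clusters = cluster_by_constraint(constraints)
rows = [({"dav": "enable", "ssl": "enable"}, ["a1", "b1"])]
expanded = expand_rows(rows, clusters)
```

The array is computed over the representatives only (the keys of clusters), and expand_rows then restores the full test suite, so the number of t-pairs handled during construction grows with the number of clusters rather than with the number of test cases.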


TABLE 2
Traditional Configuration Space Model Used in the Experiments for MySQL

1 This constraint was added to avoid some failures we consistently experienced during the experiments. Although the failures appeared to be platform-specific, to carry out the experiments we opted to express the constraint as a system-wide constraint.

TABLE 3
Test Case-Specific Constraints Used in the Experiments for Apache

TABLE 1
Traditional Configuration Space Model Used in the Experiments for Apache

5.2 Study 2: Masking Effects in Traditional Covering Arrays

In the second study, our goal was to quantify the extent to which traditional covering arrays suffer from ignoring test case-specific constraints.

5.2.1 Study Setup

To conduct the study, we mimicked the way traditional covering arrays are used in practice. In particular, given a configuration space model, we created traditional covering arrays, scheduled all the test cases to execute in all the selected configurations, and quantified the consequences of masking effects caused by the overlooked test case-specific constraints.

We first started with the configuration space models given in Tables 1 and 2. These models consist of only those configuration options that are referenced by the system-wide or test case-specific constraints. In the remainder of the paper, a configuration option that is referenced by a constraint is referred to as a constrained option, e.g., all the options in our initial configuration models were constrained options.

We then varied the percentage of constrained options in configuration space models (in short, cop) to evaluate the consequence of cop on masking effects. Given a value of cop, our initial configuration models were augmented with unconstrained binary configuration options to obtain the desired percentage level. For example, to obtain 60 percent constrained options (i.e., cop = 60) for Apache, nine unconstrained binary options were added to the initial configuration space model given in Table 1. In particular, we experimented with cop = 20, 30, 40, 50, 60, 80, and 100.
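The augmentation step amounts to simple arithmetic, sketched below. The rounding rule is an assumption: the paper states only the Apache example (13 constrained options plus nine unconstrained ones for cop = 60, i.e., 13/22, or about 59 percent), with which rounding to the nearest count is consistent. The function name is hypothetical.

```python
def unconstrained_options_needed(constrained, cop):
    """Number of unconstrained options to add so that the constrained options
    make up roughly `cop` percent of the model (nearest count; assumed rule)."""
    if not 0 < cop <= 100:
        raise ValueError("cop must be in (0, 100]")
    return round(constrained * (100 / cop - 1))


# Apache example from the text: 13 constrained options at cop = 60.
apache_extra = unconstrained_options_needed(13, 60)
```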

Given a configuration space model, we created traditional t-way covering arrays with varying strengths (2 ≤ t ≤ 5). This was done by using a well-known covering array generator, called ACTS (v1.r9.3.2) [1]. Once a t-way covering array was generated, all 378 of the test cases for Apache and 337 test cases for MySQL which are known to have the test case-specific constraints given in Tables 3 and 4 were scheduled to be executed in all the selected configurations. Test cases with no test case-specific constraints were not utilized in the experiments as they would not suffer from any masking effects.

To quantify the harmful consequences of masking effects, we use a metric, called t-masked [15]. This metric counts the number of unique t-pairs that are for sure not tested due to the overlooked test case-specific constraints. If a test case skips a configuration, none of the t-tuples in the configuration will be tested by that test case. To compute the value of t-masked, for each test case, we first count the number of valid t-tuples that appear only in the configurations skipped by the test case. We then add these counts up for all the test cases. For a given test case, the set of valid t-tuples, as well as the configurations skipped, were determined by using the respective system-wide constraints in Tables 1 and 2 and the respective test case-specific constraints in Tables 3 and 4.

We, furthermore, compute the percentage of valid t-pairs that are masked and refer to it as the t-masked percentage. For instance, the values of 3-masked and 3-masked percentage in the traditional 3-way covering array given in Fig. 1 are 8 and 11, respectively. For adequate testing, t-masked, and thus t-masked percentage, should be 0.
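The two metrics can be computed without building the system or running any test case. The following Python sketch illustrates the computation on small models; it is not the tooling used in the study. It assumes that constraints are given as Python predicates and that the configuration space is small enough to enumerate; the function and parameter names are hypothetical.

```python
from itertools import combinations, product


def tuples_in(config, t):
    """All t-tuples of (option, value) settings appearing in a configuration."""
    return set(combinations(sorted(config.items()), t))


def t_masked(array, options, system_ok, tests, t):
    """Return (t-masked, t-masked percentage) for a traditional covering array.

    array:     list of configurations (dicts mapping option to value)
    system_ok: predicate for the system-wide constraints
    tests:     {test name: predicate that is True when the test can run}
    """
    names = sorted(options)
    space = [dict(zip(names, vals)) for vals in product(*(options[n] for n in names))]
    masked = valid = 0
    for can_run in tests.values():
        # Valid t-tuples for this test case: those in some configuration that
        # satisfies both the system-wide and the test case-specific constraints.
        valid_tuples = set()
        for c in space:
            if system_ok(c) and can_run(c):
                valid_tuples |= tuples_in(c, t)
        tested, skipped = set(), set()
        for c in array:
            (tested if can_run(c) else skipped).update(tuples_in(c, t))
        # Masked: valid t-tuples that appear only in skipped configurations.
        masked += len((skipped & valid_tuples) - tested)
        valid += len(valid_tuples)
    return masked, 100.0 * masked / valid if valid else 0.0


# Usage: a 2-way covering array for three binary options and one test case
# that refuses to run unless o1 = 0.
options = {"o1": (0, 1), "o2": (0, 1), "o3": (0, 1)}
array = [
    {"o1": 0, "o2": 0, "o3": 0},
    {"o1": 0, "o2": 1, "o3": 1},
    {"o1": 1, "o2": 0, "o3": 1},
    {"o1": 1, "o2": 1, "o3": 0},
]
masked, pct = t_masked(array, options, lambda c: True, {"t": lambda c: c["o1"] == 0}, 2)
```

In the usage example, the 2-tuples <o2 = 0, o3 = 1> and <o2 = 1, o3 = 0> are valid for the test case but appear only in the two configurations it skips, so two of its eight valid 2-pairs are masked.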


5.2.2 Data and Analysis

Tables 5 and 6 in Appendix A present the results we obtained. The columns in these tables depict the strength of the covering array generated (i.e., t), the number of options in the configuration space model used, the percentage of constrained options in the model (i.e., cop), the number of configurations included in the computed covering array, the number of test runs required by the array, the time it took to construct the array (in minutes), and the values of t-masked and t-masked percentage for various values of t. For each row of the tables, we created between 2 and 10 traditional covering arrays, depending on the computational resources required. The values of independent variables reported in these tables are the average values obtained from these arrays. Fig. 6 visualizes the results we obtained.

TABLE 4
Test Case-Specific Constraints Used in the Experiments for MySQL

We first observed that all the traditional t-way covering arrays created for the study suffered from masking effects caused by the overlooked test case-specific constraints (Fig. 6 and Tables 5 and 6). For example, when t = 2 and cop = 100, about 35 percent of all the valid 2-pairs, on average, were not tested at all by the traditional 2-way covering arrays created for Apache, i.e., about 35,250 2-pairs were masked, on average.

We then observed that, for a fixed configuration space model, higher strength covering arrays suffered relatively less compared to lower strength covering arrays. As t increased, t-masked percentage decreased, but never reached 0, when 2 ≤ t ≤ 5 (Fig. 6). For instance, when cop = 100 for Apache, 5-masked percentage was about 3.6 in 5-way covering arrays as opposed to the 2-masked percentage of about 35 in 2-way covering arrays. Clearly, when t = n, where n is the number of options in the configuration space model, as the configuration space will be tested exhaustively, t-masked percentage will be 0. However, since t ≪ n in practice, with t = 2 being the most common case, handling test case-specific constraints gains in importance.

We furthermore observed that, for a fixed value of t, as the constrained options percentage (i.e., cop) decreased, the t-masked percentage tended to decrease (Fig. 6). The trend is more clearly observable when t > 2. Clearly, when cop = 0, as there will be no test case-specific constraints, the t-masked percentage will be 0. However, the t-masked percentage never reached 0 for the values of cop used in the experiments.

One way to remove masking effects can be to create a higher strength covering array in order to cover lower order interactions among configuration options, i.e., using a t-way covering array to obtain t'-way coverage, where t' < t [15]. For example, using a 3-way covering array is likely to reduce the number of 2-pairs masked compared to using a 2-way covering array, as every 2-tuple will appear multiple times in different 3-tuples.

To evaluate this conjecture, we used the traditional t-way covering arrays created in this study, but rather than monitoring t-masked percentages in these arrays, we monitored 2-masked (when t ≥ 2) and 3-masked percentages (when t ≥ 3). Fig. 7 visualizes the results we obtained. The raw data can be found in Tables 5 and 6.

We observed that, for a fixed configuration space model, as t increased, both 2-masked and 3-masked percentages decreased, but rarely reached 0. For example, when cop = 100 for Apache, 2-masked percentages were 35.11, 6.22, 0.002, and finally 0 for t = 2, 3, 4, and 5, respectively.

Clearly, using a sufficiently high strength traditional t-way covering array can prevent all t'-pairs from being masked (t' < t). However, the traditional t-way covering arrays used in the experiments managed to do so at the cost of a significantly increased number of configurations and test runs compared to the t'-way test case-aware covering arrays. For instance, when cop = 100 for Apache, the traditional 5-way covering arrays, while preventing all 2-pairs from being masked, required about six times (684 percent) and 18 times (1,888 percent) more configurations and test runs, respectively, compared to the 2-way test case-aware covering arrays created by Algorithm 2, which also prevented all 2-pairs from being masked (Table 7 in Appendix B). Further discussion is provided in Section 5.3.

5.2.3 Discussion

In the experiments, we used only those test cases that had some test case-specific constraints. In the remainder of the paper, such test cases are referred to as constrained test cases, whereas test cases with no test case-specific constraints are referred to as unconstrained test cases.

Had we used some unconstrained test cases in our experiments, regardless of the number of such test cases used, t-masked values observed in the experiments would not have changed, i.e., the same number of t-pairs would have been masked. This is because test cases are independent of each other and unconstrained test cases do not suffer from any masking effects. On the other hand, t-masked percentages observed in the experiments would have decreased as the percentage of constrained test cases in the test suite decreased. This is because adding unconstrained test cases increases the total number of t-pairs to be covered without affecting the number of t-pairs masked; the dividend stays the same, but the divisor increases.

Fig. 6. Masking effects in the traditional covering arrays created for Study 2.

As an example, consider the traditional 2-way covering arrays created for MySQL with cop = 100. When we used 337 constrained MySQL test cases in the experiments (i.e., when 100 percent of the test cases were constrained), out of 88,328 valid 2-pairs about 20,483 2-pairs were masked; 2-masked percentage was about 23. On the other hand, Fig. 8 plots 2-masked percentages that were obtained by varying the percentage of constrained test cases in the test suite. The variations in the test suite were obtained by keeping the 337 constrained test cases in the suite and adding new unconstrained test cases as needed. Regardless of the number of unconstrained test cases added, the 2-masked value was always 20,483. However, adding a new unconstrained test case increased the total number of valid 2-pairs by 307. Thus, 2-masked percentage decreased accordingly.
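The effect of adding unconstrained test cases follows directly from the figures quoted above and can be reproduced with a few lines of Python. The function names are hypothetical; the constants in the usage lines are the MySQL values reported in the text (20,483 masked 2-pairs, 88,328 valid 2-pairs, 307 additional valid 2-pairs per unconstrained test case, and 337 constrained test cases).

```python
def masked_percentage(masked, valid_pairs, pairs_per_unconstrained, added):
    """t-masked percentage after adding `added` unconstrained test cases."""
    return 100.0 * masked / (valid_pairs + pairs_per_unconstrained * added)


def constrained_share(constrained, added):
    """Percentage of constrained test cases in the enlarged test suite."""
    return 100.0 * constrained / (constrained + added)


# MySQL, t = 2 and cop = 100: the dividend stays fixed, the divisor grows.
for added in (0, 337, 3033):
    share = constrained_share(337, added)
    pct = masked_percentage(20483, 88328, 307, added)
    print(f"{share:5.1f}% constrained test cases: 2-masked percentage {pct:.2f}")
```

With no unconstrained test cases added, the computation gives the 2-masked percentage of about 23 reported above; doubling the test suite with unconstrained test cases lowers it to roughly 11, although the same 20,483 2-pairs remain untested.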

Note that, since t-masked values do not depend on the number of unconstrained test cases, all the comparative analyses conducted in Section 5.2.2 are valid regardless of the presence or absence of unconstrained test cases. However, the real values of t-masked percentages may vary. In the analyses, we opted to use t-masked percentages rather than t-masked values since, being a normalized metric, t-masked percentage offers a better interpretation of the results. For example, for a fixed value of cop, when t increased (2 ≤ t ≤ 5), t-masked values increased, but, as the total number of valid t-pairs increased at a faster rate, t-masked percentages decreased, indicating that higher strength covering arrays actually suffered less.

Overall, our analysis results suggest that the extent to which traditional covering arrays suffer from masking effects depends on many factors, such as the strength of the covering array, the percentage of constrained options in the configuration space model, and the percentage of constrained test cases in the test suite. Therefore, given a particular combinatorial testing scenario, reliably estimating the consequences of masking effects caused by the overlooked test case-specific constraints is of practical value. To do that without ever configuring the system under test and without ever running any test cases, we give the following guidelines to the users of covering arrays: Generate a traditional t-way covering array for the scenario at hand and then compute the t-masked and t-masked percentage values. The larger the value of t-masked and/or the value of t-masked percentage, the more the covering array suffers. Both t-masked and t-masked percentage values are important for the evaluation. For example, when t = 5 and cop = 30 for MySQL, 5-masked percentage was 1.7. Although this may seem to be a small percentage, it indicates that quite a large number of 5-pairs, i.e., more than 120 million 5-pairs, were not tested at all. The evaluation results can then be used to decide if test case-aware covering arrays are required for the scenario at hand.

5.3 Study 3: Test Case-Aware Covering Arrays

In the third study, our goal was to evaluate the effectiveness and efficiency of test case-aware covering arrays in avoiding masking effects.

5.3.1 Study Setup

To carry out the study, we first implemented the algorithms given in Section 4. In the implementation of Algorithms 1 and 3, we used ACTS v1.r9.3.2 [1] as our traditional covering array generator. To implement Algorithm 2, we used gringo v3.0.3, a grounder for ASP programs, together with clasp v2.0.2, an ASP solver. The solver was configured to have a 3-minute timeout period, i.e., at each iteration of the algorithm, the best row found in 3 minutes was returned.

Fig. 7. Using t-way traditional covering arrays to reduce (a) 2-masked percentages (when t ≥ 2) and (b) 3-masked percentages (when t ≥ 3).

Fig. 8. Varying the percentage of constrained test cases in the configuration space model of MySQL (t = 2 and cop = 100).

We then populated the traditional configuration space models used in Study 2 with the test case-specific constraints identified in Study 1. For a given configuration space model and a value of t, we created between 2 and 10 t-way test case-aware covering arrays (2 ≤ t ≤ 3), depending on the computational resources required. The values reported are the average values obtained from these arrays.

Furthermore, to evaluate the cost of using test case-aware covering arrays, we use a cost function in this section. In practice, typically, once a covering array is constructed, it is repeatedly used for testing as long as the underlying configuration space model stays the same. Therefore, we define our cost function as follows:

$$\mathit{cost} = c_{ca} + M (N_{ca} c_c + R_{ca} c_r), \qquad (1)$$

where $c_{ca}$, $c_c$, and $c_r$ are the cost of computing the covering array ca (either a test case-aware or a traditional covering array), the average cost of configuring the system under test with a given configuration, and the average cost of running a single test case, respectively. Furthermore, $N_{ca}$ and $R_{ca}$ are the number of configurations and the number of test runs required by the covering array ca, whereas M is the number of times the array is used for testing.

For a given configuration space model, when the goal is to minimize the number of configurations (i.e., when $c_r \approx 0$, which is implicitly assumed by this optimization criterion), a test case-aware covering array tca is more cost-effective than a traditional covering array ca when

$$c_{tca} + M N_{tca} c_c < c_{ca} + M N_{ca} c_c. \qquad (2)$$

Assuming that $c_{tca} > c_{ca}$, but $N_{tca} < N_{ca}$, we obtain

$$\frac{c_{tca} - c_{ca}}{c_c (N_{ca} - N_{tca})} < M. \qquad (3)$$

Similarly, when the goal is to minimize the number of test runs (i.e., when $c_c \approx 0$, which is implicitly assumed by this optimization criterion), tca is more cost-effective than ca when

$$\frac{c_{tca} - c_{ca}}{c_r (R_{ca} - R_{tca})} < M, \qquad (4)$$

assuming that $c_{tca} > c_{ca}$, but $R_{tca} < R_{ca}$.

We refer to the minimum M value satisfying inequality (3) or inequality (4) as $\overline{M}$. $\overline{M}$ is the break-even point, that is, the number of times testing is to be performed, where the overall costs of testing with the test case-aware covering array tca are equal to the overall costs of testing with the traditional covering array ca. If testing is to be performed on more occasions, testing with the test case-aware covering array will be more cost effective. The lower the value of $\overline{M}$, the broader the range of circumstances in which the test case-aware covering array is more cost effective; if it is less than 1, the test case-aware covering array is always the more cost-effective choice.
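The break-even computation can be illustrated with a small Python helper that evaluates the left-hand side of inequality (3) or (4). The function name, the parameter names, and all numeric values in the usage line are hypothetical and are not taken from the experiments.

```python
def break_even(c_tca, c_ca, unit_cost, n_ca, n_tca):
    """Left-hand side of inequality (3) or (4).

    For (3), pass unit_cost = c_c and the numbers of configurations;
    for (4), pass unit_cost = c_r and the numbers of test runs.
    """
    if n_ca <= n_tca:
        raise ValueError("the test case-aware array must require fewer units")
    return (c_tca - c_ca) / (unit_cost * (n_ca - n_tca))


# Hypothetical scenario: construction takes 120 versus 20 minutes, configuring
# the system takes 8 minutes, and the arrays need 35 versus 60 configurations.
threshold = break_even(c_tca=120, c_ca=20, unit_cost=8, n_ca=60, n_tca=35)
```

In this hypothetical scenario the threshold is below 1, so the extra construction time of the test case-aware covering array would already be recovered the first time the array is used for testing.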

In the experiments, we computed the $c_c$ and $c_r$ costs for our subject applications by building the subject applications with their default configurations and then executing the respective test suites. We opted to use the default configurations for several reasons. First, since both subject applications have a large configuration space, it was infeasible for us to test the entire configuration space to compute the actual average costs. Second, we observed that a large portion of the features that are costly to build and to test were not included in the default configurations. This prevented us, to the extent possible, from overestimating the costs. It turned out that, for Apache, $c_c = 8$ minutes for building the entire software suite and $c_r = 0.00058$ minutes, and, for MySQL, $c_c = 82$ minutes for building the entire software suite and $c_r = 0.055$ minutes.


We first compared the test case-aware covering arrays created by our algorithms with each other. Fig. 9 visualizes the results we obtained. In these figures, the horizontal axis denotes configuration space models. The literals A and Q refer to the configuration space models of Apache and MySQL, respectively. The superscript numbers indicate the number of configuration options used in the models. The vertical axis denotes the number of configurations in Figs. 9a and 9b and the number of test runs in Figs. 9c and 9d. The raw data can be found in Tables 7 and 8 in Appendix B.

Algorithms 1 and 2 aim to minimize the number of configurations. Comparing these algorithms (Figs. 9a and 9b), we observed that having a global view of t-pairs (i.e., Algorithm 2) performed better than having a partial view (i.e., Algorithm 1). Algorithm 2, compared to Algorithm 1, reduced the number of configurations by 54 percent when t = 2 and by 45 percent when t = 3, on average.

Algorithm 2, however, provided these improvements at the cost of increased construction time. For instance, when cop = 20 for Apache, our implementation of Algorithm 1 took 291 minutes on average to create a 3-way test case-aware covering array, whereas that of Algorithm 2 took 1,476 minutes, but reduced the number of configurations by 47 percent. Therefore, the total cost depends on the actual cost of testing, e.g., a 47 percent reduction in the number of configurations may or may not compensate for the increase in the construction time.

We conjecture, however, that in regression testing scenarios, where the same covering array is repeatedly used for testing, the increase in the construction cost is likely to be compensated for by the decrease in the actual cost of testing. Therefore, Algorithm 2 is of interest, especially in regression testing scenarios. For example, for each subject application, using inequality (3) revealed that, in 13 out of 14 comparisons (7 values of cop × 2 values of t) made between Algorithms 1 and 2, Algorithm 2 was more cost-effective than Algorithm 1 after only a single use of the computed test case-aware covering arrays, i.e., M < 1. In the remaining comparison, 1 < M < 2 for both subject applications, so the computed array must be used at least twice for Algorithm 2 to become more cost-effective.

Algorithm 3, on the other hand, aims to minimize the number of test runs. Comparing Algorithm 3 with Algorithms 1 and 2, we observed that Algorithm 3 reduced the number of test runs by 20 percent when t = 2 and by 21 percent when t = 3 (Figs. 9c and 9d), on average, while increasing the number of configurations by 383 percent and by 400 percent, respectively (Figs. 9a and 9b). Although the percentage reduction in the number of test runs is far smaller than the percentage increase in the number of configurations, reducing the number of test runs, and thus Algorithm 3, is still of great importance when the cost of configuring the system is negligible.

We then compared t-way test case-aware covering arrays with traditional t-way covering arrays. When the goal was to minimize the number of test runs, we observed that the t-way test case-aware covering arrays created by Algorithm 3 (i.e., our best performing algorithm for reducing the number of test runs) removed all masking effects by using fewer or a comparable number of test runs compared to the traditional t-way covering arrays. The t-way test case-aware covering arrays computed by Algorithm 3, except for two cases, which required 0.6 and 0.2 percent more test runs, reduced the number of test runs by 11 percent when t = 2 and by 17 percent when t = 3, on average, compared to the traditional t-way covering arrays.

These test case-aware covering arrays, however, took more time to construct than the traditional covering arrays. Comparing the total costs by using inequality (4), we observed that, for MySQL, $\lceil M \rceil$ was 1 in 12 out of 14 comparisons and 2 in the remaining 2 comparisons (Fig. 10). For Apache, except for the two cases where the test case-aware covering arrays required slightly more test runs, $\lceil M \rceil$ was less than 18 in 9 out of 12 comparisons (min: 1, avg: 33, and max: 215). One reason we observed high M values for Apache was the small $c_r$ cost (i.e., $c_r = 35$ milliseconds) on this subject application, which is not well aligned with the cost model Algorithm 3 is designed for. Had we had $c_r = 1$ second, for example, $\lceil M \rceil$ would have been 1 in 9 comparisons, 2 in 2 comparisons, and 8 in 1 comparison.


Fig. 9. Comparing the number of configurations (a), (b) and test runs (c), (d) required by the test case-aware covering arrays.

On the other hand, when the goal was to minimize the number of configurations, the t-way test case-aware covering arrays required more configurations than the traditional t-way covering arrays. For example, the t-way test case-aware covering arrays created by Algorithm 2 required 158 percent more configurations when t = 2 and 174 percent more configurations when t = 3, on average, than the traditional t-way covering arrays. As discussed in Section 3, removing masking effects is likely to increase the number of configurations required, as the t-pairs being masked in traditional covering arrays may need to be covered in additional configurations. Therefore, to evaluate the efficiency provided by test case-aware covering arrays, we compared them to traditional higher strength covering arrays, the best candidate we know of against which to compare our results. In particular, we compared the t'-way test case-aware covering arrays with the traditional t-way covering arrays that prevented all or almost all t'-pairs from being masked, where t' < t.

To carry out the analysis, among all the traditional covering arrays created in Study 2, we first picked the ones with a 2-masked (t' = 2) or 3-masked (t' = 3) percentage of less than 1 and determined the configuration space models used to create these traditional covering arrays. We then used our algorithms to create 2-way and 3-way test case-aware covering arrays for the same configuration space models, depending on the value of t'. Finally, we compared the sizes of the traditional covering arrays to those of the test case-aware covering arrays.

Fig. 11 presents the percentage reductions obtained by the 2-way and 3-way test case-aware covering arrays. In this figure, the horizontal axis depicts the traditional covering arrays used in the comparisons. The first and the second item in the parentheses indicate the value of t and the configuration space model used, respectively. The vertical axis denotes the percentage reductions obtained. The symbol “□” represents the percentage reductions in the number of configurations provided by Algorithm 2 (i.e., our best performing algorithm for reducing the number of configurations), whereas the symbol “ ” represents the percentage reductions in the number of test runs provided by Algorithm 3 (i.e., our best performing algorithm for reducing the number of test runs).

Fig. 10. Comparing the cost of using t-way test case-aware covering arrays created by Algorithm 3 to that of traditional t-way covering arrays.

Fig. 11. Comparing (a) 2-way (b) 3-way test case-aware covering arrays with higher strength traditional covering arrays.

For instance, when t' = 2, the first tick on the horizontal axis in Fig. 11a indicates that the traditional 4-way covering arrays (t = 4) created for the configuration space model containing 13 Apache options performed well in preventing 2-pairs from being masked; less than 1 percent of all valid 2-pairs (more accurately, 0.002 percent of the 2-pairs) were masked, on average (Table 5). These traditional 4-way covering arrays required 82.40 configurations and 31,147.20 test runs, on average. On the other hand, when the goal was to minimize the number of configurations, our 2-way test case-aware covering arrays created by Algorithm 2 for the same set of configuration options required 25.50 configurations on average (Table 7). Similarly, when the goal was to minimize the number of test runs, our 2-way test case-aware covering arrays created by Algorithm 3 required 3,449.60 test runs on average (Table 7). That is, compared to the traditional 4-way covering arrays, which prevented almost all (but not all) 2-pairs from being masked, Algorithm 2 reduced the number of configurations by 69 percent and Algorithm 3 reduced the number of test runs by 89 percent, while preventing all 2-pairs from being masked.
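The two percentages quoted for this example follow directly from the averages reported above. The short Python check below reproduces the arithmetic; the helper name `percent_reduction` is ours and is used only for illustration.

```python
def percent_reduction(baseline, improved):
    """Percentage by which `improved` is smaller than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Averages quoted in the text for the 13-option Apache model.
configs = percent_reduction(82.40, 25.50)       # traditional 4-way vs. Algorithm 2
test_runs = percent_reduction(31147.20, 3449.60)  # traditional 4-way vs. Algorithm 3

print(round(configs), round(test_runs))  # 69 89
```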

Overall, compared to the higher strength traditional covering arrays used in Fig. 11, the 2-way and 3-way test case-aware covering arrays created by Algorithm 2, while not suffering from any masking effects, reduced the number of configurations by 82 percent and by 50 percent, on average, respectively. Similarly, the 2-way and 3-way test case-aware covering arrays created by Algorithm 3, while not suffering from any masking effects, reduced the number of test runs by 94 percent and by 84 percent, on average, respectively.

However, in almost all the comparisons made in Fig. 11, constructing the test case-aware covering arrays required more time than constructing the respective higher strength traditional covering arrays. Comparing the total costs by using inequalities (3) and (4), we observed that, when the goal was to minimize the number of test runs, in 87 percent (41 out of 47) of the comparisons, Algorithm 3 was more cost-effective than the traditional higher strength covering arrays after only a single use of the arrays, i.e., M < 1 (Fig. 12). For the rest of the comparisons, the maximum value of M was 13.46. When the goal was to minimize the number of configurations, in 89 percent (42 out of 47) of the comparisons, Algorithm 2 required only a single use of the generated test case-aware covering arrays to become more cost-effective than the traditional covering arrays. For the rest of the comparisons, the maximum value of M was 11.52.

To interpret the M values obtained in this study, consider a daily build scenario where the same covering array is used once a day for smoke testing the configuration spaces of our subject applications. Had our test case-aware covering arrays been used in such a scenario, they, in the worst case, would have compensated for their construction cost in 14 days, compared to the higher strength covering arrays. We conjecture that, as the configuration space models of our subject applications are highly unlikely to change every 14 days (they have stable code bases), the test case-aware covering arrays would have been more cost-effective than their counterparts. Furthermore, M is inversely proportional to $c_c$ and $c_r$ (inequalities (3) and (4)). Therefore, the greater the cost of configuring the system and/or the greater the cost of running the test cases, the faster test case-aware covering arrays compensate for their construction cost.

5.3.3 Discussion

In any case, reducing the construction cost of test case-aware covering arrays is still of practical concern. As discussed in Section 7, there are four main categories of methods to compute traditional covering arrays: random search-based methods [28], heuristic search-based methods [7], [10], [12], [19], [29], mathematical methods [20], [21], [32], [33], and greedy methods [4], [6], [8], [9], [13], [23], [30], [31]. We conjecture that all of these methods can also be used to compute test case-aware covering arrays. However, we opted to leverage only the greedy algorithmic paradigm due to the relative ease of its application. Furthermore, all the algorithms presented in this work as well as their implementations are developed mainly for correctness, not for runtime performance. As future work, we plan to study alternative construction methods, such as heuristic search-based methods. For the time being, we present some considerations as to how the current implementation of our algorithms can be sped up.

Fig. 12. Comparing the cost of using (a) 2-way (b) 3-way test case-aware covering arrays to that of higher strength traditional covering arrays.

Algorithms 1 and 3 use traditional covering array construction as a computational primitive. In particular, both algorithms compute a traditional covering array for each test cluster rather than computing a single array. Consequently, as the performance of traditional covering array generators improves, the runtime performance of Algorithms 1 and 3 will improve proportionally. To further improve the performance, the traditional covering arrays required by these algorithms can be computed in a parallel manner. Parallelizing Algorithm 3 is easier than parallelizing Algorithm 1. As there are no dependencies among the traditional covering arrays generated by Algorithm 3, all of these arrays can be generated in parallel. Algorithm 1, on the other hand, requires more consideration as the traditional covering arrays to be computed depend on the previously computed arrays via the seeding mechanism. One approach can be to group the test clusters and generate the required traditional covering arrays in parallel within each group. After processing a group, the current test case-aware covering array at hand can then be used as a seed for the next group. While the running time of this approach increases linearly with the number of groups rather than with the number of test clusters, the approach is likely to require more configurations than the original approach as the test cases within a group will be less likely to share configurations. One approach to reducing the impact of this shortcoming can be to construct the groups such that test cases that are less likely to share configurations, e.g., the ones that have conflicting constraints, are placed in the same group.

Algorithm 2, on the other hand, does not depend on traditional covering array generators. In the implementation of this algorithm, we used a sequential ASP solver to select the best row at each iteration of the algorithm (Section 4.2). One approach to speeding up the computation is to use a parallel ASP solver [17] instead of a sequential one. Furthermore, in our experiments, we observed that the 2-way test case-aware covering arrays tended to cover a large percentage of the 3-pairs. For example, in the 2-way test case-aware covering arrays created by Algorithm 2, 91 percent of the 3-pairs (max: 94 percent and min: 89 percent) for Apache and 92 percent of the 3-pairs (max: 96 percent and min: 88 percent) for MySQL, on average, were already covered. Based on this observation, another approach to improve the runtime performance of Algorithm 2 can be to follow an incremental construction strategy. For instance, to compute a t-way test case-aware covering array, a $(t-1)$-way test case-aware covering array can be constructed first and then only those t-pairs which are not covered by the $(t-1)$-way array can be fed to Algorithm 2 to compute the additional rows as needed. As this approach reduces the number of t-pairs to be placed by Algorithm 2, the runtime performance of this algorithm is likely to improve. The same approach can also be used in a recursive manner, e.g., the $(t-1)$-way test case-aware covering array can be constructed from a $(t-2)$-way array, and so on. This approach, however, can increase the number of configurations required compared to the original approach.
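The first step of such an incremental strategy, determining which t-pairs a lower strength array leaves uncovered, can be sketched as follows. This is a minimal Python illustration, not the implementation used in the experiments: the data layout (a list of configuration and test-set pairs), the function names, and the three binary options `a`, `b`, `c` with a single test case `t1` are all assumptions, and the set of required t-pairs is passed in rather than derived from constraints.

```python
from itertools import combinations, product


def covered_t_pairs(array, t):
    """Collect every (option-setting combination, test case) pair of
    strength t that appears in the given array."""
    covered = set()
    for config, tests in array:
        for opts in combinations(sorted(config), t):
            combo = tuple((o, config[o]) for o in opts)
            for test in tests:
                covered.add((combo, test))
    return covered


def remaining_t_pairs(required, array, t):
    """t-pairs that still need to be placed in additional rows."""
    return required - covered_t_pairs(array, t)


# Hypothetical model: three binary options and one unconstrained test case.
options = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
names = sorted(options)
two_way = [  # a 2-way array for the model above
    ({"a": 0, "b": 0, "c": 0}, {"t1"}),
    ({"a": 0, "b": 1, "c": 1}, {"t1"}),
    ({"a": 1, "b": 0, "c": 1}, {"t1"}),
    ({"a": 1, "b": 1, "c": 0}, {"t1"}),
]


def all_t_pairs(t):
    """All t-pairs of the unconstrained hypothetical model."""
    pairs = set()
    for opts in combinations(names, t):
        for values in product(*(options[o] for o in opts)):
            pairs.add((tuple(zip(opts, values)), "t1"))
    return pairs


required_3 = all_t_pairs(3)
todo = remaining_t_pairs(required_3, two_way, 3)
print(len(required_3), len(todo))  # 8 4
```

In this toy model the 2-way array already covers half of the 3-pairs, so only the four remaining ones would have to be handed to the construction algorithm.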

Another way to reduce the construction cost is through support for efficient handling of simple, incremental changes in the configuration model. To this end, our algorithms need to be slightly modified so that they also take as input an initial seed $S_0$. In our context, $S_0$ is a set of configuration-test cases pairs. The initial seed does not necessarily constitute a test case-aware covering array. We, however, assume that the seed does not violate any constraints. Given an initial seed, Algorithms 1 and 3 need to compute an initial seed for each test case. For Algorithm 3, the initial seed $S_0^{\tau}$ of a test case $\tau$ is the set of configurations in $S_0$ in which the test case has already been scheduled to execute. For Algorithm 1, $S_0^{\tau}$ also contains those remaining configurations in $S_0$ which do not violate the test case-specific constraint of the test case. Once the initial seeds for the test cases are computed, line 3 in both algorithms needs to be changed such that the seed $S_{\tau}$ computed for a test case $\tau$ by the algorithms is merged with the initial seed $S_0^{\tau}$ of the test case. Given an initial seed, Algorithm 2, on the other hand, marks all the t-pairs appearing in the seed as covered and the main loop of the algorithm is executed only for those t-pairs yet to be covered.
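The two per-test-case seed definitions can be made concrete with a small Python sketch. It assumes, purely for illustration, that the initial seed is a list of configuration and test-set pairs and that a test case-specific constraint is given as a predicate over configurations; the option names `ssl` and `proxy` and the test names are hypothetical.

```python
def initial_seed_alg3(seed, test):
    """Configurations of the initial seed in which `test` is already scheduled."""
    return [config for config, tests in seed if test in tests]


def initial_seed_alg1(seed, test, satisfies):
    """As for Algorithm 3, plus the remaining seed configurations that do not
    violate the test case-specific constraint (the predicate `satisfies`)."""
    return [config for config, tests in seed
            if test in tests or satisfies(config)]


# Hypothetical initial seed with three configurations.
seed = [
    ({"ssl": 1, "proxy": 0}, {"t1"}),
    ({"ssl": 0, "proxy": 1}, {"t2"}),
    ({"ssl": 1, "proxy": 1}, set()),
]


def needs_ssl(config):
    """Assumed constraint of test case t1: it runs only when ssl is enabled."""
    return config["ssl"] == 1


seed3 = initial_seed_alg3(seed, "t1")
seed1 = initial_seed_alg1(seed, "t1", needs_ssl)
print(len(seed3), len(seed1))  # 1 2
```

The seed for Algorithm 1 is a superset of the seed for Algorithm 3, which matches the intent that Algorithm 1 tries to reuse existing configurations wherever the test case could run.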

Once the seeding mechanism is in place, changes made to the configuration space model can be handled as follows: To add a new test case, if the test case falls in an existing test cluster, then schedule the test case with the rest of the test cases in the cluster. Otherwise (i.e., when the test case forms a new test cluster), use the current test case-aware covering array as an initial seed and execute the main loops of the algorithms only for the newly added test case. In Algorithm 2, the test case first needs to be scheduled to execute in all of its valid configurations included in the seed. To remove a test case, remove the test case from the current test case-aware covering array together with the configurations that become redundant after the removal of the test case. Changes in the test case-specific constraint of a test case can then be handled by removing the test case first and then adding the same test case with the new constraints. To add, remove, or modify a system-wide constraint, remove all the invalidated configurations (if any) and the respective test runs from the current test case-aware covering array, and use the remaining array as an initial seed.

Populating the configuration space model with a new configuration option, on the other hand, requires more consideration. In particular, we need to handle seeds with partially filled configurations. A partially filled configuration is a configuration in which some option settings are left unset. When a new option is added, the configurations in the current test case-aware covering arrays are converted to partially filled configurations by adding the option to every configuration without a predetermined setting. Given a partially filled seed, Algorithm 2 needs to iterate over all the configuration-test cases pairs and, for each partially filled configuration, determine the “best” setting for the newly added option, given the test cases that have already been scheduled to execute in the configuration. The ASP encoding given in Fig. 3 can easily be modified for this purpose. The resulting array can then be used as a fully filled initial seed. Modifying Algorithm 1 is harder. Since Algorithm 1 processes the t-pairs on a per-test case basis, making it handle more than one test case at once is not trivial. Therefore, when adding a new configuration option, Algorithm 1 can make use of Algorithm 2 to address the change. Algorithm 3, however, handles partially filled seeds without any modifications as long as the traditional covering array generator used in the implementation handles them. Finally, removal of a configuration option from the configuration space model can trivially be handled.

One interesting observation we make from Fig. 9 is that, although Algorithms 1 and 2 aim to minimize the number of configurations without making any attempts to reduce the number of test runs required, Algorithm 2, compared to Algorithm 1, required fewer test runs for Apache, but more test runs for MySQL (Figs. 9c and 9d), while requiring fewer configurations than Algorithm 1 for both subject applications (Figs. 9a and 9b). That is, for both subject applications, while Algorithm 2 covered more t-pairs per configuration on average, it covered more t-tuples per test run for Apache, but fewer t-tuples per test run for MySQL, on average.

We believe that this is mainly due to the differences between the test suites of our subject applications. First, Apache had fewer test clusters than MySQL: 17 test clusters versus 30 test clusters (Tables 3 and 4). Note that each test cluster is associated with a unique test case-specific constraint. Second, the test case-specific constraints of Apache involved fewer configuration options per constraint than those of MySQL. For example, the percentage of test case-specific constraints that involved only one option was 71 for Apache and 20 for MySQL. Therefore, when picking a configuration and the associated set of test cases, Algorithm 2 had fewer constraints, and thus more choices, to consider for Apache, but more constraints, and thus fewer choices, for MySQL.

We observed that in heavily constrained configuration spaces, the more the test cases are forced to share configurations, the more likely it is that fewer unique t-tuples will be covered per scheduled test run (increasing the total number of test runs required). This is because when picking a configuration in a heavily constrained configuration space, a large portion of the configuration is likely to be dictated by the constraints of a few test cases, which can reduce the number of unique t-tuples covered for the rest of the test cases that are scheduled to be executed in the same configuration. For example, when t = 2 and cop = 30 for Apache, Algorithm 2 covered 6.32 unique 2-tuples per test run, whereas Algorithm 1 covered 4.15 unique 2-tuples per test run, on average. In the end, Algorithm 2 required 13 percent fewer test runs than Algorithm 1. However, when t = 2 and cop = 30 for MySQL, Algorithm 2 required 7 percent more test runs than Algorithm 1. For this scenario, Algorithm 2 covered 2.23 unique 2-tuples per test run, whereas Algorithm 1 covered 4.08 unique 2-tuples per test run, on average. In both cases, Algorithm 2 required fewer configurations than Algorithm 1. Similar trends were observed in the rest of the experiments.

6 THREATS TO VALIDITY

All empirical studies suffer from threats to their internal and external validity. For this work, we were primarily concerned with threats to external validity since they limit our ability to generalize the results of our studies to industrial practice.

One potential threat is that the proposed approach assumes that all test case-specific constraints are known a priori. In the presence of missing or incorrect constraints, as test cases can still skip some configurations due to unsatisfied constraints, the test case-aware covering arrays may suffer from masking effects. In such cases, the feedback driven adaptive combinatorial testing process we introduced in a prior work [15] can be used to iteratively detect and remove masking effects.

Another potential threat is that we have only studied two software systems: Apache and MySQL. However, both Apache and MySQL are widely used nontrivial applications with large configuration spaces and both have been used in other related works in the literature [15], [18]. A related threat concerns the representativeness of the configuration space models and the test suites used in the experiments. Although these configuration space models and test suites were culled from the actual configuration space models and test suites of our subject applications, they only represent two sets of data points. To reduce the threats concerning the representativeness of the configuration space models, we varied the percentage of constrained options in the models (Sections 5.2 and 5.3). To reduce the threats concerning the representativeness of the test suites, we studied the effect of varying the percentage of constrained test cases in the test suites (Section 5.2).

Another potential threat concerns the representativeness of the traditional covering array generator used in the experiments, namely, ACTS. However, ACTS is a well-known and widely used generator. We opted to use ACTS since it offered the best runtime performance among the generators we experimented with.

Furthermore, our algorithms construct test case-aware covering arrays for two cost models. In one cost model, the cost of running the test cases is negligible compared to that of configuring the system and the configuration cost is the same for all configurations. In the other cost model, the cost of configuring the system is negligible compared to that of running the test cases and the execution cost is the same for all test runs. For other cost models, different algorithms may be required to minimize the cost. Furthermore, the $c_c$ and $c_r$ costs used in our cost function given in Section 5.3.1 are application specific. Therefore, M values may vary from one application scenario to another. To lower the strength of the dependency, we presented some considerations as to how the construction cost of test case-aware covering arrays can further be reduced (Section 5.3.3). However, we did not experiment exhaustively with these approaches and leave this as future work; the runtime performance of these approaches has yet to be studied.

Finally, we have not directly evaluated the cost-effectiveness of test case-aware covering arrays, i.e., their effectiveness, such as failure-detection capability, as a function of cost, such as total testing time. However, our empirical results reported in a prior work [15] strongly suggest that as masking effects are removed, the number of failures observed and the structural code coverage obtained in testing monotonically increase.

7 RELATED WORK

Covering arrays aim to reveal option-related failures. The results of many empirical studies strongly suggest that a majority of option-related failures in practice are caused by the interactions among only a small number of configuration options and that traditional t-way covering arrays, where t is much smaller than the number of configuration options, are an effective and efficient way of revealing such failures [3], [9], [13], [14].

Nie et al. classify the methods for generating covering arrays into four main categories [26]: random search-based methods [28], heuristic search-based methods [7], [10], [12], [19], [29], mathematical methods [20], [21], [32], [33], and greedy methods [4], [6], [8], [9], [13], [23], [30], [31].

Random search-based methods employ a random selection without replacement strategy [28]. Valid configurations are randomly selected from the configuration space in an iterative manner until all the required t-tuples have been covered by the selected configurations.

Heuristic search-based methods, on the other hand, employ heuristic search techniques, such as hill climbing [12], tabu search [7], and simulated annealing [10], or AI-based search techniques, such as genetic algorithms [19] and ant colony algorithms [29]. These methods maintain a set of configurations at any given time and iteratively apply a series of transformations to the set until the set constitutes a t-way covering array.

Mathematical methods for constructing covering arrays have also been studied [21], [32], [33]. Some mathematical methods are based on recursive construction methods, which build covering arrays for larger configuration space models (i.e., the ones with a larger number of configuration options) by using covering arrays built for smaller configuration space models [21], [32]. Other mathematical methods leverage mathematical programming, such as integer programming, to compute covering arrays [33].

Greedy algorithms operate in an iterative manner [4], [6], [8], [9], [13], [23], [30], [31]. At each iteration, among the sets of configurations examined as candidates, the one that covers the most previously uncovered t-tuples is included in the covering array. The iterations terminate when all the required t-tuples have been covered.
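The greedy scheme described above can be illustrated with a deliberately naive Python sketch. It is not any of the cited generators and not one of the algorithms of this paper: it enumerates the whole configuration space as its candidate set, which is feasible only for very small models, and the function and parameter names (`greedy_covering_array`, `is_valid`) are assumptions made for this example.

```python
from itertools import combinations, product


def t_tuples(config, t):
    """All t-way option-setting combinations appearing in a configuration."""
    return {tuple((o, config[o]) for o in opts)
            for opts in combinations(sorted(config), t)}


def greedy_covering_array(options, t, is_valid=lambda config: True):
    """Build a traditional t-way covering array greedily.

    `options` maps option names to their settings; `is_valid` encodes
    system-wide constraints (assumed to be a simple predicate here).
    """
    names = sorted(options)
    candidates = [dict(zip(names, values))
                  for values in product(*(options[n] for n in names))]
    candidates = [c for c in candidates if is_valid(c)]

    # Required t-tuples: those appearing in at least one valid configuration.
    uncovered = set()
    for c in candidates:
        uncovered |= t_tuples(c, t)

    array = []
    while uncovered:
        # Pick the candidate covering the most previously uncovered t-tuples.
        best = max(candidates, key=lambda c: len(t_tuples(c, t) & uncovered))
        array.append(best)
        uncovered -= t_tuples(best, t)
    return array


# Hypothetical model: four binary options, with o1 and o2 never both enabled.
model = {"o1": [0, 1], "o2": [0, 1], "o3": [0, 1], "o4": [0, 1]}
rows = greedy_covering_array(model, 2,
                             is_valid=lambda c: not (c["o1"] and c["o2"]))
print(len(rows))
```

The loop always terminates because every required t-tuple comes from some candidate, so each iteration covers at least one new t-tuple. Practical generators avoid the exhaustive candidate enumeration used here.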

The algorithms we present in this work fall into the category of greedy algorithms. However, while the existing greedy algorithms compute traditional covering arrays, we compute test case-aware covering arrays.

Handling system-wide interoption constraints in the construction of traditional covering arrays has also been of interest. Cohen et al. study the nature of such constraints in configurable software systems and empirically demonstrate that ignoring such constraints leads to wasted testing efforts [11]. Mats et al. propose various techniques to efficiently handle system-wide constraints [25]. Bryce and Colbourn introduce the concept of “soft constraints” to mark option setting combinations that are permitted but undesirable to be included in a covering array [5].

Traditional covering arrays, while handling system-wide constraints, do not account for test case-specific constraints. In this work, on the other hand, we take test case-specific constraints into account when constructing combinatorial interaction test suites.

Seeding mechanisms in CIT approaches have been used to guarantee the inclusion of certain configurations in traditional covering arrays [9], [13], [18]. In this paper, we use the seeding mechanism to construct test case-aware covering arrays.

8 CONCLUDING REMARKS

The basic justification for traditional t-way covering arrays is that they can cost-effectively exercise all system behaviors caused by the settings of t or fewer options. In this paper, we hypothesize, however, that, in the presence of test case-specific interoption constraints, as traditional covering arrays do not provide a systematic way of handling these constraints, many such behaviors may not be tested due to masking effects caused by the overlooked test case-specific constraints.

To evaluate this hypothesis, we conducted a series of experiments on two widely used highly configurable software systems, namely, Apache and MySQL. We first observed that test case-specific constraints do exist in practice. Out of all the test cases we examined for our subject applications, 378 Apache test cases and 337 MySQL test cases had some test case-specific constraints. We then observed that traditional covering arrays suffered from masking effects caused by the overlooked test case-specific constraints. In a study, for example, 35 percent of all valid 2-way option setting combination-test case pairs (i.e., 2-pairs), on average, were not tested at all by the traditional 2-way covering arrays created. For a fixed configuration space model, higher strength covering arrays suffered relatively less compared to lower strength arrays; as t increased, t-masked percentage (i.e., the percentage of t-pairs that are masked) decreased. For a fixed value of t, as the percentage of the configuration options that are referenced by a constraint (i.e., constrained options percentage) decreased and/or the percentage of the test cases that have test case-specific constraints (i.e., constrained test cases percentage) decreased, t-masked percentage decreased. However, t-masked percentage never reached 0 in the traditional t-way covering arrays created in the experiments.
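To make the notion of t-masked percentage concrete, the following Python sketch computes it for a toy model. It rests on several assumptions that are ours rather than the paper's exact formulation: constraints are given as predicates, a t-pair is treated as valid if its option-setting combination appears in at least one valid configuration in which the test case can run, and the whole configuration space is enumerated, which is feasible only for tiny models. All names (`t_masked_percentage`, options `a`, `b`, `c`, test cases `t1`, `t2`) are hypothetical.

```python
from itertools import combinations, product


def t_tuples(config, t):
    """All t-way option-setting combinations appearing in a configuration."""
    return {tuple((o, config[o]) for o in opts)
            for opts in combinations(sorted(config), t)}


def t_masked_percentage(options, array, test_constraints, t,
                        system_ok=lambda config: True):
    """Percentage of valid t-pairs not tested by a traditional covering array.

    `array` is a list of configurations; every test case is assumed to be
    attempted in every configuration and to skip those violating its constraint.
    """
    names = sorted(options)
    space = [dict(zip(names, v)) for v in product(*(options[n] for n in names))]
    space = [c for c in space if system_ok(c)]
    valid = masked = 0
    for can_run in test_constraints.values():
        required = set()
        for c in space:
            if can_run(c):
                required |= t_tuples(c, t)
        tested = set()
        for c in array:
            if can_run(c):
                tested |= t_tuples(c, t)
        valid += len(required)
        masked += len(required - tested)
    return 100.0 * masked / valid if valid else 0.0


# Hypothetical model: three binary options and a traditional 2-way array.
options = {"a": [0, 1], "b": [0, 1], "c": [0, 1]}
ca = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 0},
]
tests = {
    "t1": lambda c: True,         # unconstrained test case
    "t2": lambda c: c["a"] == 1,  # runs only when option a is enabled
}
pct = t_masked_percentage(options, ca, tests, 2)
print(pct)  # 10.0
```

In this toy model, test case `t2` runs in only two of the four rows, so two of its eight valid 2-pairs are never exercised, even though the array is a complete traditional 2-way covering array.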

To account for test case-specific constraints and avoid harmful consequences of masking effects caused by the ignorance of such constraints, we first introduced test case-aware covering arrays. We then presented three algorithms to compute test case-aware covering arrays. A valuable observation we make is that there is often a tradeoff between minimizing the number of configurations and minimizing the number of test runs in test case-aware covering arrays. Among the algorithms we introduced, Algorithms 1 and 2 aim to minimize the number of configurations, whereas Algorithm 3 aims to minimize the number of test runs.

For the configuration space models studied in the experiments, Algorithm 2 performed better than Algorithm 1 in reducing the number of configurations required. Algorithm 2, compared to Algorithm 1, while having a higher computational complexity, significantly reduced the number of configurations by 54 percent when t = 2 and by 45 percent when t = 3, on average. Algorithm 3, on the other hand, performed better than Algorithms 1 and 2 in reducing the number of test runs required. Algorithm 3, compared to Algorithms 1 and 2, while requiring more configurations, reduced the number of test runs by 20 percent when t = 2 and by 21 percent when t = 3, on average. These results make Algorithm 3 of practical interest when the cost of configuring the system is negligible and Algorithm 2 of practical interest when the cost of running the test cases is negligible.


We also compared t-way test case-aware covering arrays with traditional t-way covering arrays. When the goal was to minimize the number of test runs, we observed that the t-way test case-aware covering arrays computed by Algorithm 3, while not suffering from any masking effects, reduced the number of test runs (except for two cases) by 11 percent when t = 2 and by 17 percent when t = 3, on average, compared to the traditional t-way covering arrays. When the goal was to minimize the number of configurations, however, the t-way test case-aware covering arrays expectedly required more configurations than the traditional t-way covering arrays. To evaluate the efficiency provided by t-way test case-aware covering arrays, we then compared them to traditional higher strength covering arrays that prevented all or almost all t-pairs from being masked. We observed that the test case-aware covering arrays created by Algorithm 2, while not suffering from any masking effects, reduced the number of configurations by 82 percent when t = 2 and by 50 percent when t = 3, on average, compared to the higher strength covering arrays.

TABLE 5 Masking Effects in the Traditional t-way Covering Arrays Created for Apache

TABLE 6 Masking Effects in the Traditional t-way Covering Arrays Created for MySQL

In almost all the comparisons, constructing the test case-aware covering arrays took more time than constructing their counterparts. We, however, observed that the more the constructed arrays are reused for testing and the greater the cost of configuring the system and running the test cases, the faster test case-aware covering arrays compensate for their construction cost.
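The amortization argument can be expressed as a break-even calculation. The sketch below assumes that the extra construction cost is paid once and that each reuse of the array saves a fixed amount of configuration and test execution cost, both expressed in the same unit; the function name and the numbers are illustrative assumptions.

```python
import math


def break_even_uses(extra_construction_cost, saving_per_use):
    """Number of uses after which the extra construction cost is recovered.

    Returns None when a use saves nothing, since the extra cost is then
    never recovered. Both arguments must be in the same unit (e.g., seconds).
    """
    if extra_construction_cost <= 0:
        return 0  # nothing to recover
    if saving_per_use <= 0:
        return None
    return math.ceil(extra_construction_cost / saving_per_use)


# Hypothetical numbers: 120 extra seconds to construct the array,
# 50 seconds saved each time it is used for testing.
print(break_even_uses(120, 50))  # 3
```

The larger the saving per use, which grows with the cost of configuring the system and running the test cases, the fewer uses are needed.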

We furthermore observed that the extent to which traditional covering arrays suffer from masking effects depends on many factors, such as the strength of the covering array, the percentage of constrained options in the configuration space model, and the percentage of constrained test cases in the test suite. Therefore, we provided a guideline to the users of covering arrays to reliably estimate the consequences of masking effects without performing any system builds and test runs. This guideline can be summarized as follows: Generate a traditional t-way covering array for the scenario at hand and then compute the t-masked and t-masked percentage values. The larger the value of t-masked and/or the value of t-masked percentage, the more the covering array suffers. The evaluation results can then be used to decide if test case-aware covering arrays are required for the scenario at hand.

As future work, we plan to work on cost- and test case-aware covering arrays that support a general cost model in which the overall cost of testing can be specified at the granularity of option settings and test cases.

APPENDIX A

MASKING EFFECTS IN TRADITIONAL COVERING ARRAYS

Tables 5 and 6 quantify the extent to which the traditional covering arrays created in the experiments suffer from masking effects caused by the overlooked test case-specific constraints. When cop = 20 for Apache and MySQL, we were not able to generate traditional 5-way covering arrays because of the scalability issues we experienced with the ACTS tool.


TABLE 7 Test Case-Aware Covering Arrays Created for Apache

TABLE 8 Test Case-Aware Covering Arrays Created for MySQL

APPENDIX B

COMPARING TEST CASE-AWARE COVERING ARRAYS

Tables 7 and 8 provide some statistics about the test case-aware covering arrays generated in the experiments.

ACKNOWLEDGMENTS

This research was supported by a Marie Curie International Reintegration Grant within the Seventh European Community Framework Programme (FP7-PEOPLE-IRG-2008), and by the Scientific and Technological Research Council of Turkey (109E182).

REFERENCES

[1] Advanced Combinatorial Testing System (ACTS), http://csrc.nist.gov/groups/SNS/acts/documents/comparison-report.html, 2012.

[2] C. Baral, Knowledge Representation, Reasoning, and Declarative Problem Solving. Cambridge Univ. Press, 2003.

[3] R. Brownlie, J. Prowse, and M.S. Phadke, “Robust Testing of AT&T PMX/StarMAIL Using OATS,” AT&T Technical J., vol. 71, no. 3, pp. 41-47, 1992.

[4] R.C. Bryce and C.J. Colbourn, “Constructing Interaction Test Suites with Greedy Algorithms,” Proc. 20th IEEE/ACM Int’l Conf. Automated Software Eng., pp. 440-443, 2005.

[5] R.C. Bryce and C.J. Colbourn, “Prioritized Interaction Testing for Pair-Wise Coverage with Seeding and Constraints,” Information and Software Technology, vol. 48, no. 10, pp. 960-970, 2006.

[6] R.C. Bryce and C.J. Colbourn, “The Density Algorithm for Pairwise Interaction Testing: Research Articles,” Software Testing and Verification Reliability, vol. 17, pp. 159-182, Sept. 2007.

[7] R.C. Bryce and C.J. Colbourn, “One-Test-at-a-Time Heuristic Search for Interaction Test Suites,” Proc. Ninth Ann. Conf. Genetic and Evolutionary Computation, pp. 1082-1089, 2007.

[8] R.C. Bryce and C.J. Colbourn, “A Density-Based Greedy Algorithm for Higher Strength Covering Arrays,” Software Testing and Verification Reliability, vol. 19, pp. 37-53, Mar. 2009.

[9] D.M. Cohen, S.R. Dalal, M.L. Fredman, and G.C. Patton, “The AETG System: An Approach to Testing Based on Combinatorial Design,” IEEE Trans. Software Eng., vol. 23, no. 7, pp. 437-44, July 1997.

[10] M.B. Cohen, C.J. Colbourn, and A.C.H. Ling, “Augmenting Simulated Annealing to Build Interaction Test Suites,” Proc. 14th Int’l Symp. Software Reliability Eng., p. 394, 2003.

[11] M.B. Cohen, M.B. Dwyer, and J. Shi, “Interaction Testing of Highly Configurable Systems in the Presence of Constraints,” Proc. Int’l Symp. Software Testing and Analysis, pp. 129-139, 2007.

[12] M.B. Cohen, P.B. Gibbons, W.B. Mugridge, and C.J. Colbourn, “Constructing Test Suites for Interaction Testing,” Proc. 25th Int’l Conf. Software Eng., pp. 38-48, 2003.

[13] J. Czerwonka, “Pairwise Testing in the Real World: Practical Extensions to Test-Case Scenarios,” Proc. 24th Pacific Northwest Software Quality Conf., pp. 285-294, 2006.

[14] S.R. Dalal, A. Jain, N. Karunanithi, J.M. Leaton, C.M. Lott, G.C. Patton, and B.M. Horowitz, “Model-Based Testing in Practice,” Proc. Int’l Conf. Software Eng., pp. 285-294, 1999.

[15] E. Dumlu, C. Yilmaz, M.B. Cohen, and A. Porter, “Feedback Driven Adaptive Combinatorial Testing,” Proc. Int’l Symp. Software Testing and Analysis, pp. 243-253, 2011.

[16] T. Eiter, G. Ianni, and T. Krennwallner, “Answer Set Programming: A Primer,” Reasoning Web. Semantic Technologies for Information Systems, pp. 40-110, Springer, 2009.

[17] E. Ellguth, M. Gebser, M. Gusowski, B. Kaufmann, R. Kaminski, S. Liske, T. Schaub, L. Schneidenbach, and B. Schnor, “A Simple Distributed Conflict-Driven Answer Set Solver,” Proc. 10th Int’l Conf. Logic Programming and Nonmonotonic Reasoning, pp. 490-495, 2009.

[18] S. Fouche, M.B. Cohen, and A. Porter, “Towards Incremental Adaptive Covering Arrays,” The Sixth Joint Meeting on European Software Eng. Conf. and the ACM SIGSOFT Symp. Foundations of Software Eng.: Companion Papers, pp. 557-560, 2007.

[19] S. Ghazi and M. Ahmed, “Pair-Wise Test Coverage Using Genetic Algorithms,” Proc. Congress Evolutionary Computation, vol. 2, pp. 1420-1424, Dec. 2003.

[20] A. Hartman, “Software and Hardware Testing Using Combinatorial Covering Suites,” Graph Theory, Combinatorics and Algorithms, Operations Research/Computer Science Interfaces Series, M.C. Golumbic and I.B.-A. Hartman, eds., vol. 34, pp. 237-266, Springer, 2005.

[21] N. Kobayashi, “Design and Evaluation of Automatic Test Generation Strategies for Functional Testing of Software,” PhD thesis, Osaka Univ., 2002.

[22] D. Kuhn, D.R. Wallace, and A.M. Gallo, “Software Fault Interactions and Implications for Software Testing,” IEEE Trans. Software Eng., vol. 30, no. 6, pp. 418-421, June 2004.

[23] Y. Lei, R. Kacker, D.R. Kuhn, V. Okun, and J. Lawrence, “IPOG/IPOG-D: Efficient Test Generation for Multiway Combinatorial Testing,” Software Testing Verification and Reliability, vol. 18, pp. 125-148, Sept. 2008.

[24] V. Marek and M. Truszczyński, “Stable Models and an Alternative Logic Programming Paradigm,” The Logic Programming Paradigm: A 25-Year Perspective, 1999.

[25] G. Mats, O. Jeff, and M. Jonas, “Handling Constraints in the Input Space When Using Combination Strategies for Software Testing,” Technical Report HS-IKI-TR-06-001, School of Humanities and Informatics, Univ. of Skövde, 2006.

[26] C. Nie and H. Leung, “A Survey of Combinatorial Testing,” ACM Computing Surveys, vol. 43, pp. 11:1-11:29, Feb. 2011.

[27] I. Niemelä, “Logic Programs with Stable Model Semantics as a Constraint Programming Paradigm,” Ann. Math. and Artificial Intelligence, vol. 25, nos. 3/4, pp. 241-273, 1999.

[28] P.J. Schroeder, P. Bolaki, and V. Gopu, “Comparing the Fault Detection Effectiveness of N-Way and Random Test Suites,” Proc. Int’l Symp. Empirical Software Eng., pp. 49-59, 2004.

[29] T. Shiba, T. Tsuchiya, and T. Kikuno, “Using Artificial Life Techniques to Generate Test Cases for Combinatorial Testing,” Proc. Ann. Int’l Computer Software and Applications Conf., pp. 72-77, 2004.

[30] K.-C. Tai and Y. Lei, “A Test Generation Strategy for Pairwise Testing,” IEEE Trans. Software Eng., vol. 28, no. 1, pp. 109-111, Jan. 2002.

[31] Y.-W. Tung and W. Aldiwan, “Automating Test Case Generation for the New Generation Mission Software System,” Proc. IEEE Aerospace Conf., vol. 1, pp. 431-437, 2000.

[32] A.W. Williams, “Determination of Test Configurations for Pair-Wise Interaction Coverage,” Proc. 13th IFIP TC6/WG6.1 Int’l Conf. Testing Communicating Systems: Tools and Techniques, pp. 59-74, 2000.

[33] A.W. Williams and R.L. Probert, “Formulation of the Interaction Test Coverage Problem as an Integer Program,” Proc. 14th IFIP Int’l Conf. Testing Communicating Systems, p. 283, 2002.

[34] C. Yilmaz, M.B. Cohen, and A. Porter, “Covering Arrays for Efficient Fault Characterization in Complex Configuration Spaces,” IEEE Trans. Software Eng., vol. 31, no. 1, pp. 20-34, Jan. 2006.

Cemal Yilmaz received the BS and MS degrees in computer engineering and information science from Bilkent University, Ankara, Turkey, in 1997 and 1999, respectively. In 2005, he received the PhD degree in computer science from the University of Maryland at College Park. Between 2005 and 2008, he worked as a postdoctoral researcher at the IBM Thomas J. Watson Research Center, Hawthorne, New York. Currently, he is an assistant professor of computer science on the Faculty of Engineering and Natural Sciences, Sabanci University, Istanbul, Turkey. His current research interests include software engineering and software quality assurance. He received the National Young Researchers Career Development Grant of the Scientific and Technological Research Council of Turkey in 2009.
