Evolutionary Object-Oriented
Testing
Lucas Serpa Silva
Artificial Intelligence
University of Amsterdam
A thesis submitted for the degree of
MSc Artificial Intelligence
Supervised by
Dr. Maarten van Someren
July 2009
Abstract
It is estimated that 80% of software development cost is spent on detecting and fixing defects. To tackle this issue, a number of tools and testing techniques have been developed to improve the testing process. Although techniques such as static analysis, random testing and evolutionary testing have been used to automate the testing process, it is not clear which approach is best. Previous research on evolutionary testing has mainly focused on procedural programming languages with simple test data inputs such as numbers. In this work, we present an evolutionary object-oriented testing approach that combines a genetic algorithm with static analysis to increase the number of faults found within a given time frame. A total of 640 experiments were executed to evaluate the effectiveness of different genetic algorithms and parameters. The system's results are compared to the results obtained by running a random test case generator for 15, 30 and 60 minutes. The results show that a genetic algorithm combined with static analysis can considerably increase the number of faults found compared to random testing. In some cases, evolutionary testing found more faults in 15 minutes than a random testing strategy found in 60 minutes.
Acknowledgements
I would like to thank my supervisor, Maarten van Someren, for his support, guidance and constructive comments throughout this work. I would also like to thank
Yi Wei for the various discussions regarding Autotest, code coverage and auto-
mated testing. A special thanks goes to Olga Nikolayeva for many invaluable
suggestions and the time she spent proofreading and reviewing this thesis.
Contents

List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Past research
1.3 Project goals

2 Background
2.1 Testing
2.1.1 Black box
2.1.2 White box
2.1.3 Automated testing
2.2 Eiffel & Design by Contract
2.3 Autotest
2.3.1 Faults
2.4 Genetic Algorithm
2.4.1 Chromosome
2.4.2 Mutation
2.4.3 Crossover
2.4.4 Objective and fitness value
2.4.5 Selecting individuals for reproduction
2.4.6 GA Variations

3 Evolutionary testing
3.1 Implementation
3.1.1 Parameters
3.1.2 Algorithm stages
3.1.2.1 Allele value specification
3.1.2.2 Initialization
3.1.2.3 Evaluation
3.1.2.4 Mutation and crossover
3.2 Evolutionary Autotest

4 Experiments
4.1 Introduction
4.2 Setting
4.3 Experiments Group A
4.3.1 Experiment A1: Autotest parameters
4.4 Experiments Group B
4.4.1 Experiment B1: Mutation probability
4.4.2 Experiment B2: Mutation algorithm
4.4.3 Experiment B3: Crossover probability
4.4.4 Experiment B4: Crossover algorithm
4.4.5 Experiment B5: Selection method
4.5 Experiments Group C
4.5.1 Experiment C1: Original Autotest
4.5.2 Experiment C2: Autotest with static analysis
4.5.3 Experiment C3: Evolutionary testing

5 Discussion
5.1 Types of faults found
5.2 Parameters
5.3 Conclusion
5.4 Considerations
5.5 Further improvement

A Primitive Values
B Chromosome specification
C Chromosome files

Bibliography
List of Figures

2.1 Example of Design by Contract™
2.2 Autotest algorithm 1
2.3 Autotest algorithm 2
2.4 Genetic Algorithm flow diagram
2.5 Examples of mutation algorithms
2.6 One and two points crossover
2.7 Order crossover examples
3.1 Four basic components of the system
3.2 Four stages of the genetic algorithm
3.3 Parallel population evaluation
3.4 Corrupted chromosome caused by crossover
3.5 Valid chromosome crossover
3.6 Evolutionary Autotest 1 - loading chromosome and evolve.conf
3.7 Evolutionary Autotest 2 - method call
3.8 Evolutionary Autotest 3 - object creation
4.1 Number of faults found using random and static analysis techniques to select the initial primitive values
4.2 Number of faults for mutation algorithms for each class
4.3 Effect of mutation and crossover probability on the number of faults
4.4 Comparison of crossover algorithms
4.5 Comparison of selection algorithms
4.6 Variation on the total number of faults found
4.7 Autotest progress
4.8 Evolutionary approach on α set
4.9 Evolutionary testing on β set
4.10 Total number of faults found for all classes over time by the three approaches
4.11 Evolutionary approach time library
5.1 Distribution of the types of faults found in the metalex class
5.2 Usage frequency of each parameter
List of Tables

1.1 Previous work
4.1 Test classes α
4.2 Test classes β
4.3 Genetic algorithm setting
4.4 Autotest parameters
4.5 Mutation probability
4.6 Mutation methods
4.7 Crossover probability
4.8 Crossover methods
4.9 Population selection schema
4.10 Original Autotest
4.11 Original Autotest
4.12 Original Autotest executions
4.13 Autotest with static analysis
4.14 Autotest with static analysis
4.15 Time allocation
4.16 Execution setting
4.17 Evolutionary algorithm
4.18 Evolutionary algorithm
A.1 Autotest primitive values
B.1 Chromosome specification
C.1 Chromosome files
1 Introduction
1.1 Motivation
In the past 50 years, the growing influence of software in all areas of industry has led to an ever-increasing demand for complex and reliable software. According to a study (3) conducted by the National Institute of Standards and Technology, approximately 80% of development cost is spent on identifying and correcting defects. The same study found that software bugs cost the United States economy around $59.5 billion a year, with one third of this amount attributed to poor software testing infrastructure. In an effort to improve the existing testing infrastructure, a number of tools, such as JUnit (1) and GoboTest (4), have been developed to automate test execution. However, the automation of test data generation is still a topic under research. Recently, a number of methods such as metaheuristic search, random test generation and static analysis have been used to fully automate the testing process, but the application of these tools to real software is still limited. Random test case generation is used by a number of tools (Jartege (34), Autotest (33), DART (32)) that automate the generation of test cases, but several studies found genetic algorithms (evolutionary testing) to be more efficient, outperforming random testing on code coverage (9; 13; 16; 18; 26).
1.2 Past research
The study of genetic algorithms as a technique for automating the process of test case generation is often referred to in the literature as evolutionary testing. Since the early 90s, there have been a number of studies on evolutionary testing, and their complexity and applicability vary. In order to classify the relevance of past research to this project, a number of studies are classified according to the complexity of the test cases being generated and the optimization parameter used by the genetic algorithm. The complexity of the generated test cases matters because generating test cases for structured programs that take only simple inputs, such as numbers, is simpler than generating test cases for object-oriented programs.
Reference             Year  Language type                    Optimization parameter
(5)  Xanthakis, S.    1992  Procedural (C)                   Branch coverage
(6)  Shultz, A.       1993  Procedural (Vehicle simulator)   Functional
(7)  Hunt, J.         1995  Procedural (POP11[X])            Functional (Seeded errors)
(8)  Roper, M.        1995  Procedural (C)                   Branch coverage
(9)  Watkins, A.      1995  Procedural (TRITYP simulator)    Path coverage
(10) Alander, J.      1996  Procedural (Strings)             Time
(18) Harman, M.       1996  Procedural (Integers)            Branch coverage
(14) Jones, B.        1998  Procedural (Integers)            Branch coverage
(11) Tracey, N.       1998  Complex (ADA)                    Functional (specification)
(12) Borgelt, K.      1998  Procedural (TRITYP simulator)    Path coverage
(13) Pargas, R.       1999  Procedural (TRITYP simulator)    Branch coverage
(15) Lin              2001  Procedural (TRITYP simulator)    Path coverage
(16) Michael, C.      2001  Procedural (GADGET)              Branch coverage
(17) Wegener, J.      2001  Procedural                       Branch coverage
(19) Díaz, E.         2003  Procedural                       Branch coverage
(20) Berndt, D.       2003  Procedural (TRITYP simulator)    Functional
(9)  Watkins, A.      2004  Procedural                       Functional (Seeded error)
(24) Tonella, P.      2004  Object-oriented (Java)           Branch coverage
(21) Berndt, D. J.    2005  Procedural (Robot simulator)     Functional (Seeded error)
(22) Alba, E.         2005  Procedural (C)                   Condition coverage
(23) McMinn, P.       2005  Procedural (C)                   Branch coverage
(27) Wappler, S.      2005  Object-oriented (Java)           Branch, condition coverage
(28) Wappler, S.      2006  Object-oriented (Java)           Exceptions / Branch coverage
(26) Harman, M.       2007  Procedural                       Branch coverage
(25) Mairhofer, S.    2008  Object-oriented (Ruby)           Branch coverage
Table 1.1: Previous work.
As shown in Table 1.1, only a few projects have generated test cases for object-oriented programs, and to the best of our knowledge there was only one project (11) that generated test cases for object-oriented programs and used the number of faults found as the optimization parameter for the genetic algorithm. In that study, test cases were generated for ADA programs, but a formal specification had to be written manually in a SPARK-Ada proof context. Thus, the testing process was not completely automated.
Table 1.1 also shows that branch coverage was the optimization parameter used to drive the evolution of test cases in most other studies. However, there is little evidence of a correlation between branch coverage and the number of uncovered faults. Although code coverage is a useful test suite measurement, the number of faults a test suite unveils is more important. Past research has shown that evolutionary testing is a good approach to automating the generation of test cases for structured programs. To make this approach attractive to industry, however, a system must be able to generate test cases for object-oriented programs and to use the number of faults found as the main optimization parameter. To the best of our knowledge, there is currently no existing project that fulfils these two requirements.
1.3 Project goals
This project has three goals:
1. to use genetic algorithms to automatically generate test cases for object-oriented pro-
grams written in Eiffel and to use the number of faults found as the optimization
parameter for the genetic algorithm.
2. to investigate the effect of different genetic algorithms on the number of faults found
when generating test cases for object-oriented software.
3. to combine evolutionary testing with static analysis and evaluate if this improves the
results.
The base hypothesis for this work is that evolutionary testing finds more faults, in less time, than random testing. This project innovates by using the number of faults as the main optimization parameter for the genetic algorithm and by combining static analysis with a genetic algorithm. It also extends the existing research on evolutionary testing by providing a study of the effect of different genetic algorithm techniques, such as mutation and crossover algorithms, on the evolution of test cases for object-oriented software.
This project is based on the Autotest (2) tool and the Design by Contract™ methodology implemented by the Eiffel programming language (29).
2 Background
2.1 Testing
Testing is one of the most widely used software quality assessment methods. There are two important processes when testing object-oriented software. First, the software has to be initialized with a set of values. These values set a number of variables that are relevant to the test case, and together they define a single state out of the set of possible states. Each value can be either a primitive value, such as an integer, or a complex value, such as an object. With the software initialized, its methods can then be tested by calling them. If a method takes one or more objects as parameters, these objects also have to be initialized. To determine whether a test case passed or failed, a software specification has to be used. The software specification defines what the output of the software should be and what constitutes a valid input. Because the number of possible states a program may have is exponential, it is impossible to test all of them. Interesting states are normally identified by the developers according to a software specification or the program structure. There are many types of testing; however, they can all be classified as either black box or white box testing.
2.1.1 Black box
Black box testing, also called functional testing (30), considers the unit under test as a black box into which data is fed and whose output is verified against a software specification. Functional testing has the advantage of being decoupled from the source code: given the software specification, test data can be generated even before the function has been implemented. Functional testing is also closely related to the user requirements, since it tests a function of the program. Its main disadvantage is that it requires a software specification, and it may not explore the unit under test well since it does not know the code structure.
2.1.2 White box
White box testing, also called structural testing, takes into account the internal structure of the code. By analyzing the structure of the code, different test data can be generated to explore specific areas. Structural testing may also be used to measure how much of the code has been covered according to some structural criterion. By analyzing the program flow and the path an execution took, code coverage can be computed for criteria such as statement coverage, which counts the number of unique statements executed.
2.1.3 Automated testing
To automate the testing process, both the generation of test data and the execution of test cases have to be automated. There are already a number of tools, such as JUnit (1) and GoboTest (4), that automate test case execution, but the main problem lies in the automation of test data generation. Since the number of possible inputs is huge, the problem can be viewed as an optimization problem, where the optimal solution is a set of test data that triggers all faults in the software. Some tools, such as Autotest (33), DART (32) and Jartege (34), generate test data randomly, but there are many optimization algorithms that are considered better than random search.
2.2 Eiffel & Design by Contract
The lack of a software specification is one of the main problems when automatically generating test cases. Without a specification it is impossible to be sure that a feature¹ has failed. Even when a test case leads the program to crash or throw an exception, it is not clear whether the software has a fault, since the program may simply not be defined for the given input. Normally, developers write a header comment for each method describing its behaviour. Although there are guidelines on how to write these headers, they are not formal enough to allow the derivation of the method's pre- and postconditions.

¹ Feature means either a procedure or a function. In this report, feature and method are used interchangeably to refer to a procedure or a function.
This problem has been addressed by the Eiffel programming language (29), which, among other methodologies, implements the Design by Contract™ (31) concept. The idea behind Design by Contract™ is that each method call is a contract between the caller (client) and the method (supplier). The contract is specified in terms of what the client must provide and what the supplier guarantees in return, and it is normally written as pre- and postcondition boolean expressions for each method. In the example illustrated in Figure 2.1, the precondition is composed of four boolean expressions and the postcondition of two. These expressions are evaluated upon method invocation and termination respectively, and the system throws an exception as soon as one of them evaluates to false. Therefore, the caller must ensure that the precondition is true before invoking the method, and the method must ensure that the postcondition is true before returning. For example, the borrow book method shown in Figure 2.1 takes the id of a borrower and the id of the book this borrower wants to borrow. The caller must ensure that the book id is valid, that at least one copy of the book is available, that the borrower id is valid and that the borrower is allowed to borrow books. If these conditions are fulfilled, the method guarantees that it will add the book to the borrower's list of borrowed books and decrease the number of available copies by one. Apart from the pre- and postconditions, every class has an invariant that has to remain true after the execution of the constructor, and loops may have variants and invariants.

Figure 2.1: Example of Design by Contract™

With Design by Contract™, a method has a fault if it:
1. violates another method’s precondition.
2. does not fulfil its own postcondition.
3. violates the class invariant.
4. violates loop variant or invariant.
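Figure 2.1 itself is an image and is not reproduced in this transcript. As a rough illustration of the contract just described (not the thesis's Eiffel example), the pre- and postconditions of borrow book can be mimicked in C++ with assertions; all names below are assumptions. In Eiffel, the require and ensure clauses express these checks natively and raise exceptions on violation.

    #include <cassert>
    #include <vector>

    // Hypothetical sketch of the borrow_book contract described above.
    // Eiffel checks require/ensure clauses natively; here they are mimicked
    // with assertions. All names are illustrative, not from the thesis.
    struct Library {
        std::vector<int> copies;                    // available copies per book id
        std::vector<std::vector<int>> borrowed;     // borrowed book ids per borrower

        void borrow_book(std::size_t borrower_id, std::size_t book_id) {
            // Precondition: valid ids and at least one available copy.
            assert(book_id < copies.size() && copies[book_id] > 0);
            assert(borrower_id < borrowed.size());
            const int old_copies = copies[book_id];

            borrowed[borrower_id].push_back(static_cast<int>(book_id));
            copies[book_id] -= 1;

            // Postcondition: book recorded for the borrower, one copy fewer.
            assert(borrowed[borrower_id].back() == static_cast<int>(book_id));
            assert(copies[book_id] == old_copies - 1);
        }
    };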
For the automation of test case generation, Design by Contract™ can be used to determine whether the generated test data is defined for a given method, by checking it against the precondition. It can also be used to check whether a method has failed, by comparing the result against the postcondition. In the next section we discuss how this idea is implemented in the Autotest tool (2).
2.3 Autotest
Autotest exploits the Design by Contract™ methodology implemented in Eiffel to automatically generate random test data for Eiffel classes. Autotest works with a given timeout and a set of classes to be tested.
Figure 2.2: Autotest algorithm 1 - Method invocation
Autotest starts by loading the classes to be tested and creating a table containing all methods of those classes, including the inherited ones. As described in algorithm 2.2, Autotest randomly selects methods to test while the timeout has not expired. It chooses the method to be tested (line 4) and the creation method (line 23) at random, and it uses a probability to determine whether a new object should be created or selected from the object pool (line 11). The object pool is the set of all objects created by Autotest; the idea behind it is that reusing objects that may have been modified during a previous method call increases the chance of finding more faults. When creating an object, Autotest uses different algorithms for expanded and non-expanded types. Expanded types are the primitive types, such as Integer, Boolean, Character and so on. For these types, Autotest must provide an initial value, as shown in Figure 2.3. The initial values for the expanded types are randomly selected from a set of fixed values chosen by the developers. These values are listed in appendix A.1.
Figure 2.3: Autotest algorithm 2 - Object creation
When instantiating objects that are not of an expanded type, Autotest randomly selects one of their creation procedures and invokes it. After the timeout expires, Autotest generates a report containing the number of test cases generated, the number of failures, the number of unique failures and the number of invalid test cases, and it reproduces the code that triggers the faults it found.
2.3.1 Faults
Eiffel throws an exception whenever a contract is violated (precondition, postcondition, class invariant, loop invariant, loop variant). Autotest then examines the exception to find out whether it was triggered by an invalid test case or by an actual fault in the code. Invalid test cases are those that violate the precondition of the feature being tested. If the test case is valid, Autotest checks whether the fault is unique by looking at the line of code where the exception happened and comparing it to all unique faults it has already found. Besides the faults triggered by the Design by Contract™ conditions, other exceptions, such as those triggered by calling methods on a void object or by lack of memory, are also considered valid test cases.
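This uniqueness check lends itself to a small sketch. The following C++ fragment is illustrative only (Autotest's internal representation is not shown in this transcript); it assumes a fault is identified by the file and line where the exception was raised:

    #include <set>
    #include <string>
    #include <utility>

    // Illustrative: a fault is identified by the class file and the line where
    // the exception was raised; it is "unique" if that pair was not seen before.
    using FaultId = std::pair<std::string, int>;  // (file, line)

    bool is_unique_fault(std::set<FaultId>& seen, const FaultId& f) {
        return seen.insert(f).second;  // true if the fault is new
    }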
2.4 Genetic Algorithm
Genetic Algorithms (GA) are search algorithms based on natural selection as described by Charles Darwin. They are used to find solutions to optimization and search problems. Genetic algorithms became popular when John Holland published "Adaptation in Natural and Artificial Systems" (36) in 1975 and De Jong finished an analysis of the behaviour of a class of genetic adaptive systems (35) in the same year. The basic idea of a GA is to encode the values of the parameters of an optimization problem in a chromosome, which is evaluated by an objective function. As shown in Figure 2.4, the algorithm starts by initializing or randomly generating a set of chromosomes (the population). At the end of each generation, each chromosome is evaluated and modified according to a number of genetic operations in order to produce a new population. This process repeats until a predefined number of generations has been computed or until the objective value of the population has converged.
2.4.1 Chromosome
Each individual in the population is represented by a chromosome that stores the values of the optimization problem. The chromosome is normally encoded as a list of bits, but its encoding and structure can vary. Each gene of the chromosome can have a specific allele, which specifies the range or the set of possible values that the gene can take. To evaluate each chromosome, an objective function must be defined. The objective function uses the values encoded in the chromosome to check how well the chromosome performs on the optimization problem. At the end of each generation, a number of genetic operations such as mutation and crossover are applied to each chromosome to produce the population for the next generation.
2.4.2 Mutation
Figure 2.4: Genetic Algorithm flow diagram

When a chromosome is passed on, there is a probability that some of its genes will not be copied correctly and will undergo a small mutation. Mutation ensures that the solutions of the new generation are not identical to those of the previous one. The mutation probability controls how much of the chromosome will mutate: a small probability leads to slower convergence, while a large probability leads to instability. The mutation operator can be defined in different ways; three basic mutation operations are described below.
1. Flip mutator will change a single gene of the chromosome to a random value according
to the range specified by the alleles.
2. Swap mutator will randomly swap a number of genes of the chromosome.
3. Gaussian mutator will pick a new value near the current value using a Gaussian distribution.
The mutation operation is defined according to the structure of the chromosome. When
the chromosome is stored in a tree, one possible mutation is to swap subtrees as shown in
Figure 2.5.
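As a minimal illustration (not the GAlib implementation), a flip mutator for a real-coded chromosome, such as the one used in chapter 3, can be sketched in C++ as follows; the names and signature are assumptions:

    #include <random>
    #include <vector>

    // Minimal sketch of a flip mutator for a real-coded chromosome: each gene
    // mutates with probability p to a fresh random value drawn from its allele
    // range [lo[i], hi[i]]. Names and signature are illustrative assumptions.
    void flip_mutate(std::vector<double>& genes, double p,
                     const std::vector<double>& lo, const std::vector<double>& hi,
                     std::mt19937& rng) {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        for (std::size_t i = 0; i < genes.size(); ++i) {
            if (coin(rng) < p) {
                std::uniform_real_distribution<double> allele(lo[i], hi[i]);
                genes[i] = allele(rng);  // replace the gene within its allele bounds
            }
        }
    }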
2.4.3 Crossover
Crossover is the process whereby two or more chromosomes are combined to form one or more new chromosomes. The idea behind crossover is that the offspring may be better than both parents.

Figure 2.5: Examples of mutation algorithms

Crossover is normally done between two individuals, but more can be used. There are many crossover algorithms; some of them are described below, and a C++ sketch of uniform crossover follows the list:
1. Uniform crossover randomly selects, for each gene, the parent it should come from.

2. Even odd crossover selects the genes with even index from parent A and the genes with odd index from parent B.

3. One point crossover randomly selects a position on the chromosome; all genes to the left come from parent A and all genes to the right come from parent B.

4. Two points crossover randomly selects two positions and takes from parent A the genes whose index lies between the two positions. The remaining genes come from parent B.

Figure 2.6: One and two points crossover

5. Partial match crossover produces two children C1 and C2. It initializes C1 by copying the chromosome of parent A and C2 by copying the chromosome of parent B. It then randomly selects a number of positions and swaps the genes between C1 and C2 at those positions.

6. Order crossover produces two children C1 and C2. It initializes them by copying the genes of the parents to the children and deleting n randomly selected genes from each offspring. It then selects an interval of size n and slides the genes such that the interval is empty. Finally, it fills the interval with the original genes at those positions taken from the opposite offspring. The algorithm is illustrated in Figure 2.7.

Figure 2.7: Order crossover examples

7. Cycle crossover produces two children C1 and C2. It initializes C1 and C2 by copying the chromosomes of parents A and B respectively. Then it selects n random positions and replaces the genes of C1 with the genes of parent B at those positions. The process is repeated for C2 with parent A.
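A minimal C++ sketch of the uniform crossover described in item 1 (illustrative, not the GAlib implementation); it assumes both parents have the same length:

    #include <random>
    #include <vector>

    // Minimal sketch of uniform crossover: each gene of the child is taken
    // from parent A or parent B with equal probability.
    std::vector<double> uniform_crossover(const std::vector<double>& a,
                                          const std::vector<double>& b,
                                          std::mt19937& rng) {
        std::bernoulli_distribution from_a(0.5);
        std::vector<double> child(a.size());
        for (std::size_t i = 0; i < a.size(); ++i)
            child[i] = from_a(rng) ? a[i] : b[i];  // pick the per-gene parent
        return child;
    }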
2.4.4 Objective and fitness value
The objective value is the performance measurement of each chromosome. This value is used to aid the selection of chromosomes for crossover. It can be used directly to select good chromosomes for crossover, but it is normally scaled to produce a fitness value. The scaling function is one method that can be used to minimize the elitism problem described in section 2.4.5, where only a limited number of chromosomes is involved in producing the next generation. The fitness value is then used to compute the compatibility of each chromosome for crossover; compatibility is used to ensure that good individuals are not combined with bad ones. Many methods exist to compute the fitness value; the most common scaling methods are described below.
1. Linear scaling:

   fitness = α · objectiveValue + β   (2.1)

2. Power law scaling:

   fitness = objectiveValue^α   (2.2)
3. Sharing scaling computes the number of genes that two chromosomes have in common. Two individuals are considered unfit for mating when their difference is very low, meaning that they are too similar. The difference can be computed using bitwise operations (37) or another user-specified method if the chromosome is not encoded as a bit string.
2.4.5 Selecting individuals for reproduction
Elitism and diversity are two important factors when selecting individuals for reproduction. With elitism, selection is biased towards the individuals with the best objective value. Elitism is important since it removes bad solutions from the population and reproduces the good ones. However, by continuously reproducing from a small set of individuals, the population becomes very similar, which may lead to a sub-optimal solution. To counter this effect, the diversity of the population ought to be controlled to ensure that the search space is explored well. Many selection schemas have been developed to properly select the individuals for reproduction while minimizing the elitism problem. Some of them are listed below; a code sketch of the roulette wheel method follows the list.
1. Rank schema always selects the best individuals of the population.

2. Roulette Wheel selects individuals according to their fitness values relative to the population. The probability of individual i being picked is

   p_i = fitness_i / Σ_j fitness_j,   (2.3)

   where the sum runs over the whole population.
3. Tournament sampling uses the roulette wheel method to select two individuals, then picks the one with the higher fitness value.

4. Uniform sampling selects an individual randomly from the population.

5. Stochastic remainder sampling first computes the probability p_i of each individual being selected and its expected representation ε = p_i · len(population). The expected representation is used to create a new population of the same size. For example, if an individual has ε equal to 1.7, it fills one position in the new population and has a probability of 0.7 of filling another position. After the new population is created, the uniform method is used to select the individuals for mating.

6. Deterministic sampling computes ε for each individual as in stochastic remainder sampling. A new population is created and filled with all individuals with ε > 1; the remaining positions are filled by sorting the original population by the fractional part of ε and selecting the individuals highest on the list.
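As a minimal illustration of Equation 2.3 (not the GAlib implementation), roulette wheel selection can be written in C++ as follows; it assumes at least one individual has positive fitness:

    #include <random>
    #include <vector>

    // Minimal sketch of roulette-wheel selection: the chance of picking
    // individual i is fitness[i] divided by the total fitness (Equation 2.3).
    std::size_t roulette_select(const std::vector<double>& fitness,
                                std::mt19937& rng) {
        std::discrete_distribution<std::size_t> wheel(fitness.begin(), fitness.end());
        return wheel(rng);  // index of the selected individual
    }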
2.4.6 GA Variations
There are three common types of Genetic Algorithm. They differ in how the new population is computed at the end of each generation.
1. Simple Genetic Algorithm uses a non-overlapping population between generations.
At each generation the population is completely replaced.
2. Steady-state Genetic Algorithm uses an overlapping population where a percentage
of the population is replaced by new individuals.
3. Incremental Genetic Algorithm has only one or two children replacing members of
the current population at the end of each generation.
Compared to other optimization algorithms, the genetic algorithm is relatively simple and robust (37). In the past, it has been successfully used to automatically generate test data that optimizes code coverage, as described in section 1.2. In this work, genetic algorithms are used to automatically generate a set of test cases and optimize the number of faults found. One of the main reasons we believe the genetic algorithm is a good approach for automatically generating test data is that it can adapt to the code being tested. It is plausible to assume that developers acquire bad habits over time, which leads to a pattern of mistakes. One assumption is that genetic algorithms may be able to detect some of these mistakes and tune the test data generation mechanism to exploit them.
3 Evolutionary testing
3.1 Implementation
To link the genetic algorithm to Autotest, an evolutionary testing strategy was implemented for Autotest. This strategy generates and executes test cases according to parameters specified in a chromosome produced by a genetic algorithm. To find a good strategy (chromosome), a genetic algorithm was implemented in C++ using the GAlib (38) library. The communication between Autotest and the genetic algorithm is done through two files. The four basic components of the system are shown in Figure 3.1.
Figure 3.1: Four basic components of the system
When Evolutionary Autotest is executed, it loads the chromosome from a file containing parameter settings for the Autotest test generator and tests the classes for a given amount of time. At the end, it produces a report containing the objective value (the number of unique faults found), which is used by the genetic algorithm to evaluate how good that chromosome is. The evolution of a testing strategy (chromosome) can be done for a single class or for a set of classes. In this work, however, the evolution of a testing strategy is performed for single classes.
3.1.1 Parameters
Genetic algorithms work by optimizing parameters for a given problem. In order to optimize
the generation of test cases, five different parameters have been used. These parameters
influence how the test cases are generated and how Autotest is executed.
1. primitive values: a set of values for each of the five primitive types (Integer, Real, Character, Boolean and Natural). These values are used for creating objects that are used as input data.
2. method call: specifies which methods should be called and which parameters should
be used for each method call. This parameter is used to set the software into different
states while it is being tested.
3. creation probability: probability of creating a new object instead of reusing one from
the object pool.
4. seed: value used to initialize the pseudorandom number generator.
5. sequential method call: calls the methods of the class under test sequentially and
selects input parameters for each method at random.
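For illustration, the five parameters above, together with flags stating which of them are in use, could be grouped as follows; this struct is an assumption for exposition only, since the thesis passes these values through chromosome files rather than an in-memory structure:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Illustrative grouping of the five optimized parameters and the on/off
    // flags that say which of them the evolutionary strategy should use.
    // All names are assumptions, not the thesis's data structures.
    struct TestStrategy {
        std::vector<double> primitive_values;    // values per primitive type
        std::vector<std::size_t> method_calls;   // raw indices into the method table
        double creation_probability = 0.25;      // new object vs. reuse from the pool
        std::vector<std::uint32_t> seeds;        // pseudorandom generator seeds
        bool use_method_call = false;            // use the encoded method calls
        bool sequential_method_call = false;     // walk the method table in order
    };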
As described in section 2.3, Autotest has a fixed set of primitive values, and it calls methods and creates objects randomly. Although a study (2) has shown that the creation probability parameter can be optimized, it is not obvious which parameters are good for evolutionary testing. When there is not enough time to optimize a parameter, it might be better to use a random strategy or a predefined value. The goal is to select the best set of parameters for each class, but because there are 2^5 = 32 possible sets of parameters, it is not feasible to test all of them for every class. The evolutionary algorithm optimizes all parameters, and a file is used to specify which parameters Autotest should use while executing the evolutionary strategy. Because finding the best set of parameters for each class is itself an optimization problem, the genetic algorithm can also be used to optimize the set of parameters. The chromosome has been used to specify the values of these parameters, but it can also be used to specify which parameters should be used. Thus, the genetic algorithm can be used to optimize both the set of parameters used and their values.
3.1.2 Algorithm stages
The chromosome is encoded as a list of floating point numbers because all the parameters can be represented as floating point numbers without much conversion. The implementation of the genetic algorithm is divided into four stages.
Figure 3.2: Four stages of the genetic algorithm
1. Specification: Create the chromosome and specify the alleles.
2. Initialization: Create the initial population.
3. Evaluation: Evaluate the population.
4. Mutation and Crossover: Apply mutation and reproduction operations.
3.1.2.1 Allele value specification
As described in section 2.4.1, alleles can be used to specify the range or the list of possible values allowed for each gene. Specifying the allele for each gene simplifies the chromosome encoding and interpretation. For example, the range of valid values for the Character data type is between 0 and 600, but by randomly selecting a floating point number, it is likely that a number outside this range would be selected, since the set of floating point numbers is much larger than the set of Characters. This would force the number to be rounded down to 600 or up to 0, and lead to a set of characters with similar values. By specifying the allele (0, 600), all the characters have the same probability of being picked. The chromosome is created given the number η of values that is encoded for each primitive type. The starting and finishing index and the allele specification for each parameter are shown in Appendix B.
3.1.2.2 Initialization
The seed, method call and creation probability parameters are initialized with random values from the ranges specified by the alleles. The primitive values may be initialized in three different ways (a sketch of the third follows the list):

1. Randomly: select random values from the range of values specified by the allele of each gene.

2. Hard coded: use the original values used by Autotest, as specified in Table A.1, and complete the set of values with random values.

3. Static analysis: in this approach, a simple technique is used to extract primitive values from the classes under test. The system works by scanning the classes for natural, integer, real and character values and storing them. Because the system does not consider the structure of the code, it will even use values found in comments. These values, combined with random values, are then used to initialize the chromosome: a probability of 0.8 is used to decide whether each value should come from the set of values obtained by static analysis, and of 0.2 whether it should come from a random value generator. This avoids initializing a population that is too similar; by introducing some random values, a level of diversity is guaranteed.
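A minimal C++ sketch of this mixed initialization (the 0.8/0.2 split is from the text above; the function and its signature are assumptions):

    #include <random>
    #include <vector>

    // Minimal sketch: a gene is seeded from the values mined by static
    // analysis with probability 0.8, otherwise drawn at random from its
    // allele range to keep the initial population diverse.
    double init_gene(const std::vector<double>& mined, double lo, double hi,
                     std::mt19937& rng) {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        if (!mined.empty() && coin(rng) < 0.8) {
            std::uniform_int_distribution<std::size_t> pick(0, mined.size() - 1);
            return mined[pick(rng)];   // value extracted from the class text
        }
        std::uniform_real_distribution<double> allele(lo, hi);
        return allele(rng);            // random value for diversity
    }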
3.1.2.3 Evaluation
When evaluating a chromosome, the genetic algorithm generates a set of files (shown in Appendix C) that contain the values of the parameters encoded in that chromosome for a specific class. Autotest is then executed to test a class for a fixed amount of time, and the number of unique faults found is used as the objective value.

Since each chromosome can be evaluated independently of the others, the evaluation of the population is executed in parallel. The parallel evaluation works by creating four instances of the code under test and calling Autotest for each one of them.

Figure 3.3: Parallel population evaluation

As illustrated in Figure 3.3, four individuals are evaluated in parallel. This number was chosen because the computer used for the experiments had four processors. For an optimal evaluation, the population size ought to be a multiple of four.
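A minimal C++ sketch of this batch-of-four evaluation; the Chromosome type and the run_autotest stand-in are assumptions, not the system's actual interface:

    #include <algorithm>
    #include <thread>
    #include <vector>

    struct Chromosome {
        std::vector<double> genes;
        int objective = 0;  // number of unique faults found
    };

    // Stand-in for "write the chromosome files, run Autotest, parse the report".
    int run_autotest(const Chromosome&) { return 0; }

    // Minimal sketch: evaluate the population in batches of four, matching
    // the four cores of the experiment machine.
    void evaluate_population(std::vector<Chromosome>& pop) {
        const std::size_t batch = 4;
        for (std::size_t i = 0; i < pop.size(); i += batch) {
            std::vector<std::thread> workers;
            const std::size_t end = std::min(i + batch, pop.size());
            for (std::size_t j = i; j < end; ++j)
                workers.emplace_back([&pop, j] { pop[j].objective = run_autotest(pop[j]); });
            for (auto& w : workers) w.join();
        }
    }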
3.1.2.4 Mutation and crossover
To test a piece of software thoroughly, it is important to test it in many different states. A state can be reached by a particular sequence of method calls; Autotest hopes to reach different states by randomly invoking methods. To map this behaviour onto the chromosome, the possibility of adding and removing a method call has to be considered, because some states can be reached in two method calls while others may require seven. Another problem is that each method call has a certain number of parameters of specific types. With these requirements, the crossover operation may produce a corrupted chromosome, since the number of method calls and the parameters of each method call may differ between chromosomes.

Figure 3.4 shows an example where the chromosome stores the method to be called together with its parameters in the same section of the chromosome. Chromosome X will call method a, method b and method a again. The problem is that method a takes two String parameters; the combined chromosome, however, will produce a call to method a that takes one String and one Integer. One possible solution to this problem was described by Tonella (24), who used a grammar to specify the syntax; this grammar was then used to drive the mutation and crossover operations.

Figure 3.4: Corrupted chromosome caused by crossover
In this project, a simpler approach was used to solve the same problem. First, the section of the chromosome that specifies which methods should be invoked is separated from the section that specifies which parameters should be used. When a method needs three parameters, it reads three slots from the parameter section of the chromosome; if the next method requires two parameters, it reads the next two slots. To ensure that the parameters are of the right type, the chromosome does not specify the object to be used but instead specifies an index into a list of objects, as shown in Figure 3.5. Since Autotest knows which types are needed to execute each method, the chromosome just needs to specify which object from the list of possible objects has to be used. Because the number of methods and the number of available objects are not known in advance, the chromosome assumes a maximum number, and the real index is computed as real index = chromosome index MOD list size, where list size is the size of the list of methods to call or of the list of available objects of a given type.

Figure 3.5: Valid chromosome crossover

With this approach, adding or removing a method call is very simple: whenever a mutation makes the real index 0, the method call is removed, and when the real index is modified from 0 to a different number, a method call is added. Different mutation and crossover methods can thus be used without having to worry about the chromosome getting corrupted. A short sketch of this decoding follows.
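A minimal C++ sketch of the decoding rule (illustrative only):

    #include <cstddef>

    // Minimal sketch: a gene stores a raw index; the real index is taken
    // modulo the list size, so any mutated value still decodes to a valid
    // entry and the chromosome cannot be corrupted. By convention, a real
    // index of 0 means "no call": mutating a gene to 0 removes the method
    // call, and mutating it away from 0 adds one.
    std::size_t real_index(std::size_t chromosome_index, std::size_t list_size) {
        return chromosome_index % list_size;
    }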
3.2 Evolutionary Autotest
The evolutionary strategy is executed by specifying the -E option when running Autotest. It starts by loading a file that specifies which parameters to use, together with the chromosome files that store the values of these parameters. It then checks whether the creation probability parameter is being used; if so, it sets the probability value. With Evolutionary Autotest, there are two new ways to select the methods to be tested. If the method call parameter is true, it computes the real index and selects the method with that index from the method table. If method call is false and sequential method is true, it selects the next method from the table of methods in sequential order. When both method call and sequential method are true, method call is used; a random method is selected if both parameters are false. After Autotest invokes a method, it checks whether the seed parameter is being used and, if so, selects a new seed from the list of seeds, as shown in Figure 3.6.
Figure 3.6: Evolutionary Autotest 1 - loading chromosome and evolve.conf
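The pseudocode of Figures 3.6 to 3.8 consists of images that are not reproduced in this transcript. The following C++ fragment is only a sketch of the selection rule just described; the names are assumptions:

    // Sketch of the method-selection rule described above. The enum and
    // function are illustrative, not the thesis's pseudocode.
    enum class Pick { FromChromosome, Sequential, Random };

    Pick choose_strategy(bool method_call, bool sequential_method) {
        if (method_call) return Pick::FromChromosome;  // wins when both flags are set
        if (sequential_method) return Pick::Sequential;
        return Pick::Random;  // all parameters false: fall back to random testing
    }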
The invoke method uses the creation probability to decide whether new objects should be created instead of reusing objects from the object pool. The pseudocode of this method is shown in Figure 3.7.
Figure 3.7: Evolutionary Autotest 2 - method call
When creating an object, Autotest checks whether the method call parameter is being used. If so, the constructor is selected according to the real index computed from a value in the list of method calls encoded in the chromosome; if not, Autotest selects a random constructor to instantiate the object. If it is creating a primitive type, it checks whether the primitive value parameter is true and, if so, gets a value from the list of primitive values loaded for each primitive type. Figure 3.8 shows the pseudocode for the create new input object method.
Figure 3.8: Evolutionary Autotest 3 - object creation
When all the parameters are false, the evolutionary strategy becomes a random strategy.
4 Experiments
4.1 Introduction
The experiments were divided into three groups, each concerned with a specific optimization of the system. The experiments of Group A were executed to find the best way to encode the chromosome, by examining the effect of each parameter on the number of faults found. The experiments of Group B were executed to optimize the genetic algorithm by evaluating different genetic operators and probabilities. The last set of experiments, Group C, was executed to evaluate the effectiveness of evolutionary testing compared to random testing.
4.2 Setting
Twenty-two classes were randomly selected from two well-used libraries, time and lex, provided with EiffelStudio 6.3. The lex library provides a mechanism for building lexical analyzers from regular expressions, and the time library provides an abstraction for date and time computation. The selected classes were divided into two sets, α and β. The α set, listed in Table 4.1, was used for optimizing and validating the system; the β set, listed in Table 4.2, was only used for validating the system. Tables 4.1 and 4.2 list, for each class, the number of lines of code (LOC), the number of local features and the number of features including the inherited ones.

All experiments were executed on a single machine with an Intel Core™2 Quad Q6600 and 2 GB of RAM, running Linux.
Library Class Name LOC Local features Total features
Lex automaton 70 7 38
dfa 103 5 42
error list 96 6 121
fixed automaton 77 4 99
fixed dfa 170 7 112
fixed integer set 207 10 58
high builder 820 39 410
Sum 1543 78 880
Time absolute 81 5 81
time 408 29 408
interval 365 30 365
duration 70 5 70
date time parser 379 32 379
date time validity checker 81 3 87
Sum 1384 104 1390
Total 2927 182 2270
Table 4.1: Properties of the classes from the α set
In each experiment, a number of parameters used by the genetic algorithm have to be specified. These parameters were set according to each experiment, with the goal of emphasizing the part of the algorithm being tested. For example, when evaluating the parameters for Autotest, it is important to have a bigger population size to increase the number of executions of Autotest. On the other hand, when evaluating crossover or mutation algorithms, it is important to increase the number of generations, since these operations are only performed at the end of each generation. In total, six different settings were used; they are specified in Table 4.3.
Library Class Name LOC Local features Total features
Lex text filler 488 28 59
lex builder 1288 55 365
lexical 608 42 99
linked automaton 52 1 116
metalex 201 11 420
ndfa 307 19 56
pdfa 440 22 311
scanning 145 6 426
state of dfa 74 3 101
Sum 3603 187 1953
Table 4.2: Properties of the classes from the β set
4.3 Experiments Group A
In section 3.1.1, a number of parameters that specify how test cases should be generated were identified, but not all of them might have a positive effect on the evolutionary algorithm. Some of these parameters might be very sensitive to modification or take very long to optimize, and this may lead to an overall poorer solution. The goal of the experiments in group A is to find the best set of parameters to encode in the chromosome.
Configuration Setting 1 Setting 2 Setting 3 Setting 4 Setting 5
population size 30 20 8 4 8
number of generations 4 6 10 10 10
mutation probability 0.6 α 0.6 0.4 0.4
crossover probability 0.8 0.8 0.6 0.4 0.4
replacement percentage 0.4 0.6 0.5 0.5 0.6
Table 4.3: Experiment settings
4.3.1 Experiment A1: Autotest parameters
As described in section 3.1.1, a total of five parameters that specify how test cases should be generated were identified. A total of 12 experiments were executed to find out how these parameters contribute to the number of unique faults found. These experiments were executed with genetic algorithm setting 1, specified in Table 4.3. The number of faults found for each class for the different sets of parameters is shown in Table 4.4.
The results show that the performance of the parameters depends on the class being tested. According to the results, there is no dominating parameter, as each parameter performed best for at least one class. For instance, the creation probability parameter, which had the worst performance overall, performed best for the error list class. The method call parameter performed best for both the dfa and date time validity checker classes. Since there are 32 possible combinations of parameters, it is infeasible to test all of them every time a class is tested; thus, the technique described in section 3.1.1 that optimizes the set of parameters used was developed. This experiment was therefore the only experiment that did not use this technique. This experiment also compared two methods for initializing the primitive values: the Primitives column of Table 4.4 shows the number of faults found when the primitive values are initialized randomly, while the Static analysis column shows the number of faults found when they are initialized by combining values extracted from the classes under test with random values, as described in section 3.1.2.2.
Library | Class name | Primitives | Primitives, Creation probability | Primitives, Sequential method call | Primitives, Seed | Primitives, Method call | Static analysis
Lex automaton 9 8 7 9 8 9
dfa 16 11 16 16 17 16
error list 3 7 3 6 6 6
fixed automaton 11 8 10 9 8 15
fixed dfa 21 13 19 21 17 19
fixed integer set 3 3 3 3 3 3
high builder 13 8 14 13 5 14
Sum 76 58 72 77 64 82
Time absolute 0 1 3 1 3 0
time 4 5 5 6 3 6
interval 3 3 3 3 2 4
duration 0 0 0 0 0 0
date time parser 1 1 0 1 1 1
date time validity checker 2 2 3 2 4 3
Sum 10 12 14 13 13 14
Total 86 70 86 90 77 96
Table 4.4: The effect of Autotest parameters on the number of unique faults found
The number of faults found using the static analysis technique increased considerably compared to random initialization; Figure 4.1 shows a comparison of the two approaches. Another interesting result was the poor performance of the creation probability parameter: optimizing the primitive values together with the creation probability decreased the number of faults found compared to optimizing the primitive values alone. This negative effect was due to the range of values (0 to 1) used for this probability. According to (39), Autotest performs badly when the creation probability is far from 0.25. To improve the performance of the creation probability parameter, the range of possible values was decreased to (0.2, 0.3).
4.4 Experiments Group B
As described in section 2.4, there are different mutation, crossover and population selection algorithms. In order to evaluate these genetic operators, a total of 65 experiments were executed.
Figure 4.1: Number of faults found using random and static analysis techniques to select the initial primitive values
4.4.1 Experiment B1: Mutation probability
The mutation probability controls how often the mutation operator is applied to each gene. When the probability is too low, the genetic algorithm takes longer to converge; when it is too high, the algorithm becomes unstable. To find the best value, 10 experiments were executed with five different mutation probabilities. The flip mutation algorithm and genetic algorithm setting 2 were used in these experiments.

As shown in Table 4.5, the mutation probability does not seem to have a big impact on the overall performance as long as the probability is not too low. In this case, the mutation probability of 0.4 was just slightly better than 0.8, finding two more unique faults.
4.4.2 Experiment B2: Mutation algorithm
The mutation algorithm has a direct impact on how the search space is explored. The
three mutation algorithms described in section 2.4.2 were evaluated. To find which mutation
algorithm performed best, a total of 6 experiments were executed with setting 5. The number
of unique faults found for each class is shown in Table 4.6.
The results show that the flip mutation algorithm outperformed the swap mutation algorithm by 36% and the Gaussian by 32%. The flip mutator performed best for all classes except error list, as illustrated in Figure 4.2.
Library Class Name 0.2 0.4 0.6 0.8 1
Lex automaton 8 9 9 8 9
dfa 16 15 16 15 15
error list 4 6 5 5 5
fixed automaton 9 9 9 9 9
fixed dfa 17 18 18 18 18
fixed integer set 3 3 3 3 3
Sum 57 60 60 58 59
high builder 13 15 13 14 14
Time absolute 3 3 3 3 4
time 6 6 5 6 6
interval 3 3 3 3 3
duration 0 0 0 0 0
date time parser 1 1 1 2 1
date time validity checker 4 3 4 4 3
Sum 30 32 29 32 31
Total 87 92 89 90 90
Table 4.5: The effect of the mutation probability on the number of unique faults found
Library Class Name Flip Swap Gaussian
Lex automaton 9 8 9
dfa 17 16 15
error list 5 4 6
fixed automaton 20 9 9
fixed dfa 24 19 18
fixed integer set 3 3 3
high builder 16 14 15
Sum 94 73 75
Time absolute 5 2 3
time 6 5 5
interval 6 3 3
duration 0 0 0
date time parser 5 2 1
date time validity checker 4 3 4
Sum 26 25 16
Total 120 88 91
Table 4.6: The effect of mutation algorithms on the number of unique faults found
of the swap mutator is that it never introduces new values into the chromosome, thus limiting the search space to the values already present. One possible reason for the poor performance of the gaussian mutator compared to the flip mutator is that gaussian mutation replaces the value of a gene with a nearby value, which restricts exploration to states close to the current state.
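The behavioural difference between the three mutators can be summarized in a few lines of code. The sketch below is an illustration in Python over real-valued genes with a single shared allele range; the thesis chromosome uses a different range per section, but the contrast between the operators is the same.

import random

def flip(genes, low, high):
    # Replaces a random gene with an entirely new value: introduces
    # material that was never in the chromosome before.
    i = random.randrange(len(genes))
    genes[i] = random.uniform(low, high)

def swap(genes):
    # Exchanges two genes: only rearranges values that already exist,
    # so the search stays inside the current set of values.
    i, j = random.sample(range(len(genes)), 2)
    genes[i], genes[j] = genes[j], genes[i]

def gaussian(genes, low, high, sigma=0.1):
    # Perturbs a random gene by a small normally distributed step:
    # explores only states close to the current one.
    i = random.randrange(len(genes))
    genes[i] = min(high, max(low, genes[i] + random.gauss(0.0, sigma)))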
Figure 4.2: Number of faults found by each mutation algorithm for each class
4.4.3 Experiment B3: Crossover probability
The crossover probability controls how much of the population undergoes crossover. A low crossover probability may lead to very slow convergence, whereas a high value may lead to a high number of unfit individuals. To find a good crossover probability, a total of 14 experiments were executed using the uniform crossover algorithm and genetic algorithm setting 5. Table 4.7 shows the number of unique faults found for each class.
Library Class Name 0 0.1 0.2 0.4 0.6 0.8 1
Lex automaton 9 8 9 9 10 7 7
dfa 7 15 15 17 15 14 15
error list 5 6 5 6 10 10 4
fixed automaton 16 10 19 24 12 10 15
fixed dfa 24 20 26 28 30 17 18
fixed integer set 3 3 3 3 3 3 3
high builder 16 17 16 17 18 14 16
Sum 86 79 93 104 98 75 78
Time absolute 5 4 4 5 5 5 5
time 9 6 7 7 7 6 6
interval 4 5 5 5 6 5 5
duration 0 0 0 0 0 0 0
date time parser 5 4 4 4 4 5 4
date time validity checker 4 5 5 5 4 5 5
Sum 27 24 25 26 26 26 25
Total 113 103 118 130 124 101 103
Table 4.7: The effect of crossover probability on the number of unique faults found
The results show that the best crossover probability is around 0.4. Compared to the mutation probability, the crossover probability had a greater influence on the results: as shown in Figure 4.3, the crossover probability forms a curve with a peak at 0.4, whereas the mutation probability curve is nearly flat. This indicates that the crossover algorithm may have a greater influence on the number of faults found than the mutation algorithm.
Figure 4.3: Effect of the mutation and crossover probabilities on the number of faults
4.4.4 Experiment B4: Crossover algorithm
Combined with the mutation algorithm, the crossover algorithm specifies how the search space is explored. The crossover algorithm must be able to combine chromosomes in a way that affects all the values encoded in the chromosome. Since the chromosome is encoded in sections, where each section represents the values of a single parameter, the crossover algorithm must mix all sections of the chromosome well. To find a good crossover algorithm, all algorithms described in section 2.4.3 were evaluated in a total of fourteen experiments using setting 5. The results are shown in Table 4.8.
Library Class Name Uniform Even-Odd One-Point Two-Point Partial-Match Order Cycle
Lex automaton 8 8 8 9 10 9 8
dfa 14 15 16 15 15 15 14
error list 6 7 5 7 10 5 5
fixed automaton 19 18 19 17 12 16 16
fixed dfa 27 25 27 21 30 26 23
fixed integer set 3 3 3 3 3 3 3
high builder 18 14 18 16 18 13 16
Sum 95 90 86 88 98 87 85
Time absolute 5 5 5 4 5 4 4
time 6 7 6 7 7 6 6
interval 5 5 5 5 6 4 6
duration 0 0 0 0 0 0 0
date time parser 4 4 5 5 4 4 5
date time validity checker 5 5 5 4 4 6 4
Sum 25 26 26 25 26 23 25
Total 120 116 122 113 124 110 110
Table 4.8: The effect of crossover algorithms on the unique number of faults found
The results indicate that the crossover algorithms that modify more sections of the chromosome performed better than the ones that modify only a few. By modifying different parts of the chromosome, the algorithm has a higher chance of changing the values of all encoded parameters instead of just one. As shown in Figure 4.4, the uniform and partial match crossovers performed much better than the order and cycle crossovers. The discrepancy between the results of the one-point and two-point crossover algorithms is surprising; the difference in the number of unique faults found seems too large to be attributed to random variation. More experiments would be required to investigate why one-point crossover performed so much better than two-point.
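The observation that operators touching many sections mix better can be made concrete. Below is an illustrative Python sketch (not the thesis code) contrasting one-point and uniform crossover on a chromosome laid out as consecutive parameter sections, together with the probability gate studied in experiment B3.

import random

def one_point(a, b):
    # One cut: everything before the cut comes from one parent, everything
    # after it from the other, so sections far from the cut are never mixed.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def uniform(a, b, p_swap=0.5):
    # Every gene is exchanged independently, so all sections of the
    # chromosome have a chance of being recombined.
    c, d = list(a), list(b)
    for i in range(len(a)):
        if random.random() < p_swap:
            c[i], d[i] = d[i], c[i]
    return c, d

def maybe_cross(a, b, p_crossover=0.4, operator=uniform):
    # The crossover probability decides whether a selected pair recombines
    # at all; with 0.4, roughly 40% of the pairs produce mixed offspring.
    return operator(a, b) if random.random() < p_crossover else (list(a), list(b))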
Figure 4.4: Comparison of crossover algorithms
4.4.5 Experiment B5: Selection method
The selection algorithm plays an important role in controlling the diversity of the population. As described in section 2.4.5, if the same individuals are constantly selected for reproduction, the population may become too similar and converge to a sub-optimal solution. To find a good selection mechanism, all algorithms described in section 2.4.5 were evaluated in a total of 12 experiments with setting 4. The results are shown in Table 4.9.
Figure 4.5: Comparison of selection algorithms
Library Class Name Rank Roulette-Wheel Tournament Uniform Stochastic-Remainder Deterministic-Sampling
Lex automaton 7 9 9 9 9 12
dfa 15 14 16 14 15 14
error list 10 5 5 6 5 4
fixed automaton 14 14 12 9 17 15
fixed dfa 18 18 20 20 21 21
fixed integer set 3 3 3 3 3 3
high builder 17 16 14 12 15 14
Sum 84 79 79 73 85 83
Time absolute 4 4 4 4 5 4
time 8 5 6 6 7 8
interval 5 4 4 5 4 5
duration 0 0 0 0 0 0
date time parser 4 4 4 4 4 4
date time validity checker 5 5 5 4 5 5
Sum 26 22 23 23 25 26
Total 110 101 102 96 110 109
Table 4.9: The effect of different population selection schemes on the number of unique faults found
Both the stochastic remainder and the rank method found a total of 110 unique faults. The rank algorithm does not perform any speciation technique; it simply selects the best individuals, whereas the stochastic remainder normalizes the population as described in section 2.4.5. Among the six selection schemes evaluated, three had similar results, as shown in Figure 4.5. These results indicate that elitism has a positive effect on the number of unique faults found by the genetic algorithm, since the three best performing algorithms had a greater probability of selecting the individuals with the best scores; in particular, the rank algorithm constantly selects the best-scoring individuals for reproduction. Crowding does not seem to be an issue for this type of genetic algorithm. One possible reason is the small number of generations: because the computation of the objective function takes a long time, the number of generations has to be limited, which does not give crowding enough time to occur.
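As an illustration of the two best-performing schemes, the sketch below contrasts rank selection with stochastic remainder selection. It is a simplified Python rendering under the assumption that individuals are (chromosome, fitness) pairs with higher fitness being better and at least one positive fitness; the actual selectors come from GAlib (38).

import random

def rank_selection(population, n):
    # Deterministically keeps the n fittest individuals: strong elitism.
    return sorted(population, key=lambda ind: ind[1], reverse=True)[:n]

def stochastic_remainder(population, n):
    # Each individual first receives floor(expected copies) guaranteed slots;
    # the fractional remainders then compete for the slots that are left.
    total = sum(f for _, f in population)
    selected, remainders = [], []
    for chrom, f in population:
        expected = f * n / total
        selected.extend([chrom] * int(expected))
        remainders.append((chrom, expected - int(expected)))
    while len(selected) < n:
        # Roulette over the fractional remainders fills the remaining slots.
        r = random.uniform(0, sum(w for _, w in remainders))
        acc = 0.0
        for chrom, w in remainders:
            acc += w
            if r <= acc:
                selected.append(chrom)
                break
    return selected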
4.5 Experiments Group C
With the optimization of the chromosome and the genetic algorithm completed, the final sys-
tem is compared to an automated random testing system represented by Autotest. The main
criterion for evaluation is the number of unique faults found within a given amount of time. To make the comparison fair, Autotest is executed for the same amount of time taken by the evolutionary algorithm to evolve a testing strategy and test the system with it. Three executions of Autotest with different seeds were recorded for 15, 30 and 60 minutes, and the average results are used for comparison. In addition to the set of classes used in the previous experiments, a new set of classes, described in section 4.2, was used to validate the results.
4.5.1 Experiment C1: Original Autotest
To determine the maximum number of unique faults Autotest can find within a certain
amount of time, Autotest was executed three times for 15, 30 and 60 minutes. The average
number of faults found for each class from the α set are shown in Table 4.10 and for the β
set in Table 4.11.
Library Class Name 15 min 30 min 60 min
Lex automaton 9 9 9
dfa 17 17 17
error list 12 13 14
fixed automaton 10 10 11
fixed dfa 24 28 29
fixed integer set 1 1 1
high builder 19 24 26
Sum 93 103 107
Time absolute 1 1 1
time 5 6 6
interval 1 1 1
duration 0 0 0
date time parser 0 0 0
date time validity checker 1 1 1
Sum 8 9 9
Total 101 112 116
Table 4.10: Number of unique faults found in the α set by the original Autotest
The variation across the three executions was small. Classes with a higher number of faults showed greater variation than those with fewer faults. The variation in the
Library Class Name 15 min 30 min 60 min
Lex text filler 12 13 14
lex builder 16 18 22
lexical 24 26 26
linked automaton 8 10 10
metalex 22 29 33
ndfa 8 9 9
pdfa 11 12 13
scanning 29 31 34
state of dfa 11 18 13
Sum 140 160 174
Table 4.11: Number of unique faults found in the β set by the original Autotest
time library was very small: the total number of faults found in the time library classes differed by only one fault from the average. Figure 4.6 shows the variation in the total number of faults found in all classes across the three executions of Autotest.
Figure 4.6: Variation in the total number of faults found
The total number of faults found in each execution and the average for 15, 30 and 60
minutes is shown in Table 4.12.
Execution 15 min 30 min 60 min
Average 241 272 290
Execution 1 236 268 282
Execution 2 244 276 295
Execution 3 243 272 293
Table 4.12: Total number of faults found in three executions of Autotest
It can also be observed that most of the faults were found in the first few minutes of execution and that the rate of new faults found decreased rapidly, as shown in Figure 4.7. This result is in accordance with the results of the random testing predictability study (39).
Figure 4.7: Autotest progress - Progress of the number of faults found over time
4.5.2 Experiment C2: Autotest with static analysis
Using static analysis to select an initial set of primitive values is a technique that is independent of the evolutionary algorithm. Because this technique was used in the evolutionary testing system, Autotest was extended to use the same static analysis technique for selecting primitive values. As described in section 3.1.2.2, this approach extracts primitive values from the classes under test to use as input for test data generation. This enhanced Autotest was then executed for 15, 30 and 60 minutes; the number of faults found in the α set is shown in Table 4.13 and the number of faults found in the β set in Table 4.14.
Library Class Name 15 min 30 min 60 min
Lex automaton 6 6 6
dfa 16 16 16
error list 11 12 15
fixed automaton 14 22 31
fixed dfa 30 35 36
fixed integer set 2 2 2
high builder 19 16 28
Sum 98 109 134
Time absolute 2 2 2
time 5 6 8
interval 3 3 3
duration 0 0 0
date time parser 1 1 3
date time validity checker 3 3 3
Sum 14 15 19
Total 112 124 153
Table 4.13: Faults found in the α set by Autotest with static analysis
Library Class Name 15 min 30 min 60 min
Lex text filler 14 15 15
lex builder 15 17 27
lexical 27 30 30
linked automaton 8 8 8
metalex 27 30 33
ndfa 9 9 9
pdfa 11 15 19
scanning 24 30 40
state of dfa 17 20 25
Sum 152 174 206
Table 4.14: Faults found in the β set by Autotest with static analysis
Autotest with static analysis found more faults than the original Autotest for the α set, as shown in Figure 4.8, and for the β set, as shown in Figure 4.9. This was expected, since using primitive values extracted from the source code increases the probability that the generated test cases are relevant. For example, if the source code contains a boolean expression that evaluates x == 782, the probability of randomly generating the value 782 is very low, but with static analysis this value is immediately added to the list of relevant values.
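A minimal sketch of this kind of literal extraction, assuming Eiffel-like source text and using regular expressions instead of a real parser; the function name and patterns are illustrative, not the thesis implementation:

import re

def extract_primitive_literals(source: str):
    # Collect integer, real and character literals appearing in the source;
    # e.g. the 782 in "x = 782" immediately becomes a candidate input value.
    ints = {int(m) for m in re.findall(r"(?<![\w.])-?\d+(?![\w.])", source)}
    reals = {float(m) for m in re.findall(r"-?\d+\.\d+(?:[eE][+-]?\d+)?", source)}
    chars = set(re.findall(r"'(.)'", source))
    return ints, reals, chars

snippet = "if x = 782 then io.put_character ('a') end"
print(extract_primitive_literals(snippet))
# A test data generator would then mix these literals with random values.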
4.5.3 Experiment C3: Evolutionary testing
With the crossover, mutation and selection algorithms and their probabilities selected, the final genetic algorithm can be composed. The final genetic algorithm uses the partial match crossover algorithm with a crossover probability of 0.4, the flip mutator with a mutation probability of 0.4 and the stochastic remainder algorithm to select the population for crossover. With these parameters, the evolutionary algorithm was executed for 15, 30 and 60 minutes for both the α and the β set. This time includes both the evolution and the execution of a strategy. The time allocation used for each execution is shown in Table 4.15. These values were chosen based on a preliminary optimization (results not shown).
Total time Evolution Execution
15 min 5 10
30 min 10 20
60 min 15 45
Table 4.15: Time allocation for each execution of the final system
Between the three executions, the population size, the number of generations and the size of the chromosome varied. For short executions it is better to have a smaller chromosome: it converges faster, but it may converge to a sub-optimal solution, so for longer executions a larger chromosome is preferable. This effect is mainly due to the number of primitive values encoded in the chromosome; as described in section 3.1.2.1, this number defines the size of the chromosome. The settings used for the three executions are shown in Table 4.16.
Setting 15 min 30 min 60 min
primitive set size 40 100 100
population size 4 8 8
number of generations 9 12 12
mutation probability 0.4 0.4 0.4
crossover probability 0.4 0.4 0.4
Table 4.16: Settings for the final execution
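For clarity, the final configuration of Tables 4.15 and 4.16 can be collected in one place. The Python sketch below uses made-up field names purely to summarize how the chosen operators and time budgets fit together; it is not code from the system.

from dataclasses import dataclass

@dataclass
class FinalGAConfig:
    primitive_set_size: int            # defines the chromosome size (section 3.1.2.1)
    population_size: int
    generations: int
    evolve_minutes: int                # time spent evolving a strategy
    execute_minutes: int               # time spent running the evolved strategy
    mutation: str = "flip"             # winner of experiment B2
    mutation_probability: float = 0.4  # experiment B1
    crossover: str = "partial_match"   # experiment B4
    crossover_probability: float = 0.4 # experiment B3
    selection: str = "stochastic_remainder"  # experiment B5

# The 15-, 30- and 60-minute budgets used in experiment C3.
CONFIGS = {
    15: FinalGAConfig(40, 4, 9, evolve_minutes=5, execute_minutes=10),
    30: FinalGAConfig(100, 8, 12, evolve_minutes=10, execute_minutes=20),
    60: FinalGAConfig(100, 8, 12, evolve_minutes=15, execute_minutes=45),
}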
The evolutionary algorithm was first executed for the 13 classes from the α set. The
number of unique faults found for each class is reported in Table 4.17.
Library Class Name 15 min 30 min 60 min
Lex automaton 9 9 9
dfa 16 17 17
error list 13 13 15
fixed automaton 17 23 31
fixed dfa 32 39 41
fixed integer set 2 2 2
high builder 18 23 26
Sum 107 126 141
Time absolute 3 2 4
time 6 5 10
interval 3 5 5
duration 0 0 0
date time parser 0 5 5
date time validity checker 3 3 3
Sum 15 20 27
Total 122 146 168
Table 4.17: The number of unique faults found in the α set by the GA
Figure 4.8: Evolutionary approach on the α set - Number of faults found in the α set
A comparison between the original Autotest, Autotest with static analysis and the evolutionary algorithm, illustrated in Figure 4.8, shows that the evolutionary algorithm outperformed both. Because the classes from the α set were used for optimization, the system was also tested against a new set of classes, the β set. The number of faults found in the β set using the evolutionary algorithm is shown in Table 4.18.
Library Class Name 15 min 30 min 60 min
Lex text filler 16 15 16
lex builder 18 24 34
lexical 27 28 32
linked automaton 7 8 11
metalex 25 31 43
ndfa 9 9 11
pdfa 12 14 23
scanning 27 34 41
state of dfa 16 24 29
Sum 157 187 240
Table 4.18: The number of unique faults found in the β set by the GA
Figure 4.9: Evolutionary testing on the β set - Comparison of the three approaches using classes from the β set
The results for the β set also show the evolutionary algorithm as the best performing approach. However, the difference between Autotest with static analysis and the evolutionary algorithm is very small. One possible reason is that the evolutionary algorithm pays a large penalty for short executions, because it does not have enough time to evolve a good strategy. This penalty is even higher when a class has a high number of faults, since most of these faults are found within the first minutes. With longer executions, the time used to evolve a good strategy starts to pay off: while the rate at which Autotest finds new faults slows down, the evolutionary approach continues to find new faults at a faster pace. This pattern is shown in Figure 4.10.
Figure 4.10: Total number of faults found for all classes over time by the three approaches
The evolutionary algorithm performed considerably better when the classes being tested had fewer faults. This can be seen in Figure 4.11, which shows a comparison of the three systems using only the classes from the time library.
Figure 4.11: Evolutionary approach on the time library - Comparison of the three approaches using classes from the time library
5 Discussion
5.1 Types of faults found
In order to find out which types of faults are discovered by the evolutionary algorithm, the faults found by running it for 60 minutes on the metalex class were analyzed. A total of 43 faults were found in 37 features; 8% of the metalex features contained at least one fault. There were essentially four types of faults:
1. Class invariant violation: one of the class invariant conditions is violated.
2. Precondition violation: the method violated another method's precondition.
3. Call on a void object: the method tried to invoke a feature on a void object.
4. Memory / OS related: out-of-memory or file-not-found faults.
As Figure 5.1 shows, the most predominant type of fault found was the precondition violation, followed by the call on a void object. The call-on-void-object faults are less relevant than the other types because Eiffel is becoming void-safe; that is, the compiler will guarantee that there are no calls on void objects.
Figure 5.1: Distribution of the types of faults found in the metalex class
5.2 Parameters
The genetic algorithm was used both to find the parameter values and to determine which parameters should be used. During the initialization of the chromosome, the genes that specify whether a parameter should be used are randomly initialized to a value between -1 and 1. When this value is negative, the corresponding parameter is not used. This initialization
assumes that all parameters have the same importance to the system, but this is not true.
By analyzing the chromosomes evolved during the 60-minute executions of the evolutionary algorithm in experiment C3, we can estimate the importance of each parameter to the system: a higher usage frequency means greater importance. Figure 5.2 shows how often each parameter was used.
One notices that the primitive values parameter was used in all successful strategies,
which highlights the importance of this parameter. The seed parameter, on the other hand,
was only used 7.69% of the time. It is important to consider these usage frequency values,
because they directly affect chromosome initialization and thus could considerably decrease
the time needed to evolve good strategies. For example, the evolution of the primitive values
should always be used, whereas the seed parameter can most likely be left out. Although
these values can be used to initialize the chromosome, experiment A1 showed that each class benefits differently from the optimization of each parameter. From this we can conclude that evolutionary testing is specific to a single class and might not generalize well to a set of classes.
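A hypothetical sketch of how the usage frequencies could bias this initialization; the parameter names and the bias interface below are illustrative, not part of the system:

import random

PARAMETERS = ["primitive_values", "seed", "creation_probability", "method_call"]

def init_usage_genes(bias=None):
    # Each parameter gets a usage gene in [-1, 1]; the parameter is enabled
    # only when its gene is positive. Without a bias every parameter starts
    # with a 50% chance of being enabled, as in the thesis.
    genes = {}
    for p in PARAMETERS:
        if bias and p in bias:
            enabled = random.random() < bias[p]
            genes[p] = random.uniform(0, 1) if enabled else random.uniform(-1, 0)
        else:
            genes[p] = random.uniform(-1, 1)
    return genes

# Seeding the bias with the observed frequencies: primitive values appeared
# in every successful strategy, the seed in only 7.69% of them.
genes = init_usage_genes(bias={"primitive_values": 1.0, "seed": 0.08})
enabled = [p for p, g in genes.items() if g > 0]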
Figure 5.2: Usage frequency of each parameter
5.3 Conclusion
Based on the results presented in this thesis, we draw a number of conclusions. These
conclusions are grouped in three sections below.
1. Genetic operators
In this work, 3 mutation, 7 crossover and 6 population selection algorithms were evaluated. The results of experiment B2 showed that it is important to introduce new random values into the chromosome and to increase the diversity of the population. Among the three mutation algorithms evaluated, the flip mutation algorithm was the best: it outperformed the other two mutation algorithms 92% of the time. The performance of the crossover algorithm seems to depend on how the chromosome is defined. Because the chromosome, in this case, contained many parameters, the crossover algorithms that affected many sections of the chromosome performed better. One of the main purposes of the population selection algorithm is to control the crowding problem. However, the system did not seem to have a crowding problem. One possible reason is the low number of generations: because the evaluation of the population takes considerable time, the number of generations has to be low, which may not give crowding enough time to appear.
2. Static analysis
The results from experiment C2 showed that even a very basic static analysis can increase the efficiency of automatic test case generators. By combining random primitive values with the ones extracted from the classes being tested, the number of unique faults found by Autotest increased by 15% on average.
3. Evolutionary testing
The main goal of this project was to use a genetic algorithm to find more unique faults than random testing. Because the number of new unique faults found by Autotest decreases considerably over time, as shown in Figure 4.7, the goal for the evolutionary algorithm was to converge to a higher value. The evolutionary algorithm was not expected to outperform random testing within 15 minutes, since the evolution of a strategy takes a long time. However, by evolving a strategy for 5 minutes and executing it for 10 minutes, the genetic algorithm was able to find more unique faults than both the original Autotest and Autotest with static analysis. The evolutionary testing approach performed even better on classes with fewer faults. For example, Autotest could not find a single fault in the date time parser class, but the evolutionary algorithm found 5 faults. Another advantage of the evolutionary algorithm is that a testing strategy can be reused: the strategy can be evolved once and reused across multiple executions, whereas random testing has to find interesting test cases from scratch at every execution.
5.4 Considerations
1. Since the numbers of unique faults found for each of the 22 classes were added up, some unique faults might have been counted multiple times. Autotest tests all features (including inherited features), so if two classes inherit from the same class and that class has a bug, the bug is counted as a fault for both classes. This issue does not threaten the comparison between evolutionary testing and random testing, since both strategies were tested under the same conditions. It is possible, however, that the numbers of faults found by both Autotest and the evolutionary testing approach were inflated by this issue.
2. Time was the main resource used for comparison in this study; processing power was not taken into account. The evolutionary algorithm used four threads to evolve a strategy, whereas Autotest was executed in a single thread. Accounting for processing power could slightly reduce the measured improvement of evolutionary testing for short executions, but because random testing converges slowly, the number of faults found in longer executions would not be affected.
5.5 Further improvement
Some possible improvements and research directions are presented below.
• Code coverage - at the moment, only the number of unique faults found is used by the genetic algorithm to optimize the automatic generation of test cases. It might be useful to also consider code coverage in the objective function.
• Efficiency - the efficiency of the algorithm can be improved by combining the genetic
algorithm and Autotest into a single system. At the moment, every time Autotest is
invoked from the genetic algorithm, it has to load and parse the class under test.
• Reusing strategies - the strategies evolved by the genetic algorithm may be reused
when evolving a new strategy for a new class.
• Static analysis - this system uses a very naive static analysis technique; a more advanced technique could be used to extract primitive values and to generate values around them.
Appendix A
Primitive Values
Primitive type Values
BOOLEAN True, False
CHARACTER 8 1 to 255
CHARACTER 32 1 to 600
REAL 32 -100.0, -2.0, -1.0, 1.0, 2.0, 100.0, 3.40282e+38, 1.17549e-38, 1.19209e-07
REAL 64 -1.0, 1.0, -2.0, 2.0, 0, 3.14159265358979323846, -2.7182818284590452354, 2.2250738585072014e-308, 2.2204460492503131e-16, 1.7976931348623157e+308
INTEGER 8 -100, -10, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max
INTEGER 16 -100, -10, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max
INTEGER 32 -100, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max
INTEGER 64 -100, -10, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max
NATURAL 8 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max
NATURAL 16 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max
NATURAL 32 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max
NATURAL 64 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max
Table A.1: Autotest primitive values.
Appendix B
Chromosome specification
# Parameter Starting index Finishing index Allele values
1 Boolean 0 (1 ∗ η)− 1 -1, 1
2 Char 32 (1 ∗ η)− 1 (2 ∗ η)− 1 0, 600
3 Char 8 (2 ∗ η)− 1 (3 ∗ η)− 1 0, 255
4 Integer 16 (3 ∗ η)− 1 (4 ∗ η)− 1 -32768, 32767
5 Integer 32 (4 ∗ η)− 1 (5 ∗ η)− 1 -2147483648, 2147483647
6 Integer 64 (5 ∗ η)− 1 (6 ∗ η)− 1 -9223372036854775808, 9223372036854775807
7 Integer 8 (6 ∗ η)− 1 (7 ∗ η)− 1 -128, 127
8 Natural 16 (7 ∗ η)− 1 (8 ∗ η)− 1 0, 65535
9 Natural 32 (8 ∗ η)− 1 (9 ∗ η)− 1 0, 4294967295
10 Natural 64 (9 ∗ η)− 1 (10 ∗ η)− 1 0, 18446744073709551615
11 Natural 8 (10 ∗ η)− 1 (11 ∗ η)− 1 0, 255
12 Seed (11 ∗ η)− 1 (12 ∗ η)− 1 0, 100
13 Real 32 (12 ∗ η)− 1 (13 ∗ η)− 1 -1.0e30, 1.0e30
14 Real 64 (13 ∗ η)− 1 (14 ∗ η)− 1 -1.0e30, 1.0e30
15 Method call (14 ∗ η)− 1 (15 ∗ η)− 1 1, 100
16 creation probability (15 ∗ η)− 1 (16 ∗ η) 0.15, 0.3
17 evolving primitive (16 ∗ η) + 1 (17 ∗ η) + 2 -1.0, 1.0
18 evolving seed (17 ∗ η) + 2 (18 ∗ η) + 3 -1.0, 1.0
19 evolving creation probability (18 ∗ η) + 3 (19 ∗ η) + 4 -1.0, 1.0
20 sequential method invocation (19 ∗ η) + 4 (20 ∗ η) + 5 -1.0, 1.0
21 evolving method call (20 ∗ η) + 5 (21 ∗ η) + 6 -1.0, 1.0
Table B.1: Chromosome specification.
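The boundaries in Table B.1 follow a mechanical pattern driven by the primitive set size η. The sketch below reproduces that pattern in Python for the sixteen η-sized sections, ignoring the table's occasional off-by-one offsets; it is an illustration of the layout, not code extracted from the system.

# Each of the first sixteen parameter sections holds eta consecutive genes.
ETA = 40  # primitive set size used in the 15-minute setting (Table 4.16)

SECTIONS = ["Boolean", "Char 32", "Char 8", "Integer 16", "Integer 32",
            "Integer 64", "Integer 8", "Natural 16", "Natural 32",
            "Natural 64", "Natural 8", "Seed", "Real 32", "Real 64",
            "Method call", "creation probability"]

def section_bounds(eta):
    # Section k starts where section k - 1 ended and spans eta genes.
    return {name: (k * eta, (k + 1) * eta - 1) for k, name in enumerate(SECTIONS)}

for name, (start, end) in section_bounds(ETA).items():
    print(f"{name:22s} genes {start} .. {end}")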
Appendix C
Chromosome files
# Parameter File name
1 Boolean boolean.txt
2 Char 32 character 32.txt
3 Char 8 character 8.txt
4 Integer 16 integer 16.txt
5 Integer 32 integer 32.txt
6 Integer 64 integer 64.txt
7 Integer 8 integer 8.txt
8 Natural 16 natural 16.txt
9 Natural 32 natural 32.txt
10 Natural 64 natural 64.txt
11 Natural 8 natural 8.txt
12 Seed seed.txt
13 Real 32 real 32.txt
14 Real 64 real 64.txt
15 Method call method call sequence.txt
16 creation probability creation probability.txt
Table C.1: Chromosome files.
Bibliography
[1] Y. Cheon and G. T. Leavens. A simple and practical approach to unit testing: The JML and JUnit way. Technical Report 01-12, Department of Computer Science, Iowa State University, Nov. 2001.
[2] I. Ciupa, A. Leitner, M. Oriol, and B. Meyer. Experimental assessment of random testing for object-oriented software. In Proceedings of the International Symposium on Software Testing and Analysis 2007 (ISSTA'07), pages 84-94, 2007.
[3] NIST (National Institute of Standards and Technology): The Economic Impacts of Inadequate Infrastructure for Software Testing, Report 7007.011, available at www.nist.gov/director/prog-ofc/report02-3.pdf
[4] Eric Bezault et al.: Gobo library and tools, at www.gobosoft.com.
[5] Xanthakis, S., Ellis, C., Skourlas, C., Le Gall, A., Katsikas, S., Application of Genetic Algorithms to Software Testing, Proceedings of the 5th International Conference of Software Engineering, pages 625-636, France, December 1992.
[6] Shultz, A., Grefenstette, J., De Jong, K., Test & Evaluation by Genetic Algorithms, Navy Center for Applied Research in Artificial Intelligence, IEEE, 1993.
[7] Hunt, J., Testing Control Software using a Genetic Algorithm, Working Paper, University of Wales, UK, 1995.
[8] Roper, M., Maclean, I., Brooks, A., Miller, J., Wood, M., Genetic Algorithms and the Automatic Generation of Test Data, Working Paper, Department of Computer Science, University of Strathclyde, UK, 1991.
[9] Watkins, A., The Automatic Generation of Software Test Data using Genetic Algorithms, Proceedings of the Fourth Software Quality Conference, 2: 300-309, Dundee, Scotland, July 1995.
[10] Alander, J., Mantere, T. and Turunen, P., Genetic Algorithm Based Software Testing, in G. Smith, N. Steele and R. Albrecht, editors, Artificial Neural Nets and Genetic Algorithms, Springer-Verlag, Wien, Austria, pages 325-328, 1998.
[11] Tracey, N., Clark, J., Mander, K., Automated Program Flaw Finding Using Simulated Annealing, ISSTA-98, Clearwater Beach, Florida, USA, 1998.
[12] Borgelt, K., Software Test Data Generation From A Genetic Algorithm, Industrial Applications Of Genetic Algorithms, CRC Press, 1998.
[13] Pargas, R., Harold, M., Peck, R., Test Data Generation Using Genetic Algorithms, Software Testing, Verification And Reliability, 9: 263-282, 1999.
[14] Jones, B., Sthamer, H. and Eyres, D. Automatic structural testing using genetic algorithms. Software Engineering Journal, 11(5): 299-306, 1996.
[15] Lin, J-C. and Yeh, P-U. Automatic Test Data Generation for Path Testing using GAs, Information Sciences, 131: 47-64, 2001.
[16] Michael, C., McGraw, G., Schatz, M., Generating Software Test Data by Evolution, IEEE Transactions On Software Engineering, 27(12), December 2001.
[17] Wegener, J., Baresel, A., Sthamer, H., Evolutionary Test Environment for Automatic Structural Testing, Information & Software Technology, 2001.
[18] Harman, M. The automatic generation of software test data using genetic algorithms. Ph.D. thesis, University of Glamorgan, Pontypridd, Wales, Great Britain, 1996.
[19] Díaz, E., Tuya, J., and Blanco, R. Automated Software Testing Using a Metaheuristic Technique Based on Tabu Search. In 18th IEEE International Conference on Automated Software Engineering, pp. 310-313, 2003.
[20] Berndt, D., Fisher, J., Johnson, L., Pinglikar, J., and Watkins, A. (2003). Breeding Software Test Cases with Genetic Algorithms. In 36th Annual Hawaii Int. Conference on System Sciences (HICSS 2003).
[21] D. J. Berndt, A. Watkins, High Volume Software Testing using Genetic Algorithms. Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05) - Track 9, 2005.
[22] Alba, E., and Chicano, J. F. Software Testing with Evolutionary Strategies. Proceedings of the Rapid Integration of Software Engineering Techniques (RISE-2005), Heraklion, Greece, 2005.
[23] McMinn, P., and Holcombe, M. Evolutionary testing of state-based programs. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'05), pages 1013-1020, Washington DC, USA, June 2005.
[24] Tonella, P. Evolutionary Testing of Classes. In Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis (ISSTA'04), ACM Press, New York, NY (2004), 119-128.
[25] Stefan Mairhofer. Search-based software testing and complex test data generation in a dynamic programming language. Master thesis, 2008.
[26] Harman, M., and McMinn, P. A theoretical & empirical analysis of evolutionary testing and hill climbing for structural test data generation. In ISSTA'07: Proceedings of the 2007 international symposium on Software testing and analysis (New York, NY, USA, 2007), ACM, pp. 73-83.
[27] Wappler, S., and Lammermann, F. Using evolutionary algorithms for the unit testing of object-oriented software. In GECCO'05: Proceedings of the 2005 conference on Genetic and evolutionary computation (New York, NY, USA, 2005), ACM, pp. 1053-1060.
[28] Wappler, S., and Wegener, J. Evolutionary unit testing of object-oriented software using a hybrid evolutionary algorithm. In CEC'06: Proceedings of the 2006 IEEE Congress on Evolutionary Computation (2006), IEEE, pp. 851-858.
[29] ECMA-367 Eiffel: Analysis, Design and Programming Language, 2nd Edition. http://www.ecma-international.org/publications/standards/Ecma-367.htm
[30] Beizer, B.: 'Software Testing Techniques', Second Edition, New York: Van Nostrand Reinhold, ISBN 0442206720, 1990.
[31] Meyer, B. Object-Oriented Software Construction, 2nd edition. Prentice Hall, 1997.
[32] Godefroid, P., Klarlund, N., and Sen, K. DART: directed automated random testing. In PLDI'05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation (New York, NY, USA, 2005), ACM Press, pp. 213-223.
[33] Meyer, B., Ciupa, I., Leitner, A., and Liu, L. L. Automatic testing of object-oriented software. In Proceedings of SOFSEM 2007 (Current Trends in Theory and Practice of Computer Science) (2007), J. van Leeuwen, Ed., Lecture Notes in Computer Science, Springer-Verlag.
[34] Oriat, C. Jartege: a tool for random generation of unit tests for Java classes. Tech. Rep. RR-1069-I, Centre National de la Recherche Scientifique, Institut National Polytechnique de Grenoble, Université Joseph Fourier Grenoble I, June 2004.
[35] De Jong, K. A. (1975). An analysis of the behavior of a class of genetic adaptive systems (Doctoral dissertation, University of Michigan). Dissertation Abstracts International, 36(10), 5140B. (University Microfilms No. 76-9381)
[36] Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.
[37] Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.
[38] M. Wall. GAlib: A C++ Library of Genetic Algorithm Components. MIT, http://lancet.mit.edu/ga/, 1996.
[39] I. Ciupa, A. Pretschner, A. Leitner, M. Oriol, and B. Meyer. On the predictability of random tests for object-oriented software. In Proceedings of the First International Conference on Software Testing, Verification and Validation (ICST'08), April 2008.