Evolutionary Object-Oriented
Testing
Lucas Serpa Silva
Artificial Intelligence
University of Amsterdam
A thesis submitted for the degree of
MSc Artificial Intelligence
Supervised by
Dr. Maarten van Someren
July 2009
Abstract
It is estimated that 80% of software development cost is spent on detecting and fixing defects. To tackle this issue, a number of tools and testing techniques have been developed to improve the testing process. Although techniques such as static analysis, random testing and evolutionary testing have been used to automate the testing process, it is not clear which approach is best. Previous research on evolutionary testing has mainly focused on procedural programming languages with simple test data inputs such as numbers. In this work, we present an evolutionary object-oriented testing approach that combines a genetic algorithm with static analysis to increase the number of faults found within a given time frame. A total of 640 experiments were executed to evaluate the effectiveness of different genetic algorithms and parameters. The system's results are compared to the results obtained by running a random test case generator for 15, 30 and 60 minutes. The results show that a genetic algorithm combined with static analysis can considerably increase the number of faults found compared to random testing. In some cases, evolutionary testing found more faults in 15 minutes than a random testing strategy found in 60 minutes.
Acknowledgements
I would like to thank my supervisor, Maarten van Someren, for his support, guidance and constructive comments throughout this work. I would also like to thank
Yi Wei for the various discussions regarding Autotest, code coverage and auto-
mated testing. A special thanks goes to Olga Nikolayeva for many invaluable
suggestions and the time she spent proofreading and reviewing this thesis.
Contents

List of Figures
List of Tables

1 Introduction
1.1 Motivation
1.2 Past research
1.3 Project goals

2 Background
2.1 Testing
2.1.1 Black box
2.1.2 White box
2.1.3 Automated testing
2.2 Eiffel & Design by Contract
2.3 Autotest
2.3.1 Faults
2.4 Genetic Algorithm
2.4.1 Chromosome
2.4.2 Mutation
2.4.3 Crossover
2.4.4 Objective and fitness value
2.4.5 Selecting individuals for reproduction
2.4.6 GA Variations

3 Evolutionary testing
3.1 Implementation
3.1.1 Parameters
3.1.2 Algorithm stages
3.1.2.1 Allele value specification
3.1.2.2 Initialization
3.1.2.3 Evaluation
3.1.2.4 Mutation and crossover
3.2 Evolutionary Autotest

4 Experiments
4.1 Introduction
4.2 Setting
4.3 Experiments Group A
4.3.1 Experiment A1: Autotest parameters
4.4 Experiments Group B
4.4.1 Experiment B1: Mutation probability
4.4.2 Experiment B2: Mutation algorithm
4.4.3 Experiment B3: Crossover probability
4.4.4 Experiment B4: Crossover algorithm
4.4.5 Experiment B5: Selection method
4.5 Experiments Group C
4.5.1 Experiment C1: Original Autotest
4.5.2 Experiment C2: Autotest with static analysis
4.5.3 Experiment C3: Evolutionary testing

5 Discussion
5.1 Types of faults found
5.2 Parameters
5.3 Conclusion
5.4 Considerations
5.5 Further improvement

A Primitive Values
B Chromosome specification
C Chromosome files

Bibliography
List of Figures

2.1 Example of Design by Contract™
2.2 Autotest algorithm 1
2.3 Autotest algorithm 2
2.4 Genetic Algorithm flow diagram
2.5 Examples of mutation algorithms
2.6 One and two points crossover
2.7 Order crossover examples
3.1 Four basic components of the system
3.2 Four stages of the genetic algorithm
3.3 Parallel population evaluation
3.4 Corrupted chromosome caused by crossover
3.5 Valid chromosome crossover
3.6 Evolutionary Autotest 1 - loading chromosome and evolve.conf
3.7 Evolutionary Autotest 2 - method call
3.8 Evolutionary Autotest 3 - object creation
4.1 Number of faults found using random and static analysis techniques to select the initial primitive values
4.2 Number of faults for mutation algorithms for each class
4.3 Effect of mutation and crossover probability on the number of faults
4.4 Comparison of crossover algorithms
4.5 Comparison of selection algorithms
4.6 Variation on the total number of faults found
4.7 Autotest progress
4.8 Evolutionary approach on α set
4.9 Evolutionary testing on β set
4.10 Total number of faults found for all classes over time by the three approaches
4.11 Evolutionary approach time library
5.1 Distribution of the types of faults found in the metalex class
5.2 Usage frequency of each parameter
List of Tables

1.1 Previous work
4.1 Test classes α
4.2 Test classes β
4.3 Genetic algorithm setting
4.4 Autotest parameters
4.5 Mutation probability
4.6 Mutation methods
4.7 Crossover probability
4.8 Crossover methods
4.9 Population selection schema
4.10 Original Autotest
4.11 Original Autotest
4.12 Original Autotest executions
4.13 Autotest with static analysis
4.14 Autotest with static analysis
4.15 Time allocation
4.16 Execution setting
4.17 Evolutionary algorithm
4.18 Evolutionary algorithm
A.1 Autotest primitive values
B.1 Chromosome specification
C.1 Chromosome files
1 Introduction
1.1 Motivation
In the past 50 years, the growing influence of software in all areas of industry has led to an ever-increasing demand for complex and reliable software. According to a study (3) conducted by the National Institute of Standards and Technology, approximately 80% of development cost is spent on identifying and correcting defects. The same study found that software bugs cost the United States economy around $59.5 billion a year, with one third of this amount attributed to poor software testing infrastructure. In an effort to improve the existing testing infrastructure, a number of tools, such as JUnit (1) and GoboTest (4), have been developed to automate test execution. However, the automation of test data generation is still a topic under research. Recently, a number of methods such as metaheuristic search, random test generation and static analysis have been used to fully automate the testing process, but the application of these tools to real software is still limited. Random test case generation is used by a number of tools (Jartege (34), Autotest (33), DART (32)) that automate the generation of test cases, but several studies found genetic algorithms (evolutionary testing) to be more efficient, outperforming random testing on code coverage (9; 13; 16; 18; 26).
1.2 Past research
The study of genetic algorithms as a technique for automating the process of test case generation is often referred to in the literature as evolutionary testing. Since the early 90s, there have been a number of studies on evolutionary testing, and their complexity and applicability vary. In order to classify the relevance of past research to this project, a number of studies are classified according to the complexity of the test cases being generated and the optimization parameter used by the genetic algorithm. The complexity of the generated test cases matters because generating test cases for structured programs that take only simple inputs, such as numbers, is simpler than generating test cases for object-oriented programs.
Reference             Year  Language type                    Optimization parameter
(5)  Xanthakis, S.    1992  Procedural (C)                   Branch coverage
(6)  Shultz, A.       1993  Procedural (Vehicle simulator)   Functional
(7)  Hunt, J.         1995  Procedural (POP11[X])            Functional (Seeded errors)
(8)  Roper, M.        1995  Procedural (C)                   Branch coverage
(9)  Watkins, A.      1995  Procedural (TRITYP simulator)    Path coverage
(10) Alander, J.      1996  Procedural (Strings)             Time
(18) Harman, M.       1996  Procedural (Integers)            Branch coverage
(14) Jones, B.        1998  Procedural (Integers)            Branch coverage
(11) Tracey, N.       1998  Complex (ADA)                    Functional (specification)
(12) Borgelt, K.      1998  Procedural (TRITYP simulator)    Path coverage
(13) Pargas, R.       1999  Procedural (TRITYP simulator)    Branch coverage
(15) Lin              2001  Procedural (TRITYP simulator)    Path coverage
(16) Michael, C.      2001  Procedural (GADGET)              Branch coverage
(17) Wegener, J.      2001  Procedural                       Branch coverage
(19) Díaz, E.         2003  Procedural                       Branch coverage
(20) Berndt, D.       2003  Procedural (TRITYP simulator)    Functional
(9)  Watkins, A.      2004  Procedural                       Functional (Seeded error)
(24) Tonella, P.      2004  Object-oriented (Java)           Branch coverage
(21) Berndt, D. J.    2005  Procedural (Robot simulator)     Functional (Seeded error)
(22) Alba, E.         2005  Procedural (C)                   Condition coverage
(23) McMinn, P.       2005  Procedural (C)                   Branch coverage
(27) Wappler, S.      2005  Object-oriented (Java)           Branch, condition coverage
(28) Wappler, S.      2006  Object-oriented (Java)           Exceptions / Branch coverage
(26) Harman, M.       2007  Procedural                       Branch coverage
(25) Mairhofer, S.    2008  Object-oriented (Ruby)           Branch coverage
Table 1.1: Previous work.
As shown in Table 1.1, only a few projects have generated test cases for object-oriented programs, and to the best of our knowledge there was only one project (11) that generated test cases for object-oriented programs and used the number of faults found as the optimization parameter for the genetic algorithm. In that study, test cases were generated for ADA programs, but a formal specification had to be written manually in a SPARK-Ada proof context. Thus, the testing process was not completely automated.
Table 1.1 also shows that branch coverage was the optimization parameter used to drive the evolution of test cases in most other studies. However, there is little evidence of a correlation between branch coverage and the number of uncovered faults. Although code coverage is a useful test suite measurement, the number of faults a test suite unveils is more important. Past research has shown that evolutionary testing is a good approach to automating the generation of test cases for structured programs. To make this approach attractive to industry, however, a system must be able to generate test cases for object-oriented programs and to use the number of faults found as the main optimization parameter. To the best of our knowledge, there is currently no existing project that fulfils these two requirements.
1.3 Project goals
This project has three goals:
1. to use genetic algorithms to automatically generate test cases for object-oriented pro-
grams written in Eiffel and to use the number of faults found as the optimization
parameter for the genetic algorithm.
2. to investigate the effect of different genetic algorithms on the number of faults found
when generating test cases for object-oriented software.
3. to combine evolutionary testing with static analysis and evaluate if this improves the
results.
The base hypothesis for this work is that evolutionary testing finds more faults, in less time, than random testing. This project innovates by using the number of faults as the main optimization parameter for the genetic algorithm and by combining static analysis with a genetic algorithm. It also extends the existing research on evolutionary testing by providing a study of the effect of different genetic algorithm techniques, such as mutation and crossover algorithms, on the evolution of test cases for object-oriented software.
This project is based on the Autotest (2) tool and the Design by Contract™ methodology implemented by the Eiffel programming language (29).
2 Background
2.1 Testing
Testing is one of the most widely used software quality assessment methods. There are two important processes when testing object-oriented software. First, the software has to be initialized with a set of values. These values set a number of variables that are relevant to the test case, and together they define a single state out of the set of possible states. Each value can be either a primitive value, such as an integer, or a complex value, such as an object. With the software initialized, its methods can then be tested by calling them. If a method takes one or more objects as parameters, these objects also have to be initialized. To determine whether a test case passed or failed, a software specification has to be used. The software specification defines what the output of the software should be and what constitutes a valid input. Because the number of possible states a program may have is exponential, it is impossible to test all of them. Interesting states are normally identified by the developers according to a software specification or the program structure. There are many types of testing; however, they can all be classified as either black box or white box testing.
2.1.1 Black box
Black box testing, also called functional testing (30), considers the unit under test as a black box into which data is fed and whose output is verified against a software specification. Functional testing has the advantage of being decoupled from the source code: given the software specification, test data can be generated even before the function has been implemented. Functional testing is also closely related to the user requirements, since it tests a function of the program. Its main disadvantage is that it requires a software specification, and it may not explore the unit under test well since it does not know the code structure.
2.1.2 White box
White box testing, also called structural testing, takes into account the internal structure of the code. By analyzing the structure of the code, different test data can be generated to explore specific areas. Structural testing may also be used to measure how much of the code has been covered according to some structural criterion. By analyzing the program flow and the path an execution took, code coverage can be computed for criteria such as statement coverage, which counts the number of unique statements executed.
2.1.3 Automated testing
To automate the testing process, both the generation of test data and the execution of test cases have to be automated. There are already a number of tools, such as JUnit (1) and GoboTest (4), that automate test case execution, but the main problem lies in the automation of test data generation. Since the number of possible inputs is huge, the problem can be viewed as an optimization problem, where the optimal solution is a set of test data that triggers all faults in the software. Some tools, such as Autotest (33), DART (32) and Jartege (34), generate test data randomly, but there are many optimization algorithms that are considered better than random search.
2.2 Eiffel & Design by Contract
The lack of a software specification is one of the main problems when automatically generating test cases. Without a specification it is impossible to be sure that a feature¹ has failed. Even when a test case leads the program to crash or throw an exception, it is not clear whether the software has a fault, since the program may simply not be defined for the given input. Normally, developers write a header comment for each method describing its behaviour. Although there are guidelines on how to write these headers, they are not formal enough to allow the derivation of the method's pre- and postconditions.

¹ Feature means either a procedure or a function. In this report, feature and method are used interchangeably to refer to a procedure or a function.
This problem has been addressed by the Eiffel programming language (29), which, among other methodologies, implements the Design by Contract™ (31) concept. The idea behind Design by Contract™ is that each method call is a contract between the caller (client) and the method (supplier). The contract is specified in terms of what the client must provide and what the supplier guarantees in return, and it is normally written as pre- and postcondition boolean expressions for each method. In the example illustrated in Figure 2.1, the precondition is composed of four boolean expressions and the postcondition of two. These expressions are evaluated upon method invocation and termination respectively, and the system throws an exception as soon as one of them evaluates to false. Therefore, the caller must ensure that the precondition is true before invoking the method, and the method must ensure that the postcondition is true before returning. For example, the borrow book method shown in Figure 2.1 takes the id of a borrower and the id of the book this borrower wants to borrow. The caller must ensure that the book id is valid, that at least one copy of the book is available, that the borrower id is valid and that the borrower is allowed to borrow books. If these conditions are fulfilled, the method guarantees that it will add the book to the borrower's list of borrowed books and decrease the number of available copies by one. Apart from the pre- and postconditions, every class has an invariant that has to remain true after the execution of the constructor, and loops may have variants and invariants.

Figure 2.1: Example of Design by Contract™

With Design by Contract™, a method has a fault if it:
1. violates another method’s precondition.
2. does not fulfil its own postcondition.
3. violates the class invariant.
4. violates loop variant or invariant.
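Figure 2.1 itself is an image and is not reproduced in this transcript. As a rough illustration of the contract just described (not the thesis's Eiffel example), the pre- and postconditions of borrow book can be mimicked in C++ with assertions; all names below are assumptions. In Eiffel, the require and ensure clauses express these checks natively and raise exceptions on violation.

    #include <cassert>
    #include <vector>

    // Hypothetical sketch of the borrow_book contract described above.
    // Eiffel checks require/ensure clauses natively; here they are mimicked
    // with assertions. All names are illustrative, not from the thesis.
    struct Library {
        std::vector<int> copies;                    // available copies per book id
        std::vector<std::vector<int>> borrowed;     // borrowed book ids per borrower

        void borrow_book(std::size_t borrower_id, std::size_t book_id) {
            // Precondition: valid ids and at least one available copy.
            assert(book_id < copies.size() && copies[book_id] > 0);
            assert(borrower_id < borrowed.size());
            const int old_copies = copies[book_id];

            borrowed[borrower_id].push_back(static_cast<int>(book_id));
            copies[book_id] -= 1;

            // Postcondition: book recorded for the borrower, one copy fewer.
            assert(borrowed[borrower_id].back() == static_cast<int>(book_id));
            assert(copies[book_id] == old_copies - 1);
        }
    };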
For the automation of test case generation, Design by Contract™ can be used to determine whether the generated test data is defined for a given method, by checking it against the precondition. It can also be used to check whether a method has failed, by comparing the result against the postcondition. In the next section we discuss how this idea is implemented in the Autotest tool (2).
2.3 Autotest
Autotest exploits the Design by Contract™ methodology implemented in Eiffel to automatically generate random test data for Eiffel classes. Autotest works with a given timeout and a set of classes to be tested.
Figure 2.2: Autotest algorithm 1 - Method invocation
Autotest starts by loading the classes to be tested and creating a table containing all methods of those classes, including the inherited ones. As described in algorithm 2.2, Autotest randomly selects methods to test while the timeout has not expired. It chooses the method to be tested (line 4) and the creation method (line 23) at random, and it uses a probability to determine whether a new object should be created or selected from the object pool (line 11). The object pool is the set of all objects created by Autotest; the idea behind it is that reusing objects that may have been modified during a previous method call increases the chance of finding more faults. When creating an object, Autotest uses different algorithms for expanded and non-expanded types. Expanded types are the primitive types, such as Integer, Boolean, Character and so on. For these types, Autotest must provide an initial value, as shown in Figure 2.3. The initial values for the expanded types are randomly selected from a set of fixed values chosen by the developers. These values are listed in appendix A.1.
Figure 2.3: Autotest algorithm 2 - Object creation
When instantiating objects that are not of an expanded type, Autotest randomly selects one of their creation procedures and invokes it. After the timeout expires, Autotest generates a report containing the number of test cases generated, the number of failures, the number of unique failures and the number of invalid test cases, and it reproduces the code that triggers the faults it found.
2.3.1 Faults
Eiffel throws an exception whenever a contract is violated (precondition, postcondition, class invariant, loop invariant, loop variant). Autotest then examines the exception to find out whether it was triggered by an invalid test case or by an actual fault in the code. Invalid test cases are those that violate the precondition of the feature being tested. If the test case is valid, Autotest checks whether the fault is unique by looking at the line of code where the exception happened and comparing it to all unique faults it has already found. Besides the faults triggered by the Design by Contract™ conditions, other exceptions, such as those triggered by calling methods on a void object or by lack of memory, are also considered valid test cases.
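This uniqueness check lends itself to a small sketch. The following C++ fragment is illustrative only (Autotest's internal representation is not shown in this transcript); it assumes a fault is identified by the file and line where the exception was raised:

    #include <set>
    #include <string>
    #include <utility>

    // Illustrative: a fault is identified by the class file and the line where
    // the exception was raised; it is "unique" if that pair was not seen before.
    using FaultId = std::pair<std::string, int>;  // (file, line)

    bool is_unique_fault(std::set<FaultId>& seen, const FaultId& f) {
        return seen.insert(f).second;  // true if the fault is new
    }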
2.4 Genetic Algorithm
Genetic Algorithms (GA) are search algorithms based on natural selection as described by Charles Darwin. They are used to find solutions to optimization and search problems. Genetic algorithms became popular when John Holland published "Adaptation in Natural and Artificial Systems" (36) in 1975 and De Jong finished an analysis of the behaviour of a class of genetic adaptive systems (35) in the same year. The basic idea of a GA is to encode the values of the parameters of an optimization problem in a chromosome, which is evaluated by an objective function. As shown in Figure 2.4, the algorithm starts by initializing or randomly generating a set of chromosomes (the population). At the end of each generation, each chromosome is evaluated and modified according to a number of genetic operations in order to produce a new population. This process repeats until a predefined number of generations has been computed or until the objective value of the population has converged.
2.4.1 Chromosome
Each individual in the population is represented by a chromosome that stores the values of the optimization problem. The chromosome is normally encoded as a list of bits, but its encoding and structure can vary. Each gene of the chromosome can have a specific allele, which specifies the range or the set of possible values that the gene can take. To evaluate each chromosome, an objective function must be defined. The objective function uses the values encoded in the chromosome to check how well the chromosome performs on the optimization problem. At the end of each generation, a number of genetic operations such as mutation and crossover are applied to each chromosome to produce the population for the next generation.
2.4.2 Mutation
Figure 2.4: Genetic Algorithm flow diagram

When a chromosome is passed on, there is a probability that some of its genes will not be copied correctly and will undergo a small mutation. Mutation ensures that the solutions of the new generation are not identical to those of the previous one. The mutation probability controls how much of the chromosome will mutate: a small probability leads to slower convergence, while a large probability leads to instability. The mutation operator can be defined in different ways; three basic mutation operations are described below.
1. Flip mutator will change a single gene of the chromosome to a random value according
to the range specified by the alleles.
2. Swap mutator will randomly swap a number of genes of the chromosome.
3. Gaussian mutator will pick a new value near the current value using a Gaussian distribution.
The mutation operation is defined according to the structure of the chromosome. When
the chromosome is stored in a tree, one possible mutation is to swap subtrees as shown in
Figure 2.5.
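As a minimal illustration (not the GAlib implementation), a flip mutator for a real-coded chromosome, such as the one used in chapter 3, can be sketched in C++ as follows; the names and signature are assumptions:

    #include <random>
    #include <vector>

    // Minimal sketch of a flip mutator for a real-coded chromosome: each gene
    // mutates with probability p to a fresh random value drawn from its allele
    // range [lo[i], hi[i]]. Names and signature are illustrative assumptions.
    void flip_mutate(std::vector<double>& genes, double p,
                     const std::vector<double>& lo, const std::vector<double>& hi,
                     std::mt19937& rng) {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        for (std::size_t i = 0; i < genes.size(); ++i) {
            if (coin(rng) < p) {
                std::uniform_real_distribution<double> allele(lo[i], hi[i]);
                genes[i] = allele(rng);  // replace the gene within its allele bounds
            }
        }
    }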
2.4.3 Crossover
Crossover is the process whereby two or more chromosomes are combined to form one or more new chromosomes. The idea behind crossover is that the offspring may be better than both parents.

Figure 2.5: Examples of mutation algorithms

Crossover is normally done between two individuals, but more can be used. There are many crossover algorithms; some of them are described below, and a C++ sketch of uniform crossover follows the list:
1. Uniform crossover randomly selects, for each gene, the parent it should come from.

2. Even odd crossover selects the genes with even index from parent A and the genes with odd index from parent B.

3. One point crossover randomly selects a position on the chromosome; all genes to the left come from parent A and all genes to the right come from parent B.

4. Two points crossover randomly selects two positions and takes from parent A the genes whose index lies between the two positions. The remaining genes come from parent B.

Figure 2.6: One and two points crossover

5. Partial match crossover produces two children C1 and C2. It initializes C1 by copying the chromosome of parent A and C2 by copying the chromosome of parent B. It then randomly selects a number of positions and swaps the genes between C1 and C2 at those positions.

6. Order crossover produces two children C1 and C2. It initializes them by copying the genes of the parents to the children and deleting n randomly selected genes from each offspring. It then selects an interval of size n and slides the genes such that the interval is empty. Finally, it fills the interval with the original genes at those positions taken from the opposite offspring. The algorithm is illustrated in Figure 2.7.

Figure 2.7: Order crossover examples

7. Cycle crossover produces two children C1 and C2. It initializes C1 and C2 by copying the chromosomes of parents A and B respectively. Then it selects n random positions and replaces the genes of C1 with the genes of parent B at those positions. The process is repeated for C2 with parent A.
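A minimal C++ sketch of the uniform crossover described in item 1 (illustrative, not the GAlib implementation); it assumes both parents have the same length:

    #include <random>
    #include <vector>

    // Minimal sketch of uniform crossover: each gene of the child is taken
    // from parent A or parent B with equal probability.
    std::vector<double> uniform_crossover(const std::vector<double>& a,
                                          const std::vector<double>& b,
                                          std::mt19937& rng) {
        std::bernoulli_distribution from_a(0.5);
        std::vector<double> child(a.size());
        for (std::size_t i = 0; i < a.size(); ++i)
            child[i] = from_a(rng) ? a[i] : b[i];  // pick the per-gene parent
        return child;
    }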
2.4.4 Objective and fitness value
The objective value is the performance measurement of each chromosome. This value is used to aid the selection of chromosomes for crossover. It can be used directly to select good chromosomes for crossover, but it is normally scaled to produce a fitness value. The scaling function is one method that can be used to minimize the elitism problem described in section 2.4.5, where only a limited number of chromosomes is involved in producing the next generation. The fitness value is then used to compute the compatibility of each chromosome for crossover; compatibility is used to ensure that good individuals are not combined with bad ones. Many methods exist to compute the fitness value; the most common scaling methods are described below.
1. Linear scaling:

   fitness = α · objectiveValue + β   (2.1)

2. Power law scaling:

   fitness = objectiveValue^α   (2.2)
3. Sharing scaling computes the number of genes that two chromosomes have in common. Two individuals are considered unfit for mating when their difference is very low, meaning that they are too similar. The difference can be computed using bitwise operations (37) or another user-specified method if the chromosome is not encoded as a bit string.
2.4.5 Selecting individuals for reproduction
Elitism and diversity are two important factors when selecting individuals for reproduction. With elitism, selection is biased towards the individuals with the best objective value. Elitism is important since it removes bad solutions from the population and reproduces the good ones. However, by continuously reproducing from a small set of individuals, the population becomes very similar, which may lead to a sub-optimal solution. To counter this effect, the diversity of the population ought to be controlled to ensure that the search space is explored well. Many selection schemas have been developed to properly select the individuals for reproduction while minimizing the elitism problem. Some of them are listed below; a code sketch of the roulette wheel method follows the list.
1. Rank schema always selects the best individuals of the population.

2. Roulette Wheel selects individuals according to their fitness values relative to the population. The probability of individual i being picked is

   p_i = fitness_i / Σ_j fitness_j,   (2.3)

   where the sum runs over the whole population.
3. Tournament sampling uses the roulette wheel method to select two individuals, then picks the one with the higher fitness value.

4. Uniform sampling selects an individual randomly from the population.

5. Stochastic remainder sampling first computes the probability p_i of each individual being selected and its expected representation ε = p_i · len(population). The expected representation is used to create a new population of the same size. For example, if an individual has ε equal to 1.7, it fills one position in the new population and has a probability of 0.7 of filling another position. After the new population is created, the uniform method is used to select the individuals for mating.

6. Deterministic sampling computes ε for each individual as in stochastic remainder sampling. A new population is created and filled with all individuals with ε > 1; the remaining positions are filled by sorting the original population by the fractional part of ε and selecting the individuals highest on the list.
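As a minimal illustration of Equation 2.3 (not the GAlib implementation), roulette wheel selection can be written in C++ as follows; it assumes at least one individual has positive fitness:

    #include <random>
    #include <vector>

    // Minimal sketch of roulette-wheel selection: the chance of picking
    // individual i is fitness[i] divided by the total fitness (Equation 2.3).
    std::size_t roulette_select(const std::vector<double>& fitness,
                                std::mt19937& rng) {
        std::discrete_distribution<std::size_t> wheel(fitness.begin(), fitness.end());
        return wheel(rng);  // index of the selected individual
    }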
2.4.6 GA Variations
There are three common types of Genetic Algorithm. They differ in how the new population is computed at the end of each generation.
1. Simple Genetic Algorithm uses a non-overlapping population between generations.
At each generation the population is completely replaced.
2. Steady-state Genetic Algorithm uses an overlapping population where a percentage
of the population is replaced by new individuals.
3. Incremental Genetic Algorithm has only one or two children replacing members of
the current population at the end of each generation.
Compared to other optimization algorithms, the genetic algorithm is relatively simple and robust (37). In the past, it has been successfully used to automatically generate test data that optimizes code coverage, as described in section 1.2. In this work, genetic algorithms are used to automatically generate a set of test cases and optimize the number of faults found. One of the main reasons we believe the genetic algorithm is a good approach for automatically generating test data is that it can adapt to the code being tested. It is plausible to assume that developers acquire bad habits over time, which leads to a pattern of mistakes. One assumption is that genetic algorithms may be able to detect some of these mistakes and tune the test data generation mechanism to exploit them.
3 Evolutionary testing
3.1 Implementation
To link the genetic algorithm to Autotest, an evolutionary testing strategy was implemented for Autotest. This strategy generates and executes test cases according to parameters specified in a chromosome produced by a genetic algorithm. To find a good strategy (chromosome), a genetic algorithm was implemented in C++ using the GAlib (38) library. The communication between Autotest and the genetic algorithm is done through two files. The four basic components of the system are shown in Figure 3.1.
Figure 3.1: Four basic components of the system
When Evolutionary Autotest is executed, it loads the chromosome from a file containing parameter settings for the Autotest test generator and tests the classes for a given amount of time. At the end, it produces a report containing the objective value (the number of unique faults found), which is used by the genetic algorithm to evaluate how good that chromosome is. The evolution of a testing strategy (chromosome) can be done for a single class or for a set of classes. In this work, however, the evolution of a testing strategy is performed for single classes.
3.1.1 Parameters
Genetic algorithms work by optimizing parameters for a given problem. In order to optimize
the generation of test cases, five different parameters have been used. These parameters
influence how the test cases are generated and how Autotest is executed.
1. primitive values: a set of values for each of the five primitive types (Integer, Real, Character, Boolean and Natural). These values are used for creating objects that are used as input data.
2. method call: specifies which methods should be called and which parameters should
be used for each method call. This parameter is used to set the software into different
states while it is being tested.
3. creation probability: probability of creating a new object instead of reusing one from
the object pool.
4. seed: value used to initialize the pseudorandom number generator.
5. sequential method call: calls the methods of the class under test sequentially and
selects input parameters for each method at random.
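For illustration, the five parameters above, together with flags stating which of them are in use, could be grouped as follows; this struct is an assumption for exposition only, since the thesis passes these values through chromosome files rather than an in-memory structure:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Illustrative grouping of the five optimized parameters and the on/off
    // flags that say which of them the evolutionary strategy should use.
    // All names are assumptions, not the thesis's data structures.
    struct TestStrategy {
        std::vector<double> primitive_values;    // values per primitive type
        std::vector<std::size_t> method_calls;   // raw indices into the method table
        double creation_probability = 0.25;      // new object vs. reuse from the pool
        std::vector<std::uint32_t> seeds;        // pseudorandom generator seeds
        bool use_method_call = false;            // use the encoded method calls
        bool sequential_method_call = false;     // walk the method table in order
    };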
As described in section 2.3, Autotest has a fixed set of primitive values, and it calls methods and creates objects randomly. Although a study (2) has shown that the creation probability parameter can be optimized, it is not obvious which parameters are good for evolutionary testing. When there is not enough time to optimize a parameter, it might be better to use a random strategy or a predefined value. The goal is to select the best set of parameters for each class, but because there are 2^5 = 32 possible sets of parameters, it is not feasible to test all of them for every class. The evolutionary algorithm optimizes all parameters, and a file is used to specify which parameters Autotest should use while executing the evolutionary strategy. Because finding the best set of parameters for each class is itself an optimization problem, the genetic algorithm can also be used to optimize the set of parameters. The chromosome has been used to specify the values of these parameters, but it can also be used to specify which parameters should be used. Thus, the genetic algorithm can be used to optimize both the set of parameters used and their values.
3.1.2 Algorithm stages
The chromosome is encoded as a list of floating point numbers because all the parameters can be represented as floating point numbers without much conversion. The implementation of the genetic algorithm is divided into four stages.
Figure 3.2: Four stages of the genetic algorithm
1. Specification: Create the chromosome and specify the alleles.
2. Initialization: Create the initial population.
3. Evaluation: Evaluate the population.
4. Mutation and Crossover: Apply mutation and reproduction operations.
3.1.2.1 Allele value specification
As described in section 2.4.1, alleles can be used to specify the range or the list of possible values allowed for each gene. Specifying the allele for each gene simplifies the chromosome encoding and interpretation. For example, the range of valid values for the Character data type is between 0 and 600, but by randomly selecting a floating point number, it is likely that a number outside this range would be selected, since the set of floating point numbers is much larger than the set of Characters. This would force the number to be rounded down to 600 or up to 0, and lead to a set of characters with similar values. By specifying the allele (0, 600), all the characters have the same probability of being picked. The chromosome is created given the number η of values that is encoded for each primitive type. The starting and finishing index and the allele specification for each parameter are shown in Appendix B.
3.1.2.2 Initialization
The seed, method call and creation probability parameters are initialized with random values from the ranges specified by the alleles. The primitive values may be initialized in three different ways (a sketch of the third follows the list):

1. Randomly: select random values from the range of values specified by the allele of each gene.

2. Hard coded: use the original values used by Autotest, as specified in Table A.1, and complete the set of values with random values.

3. Static analysis: in this approach, a simple technique is used to extract primitive values from the classes under test. The system works by scanning the classes for natural, integer, real and character values and storing them. Because the system does not consider the structure of the code, it will even use values found in comments. These values, combined with random values, are then used to initialize the chromosome: a probability of 0.8 is used to decide whether each value should come from the set of values obtained by static analysis, and of 0.2 whether it should come from a random value generator. This avoids initializing a population that is too similar; by introducing some random values, a level of diversity is guaranteed.
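A minimal C++ sketch of this mixed initialization (the 0.8/0.2 split is from the text above; the function and its signature are assumptions):

    #include <random>
    #include <vector>

    // Minimal sketch: a gene is seeded from the values mined by static
    // analysis with probability 0.8, otherwise drawn at random from its
    // allele range to keep the initial population diverse.
    double init_gene(const std::vector<double>& mined, double lo, double hi,
                     std::mt19937& rng) {
        std::uniform_real_distribution<double> coin(0.0, 1.0);
        if (!mined.empty() && coin(rng) < 0.8) {
            std::uniform_int_distribution<std::size_t> pick(0, mined.size() - 1);
            return mined[pick(rng)];   // value extracted from the class text
        }
        std::uniform_real_distribution<double> allele(lo, hi);
        return allele(rng);            // random value for diversity
    }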
3.1.2.3 Evaluation
When evaluating a chromosome, the genetic algorithm generates a set of files (shown in Appendix C) that contain the values of the parameters encoded in that chromosome for a specific class. Autotest is then executed to test a class for a fixed amount of time, and the number of unique faults found is used as the objective value.

Since each chromosome can be evaluated independently of the others, the evaluation of the population is executed in parallel. The parallel evaluation works by creating four instances of the code under test and calling Autotest for each one of them.

Figure 3.3: Parallel population evaluation

As illustrated in Figure 3.3, four individuals are evaluated in parallel. This number was chosen because the computer used for the experiments had four processors. For an optimal evaluation, the population size ought to be a multiple of four.
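A minimal C++ sketch of this batch-of-four evaluation; the Chromosome type and the run_autotest stand-in are assumptions, not the system's actual interface:

    #include <algorithm>
    #include <thread>
    #include <vector>

    struct Chromosome {
        std::vector<double> genes;
        int objective = 0;  // number of unique faults found
    };

    // Stand-in for "write the chromosome files, run Autotest, parse the report".
    int run_autotest(const Chromosome&) { return 0; }

    // Minimal sketch: evaluate the population in batches of four, matching
    // the four cores of the experiment machine.
    void evaluate_population(std::vector<Chromosome>& pop) {
        const std::size_t batch = 4;
        for (std::size_t i = 0; i < pop.size(); i += batch) {
            std::vector<std::thread> workers;
            const std::size_t end = std::min(i + batch, pop.size());
            for (std::size_t j = i; j < end; ++j)
                workers.emplace_back([&pop, j] { pop[j].objective = run_autotest(pop[j]); });
            for (auto& w : workers) w.join();
        }
    }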
3.1.2.4 Mutation and crossover
To test a piece of software thoroughly, it is important to test it in many different states. A state can be reached by a particular sequence of method calls; Autotest hopes to reach different states by randomly invoking methods. To map this behaviour onto the chromosome, the possibility of adding and removing a method call has to be considered, because some states can be reached in two method calls while others may require seven. Another problem is that each method call has a certain number of parameters of specific types. With these requirements, the crossover operation may produce a corrupted chromosome, since the number of method calls and the parameters of each method call may differ between chromosomes.

Figure 3.4 shows an example where the chromosome stores the method to be called together with its parameters in the same section of the chromosome. Chromosome X will call method a, method b and method a again. The problem is that method a takes two String parameters; the combined chromosome, however, will produce a call to method a that takes one String and one Integer. One possible solution to this problem was described by Tonella (24), who used a grammar to specify the syntax; this grammar was then used to drive the mutation and crossover operations.

Figure 3.4: Corrupted chromosome caused by crossover
In this project, a simpler approach was used to solve the same problem. First, the section of the chromosome that specifies which methods should be invoked is separated from the section that specifies which parameters should be used. When a method needs three parameters, it reads three slots from the parameter section of the chromosome; if the next method requires two parameters, it reads the next two slots. To ensure that the parameters are of the right type, the chromosome does not specify the object to be used but instead specifies an index into a list of objects, as shown in Figure 3.5. Since Autotest knows which types are needed to execute each method, the chromosome just needs to specify which object from the list of possible objects has to be used. Because the number of methods and the number of available objects are not known in advance, the chromosome assumes a maximum number, and the real index is computed as real index = chromosome index MOD list size, where list size is the size of the list of methods to call or of the list of available objects of a given type.

Figure 3.5: Valid chromosome crossover

With this approach, adding or removing a method call is very simple: whenever a mutation makes the real index 0, the method call is removed, and when the real index is modified from 0 to a different number, a method call is added. Different mutation and crossover methods can thus be used without having to worry about the chromosome getting corrupted. A short sketch of this decoding follows.
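A minimal C++ sketch of the decoding rule (illustrative only):

    #include <cstddef>

    // Minimal sketch: a gene stores a raw index; the real index is taken
    // modulo the list size, so any mutated value still decodes to a valid
    // entry and the chromosome cannot be corrupted. By convention, a real
    // index of 0 means "no call": mutating a gene to 0 removes the method
    // call, and mutating it away from 0 adds one.
    std::size_t real_index(std::size_t chromosome_index, std::size_t list_size) {
        return chromosome_index % list_size;
    }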
3.2 Evolutionary Autotest
The evolutionary strategy is executed by specifying the -E option when running Autotest. It starts by loading a file that specifies which parameters to use, together with the chromosome files that store the values of these parameters. It then checks whether the creation probability parameter is being used; if so, it sets the probability value. With Evolutionary Autotest, there are two new ways to select the methods to be tested. If the method call parameter is true, it computes the real index and selects the method with that index from the method table. If method call is false and sequential method is true, it selects the next method from the table of methods in sequential order. When both method call and sequential method are true, method call is used; a random method is selected if both parameters are false. After Autotest invokes a method, it checks whether the seed parameter is being used and, if so, selects a new seed from the list of seeds, as shown in Figure 3.6.
Figure 3.6: Evolutionary Autotest 1 - loading chromosome and evolve.conf
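The pseudocode of Figures 3.6 to 3.8 consists of images that are not reproduced in this transcript. The following C++ fragment is only a sketch of the selection rule just described; the names are assumptions:

    // Sketch of the method-selection rule described above. The enum and
    // function are illustrative, not the thesis's pseudocode.
    enum class Pick { FromChromosome, Sequential, Random };

    Pick choose_strategy(bool method_call, bool sequential_method) {
        if (method_call) return Pick::FromChromosome;  // wins when both flags are set
        if (sequential_method) return Pick::Sequential;
        return Pick::Random;  // all parameters false: fall back to random testing
    }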
The invoke method uses the creation probability to decide whether new objects should be created instead of reusing objects from the object pool. The pseudocode of this method is shown in Figure 3.7.
Figure 3.7: Evolutionary Autotest 2 - method call
When creating an object, Autotest checks whether the method call parameter is being used. If so, the constructor is selected according to the real index computed from a value in the list of method calls encoded in the chromosome; if not, Autotest selects a random constructor to instantiate the object. If it is creating a primitive type, it checks whether the primitive value parameter is true and, if so, gets a value from the list of primitive values loaded for each primitive type. Figure 3.8 shows the pseudocode for the create new input object method.
Figure 3.8: Evolutionary Autotest 3 - object creation
When all the parameters are false, the evolutionary strategy becomes a random strategy.
4 Experiments
4.1 Introduction
The experiments were divided into three groups, each concerned with a specific optimization of the system. The experiments of Group A were executed to find the best way to encode the chromosome, by examining the effect of each parameter on the number of faults found. The experiments of Group B were executed to optimize the genetic algorithm by evaluating different genetic operators and probabilities. The last set of experiments, Group C, was executed to evaluate the effectiveness of evolutionary testing compared to random testing.
4.2 Setting
Twenty-two classes were randomly selected from two well-used libraries, time and lex, provided with EiffelStudio 6.3. The lex library provides a mechanism for building lexical analyzers from regular expressions, and the time library provides an abstraction for date and time computation. The selected classes were divided into two sets, α and β. The α set, listed in Table 4.1, was used for optimizing and validating the system; the β set, listed in Table 4.2, was only used for validating the system. Tables 4.1 and 4.2 list, for each class, the number of lines of code (LOC), the number of local features and the number of features including the inherited ones.

All experiments were executed on a single machine with an Intel Core™2 Quad Q6600 and 2 GB of RAM, running Linux.
Library Class Name LOC Local features Total features
Lex automaton 70 7 38
dfa 103 5 42
error list 96 6 121
fixed automaton 77 4 99
fixed dfa 170 7 112
fixed integer set 207 10 58
high builder 820 39 410
Sum 1543 78 880
Time absolute 81 5 81
time 408 29 408
interval 365 30 365
duration 70 5 70
date time parser 379 32 379
date time validity checker 81 3 87
Sum 1384 104 1390
Total 2927 182 2270
Table 4.1: Properties of the classes from the α set
In each experiment, a number of parameters used by the genetic algorithm have to be specified. These parameters were set according to each experiment, with the goal of emphasizing the part of the algorithm being tested. For example, when evaluating the parameters for Autotest, it is important to have a bigger population size to increase the number of executions of Autotest. On the other hand, when evaluating crossover or mutation algorithms, it is important to increase the number of generations, since these operations are only performed at the end of each generation. In total, six different settings were used; they are specified in Table 4.3.
Library Class Name LOC Local features Total features
Lex text filler 488 28 59
lex builder 1288 55 365
lexical 608 42 99
linked automaton 52 1 116
metalex 201 11 420
ndfa 307 19 56
pdfa 440 22 311
scanning 145 6 426
state of dfa 74 3 101
Sum 3603 187 1953
Table 4.2: Properties of the classes from the β set
4.3 Experiments Group A
In section 3.1.1, a number of parameters that specify how test cases should be generated were identified, but not all of them might have a positive effect on the evolutionary algorithm. Some of these parameters might be very sensitive to modification or take very long to optimize, and this may lead to an overall poorer solution. The goal of the experiments in group A is to find the best set of parameters to encode in the chromosome.
Configuration Setting 1 Setting 2 Setting 3 Setting 4 Setting 5
population size 30 20 8 4 8
number of generations 4 6 10 10 10
mutation probability 0.6 α 0.6 0.4 0.4
crossover probability 0.8 0.8 0.6 0.4 0.4
replacement percentage 0.4 0.6 0.5 0.5 0.6
Table 4.3: Experiment settings
4.3.1 Experiment A1: Autotest parameters
As described in section 3.1.1, a total of five parameters that specify how test cases should be generated were identified. A total of 12 experiments were executed to find out how these parameters contribute to the number of unique faults found. These experiments were executed with genetic algorithm setting 1, specified in Table 4.3. The number of faults found for each class for the different sets of parameters is shown in Table 4.4.
The results show that the performance of the parameters depends on the class being tested. According to the results, there is no dominating parameter, as each parameter performed best for at least one class. For instance, the creation probability parameter, which had the worst performance overall, performed best for the error list class. The method call parameter performed best for both the dfa and date time validity checker classes. Since there are 32 possible combinations of parameters, it is infeasible to test all of them every time a class is tested; thus, the technique described in section 3.1.1 that optimizes the set of parameters used was developed. This experiment was therefore the only experiment that did not use this technique. This experiment also compared two methods for initializing the primitive values: the Primitives column of Table 4.4 shows the number of faults found when the primitive values are initialized randomly, while the Static analysis column shows the number of faults found when they are initialized by combining values extracted from the classes under test with random values, as described in section 3.1.2.2.
Library | Class name | Primitives | Primitives, Creation probability | Primitives, Sequential method call | Primitives, Seed | Primitives, Method call | Static analysis
Lex automaton 9 8 7 9 8 9
dfa 16 11 16 16 17 16
error list 3 7 3 6 6 6
fixed automaton 11 8 10 9 8 15
fixed dfa 21 13 19 21 17 19
fixed integer set 3 3 3 3 3 3
high builder 13 8 14 13 5 14
Sum 76 58 72 77 64 82
Time absolute 0 1 3 1 3 0
time 4 5 5 6 3 6
interval 3 3 3 3 2 4
duration 0 0 0 0 0 0
date time parser 1 1 0 1 1 1
date time validity checker 2 2 3 2 4 3
Sum 10 12 14 13 13 14
Total 86 70 86 90 77 96
Table 4.4: The effect of Autotest parameters on the number of unique faults found
The number of faults found using the static analysis technique increased considerably compared to random initialization; Figure 4.1 shows a comparison of the two approaches. Another interesting result was the poor performance of the creation probability parameter: optimizing the primitive values together with the creation probability decreased the number of faults found compared to optimizing the primitive values alone. This negative effect was due to the range of values (0 to 1) used for this probability. According to (39), Autotest performs badly when the creation probability is far from 0.25. To improve the performance of the creation probability parameter, the range of possible values was decreased to (0.2, 0.3).
4.4 Experiments Group B
As described in section 2.4, there are different mutation, crossover and population selection algorithms. In order to evaluate these genetic operators, a total of 65 experiments were executed.
Figure 4.1: Number of faults found using random and static analysis techniques to select the initial primitive values
4.4.1 Experiment B1: Mutation probability
The mutation probability controls how often the mutation operator is applied to each gene. When the probability is too low, the genetic algorithm takes longer to converge; when it is too high, the algorithm becomes unstable. To find the best value, 10 experiments were executed with five different mutation probabilities. The flip mutation algorithm and genetic algorithm setting 2 were used in these experiments.

As shown in Table 4.5, the mutation probability does not seem to have a big impact on the overall performance as long as the probability is not too low. In this case, the mutation probability of 0.4 was just slightly better than 0.8, finding two more unique faults.
4.4.2 Experiment B2: Mutation algorithm
The mutation algorithm has a direct impact on how the search space is explored. The
three mutation algorithms described in section 2.4.2 were evaluated. To find which mutation
algorithm performed best, a total of 6 experiments were executed with setting 5. The number
of unique faults found for each class is shown in Table 4.6.
The results show that the flip mutation algorithm outperformed the swap mutation algorithm by 36% and the Gaussian by 32%. The flip mutator performed best for all classes except error list, as illustrated in Figure 4.2.
Library Class Name 0.2 0.4 0.6 0.8 1
Lex automaton 8 9 9 8 9
dfa 16 15 16 15 15
error list 4 6 5 5 5
fixed automaton 9 9 9 9 9
fixed dfa 17 18 18 18 18
fixed integer set 3 3 3 3 3
Sum 57 60 60 58 59
high builder 13 15 13 14 14
Time absolute 3 3 3 3 4
time 6 6 5 6 6
interval 3 3 3 3 3
duration 0 0 0 0 0
date time parser 1 1 1 2 1
date time validity checker 4 3 4 4 3
Sum 30 32 29 32 31
Total 87 92 89 90 90
Table 4.5: The effect of the mutation probability on the number of unique faults found
Library Class Name Flip Swap Gaussian
Lex automaton 9 8 9
dfa 17 16 15
error list 5 4 6
fixed automaton 20 9 9
fixed dfa 24 19 18
fixed integer set 3 3 3
high builder 16 14 15
Sum 94 73 75
Time absolute 5 2 3
time 6 5 5
interval 6 3 3
duration 0 0 0
date time parser 5 2 1
date time validity checker 4 3 4
Sum 26 25 16
Total 120 88 91
Table 4.6: The effect of mutation algorithms on the number of unique faults found
of the swap mutator is that it never introduces new values into the chromosome, thus limiting the search space to the values already present. One possible reason for the poor performance of the gaussian mutator compared to the flip mutator is that gaussian mutation replaces the value of a gene with a nearby value, which restricts exploration to states close to the current state.
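The behavioural difference between the three mutators can be summarized in a few lines of code. The sketch below is an illustration in Python over real-valued genes with a single shared allele range; the thesis chromosome uses a different range per section, but the contrast between the operators is the same.

import random

def flip(genes, low, high):
    # Replaces a random gene with an entirely new value: introduces
    # material that was never in the chromosome before.
    i = random.randrange(len(genes))
    genes[i] = random.uniform(low, high)

def swap(genes):
    # Exchanges two genes: only rearranges values that already exist,
    # so the search stays inside the current set of values.
    i, j = random.sample(range(len(genes)), 2)
    genes[i], genes[j] = genes[j], genes[i]

def gaussian(genes, low, high, sigma=0.1):
    # Perturbs a random gene by a small normally distributed step:
    # explores only states close to the current one.
    i = random.randrange(len(genes))
    genes[i] = min(high, max(low, genes[i] + random.gauss(0.0, sigma)))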
Figure 4.2: Number of faults found by each mutation algorithm for each class
4.4.3 Experiment B3: Crossover probability
The crossover probability controls how much of the population undergoes crossover. A low crossover probability may lead to very slow convergence, whereas a high value may lead to a high number of unfit individuals. To find a good crossover probability, a total of 14 experiments were executed using the uniform crossover algorithm and genetic algorithm setting 5. Table 4.7 shows the number of unique faults found for each class.
Library Class Name 0 0.1 0.2 0.4 0.6 0.8 1
Lex automaton 9 8 9 9 10 7 7
dfa 7 15 15 17 15 14 15
error list 5 6 5 6 10 10 4
fixed automaton 16 10 19 24 12 10 15
fixed dfa 24 20 26 28 30 17 18
fixed integer set 3 3 3 3 3 3 3
high builder 16 17 16 17 18 14 16
Sum 86 79 93 104 98 75 78
Time absolute 5 4 4 5 5 5 5
time 9 6 7 7 7 6 6
interval 4 5 5 5 6 5 5
duration 0 0 0 0 0 0 0
date time parser 5 4 4 4 4 5 4
date time validity checker 4 5 5 5 4 5 5
Sum 27 24 25 26 26 26 25
Total 113 103 118 130 124 101 103
Table 4.7: The effect of crossover probability on the number of unique faults found
The results show that the best crossover probability is around 0.4. Compared to the mutation probability, the crossover probability had a greater influence on the results: as shown in Figure 4.3, the crossover probability forms a curve with a peak at 0.4, whereas the mutation probability curve is nearly flat. This indicates that the crossover algorithm may have a greater influence on the number of faults found than the mutation algorithm.
Figure 4.3: Effect of the mutation and crossover probabilities on the number of faults
4.4.4 Experiment B4: Crossover algorithm
Combined with the mutation algorithm, the crossover algorithm specifies how the search space is explored. The crossover algorithm must be able to combine chromosomes in a way that affects all the values encoded in the chromosome. Since the chromosome is encoded in sections, where each section represents the values of a single parameter, the crossover algorithm must mix all sections of the chromosome well. To find a good crossover algorithm, all algorithms described in section 2.4.3 were evaluated in a total of fourteen experiments using setting 5. The results are shown in Table 4.8.
Library Class Name Uniform Even-Odd One-Point Two-Point Partial-Match Order Cycle
Lex automaton 8 8 8 9 10 9 8
dfa 14 15 16 15 15 15 14
error list 6 7 5 7 10 5 5
fixed automaton 19 18 19 17 12 16 16
fixed dfa 27 25 27 21 30 26 23
fixed integer set 3 3 3 3 3 3 3
high builder 18 14 18 16 18 13 16
Sum 95 90 86 88 98 87 85
Time absolute 5 5 5 4 5 4 4
time 6 7 6 7 7 6 6
interval 5 5 5 5 6 4 6
duration 0 0 0 0 0 0 0
date time parser 4 4 5 5 4 4 5
date time validity checker 5 5 5 4 4 6 4
Sum 25 26 26 25 26 23 25
Total 120 116 122 113 124 110 110
Table 4.8: The effect of crossover algorithms on the unique number of faults found
The results indicate that the crossover algorithms that modify more sections of the chromosome performed better than the ones that modify only a few. By modifying different parts of the chromosome, the algorithm has a higher chance of changing the values of all encoded parameters instead of just one. As shown in Figure 4.4, the uniform and partial match crossovers performed much better than the order and cycle crossovers. The discrepancy between the results of the one-point and two-point crossover algorithms is surprising; the difference in the number of unique faults found seems too large to be attributed to random variation. More experiments would be required to investigate why one-point crossover performed so much better than two-point.
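The observation that operators touching many sections mix better can be made concrete. Below is an illustrative Python sketch (not the thesis code) contrasting one-point and uniform crossover on a chromosome laid out as consecutive parameter sections, together with the probability gate studied in experiment B3.

import random

def one_point(a, b):
    # One cut: everything before the cut comes from one parent, everything
    # after it from the other, so sections far from the cut are never mixed.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def uniform(a, b, p_swap=0.5):
    # Every gene is exchanged independently, so all sections of the
    # chromosome have a chance of being recombined.
    c, d = list(a), list(b)
    for i in range(len(a)):
        if random.random() < p_swap:
            c[i], d[i] = d[i], c[i]
    return c, d

def maybe_cross(a, b, p_crossover=0.4, operator=uniform):
    # The crossover probability decides whether a selected pair recombines
    # at all; with 0.4, roughly 40% of the pairs produce mixed offspring.
    return operator(a, b) if random.random() < p_crossover else (list(a), list(b))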
Figure 4.4: Comparison of crossover algorithms
4.4.5 Experiment B5: Selection method
The selection algorithm plays an important role in controlling the diversity of the population. As described in section 2.4.5, if the same individuals are constantly selected for reproduction, the population may become too similar and converge to a sub-optimal solution. To find a good selection mechanism, all algorithms described in section 2.4.5 were evaluated in a total of 12 experiments with setting 4. The results are shown in Table 4.9.
Figure 4.5: Comparison of selection algorithms
Library Class Name Rank Roulette-Wheel Tournament Uniform Stochastic-Remainder Deterministic-Sampling
Lex automaton 7 9 9 9 9 12
dfa 15 14 16 14 15 14
error list 10 5 5 6 5 4
fixed automaton 14 14 12 9 17 15
fixed dfa 18 18 20 20 21 21
fixed integer set 3 3 3 3 3 3
high builder 17 16 14 12 15 14
Sum 84 79 79 73 85 83
Time absolute 4 4 4 4 5 4
time 8 5 6 6 7 8
interval 5 4 4 5 4 5
duration 0 0 0 0 0 0
date time parser 4 4 4 4 4 4
date time validity checker 5 5 5 4 5 5
Sum 26 22 23 23 25 26
Total 110 101 102 96 110 109
Table 4.9: The effect of different population selection schemes on the number of unique faults found
Both the stochastic remainder and the rank method found a total of 110 unique faults. The rank algorithm does not perform any speciation technique; it simply selects the best individuals, whereas the stochastic remainder normalizes the population as described in section 2.4.5. Among the six selection schemes evaluated, three had similar results, as shown in Figure 4.5. These results indicate that elitism has a positive effect on the number of unique faults found by the genetic algorithm, since the three best performing algorithms had a greater probability of selecting the individuals with the best scores; in particular, the rank algorithm constantly selects the best-scoring individuals for reproduction. Crowding does not seem to be an issue for this type of genetic algorithm. One possible reason is the small number of generations: because the computation of the objective function takes a long time, the number of generations has to be limited, which does not give crowding enough time to occur.
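As an illustration of the two best-performing schemes, the sketch below contrasts rank selection with stochastic remainder selection. It is a simplified Python rendering under the assumption that individuals are (chromosome, fitness) pairs with higher fitness being better and at least one positive fitness; the actual selectors come from GAlib (38).

import random

def rank_selection(population, n):
    # Deterministically keeps the n fittest individuals: strong elitism.
    return sorted(population, key=lambda ind: ind[1], reverse=True)[:n]

def stochastic_remainder(population, n):
    # Each individual first receives floor(expected copies) guaranteed slots;
    # the fractional remainders then compete for the slots that are left.
    total = sum(f for _, f in population)
    selected, remainders = [], []
    for chrom, f in population:
        expected = f * n / total
        selected.extend([chrom] * int(expected))
        remainders.append((chrom, expected - int(expected)))
    while len(selected) < n:
        # Roulette over the fractional remainders fills the remaining slots.
        r = random.uniform(0, sum(w for _, w in remainders))
        acc = 0.0
        for chrom, w in remainders:
            acc += w
            if r <= acc:
                selected.append(chrom)
                break
    return selected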
4.5 Experiments Group C
With the optimization of the chromosome and the genetic algorithm completed, the final sys-
tem is compared to an automated random testing system represented by Autotest. The main
criterion for evaluation is the number of unique faults found within a given amount of time. To make the comparison fair, Autotest is executed for the same amount of time taken by the evolutionary algorithm to evolve a testing strategy and test the system with it. Three executions of Autotest with different seeds were recorded for 15, 30 and 60 minutes, and the average results are used for comparison. In addition to the set of classes used in the previous experiments, a new set of classes, described in section 4.2, was used to validate the results.
4.5.1 Experiment C1: Original Autotest
To determine the maximum number of unique faults Autotest can find within a certain
amount of time, Autotest was executed three times for 15, 30 and 60 minutes. The average
number of faults found for each class from the α set are shown in Table 4.10 and for the β
set in Table 4.11.
Library Class Name 15 min 30 min 60 min
Lex automaton 9 9 9
dfa 17 17 17
error list 12 13 14
fixed automaton 10 10 11
fixed dfa 24 28 29
fixed integer set 1 1 1
high builder 19 24 26
Sum 93 103 107
Time absolute 1 1 1
time 5 6 6
interval 1 1 1
duration 0 0 0
date time parser 0 0 0
date time validity checker 1 1 1
Sum 8 9 9
Total 101 112 116
Table 4.10: Number of unique faults found in the α set by the original Autotest
The variation across the three executions was small. Classes with a higher number of faults showed greater variation than those with fewer faults. The variation in the
Library Class Name 15 min 30 min 60 min
Lex text filler 12 13 14
lex builder 16 18 22
lexical 24 26 26
linked automaton 8 10 10
metalex 22 29 33
ndfa 8 9 9
pdfa 11 12 13
scanning 29 31 34
state of dfa 11 18 13
Sum 140 160 174
Table 4.11: Number of unique faults found in the β set by the original Autotest
time library was very small: the total number of faults found in the time library classes differed by only one fault from the average. Figure 4.6 shows the variation in the total number of faults found in all classes across the three executions of Autotest.
Figure 4.6: Variation in the total number of faults found
The total number of faults found in each execution and the average for 15, 30 and 60
minutes is shown in Table 4.12.
Execution 15 min 30 min 60 min
Average 241 272 290
Execution 1 236 268 282
Execution 2 244 276 295
Execution 3 243 272 293
Table 4.12: Total number of faults found in three executions of Autotest
It can also be observed that most of the faults were found in the first few minutes of execution and that the rate of new faults found decreased rapidly, as shown in Figure 4.7. This result is in accordance with the results of the random testing predictability study (39).
Figure 4.7: Autotest progress - Progress of the number of faults found over time
4.5.2 Experiment C2: Autotest with static analysis
Using static analysis to select an initial set of primitive values is a technique that is independent of the evolutionary algorithm. Because this technique was used in the evolutionary testing system, Autotest was extended to use the same static analysis technique for selecting primitive values. As described in section 3.1.2.2, this approach extracts primitive values from the classes under test to use as input for test data generation. This enhanced Autotest was then executed for 15, 30 and 60 minutes; the number of faults found in the α set is shown in Table 4.13 and the number of faults found in the β set in Table 4.14.
Library Class Name 15 min 30 min 60 min
Lex automaton 6 6 6
dfa 16 16 16
error list 11 12 15
fixed automaton 14 22 31
fixed dfa 30 35 36
fixed integer set 2 2 2
high builder 19 16 28
Sum 98 109 134
Time absolute 2 2 2
time 5 6 8
interval 3 3 3
duration 0 0 0
date time parser 1 1 3
date time validity checker 3 3 3
Sum 14 15 19
Total 112 124 153
Table 4.13: Faults found in the α set by Autotest with static analysis
Library Class Name 15 min 30 min 60 min
Lex text filler 14 15 15
lex builder 15 17 27
lexical 27 30 30
linked automaton 8 8 8
metalex 27 30 33
ndfa 9 9 9
pdfa 11 15 19
scanning 24 30 40
state of dfa 17 20 25
Sum 152 174 206
Table 4.14: Faults found in the β set by Autotest with static analysis
Autotest with static analysis found more faults than the original Autotest for the α set, as shown in Figure 4.8, and for the β set, as shown in Figure 4.9. This was expected, since using primitive values extracted from the source code increases the probability that the generated test cases are relevant. For example, if the source code contains a boolean expression that evaluates x == 782, the probability of randomly generating the value 782 is very low, but with static analysis this value is immediately added to the list of relevant values.
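A minimal sketch of this kind of literal extraction, assuming Eiffel-like source text and using regular expressions instead of a real parser; the function name and patterns are illustrative, not the thesis implementation:

import re

def extract_primitive_literals(source: str):
    # Collect integer, real and character literals appearing in the source;
    # e.g. the 782 in "x = 782" immediately becomes a candidate input value.
    ints = {int(m) for m in re.findall(r"(?<![\w.])-?\d+(?![\w.])", source)}
    reals = {float(m) for m in re.findall(r"-?\d+\.\d+(?:[eE][+-]?\d+)?", source)}
    chars = set(re.findall(r"'(.)'", source))
    return ints, reals, chars

snippet = "if x = 782 then io.put_character ('a') end"
print(extract_primitive_literals(snippet))
# A test data generator would then mix these literals with random values.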
4.5.3 Experiment C3: Evolutionary testing
With the crossover, mutation and selection algorithms and their probabilities selected, the final genetic algorithm can be composed. The final genetic algorithm uses the partial match crossover algorithm with a crossover probability of 0.4, the flip mutator with a mutation probability of 0.4 and the stochastic remainder algorithm to select the population for crossover. With these parameters, the evolutionary algorithm was executed for 15, 30 and 60 minutes for both the α and the β set. This time includes both the evolution and the execution of a strategy. The time allocation used for each execution is shown in Table 4.15. These values were chosen based on a preliminary optimization (results not shown).
Total time Evolution Execution
15 min 5 10
30 min 10 20
60 min 15 45
Table 4.15: Time allocation for each execution of the final system
Between the three executions, the population size, the number of generations and the size of the chromosome varied. For short executions it is better to have a smaller chromosome: it converges faster, but it may converge to a sub-optimal solution, so for longer executions a larger chromosome is preferable. This effect is mainly due to the number of primitive values encoded in the chromosome; as described in section 3.1.2.1, this number defines the size of the chromosome. The settings used for the three executions are shown in Table 4.16.
Setting 15 min 30 min 60 min
primitive set size 40 100 100
population size 4 8 8
number of generations 9 12 12
mutation probability 0.4 0.4 0.4
crossover probability 0.4 0.4 0.4
Table 4.16: Settings for the final execution
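For clarity, the final configuration of Tables 4.15 and 4.16 can be collected in one place. The Python sketch below uses made-up field names purely to summarize how the chosen operators and time budgets fit together; it is not code from the system.

from dataclasses import dataclass

@dataclass
class FinalGAConfig:
    primitive_set_size: int            # defines the chromosome size (section 3.1.2.1)
    population_size: int
    generations: int
    evolve_minutes: int                # time spent evolving a strategy
    execute_minutes: int               # time spent running the evolved strategy
    mutation: str = "flip"             # winner of experiment B2
    mutation_probability: float = 0.4  # experiment B1
    crossover: str = "partial_match"   # experiment B4
    crossover_probability: float = 0.4 # experiment B3
    selection: str = "stochastic_remainder"  # experiment B5

# The 15-, 30- and 60-minute budgets used in experiment C3.
CONFIGS = {
    15: FinalGAConfig(40, 4, 9, evolve_minutes=5, execute_minutes=10),
    30: FinalGAConfig(100, 8, 12, evolve_minutes=10, execute_minutes=20),
    60: FinalGAConfig(100, 8, 12, evolve_minutes=15, execute_minutes=45),
}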
The evolutionary algorithm was first executed for the 13 classes from the α set. The
number of unique faults found for each class is reported in Table 4.17.
Library Class Name 15 min 30 min 60 min
Lex automaton 9 9 9
dfa 16 17 17
error list 13 13 15
fixed automaton 17 23 31
fixed dfa 32 39 41
fixed integer set 2 2 2
high builder 18 23 26
Sum 107 126 141
Time absolute 3 2 4
time 6 5 10
interval 3 5 5
duration 0 0 0
date time parser 0 5 5
date time validity checker 3 3 3
Sum 15 20 27
Total 122 146 168
Table 4.17: The number of unique faults found in the α set by the GA
Figure 4.8: Evolutionary approach on the α set - Number of faults found in the α set
A comparison between the original Autotest, Autotest with static analysis and the evolutionary algorithm, illustrated in Figure 4.8, shows that the evolutionary algorithm outperformed both. Because the classes from the α set were used for optimization, the system was also tested against a new set of classes, the β set. The number of faults found in the β set using the evolutionary algorithm is shown in Table 4.18.
Library Class Name 15 min 30 min 60 min
Lex text filler 16 15 16
lex builder 18 24 34
lexical 27 28 32
linked automaton 7 8 11
metalex 25 31 43
ndfa 9 9 11
pdfa 12 14 23
scanning 27 34 41
state of dfa 16 24 29
Sum 157 187 240
Table 4.18: The number of unique faults found in the β set by the GA
Figure 4.9: Evolutionary testing on the β set - Comparison of the three approaches using classes from the β set
The results for the β set also show the evolutionary algorithm as the best performing approach. However, the difference between Autotest with static analysis and the evolutionary algorithm is very small. One possible reason is that the evolutionary algorithm pays a large penalty for short executions, because it does not have enough time to evolve a good strategy. This penalty is even higher when a class has a high number of faults, since most of these faults are found within the first minutes. With longer executions, the time used to evolve a good strategy starts to pay off: while the rate at which Autotest finds new faults slows down, the evolutionary approach continues to find new faults at a faster pace. This pattern is shown in Figure 4.10.
Figure 4.10: Total number of faults found for all classes over time by the three approaches
The evolutionary algorithm performed considerably better when the classes being tested had fewer faults. This can be seen in Figure 4.11, which shows a comparison of the three systems using only the classes from the time library.
Figure 4.11: Evolutionary approach on the time library - Comparison of the three approaches using classes from the time library
5 Discussion
5.1 Types of faults found
In order to find out which types of faults are discovered by the evolutionary algorithm, the faults found by running it for 60 minutes on the metalex class were analyzed. A total of 43 faults were found in 37 features; 8% of the metalex features contained at least one fault. There were essentially four types of faults:
1. Class invariant violation: one of the class invariant conditions is violated.
2. Precondition violation: the method violated another method's precondition.
3. Call on a void object: the method tried to invoke a feature on a void object.
4. Memory / OS related: out-of-memory or file-not-found faults.
As Figure 5.1 shows, the most predominant type of fault found was the precondition violation, followed by the call on a void object. The call-on-void-object faults are less relevant than the other types because Eiffel is becoming void-safe; that is, the compiler will guarantee that there are no calls on void objects.
Figure 5.1: Distribution of the types of faults found in the metalex class
5.2 Parameters
The genetic algorithm was used both to find the parameter values and to determine which parameters should be used. During the initialization of the chromosome, the genes that specify whether a parameter should be used are randomly initialized to a value between -1 and 1. When this value is negative, the corresponding parameter is not used. This initialization
assumes that all parameters have the same importance to the system, but this is not true.
By analyzing the chromosomes evolved during the 60-minute executions of the evolutionary algorithm in experiment C3, we can estimate the importance of each parameter to the system: a higher usage frequency means greater importance. Figure 5.2 shows how often each parameter was used.
One notices that the primitive values parameter was used in all successful strategies,
which highlights the importance of this parameter. The seed parameter, on the other hand,
was only used 7.69% of the time. It is important to consider these usage frequency values,
because they directly affect chromosome initialization and thus could considerably decrease
the time needed to evolve good strategies. For example, the evolution of the primitive values
should always be used, whereas the seed parameter can most likely be left out. Although
these values can be used to initialize the chromosome, experiment A1 showed that each class benefits differently from the optimization of each parameter. From this we can conclude that evolutionary testing is specific to a single class and might not generalize well to a set of classes.
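A hypothetical sketch of how the usage frequencies could bias this initialization; the parameter names and the bias interface below are illustrative, not part of the system:

import random

PARAMETERS = ["primitive_values", "seed", "creation_probability", "method_call"]

def init_usage_genes(bias=None):
    # Each parameter gets a usage gene in [-1, 1]; the parameter is enabled
    # only when its gene is positive. Without a bias every parameter starts
    # with a 50% chance of being enabled, as in the thesis.
    genes = {}
    for p in PARAMETERS:
        if bias and p in bias:
            enabled = random.random() < bias[p]
            genes[p] = random.uniform(0, 1) if enabled else random.uniform(-1, 0)
        else:
            genes[p] = random.uniform(-1, 1)
    return genes

# Seeding the bias with the observed frequencies: primitive values appeared
# in every successful strategy, the seed in only 7.69% of them.
genes = init_usage_genes(bias={"primitive_values": 1.0, "seed": 0.08})
enabled = [p for p, g in genes.items() if g > 0]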
Figure 5.2: Usage frequency of each parameter
5.3 Conclusion
Based on the results presented in this thesis, we draw a number of conclusions. These
conclusions are grouped in three sections below.
1. Genetic operators
In this work, 3 mutation, 7 crossover and 6 population selection algorithms were evaluated. The results of experiment B2 showed that it is important to introduce new random values into the chromosome and to increase the diversity of the population. Among the three mutation algorithms evaluated, the flip mutation algorithm was the best: it outperformed the other two mutation algorithms 92% of the time. The performance of the crossover algorithm seems to depend on how the chromosome is defined. Because the chromosome, in this case, contained many parameters, the crossover algorithms that affected many sections of the chromosome performed better. One of the main purposes of the population selection algorithm is to control the crowding problem. However, the system did not seem to have a crowding problem. One possible reason is the low number of generations: because the evaluation of the population takes considerable time, the number of generations has to be low, which may not give crowding enough time to appear.
2. Static analysis
The results from experiment C2 showed that even a very basic static analysis can increase the efficiency of automatic test case generators. By combining random primitive values with the ones extracted from the classes being tested, the number of unique faults found by Autotest increased by 15% on average.
3. Evolutionary testing
The main goal of this project was to use a genetic algorithm to find more unique faults than random testing. Because the number of new unique faults found by Autotest decreases considerably over time, as shown in Figure 4.7, the goal for the evolutionary algorithm was to converge to a higher value. The evolutionary algorithm was not expected to outperform random testing within 15 minutes, since the evolution of a strategy takes a long time. However, by evolving a strategy for 5 minutes and executing it for 10 minutes, the genetic algorithm was able to find more unique faults than both the original Autotest and Autotest with static analysis. The evolutionary testing approach performed even better on classes with fewer faults. For example, Autotest could not find a single fault in the date time parser class, but the evolutionary algorithm found 5 faults. Another advantage of the evolutionary algorithm is that a testing strategy can be reused: the strategy can be evolved once and reused across multiple executions, whereas random testing has to find interesting test cases from scratch at every execution.
5.4 Considerations
1. Since the numbers of unique faults found for each of the 22 classes were added up, some unique faults might have been counted multiple times. Autotest tests all features (including inherited features), so if two classes inherit from the same class and that class has a bug, the bug is counted as a fault for both classes. This issue does not threaten the comparison between evolutionary testing and random testing, since both strategies were tested under the same conditions. It is possible, however, that the numbers of faults found by both Autotest and the evolutionary testing approach were inflated by this issue.
2. Time was the main resource used for comparison in this study; processing power was not taken into account. The evolutionary algorithm used four threads to evolve a strategy, whereas Autotest was executed in a single thread. Accounting for processing power could slightly reduce the measured improvement of evolutionary testing for short executions, but because random testing converges slowly, the number of faults found in longer executions would not be affected.
5.5 Further improvement
Some possible improvements and research directions are presented below.
• Code coverage - at the moment, only the number of unique faults found is used by the genetic algorithm to optimize the automatic generation of test cases. It might be useful to also consider code coverage in the objective function.
• Efficiency - the efficiency of the algorithm can be improved by combining the genetic
algorithm and Autotest into a single system. At the moment, every time Autotest is
invoked from the genetic algorithm, it has to load and parse the class under test.
• Reusing strategies - the strategies evolved by the genetic algorithm may be reused
when evolving a new strategy for a new class.
• Static analysis - this system uses a very naive static analysis technique; a more advanced technique could be used to extract primitive values and to generate values around them.
Appendix A
Primitive Values
Primitive type Values
BOOLEAN True, False
CHARACTER 8 1 to 255
CHARACTER 32 1 to 600
REAL 32 -100.0, -2.0, -1.0, 1.0, 2.0, 100.0, 3.40282e+38, 1.17549e-38, 1.19209e-07
REAL 64 -1.0, 1.0, -2.0, 2.0, 0, 3.14159265358979323846, -2.7182818284590452354, 2.2250738585072014e-308, 2.2204460492503131e-16, 1.7976931348623157e+308
INTEGER 8 -100, -10, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max
INTEGER 16 -100, -10, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max
INTEGER 32 -100, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max
INTEGER 64 -100, -10, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, Min, Max
NATURAL 8 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max
NATURAL 16 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max
NATURAL 32 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max
NATURAL 64 0, 1, 2, 3, 4, 5, 6, 7, 8, 8, 10, 20, 126, 127, 128, 129, Min, Max
Table A.1: Autotest primitive values.
Appendix B
Chromosome specification
# Parameter Starting index Finishing index Allele values
1 Boolean 0 (1 ∗ η)− 1 -1, 1
2 Char 32 (1 ∗ η)− 1 (2 ∗ η)− 1 0, 600
3 Char 8 (2 ∗ η)− 1 (3 ∗ η)− 1 0, 255
4 Integer 16 (3 ∗ η)− 1 (4 ∗ η)− 1 -32768, 32767
5 Integer 32 (4 ∗ η)− 1 (5 ∗ η)− 1 -2147483648, 2147483647
6 Integer 64 (5 ∗ η)− 1 (6 ∗ η)− 1 -9223372036854775808, 9223372036854775807
7 Integer 8 (6 ∗ η)− 1 (7 ∗ η)− 1 -128, 127
8 Natural 16 (7 ∗ η)− 1 (8 ∗ η)− 1 0, 65535
9 Natural 32 (8 ∗ η)− 1 (9 ∗ η)− 1 0, 4294967295
10 Natural 64 (9 ∗ η)− 1 (10 ∗ η)− 1 0, 18446744073709551615
11 Natural 8 (10 ∗ η)− 1 (11 ∗ η)− 1 0, 255
12 Seed (11 ∗ η)− 1 (12 ∗ η)− 1 0, 100
13 Real 32 (12 ∗ η)− 1 (13 ∗ η)− 1 -1.0e30, 1.0e30
14 Real 64 (13 ∗ η)− 1 (14 ∗ η)− 1 -1.0e30, 1.0e30
15 Method call (14 ∗ η)− 1 (15 ∗ η)− 1 1, 100
16 creation probability (15 ∗ η)− 1 (16 ∗ η) 0.15, 0.3
17 evolving primitive (16 ∗ η) + 1 (17 ∗ η) + 2 -1.0, 1.0
18 evolving seed (17 ∗ η) + 2 (18 ∗ η) + 3 -1.0, 1.0
19 evolving creation probability (18 ∗ η) + 3 (19 ∗ η) + 4 -1.0, 1.0
20 sequential method invocation (19 ∗ η) + 4 (20 ∗ η) + 5 -1.0, 1.0
21 evolving method call (20 ∗ η) + 5 (21 ∗ η) + 6 -1.0, 1.0
Table B.1: Chromosome specification.
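The boundaries in Table B.1 follow a mechanical pattern driven by the primitive set size η. The sketch below reproduces that pattern in Python for the sixteen η-sized sections, ignoring the table's occasional off-by-one offsets; it is an illustration of the layout, not code extracted from the system.

# Each of the first sixteen parameter sections holds eta consecutive genes.
ETA = 40  # primitive set size used in the 15-minute setting (Table 4.16)

SECTIONS = ["Boolean", "Char 32", "Char 8", "Integer 16", "Integer 32",
            "Integer 64", "Integer 8", "Natural 16", "Natural 32",
            "Natural 64", "Natural 8", "Seed", "Real 32", "Real 64",
            "Method call", "creation probability"]

def section_bounds(eta):
    # Section k starts where section k - 1 ended and spans eta genes.
    return {name: (k * eta, (k + 1) * eta - 1) for k, name in enumerate(SECTIONS)}

for name, (start, end) in section_bounds(ETA).items():
    print(f"{name:22s} genes {start} .. {end}")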
Appendix C
Chromosome files
# Parameter File name
1 Boolean boolean.txt
2 Char 32 character 32.txt
3 Char 8 character 8.txt
4 Integer 16 integer 16.txt
5 Integer 32 integer 32.txt
6 Integer 64 integer 64.txt
7 Integer 8 integer 8.txt
8 Natural 16 natural 16.txt
9 Natural 32 natural 32.txt
10 Natural 64 natural 64.txt
11 Natural 8 natural 8.txt
12 Seed seed.txt
13 Real 32 real 32.txt
14 Real 64 real 64.txt
15 Method call method call sequence.txt
16 creation probability creation probability.txt
Table C.1: Chromosome files.
Bibliography
[1] Y. Cheon and G. T. Leavens. A simple and practical approach to unit testing: The JML and JUnit way. Technical Report 01-12, Department of Computer Science, Iowa State University, Nov. 2001.
[2] I. Ciupa, A. Leitner, M. Oriol, and B. Meyer. Experimental assessment of random testing for object-oriented software. In Proceedings of the International Symposium on Software Testing and Analysis 2007 (ISSTA'07), pages 84-94, 2007.
[3] NIST (National Institute of Standards and Technology): The Economic Impacts of Inadequate Infrastructure for Software Testing, Report 7007.011, available at www.nist.gov/director/prog-ofc/report02-3.pdf
[4] Eric Bezault et al.: Gobo library and tools, at www.gobosoft.com.
[5] Xanthakis, S., Ellis, C., Skourlas, C., Le Gall, A., Katsikas, S., Application of Genetic Algorithms to Software Testing, Proceedings of the 5th International Conference of Software Engineering, pages 625-636, France, December 1992.
[6] Shultz, A., Grefenstette, J., De Jong, K., Test & Evaluation by Genetic Algorithms, Navy Center for Applied Research in Artificial Intelligence, IEEE, 1993.
[7] Hunt, J., Testing Control Software using a Genetic Algorithm, Working Paper, University of Wales, UK, 1995.
[8] Roper, M., Maclean, I., Brooks, A., Miller, J., Wood, M., Genetic Algorithms and the Automatic Generation of Test Data, Working Paper, Department of Computer Science, University of Strathclyde, UK, 1991.
[9] Watkins, A., The Automatic Generation of Software Test Data using Genetic Algorithms, Proceedings of the Fourth Software Quality Conference, 2: 300-309, Dundee, Scotland, July 1995.
[10] Alander, J., Mantere, T. and Turunen, P., Genetic Algorithm Based Software Testing, in G. Smith, N. Steele and R. Albrecht, editors, Artificial Neural Nets and Genetic Algorithms, Springer-Verlag, Wien, Austria, pages 325-328, 1998.
[11] Tracey, N., Clark, J., Mander, K., Automated Program Flaw Finding Using Simulated Annealing, ISSTA-98, Clearwater Beach, Florida, USA, 1998.
[12] Borgelt, K., Software Test Data Generation From A Genetic Algorithm, Industrial Applications Of Genetic Algorithms, CRC Press, 1998.
[13] Pargas, R., Harold, M., Peck, R., Test Data Generation Using Genetic Algorithms, Software Testing, Verification And Reliability, 9: 263-282, 1999.
[14] Jones, B., Sthamer, H. and Eyres, D. Automatic structural testing using genetic algorithms. Software Engineering Journal, 11(5): 299-306, 1996.
[15] Lin, J-C. and Yeh, P-U. Automatic Test Data Generation for Path Testing using GAs, Information Sciences, 131: 47-64, 2001.
[16] Michael, C., McGraw, G., Schatz, M., Generating Software Test Data by Evolution, IEEE Transactions On Software Engineering, 27(12), December 2001.
[17] Wegener, J., Baresel, A., Sthamer, H., Evolutionary Test Environment for Automatic Structural Testing, Information & Software Technology, 2001.
[18] Harman, M. The automatic generation of software test data using genetic algorithms. Ph.D. thesis, University of Glamorgan, Pontypridd, Wales, Great Britain, 1996.
[19] Díaz, E., Tuya, J., and Blanco, R. Automated Software Testing Using a Metaheuristic Technique Based on Tabu Search. In 18th IEEE International Conference on Automated Software Engineering, pp. 310-313, 2003.
[20] Berndt, D., Fisher, J., Johnson, L., Pinglikar, J., and Watkins, A. (2003). Breeding Software Test Cases with Genetic Algorithms. In 36th Annual Hawaii Int. Conference on System Sciences (HICSS 2003).
[21] D. J. Berndt, A. Watkins, High Volume Software Testing using Genetic Algorithms. Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05) - Track 9, 2005.
[22] Alba, E., and Chicano, J. F. Software Testing with Evolutionary Strategies. Proceedings of the Rapid Integration of Software Engineering Techniques (RISE-2005), Heraklion, Greece, 2005.
[23] McMinn, P., and Holcombe, M. Evolutionary testing of state-based programs. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'05), pages 1013-1020, Washington DC, USA, June 2005.
[24] Tonella, P. Evolutionary Testing of Classes. In Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis (ISSTA'04), ACM Press, New York, NY (2004), 119-128.
[25] Stefan Mairhofer. Search-based software testing and complex test data generation in a dynamic programming language. Master thesis, 2008.
[26] Harman, M., and McMinn, P. A theoretical & empirical analysis of evolutionary testing and hill climbing for structural test data generation. In ISSTA'07: Proceedings of the 2007 international symposium on Software testing and analysis (New York, NY, USA, 2007), ACM, pp. 73-83.
[27] Wappler, S., and Lammermann, F. Using evolutionary algorithms for the unit testing of object-oriented software. In GECCO'05: Proceedings of the 2005 conference on Genetic and evolutionary computation (New York, NY, USA, 2005), ACM, pp. 1053-1060.
[28] Wappler, S., and Wegener, J. Evolutionary unit testing of object-oriented software using a hybrid evolutionary algorithm. In CEC'06: Proceedings of the 2006 IEEE Congress on Evolutionary Computation (2006), IEEE, pp. 851-858.
[29] ECMA-367 Eiffel: Analysis, Design and Programming Language, 2nd Edition. http://www.ecma-international.org/publications/standards/Ecma-367.htm
[30] Beizer, B.: 'Software Testing Techniques', Second Edition, New York: Van Nostrand Reinhold, ISBN 0442206720, 1990.
[31] Meyer, B. Object-Oriented Software Construction, 2nd edition. Prentice Hall, 1997.
[32] Godefroid, P., Klarlund, N., and Sen, K. DART: directed automated random testing. In PLDI'05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation (New York, NY, USA, 2005), ACM Press, pp. 213-223.
[33] Meyer, B., Ciupa, I., Leitner, A., and Liu, L. L. Automatic testing of object-oriented software. In Proceedings of SOFSEM 2007 (Current Trends in Theory and Practice of Computer Science) (2007), J. van Leeuwen, Ed., Lecture Notes in Computer Science, Springer-Verlag.
[34] Oriat, C. Jartege: a tool for random generation of unit tests for Java classes. Tech. Rep. RR-1069-I, Centre National de la Recherche Scientifique, Institut National Polytechnique de Grenoble, Université Joseph Fourier Grenoble I, June 2004.
[35] De Jong, K. A. (1975). An analysis of the behavior of a class of genetic adaptive systems (Doctoral dissertation, University of Michigan). Dissertation Abstracts International, 36(10), 5140B. (University Microfilms No. 76-9381)
[36] Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: University of Michigan Press.
[37] Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.
[38] M. Wall. GAlib: A C++ Library of Genetic Algorithm Components. MIT, http://lancet.mit.edu/ga/, 1996.
[39] I. Ciupa, A. Pretschner, A. Leitner, M. Oriol, and B. Meyer. On the predictability of random tests for object-oriented software. In Proceedings of the First International Conference on Software Testing, Verification and Validation (ICST'08), April 2008.