

    RAL-TR-2007-010

A Survey of Software Testing Tools for Computational Science

    L.S. Chin, D.J. Worth, and C. Greenough

    June 29, 2007

    Abstract

This report presents a summary of information gathered in considering software testing practices for Computational Science and Engineering. It includes an overview of software testing, and provides a survey of tools currently available to assist in implementing testing solutions for scientific applications written in Fortran.

Keywords: software testing, software quality, verification, validation, Fortran

Email: [email protected], [email protected], or [email protected]. Reports can be obtained from www.softeng.cse.clrc.ac.uk

Software Engineering Group
Computational Science & Engineering Department
Rutherford Appleton Laboratory
Harwell Science and Innovation Campus
Didcot, Oxfordshire OX11 0QX


© Science and Technology Facilities Council

Enquiries about the copyright, reproduction and requests for additional copies of this report should be addressed to:

    Library and Information Services

STFC Rutherford Appleton Laboratory
Harwell Science and Innovation Campus
Didcot
Oxfordshire OX11 0QX
Tel: +44 (0)1235 445384
Fax: +44 (0)1235 446403
Email: [email protected]

    STFC e-reports are available online at: http://epubs.cclrc.ac.uk

Neither the Council nor the Laboratory accept any responsibility for loss or damage arising from the use of information contained in any of their reports or in any communication about their tests or investigations.


    Contents

1 Introduction
2 Software Engineering Support Programme
3 Overview of Software Testing
  3.1 Stages of Software Testing
    3.1.1 Design phase
    3.1.2 Testing phase
    3.1.3 Maintenance phase
    3.1.4 Implementation phase
  3.2 Test Design
    3.2.1 The Black Box approach
    3.2.2 The White Box approach
  3.3 Deciding on a strategy
4 Available Tools
  4.1 Testing Framework
    4.1.1 pFUnit
    4.1.2 fUnit
    4.1.3 DejaGNU
    4.1.4 QMTest
    4.1.5 Cleanscape Grayboxx
  4.2 Capture and Playback
    4.2.1 AutoExpect
    4.2.2 TestWorks CAPBAK
  4.3 Output validation
    4.3.1 TextTest
    4.3.2 ndiff
    4.3.3 Toldiff
    4.3.4 numdiff
  4.4 Test Coverage
    4.4.1 gcov
    4.4.2 Polyhedron plusFort - CVRANAL
    4.4.3 FCAT
    4.4.4 Cleanscape Grayboxx
    4.4.5 TestWorks/TCAT
    4.4.6 LDRA Testbed
    4.4.7 McCabe IQ
  4.5 Test Management and Automation
    4.5.1 RTH
    4.5.2 TestLink
    4.5.3 QaTraq
    4.5.4 AutoTest
    4.5.5 STAF
  4.6 Build Management
    4.6.1 BuildBot
    4.6.2 test-AutoBuild
    4.6.3 Parabuild
    4.6.4 CruiseControl
    4.6.5 BuildForge
    4.6.6 AEGIS
5 Further reading
References


    1 Introduction

Software has a long history of being used by the scientific community as a vehicle for performing world-class research. Such software is usually written by a variety of developers, and often evolves over time to incorporate new algorithms, models or features, or is refactored to take advantage of different programming paradigms and cutting-edge technology.

Test suites that accompany such software play a crucial role in checking that the software functions correctly and produces the expected results. A good set of tests serves as a safety net for developers, ensuring that the software remains valid and internally consistent as changes are made. Additionally, these tests allow for independent verification by the end users, thus building confidence in the software.

Since most scientific software (and its test suites) is developed by domain experts rather than software engineers, there is a tendency for the emphasis to be on the represented model or calculation. Tests are therefore designed around checking for acceptable results rather than discovering when or how the software might fail.

Consequently, this may lead to a situation where inadequate sets of tests lull developers into a false sense of confidence. A 100% passing rate for a test suite that exercises only 30% of the program code could easily mislead developers and end-users. Similarly, tests that produce correct results for a small subset of input may lead to incorrect assumptions that the results will remain valid for all other input.

This report documents the first steps in determining strategies for adopting high-payoff software testing practices within scientific software development. We look at well-established methodologies practised by the Software Engineering community, as well as software testing tools that can accelerate the process of building, running, and managing these solutions. Due to the predominance of Fortran among scientific software projects, it is difficult for developers to take advantage of many of the available testing tools, which are designed mainly for the general software engineering community that has long shied away from Fortran.

Chapter 3, which draws heavily from the 2004 edition of the text by Myers [2] (the original book, published in 1979, is often regarded as a seminal work on software testing), provides a broad overview of software testing concepts and methodology used in Software Engineering.

Chapter 4 presents a survey of software tools that can potentially be used to implement testing solutions for scientific software. This list is quite extensive, and serves as a starting point for further evaluation efforts.

    This report is one of the outputs of the Software Engineering Support Programme (SESP).




    3 Overview of Software Testing

Software testing involves more than just running a program to see whether it works. A single test run reveals nothing about the program other than the obvious fact that it can yield results for a particular set of inputs. Software testing should be treated as an investigative exercise; one which systematically uncovers different classes of errors within the code while demonstrating that the software behaves as expected.

The developer's concept of the definition and objectives of software testing plays a major role in determining the efficacy of the activity. It influences the developer's decisions on what should be tested, and judgement on what is considered a successful test.

For example, if the definition "Software testing is a process of proving that a program is bug free" were adopted, there would be a natural tendency for developers to subconsciously write fewer or less destructive test cases with lower probabilities of breaking the program. Furthermore, the objective that this definition implies is practically impossible to achieve. It takes only one failed test to prove the existence of bugs, but it would require an infinite number of test cases to prove otherwise. Tests can only find defects, not prove that there are none.

A similarly delusive definition would be "Software testing is a process of proving that a program performs its intended functions". This line of thinking often leads to test cases that focus only on program behaviour that is inherently expected. However, programs that perform the right functions when given a controlled set of inputs are still erroneous if they also produce unwanted side effects or fail when given unexpected inputs. A complete test should check for both expected and unexpected behaviours, using valid as well as invalid inputs.

Myers [2] aptly defines software testing as "a process of executing a program with the intention of finding errors". Using the analogy of a medical diagnosis, a successful investigation is one that seeks and discovers a problem, rather than one that reveals nothing and provides a false sense of well-being. Based on this definition, we establish that a good set of test cases is one that has a high chance of uncovering previously unknown errors, while a successful test run is one that discovers these errors.

In order to detect all possible errors within a program, exhaustive testing is required to exercise all possible inputs and logical execution paths. Except for very trivial programs, this is economically infeasible if not impossible. Therefore, a practical goal for software testing would be to maximise the probability of finding errors using a finite number of test cases, performed in minimum time with minimum effort.

    Section 3.2 presents several test design strategies that can be used to work towards this goal.

    3.1 Stages of Software Testing

Figure 1 presents an illustration of the different phases of software development with a list of activities that make up each phase. It is an extension of the V-model, and includes an additional "Maintain" loop to cater for iterative software development models (e.g. evolutionary prototyping, staged delivery, etc.) that may be more relevant to scientific application development.

The diagram is admittedly over-elaborate as it attempts to be all-encompassing; it is not meant to wholly describe the development process of a particular software project, but instead to provide a correlation between the different activities that represent the building blocks of software development projects.

Developers may wish to consider only those activities relevant to their project and, from the diagram, determine where the different software testing stages could be applied within their software development process.

    3.1.1 Design phase

The design phase represents a stream of activities where the software specifications are defined, starting from a high-level specification of requirements down to the detailed description of the implementation.

At each stage, an associated document is produced, as well as the test criteria which reflect the requirements specified in the document. Where feasible, for instance in the case of acceptance tests based on user requirements, the actual test cases should be written at this stage.

Test criteria drawn up at the design phase would be based on an objective view of the specifications, resulting in a more complete and accurate representation of the requirements.


    Figure 1: Extended V-Model which includes a Maintenance phase

    3.1.2 Testing phase

The testing phase is made up of the different stages of testing, which reflect a bottom-up correspondence with the levels at which software is designed and built.

Unit Testing:
Testing a code module in isolation, ensuring that it works correctly as specified by the detailed design. Good unit tests assist in future refactoring of code, since they give assurance that the modified code still works as expected and can therefore be included in the project.

Integration Testing:
Testing of communication and interaction between different code modules that are to be integrated. Integration tests are defined based on the architectural design of the system, and provide confidence that all modules can work together to achieve the functionality specified in the design.

Code Coverage Analysis:
Determining the level of coverage for the previous tests. If the level does not meet a predefined threshold, the test cases should be extended until a satisfactory coverage level is attained. Since the coverage of test cases depends on the actual code implementation, coverage has to be re-evaluated whenever the code changes to ensure that the coverage level is maintained. Test coverage will be discussed further in section 3.2.2.

System Testing:
Testing of system-level requirements as stipulated by the software requirements specification. This might include tests for performance, interoperability, portability, usability, installability, etc.

Acceptance Testing:
Testing of the final product against the user requirements specification. "User" may refer to actual end-users using the program or, in the case of prototypes or novel applications, the developers that define what they are attempting to achieve.

There is a flow leading down from each test level back to the implementation phase. This represents the fact that failed tests are followed by the implementation of a fix, and a re-execution of all tests. This form of regression testing attempts to detect any new bugs that might have been introduced when the code was modified.

While it does seem like a lot of work, there are tools that aid in managing and automating tests. Test management and build management tools not only make it easier to run tests, they also provide other useful functionality such as e-mail notifications, report generation, and defect tracking. A list of these tools is provided in sections 4.5 and 4.6.

    3.1.3 Maintenance phase

The maintenance phase begins the moment the software is released. As feedback and bug reports are received, updates to the code are planned, and the required changes reflected in the documentation and test cases. The process then flows back into the Implementation phase, followed by the Testing phase, before the new version of the code is released.

    3.1.4 Implementation phase

The implementation phase acts as the link between all other phases, and represents all activities involved in translating ideas and design into a working program. Activities that make up this phase are not limited to the writing of program code, but must also include other supporting components:

Unit Tests:
Any implementation or modification of code should be followed by a relevant unit test. This ensures that any bugs that might be present are detected early, and can be easily traced and fixed.

Change Control:
It is not unusual to require changes to the design during the implementation stage. This might be due to a change in requirements, or an oversight during the design phase. These changes should be reflected in the design documents as well as the associated tests.

Quality Assurance:
All written code should be put through some form of Quality Assurance (QA) process. This might include code conformance checking, static code analysis, build tests, memory leak detection, or even peer review.

The execution of tests and QA tools can be automated using the build management tools listed in section 4.6. Using build tools, a list of predefined actions can be executed whenever changes to the code are detected, and notifications sent out whenever a problem is found.

    3.2 Test Design

This section briefly discusses two classes of strategies commonly used when designing software test cases. These strategies provide a systematic approach towards creating test cases with higher chances of discovering errors in a program, and are oriented around achieving sufficient test coverage: the black-box approach is designed to attain good input-output data coverage, while the white-box approach focuses instead on program logic coverage.

    3.2.1 The Black Box approach

The Black Box method, sometimes referred to as Data-Driven or Functional testing, involves taking an external perspective of the program units and ignoring the internal workings. Test cases are defined by setting up a range of different inputs, and comparing the results of each run against a predefined list of expected output. It is important that the expected outputs are defined beforehand so as to avoid erroneous (but seemingly plausible) results from being interpreted at first glance as being acceptable.

Initial test cases are often derived from the software specification, with each of the specified requirements translated to a set of expected inputs and outputs. While this method is useful in exposing unimplemented parts of the specification, it does not yet provide sufficient coverage of input data.


To weed out all errors, test cases would have to include combinations of not just expected and valid inputs but all possible input. For example, in the case of a program reading from file a value representing age, a valid input may be a range of positive integers, but possible inputs would also include 0, 0.12, XYZA123, 0.2e12, 12637213232, empty strings, character strings, binary data, etc.

For most programs, this form of exhaustive input testing is not possible as it would involve an almost infinite number of test cases. Instead, a compromise is made by choosing a subset of test cases that has the highest probability of detecting the most errors.

The following are several methodologies used to select an effective subset of test cases (examples and detailed discussion of these methodologies are available in [2]); a small Fortran sketch illustrating the first two follows the list.

Equivalence Partitioning:
The input domain is partitioned into a finite number of equivalence classes such that reasonable assumptions can be made that a test of a representative value of each class is equivalent to a test of any other value in its class. Test cases can then be derived by gathering one value from each partition.

Boundary-value analysis:
This method complements equivalence partitioning by concentrating on elements at the borders of each class.

Cause-effect graphing:
Test cases are converted from rules in a decision table. This decision table is generated from a cause-effect graph, which is a logical representation of the functionality that the program is attempting to attain.

Error guessing:
Test cases are written based on a list of error-prone conditions. Generation of the list relies on an understanding of the program implementation as well as the science represented by the program, and thus depends largely on the knowledge, creativity and experience of the developer.
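To make the first two methodologies concrete, the following is a minimal Fortran sketch (ours, not the report's) based on the age-reading example above. It assumes a hypothetical validation routine valid_age that accepts ages in the range 0 to 150; equivalence partitioning then gives three classes (below range, in range, above range), and boundary-value analysis adds tests at the edges of the valid class.

  ! Hypothetical validation routine and test driver illustrating
  ! equivalence partitioning and boundary-value analysis.
  program test_valid_age
    implicit none

    ! One representative value per equivalence class ...
    call check(valid_age(-5),  .false., 'negative age (invalid class)')
    call check(valid_age(42),  .true.,  'typical age (valid class)')
    call check(valid_age(200), .false., 'age above range (invalid class)')

    ! ... plus the boundary values of the valid class.
    call check(valid_age(-1),  .false., 'just below lower boundary')
    call check(valid_age(0),   .true.,  'lower boundary')
    call check(valid_age(150), .true.,  'upper boundary')
    call check(valid_age(151), .false., 'just above upper boundary')

  contains

    logical function valid_age(age)
      integer, intent(in) :: age
      ! Assumed specification: an age is valid if it lies in [0,150].
      valid_age = (age >= 0 .and. age <= 150)
    end function valid_age

    subroutine check(actual, expected, label)
      logical, intent(in) :: actual, expected
      character(len=*), intent(in) :: label
      if (actual .eqv. expected) then
        print *, 'PASS: ', label
      else
        print *, 'FAIL: ', label
      end if
    end subroutine check

  end program test_valid_age

A fuller black-box suite would also feed the program the malformed file inputs listed above (character strings, empty records, binary data), which cannot be expressed as integer arguments and so must be exercised through the program's input-reading path.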

    3.2.2 The White Box approach

The White Box approach, also known as Logic-Driven or Structural testing, uses an internal perspective of the program units, where test cases are designed from an examination of the program logic. In its simplest form, the White Box method can be seen as an iterative design strategy driven by code coverage analysis. (Code coverage is a metric which describes the degree to which a program has been exercised; it can be determined using dynamic analysis tools, some of which are presented in section 4.4.)

    Using the White Box method, a testing solution can be designed along the lines of:

    1. Decide on a reasonable coverage target.

2. Analyse the coverage of the current test cases. Initial test cases would ideally have been derived using the Black Box approach.

3. If total coverage is below the predetermined target, study low-coverage segments of the code.

4. Identify and include test cases that can increase coverage.

5. Re-execute the tests, and repeat steps 2-4 until the target coverage is achieved.

A high coverage level for a test run would indicate that most of the logical paths within the program have been traversed, which leads to a higher chance of exposing errors within the program. Coverage level can therefore be used, to a certain extent, as a measure of the quality of the test cases.

The following is a non-exhaustive list of different measures of coverage, each of which is increasingly more complete but harder to achieve (examples and detailed discussion of these measures are available in [2]); a short Fortran illustration follows the list.

Statement Coverage:
Ensuring that every statement is executed at least once.

Decision Coverage:
Ensuring that every branch direction is traversed, and (for subroutines with multiple entry points) that every entry point is invoked at least once.
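As a toy Fortran illustration (ours, not from the report) of the gap between these two measures, consider the subroutine below. A single test with a negative argument executes every statement, giving 100% statement coverage, yet the "false" direction of its one decision is never taken; decision coverage additionally requires a test with a non-negative argument.

  ! Toy example distinguishing statement coverage from decision coverage.
  program coverage_demo
    implicit none
    real :: a, b

    a = -1.0
    call clamp_nonneg(a)   ! executes every statement: 100% statement coverage
    print *, 'clamp_nonneg(-1.0) =', a

    b = 2.0
    call clamp_nonneg(b)   ! also needed for decision coverage: only now is
    print *, 'clamp_nonneg( 2.0) =', b   ! the "condition false" branch taken

  contains

    subroutine clamp_nonneg(x)
      real, intent(inout) :: x
      ! One decision with two directions: (x < 0.0) true and (x < 0.0) false.
      if (x < 0.0) x = 0.0
    end subroutine clamp_nonneg

  end program coverage_demo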


    4 Available Tools

This chapter presents a survey of software testing tools currently available for software written in Fortran. We have broken down the list into several categories:

    Testing Framework

    Capture and Playback

    Output Validation

    Test Coverage

    Test Management and Automation

    Build Management

The descriptions of each tool were adapted from text available on their respective websites.

    4.1 Testing Framework

Testing frameworks accelerate the testing process by providing developers with tools that assist in the development and deployment of tests.

While each of the frameworks employs a different approach, all of them should provide the following:

    Tools and libraries for writing test suites (a collection of test cases),

    Mechanism for setting up and tearing down a testing runtime environment,

Standardised form of reporting and managing test results.

    4.1.1 pFUnit

Available from http://sourceforge.net/projects/pfunit/
More info at http://opensource.gsfc.nasa.gov/projects/funit/pfunit.php

    License

    NASA Open Source Agreement (NOSA)

    Description

The goals of the pFUnit project are to provide a shared mechanism for supporting unit testing within the HPC community in the hope of encouraging best practices for development and maintenance of software. In particular, pFUnit aims to be sufficiently minimal to encourage rapid adoption while still providing a minimum threshold of functionality. By providing pFUnit as open source, we hope to leverage interest from other groups to enhance portability and usability.

pFUnit is a Fortran analogue to various other xUnit testing frameworks which have been developed within the software community, and is intended to enable test-driven development (TDD) within the scientific/technical programming community.

It was written (almost) entirely in standard-conforming Fortran 95, and was developed using the TDD methodology. pFUnit is bundled with an extensive set of self-tests which are intended to evolve along with the primary package.

pFUnit includes scripts which can conveniently wrap user-written tests into test suites and assemble those suites into an executable. The lack of true object-orientation and reflection within Fortran necessitates this approach. Nonetheless, once added to a developer's build system, adding and running additional tests requires minimal effort. The executable itself is, at least for the moment, command-line driven. If all tests pass, then a simple summary of the number of tests run is returned. If some tests failed, a summary of which tests failed and any associated messages is returned to standard output.

Features that will be of particular interest to the developer of scientific applications include:

Extensive sets of assert routines for floating point, including support for single and double precision, multidimensional arrays, and various means of expressing tolerances.


Ability to launch MPI tests and report results back as a single test - an essential feature for high-end computing associated with weather and climate modelling.

Ability to repeat tests across a complex high-dimensional parameter space. The need for this capability arises when multiple input parameters strongly interact within a subsystem. The ability to balance performance concerns against the need to adequately sample the possibilities is very useful. Failing parameter tests report back which combinations of parameters resulted in failures.

    4.1.2 fUnit

    Available from http://funit.rubyforge.org/

    License

    NASA Open Source Agreement (NOSA)

    Description

    FUnit is a unit testing framework for Fortran modules.

Unit tests are written in Fortran fragments that use a small set of testing-specific keywords and functions. FUnit transforms these fragments into valid Fortran code and compiles, links, and runs them against the module under test.

FUnit is opinionated software which values convention over configuration. Specifically, fUnit requires a Fortran 95 compiler, it only supports testing routines contained in modules, it requires tests to be stored alongside the code under test, and it requires that you follow a specific naming rule for test files; a sketch of such a test file follows the requirements list below.

The requirements for using fUnit are:

    A Fortran 90/95/2003 compiler (set via FC environment variable)

    The Ruby language with the RubyGems package manager
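As an illustration, a fUnit test file for a hypothetical circle_class module (providing a circle_area function) might look like the sketch below. This is ours, not the report's; the keyword and assertion names (test_suite, test, assert_real_equal, assert_equal_within) are recalled from the fUnit documentation and should be checked against the version in use.

  ! circle_class.fun -- illustrative sketch only
  test_suite circle_class

  test area_of_unit_circle
    ! pi to within a loose tolerance
    assert_equal_within(3.14159, circle_area(1.0), 1.0e-4)
  end test

  test area_of_zero_radius_circle
    assert_real_equal(0.0, circle_area(0.0))
  end test

  end test_suite

As we understand the tool, running the funit command (installed via RubyGems) then generates the driver code, compiles it with the compiler named in FC, links it against circle_class, and prints a pass/fail summary.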

    4.1.3 DejaGNU

    Available from http://www.gnu.org/software/dejagnu/

    License

    GNU General Public License (GPL)

    Description

DejaGnu is a framework for testing other programs. Its purpose is to provide a single front end for all tests. Think of it as a custom library of Tcl procedures crafted to support writing a test harness. A test harness is the testing infrastructure that is created to support a specific program or tool. Each program can have multiple testsuites, all supported by a single test harness. DejaGnu is written in Expect, which in turn uses Tcl (Tool Command Language).

DejaGnu offers several advantages for testing:

The flexibility and consistency of the DejaGnu framework make it easy to write tests for any program.

DejaGnu provides a layer of abstraction which allows you to write tests that are portable to any host or target where a program must be tested. For instance, a test for GDB can run (from any Unix-based host) on any target architecture that DejaGnu supports. Currently DejaGnu runs tests on several single-board computers, whose operating software ranges from just a boot monitor to a full-fledged, Unix-like realtime OS.

All tests have the same output format. This makes it easy to integrate testing into other software development processes.

Using Tcl and Expect, it is easy to create wrappers for existing testsuites. By incorporating existing tests under DejaGnu, it is easier to have a single set of report analysis programs.


DejaGnu is written in Expect, which in turn uses Tcl (Tool Command Language). Running tests requires two things: the testing framework and the testsuites themselves. Tests are usually written in Expect using Tcl, but you can also use a Tcl script to run a testsuite that is not based on Expect.

    4.1.4 QMTest

    Available from http://www.codesourcery.com/qmtest/

    License

    GNU General Public License (GPL)

    Description

QMTest is a cost-effective general-purpose testing solution that can be used to implement a robust, easy-to-use testing process. QMTest runs on Windows and on most UNIX-like operating systems including GNU/Linux.

QMTest's extensible architecture allows it to handle a wide range of application domains: everything from compilers to graphical user interfaces to web-based applications. QMTest can easily compare test results to known-good baselines, making analysing test results far simpler. And, because QMTest runs on virtually all operating systems, you can use it with your entire product line.

    4.1.5 Cleanscape Grayboxx

    Vendor

Cleanscape Software International
http://www.cleanscape.net/products/grayboxx/index.html

    Description

A complete software life-cycle testing toolset developed for software written in C, Fortran, Ada, and Assembly. Grayboxx provides a complete software testing solution that verifies functional and structural performance requirements for mission-critical applications. Grayboxx automatically conducts the following test methodologies: Blackbox Testing, Whitebox Testing, Regression Testing, Assertion Testing, and Mutation Testing.

    Grayboxx speeds the development process by allowing developers and test engineers to automatically:

    Generate test cases

    Conduct coverage analysis with complexity metrics

    Conduct unit performance testing with no probe insertions

    Generate test stubs

    Generate test harnesses

    Execute tests

    Prepare modules

    Verify results

Grayboxx also allows for both full and partial regression testing, allowing the tester to run the same test more than once or to name the test titles to run with a subset of test cases.


4.3.3 Toldiff

Description

Toldiff is a diff tool that allows tolerable (insignificant) differences between two files to be suppressed, showing only the important ones. The tolerable differences are recorded by running the tool with an appropriate command-line flag.

4.3.4 numdiff

Available from: http://www.nongnu.org/numdiff/

    License

    GNU General Public License (GPL)

    Description

Numdiff is a little program that can be used to compare putatively similar files line by line and field by field, ignoring small numeric differences and/or different numeric formats. Equivalently, Numdiff is a program with the capability to appropriately compare files containing numerical fields (and not only). By default, Numdiff assumes the fields are separated by white space (blanks, horizontal tabulations and newlines), but the user can also specify their own list of separators.

When you compare a couple of such files, what you usually want to obtain is a list of the numerical fields in the second file which numerically differ from the corresponding fields in the first file. Well-known tools like diff, cmp or wdiff cannot be used for this purpose: they cannot recognise whether a difference between two numerical fields is only due to the notation or is actually a difference of numerical values. Moreover, you could also want to ignore differences in numerical values as long as they do not exceed a certain threshold. In other words, you could desire to neglect all small numerical differences too.
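To illustrate (this example is ours, not the report's): given a reference file and a regenerated output file whose values are identical but formatted differently, diff flags a change while numdiff, comparing field by field numerically, does not. Options for absolute and relative tolerances can then be used to accept small genuine differences; the exact option names should be checked with numdiff --help.

  $ cat expected.dat
  1.0   2.5e-3
  $ cat actual.dat
  1.000 0.0025
  $ diff expected.dat actual.dat     # reports a difference: the notation differs
  $ numdiff expected.dat actual.dat  # reports the files as numerically equal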

    4.4 Test Coverage

Code coverage of test suites can be determined using dynamic analysis tools, some of which are listed in this section. This information is useful when determining the thoroughness of the test cases, and is often used when designing test cases using the White Box method (see section 3.2.2).

Using code coverage tools, developers can determine:

Cold spots: Parts of the code that are never used, or just not used by the test cases.

Hot spots: Parts of the code that are used frequently.

New test cases: To exercise a part of the code not already tested.

This kind of testing is really a test of the completeness of the test cases, i.e. whether they exercise all parts of the code, but it also gives indirect testing of the code itself. If all cases have the same cold spot(s) then maybe that code can be removed, or if there is a common hot spot then this is an area to study in detail to find ways of making it more efficient.

4.4.1 gcov

Available from: http://sourceforge.net/project/showfiles.php?group_id=3382

    License

    GNU General Public License (GPL)

    Description

gcov is a test coverage program. Use it in concert with GNU CC to analyse your programs to help create more efficient, faster-running code. You can use gcov as a profiling tool to help discover where your optimisation efforts will best affect your code. You can also use gcov along with the other profiling tool, gprof, to assess which parts of your code use the greatest amount of computing time.

Profiling tools help you analyse your code's performance. Using a profiler such as gcov or gprof, you can find out some basic performance statistics, such as:

    how often each line of code executes


    what lines of code are actually executed

    how much computing time each section of code uses

Once you know these things about how your code works when compiled, you can look at each module to see which modules should be optimised. gcov helps you determine where to work on optimisation.

Software developers also use coverage testing in concert with testsuites, to make sure software is actually good enough for a release. Testsuites can verify that a program works as expected; a coverage program tests to see how much of the program is exercised by the testsuite. Developers can then determine what kinds of test cases need to be added to the testsuites to create both better testing and a better final product.

Output from gcov can be visualised using tools such as ggcov (http://ggcov.sourceforge.net/) and lcov (http://ltp.sourceforge.net/coverage/lcov.php).
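As an illustration of the typical workflow with a GCC-based Fortran compiler such as gfortran (the file and program names below are hypothetical, not from the report):

  # Build with GCC's coverage instrumentation.
  gfortran -fprofile-arcs -ftest-coverage -o solver solver.f90

  # Run the test cases; execution counts accumulate in coverage data
  # files written alongside the object files.
  ./solver < tests/case1.in
  ./solver < tests/case2.in

  # Produce an annotated listing, solver.f90.gcov, giving the execution
  # count of each line; unexecuted lines are marked with '#####'.
  gcov solver.f90

The annotated listing is what feeds the White Box loop of section 3.2.2: low-count or unexecuted lines point directly at the test cases still missing.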

    4.4.2 Polyhedron plusFort - CVRANAL

    Vendor

Polyhedron Software
http://www.polyhedron.co.uk/pf/pfqa.html#coverage

    Description

The plusFort package includes CVRANAL, a coverage analysis facility that places probes into Fortran source code which allow users to monitor the effectiveness of testing. At the end of each run, the probes update the coverage statistics for each source file. This data may be analysed at any time using the CVRANAL tool. CVRANAL identifies untested code blocks and execution hot-spots.

In addition, CVRANAL can annotate your source code; the annotations are comments and do not affect the validity of the source code.

    4.4.3 FCAT

    Available from http://www.dl.ac.uk/TCSC/UKHEC/FCAT/

    Description

FCAT is similar to CVRANAL in that it reports the execution count for each line of executable source code.

FCAT (FORTRAN Coverage Analysis Tool) is used for the coverage analysis of FORTRAN codes:

Finding cold-spots in Fortran codes (the parts of the code that are never executed), and flagging these parts line by line.

Finding hot-spots in Fortran codes (the parts of the code that are most frequently executed), and giving a line-by-line profile.

It is designed to work mainly with F90/F95, even though it also works with fixed-format FORTRAN, thus F77. FCAT offers some facility for the coverage analysis of parallel codes. It treats a line as being executed if at least one processor has executed it; the counter for the line is taken as the maximum of the number of times this line has been executed over all processors.

    4.4.4 Cleanscape Grayboxx

    Vendor

Cleanscape Software International
http://www.cleanscape.net/products/grayboxx/index.html

Description

A complete software life-cycle testing toolset that includes coverage analysis with complexity metrics. It can perform the following coverage functions:


    Measure test effectiveness and reliability of testing by analysing application source code

Set up test cases and measure their efficiency

    Consolidate results of test coverage measurements for several scenarios or during a test campaign

    Enable effective visualisation of covered and uncovered source code

    4.4.5 TestWorks/TCAT

    Vendor

Software Research, Inc.
http://www.soft.com/Products/stwindex.html

    Description

TCAT and S-TCAT, branch-level unit-test and system-test coverage analysis tools, provide branch and call-pair coverage for F77 and Ada programs. TCAT and S-TCAT measure the number of times each segment or function-call pair is exercised.

C1 expresses test effectiveness as the percentage of every segment exercised in a program by a test suite, relative to the number of such segments existing in the system. S1 expresses test effectiveness as the percentage of every function-call exercised in a program by a test suite, relative to the number of such function-calls existing in the system.

TCAT and S-TCAT instrument the application by placing markers at each segment or function-call.

When test cases have been run against the instrumented code, TCAT/S-TCAT collects and stores test coverage data in a tracefile. TCAT/S-TCAT then extracts this information to create coverage reports indicating which calls remain untested or frequently tested, and which test cases duplicate coverage. TCAT/S-TCAT also creates archive files containing cumulative test information.

The instrumentation process also generates call-trees that identify a program's modules and represent the caller-callee hierarchical structure (as well as subtrees of caller-callee dependencies) within a program.

Using optional user annotation and/or supplied colour annotation, the call-tree shows each function's level of interface exercise. When a function's call-coverage values are low, the user can navigate directly to the corresponding source code. The call-trees can also generate directed graphs depicting the control-flow structure for individual modules.

    4.4.6 LDRA Testbed

    Vendor

LDRA
http://www.ldra.co.uk/testbed.asp

    Description

LDRA Testbed's Dynamic Analysis tool provides coverage analysis at the following levels:

    Statement Coverage

    Branch/Decision Coverage

    LCSAJ Coverage

    MC/DC Coverage

    Dynamic Data Flow Coverage

    4.4.7 McCabe IQ

    Vendor

McCabe Software
http://www.mccabe.com/iq_test.htm


    Description

McCabe IQ provides comprehensive test/code coverage to focus, monitor, and document software testing processes. Using industry-standard testing methods and advanced dynamic analysis techniques, McCabe IQ accurately assesses the thoroughness of your testing and aids in gauging the time and resources needed to ensure a well-tested application.

McCabe IQ provides multiple levels of test coverage at the unit, integration, and regression test phases, including module, lines-of-code, branch, path, Boolean (MC/DC for DO-178B test verification), data, class (OO), and architectural coverages.

    4.5 Test Management and Automation

As the number of test cases for each project grows, it becomes increasingly important for these tests to be organised and automated as much as possible. Most test management and automation tools would provide the following functionalities:

    Organisation of information such as software requirements, test plans, and test cases.

    Test results tracking.

Automated execution of tests. Test runs can be periodic, or triggered by events such as changes to the source tree.

    Reports and statistics generation.

    4.5.1 RTH

    Available from: http://www.rth-is-quality.com

    License

    GNU General Public License (GPL)

Description

rth is a web-based tool designed to manage requirements, tests, test results, and defects throughout the application life cycle. The tool provides a structured approach to software testing and increases the visibility of the testing process by creating a common repository for all test assets including requirements, test cases, test plans, and test results. Regardless of their geographic location, rth allows testers, developers, business analysts, and managers to monitor and gauge application readiness. The tool includes modules for requirements management, test planning, test execution, defect tracking, and reporting.

Benefits of RTH include:

    Working in remote locations is no longer a problem. View the status of your project on the web

    View progress of requirements, test execution, and bug status in real-time

All documents (requirements, tests, test plans, supporting documents) are stored under version control

Store record- or file-based requirements based on your reporting needs

Test Tool agnostic! Take advantage of test automation with three simple functions that allow you to write automated test results to rth

    Post and report on both manual and automated test results

    4.5.2 TestLink

    Available from: http://testlink.org

    License

    GNU General Public License (GPL)


    Description

TestLink is an open source, web-based test management and test execution system which allows quality assurance teams to create and manage their test cases as well as organise them into test plans. These test plans allow team members to execute test cases and track test results dynamically, generate reports, trace software requirements, prioritise and assign.

The tool is based on PHP and MySQL, and includes several other open source tools. It also supports bug tracking systems such as Bugzilla or Mantis. In short, TestLink allows users to:

    Collect and organise test cases dynamically

    Track results and metrics associated with test execution

Track specific information about individual tests

    Capture and report details to assist in conducting a more thorough testing process

Customise TestLink to fit requirements and processes

4.5.3 QaTraq

Available from: http://www.testmanagement.com

    License

GNU General Public License (GPL), with options for commercial upgrades. Professional upgrades include additional modules for extended graphical reporting capabilities and extensible scripting functionalities.

    Vendor

    Traq Software Ltd.

    Description

QaTraq Test Case Management Tool allows you to consolidate the manual and functional software testing process. With one functional software testing management tool we give you the control to automate your own techniques and strategies to track your testing, from the planning stages right through to the test completion reporting. From communicating your test plans to managing the functional coverage of your software testing, QaTraq can help you gain control of the whole manual and functional software testing process without changing your own strategies or techniques.

Amongst other things, QaTraq can provide you with:

    Improved co-ordination between testers, team leaders and managers

    A repository of your entire manual testing progress

    A knowledge base of technical testing to share amongst a test team

    A formal channel for developers and testers to suggest tests

    Accurate tracking of your functional software testing

    Instant reports based on test cases created and executed

    Statistics listing the testing which is most effective

Control of your manual and functional software testing.

    4.5.4 AutoTest

    Available from http://eiffelzone.com/esd/tstudio/


    License

Eiffel Forum License, version 2
http://www.opensource.org/licenses/ver2_eiffel.php

    Description

AutoTest (formerly TestStudio) is a fully automatic testing tool based on Design by Contract. Contracts are a valuable source of information regarding the intended semantics of the software. The information that contracts (preconditions, postconditions, class invariants, loop variants and invariants, and check instructions) provide can be used to check whether the software fulfils its intended purpose. By checking that the software respects its contracts, we can ascertain its validity. Therefore, contracts provide the basis for automation of the testing process.

    AutoTest allows the user to generate, compile and run tests at the push of a button.

    4.5.5 STAF

    Available from: http://staf.sourceforge.net

License

Common Public License (CPL) V1.0
http://www.opensource.org/licenses/cpl1.0.php

    Description

The Software Testing Automation Framework (STAF) is an open source, multi-platform, multi-language framework designed around the idea of reusable components, called services (such as process invocation, resource management, logging, and monitoring). STAF removes the tedium of building an automation infrastructure, thus enabling you to focus on building your automation solution. The STAF framework provides the foundation upon which to build higher-level solutions, and provides a pluggable approach supported across a large variety of platforms and languages.

STAF can be leveraged to help solve common industry problems, such as more frequent product cycles, less preparation time, reduced testing time, more platform choices, more programming language choices, and increased National Language requirements. STAF can help in these areas since it is a proven and mature technology, promotes automation and reuse, has broad platform and language support, and provides a common infrastructure across teams.

STAX is an execution engine which can help you thoroughly automate the distribution, execution, and results analysis of your testcases. STAX builds on top of three existing technologies, STAF, XML, and Python, to place great automation power in the hands of testers. STAX also provides a powerful GUI monitoring application which allows you to interact with and monitor the progress of your jobs. Some of the main features of STAX are: support for parallel execution, user-defined granularity of execution control, support for nested testcases, the ability to control the length of execution time, the ability to import modules at run-time, support for existing Python and Java modules and packages, and the ability to extend both the STAX language as well as the GUI monitoring application. Using these capabilities, you can build sophisticated scripts to automate your entire test environment, while ensuring maximum efficiency and control.

Other STAF services are also provided to help you to create an end-to-end automation solution. By using these services in your test cases and automated solutions, you can develop more robust, dynamic test cases and test environments.

    4.6 Build Management

Build management systems automate the update-compile-test cycle of a software project. Changes to the source code would trigger a rebuild of the application and cause the tests to be executed. This allows developers to get immediate feedback if an error occurs, and ensures that problems are detected as early as possible.

Other tasks, such as software quality assurance analysis, can be included within the list of actions to be performed. This ensures that the updated source code is not only correct, but is also of good quality and adheres to standards defined for the project.


    4.6.1 BuildBot

    Available from: http://buildbot.sourceforge.net

    License

    GNU General Public License (GPL)

    Description

The BuildBot is a system to automate the compile/test cycle required by most software projects to validate code changes. By automatically rebuilding and testing the tree each time something has changed, build problems are pinpointed quickly, before other developers are inconvenienced by the failure. The guilty developer can be identified and harassed without human intervention. By running the builds on a variety of platforms, developers who do not have the facilities to test their changes everywhere before checkin will at least know shortly afterwards whether they have broken the build or not. Warning counts, lint checks, image size, compile time, and other build parameters can be tracked over time, are more visible, and are therefore easier to improve.

The overall goal is to reduce tree breakage and provide a platform to run tests or code-quality checks that are too annoying or pedantic for any human to waste their time with. Developers get immediate (and potentially public) feedback about their changes, encouraging them to be more careful about testing before checkin.

    4.6.2 test-AutoBuild

    Available from: http://www.autobuild.org/

    License

    GNU General Public License (GPL)

    Description

Test-AutoBuild is a framework for performing continuous, unattended, automated software builds. The idea of Test-AutoBuild is to automate the building of a project's complete software stack on a pristine system, from the high-level applications, through the libraries, and right down to the smallest part of the toolchain.

    4.6.3 Parabuild

    Vendor

Viewtier Systems
http://www.viewtier.com/products/parabuild/

    Description

Parabuild is a software build management server that helps software teams and organisations reduce risks of project failures by providing practically unbreakable daily builds and continuous integration builds.

Parabuild features an effortless installation process and easy overall use, multi-platform remote builds, a fast Web user interface, and a wide set of supported version control and issue tracking systems.

    4.6.4 CruiseControl

    Available from: http://cruisecontrol.sourceforge.net

    License

    BSD-style License

    http://www.opensource.org/licenses/bsd-license.php


    Description

CruiseControl is composed of two main modules:

the build loop: the core of the system, it triggers build cycles and then notifies various listeners (users) using various publishing techniques. The trigger can be internal (scheduled or upon changes in an SCM) or external. It is configured in an XML file which maps the build cycles to certain tasks, thanks to a system of plugins. Depending on configuration, it may produce build artefacts.

the reporting: allows the users to browse the results of the builds and access the artefacts.

    4.6.5 BuildForge

    Vendor

Recently acquired by IBM and incorporated into IBM's Rational Software suite.
http://www.buildforge.com

    Description

IBM Rational Build Forge provides complete build and release process management through an open framework that helps development teams standardise and automate tasks and share information. Our products can help clients accelerate software delivery, improve software quality, as well as meet audit and compliance mandates.

    4.6.6 AEGIS

    Available from: http://aegis.sourceforge.net/

    License

    GNU General Public License (GPL)

Description

Aegis is a transaction-based software configuration management system. It provides a framework within which a team of developers may work on many changes to a program independently, and Aegis co-ordinates integrating these changes back into the master source of the program, with as little disruption as possible.

While Aegis is not a build management system, we included it within this section as it can be used to the same effect of ensuring that code passes tests (build, testing, QA) before being merged into the main source tree.

    5 Further reading

1. B. Kleb & B. Wood, Computational Simulations and the Scientific Method, NASA Langley Research Center, (2005).

    2. A.H. Watson & T.J. McCabe, Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric , National Institute of Standards and Technology, (1996).

3. G. Dodig-Crnkovic, Scientific Methods in Computer Science, Department of Computer Science, Mälardalen University, (2002).

4. E. Dustin, Effective Software Testing: 50 Specific Ways to Improve Your Testing, Addison-Wesley Professional, (2002).

    5. B. Beizer, Software Testing Techniques , Van Nostrand Reinhold Co., (1990).

    6. M. Fewster and D. Graham, Software Test Automation: Effective use of test execution tools ,Addison-Wesley, (1999).

7. S.M. Baxter, S.W. Day, J.S. Fetrow & S.J. Reisinger, Scientific Software Development Is Not an Oxymoron, PLoS Comput Biol 2(9): e87, (2006).


8. D. Libes, How to Avoid Learning Expect or Automating Automating Interactive Programs, Proceedings of the Tenth USENIX System Administration Conference (LISA X), (1996).

    9. S. Cornett, Code Coverage Analysis , Bullseye Testing Technology. http://www.bullseye.com/coverage.html

10. W.R. Bush, J.D. Pincus & D.J. Sielaff, A Static Analyzer for Finding Dynamic Programming Errors, Intrinsa Corporation, (2000).

    References

[1] D.J. Worth & C. Greenough, A Survey of Software Tools for Computational Science, Technical Report RAL-TR-2006-011, CCLRC Rutherford Appleton Laboratory, (2006).

[2] G.J. Myers, The Art of Software Testing, 2nd Edition, John Wiley & Sons Inc., (2004).
