An Empirical Study of Test Case Filtering Techniques Based on Exercising Information Flows


TRANSCRIPT

IEEE Transactions on Software Engineering

Wes Masri, Andy Podgurski, David Leon

Presented by Jason R. Beck and Enrique G. Ortiz

Outline

◦ Introduction and background definitions

◦ Paper Objectives

◦ Filtering Techniques

◦ Profile Types and Tools

◦ Empirical Study Description

◦ Subject Programs

◦ Results

◦ Conclusion

◦ Pros and Cons / Suggestions

Information Flow

◦ An important concept in software testing research.

◦ Describes complex interactions between different program elements.

Software Failures

◦ Often caused by untested information flows.

◦ Why? Information flows can be complex, and there are too many to make testing them all feasible.

Test Case Filtering

◦ Involves selecting a manageable number of test cases to use.

Software Profiles

◦ Software profiles are recorded interactions during program operation.

◦ Can describe control flow, data flow, input or variable values, object states, event sequences, and timing.

◦ Profiles can be analyzed for how likely they are to generate errors, and those cases can be tested further.

1. Reduce the number of test cases to be executed.

2. Reduce the number of test executions that need manual interpretation of correct output.

◦ Anything that requires human interpretation of results as part of the test involves much effort.

◦ This effort can be eliminated if test cases are automated and self-validating.

Presents the results of an empirical study using many test case filtering techniques.

Evaluates techniques for their ability to reveal defects in programs.

Information profiles were created using a tool developed by the authors.

These are generally graph-theoretic models showing information flow in the software.

Many techniques have been proposed; the authors focus on:

1. Information flow between objects (data driven)

2. Dynamic program slicing, which is program-statement driven (think of a stack trace when debugging)

Both have static and runtime versions.

Two techniques are compared, each driven by execution profiles that indicate the execution frequency of program elements:

1. Coverage-Based Techniques

2. Distribution-Based Techniques

“Select test cases to maximize the proportion of program elements of a given type”

◦ Attempts to cover as many elements of the program as possible with the fewest number of test cases.

◦ An instance of the set-cover problem.

Algorithm

◦ Each iteration selects the test case that covers the largest number of program elements not covered by the previously selected tests (see the sketch below).
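As a concrete illustration, here is a minimal sketch of that greedy loop. It is not the authors' implementation; for simplicity, each test's profile is represented as a set of covered-element IDs.

```java
import java.util.*;

// Greedy coverage maximization: repeatedly pick the test that adds the
// most not-yet-covered program elements, until no test adds coverage.
class GreedyCoverage {
    static List<Integer> select(List<Set<String>> profiles) {
        Set<String> covered = new HashSet<>();
        List<Integer> selected = new ArrayList<>();
        while (true) {
            int best = -1, bestGain = 0;
            for (int i = 0; i < profiles.size(); i++) {
                if (selected.contains(i)) continue;
                Set<String> gain = new HashSet<>(profiles.get(i));
                gain.removeAll(covered);           // elements this test would add
                if (gain.size() > bestGain) { bestGain = gain.size(); best = i; }
            }
            if (best < 0) break;                   // no remaining test adds coverage
            selected.add(best);
            covered.addAll(profiles.get(best));
        }
        return selected;
    }
}
```

Note that the loop stops as soon as no candidate adds new elements, which is what keeps the selected suite small.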

Clustering technique

◦ Test cases are clustered, and a test case from each cluster can be selected to represent the group.

◦ Clusters are formed by treating execution profiles as patterns with n dimensions.

◦ Each dimension represents the execution count of a basic block of code.
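A minimal sketch of this profile-as-vector view, assuming a profile is a simple map from basic-block IDs to execution counts (names are illustrative, not from the paper's tooling):

```java
import java.util.*;

// Turns a basic-block count profile into an n-dimensional vector:
// one dimension per known basic block, holding its execution count.
class ProfileVector {
    static double[] toVector(Map<String, Integer> blockCounts, List<String> allBlocks) {
        double[] v = new double[allBlocks.size()];
        for (int i = 0; i < allBlocks.size(); i++)
            v[i] = blockCounts.getOrDefault(allBlocks.get(i), 0);
        return v;
    }
}
```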

Also uses failure-pursuit sampling

◦ Audits test cases near failures using a k-nearest-neighbor approach.

◦ This allows cases similar to the failing ones to be checked.
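A sketch of the neighbor lookup that failure pursuit relies on, using plain Euclidean distance between profile vectors for illustration (the study's actual dissimilarity metric differs; all names here are hypothetical):

```java
import java.util.*;

// Finds the k tests whose profiles are closest to a failing test's
// profile; those neighbors are then also selected for checking.
class FailurePursuit {
    static List<Integer> nearest(double[][] profiles, int failing, int k) {
        Integer[] idx = new Integer[profiles.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(
                i -> dist(profiles[i], profiles[failing])));
        List<Integer> out = new ArrayList<>();
        for (int i : idx)
            if (i != failing && out.size() < k) out.add(i);
        return out;
    }

    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}
```

The study uses k = 5, as noted later in the empirical-study description.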

Profiles characterize test executions by keeping track of execution frequencies of program elements.

The study takes into account eight types of profiles.

◦ Generated using the Byte Code Engineering Library (BCEL) to examine the bytecode of Java programs.

◦ It also uses an existing tool the authors created for dynamic information flow analysis.

Method Calls (MC)

◦ A count of how many times a method M was called.

Method Call Pairs (MCP)

◦ A count of how many times a method M1 called a method M2.

Basic Blocks (BB)

◦ A count of how many times a given basic block of code was executed.

Basic Block Edges (BBE)

◦ A count of how many times a basic block B1 branches to a basic block B2.

Def-Use Pairs (DUP)

◦ A count of how many times a variable is defined and then later used.

All of the above combined (ALL)

◦ A combination of all the above profile types.
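As an illustration of how these count-based profiles could be recorded, here is a hedged sketch; this is not the authors' BCEL instrumentation, and the hook methods are hypothetical.

```java
import java.util.*;

// Each profile is a map from a program-element key to an execution
// count; instrumentation hooks bump the counts as the program runs.
class Profiles {
    final Map<String, Integer> mc  = new HashMap<>(); // method M -> call count
    final Map<String, Integer> mcp = new HashMap<>(); // "M1->M2" -> call count
    final Map<String, Integer> bbe = new HashMap<>(); // "B1->B2" -> branch count

    void onCall(String caller, String callee) {       // hypothetical call hook
        mc.merge(callee, 1, Integer::sum);
        mcp.merge(caller + "->" + callee, 1, Integer::sum);
    }

    void onBranch(String from, String to) {           // hypothetical branch hook
        bbe.merge(from + "->" + to, 1, Integer::sum);
    }
}
```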

More complex profile types

Information Flow Pairs (IFP)

◦ A count of how many times a variable x flowed into a variable y.

Slice Pairs (SliceP)

◦ For each statement pair s1 and s2: whether s1 occurs before s2 in at least one dynamic slice.

◦ Basic Coverage Maximization

◦ Cluster Filtering (one-per-cluster sampling)

◦ Failure-Pursuit Sampling

◦ Simple Random Sampling

Empirical Study

Ties

◦ “different tests that each covers the maximal number of program elements not covered by previously selected tests”

Ran 1,000 times per program/profile type

Randomly selected the order of the tests (to break ties)

Recorded:

◦ Number of tests selected

◦ How many failures and defects were detected
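A sketch of this replication scheme, reusing the GreedyCoverage sketch shown earlier; shuffling the test order before each run is one simple way to randomize tie-breaking (recordRun is a hypothetical recording hook).

```java
import java.util.*;

// Repeats the greedy selection many times (1,000 in the study) with a
// shuffled test order so that ties are broken differently each run.
class Replications {
    static void run(List<Set<String>> profiles, int reps, long seed) {
        Random rnd = new Random(seed);
        for (int r = 0; r < reps; r++) {
            List<Integer> order = new ArrayList<>();
            for (int i = 0; i < profiles.size(); i++) order.add(i);
            Collections.shuffle(order, rnd);          // randomizes tie-breaking
            List<Set<String>> shuffled = new ArrayList<>();
            for (int i : order) shuffled.add(profiles.get(i));
            List<Integer> picked = GreedyCoverage.select(shuffled);
            // Map picked indices back through `order`, then record the
            // number of tests selected and the failures/defects revealed:
            // recordRun(picked, order);  // hypothetical hook
        }
    }
}
```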

Basic Coverage Maximization

Proportional Binary Metric and Agglomerative Hierarchical Clustering

Number of clusters varied to correspond to a range of percentages of the size of the test suite

Procedure

1. Tests clustered into c clusters based on their profiles

2. One test randomly selected from each cluster (see the sketch below)

3. Recorded number of failures and defects
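A sketch of step 2, one-per-cluster sampling, assuming the agglomerative hierarchical clustering has already assigned each test a cluster label (inputs are illustrative):

```java
import java.util.*;

// Groups test indices by cluster label, then draws one test uniformly
// at random from each cluster to represent its group.
class OnePerCluster {
    static List<Integer> sample(int[] clusterOf, Random rnd) {
        Map<Integer, List<Integer>> clusters = new HashMap<>();
        for (int t = 0; t < clusterOf.length; t++)
            clusters.computeIfAbsent(clusterOf[t], c -> new ArrayList<>()).add(t);
        List<Integer> picked = new ArrayList<>();
        for (List<Integer> members : clusters.values())
            picked.add(members.get(rnd.nextInt(members.size())));
        return picked;
    }
}
```

Keeping one representative per cluster is what lets the number of clusters c directly control the size of the filtered suite.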

Run 1,000 times

Failure pursuit: check the 5 nearest neighbors

Cluster Filtering and Failure Pursuit Sampling

Randomly select tests without replacement

Record the number of failure-inducing tests and defects

Ran 1,000 times
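For comparison, simple random sampling without replacement takes only a few lines; this sketch shuffles the test indices and keeps the first n:

```java
import java.util.*;

// Simple random sampling without replacement: a uniformly random
// subset of n distinct test indices.
class RandomSample {
    static List<Integer> sample(int numTests, int n, Random rnd) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < numTests; i++) idx.add(i);
        Collections.shuffle(idx, rnd);
        return new ArrayList<>(idx.subList(0, n));
    }
}
```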

Simple Random Sampling

Subject Programs and Test Suites

28,639 lines of code

Jacks Test Suite

◦ 3,140 tests

◦ 233 cause failures

◦ Tests compliance with the Java Language Specification

javac Java Compiler

52,528 lines of code

XML Conformance Test Suite

◦ Used 1,667 of 2,000 tests (it was difficult to determine pass/fail for the dropped tests)

◦ 10 cause failures

◦ Only checks syntax

Xerces XML Parser


1,000 files (tests) from Google Groups

◦ Failed on 47 of the test cases

Tidy HTML Syntax Checker

Defects that caused errors were traced

Results:

◦ Average percentage of defects revealed, over a number of replicated applications, viewed as a function of the number of tests selected

◦ Techniques compared with respect to how often they revealed specific defects
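A sketch of how this averaged detection curve could be computed, assuming a hypothetical input revealedPerRun[r][n] = number of defects revealed in replication r when n tests are selected:

```java
// Averages defect-detection percentages across replications, yielding
// the percentage of defects revealed as a function of tests selected.
class DefectCurve {
    static double[] average(int[][] revealedPerRun, int totalDefects) {
        int runs = revealedPerRun.length, maxN = revealedPerRun[0].length;
        double[] avg = new double[maxN];
        for (int n = 0; n < maxN; n++) {
            double sum = 0;
            for (int r = 0; r < runs; r++)
                sum += 100.0 * revealedPerRun[r][n] / totalDefects;
            avg[n] = sum / runs;   // mean % of defects revealed by n tests
        }
        return avg;
    }
}
```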

Analysis

Basic Coverage Maximization Results

Several defects revealed in 1,000 replications

Some defects were revealed only when SliceP and IFP coverage was maximized

“Maximization with one type of profile revealed defects that were not revealed with another type of profile that seems to be more detailed.”

Results

Simpler profile types (i.e., MC, MCP, BB, BBE, and DUP) revealed more defects than IFP

“Information Flow Pairs are recorded only when a variable is actually defined (assigned a value), but some defects may be triggered without executing such an operation.”

Anomalies

Distribution-Based Filtering Results (result graphs shown across several slides)

◦ Programs too broad

◦ Did not debug programs enough

◦ Wrongly classified defects

◦ Assumes the size of the final set of tests is an accurate measure of cost

Threats to Validity

Time and space costs increase with the level of profile detail

Time for collecting profile information is longer than the time needed for analysis

Observations: Cost and Analysis

Coverage Maximization, One-Per-Cluster Sampling, and Failure-Pursuit Sampling are more effective than Random Sampling when the proportion of failures is high

Coverage maximization based on complex profiles revealed the most defects

Conclusions

One-per-cluster sampling and failure pursuit did not clearly perform better than coverage maximization

No clear performance difference between one-per-cluster and failure pursuit sampling

Conclusions

Empirically evaluate test case filtering techniques

Compared with respect to:

◦ Effectiveness for revealing defects

◦ Simple Random Sampling as a baseline

Complex profiles such as IFP and SliceP are justifiable when a large number of tests is necessary

Conclusions

Pros

◦ Describes a good way to analyze programs.

◦ Uses profiles to help minimize complexity, focusing on only the most meaningful code chunks.

Cons

◦ Programs tested were just compilers and syntax checkers.

◦ Graphs could have better captions explaining what is occurring.

Use only one test suite

◦ Several different program types can be tested with the same suite

◦ Eliminates an additional variable

Select several types of programs

Suggestions
