The Impact of Test Case Summaries on Bug Fixing Performance:
An Empirical Investigation
Sebastiano Panichella
Annibale Panichella
Moritz Beller
Andy Zaidman
Harald Gall
Why?
@Test
public void test0() throws Throwable {
    Option option0 = new Option("aaabbb", true, "aaabbb");
    Option option1 = new Option("aaabbb", true, "aaabbb");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}

@Test
public void test1() throws Throwable {
    Option option0 = new Option("aaabbb", true, "aaabbb");
    Option option1 = new Option("aaabbb", true, "aaabbb");
    option0.setLongOpt("adafv");
    option1.setLongOpt("adafv");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}
Class Name: Option.java
Library: Apache Commons CLI
Q1: What are the main differences?
Q2: Do they cover different parts of the code?
Candidate Assertions
Q3: Are these assertions correct?
Test Code Comprehension
Generated Tests
Production Code
public class Options implements Serializable {
    private static final long serialVersionUID = 1L;
    /** a map of the options with the character key */
    private Map shortOpts = new HashMap();
    /** a map of the options with the long key */
    private Map longOpts = new HashMap();
    /** a map of the required options */
    private List requiredOpts = new ArrayList();
    /** a map of the option groups */
Earl T. Barr et al., “The Oracle Problem in Software Testing: A Survey”. IEEE Transactions on Software Engineering, 2015.
Are Generated Tests Helpful?
G. Fraser et al., Does Automated Unit Test Generation Really Help Software Testers? A Controlled Empirical Study,
TOSEM 2015.
Generated tests do not lead to the detection of more faults.
[Figure: "Testing Comprehension" against "Testing time", scale marks 0%, 75%, 100%]
Test Coverage Analysis
COBERTURA
Test Suite Generation
Option.java
TestDescriber
@Test
public void testProva() throws Throwable {
    Option option0 = new Option("aaa", true, "aaa");
    Option option1 = new Option("aaa", true, "aaa");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}

@Test
public void testProva2() throws Throwable {
    Option option0 = new Option("aaa", true, "aaa");
    Option option1 = new Option("aaa", true, "aaa");
    option0.setLongOpt("adafv");
    option1.setLongOpt("adafv");
    boolean boolean0 = option1.equals((Object) option0);
    assertEquals("arg", option1.getArgName());
    assertTrue(option0.hasArg());
    assertTrue(boolean0);
}
Summary Generation
Summary Generator
Software Word Usage Model (SWUM): deriving <actions>, <themes>, and <secondary arguments> from class, method, attribute, and variable identifiers
E. Hill et al., “Automatically capturing source code context of NL-queries for software maintenance and reuse”. ICSE 2009.
SWUM in TestDescriber:
Covered Code
public class Option {
    public Option(String opt, String longOpt, boolean hasArg, String descr)
            throws IllegalArgumentException {
        OptionValidator.validateOption(opt);
        this.opt = opt;
        this.longOpt = longOpt;
        if (hasArg) {
            this.numberOfArgs = 1;
        }
        this.description = descr;
    }
    ...
}
SWUM in TestDescriber:
1) Select the covered statements
Covered Code
public class Option {
    public Option(String opt, String longOpt, boolean hasArg, String descr)
            throws IllegalArgumentException {
        OptionValidator.validateOption(opt);
        this.opt = opt;
        this.longOpt = longOpt;
        if (hasArg) { // FALSE
            this.numberOfArgs = 1;
        }
        this.description = descr;
    }
    ...
}
SWUM in TestDescriber:
1) Select the covered statements
2) Filter out Java keywords, etc.
Covered Code
public class Option {
    public Option(String opt, String longOpt, boolean hasArg, String descr)
            throws IllegalArgumentException {
        OptionValidator.validateOption(opt);
        this opt = opt;
        this longOpt = longOpt;
        if (hasArg) {
            false
        }
        this description = descr;
    }
    ...
}
SWUM in TestDescriber:
1) Select the covered statements
2) Filter out Java keywords, etc.
3) Identifier Splitting (Camel case)
Covered Code
public class Option {
    public Option(String opt, String long Opt, boolean has Arg, String descr)
            throws IllegalArgumentException {
        Option Validator.validate Option(opt);
        this opt = opt;
        this long Opt = long Opt;
        if (has Arg) {
            false
        }
        this description = descr;
    }
    ...
}
SWUM in TestDescriber:
1) Select the covered statements
2) Filter out Java keywords, etc.
3) Identifier Splitting (Camel case)
4) Abbreviation Expansion (using external vocabularies)
Covered Code
public class Option {
    public Option(String option, String long Option, boolean has Argument, String description)
            throws IllegalArgumentException {
        Option Validator.validate Option(option);
        this option = option;
        this long Option = long Option;
        if (has Argument) {
            false
        }
        this description = description;
    }
    ...
}
SWUM in TestDescriber:
1) Select the covered statements
2) Filter out Java keywords, etc.
3) Identifier Splitting (Camel case)
4) Abbreviation Expansion (using external vocabularies)
5) Part-of-Speech tagger
<actions> = Verbs
<themes> = Nouns / subjects
<secondary arguments> = Nouns / objects, adjectives, etc.
Covered Code
public class Option {
    Option(String option, String long Option, boolean has Argument, String description)
            throws IllegalArgumentException
        Option Validator.validate Option(option);
        this option = option;
        this long Option = long Option;
        if (has Argument) { false }
        this description = description;
}
[Figure: each word of the covered code is labelled with its part-of-speech tag (NOUN, VERB, ADJ, CON)]
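Steps 2 to 4 above (keyword filtering, camel-case splitting, abbreviation expansion) can be illustrated with a minimal Java sketch. This is not TestDescriber's implementation: the class name IdentifierPreprocessor, the keyword subset, and the small abbreviation map are hypothetical stand-ins for the full Java keyword list and the external vocabularies mentioned on the slide. Part-of-speech tagging (step 5) is not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IdentifierPreprocessor {
    // Hypothetical keyword subset; a real tool would use the full Java keyword list.
    private static final Set<String> KEYWORDS =
            Set.of("this", "new", "public", "throws", "if");

    // Hypothetical abbreviation vocabulary, standing in for the external vocabularies.
    private static final Map<String, String> ABBREVIATIONS =
            Map.of("opt", "option", "arg", "argument", "descr", "description");

    /** Splits a camel-case identifier at lower-to-upper boundaries, e.g. "longOpt" -> [long, Opt]. */
    public static List<String> splitCamelCase(String identifier) {
        return List.of(identifier.split("(?<=[a-z0-9])(?=[A-Z])"));
    }

    /** Expands a known abbreviation, preserving a leading capital letter. */
    public static String expand(String word) {
        String expanded = ABBREVIATIONS.get(word.toLowerCase());
        if (expanded == null) {
            return word;
        }
        return Character.isUpperCase(word.charAt(0))
                ? Character.toUpperCase(expanded.charAt(0)) + expanded.substring(1)
                : expanded;
    }

    /** Drops keywords, then splits and expands the remaining identifiers. */
    public static List<String> preprocess(List<String> identifiers) {
        List<String> words = new ArrayList<>();
        for (String id : identifiers) {
            if (KEYWORDS.contains(id)) {
                continue;
            }
            for (String part : splitCamelCase(id)) {
                words.add(expand(part));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [long, Option, has, Argument, description]
        System.out.println(preprocess(List.of("this", "longOpt", "hasArg", "descr")));
    }
}
```

The output mirrors the transformed identifiers shown on the slides ("long Option", "has Argument", "description").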
Natural Language Sentences
The test case instantiates an "Option" with:
- option equal to “...”
- long option equal to “...”
- it has no argument
- description equal to “...”
An option validator validates it.
The test exercises the following condition:
- "Option" has no argument
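One way to picture how such sentences could be produced is template filling over the preprocessed names. The sketch below is an assumption for illustration only: the slides do not show TestDescriber's actual templates, and the class SummaryTemplate and its two methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SummaryTemplate {
    /** Renders an instantiation sentence from a class name and its named argument values. */
    public static String describeInstantiation(String className, Map<String, String> arguments) {
        StringBuilder sb = new StringBuilder();
        sb.append("The test case instantiates an \"").append(className).append("\" with:\n");
        for (Map.Entry<String, String> entry : arguments.entrySet()) {
            sb.append("- ").append(entry.getKey())
              .append(" equal to \"").append(entry.getValue()).append("\"\n");
        }
        return sb.toString();
    }

    /** Renders a branch-level sentence for a condition exercised by the test. */
    public static String describeCondition(String condition) {
        return "The test exercises the following condition:\n- " + condition + "\n";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the arguments in declaration order.
        Map<String, String> arguments = new LinkedHashMap<>();
        arguments.put("option", "aaa");
        arguments.put("long option", "adafv");
        System.out.print(describeInstantiation("Option", arguments));
        System.out.print(describeCondition("\"Option\" has no argument"));
    }
}
```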
Summarisation Levels
- Class Level
- Method Level
- Statement Level
- Branch Level
Do Test Summaries Improve Test Readability?
Do Test Summaries Help Developers?
Context
Object: two Java classes from Apache Commons Primitives and Math4J that have been used in previous studies on search-based software testing [Fraser et al., TOSEM 2015]
ArrayIntList.java, Rational.java
Subjects: 30 participants (23 researchers and 7 developers)
Bug Fixing Tasks
[Figure: Group 1 and Group 2 each work on ArrayIntList.java and Rational.java, with or without TestDescriber comments]
Experiment conducted offline via a survey platform
Each participant received the experiment package consisting of:
1. A pre-test questionnaire
2. Instructions and materials to perform the experiment
3. A post-test questionnaire
We did not reveal the goal of the study
45 minutes for each task
RQ1: How do test case summaries impact the number of bugs fixed by developers?
Participants WITHOUT TestDescriber summaries fixed 40% of the injected bugs. None of them was able to fix all bugs.
Participants WITH TestDescriber summaries fixed 60%-80% of the injected bugs. 31% of them fixed all the bugs.
With summaries, the participants were able to fix up to twice as many bugs (+50% to +100%) in the same time window (45 minutes).
The differences are statistically significant (Wilcoxon test, p-value < 0.05). The A12 effect size is always large.
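For readers unfamiliar with the A12 statistic, the sketch below computes the Vargha-Delaney effect size: the probability that a value drawn from the first sample exceeds one drawn from the second, with ties counted as one half. The class name EffectSize and the sample data are hypothetical and are not the study's data; the magnitude thresholds (0.56, 0.64, 0.71) are the commonly used ones and are assumed here, since the slides do not state which thresholds were applied. The Wilcoxon test itself is not shown.

```java
public class EffectSize {
    /** Vargha-Delaney A12 = (#(x > y) + 0.5 * #(x == y)) / (|a| * |b|). */
    public static double a12(double[] a, double[] b) {
        double wins = 0.0;
        for (double x : a) {
            for (double y : b) {
                if (x > y) {
                    wins += 1.0;
                } else if (x == y) {
                    wins += 0.5;
                }
            }
        }
        return wins / ((double) a.length * b.length);
    }

    /** Maps an A12 value to a magnitude label using commonly cited thresholds (assumed). */
    public static String magnitude(double a12) {
        double scaled = Math.max(a12, 1.0 - a12); // direction-independent
        if (scaled >= 0.71) return "large";
        if (scaled >= 0.64) return "medium";
        if (scaled >= 0.56) return "small";
        return "negligible";
    }

    public static void main(String[] args) {
        // Hypothetical numbers of bugs fixed per participant, for illustration only.
        double[] withSummaries = {3, 3, 2, 3};
        double[] withoutSummaries = {1, 2, 1, 2};
        double value = a12(withSummaries, withoutSummaries);
        System.out.println(value + " " + magnitude(value)); // prints 0.9375 large
    }
}
```

A value of 0.5 means neither group tends to fix more bugs; values near 1 mean the first group almost always fixes more.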
Results are not influenced by developers’ experience:
(i) the number of bugs fixed is not significantly influenced by the programming experience;
(ii) there is no significant interaction between the programming experience and the presence of test case summaries.
Summary: Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.
RQ2: How do test case summaries impact developers to change test cases in terms of structural and mutation coverage?
ArrayIntList.java, Rational.java
Only for Rational.java is there an improvement in the mutation score (+10%) when tests are enriched with summaries.
Summary: Test case summaries do not influence how the developers manage the test cases in terms of structural coverage.
Test Case Summaries and Comprehension
[Figure: perceived test comprehensibility WITH and WITHOUT TestDescriber summaries, rated Very Low, Low, Medium, High, Very High]
WITHOUT Summaries:
(i) Only 15% of participants considered the test cases “easy to understand”.
(iii) 40% of participants considered the test cases incomprehensible.
WITH Summaries:
(i) 46% of participants considered the test cases “easy to understand”.
(iii) Only 18% of participants considered the test cases incomprehensible.
Summary: Test summaries significantly improve the comprehensibility of automatically generated test cases according to human judgments.
Quality of TestDescriber's Summaries
Expressiveness: 30%, 70%
- Is easy to read and understand
- Is somewhat readable and understandable
- Is hard to read and understand
Conciseness: 10%, 52%, 38%
- Has no unnecessary information
- Has some unnecessary information
- Has a lot of unnecessary information
Content adequacy: 13%, 37%, 50%
- Is not missing any information
- Missing some information
- Missing some very important information
Conclusion
1) Using automatically generated test case summaries significantly helps developers to identify and fix more bugs.
2) Test case summaries do not influence how the developers manage the test cases in terms of structural coverage.
3) Test summaries significantly improve the comprehensibility of automatically generated test cases according to human judgments.
Panichella et al., “The Impact of Test Case Summaries on Bug Fixing Performance: An Empirical Investigation”. ICSE 2016.