automated unit test generation during software development · software development josé miguel...
TRANSCRIPT
AUTOMATED UNIT TEST GENERATION DURING
SOFTWARE DEVELOPMENT
José Miguel Rojas [email protected]
Joint work with Gordon Fraser and Andrea Arcuri
A Controlled Experiment and Think-aloud Observations
ISSTA 2015
“Testing is a widespread validation approach in industry, but it is still largely ad hoc,
expensive, and unpredictably effective.”“Software Testing Research: Achievements, Challenges, Dreams,”
A. Bertolino. Future of Software Engineering. IEEE . 2007.
“Testing is a widespread validation approach in industry, but it is still largely ad hoc,
expensive, and unpredictably effective.”
“Test case generation has a strong impact on the effectiveness and efficiency of testing.”
“…one of the most active research topics in software testing for several decades, resulting in many different approaches and tools.”
“Software Testing Research: Achievements, Challenges, Dreams,” A. Bertolino. Future of Software Engineering. IEEE . 2007.
”An orchestrated survey of methodologies for automated software test case generation,” S. Anand, E. K. Burke, T. Y. Chen, J. Clark, M.B. Cohen, W. Grieskamp, M.
Harman, M.J. Harrold, P. McMinn. J. Systems and Software. Elsevier. 2013.
BACK IN ISSTA 2013…
“Does automated white-box test generation really help software testers?,” G. Fraser, M. Staats, P. McMinn, A. Arcuri and F. Padberg
BACK IN ISSTA 2013…
“Does automated white-box test generation really help software testers?,” G. Fraser, M. Staats, P. McMinn, A. Arcuri and F. Padberg
ARE UNIT TEST GENERATION TOOLS HELPFUL TO DEVELOPERS WHILE THEY ARE CODING?
CODE COVERAGE
CODE COVERAGE
TIME SPENT ONTESTING
CODE COVERAGE
TIME SPENT ONTESTING
IMPLEMENTATION QUALITY
CONTROLLED EXPERIMENT
CONTROLLED EXPERIMENT
Golden Implementation and Test Suite
CONTROLLED EXPERIMENT
Class TemplateGolden Implementation and Test Suite
CONTROLLED EXPERIMENT
Class Template Implementation and Test Suite
Golden Implementation and Test Suite
CONTROLLED EXPERIMENT
Class Template Implementation and Test Suite
Golden Implementation and Test Suite
CONTROLLED EXPERIMENT
Class Template Implementation and Test Suite
Manual
Golden Implementation and Test Suite
CONTROLLED EXPERIMENT
1 hourClass Template Implementation and Test Suite
Manual
Golden Implementation and Test Suite
CONTROLLED EXPERIMENT
1 hourClass Template Implementation and Test Suite
41
Manual
Golden Implementation and Test Suite
CONTROLLED EXPERIMENT
1 hourClass Template Implementation and Test Suite
41
Manual
Golden Implementation and Test Suite
CONTROLLED EXPERIMENT
1 hourClass Template Implementation and Test Suite
41 2
Manual
Golden Implementation and Test Suite
CONTROLLED EXPERIMENT
1 hourClass Template Implementation and Test Suite
41 42
Manual
Golden Implementation and Test Suite
DOES USING EVOSUITE DURING SOFTWARE DEVELOPMENT LEAD TO TEST SUITES WITH HIGHER CODE COVERAGE?
RQ 1
Bran
ch C
over
age
0%
20%
40%
60%
80%
100%
FilterIterator FixedOrderComparator ListPopulation PredicatedMap
50%
26%
57%
39% 41%
83%
38%
63%
Assisted Manual
CODE COVERAGEparticipants’ test suites run on their own implementations
CODE COVERAGETi
mes
Cov
erag
e w
as c
heck
ed
0
2
4
6
8
10
Category AxisFilterIterator FixedOrderComparator ListPopulation PredicatedMap
6.4
4
9.6
65.3
5.9
1.9
9
Assisted Manual
Times coverage was checked
ListPopulation
Bran
ch C
over
age
(%)
0%
25%
50%
75%
100%
Time (min)0 10 20 30 40 50 60
ManualEvoSuiteAssisted
CODE COVERAGEparticipant’s test suites run on their own implementations
ListPopulation
Bran
ch C
over
age
(%)
0%
25%
50%
75%
100%
Time (min)0 10 20 30 40 50 60
ManualEvoSuiteAssisted
CODE COVERAGEparticipant’s test suites run on their own implementations
ListPopulation
Bran
ch C
over
age
(%)
0%
25%
50%
75%
100%
Time (min)0 10 20 30 40 50 60
ManualEvoSuiteAssisted
CODE COVERAGEparticipant’s test suites run on their own implementations
ListPopulation
Bran
ch C
over
age
(%)
0%
25%
50%
75%
100%
Time (min)0 10 20 30 40 50 60
ManualEvoSuiteAssisted
CODE COVERAGEparticipant’s test suites run on their own implementations
Bran
ch C
over
age
0%
20%
40%
60%
80%
100%
FilterIterator FixedOrderComparator ListPopulation PredicatedMap
50%
28%21%
30%
42%37%35%
41%
Assisted Manual
CODE COVERAGEparticipants’ test suites run on golden implementations
FilterIterator
Bran
ch C
over
age
0%25%50%75%
100%
Time (min)0 10 20 30 40 50 60
Assisted Manual EvoSuite-generated
CODE COVERAGEparticipant’s test suites run on golden implementations, over time
ListPopulation
Bran
ch C
over
age
0%25%50%75%
100%
Time (min)0 10 20 30 40 50 60
FixedOrderComparator
Bran
ch C
over
age
0%25%50%75%
100%
Time (min)0 10 20 30 40 50 60
PredicatedMapBr
anch
Cov
erag
e
0%25%50%75%
100%
Time (min)0 10 20 30 40 50 60
FilterIterator
Bran
ch C
over
age
0%25%50%75%
100%
Time (min)0 10 20 30 40 50 60
Assisted Manual EvoSuite-generated
CODE COVERAGEparticipant’s test suites run on golden implementations, over time
ListPopulation
Bran
ch C
over
age
0%25%50%75%
100%
Time (min)0 10 20 30 40 50 60
FixedOrderComparator
Bran
ch C
over
age
0%25%50%75%
100%
Time (min)0 10 20 30 40 50 60
PredicatedMapBr
anch
Cov
erag
e
0%25%50%75%
100%
Time (min)0 10 20 30 40 50 60
Coverage can be higher when using EvoSuite, depending on how the generated tests are used.
DOES USING EVOSUITE DURING SOFTWARE DEVELOPMENT LEAD TO
DEVELOPERS SPENDING MORE OR LESS TIME ON TESTING?
RQ 2
TESTING EFFORT
0
3
6
8
11
14
FilterIterator FixedOrderComparator ListPopulation PredicatedMap
5.6
11
12.913.7
4.4
8.27.27.8
Assisted Manual
Number of test runs
TESTING EFFORT
0
5
10
16
21
26
FilterIterator FixedOrderComparator ListPopulation PredicatedMap
14.3
25
15.8
20
7.7
12.6
9.3
18.5
Assisted Manual
Minutes spent on testing
TESTING EFFORT
0
5
10
16
21
26
FilterIterator FixedOrderComparator ListPopulation PredicatedMap
14.3
25
15.8
20
7.7
12.6
9.3
18.5
Assisted Manual
Minutes spent on testing
Using EvoSuite reduces the time spent on testing.
DOES USING EVOSUITE DURING SOFTWARE DEVELOPMENT LEAD TO
SOFTWARE WITH FEWER BUGS?
RQ 3
IMPLEMENTATION QUALITYGolden test suites run on participants’ implementations
Num
ber o
f Fail
ures
+Err
ors
0
3
6
10
13
16
FilterIterator FixedOrderComparator ListPopulation PredicatedMap
14.3
5.34.3
6.3
15.6
6.4
4.26.1
Assisted Manual
IMPLEMENTATION QUALITYGolden test suites run on participants’ implementations
Num
ber o
f Fail
ures
+Err
ors
0
3
6
10
13
16
FilterIterator FixedOrderComparator ListPopulation PredicatedMap
14.3
5.34.3
6.3
15.6
6.4
4.26.1
Assisted Manual
Using EvoSuite during development did not lead to to better implementations.
DOES SPENDING MORE TIME WITH EVOSUITE AND ITS TESTS LEAD TO
BETTER IMPLEMENTATIONS?
RQ 4
PRODUCTIVITYTime spent with EvoSuite
Corr
elatio
n w
ith
num
ber o
f fail
ures
plus
err
ors
-0.50
-0.40
-0.30
-0.20
-0.10
0.00
0.10
0.20
0.30
0.40
FilterIterator FixedOrderComparator ListPopulation PredicatedMap
-0.490.02
-0.22-0.29
0.350.32
-0.03
0.35
Number of runsTime spent on tests
PRODUCTIVITYTime spent with EvoSuite
Corr
elatio
n w
ith
num
ber o
f fail
ures
plus
err
ors
-0.50
-0.40
-0.30
-0.20
-0.10
0.00
0.10
0.20
0.30
0.40
FilterIterator FixedOrderComparator ListPopulation PredicatedMap
-0.490.02
-0.22-0.29
0.350.32
-0.03
0.35
Number of runsTime spent on tests
Implementation quality improves the more timedevelopers spend with EvoSuite-generated tests.
Using automated unit test generation does impact developers’ productivity,
Using automated unit test generation does impact developers’ productivity,
but…
…how to make the most outof unit test generation tools?
Using automated unit test generation does impact developers’ productivity,
but…
THINK ALOUD OBSERVATIONS
J. Hughes and S. Parkes, “Trends in the use of verbal protocol analysis in software engineering research,” Behaviour and Information Technology, vol. 22, no. 2, pp. 127–140, 2003.
K. A. Ericsson and H. A. Simon, Protocol Analysis: Verbal Reports as Data (revised edition). MIT Press, 1993.
THINK ALOUD OBSERVATIONS
J. Hughes and S. Parkes, “Trends in the use of verbal protocol analysis in software engineering research,” Behaviour and Information Technology, vol. 22, no. 2, pp. 127–140, 2003.
K. A. Ericsson and H. A. Simon, Protocol Analysis: Verbal Reports as Data (revised edition). MIT Press, 1993.
Subject
THINK ALOUD OBSERVATIONS
J. Hughes and S. Parkes, “Trends in the use of verbal protocol analysis in software engineering research,” Behaviour and Information Technology, vol. 22, no. 2, pp. 127–140, 2003.
K. A. Ericsson and H. A. Simon, Protocol Analysis: Verbal Reports as Data (revised edition). MIT Press, 1993.
Observer
Subject
THINK ALOUD OBSERVATIONS
2 hoursClass Template Implementation and Test Suite
Golden Implementation and Test Suite
5 41
THINK ALOUD OBSERVATIONS
2 hoursClass Template Implementation and Test Suite
Golden Implementation and Test Suite
5 41
THINK ALOUD OBSERVATIONS
2 hoursClass Template Implementation and Test Suite
Golden Implementation and Test Suite
5 41
THINK ALOUD OBSERVATIONS
2 hoursClass Template Implementation and Test Suite
Golden Implementation and Test Suite
5 41
THINK ALOUD OBSERVATIONS
2 hoursClass Template Implementation and Test Suite
Golden Implementation and Test Suite
5 41
THINK ALOUD OBSERVATIONS
2 hoursClass Template Implementation and Test Suite
Golden Implementation and Test Suite
5 41
THINK ALOUD OBSERVATIONS
2 hoursClass Template Implementation and Test Suite
Golden Implementation and Test Suite
5 41
THINK ALOUD OBSERVATIONS
2 hoursClass Template Implementation and Test Suite
Golden Implementation and Test Suite
5 41
THINK ALOUD OBSERVATIONS
2 hoursClass Template Implementation and Test Suite
Golden Implementation and Test Suite
5 41
THINK ALOUD OBSERVATIONS
2 hoursClass Template Implementation and Test Suite
Golden Implementation and Test Suite
5 41
RESULTS
Images: http://jozef89.deviantart.com/
3 6 2 2 5
15 20 23 7 10
94% | 92% 97% | 72% 94% | 95% 100% | 94% 100% | 83%
FixedOrderComparator ListPopulationFilterIterator PredicatedMap PredicatedMap
RESULTS
Images: http://jozef89.deviantart.com/
3 6 2 2 5
15 20 23 7 10
94% 97% 94% | 95% 100% 100%
FixedOrderComparator ListPopulationFilterIterator PredicatedMap PredicatedMap
RESULTS
Images: http://jozef89.deviantart.com/
3 6 2 2 5
15 20 23 7 10
94% 97% | 72% 94% 100% 100%
FixedOrderComparator ListPopulationFilterIterator PredicatedMap PredicatedMap
LESSONS LEARNED
LESSONS LEARNED• There are different approaches to testing and test generation tools
should be adaptable to them
LESSONS LEARNED• There are different approaches to testing and test generation tools
should be adaptable to them
• Developers’ behaviour is often not driven by code coverage
LESSONS LEARNED• There are different approaches to testing and test generation tools
should be adaptable to them
• Developers’ behaviour is often not driven by code coverage
• Readability of generated unit tests is paramount
LESSONS LEARNED• There are different approaches to testing and test generation tools
should be adaptable to them
• Developers’ behaviour is often not driven by code coverage
• Readability of generated unit tests is paramount
• Integration into development environments must be improved
LESSONS LEARNED• There are different approaches to testing and test generation tools
should be adaptable to them
• Developers’ behaviour is often not driven by code coverage
• Readability of generated unit tests is paramount
• Integration into development environments must be improved
• Education/Best practices: Developers do not know how tobest use automated test generation tools!
“Coverage is easy to assess because it isa number, while readability is a very non-
tangible property…
“Coverage is easy to assess because it isa number, while readability is a very non-
tangible property…
… What is readable to me may not bereadable to you. It is readable to me justbecause I spent the last hour and a half
doing this.”
“Coverage is easy to assess because it isa number, while readability is a very non-
tangible property…
… What is readable to me may not bereadable to you. It is readable to me justbecause I spent the last hour and a half
doing this.”—Participant 5
www.evosuite.org/study-2014/