How good are your testers? An assessment of testing ability
Liang Huang, Chris Thomson and Mike Holcombe
Department of Computer Science, University of Sheffield
• Background
• The Experiment
• Preliminary results
• Conclusion and Future research
Background
• Test First programming
[Figure: two flowcharts, "Test Last" and "Test First"]
Test Last: Story → Implementation → Write Tests → Run test cases → All pass? (No: Rework; Yes: Next Story)
Test First: Story → Write Tests → Implementation → Run test cases → All pass? (No: Rework; Yes: Next Story)
Figure caption: How Test Last and Test First work respectively
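The Test First ordering in the figure can be illustrated with a minimal sketch in which the tests for a story exist before the implementation and are rerun after rework. The story (deciding whether a year is a leap year), the function names and the test cases are hypothetical illustrations and are not taken from the study.

```python
# Hypothetical story: "decide whether a year is a leap year".

# Step 1 (Write Tests): the tests exist before any implementation.
def run_tests(is_leap_year):
    """Return the list of years for which the candidate gives a wrong answer."""
    cases = {2000: True, 1900: False, 2024: True, 2023: False}
    return [year for year, expected in cases.items()
            if is_leap_year(year) != expected]

# Step 2 (Implementation): a first attempt, written after the tests.
def first_attempt(year):
    return year % 4 == 0

# Step 3 (Run test cases): not all pass, so the code goes to Rework.
failures_before = run_tests(first_attempt)

# Step 4 (Rework): handle the century rule, then run the tests again.
def reworked(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# An empty list corresponds to "All pass? Yes" and moving to the next story.
failures_after = run_tests(reworked)
```

Under Test Last, the same `run_tests` function would be written only after `first_attempt`, so the missing century rule would be discovered later in the cycle.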
Background
• Previous Studies (TF vs TL)
TF programmers obtained higher productivity
1) Kaufmann et al [Kaufmann 2003]
2) Janzen et al [Janzen 2005]
TF programmers failed to obtain higher productivity
1) Müller et al [Müller 2002]
2) Williams et al [Williams 2003] and Maximilien et al [Maximilien 2003]
3) George et al [George 2003, 2004]
4) Macias et al [Macias 2004]
5) Erdogmus [Erdogmus 2005]
Background
• Previous Studies (TF vs. TL)
TF programmers obtained higher external quality
1) Williams et al [Williams 2003] and Maximilien et al [Maximilien 2003]
2) George et al [George 2003, 2004]
3) Edwards [Edwards 2003]
TF programmers failed to obtain higher external quality
1) Müller et al [Müller 2001]
2) Pancur et al [Pancur 2003]
3) Macias et al [Macias 2004]
4) Erdogmus [Erdogmus 2005]
Background
• Our Initial study
Results (pertaining to effectiveness):
1) TF teams spent a higher percentage of their time on testing
2) TF teams obtained higher productivity, although the difference was not statistically significant
3) The minimum external quality achieved improved as the percentage of time spent on testing increased
4) There was a linear correlation between the effort spent on testing and the effort spent on coding
Background
• Motivation
The differences in effectiveness between TF and TL programmers are possibly due to covariates other than the treatments (testing/programming strategies).
1) TF is not easy to learn [Crispin 2006].
2) Subjects are not skilled at programming with TF.
3) Testing has an impact on code quality and productivity [Basili 1986, Stephens 2003].
It is therefore imperative to analyze the tests written by subjects and to assess the subjects' ability to test, in order to distinguish good testers from bad ones.
The Experiment
• Context: Sheffield Software Engineering Observatory
Semi-industrial setting:
1) Medium-sized projects,
2) Longer development time,
3) Real external clients
2 groups of subjects:
1) 2nd and 3rd year computer science undergraduates,
2) 4th year MEng and MSc students.
The Experiment
• Questionnaire A
Subjects were given
1) A short piece of Java code, and
2) 29 potential tests
and asked to select tests for
1) Category partition testing (22 of the 29 were necessary for the partition), and
2) Giving branch coverage (the coverage and the redundant choices were calculated for each response).
The testing ability was measured by
1) For category partitioning: (the number of correct choices made) - (the number of redundant choices)
2) For branch coverage: the coverage obtained and the number of redundant choices
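The two measures can be sketched as follows. This is an illustration under stated assumptions, not the scoring script used in the study: it assumes that a redundant choice in category partitioning is a selected test that is not among the necessary ones, and that a redundant choice for branch coverage is a selected test that covers no branch beyond those already covered by the tests selected before it. The test identifiers and branch names are hypothetical.

```python
def category_partition_score(selected, necessary):
    """(number of correct choices) - (number of redundant choices).

    Assumption: a choice is correct if it is in `necessary`,
    and redundant otherwise.
    """
    selected, necessary = set(selected), set(necessary)
    correct = len(selected & necessary)
    redundant = len(selected - necessary)
    return correct - redundant


def branch_coverage(selected, branches_hit, all_branches):
    """Return (coverage fraction, number of redundant choices).

    Assumption: a choice is redundant if it covers no branch that the
    previously selected tests have not already covered.
    """
    covered = set()
    redundant = 0
    for test in selected:
        new_branches = set(branches_hit[test]) - covered
        if not new_branches:
            redundant += 1
        covered |= new_branches
    return len(covered) / len(all_branches), redundant


# Hypothetical response: two necessary tests and one unnecessary test.
score = category_partition_score(selected=[1, 2, 5], necessary=[1, 2, 3])

# Hypothetical branch data: t2 covers nothing that t1 has not covered.
hits = {"t1": {"b1", "b2"}, "t2": {"b2"}, "t3": {"b3"}}
coverage, redundant = branch_coverage(
    ["t1", "t2", "t3"], hits, {"b1", "b2", "b3", "b4"}
)
```

With 22 necessary tests out of 29, the category partition score under these assumptions ranges from -7 (only the unnecessary tests selected) to 22 (exactly the necessary tests selected).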
The Experiment
• Procedure
1) Team and group allocation
2) Intensive training in TF
3) Software development, including group meetings, management meetings, and client meetings
4) Questionnaire distribution (before the Easter vacation)
Preliminary results
• Undergraduates achieved lower marks in category partitioning and made more redundant choices when giving branch coverage; however, these differences were NOT statistically significant.
• Postgraduates did no better than undergraduates when giving branch coverage.
Preliminary results
• Postgraduates were more likely to be rated Excellent (38% versus 21% for undergraduates) and much less likely to be rated Poor (13% versus 43% for undergraduates), with responses categorized as “Excellent” (70% and above), “Fair” (50%-70%) and “Poor” (50% and below).
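The three categories can be written as a small classification function. This is a sketch only: the slide does not say how the percentage was computed, and the handling of the boundaries follows the wording "70% and above" and "50% and below", so "Fair" is assumed to mean strictly between 50% and 70%. The function name is hypothetical.

```python
def categorize(score_percent):
    """Map a response's percentage score to the category used on the slide."""
    if score_percent >= 70:   # "70% and above"
        return "Excellent"
    if score_percent > 50:    # assumed: strictly between 50% and 70%
        return "Fair"
    return "Poor"             # "50% and below"
```

For example, `categorize(60)` returns `"Fair"`, while a score of exactly 50 falls into `"Poor"` and a score of exactly 70 into `"Excellent"`.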
Conclusion and Future research
• Limitations
1) Student subjects,
2) Small sample size,
3) Low response rate,
4) The ability to select tests was measured, not the ability to write tests,
5) Code-based questionnaire only
Conclusion and Future research
• Conclusion
The category partition method requires some analysis of the specification, and TF requires programmers to write tests before code.
Programmers with a higher level of expertise did better at category partitioning, but failed to do better at giving branch coverage, which suggests that TF requires a higher level of expertise.
Conclusion and Future research
• Future Research
Questionnaires
1) which are NOT code based, and/or
2) which focus on testing at different levels
are to be distributed to a larger group of subjects with different backgrounds.
Questionnaire B (proposed)
Subjects would be given
1) A short text specification, and
2) A number of potential tests
The testing ability would be measured by
1) The number of correct choices made
2) The number of redundant choices
Thanks for listening