
How good are your testers? An assessment of testing ability

Liang Huang, Chris Thomson and Mike Holcombe

Department of Computer Science, University of Sheffield

• Background

• The Experiment

• Preliminary results

• Conclusion and Future research

Background

• Test First programming

Figure: How Test Last and Test First work respectively
Test Last (TL): Story → Implementation → Write tests → Run test cases → All pass? If no, Rework; if yes, Next Story.
Test First (TF): Story → Write tests → Implementation → Run test cases → All pass? If no, Rework; if yes, Next Story.

Background

• Previous Studies (TF vs TL)

TF programmers obtained higher productivity:
1) Kaufmann et al. [Kaufmann 2003]
2) Janzen et al. [Janzen 2005]

TF programmers failed to obtain higher productivity:
1) Müller et al. [Müller 2002]
2) Williams et al. [Williams 2003] and Maximilien et al. [Maximilien 2003]
3) George et al. [George 2003, 2004]
4) Macias et al. [Macias 2004]
5) Erdogmus [Erdogmus 2005]

Background

• Previous Studies (TF vs. TL)

TF programmers obtained higher external quality:
1) Williams et al. [Williams 2003] and Maximilien et al. [Maximilien 2003]
2) George et al. [George 2003, 2004]
3) Edwards [Edwards 2003]

TF programmers failed to obtain higher external quality:
1) Müller et al. [Müller 2001]
2) Pancur et al. [Pancur 2003]
3) Macias et al. [Macias 2004]
4) Erdogmus [Erdogmus 2005]

Background

• Our initial study

Results (pertaining to effectiveness):
1) TF teams spent a larger percentage of their time on testing.
2) TF teams obtained higher productivity, although the difference was not statistically significant.
3) The minimum achievable external quality improved as the percentage of time spent on testing increased.
4) There was a linear correlation between the effort spent on testing and the effort spent on coding.

Background

• Motivation
The differences in effectiveness between TF and TL programmers are possibly due to covariates other than the treatments (the testing/programming strategies):

1) TF is not easy to learn [Crispin 2006].
2) Subjects are not skilled at programming with TF.
3) Testing has an impact on code quality and productivity [Basili 1986, Stephens 2003].

It is therefore imperative to analyze the tests written by subjects and to assess their ability to test, so as to distinguish good testers from poor ones.


The Experiment

• Context: the Sheffield Software Engineering Observatory, a semi-industrial setting with
medium-sized projects,
longer development time, and
real external clients.

• Two groups of subjects:
1) 2nd- and 3rd-year computer science undergraduates;
2) 4th-year MEng and MSc students.

The Experiment

• Questionnaire A
Subjects were given
1) a short piece of Java code, and
2) 29 potential tests,
and asked to select tests for
1) category partition testing (22 of the 29 tests were necessary for the partition), and
2) achieving branch coverage (the coverage obtained and the number of redundant choices were calculated for each response).

Testing ability was measured by
1) for category partitioning: (number of correct choices made) − (number of redundant choices);
2) for branch coverage: the branch coverage obtained and the number of redundant choices made.
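The two measures above can be sketched in Python. This is a minimal sketch, not the authors' scoring procedure: it assumes each potential test has an identifier (for example "T1"), that a category-partition choice counts as correct if it is one of the necessary tests and as redundant otherwise, and that for branch coverage a choice is redundant if, taken in the order selected, it covers no branch not already covered by earlier choices. The function names, test identifiers and test-to-branch mapping are hypothetical.

```python
from typing import Iterable, Mapping, Sequence, Set, Tuple


def category_partition_score(selected: Iterable[str], necessary: Set[str]) -> int:
    """(number of correct choices) - (number of redundant choices).

    Assumption: a selected test is 'correct' if it is one of the tests
    necessary for the partition and 'redundant' otherwise.
    """
    chosen = set(selected)
    correct = len(chosen & necessary)
    redundant = len(chosen - necessary)
    return correct - redundant


def branch_coverage(
    selected: Sequence[str],
    branches_hit: Mapping[str, Set[str]],
    all_branches: Set[str],
) -> Tuple[float, int]:
    """Return (fraction of branches covered, number of redundant choices).

    Assumption: tests are taken in the order selected, and a test is
    redundant if it covers no branch that earlier choices have not covered.
    """
    covered: Set[str] = set()
    redundant = 0
    for test in selected:
        new_branches = branches_hit[test] - covered
        if not new_branches:
            redundant += 1
        covered |= new_branches
    return len(covered & all_branches) / len(all_branches), redundant


# Hypothetical example: three necessary tests, four branches.
necessary = {"T1", "T2", "T3"}
print(category_partition_score(["T1", "T2", "T9"], necessary))  # 1

branches_hit = {"T1": {"b1", "b2"}, "T2": {"b2"}, "T3": {"b3"}}
print(branch_coverage(["T1", "T2", "T3"], branches_hit, {"b1", "b2", "b3", "b4"}))  # (0.75, 1)
```

Counting redundancy in selection order keeps the measure simple; comparing against a minimal covering subset would be an alternative, and the slides do not say which definition was used.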

The Experiment

• Procedure

1) Team and group allocation

2) Intensive training in TF

3) Software development, including group meetings, management meetings, and client meetings

4) Questionnaire distribution (before the Easter vacation)


Preliminary results

• Undergraduates achieved lower marks in category partitioning and made more redundant choices when achieving branch coverage; however, these differences were NOT statistically significant.

• Postgraduates did no better than undergraduates in achieving branch coverage.

Preliminary results

• With responses categorized as “Excellent” (70% and above), “Fair” (50%-70%) and “Poor” (50% and below), postgraduates were more likely to be rated Excellent (38% versus 21% for undergraduates) and much less likely to be rated Poor (13% versus 43% for undergraduates).
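The banding can likewise be sketched as follows. This is a hypothetical helper, not the authors' analysis: it assumes each response has already been converted to a percentage score, and because the stated bands share their endpoints it assumes 70% counts as Excellent and 50% as Poor. The fraction of a group's responses in each band corresponds to the per-group percentages reported above.

```python
from collections import Counter
from typing import Dict, Iterable

BANDS = ("Excellent", "Fair", "Poor")


def categorize(percent: float) -> str:
    """Map a percentage score to Excellent / Fair / Poor.

    The stated bands share their endpoints, so this sketch assumes
    70% and above is Excellent and 50% and below is Poor.
    """
    if percent >= 70:
        return "Excellent"
    if percent <= 50:
        return "Poor"
    return "Fair"


def band_proportions(scores: Iterable[float]) -> Dict[str, float]:
    """Fraction of a group's responses falling in each band."""
    counts = Counter(categorize(s) for s in scores)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses to categorize")
    return {band: counts[band] / total for band in BANDS}


# Hypothetical scores, for illustration only.
print(band_proportions([80, 65, 40, 50]))
# {'Excellent': 0.25, 'Fair': 0.25, 'Poor': 0.5}
```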


Conclusion and Future research

• Limitations

1) Student subjects;
2) Small sample size;
3) Low response rate;
4) Only the ability to select tests was assessed, not the ability to write them;
5) Code-based questionnaire only.


• Conclusion
The category partition method requires some analysis of the specification, and TF requires programmers to write tests before the code. Programmers with a higher level of expertise did better at category partitioning but not at achieving branch coverage, which suggests that TF requires a higher level of expertise.


• Future research
Questionnaires that
1) are NOT code based, and/or
2) focus on testing at different levels
are to be distributed to a larger group of subjects with different backgrounds.

Questionnaire B (proposed)
Subjects would be given
1) a short textual specification, and
2) a number of potential tests.

Testing ability would be measured by
1) the number of correct choices made, and
2) the number of redundant choices.

Thanks for listening
