Test vs. inspection Part 2 Tor Stålhane

Post on 02-Jan-2016

Testing and inspection – a short data analysis

Test and inspections – some terms

First we need to understand two important terms – defect types and triggers.

After this we will look at inspection data and test data from three activity types, organized according to type of defect and trigger.

We need the defect categories to compare testing and inspection – which technique is best at finding which kind of defect?

Defect categories

This presentation uses eight defect categories:
• Wrong or missing assignment
• Wrong or missing data validation
• Error in algorithm – no design change is necessary
• Wrong timing or sequencing
• Interface problems
• Functional error – design change is needed
• Build, package or merge problem
• Documentation problem

Triggers

We will use different triggers for test and inspections. In addition – white box and black box tests will use different triggers.

We will get back to triggers and black box / white box testing later in the course.

Inspection triggers

• Design conformance
• Understanding details
  – Operation and semantics
  – Side effects
  – Concurrency
• Backward compatibility – earlier versions of this system
• Lateral compatibility – other, similar systems
• Rare situations
• Document consistency and completeness
• Language dependencies

Test triggers – black box

• Test coverage
• Sequencing – two code chunks in sequence
• Interaction – two code chunks in parallel
• Data variation – variations over a simple test case
• Side effects – unanticipated effects of a simple test case

Test triggers – white box

• Simple path coverage
• Combinational path coverage – same path covered several times but with different inputs
• Side effects – unanticipated effects of a simple path coverage

Testing and inspection – the V model

Inspection data

We will look at inspection data from three development activities:

• High level design: architectural design
• Low level design: design of subsystems, components – modules – and data models
• Implementation: realization, writing code

This is the left hand side of the V-model.

Test data

We will look at test data from three development activities:

• Unit testing: testing a small unit like a method or a class
• Function verification testing: functional testing of a component, a system or a subsystem
• System verification testing: testing the total system, including hardware and users

This is the right hand side of the V-model.

What did we find

The next tables will, for each of the assigned development activities, show the following information:

• Development activity
• The three most efficient triggers

First for inspection and then for testing.

Inspection – defect types

Activity             Defect type      Percentage
High level design    Documentation    45.10
                     Function         24.71
                     Interface        14.12
Low level design     Algorithm        20.72
                     Function         21.17
                     Documentation    20.27
Code inspection      Algorithm        21.62
                     Documentation    17.42
                     Function         15.92

Inspection – triggers

Activity             Trigger                 Percentage
High level design    Understand details      34.51
                     Document consistency    20.78
                     Backward compatible     19.61
Low level design     Side effects            29.73
                     Operation semantics     28.38
                     Backward compatible     12.16
Code inspection      Operation semantics     55.86
                     Document consistency    12.01
                     Design conformance      11.41

Testing – triggers and defects

Activity                  Defect type                Percentage
Implementation testing    Interface                  39.13
                          Assignments                17.79
                          Build / Package / Merge    14.62

Activity                  Trigger                    Percentage
Implementation testing    Test sequencing            41.90
                          Test coverage              33.20
                          Side effects               11.07

Some observations – 1

• Pareto’s rule will apply in most cases – both for defect types and triggers
• Defects related to documentation and functions taken together are the most commonly found defect types in inspection
  – HLD: 69.81%
  – LLD: 41.44%
  – Code: 33.34%
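The combined percentages can be re-derived from the inspection defect-type table. The following is a minimal Python sketch of that arithmetic; the dictionary layout and the function name doc_plus_function are illustrative choices, not part of the original analysis.

```python
# Top-three defect-type shares (percent) per activity, copied from the
# "Inspection - defect types" table in this presentation.
inspection_defects = {
    "High level design": {"Documentation": 45.10, "Function": 24.71, "Interface": 14.12},
    "Low level design": {"Algorithm": 20.72, "Function": 21.17, "Documentation": 20.27},
    "Code inspection": {"Algorithm": 21.62, "Documentation": 17.42, "Function": 15.92},
}


def doc_plus_function(shares):
    """Combined share of documentation and function defects, in percent."""
    return round(shares["Documentation"] + shares["Function"], 2)


combined = {activity: doc_plus_function(shares)
            for activity, shares in inspection_defects.items()}

for activity, share in combined.items():
    print(f"{activity}: {share:.2f}%")
```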

Some observations – 2

• The only defect type that is among the top three both for testing and inspection is “Interface”
  – Inspection – HLD: 14.12%
  – Testing: 39.13%
• The only trigger that is among the top three both for testing and inspection is “Side effects”
  – Inspection – LLD: 29.73%
  – Testing: 11.07%
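Both overlap claims can be checked with set intersections over the top-three entries in the tables. This is a small Python sketch under the assumption that entries are matched by the exact names used in the tables; the variable names are illustrative.

```python
# Top-three entries from the inspection tables (all three activities pooled)
# and from the implementation-testing tables in this presentation.
inspection_top_defects = {"Documentation", "Function", "Interface", "Algorithm"}
testing_top_defects = {"Interface", "Assignments", "Build / Package / Merge"}

inspection_top_triggers = {
    "Understand details", "Document consistency", "Backward compatible",
    "Side effects", "Operation semantics", "Design conformance",
}
testing_top_triggers = {"Test sequencing", "Test coverage", "Side effects"}

# Entries that are in the top three for both inspection and testing.
shared_defects = inspection_top_defects & testing_top_defects
shared_triggers = inspection_top_triggers & testing_top_triggers

print("Shared defect types:", shared_defects)
print("Shared triggers:", shared_triggers)
```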

Summary

Testing and inspection are different activities. By and large, they:

• Need different triggers
• Use different mind sets
• Find different types of defects

Thus, we need both activities in order to get a high quality product

Inspection as a social process

Inspection is a people-intensive process. Thus, we cannot consider only technical details – we also need to consider how people:

• Interact
• Cooperate

Data sources

We will base our discussion on data from two sources:

• UNSW – three experiments with 200 students. Focus was on process gain versus process loss.
• NTNU – two experiments
  – NTNU 1 with 20 students. Group size and the use of checklists.
  – NTNU 2 with 40 students. Detection probabilities for different defect types.

The UNSW data

The programs inspected were:
• 150 lines long with 19 seeded defects
• 350 lines long with 38 seeded defects

1. Each student inspected the code individually and turned in an inspection report.
2. The students were randomly assigned to one out of 40 groups – three persons per group.
3. Each group inspected the code together and turned in a group inspection report.

Gain and loss - 1

In order to discuss process gain and process loss, we need two terms:

• Nominal group (NG) – a group of persons that will later participate in a real group but are currently working alone.

• Real group (RG) – a group of people in direct communication, working together.

Gain and loss - 2

The next diagram shows the distribution of the difference NG – RG. A positive difference means process loss, a negative one process gain. Note that:

• Process loss can be as large as 7 defects
• Process gain can be as large as 5 defects

Thus, there are large opportunities and large dangers.
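The difference NG – RG can be computed from the individual reports and the group report. This is a minimal Python sketch, assuming that the nominal group's result is the union of the defects its members found individually; that assumption, the function names and the defect identifiers are illustrative and not taken from the experiments.

```python
def nominal_group_defects(individual_reports):
    """Defects found by at least one member working alone (assumed NG result)."""
    return set().union(*individual_reports)


def ng_minus_rg(individual_reports, group_report):
    """Difference NG - RG in number of reported defects."""
    return len(nominal_group_defects(individual_reports)) - len(set(group_report))


def classify(difference):
    """Positive difference: process loss. Negative difference: process gain."""
    if difference > 0:
        return "process loss"
    if difference < 0:
        return "process gain"
    return "no change"


# Made-up example: three inspectors, defects identified by number.
individual_reports = [{1, 2, 3}, {2, 4}, {5}]
group_report = {1, 2, 6}

difference = ng_minus_rg(individual_reports, group_report)
print(difference, classify(difference))
```

In this made-up example the nominal group found five defects and the real group reported three, so the group meeting lost two defects.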

Gain and loss - 3

[Figure: distribution of the difference NG – RG for Exp 1, Exp 2 and Exp 3. X-axis: NG – RG, from 7 down to -6. Y-axis: 0 to 12.]

Gain and loss - 4

If we pool the data from all experiments, we find that the probability for:

• Process loss is 53%
• Process gain is 30%

Thus, if we must choose, it is better to drop the group part of the inspection process.

Reporting probability - 1

[Figure: probability that the real group reports a defect, for defects found by NG = 0, NG = 1, NG = 2 and NG > 2 members of the nominal group. Series: RG 1, RG 2, RG 3. Y-axis: 0.00 to 1.00.]

Reporting probability - 2

There is a 10% probability of reporting a defect even if nobody found it during their preparations.

There is an 80% to 95% probability of reporting a defect that was found by everybody in the nominal group during preparations.
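A reporting probability of this kind can be estimated by grouping defects according to how many nominal-group members found them and counting how often the real group reported them. The Python sketch below illustrates the idea. The observations are made up for illustration and are not the UNSW data, and the function names are hypothetical.

```python
from collections import defaultdict


def bucket(n_finders):
    """Map the number of nominal-group members who found a defect to a category."""
    return "NG > 2" if n_finders > 2 else f"NG = {n_finders}"


def reporting_probability(observations):
    """Fraction of defects reported by the real group, per category.

    Each observation is (number of NG members who found the defect,
    whether the real group reported it).
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [reported, total]
    for n_finders, reported in observations:
        entry = counts[bucket(n_finders)]
        entry[0] += int(reported)
        entry[1] += 1
    return {label: rep / total for label, (rep, total) in counts.items()}


# Made-up observations, for illustration only.
observations = [
    (0, False), (0, False), (0, False), (0, True),
    (1, False), (1, True),
    (3, True), (3, True), (4, True), (3, False),
]

probabilities = reporting_probability(observations)
for label, p in sorted(probabilities.items()):
    print(f"{label}: {p:.2f}")
```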

Reporting probability - 3

The table and diagram allow two possible interpretations:

• We have a, possibly silent, voting process. The majority decides what is reported from the group and what is not.

• The defect reporting process is controlled by group pressure. If nobody else has found it, it is hard for a single person to get it included in the final report.

A closer look - 1

The next diagram shows that when we have:

• Process loss, we find few new defects during the meeting but remove many.
• Process gain, we find many new defects during the meeting but remove just a few.
• Process stability, we find and remove roughly the same number during the meeting.
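The three categories can be expressed as set operations on the defects found during individual preparation and the defects in the group report. This is a hedged Python sketch; the function name meeting_outcome and the example data are hypothetical.

```python
def meeting_outcome(individual_reports, group_report):
    """Split defects into new, retained and removed by the group meeting."""
    found_before = set().union(*individual_reports)  # found during preparation
    reported = set(group_report)                     # in the group report
    return {
        "new": reported - found_before,        # first found in the meeting
        "retained": reported & found_before,   # found before and still reported
        "removed": found_before - reported,    # found before but dropped
    }


# Made-up example: three inspectors, defects identified by number.
outcome = meeting_outcome([{1, 2, 3}, {2, 4}, {5}], {1, 2, 6})
for category, defects in outcome.items():
    print(category, sorted(defects))
```

Here one defect is new and three are removed, which is the pattern described for process loss.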

New, retained and removed defects

[Figure: number of new, retained and removed defects for groups with RG > NG, RG = NG and RG < NG. Y-axis: 0 to 50.]

A closer look - 2

It seems that groups can be split according to the following characteristics:

• Process gain
  – All individual contributions are accepted.
  – Find many new defects.
• Process loss
  – Minority contributions are ignored.
  – Find few new defects.

A closer look - 3

A group with process loss suffers a double penalty. It rejects minority opinions and thus loses most defects found by just a few of the participants during:

• Individual preparation.
• The group meeting.

The participants can be good at finding defects – the problem is the group process.

The NTNU-1 data

We had 20 students in the experiment. The program to inspect was 130 lines long. We seeded 13 defects in the program.

1. We used groups of two, three and five students.
2. Half the groups used a tailored checklist.
3. Each group inspected the code and turned in an inspection report.

Group size and check lists - 1

We studied two effects:
• The size of the inspection team. Small groups (2 persons) versus large groups (5 persons)
• The use of checklists or not

In addition we considered the combined effect – the factor interaction.

DoE-table

Group size (A)    Use of checklists (B)    A × B    Number of defects reported
-                 -                        +        7
-                 +                        -        9
+                 -                        -        13
+                 +                        +        11

Group size and check lists - 2

Simple arithmetic gives us the following results:
• Group size effect – small vs. large – is 4.
• Check list effect – use vs. no use – is 0.
• Interaction – large groups with check lists vs. small groups without – is -2.

The standard deviation is 1.7. Only effects larger than two standard deviations (3.4) are significant at the 5% level, which rules out everything but group size.

The NTNU-2 data

We had 40 students in the experiment. The program to inspect was 130 lines long. We seeded 12 defects in the program.

1. We had 20 PhD students and 20 third year software engineering students.

2. Each student inspected the code individually and turned in an inspection report.

Defect types

The 12 seeded defects were of one of the following types:

• Wrong code – e.g. wrong parameter
• Extra code – e.g. unused variable
• Missing code – e.g. no exception handling

There were four defects of each type.

How often is each defect found

[Figure: how often each seeded defect is found, in the order D3, D4, D8, D10, D2, D5, D9, D12, D1, D6, D7, D11. Series: low experience, high experience. Y-axis: 0.00 to 0.90.]

Who finds what – and why

First and foremost we need to clarify what we mean by high and low experience.

• High experience – PhD students.
• Low experience – third and fourth year students in software engineering.

High experience, in our case, turned out to mean less recent hands-on development experience.

Hands-on experience

The plot shows us that:
• People with recent hands-on experience are better at finding missing code.
• People with more engineering education are better at finding extra – unnecessary – code.
• Experience does not matter when finding wrong code statements.