Coverage metrics. A pragmatic approach


Code coverage. The pragmatic approach. Александр Ильин, Java Quality Architect, Oracle

What is it about?

Should the testing be stopped at 100% coverage?

Should 100% be the goal?

How (else) to use code coverage information?


What is it not about?

Tools


Preface

What is the code coverage data for

To measure the extent to which source code is covered during testing.


consequently …

Code coverage is

A measure of how much source code is covered during testing.

Testing is

A set of activities aimed at proving that the system under test behaves as expected.

finally …

CC – how to get

• Create a template
  Template is a collection of all the code there is to cover
• “Instrument” the source/compiled code/bytecode
  Insert instructions for dropping data into a file/network, etc.
• Run testing, collect data
  May need to change environment
• Generate report
  HTML, DB, etc.
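The four steps can be illustrated with a hand-instrumented sketch. Real tools insert the probes automatically; the class name, the probe numbering and the `abs` method below are assumptions made for illustration only.

```java
import java.util.BitSet;

/** Hand-instrumented sketch: numbered probes stand in for the "template". */
class CoverageSketch {
    // Template: the number of blocks there is to cover (3 in this toy example).
    static final int TEMPLATE_SIZE = 3;
    // Collected data: which blocks were hit during the test run.
    static final BitSet hits = new BitSet(TEMPLATE_SIZE);

    // An "instrumented" method: each block records itself before running.
    static int abs(int x) {
        hits.set(0);               // probe for the method entry block
        if (x < 0) {
            hits.set(1);           // probe for the "negative" block
            return -x;
        }
        hits.set(2);               // probe for the "non-negative" block
        return x;
    }

    // Report: percentage of template blocks that were hit.
    static double blockCoveragePercent() {
        return 100.0 * hits.cardinality() / TEMPLATE_SIZE;
    }
}
```

Calling `abs` with only positive arguments leaves one probe unset, so the report stays below 100% until a negative argument is tested as well.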

CC – kinds of

• Block / primitive block
• Line
• Condition/branch/predicate
• Entry/exit
• Method
• Path/sequence

CC – how to use for testbase improvement

• 1: Measure (prev. slide)
  Performed repeatedly, so resource-efficiency is really important
• Perform analysis
  Find what code you need to cover.
• Develop more tests
  Find what tests you need to develop.
• GOTO 1

• Find dead code


Mis-usages

CC – how not to use: mis-usages

• Must get to 100%
  Maybe not.
• 100% means no more testing
  No, it does not.
• CC does not mean a thing
  It does mean a fair amount if it is used properly.
• There is that tool which would generate tests for us and we're done
  Nope.


Mis-usages

Test generation


“We present a new symbolic execution tool, ####, capable of automatically generating tests that achieve high coverage on a diverse set of complex and environmentally-intensive programs.”

#### tool documentation

Test generation cont.

if ( b != 3 ) {
    double a = 1 / ( b - 2);
} else {
    …
}

Test generation cont.

if ( b != 3 ) {
    double a = 1 / ( b - 3);
} else {
    …
}

Reminder: testing is a set of activities aimed at proving that the system under test behaves as expected.

Test generation - conclusion

Generated tests cannot verify that the code works as expected, because they only know how the code works, not how it is expected to work. The only thing they possess is the code, which may already not be working as expected. :)
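A minimal sketch of the problem, assuming a generator that runs the code and records whatever it observes as the expected result. The `compute` method is a hypothetical stand-in modeled on the fragment with the mismatched guard and divisor; the `int` type and the value returned in the else branch are assumptions.

```java
class GeneratedTestSketch {
    // Hypothetical code under test: the guard checks 3, the divisor uses 2.
    static int compute(int b) {
        if (b != 3) {
            return 1 / (b - 2);   // throws ArithmeticException for b == 2
        } else {
            return 0;             // assumed; the original branch is elided
        }
    }

    // What a generator can do: run the code and record whatever happens.
    static String observe(int b) {
        try {
            return String.valueOf(compute(b));
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    // A "generated test": passes whenever the behavior matches the recording.
    static boolean generatedTestPasses(int b, String recorded) {
        return observe(b).equals(recorded);
    }
}
```

The recording for `b == 2` is the exception itself, so the generated test passes and covers the line, while nothing in it says whether the exception is the intended behavior.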

Hence …

Code coverage from generated tests should not be mixed with code coverage from regular functional tests.


Who watches the watchmen?

• Test logic gotta be right
• No way to verify the logic
  • No metrics
  • No approaches
  • No techniques
  • Code review – the only way
• Sole responsibility of test developer


Mis-usages

What does 100% coverage mean?

[Figures: example inputs and results for 100% block/line coverage, 100% branch coverage, 100% domain coverage and 100% sequence coverage]

But … isPositive(float) has a defect!
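The implementation of isPositive(float) is not available in the text, so the following is a hypothetical version with a defect of the kind the slide refers to. The comparison operator chosen here is an assumption.

```java
class IsPositiveSketch {
    // Hypothetical implementation with a defect: zero is reported as positive.
    static boolean isPositive(float f) {
        if (f >= 0) {      // defect: the check should be f > 0
            return true;
        } else {
            return false;
        }
    }
}
```

Two tests, `isPositive(1)` expecting true and `isPositive(-1)` expecting false, both pass and reach 100% line and branch coverage of this method, yet `isPositive(0)` still returns the wrong answer.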

100% sequence coverage

• Has conceptual problems
  • Code semantics
  • Loops
• One of the two
  • Assume libraries have no errors
  • Done in depth – with the libraries
• Very expensive
  • A lot of sequences: 2^(# branches), generally speaking
  • Very hard to analyze data

100% coverage - conclusion

100% block/line/branch/path coverage, even if reachable, does not prove much.

Hence …

No need to try to get there unless ...



Mis-usages

Target value

CC target value - cost

[Figure: Test Dev. Effort by Code Block Coverage. X axis: Code Block Coverage (%). Y axis: Relative Test Dev. Effort (1 at 50% code block coverage).]

Model not reliable below 50% coverage, except maybe very big projects.

Industry data indicates that effort increases exponentially with coverage.

$f(x) = k e^{r x}$

$k = e^{-50 r} \Rightarrow f(50) = 1$

$\frac{df}{dx} = r f(x)$

Intuition: the effort needed to get more coverage is proportional to the total effort needed to get current coverage.

We scale to make effort relative to the effort of getting 50% coverage.
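The scaled effort model can be sketched in a few lines. The growth rate r is not given in the text, so it is left as a parameter; any concrete value used with this method is an assumption.

```java
class EffortModelSketch {
    // Relative effort f(x) = k * e^(r*x) with k = e^(-50*r), so that f(50) = 1.
    // coveragePercent is x in the range 0..100; r is the assumed growth rate.
    static double relativeEffort(double coveragePercent, double r) {
        double k = Math.exp(-50.0 * r);
        return k * Math.exp(r * coveragePercent);
    }
}
```

With an assumed r = 0.05, for example, `relativeEffort(100, 0.05)` is e^2.5, roughly 12 times the effort of reaching 50% coverage.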

CC target value - effectiveness

[Figure: Defect Coverage by Code Block Coverage. X axis: Code Block Coverage (%). Y axis: Defect Coverage (%).]

Model not reliable below 50% coverage except maybe very big projects.

$H(x) = h(f(x))$

$f(x) = k e^{r x}$

$h(y) = B \left(1 - e^{-\frac{s}{B} y}\right)$

$\frac{dH}{dx} = s \left(1 - \frac{H(x)}{B}\right) \frac{df}{dx}(x)$

Defect coverage by code block coverage is defined in terms of effort per code coverage and defect coverage by effort.

Intuition: discovery rate is proportional to the percentage of bugs remaining and the effort needed to get current coverage.
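The differential form of the model follows from the chain rule. A short derivation, using H for defect coverage by code coverage, f for effort by code coverage and h for defect coverage by effort, with B and s as the constants of h:

```latex
\begin{aligned}
H(x) &= h(f(x)), \qquad h(y) = B\left(1 - e^{-\frac{s}{B} y}\right), \\
h'(y) &= s\, e^{-\frac{s}{B} y} = s\left(1 - \frac{h(y)}{B}\right), \\
\frac{dH}{dx} &= h'(f(x))\,\frac{df}{dx} = s\left(1 - \frac{H(x)}{B}\right)\frac{df}{dx}.
\end{aligned}
```

The factor $1 - H(x)/B$ is the share of defects still undiscovered, which is what makes the discovery rate fall off as coverage grows.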

CC target value - ROI

[Figure: Cost-Benefit Analysis. X axis: Code Block Coverage (%). Y axis: Benefit ($/size), Cost ($/size), ROI (%).]

Benefit(c) = DC(c) * DD * COD, where
  DC(c): Defect Coverage
  DD: Defect Density. Example: 50 bug/kloc
  COD: Cost Of Defect. Example: $20k/bug

Cost(c) = F + V * RE(c), where
  RE(c): Relative Effort, RE(50%) = 1
  F: Fixed cost of test. Example: $50k/kloc
  V: Variable cost of test. Example: $5k/kloc

ROI = Benefit(c) / Cost(c) - 1
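A sketch of the calculation with the example values plugged in. Treating DC(c) as a fraction between 0 and 1 and passing DC(c) and RE(c) in as plain numbers are assumptions; in the model both come from the curves for a given coverage c.

```java
class RoiSketch {
    // Example values from the slide, per KLOC, in thousands of dollars.
    static final double DEFECT_DENSITY = 50.0;   // DD: bugs per KLOC
    static final double COST_OF_DEFECT = 20.0;   // COD: $k per bug
    static final double FIXED_COST = 50.0;       // F: $k per KLOC
    static final double VARIABLE_COST = 5.0;     // V: $k per KLOC

    // Benefit(c) = DC(c) * DD * COD; defectCoverage is DC(c) as a fraction 0..1 (assumed).
    static double benefit(double defectCoverage) {
        return defectCoverage * DEFECT_DENSITY * COST_OF_DEFECT;
    }

    // Cost(c) = F + V * RE(c); relativeEffort is RE(c), with RE(50%) = 1.
    static double cost(double relativeEffort) {
        return FIXED_COST + VARIABLE_COST * relativeEffort;
    }

    // ROI = Benefit(c) / Cost(c) - 1, as a ratio rather than a percentage.
    static double roi(double defectCoverage, double relativeEffort) {
        return benefit(defectCoverage) / cost(relativeEffort) - 1.0;
    }
}
```

Because cost grows exponentially with coverage while benefit saturates, evaluating `roi` along the two curves is what locates the coverage value worth targeting.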

100% coverage - conclusion

100% block/line/branch/path coverage, even if reachable, does not prove much.

Hence …

No need to try to get there unless …

100% is the target value.

Which could happen if cost of a bug is really big and/or the product is really small.


Target value - conclusion

True target value for block/line/branch/path comes from ROI, which is really hard to calculate and justify.



Usages

CC – how to use

• Test base improvement.
  Right. How to select which tests to develop first
• Dead code.
  Barely an artifact
• Metric
  Better have a good metric.
• Control over code development
• Deep analysis


CC as a metric

What makes a good metric

Simple to explain
  So that you can explain to your boss why it is important to spend resources on it.

Simple to work towards
  So that you know what to do to improve.

Has a clear goal
  So that you can tell how far along you are.

Is CC a good metric?

+ Simple to explain
  Is a metric of quality of testing.

+ Simple to work towards
  (Relatively) easy to map uncovered code to missed tests.

- Has a clear goal
  Nope. ROI – too complicated.

Need to filter the CC data so that only the code which must be covered is left.

Public API*

Is a set of program elements suggested for usage by public documentation.

For example: all functions and variables which are described in documentation.

For a Java library: all public and protected methods and fields mentioned in the library javadoc.

For Java SDK: … of all public classes in java and javax packages.

(*) Only applicable to a library or an SDK

Public API

True Public API (c)

Is a set of program elements which could be accessed directly by a library user

Public API + all extensions of public API in non-public classes

True public API example

ArrayList.java

My code

True Public API how to get

• Get public API with interfaces
• Filter template so that it only contains implementations and extensions of the public API (*)
• Filter the data by template

(*) This assumes that you either
• Use a tool which allows such kind of filtering
or
• Have the data in a parse-able format and develop the filtering on your own
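For the do-it-yourself route, the filtering step can be sketched as follows, assuming the coverage data has already been parsed into a map from method signature to a covered flag. The data shape and all names are assumptions, not the format of any particular tool.

```java
import java.util.Map;
import java.util.Set;

class ApiCoverageFilterSketch {
    // Percentage of template methods that are covered.
    // Methods outside the template are ignored; template methods
    // missing from the data count as uncovered.
    static double filteredCoveragePercent(Map<String, Boolean> coveredByMethod,
                                          Set<String> template) {
        if (template.isEmpty()) {
            return 0.0;
        }
        long covered = template.stream()
                .filter(m -> coveredByMethod.getOrDefault(m, false))
                .count();
        return 100.0 * covered / template.size();
    }
}
```

Covered code that is not part of the template, such as internal helpers, has no effect on the result, which is the point of filtering by template.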

UI coverage

In a way, equivalent to public API but for a UI product

• % of UI elements shown – display coverage
• % of user actions performed – action coverage

Only “action coverage” could be obtained from CC data (*).

(*) For UI toolkits which the presenter is familiar with.

Action coverage – how to get

• Collect CC
• Extract all implementations of javax.swing.Action.actionPerformed(ActionEvent) or javafx.event.EventHandler.handle(Event)
• Inspect all the implementations: org.myorg.NodeAction.actionPerformed(ActionEvent)
• Add to the filter: org.myorg.NodeAction.nodeActionPerformed(Node myNode)
• Extract, repeat

“Controller” code coverage

Model: contains the domain logic

View: implements user interaction

Controller: maps the two. Only contains code which is called as a result of view actions and model feedback.

Controller has very little boilerplate code. A good candidate for 100% block coverage.

“Important” code

• Development/SQE marks class/method as important
  • We use an annotation @CriticalForCoverage
• List of methods is obtained which are marked as important
  • We do that by an annotation processor right during the main compilation
• CC data is filtered by the method list
• Goal is 100%
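A minimal sketch of the marking and collection steps. The slides use an annotation processor at compile time; this sketch swaps in runtime reflection to stay short, which requires runtime retention. The annotation's retention and targets, and the `Payments` class, are assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Marker for code that must reach 100% coverage. Runtime retention is an
// assumption made so that reflection can stand in for an annotation processor.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface CriticalForCoverage {}

class ImportantCodeSketch {
    // Hypothetical class under test with one method marked as important.
    static class Payments {
        @CriticalForCoverage
        void transfer() {}

        void log() {}
    }

    // Builds the filter: sorted names of all methods marked as important.
    static List<String> criticalMethods(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isAnnotationPresent(CriticalForCoverage.class)) {
                names.add(type.getSimpleName() + "." + m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}
```

The resulting list plays the role of the method list by which the CC data is filtered.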

Examples of non-generic metrics

• BPEL elements
• JavaFX properties
  • A property in JavaFX is something you could set, get and bind
• Insert your own.

CC as a metric - conclusion

There are multiple ways to filter CC data down to the set of code which needs to be covered in full.

There are generic metrics, and it is possible to introduce product-specific metrics.

Such metrics are easy to use, although not always so straightforward to obtain.



Test prioritization


100500 uncovered lines of code!

“OMG! Where do I start?”

Metric
• Develop tests to close the metric
• Pick another metric

“Metrics for managers. Me no manager! Me write code!”

Consider mapping CC data to a few other source code characteristics.

Age of the code

New code had better be tested before it gets to the customer.

Improves bug escape rate, BTW

Old code is more likely to be either tested by users or not used by users.

What's a bug escape metric?

Ratio of defects that sneaked out unnoticed

In theory: (# defects not found before release) / (# defects in the product)

Practical: (# defects found after release) / (# defects found after + # defects found before)
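The practical form is a one-line calculation. A sketch, where the method name and the convention of returning 0 when no defects are known are assumptions:

```java
class BugEscapeSketch {
    // Practical bug escape rate:
    // found after release / (found after release + found before release).
    static double escapeRate(int foundAfterRelease, int foundBeforeRelease) {
        int total = foundAfterRelease + foundBeforeRelease;
        if (total == 0) {
            return 0.0; // no known defects: reported as 0 by convention here
        }
        return (double) foundAfterRelease / total;
    }
}
```

For example, 5 defects found after release against 95 found before gives an escape rate of 5%.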

Number of changes

The more times a piece of code was changed, the more atomic improvements/bugfixes were implemented in it.

Hence …

Higher risk of introducing a regression.

Number of lines changed

The more lines changed, the more testing the code needs.

Best of all: the number of uncovered lines which were changed in the last release.

Bug density

Assuming all the pieces were tested equally well …

Many bugs means there are, probably, even more
• Hidden behind the known ones
• Fixing existing ones may introduce yet more as regressions

Code complexity

Assuming the same engineering talent and the same technology …

The more complex the code is, the more bugs are likely to be there.

Any complexity metric would work: from class size to cyclomatic complexity.

Putting it together

A formula

$(1 - cc) \cdot (a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots)$

Where

$cc$ – code coverage (0 - 1)

$x_i$ – a risk of bug discovery in a piece of code

$a_i$ – a coefficient

Putting it together

$(1 - cc) \cdot (a_1 x_1 + a_2 x_2 + a_3 x_3 + \dots)$

The ones with higher value are first to cover

• Fix the coefficients
• Develop tests
• Collect statistics on bug escape
• Fix the coefficients
• Continue
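The score for a single piece of code can be sketched as below. Which characteristics are used as the risks (age, number of changes, bug density, complexity) and how they are normalized is left open in the slides, so the array layout is an assumption.

```java
class TestPrioritySketch {
    // Score = (1 - cc) * (a1*x1 + a2*x2 + ...).
    // cc is code coverage in the range 0..1, a holds the coefficients,
    // x holds the risk values; both arrays must have the same length.
    static double score(double cc, double[] a, double[] x) {
        double risk = 0.0;
        for (int i = 0; i < a.length; i++) {
            risk += a[i] * x[i];
        }
        return (1.0 - cc) * risk;
    }
}
```

Fully covered code scores 0 regardless of risk, so sorting pieces of code by descending score puts risky, poorly covered code first.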

Test prioritization - conclusion

CC alone may not give enough information.

CC needs to be accompanied by other characteristics of the code to make a decision.

Several of those characteristics can be used simultaneously.



Test prioritization

Execution

Decrease test execution time

Exclude tests which do not add coverage (*).

But, be careful! Remember that CC is not all and even 100% coverage does not mean a lot.

When excluding tests, take some orthogonal measurement into account as well, such as specification coverage.

(*) Requires “test scales”
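A sketch of the exclusion step, assuming per-test coverage data is available as a map from test name to the set of covered block ids (which is what “test scales” is taken to provide here). The single-pass strategy and all names are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TestSelectionSketch {
    // Keeps a test only if it covers at least one block that is not covered
    // by the tests kept so far. Tests are visited in the map's iteration
    // order, so the result depends on that order.
    static List<String> testsAddingCoverage(Map<String, Set<Integer>> blocksByTest) {
        Set<Integer> covered = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Set<Integer>> e : blocksByTest.entrySet()) {
            if (covered.addAll(e.getValue())) {   // true if any new block was added
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

A test dropped this way may still check behavior that the kept tests do not, which is why an orthogonal measurement is needed before actually removing it.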

Deep analysis

Study the coverage report to see which test code exercises which code (*).

Recommended for developers.

(*) Also requires “test scales”

Controlled code changes

Do not allow commits unless all the new/changed code is covered.

Requires simultaneous commits of tests and the changes.
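The check behind such a gate can be sketched as a set difference, assuming the changed line numbers (from the version control diff) and the covered line numbers (from the coverage report) are already available for a file. How they are obtained is outside this sketch, and all names are assumptions.

```java
import java.util.Set;
import java.util.TreeSet;

class CommitGateSketch {
    // Returns the changed lines that are not covered, in ascending order.
    static Set<Integer> uncoveredChangedLines(Set<Integer> changedLines,
                                              Set<Integer> coveredLines) {
        Set<Integer> uncovered = new TreeSet<>(changedLines);
        uncovered.removeAll(coveredLines);
        return uncovered;
    }

    // The commit is allowed only if every changed line is covered.
    static boolean commitAllowed(Set<Integer> changedLines, Set<Integer> coveredLines) {
        return uncoveredChangedLines(changedLines, coveredLines).isEmpty();
    }
}
```

Returning the uncovered lines rather than only a yes/no answer lets the gate tell the committer exactly what still needs a test.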

Code coverage - conclusion

100% CC does not guarantee that the code is working right

100% CC may not be needed

It is possible to build good metrics with CC

CC helps with prioritization of test development

Other source code characteristics could be used with CC


Coverage data is not free

• Do just as much as you can consume *
• Requires infrastructure work
• Requires some development
• Requires some analysis

(*) The rule of thumb

Coverage data is not free

• Do just as much as you can consume
• Requires infrastructure work
• Requires some development
• Requires some analysis
• Do just a little bit more than you can consume *
  • Otherwise how do you know how much you can consume?

(*) The rule of thumb

