Benchmarking Effectiveness
for Object-Oriented Unit Testing
Anthony J H Simons and Christopher D Thomson
Overview
– Measuring testing?
– The Behavioural Response
– Measuring six test cases
– Evaluation of JUnit tests
– Evaluation of JWalk tests
http://www.dcs.shef.ac.uk/~ajhs/jwalk/
Analogy: Metrics and Testing
Things easy to measure (but why?)
– metrics: the MIT O-O metrics (Chidamber & Kemerer)
– testing: decision-, path-, whatever-coverage
– testing: count exceptions, reduce test-set size

Properties you really want (but how?)
– metrics: Goal, Question, Metric (Basili et al.)
– testing: e.g. a mutant-killing index
– testing: effectiveness and efficiency?
Measuring Testing?
Most approaches measure testing effort, rather than test effectiveness!
Degrees of Correctness
Suppose an ideal test set
– BR : behavioural response (a set)
– T : tests to be evaluated (a bag – may contain duplicates)
– TE = BR ∩ T : effective tests (a set)
– TR = T − TE : redundant tests (a bag)

Define test metrics
– Ef(T) = (|TE| − |TR|) / |BR| : effectiveness
– Ad(T) = |TE| / |BR| : adequacy
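As a quick illustration of these definitions (a minimal sketch, not code from the paper), the following Java fragment computes Ad(T) and Ef(T), assuming each test is labelled with the behavioural response it exercises:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TestMetrics {
    public static void main(String[] args) {
        // BR: the ideal set of distinct behavioural responses to verify
        Set<String> br = Set.of("r1", "r2", "r3", "r4");

        // T: the submitted tests, a bag (duplicates allowed), each labelled
        // by the response it exercises; "none" exercises nothing in BR
        List<String> t = List.of("r1", "r1", "r2", "none", "r3");

        // TE = BR ∩ T : the distinct responses actually covered (a set)
        Set<String> te = new HashSet<>(t);
        te.retainAll(br);

        // TR = T − TE : leftover tests counted with multiplicity (a bag),
        // so |TR| = |T| − |TE| once each covered response claims one test
        int trSize = t.size() - te.size();

        double adequacy      = (double) te.size() / br.size();            // Ad(T)
        double effectiveness = (double) (te.size() - trSize) / br.size(); // Ef(T)

        System.out.printf("Ad(T) = %.2f, Ef(T) = %.2f%n", adequacy, effectiveness);
        // prints: Ad(T) = 0.75, Ef(T) = 0.25
    }
}
```

Note how duplicates in T inflate |TR| and so depress Ef(T): that is exactly the penalty the effectiveness metric is designed to impose.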
Ideal Test Set?
The ideal test set must verify each distinct response of an object!
What is a Response?
Input response
– Account.withdraw(int amount) : 3 partitions
• amount < 0 → fail precondition, exception
• amount > balance → refuse, no change
• amount <= balance → succeed, debit

State response
– Stack.pop() : 2 states
• isEmpty() → fail precondition, exception
• !isEmpty() → succeed
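To make the input partitions concrete, here is a minimal Java sketch of the Account.withdraw example; the class shape, exception type, and integer balance are assumptions for illustration, not the study's actual code:

```java
public class Account {
    private int balance;

    /**
     * withdraw exercises three input partitions:
     *   amount < 0        -> fail precondition, exception
     *   amount > balance  -> refuse, no state change
     *   amount <= balance -> succeed, balance debited
     */
    public void withdraw(int amount) {
        if (amount < 0) {
            // partition 1: precondition failure (exception type assumed)
            throw new IllegalArgumentException("negative amount");
        }
        if (amount > balance) {
            return;            // partition 2: refuse, no change
        }
        balance -= amount;     // partition 3: succeed, debit
    }

    public int getBalance() {
        return balance;
    }
}
```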
Behavioural Response – 1
Input response
– cf. exemplars of equivalence partitions
– max responses per method, over all states

State response
– cf. state cover, to reach all states
– max state-contingent responses, over all methods

Behavioural Response
– product of input and state response (see the worked product below)
– checks all argument partitions in all states
– cf. transition cover augmented by exemplars
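Concretely, the size of the ideal set is the product of the two factors; taking Stack1's counts from the table later in the talk (6 input responses over 2 states):

```latex
|BR(1,1)| = |\mathit{InputR}| \times |\mathit{StateR}| = 6 \times 2 = 12
```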
Behavioural Response – 2
Parametric form: BR(x, y)
– stronger ideal sets, for higher x, y
– x = length of method sequences explored from each state
– y = number of exemplars per equivalence partition

Redundant states
– higher x rules out faults hiding in duplicated states

Boundary values
– higher y verifies equivalence-partition boundaries

Useful measure
– precise quantification of what has been tested
– repeatable guarantees of quality after testing
Compare Testing Methods
JWalk – “Lazy systematic unit testing method”
JUnit – “Expert manual unit testing method”
JUnit – Beck, Gamma
“Automates testing”
– manual test authoring (as good as human expertise)
– may focus on positive cases, miss negative test cases
– saved tests automatically re-executed on demand
– regression style may mask hard interleaved cases

Test harness (sketched below)
– bias: one test method “testX” for each method “X”
– each “testX” contains n assertions = n test cases
– the same assertions appear redundantly in “testY”, “testZ”
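A hypothetical JUnit 3 style fragment illustrating the “testX” bias (not from the study; a Stack1 API of push/pop/top/isEmpty is assumed): each test method bundles several assertions, and the same assertions recur across neighbouring methods:

```java
import junit.framework.TestCase;

public class Stack1Test extends TestCase {

    // one "testX" per method "X": here, n assertions = n test cases
    public void testPush() {
        Stack1 s = new Stack1();
        s.push("a");
        assertFalse(s.isEmpty());       // case 1
        assertEquals("a", s.top());     // case 2
    }

    public void testPop() {
        Stack1 s = new Stack1();
        s.push("a");
        assertEquals("a", s.top());     // duplicates an assertion from testPush
        s.pop();
        assertTrue(s.isEmpty());        // the genuinely new case
    }
}
```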
JWalk – Simons
Lazy specification
– static analysis of compiled code
– dynamic analysis of the state model
– adapts to change, revises the state model

Systematic testing (sketched below)
– bounded exhaustive state-based exploration
– may not generate exemplars for all input partitions
– semi-automatic oracle construction (tester confirms key values)
– learns test equivalence classes (predictive testing)
– adapts existing oracles, superclass oracles
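For flavour, here is a schematic Java sketch of bounded exhaustive exploration, the generation strategy named above; this is a generic illustration, not JWalk's implementation or API, and it omits the state-model pruning and oracle-confirmation steps that JWalk adds on top:

```java
import java.util.ArrayList;
import java.util.List;

public class BoundedExploration {

    // Grow all method sequences up to a depth bound (the "bounded
    // exhaustive" part); each sequence becomes a candidate test case
    // whose observed result the tester confirms once, then reuses.
    static List<List<String>> sequences(List<String> methods, int depth) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());                  // the empty prefix
        List<List<String>> frontier = new ArrayList<>(result);
        for (int d = 0; d < depth; d++) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : frontier) {
                for (String m : methods) {
                    List<String> seq = new ArrayList<>(prefix);
                    seq.add(m);                         // extend by one call
                    next.add(seq);
                }
            }
            result.addAll(next);
            frontier = next;                            // explore one level deeper
        }
        return result;
    }

    public static void main(String[] args) {
        // e.g. a stack API explored to depth 2: [], [push], [pop], [push, push], ...
        sequences(List.of("push", "pop"), 2).forEach(System.out::println);
    }
}
```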
Six Test Cases
Stack1 – simple linked stack
Stack2 – bounded array stack
– change of implementation

Book1 – simple loanable book
Book2 – also with reservations
– extension by inheritance

Account1 – with deposit/withdraw
Account2 – with preconditions
– refinement of specification
Instructions to Testers
Test each response for each class, similar to the transition cover, but with all equivalence partitions for method inputs.
Behavioural Response
| Test Class | API | Input R | State R | BR(1,1) |
|------------|----:|--------:|--------:|--------:|
| Stack1     | 6   | 6       | 2       | 12      |
| Stack2     | 7   | 7       | 3       | 21      |
| Book1      | 5   | 5       | 2       | 10      |
| Book2      | 9   | 10      | 4       | 40      |
| Account1   | 5   | 6       | 2       | 12      |
| Account2   | 5   | 9       | 2       | 18      |

BR(1,1) is the ideal test target for each class.
JUnit – Expert Testing
| Test Class | T   | TE | TR | Ad(T) | Ef(T) | time   |
|------------|----:|---:|---:|------:|------:|-------:|
| Stack1     | 20  | 12 | 8  | 1.00  | 0.33  | 11.31  |
| Stack2     | 23  | 16 | 7  | 0.76  | 0.43  | +14.00 |
| Book1      | 31  | 9  | 22 | 0.90  | -1.30 | 11.00  |
| Book2      | 104 | 21 | 83 | 0.53  | -1.55 | +20.00 |
| Account1   | 24  | 12 | 12 | 1.00  | 0.00  | 14.37  |
| Account2   | 22  | 17 | 5  | 0.94  | 0.67  | 08.44  |

Massive generation, still not effective.
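The Book1 row shows how a suite can be nearly adequate yet strongly ineffective: 9 of the 10 ideal responses are covered, but 22 redundant tests drag the score below zero:

```latex
Ad(T) = \frac{|TE|}{|BR|} = \frac{9}{10} = 0.90,
\qquad
Ef(T) = \frac{|TE| - |TR|}{|BR|} = \frac{9 - 22}{10} = -1.30
```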
JWalk – Test Generation
| Test Class | T  | TE | TR | Ad(T) | Ef(T) | time  |
|------------|---:|---:|---:|------:|------:|------:|
| Stack1     | 12 | 12 | 0  | 1.00  | 1.00  | 0.42  |
| Stack2     | 21 | 21 | 0  | 1.00  | 1.00  | 0.50  |
| Book1      | 10 | 10 | 0  | 1.00  | 1.00  | 0.30  |
| Book2      | 36 | 36 | 0  | 0.90  | 0.90  | 0.46  |
| Account1   | 12 | 12 | 0  | 1.00  | 1.00  | 1.17  |
| Account2   | 17 | 17 | 0  | 0.94  | 0.94  | 16.10 |

No wasted tests; missed 5 input partitions.
Comparisons
JUnit: expert manual testing
– massive over-generation of tests (w.r.t. the goal)
– sometimes adequate, but not effective
– stronger (t2, t3), duplicated, and missed tests
– hopelessly inefficient – time also goes on debugging the test suites!

JWalk: lazy systematic testing
– near-ideal coverage, adequate and effective
– a few input partitions missed (simple generation strategy)
– very efficient use of the tester’s time – seconds, not minutes
– or: orders of magnitude (×1000) more tests for the same effort
Conclusion
Behavioural Response
– seems like a useful benchmark (scalable, flexible)
– use with formal, semi-formal, or informal design methods
– measures effectiveness, rather than effort

Moral for testing
– don’t hype up automatic test (re-)execution
– need systematic test-generation tools
– automate the parts that humans get wrong!
Any Questions?
http://www.dcs.shef.ac.uk/~ajhs/jwalk/
Put me to the test!