1
Testing Games: Randomizing Regression Tests Using Game Theory
Nupul Kukreja, William G.J. Halfond, Milind Tambe & Manish Jain
Annual Research Review, April 30, 2014
2
Outline
• Motivation
• Problem(s) with traditional test scheduling
• Game Theory and Randomization
• Modeling software testing as a 2-player game
• Evaluation
• Conclusion & Future Work
3
Motivation
[Figure: labels "Software Size", "Regression Suite", "DEVS"; speech bubbles: "The deadline is close too!", "Dude! Suite XX is not gonna run!", "Let's CODE NOW FIX LATER"]
4
Motivating Problem(s)
• Existing test case scheduling activities are deterministic
• Developers know which test cases will be executed when
• Developers can check in insufficiently tested code closer to the delivery deadline
• High turnaround time for fixing bugs in low-priority features
• Random test scheduling is helpful but treats each test case as equally important
5
Software Testing as a 2-Player Game
• This tension between software testers and developers can be modeled as a two-player game
• We solve the game to answer the following question:
  – Given an adaptive adversary (developers) and resource constraints (testers), what is the optimum test-scheduling strategy that maximizes the tester’s expected payoff?
6
Game Theory
• Study of strategic decision making among multiple players
  – Corporations, software agents, testers and developers, regular humans, etc.
7
Two-player “Security” Game
                          Adversary
                          Terminal 1    Terminal 2
Defender   Terminal 1     5, -3         -1, 1
           Terminal 2     -5, 5         2, -1

Each cell lists (Defender payoff, Adversary payoff).
Defender's mixed strategy: 60% Terminal 1, 40% Terminal 2

Security game assumptions:
1. What is good for one player (+ve payoff) is bad for the other (-ve payoff)
2. Adversary can conduct perfect surveillance and act appropriately i.e., these are simultaneous move games or Stackelberg games
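The 60% / 40% split can be checked with a small calculation. The sketch below is not the authors' solver. It assumes the slide's matrix reads as defender payoffs 5, -1 / -5, 2 and adversary payoffs -3, 1 / 5, -1 (rows: terminal protected, columns: terminal attacked), and that the adversary observes the defender's mixed strategy and breaks ties in the defender's favor. The function name `solve_2x2_stackelberg` is hypothetical.

```python
from fractions import Fraction

# Assumed reading of the slide: MATRIX[protected terminal][attacked terminal]
DEFENDER = [[5, -1], [-5, 2]]
ADVERSARY = [[-3, 1], [5, -1]]

def expected(matrix, p, j):
    """Expected payoff when row 0 is protected with probability p and column j is attacked."""
    return p * matrix[0][j] + (1 - p) * matrix[1][j]

def solve_2x2_stackelberg(defender, adversary):
    """Return (p, value): probability of protecting row 0 and the defender's expected payoff."""
    # The defender's payoff is piecewise linear in p, so the optimum is at
    # p = 0, p = 1, or the point where the adversary is indifferent.
    candidates = {Fraction(0), Fraction(1)}
    denom = (adversary[0][0] - adversary[1][0]) - (adversary[0][1] - adversary[1][1])
    if denom != 0:
        p = Fraction(adversary[1][1] - adversary[1][0], denom)
        if 0 <= p <= 1:
            candidates.add(p)
    best = None
    for p in candidates:
        adv = [expected(adversary, p, j) for j in (0, 1)]
        top = max(adv)
        # Assumption: among the adversary's best responses, the defender's favorite is played.
        value = max(expected(defender, p, j) for j in (0, 1) if adv[j] == top)
        if best is None or value > best[1]:
            best = (p, value)
    return best

p_terminal_1, defender_value = solve_2x2_stackelberg(DEFENDER, ADVERSARY)
print(p_terminal_1, 1 - p_terminal_1, defender_value)  # 3/5 2/5 1
```

Under these assumptions the defender protects Terminal 1 with probability 3/5 and Terminal 2 with probability 2/5, matching the 60% / 40% shown on the slide.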
8
Testing Game
                                   Developer (Requirement 1)
                                   Check in ITC*    Check in PC*
Tester             Test            5, -3            -1, 1
(Requirement 1)    Don’t Test      -5, 5            2, -1

Each cell lists (Tester payoff, Developer payoff).
*ITC: Insufficiently tested code
*PC: Perfect code i.e., 100% tested
9
Testing Game – Payoffs
• Payoffs are either positive or negative
• Proportional to the value of the requirement for both the tester and the developer
• Payoffs can be derived in many ways:
  – Directly from requirement priorities
  – Expert judgment and/or planning poker
  – Delphi methods
  – Directly from test-case priorities
10
Defining Test Requirements
• Could be black-box or white-box based
• If black-box, TR may correspond to:
  – Module/component
  – Method
  – OR…the requirement as a whole
• “We” group test cases by requirements
  – Each requirement is ‘covered’ by one or more test cases (or suites)
11
Not All Developers Are The Same
• Commonly encountered personality traits
  – Lazy/sloppy
  – New Grad
  – Moderate/Average
  – Seasoned Developer
• Each persona has a probability of “screwing up” i.e., checking in insufficiently tested code
• We can compute these probabilities by looking at the team composition
The Testing Game
12
P(seasoned) = 2/10
P(sloppy) = 3/10
P(avg) = 5/10
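A minimal sketch of how these persona probabilities could follow from team composition. The head counts (2 seasoned, 3 sloppy, 5 average out of 10) are read off the fractions above. The per-persona chances of checking in insufficiently tested code are hypothetical numbers chosen only for illustration, as are the function names.

```python
from fractions import Fraction

# Team composition implied by the slide's fractions (10 developers in total).
team = {"seasoned": 2, "sloppy": 3, "avg": 5}

# HYPOTHETICAL: chance that each persona checks in insufficiently tested code (ITC).
p_itc_given_persona = {
    "seasoned": Fraction(1, 10),
    "sloppy": Fraction(6, 10),
    "avg": Fraction(3, 10),
}

def persona_probabilities(team):
    """Probability that a randomly picked developer has each persona."""
    total = sum(team.values())
    return {name: Fraction(count, total) for name, count in team.items()}

def overall_itc_probability(persona_probs, p_itc):
    """Law of total probability over personas."""
    return sum(persona_probs[name] * p_itc[name] for name in persona_probs)

probs = persona_probabilities(team)
overall = overall_itc_probability(probs, p_itc_given_persona)
print(probs["seasoned"], probs["sloppy"], probs["avg"])  # 1/5 3/10 1/2
print(overall)  # 7/20 with the hypothetical numbers above
```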
13
Solving the Testing Game
Probability of scheduling test case ‘i’
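One simple way to obtain such scheduling probabilities is to pick the coverage that leaves the developer indifferent across requirements. The sketch below is an illustration under stated assumptions, not necessarily the solver used by the authors: it takes the developer payoffs from the Example Testing Game slide, reads each pair as (payoff if tested, payoff if not tested), assumes a single test slot (probabilities sum to 1), and finds the developer's payoff level by bisection. The function names are hypothetical.

```python
# Developer payoffs per requirement, read as (if tested, if not tested). Assumed reading.
DEVELOPER = [(-7, 4), (-1, 3), (-6, 5), (-3, 7), (-10, 3)]

def coverage_for_value(dev_payoffs, v):
    """Coverage per requirement that holds the developer's expected payoff at v, clipped to [0, 1]."""
    cover = []
    for tested, untested in dev_payoffs:
        c = (untested - v) / (untested - tested)
        cover.append(min(1.0, max(0.0, c)))
    return cover

def solve_coverage(dev_payoffs, budget=1.0, tol=1e-9):
    """Bisect on the developer's payoff v until the required coverage uses the budget."""
    lo = min(tested for tested, _ in dev_payoffs)      # needs full coverage everywhere
    hi = max(untested for _, untested in dev_payoffs)  # needs no coverage at all
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sum(coverage_for_value(dev_payoffs, mid)) > budget:
            lo = mid  # not enough budget to push the developer this low
        else:
            hi = mid
    return coverage_for_value(dev_payoffs, hi), hi

coverage, developer_value = solve_coverage(DEVELOPER)
print([round(c, 4) for c in coverage])  # [0.1398, 0.1344, 0.2307, 0.4538, 0.0414]
```

Under these assumptions the result agrees, to four decimals, with the distribution shown on the example slide, and the developer's expected payoff is the same (about 2.46) whichever requirement is targeted.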
14
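Example Testing Game

                              Req 1     Req 2     Req 3     Req 4     Req 5
Probability of scheduling     0.1398    0.1344    0.2307    0.4538    0.0414
Tester                        2, -10    7, -4     6, -1     9, -9     9, -9
Developer                     -7, 4     -1, 3     -6, 5     -3, 7     -10, 3

Each Tester/Developer cell lists two payoffs for that requirement (read here as tested, not tested).

Create a test case scheduling of ‘m’ test cases by sampling from the above distribution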
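A minimal sketch of the sampling step, using the five probabilities from the example. It assumes the m draws are independent, so the same requirement can be picked more than once. The function name `sample_schedule` and the optional `rng` parameter are hypothetical.

```python
import random

REQUIREMENTS = ["Req 1", "Req 2", "Req 3", "Req 4", "Req 5"]
SCHEDULING_PROBABILITY = [0.1398, 0.1344, 0.2307, 0.4538, 0.0414]

def sample_schedule(m, rng=None):
    """Draw m requirements (with replacement) according to the scheduling distribution."""
    rng = rng or random.Random()
    return rng.choices(REQUIREMENTS, weights=SCHEDULING_PROBABILITY, k=m)

# Example: a schedule of 10 test slots; the result differs from run to run.
schedule = sample_schedule(10)
print(schedule)
```

Because the schedule is re-sampled each time, developers cannot predict which requirement will be tested, yet higher-probability requirements are still tested more often.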
15
Evaluation
• Large simulation:
  – 1000 test requirements = 1 Game
• 1000 Games randomly generated
  – Each game played/solved 1000 times over
  – Payoffs range from [-10,10]
  – Constraint: Can only schedule/execute 500 test cases
• Compared with:
  – Deterministic test scheduling
  – Uniform Random test scheduling
  – Weighted Random test scheduling
    • Tester-only weights
    • Tester+developer based weights
16
Results
17
Limitations and Threats to Validity
• Developers are not adversarial
• Developers may choose to be sloppy at times with a particular probability
• Lack of perfect historical observation for developers
• Expected payoff is mostly a mathematical notation
18
Conclusion & Future Work
• New approach for test case scheduling using Game Theory
  – Accounts for both the tester’s and the developer’s payoffs
• Randomizing test cases acts as a deterrent against developers checking in insufficiently tested code
• The test case distribution is optimal under resource constraints and maximizes payoff for worst-case developer behavior, i.e., it is robust
• Simulation shows positive results and is a first step toward analyzing the tester/developer relationship
Thank you! Questions?