Scalable Statistical Bug Isolation

Authors: B. Liblit, M. Naik, A. X. Zheng, A. Aiken, M. I. Jordan
Presented by S. Li

Post on 02-Jan-2016


Outline
• Background
  – Definitions
  – Motivations
• Contributions
  – Algorithm
  – Visualization of the algorithm
  – Experiments
• Personal Comments

Definitions
• A bug predicate is denoted by P
  – A predicate P is associated with a particular program point, for instance if (ptr == NULL), or int a = 10;
  – $P_1$ … $P_2$, if $P_1$ and $P_2$ are associated with the same program point
  – If P is observed to be true at least once when running R, this is denoted by R(P) = 1; otherwise R(P) = 0
• A bug is denoted by B (or b)
• A bug profile is denoted by $\mathcal{B}$

Definitions (cont.)
• A bug profile $\mathcal{B}$ is a set of failing runs that share b as the cause.
  – More than one bug can occur in a failing run
  – So, in general, $\mathcal{B}_i \cap \mathcal{B}_j \neq \emptyset$
• P is a bug predictor if R(P) = 1 indicates that, likely, $R \in \mathcal{B}$
• Select $S \subset \mathcal{P}$ such that S has predictors of all bugs, where $\mathcal{P}$ is the set of all predicates.

Motivations
• A traditional technique in statistical bug prediction
  – Regularized logistic regression, which selects the predicates that best predict the outcome of every run
• Scalability problems arise in large-scale programs

Motivations cont.
• Scalability problems of regularized logistic regression
  – The set of predicates P is logically redundant
  – It is difficult to identify the predicates that actually matter for the specific bugs causing different failures

Contributions
• To highlight the contributions
  – To propose a statistical debugging algorithm
    • To isolate bugs in programs that contain multiple undiagnosed bugs
  – To perform better than earlier algorithms of the same kind
  – To validate the algorithm by experiments
  – To reveal the circumstances under which bugs occur, as well as the frequencies of failure runs

Statistical Debugging algorithm
• To automatically isolate multiple bugs
• To select S
• To rank the predictors in S from most to least important
• To make the predictors in S and their associated metrics available to help fix the most serious bugs

Statistical Debugging algorithm cont.
• Steps:
  – Identify the most important bug B
    • Not the bug B itself, but a predicate P closely correlated with its bug profile $\mathcal{B}$
  – Fix B, and repeat
    • Simulating the program's behavior without bug B

Statistical Debugging algorithm cont.
• To identify the bug
• To select the predicates that are most likely to correspond to its bug profile $\mathcal{B}$
  – P1, P2, P3, …, P, ranked in order of importance
  – R(P) = 1
  – Bug profiles $\mathcal{B}$ have unknown size and membership

Statistical Debugging algorithm cont.
• To repeat, simulating the fix of bug B
  – To discard every run R such that R(P) = 1
  – To recursively apply the algorithm to the remaining runs
  – To prune P = {P1, P2, P3, …, PB} by:
    • Reducing the importance of the predictors of B
    • Re-ranking the predictors, which allows other predictors to rise to the top in subsequent iterations
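
The elimination loop on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Run` record, the function names and the toy data are hypothetical, and the ranking score used here, F(P) minus S(P), is only a stand-in for whatever importance metric is plugged in.

```python
from typing import Callable, FrozenSet, Iterable, List, NamedTuple


class Run(NamedTuple):
    failed: bool                # True if the run failed
    true_preds: FrozenSet[str]  # predicates P with R(P) = 1 in this run


def stand_in_score(pred: str, runs: List[Run]) -> float:
    """Hypothetical score: failing minus successful runs where pred is true."""
    f = sum(1 for r in runs if r.failed and pred in r.true_preds)
    s = sum(1 for r in runs if not r.failed and pred in r.true_preds)
    return f - s


def isolate(runs: Iterable[Run], predicates: Iterable[str],
            score: Callable[[str, List[Run]], float] = stand_in_score) -> List[str]:
    """Pick the top predicate, discard the runs it explains, and repeat."""
    remaining = list(runs)
    candidates = sorted(set(predicates))  # sorted so ties break deterministically
    selected: List[str] = []
    while candidates and any(r.failed for r in remaining):
        best = max(candidates, key=lambda p: score(p, remaining))
        if score(best, remaining) <= 0:
            break  # nothing left that predicts failure
        selected.append(best)
        candidates.remove(best)
        # Simulate fixing the bug: drop every run R with R(best) = 1.
        remaining = [r for r in remaining if best not in r.true_preds]
    return selected


# Toy data (made up): "A" and "B" predict two different bugs, "C" is noise.
example_runs = [
    Run(True, frozenset({"A"})),
    Run(True, frozenset({"A"})),
    Run(True, frozenset({"A", "C"})),
    Run(True, frozenset({"B"})),
    Run(True, frozenset({"B", "C"})),
    Run(False, frozenset({"C"})),
    Run(False, frozenset({"C"})),
    Run(False, frozenset()),
]

print(isolate(example_runs, ["A", "B", "C"]))  # prints ['A', 'B']
```

Once the runs with R(A) = 1 are discarded, "B" rises to the top even though it covered fewer failures than "A" in the full data set, which is the re-ranking effect the slide describes.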

Statistical Debugging algorithm cont.
• To introduce the equations of the algorithm, consider the following simple C code:

    f = …;              /* Line (a) */
    if (f == NULL) {    /* Line (b) */
        x = 0;          /* Line (c) */
        *f;             /* Line (d) */
    }

• The bug in this example is deterministic, because…

Statistical Debugging algorithm cont.
• Non-deterministic bugs: consider the following code

    f = …;              /* Line (a) */
    if (f == NULL) {    /* Line (b) */
        x = 0;          /* Line (c) */
        if (….) f =..   /* some valid pointer… */
        *f;             /* Line (d) */
    }

• The bug in this example is non-deterministic with respect to (b)

Statistical Debugging algorithm cont.
• Failure(P): the probability that P being true implies failure.
  – F(P): the number of failing runs in which P is observed to be true.
  – S(P): the number of successful runs in which P is observed to be true.

$$\mathrm{Failure}(P) = \Pr(\text{Crash} \mid P \text{ observed to be true})$$

Estimated:

$$\mathrm{Failure}(P) = \frac{F(P)}{S(P) + F(P)}$$

Statistical Debugging algorithm cont.
• Failure(P) = 1.0: the bug is deterministic with respect to P; equivalently, P is never observed to be true in a successful run, S(P) = 0
• Failure(P) < 1.0: the bug is non-deterministic with respect to P

$$\mathrm{Failure}(P) = \frac{F(P)}{S(P) + F(P)}$$
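
As a small illustration, the estimate of Failure(P) and the deterministic case can be computed directly from the two counts. The function names and the example counts are hypothetical, and raising an error when P was never observed to be true is a design choice of this sketch, not something the slides specify.

```python
def failure(f_p: int, s_p: int) -> float:
    """Estimate Failure(P) = F(P) / (S(P) + F(P))."""
    total = f_p + s_p
    if total == 0:
        # Assumption of this sketch: the ratio is undefined without data.
        raise ValueError("P was never observed to be true")
    return f_p / total


def is_deterministic(f_p: int, s_p: int) -> bool:
    """Failure(P) = 1.0 exactly when P is never true in a successful run."""
    return f_p > 0 and s_p == 0


print(failure(10, 0), is_deterministic(10, 0))  # prints 1.0 True
print(failure(3, 1), is_deterministic(3, 1))    # prints 0.75 False
```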

Statistical Debugging algorithm cont.
• Failure(P) alone is not enough. Consider again:

    f = …;              /* Line (a) */
    if (f == NULL) {    /* Line (b) */
        x = 0;          /* Line (c) */
        *f;             /* Line (d) */
    }

• Failure(f == NULL) = 1.0, good
• But Failure(x == 0) = 1.0 as well. Why? x == 0 is always true at that point, and only failing runs reach it

Statistical Debugging algorithm cont.
• Thus a high Failure(P) does not mean that P is the cause of a bug; it may only mean that the predicate is checked on a path that always fails.
• In the case of (x == 0), the decision that causes the failure is made earlier, at (f == NULL)

Statistical Debugging algorithm cont.
• Context(P) is introduced to address this issue. A predicate is judged not only by the chance that it implies failure, but also by how much difference it makes to observe P being true versus simply reaching the point where P is checked.
• This eliminates predicates irrelevant to the bug, like (x == 0) in the above example

$$\mathrm{Context}(P) = \Pr(\text{Crash} \mid P \text{ observed})$$

Statistical Debugging algorithm cont.

$$\mathrm{Increase}(P) = \mathrm{Failure}(P) - \mathrm{Context}(P)$$

• In the above example Failure(x == 0) = Context(x == 0) = 1.0, and so Increase(x == 0) = 0
• Conclusion: a predicate P with $\mathrm{Increase}(P) \le 0$ is not useful for prediction and can be discarded.
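
The pointer example can be worked through numerically. The sketch below assumes 3 failing and 7 successful runs (made-up numbers), and estimates Context(P) as the fraction of failing runs among those that merely reach the point where P is checked; that estimator mirrors the one given for Failure(P) and is an assumption here, since the slides only state the probability. The `Counts` record and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    f_true: int  # F(P): failing runs where P was observed to be true
    s_true: int  # S(P): successful runs where P was observed to be true
    f_obs: int   # failing runs that reached the point where P is checked
    s_obs: int   # successful runs that reached the point where P is checked


def failure(c: Counts) -> float:
    return c.f_true / (c.f_true + c.s_true)


def context(c: Counts) -> float:
    # Assumed estimator, analogous to the one for Failure(P).
    return c.f_obs / (c.f_obs + c.s_obs)


def increase(c: Counts) -> float:
    return failure(c) - context(c)


counts = {
    # Line (b) is reached by every run; the predicate is true only in failures.
    "f == NULL": Counts(f_true=3, s_true=0, f_obs=3, s_obs=7),
    # Line (c) is reached only by failing runs, where x == 0 always holds.
    "x == 0": Counts(f_true=3, s_true=0, f_obs=3, s_obs=0),
}

# Keep only predicates with Increase(P) > 0.
kept = [p for p, c in counts.items() if increase(c) > 0]
print(kept)  # prints ['f == NULL']
```

Both predicates have Failure(P) = 1.0, but only (f == NULL) raises the chance of failure above what reaching its program point already implies.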

Visualization of Algorithm
• A thermometer is used to visualize the experimental results
• The length of the thermometer: the number of runs in which the predicate is observed
• Black band on the left: Context(P); red band: Increase(P); white band: the number of successful runs in which P is true, S(P)
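
A text-only version of the thermometer gives a feel for how the bands relate. This is a rough sketch under assumptions of my own: the bands are drawn as fractions of a caller-chosen `length` (how that length is derived from the number of observing runs is left out), the white band is taken to be the remainder 1 − Failure(P), and the characters `#`, `=` and `.` stand in for black, red and white.

```python
def thermometer(context: float, increase: float, length: int) -> str:
    """Render Context(P), Increase(P) and the remainder as text bands."""
    black = round(context * length)
    # Round the cumulative boundary so the bands never exceed the length.
    red_end = round((context + increase) * length)
    red = max(0, red_end - black)          # a non-positive Increase draws no red
    white = max(0, length - black - red)   # successful runs where P is true
    return "#" * black + "=" * red + "." * white


print(thermometer(0.25, 0.5, 8))  # prints ##====..
print(thermometer(1.0, 0.0, 6))   # prints ######
```

The second call corresponds to a predicate like (x == 0): the bar is all context, with no red band at all.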

Visualization of Algorithm cont.
• The ranking by F(P), after discarding the predicates with negative Increase(P)
• The large white bands reveal that these predicates are non-deterministic.
• The very narrow red bands indicate that their Increase scores are small.
• Figure labels: "With high Increase scores"; "Super-bug predicate, combining multiple bugs"

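Visualization of Algorithm cont.
• The above observations suggest the following candidate metrics for ranking predicates:

$$\mathrm{Importance}(P) = F(P)$$

$$\mathrm{Importance}(P) = \mathrm{Increase}(P)$$

$$\mathrm{Importance}(P) = \frac{2}{\dfrac{1}{\mathrm{Increase}(P)} + \dfrac{1}{\log(F(P)) / \log(\mathit{NumF})}}$$

where NumF is the total number of failing runs.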
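
The third metric is a harmonic mean of Increase(P) and the normalized term log(F(P)) / log(NumF). A minimal sketch follows; the function name and the sample numbers are hypothetical, and returning 0 where the formula would divide by zero is an assumption of this sketch.

```python
import math


def importance(increase: float, f_p: int, num_f: int) -> float:
    """Harmonic mean of Increase(P) and log(F(P)) / log(NumF)."""
    # Assumption: treat undefined cases (division by zero) as importance 0.
    if increase <= 0 or f_p <= 1 or num_f <= 1:
        return 0.0
    coverage = math.log(f_p) / math.log(num_f)
    return 2.0 / (1.0 / increase + 1.0 / coverage)


num_f = 1000  # made-up total number of failing runs
super_bug = importance(0.05, 900, num_f)  # many failures, tiny Increase
sub_bug = importance(0.95, 2, num_f)      # large Increase, very few failures
balanced = importance(0.60, 300, num_f)

print(balanced > super_bug and balanced > sub_bug)  # prints True
```

Because a harmonic mean is dominated by its smaller argument, a predicate needs both a meaningful Increase and a reasonable share of the failing runs to rank highly, which is why the balanced predicate wins here.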

Experiments
• To validate the statistical debugging algorithm in five case studies.
• To determine how many runs are needed, let $\mathrm{Importance}_N(P)$ be the importance of P computed using N runs, and require:

$$\mathrm{Importance}_{32{,}000}(P) - \mathrm{Importance}_N(P) < 0.2$$
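
One way to read this criterion is as a search for the smallest N whose importance score is within 0.2 of the score obtained from all 32,000 runs. The sketch below follows that reading; the function name and the importance values in the example are invented for illustration and are not results from the paper.

```python
from typing import Dict, Optional


def runs_needed(importance_by_n: Dict[int, float],
                full_n: int = 32000,
                threshold: float = 0.2) -> Optional[int]:
    """Smallest N with Importance_full(P) - Importance_N(P) < threshold."""
    target = importance_by_n[full_n]
    for n in sorted(importance_by_n):
        if target - importance_by_n[n] < threshold:
            return n
    return None


# Hypothetical importance scores of one predicate at several sample sizes.
scores = {100: 0.10, 500: 0.40, 1000: 0.55, 32000: 0.70}
print(runs_needed(scores))  # prints 1000
```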

Personal Comments
• Likes
  – Well structured: the problems are stated first, then the proposed solutions, step by step
  – Uses a real and simple example to explain the problems and difficulties of the research
  – Gives a statistical interpretation through visualization, using the observations to explain the abstract mathematical equations

Personal Comments
• Dislikes
  – The authors do not mention whether their research could be extended to isolate potential bugs, e.g. less important bugs that will probably cause failures in the future
  – The dark (red) band and the grey (pink) band in the pictures are hard to tell apart when the paper is printed in black and white.