Scalable Statistical Bug Isolation
Authors: B. Liblit, M. Naik, A.X. Zheng, A. Aiken, M. I. Jordan
Presented by S. Li
Outline
• Background
  – Definitions
  – Motivations
• Contributions
  – Algorithm
  – Visualization of the algorithm
  – Experiments
• Personal Comments
Definitions
• A bug predicate is denoted by P
  – A predicate P is associated with a particular program point, for instance if (ptr == NULL) or int a = 10;
  – $P_1 \sim P_2$ if $P_1$ and $P_2$ are associated with the same program point.
  – If P is observed to be true at least once when running R, this is denoted by R(P) = 1; otherwise R(P) = 0
• A bug is denoted by B (or b)
• A bug profile is denoted by $\mathcal{B}$
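Definitions (cont.)
• A bug profile $\mathcal{B}$ includes a set of failing runs that share b as the cause.
  – More than one bug can occur in a failing run, so profiles may overlap: $\mathcal{B}_i \cap \mathcal{B}_j \neq \emptyset$
• P is a bug predictor if R(P) = 1 indicates that, likely, $R \in \mathcal{B}$
• Goal: select $S \subseteq \mathcal{P}$, where $\mathcal{P}$ is the set of all predicates, such that $S$ has predictors for all bugs.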
Motivations
• A traditional technique used in statistical bug prediction
  – Regularized logistic regression, which selects the predicates that best predict the outcome of every run
• Scalability problems arise in large-scale programs
Motivations cont.
• Scalability problems of regularized logistic regression
  – The set of predicates P is logically redundant
  – It is difficult to identify the truly important predicates associated with the specific bugs that cause different failures.
Contributions
• Highlighted contributions
  – To propose a statistical debugging algorithm
    • To isolate bugs in programs that contain multiple undiagnosed bugs
  – To perform better than earlier comparable algorithms
  – To validate the algorithm by experiments
  – To reveal the circumstances under which bugs occur, as well as the frequencies of failing runs
Statistical Debugging algorithm
• To automatically isolate multiple bugs
• To select S
• To rank the predictors in S from most to least important.
• To make the predictors in S and the associated metrics available to help fix the most serious bugs
Statistical Debugging algorithm cont.
• Steps:
  – Identify the most important bug B
    • Not bug B itself, but a predicate P closely correlated with its bug profile $\mathcal{B}$
  – Fix B, and repeat
    • Simulating the program's behavior without bug b
Statistical Debugging algorithm cont.
• To identify the bug
• To select the predicates that are most likely to correspond to its bug profile $\mathcal{B}$
  – P1, P2, P3, …, P, ranked in order of importance
  – R(P) = 1
  – Bug profiles $\mathcal{B}$ have unknown size and membership
Statistical Debugging algorithm cont.
• To fix bug B and repeat
  – To discard any run such that R(P) = 1
  – To recursively apply the algorithm to the remaining runs
  – To prune P = {P1, P2, P3, …, PB} by:
    • Reducing the importance of predictors of B
    • Re-ranking the predictors P, for instance allowing other predictors to rise to the top in subsequent iterations.
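The elimination loop described on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Run` layout, the function names, the sample predicates `i > len` and `k > 0`, and the sample runs are all hypothetical. The ranking function `failure_weighted` is only a stand-in score, because the slides introduce the actual metrics later.

```python
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical layout: (did the run fail?, predicates with R(P) = 1 in that run)
Run = Tuple[bool, FrozenSet[str]]


def failure_weighted(pred: str, runs: List[Run]) -> float:
    """Stand-in score (an assumption): F(P) * F(P) / (S(P) + F(P))."""
    f = sum(1 for failed, true_preds in runs if failed and pred in true_preds)
    s = sum(1 for failed, true_preds in runs if not failed and pred in true_preds)
    return f * f / (s + f) if s + f else 0.0


def isolate(runs: List[Run], score: Callable[[str, List[Run]], float]) -> List[str]:
    """Pick the top-ranked predicate, discard every run with R(P) = 1, repeat."""
    selected: List[str] = []
    remaining = list(runs)
    while any(failed for failed, _ in remaining):
        # Sorting makes tie-breaking deterministic.
        candidates = sorted(set().union(*(true_preds for _, true_preds in remaining)))
        if not candidates:
            break
        best = max(candidates, key=lambda p: score(p, remaining))
        if score(best, remaining) <= 0:
            break  # nothing left that predicts the remaining failures
        selected.append(best)
        # Simulate fixing the bug: drop all runs in which the predictor was true.
        remaining = [run for run in remaining if best not in run[1]]
    return selected


runs: List[Run] = [
    (True, frozenset({"f == NULL", "k > 0"})),
    (True, frozenset({"f == NULL"})),
    (True, frozenset({"i > len"})),
    (False, frozenset({"i > len", "k > 0"})),
    (False, frozenset({"k > 0"})),
    (False, frozenset()),
]
result = isolate(runs, failure_weighted)
print(result)
```

Because the scores are recomputed on the remaining runs in every iteration, a predictor of a second bug can rise to the top once the runs explained by the first predictor are gone.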
Statistical Debugging algorithm cont.
• To analyze simple code in order to introduce the equations in the algorithm. Consider the following C code:

    f = …;             // Line (a)
    if (f == NULL) {   // Line (b)
        x = 0;         // Line (c)
        *f;            // Line (d)
    }

The bug in this example is deterministic, because…
Statistical Debugging algorithm cont.
• Non-deterministic bugs: consider the following code,

    f = …;                  // Line (a)
    if (f == NULL) {        // Line (b)
        x = 0;              // Line (c)
        if (….) f =..       // some valid pointer…
        *f;                 // Line (d)
    }

The bug in this example is non-deterministic with respect to (b).
Statistical Debugging algorithm cont.
• Failure(P): the probability that P being true implies failure.
• F(P): the number of failing runs in which P is observed to be true.
• S(P): the number of successful runs in which P is observed to be true.

$\mathrm{Failure}(P) = \Pr(\mathrm{Crash} \mid P \text{ observed to be true})$

Estimated:

$\mathrm{Failure}(P) = \dfrac{F(P)}{S(P) + F(P)}$
Statistical Debugging algorithm cont.
• Failure(P) = 1.0: the bug is deterministic with respect to P; equivalently, P is never observed to be true in a successful run, S(P) = 0
• Failure(P) < 1.0: the bug is non-deterministic with respect to P

$\mathrm{Failure}(P) = \dfrac{F(P)}{S(P) + F(P)}$
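As a small illustration, the estimate of Failure(P) and the determinism check follow directly from the counts F(P) and S(P). This is a sketch only: the function names and the sample counts are made up for illustration.

```python
def failure(f_p: int, s_p: int) -> float:
    """Estimate Failure(P) = F(P) / (S(P) + F(P))."""
    total = s_p + f_p
    if total == 0:
        raise ValueError("P was never observed to be true in any run")
    return f_p / total


def is_deterministic(f_p: int, s_p: int) -> bool:
    """Failure(P) = 1.0 exactly when P is true in no successful run, S(P) = 0."""
    return f_p > 0 and s_p == 0


# Hypothetical counts: P true in 3 failing runs and in no successful run.
print(failure(3, 0), is_deterministic(3, 0))
# Hypothetical counts: P true in 1 failing run and in 3 successful runs.
print(failure(1, 3), is_deterministic(1, 3))
```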
Statistical Debugging algorithm cont.
• Failure(P) alone is not enough. Consider again:

    f = …;             // Line (a)
    if (f == NULL) {   // Line (b)
        x = 0;         // Line (c)
        *f;            // Line (d)
    }

• Failure(f == NULL) = 1.0, which is good
• But Failure(x == 0) = 1.0 as well. Why? x == 0 is always true at the point where it is checked, and only failing runs reach it.
Statistical Debugging algorithm cont.
• Thus, a high Failure(P) does not mean that P is the cause of a bug; it only means that the predicate is checked on a path taken by failing runs.
• In the case of (x == 0), the decision that causes the failure is made earlier, e.g. at (f == NULL).
Statistical Debugging algorithm cont.
• Context(P) is introduced to address this issue. A predicate is judged not only by the chance that it implies failure, but also by how much difference it makes that P is observed to be true, compared with simply reaching the point where P is checked.
• This eliminates predicates irrelevant to the bug, like (x == 0) in the above example.

$\mathrm{Context}(P) = \Pr(\mathrm{Crash} \mid P \text{ observed})$
Statistical Debugging algorithm cont.
$\mathrm{Increase}(P) = \mathrm{Failure}(P) - \mathrm{Context}(P)$

• In the above example, Failure(x == 0) = Context(x == 0) = 1.0, and so Increase(x == 0) = 0.
• Conclusion: a predicate P with $\mathrm{Increase}(P) \le 0$ is not useful for prediction and is discarded.
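The effect of Context(P) on the two predicates of the running example can be checked with a short Python sketch. Several things here are assumptions for illustration: the `Run` layout, the run counts (2 failing, 6 successful), and the estimate of Context(P) as the fraction of failing runs among the runs in which P is observed at all, chosen by analogy with the estimate of Failure(P).

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (did the run fail?, {predicate: (observed, true)})
Run = Tuple[bool, Dict[str, Tuple[bool, bool]]]


def ratio(failing: int, successful: int) -> float:
    total = failing + successful
    return failing / total if total else 0.0


def scores(pred: str, runs: List[Run]) -> Tuple[float, float, float]:
    """Return (Failure(P), Context(P), Increase(P)) estimated from the runs."""
    f_true = s_true = f_obs = s_obs = 0
    for failed, preds in runs:
        observed, true = preds.get(pred, (False, False))
        if observed:
            f_obs += failed
            s_obs += not failed
        if true:
            f_true += failed
            s_true += not failed
    fail = ratio(f_true, s_true)      # Pr(Crash | P observed to be true)
    context = ratio(f_obs, s_obs)     # Pr(Crash | P observed)
    return fail, context, fail - context


# Failing runs take the f == NULL branch, so x == 0 is reached and true.
failing = (True, {"f == NULL": (True, True), "x == 0": (True, True)})
# Successful runs check f == NULL (false) and never reach x == 0.
passing = (False, {"f == NULL": (True, False), "x == 0": (False, False)})
runs: List[Run] = [failing] * 2 + [passing] * 6

print(scores("f == NULL", runs))
print(scores("x == 0", runs))
```

With these made-up runs, both predicates have Failure(P) = 1.0, but only f == NULL has a positive Increase, so x == 0 is the one that gets discarded.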
Visualization of Algorithm
• A thermometer is used to visualize the experimental results
• The length of the thermometer: the number of runs in which the predicate is observed.
• Black band on the left: Context(P); red band: Increase(P); white band: the number of successful runs, S(P)
Visualization of Algorithm cont.
• It shows F(P) after discarding the predicates with negative Increase(P)
• The large white bands reveal that these predicates are non-deterministic.
• The very narrow red bands indicate that the Increase scores are small.
Figure labels: "With high Increase scores"; "Super-bug predicate, combining multiple bugs"
Visualization of Algorithm cont.
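• The following candidate metrics for the importance of a predicate are suggested by the above observations:

$\mathrm{Importance}(P) = F(P)$

$\mathrm{Importance}(P) = \mathrm{Increase}(P)$

$\mathrm{Importance}(P) = \dfrac{2}{\dfrac{1}{\mathrm{Increase}(P)} + \dfrac{1}{\log(F(P)) / \log(\mathit{NumF})}}$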
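The third metric is a harmonic mean of Increase(P) and the normalized term log(F(P)) / log(NumF). A minimal Python sketch follows. The function name, the sample numbers, and the choice to return 0.0 when a term is not positive (where the harmonic mean is undefined) are assumptions made for illustration.

```python
import math


def importance(increase: float, f_p: int, num_f: int) -> float:
    """Harmonic mean of Increase(P) and log(F(P)) / log(NumF)."""
    # Assumption: treat undefined cases (non-positive terms) as importance 0.
    if increase <= 0 or f_p <= 1 or num_f <= 1:
        return 0.0
    coverage = math.log(f_p) / math.log(num_f)
    return 2.0 / (1.0 / increase + 1.0 / coverage)


num_f = 100  # hypothetical total number of failing runs

# High F(P) but tiny Increase(P): the kind of predicate ranked first by F(P).
broad = importance(0.05, 90, num_f)
# High Increase(P) but very few failing runs: ranked first by Increase(P).
narrow = importance(0.9, 2, num_f)
# Reasonably high on both terms.
balanced = importance(0.6, 40, num_f)

print(broad, narrow, balanced)
```

A harmonic mean is dominated by its smaller term, so a predicate scores well only when both Increase(P) and F(P) are reasonably large; in this made-up example the balanced predicate ranks above the other two.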
Experiments
• To validate the statistical debugging algorithm in five case studies.
• To determine how many runs are needed, let $\mathrm{Importance}_N(P)$ be the importance of P computed using N runs. N is considered sufficient when

$\mathrm{Importance}_{32{,}000}(P) - \mathrm{Importance}_N(P) < 0.2$
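One reading of this criterion is to look for the smallest N whose importance score is within 0.2 of the score obtained with all 32,000 runs. The Python sketch below follows that reading. The function name and the importance values per sample size are hypothetical and are not results from the paper.

```python
from typing import Dict


def runs_needed(importance_by_n: Dict[int, float],
                full_n: int = 32000, tol: float = 0.2) -> int:
    """Smallest N with Importance_full(P) - Importance_N(P) < tol."""
    full = importance_by_n[full_n]
    for n in sorted(importance_by_n):
        if full - importance_by_n[n] < tol:
            return n
    return full_n  # reached only if tol <= 0


# Hypothetical importance scores of one predicate at several sample sizes.
importance_by_n = {1000: 0.10, 4000: 0.35, 16000: 0.55, 32000: 0.60}
needed = runs_needed(importance_by_n)
print(needed)
```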
Personal Comments
• Likes
  – Well structured: the problems are presented first, then the proposed solutions, step by step
  – Uses a real and simple example to explain the problems and difficulties that lie in the research
  – Gives a statistical interpretation through visualization, using the observations to explain the abstract mathematical equations
Personal Comments
• Dislikes
  – They do not mention whether their research could be extended to isolate potential bugs, e.g. less important bugs that will probably cause failures in the future
  – The dark (red) band and the grey (pink) band in the pictures are not very clear if the paper is printed in black and white only.