user-guided program reasoning using bayesian …kheo/slides/bingo-kaist.pdfuser-guided program...

Post on 19-Jun-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

User-Guided Program Reasoning using Bayesian Inference

Kihong Heo(joint work with Mukund Raghothaman, Sulekha Kulkarni, Mayur Naik)

University of Pennsylvania

Jul 6 2018 @ KAIST

�1

Conventional Static Analysis

Static AnalyzerSoundness

Precision Scalability

“needle in a haystack”

Designer User

!2

Why?

Static Analyzer

Designer User

!3

Soundness Precision Scalability

“needle in a haystack”

He does not know her: “What is the optimal strategy regarding

severity, context, idiom, etc?”

She does not know him: “Why does this alarm occur?”

“How to avoid the similar false alarms?”

“… can be difficult to do without introducing large numbers of false positives, or scaling performance exponentially poorly. In this case, balancing these and other factors in the analysis design caused us to miss the defect.”

— Coverity, On Detecting Heartbleed with Static Analysis, 2014

!4

Next-generation Static Analysis

Static Analyzer

!5

Next-generation Static Analysis

!6

Static Analyzer

Next-generation Static Analysis

!6

Static Analyzer

AI-based Analysis Design

• Human provides high-level idea

• AI provides detailed design choices

• DB accumulates performance data

• e.g.) precision [SAS’16,OOPSLA’17], soundness [ICSE’17], resource usage [in progress], rule learning [in progress]

Next-generation Static Analysis

!7

Static Analyzer

Next-generation Static Analysis

!7

Static Analyzer

AI-based Alarm Report

• AI prioritizes/classifies alarms

• Human inspects high confidence alarms

• DB accumulates human-labeled data

• e.g.) interactive alarm ranking [PLDI’18]

BINGO: An Interactive Alarm Ranking System

*User-Guided Program Reasoning using Bayesian Inference, PLDI’18

!8

Interactive Alarm Ranker

Bug

False Alarm

Rank 1

Rank n !9

Interactive Alarm Ranker

Bug

False Alarm

Rank 1

Rank n !10

Interactive Alarm Ranker

Bug

False Alarm

Rank 1

Rank n !11

Interactive Alarm Ranker

Bug

False Alarm

Rank 1

Rank n !12

Interactive Alarm RankerRank 1

Rank n !13

Key IdeaHuman in the loop + Bayesian inference

path(1,7)edge(7,2) edge(7,5)

path(1,2) path(1,5) edge(5,8)

path(1,8)edge(8,3) edge(8,6)

path(1,3) path(1,6)

edge(1,7)path(1,1)

f(){ v1 = new ...v2 = id1(v1)v3 = id2(v2)assert(v3!=v1) q1}id1(v){ return v }

g(){ v4 = new ...v5 = id1(v4)v6 = id2(v5)assert(v6!=v1) q2}id2(v){ return v }

Static Analysis Result Bayesian Network User

A B C ㄱCT T 0.95 0.05T F 0.94 0.06F T 0.29 0.71F F 0.01 0.99

B A ㄱA

T 0.52 0.48F 0.01 0.99

B ㄱB

0.04 0.96

A

C

B

!14

Case Study: Datarace

!15

Case Study: Information Flow

!16

Ex: Datarace Analysispublic class RequestHandler { private FtpRequest request;

public FtpRequest getRequest() { return request;

}

public void close() { synchronized (this) { if (isClosed) return; isClosed = true;

} controlSocket.close();controlSocket = null; request.clear();request = null;

} }

//L0

//L1 //L2 //L3

//L4 //L5 //L6 //L7

*Apache FTP Server

Parallel(p1, p3) :- Parallel(p1, p2), Next(p2, p3), Unguarded(p1, p3).Parallel(p1, p2) :- Parallel(p2, p1). Race(p1, p2) :- Parallel(p1, p2), Alias(p1, p2).

!17

Ex: Datarace Analysispublic class RequestHandler { private FtpRequest request;

public FtpRequest getRequest() { return request;

}

public void close() { synchronized (this) { if (isClosed) return; isClosed = true;

} controlSocket.close();controlSocket = null; request.clear();request = null;

} }

//L0

//L1 //L2 //L3

//L4 //L5 //L6 //L7

*Apache FTP Server

Datarace

Parallel(p1, p3) :- Parallel(p1, p2), Next(p2, p3), Unguarded(p1, p3).Parallel(p1, p2) :- Parallel(p2, p1). Race(p1, p2) :- Parallel(p1, p2), Alias(p1, p2).

!18

Ex: Datarace Analysispublic class RequestHandler { private FtpRequest request;

public FtpRequest getRequest() { return request;

}

public void close() { synchronized (this) { if (isClosed) return; isClosed = true;

} controlSocket.close();controlSocket = null; request.clear();request = null;

} }

//L0

//L1 //L2 //L3

//L4 //L5 //L6 //L7

*Apache FTP Server

False alarm

False alarm

Parallel(p1, p3) :- Parallel(p1, p2), Next(p2, p3), Unguarded(p1, p3).Parallel(p1, p2) :- Parallel(p2, p1). Race(p1, p2) :- Parallel(p1, p2), Alias(p1, p2).

!19

Derivation Graph

Parallel(p1, p3) :- Parallel(p1, p2), Next(p2, p3), Unguarded(p1, p3).Parallel(p1, p2) :- Parallel(p2, p1). Race(p1, p2) :- Parallel(p1, p2), Alias(p1, p2).

R(L6, L7)

A(L6, L7) P(L6, L7)

N(L6, L7) P(L6, L6)U(L6, L7)

N(L5, L6) P(L6, L5)U(L6, L6)

N(L4, L5) P(L6, L4)U(L6, L5)

P(L4, L6)

U(L4, L6)N(L5, L6)

R(L4, L5)

P(L4, L5)A(L4, L5)

U(L4, L5)N(L4, L5)P(L4, L4)controlSocket.close();controlSocket = null; request.clear();request = null;

//L4 //L5 //L6 //L7

Program

Datalog Rule

Derivation Graph

!20

Bayesian Network

P(L4,L4) N(L4,L5) U(L4,L5) Pr(P(L4,L5) | H)

TRUE TRUE TRUE 0.95

TRUE TRUE FALSE 0

FALSE FALSE FALSE 0

P(L4, L5)

Logical Rule Probabilistic Rule

Parallel(p1, p3) :- Parallel(p1, p2), Next(p2, p3), Unguarded(p1, p3).Parallel(p1, p2) :- Parallel(p2, p1). Race(p1, p2) :- Parallel(p1, p2), Alias(p1, p2).

U(L4, L5)N(L4, L5)P(L4, L4)

!21

*Prior probability is computed by an offline learning

Marginal Inference

R(L4, L5)

P(L4, L5)A(L4, L5)

Pr(R(L4,L5)) = Pr(R(L4,L5), A(L4,L5), P(L4,L5)) + Pr(R(L4,L5), ¬A(L4,L5), P(L4,L5)) + Pr(R(L4,L5), A(L4,L5), ¬P(L4,L5)) + Pr(R(L4,L5), ¬A(L4,L5), ¬P(L4,L5))

!22

U(L4, L5)N(L4, L5)P(L4, L4)

Marginal Inference

R(L4, L5)

P(L4, L5)A(L4, L5)

Pr(R(L4,L5)) = Pr(R(L4,L5), A(L4,L5), P(L4,L5)) + Pr(R(L4,L5), ¬A(L4,L5), P(L4,L5)) + Pr(R(L4,L5), A(L4,L5), ¬P(L4,L5)) + Pr(R(L4,L5), ¬A(L4,L5), ¬P(L4,L5))

If any of the antecedents fail, then the race cannot happen.

!23

U(L4, L5)N(L4, L5)P(L4, L4)

Marginal Inference

R(L4, L5)

P(L4, L5)A(L4, L5)

Pr(R(L4,L5)) = Pr(R(L4,L5), A(L4,L5), P(L4,L5))

!24

U(L4, L5)N(L4, L5)P(L4, L4)

Marginal Inference

R(L4, L5)

P(L4, L5)A(L4, L5)

Pr(R(L4,L5)) = Pr(R(L4,L5), A(L4,L5), P(L4,L5)) = Pr(R(L4,L5) | A(L4,L5), P(L4,L5)) * Pr(A(L4,L5)) * Pr(P(L4,L5))

By Bayes’s Rule: Pr(A,B) = Pr(A|B) * Pr(B)

!25

U(L4, L5)N(L4, L5)P(L4, L4)

Marginal Inference

R(L4, L5)

P(L4, L5)A(L4, L5)

Pr(R(L4,L5)) = Pr(R(L4,L5), A(L4,L5), P(L4,L5)) = Pr(R(L4,L5) | A(L4,L5), P(L4,L5)) * Pr(A(L4,L5)) * Pr(P(L4,L5)) = 0.95 * 1.0 * Pr(P(L4,L5)) = 0.95 * Pr(P(L4,L5), Pr(P(L4,L4)), Pr(N(L4,L5), Pr(U(L4,L5))

Assume that the probabilities of firing each rule and input tuple are

0.95 and 1.0.

!26

U(L4, L5)N(L4, L5)P(L4, L4)

Marginal Inference

R(L4, L5)

P(L4, L5)A(L4, L5)

Pr(R(L4,L5)) = Pr(R(L4,L5), A(L4,L5), P(L4,L5)) = Pr(R(L4,L5) | A(L4,L5), P(L4,L5)) * Pr(A(L4,L5)) * Pr(P(L4,L5)) = 0.95 * 1.0 * Pr(P(L4,L5)) = 0.95 * Pr(P(L4,L5), Pr(P(L4,L4)), Pr(N(L4,L5), Pr(U(L4,L5)) = 0.95 * Pr(P(L4,L5) | Pr(P(L4,L4)), Pr(N(L4,L5), Pr(U(L4,L5)) * Pr(P(L4,L4)) * Pr(N(L4,L5)) * Pr(U(L4,L5))

By Bayes’s Rule: Pr(A,B) = Pr(A|B) * Pr(B)

!27

U(L4, L5)N(L4, L5)P(L4, L4)

Marginal Inference

R(L4, L5)

P(L4, L5)A(L4, L5)

Pr(R(L4,L5)) = Pr(R(L4,L5), A(L4,L5), P(L4,L5)) = Pr(R(L4,L5) | A(L4,L5), P(L4,L5)) * Pr(A(L4,L5)) * Pr(P(L4,L5)) = 0.95 * 1.0 * Pr(P(L4,L5)) = 0.95 * 0.95 * Pr(P(L4,L4)) * Pr(N(L4,L5) * Pr(U(L4,L5)) = … = 0.398

!28

U(L4, L5)N(L4, L5)P(L4, L4)

Alarm Ranking

Ranking Alarm Confidence

1 R(L4, L5) 0.398

2 R(L5, L5) 0.378

3 R(L6, L7) 0.324

4 R(L7, L7) 0.308

5 R(L0, L7) 0.279

public class RequestHandler { private FtpRequest request;

public FtpRequest getRequest() { return request;

}

public void close() { synchronized (this) { if (isClosed) return; isClosed = true;

} controlSocket.close();controlSocket = null; request.clear();request = null;

} }

//L0

//L1 //L2 //L3

//L4 //L5 //L6 //L7

!29

Alarm Ranking

Ranking Alarm Confidence

1 R(L4, L5) 0.398

2 R(L5, L5) 0.378

3 R(L6, L7) 0.324

4 R(L7, L7) 0.308

5 R(L0, L7) 0.279

public class RequestHandler { private FtpRequest request;

public FtpRequest getRequest() { return request;

}

public void close() { synchronized (this) { if (isClosed) return; isClosed = true;

} controlSocket.close();controlSocket = null; request.clear();request = null;

} }

//L0

//L1 //L2 //L3

//L4 //L5 //L6 //L7

Q: What are the probabilities of the other alarms when R(L4,L5) is false?

!30

Marginal Inference

R(L6, L7)

A(L6, L7) P(L6, L7)

N(L6, L7) P(L6, L6)U(L6, L7)

N(L5, L6) P(L6, L5)U(L6, L6)

N(L4, L5) P(L6, L4)U(L6, L5)

P(L4, L6)

U(L4, L6)N(L5, L6)

R(L4, L5)

P(L4, L5)A(L4, L5)

U(L4, L5)N(L4, L5)P(L4, L4) Pr(P(L4,L5) | ¬R(L4,L5)) = Pr(¬R(L4,L5) | P(L4,L5)) * Pr(P(L4,L5)) / Pr(¬R(L4,L5))= 0.03

Pr(R(L6,L7) | ¬R(L4,L5)) = Pr(R(L6,L7) | P(L4,L5)) * Pr(P(L4,L5)) | ¬R(L4,L5))= 0.03

By Bayes’s Rule: Pr(A|B) = P(B|A) * Pr(A) / Pr(B)

!31

Alarm Ranking

Ranking Alarm Confidence

1 R(L4, L5) 0.398

2 R(L5, L5) 0.378

3 R(L6, L7) 0.324

4 R(L7, L7) 0.308

5 R(L0, L7) 0.279

Ranking Alarm Confidence

1 R(L0, L7) 0.279

2 R(L5, L5) 0.035

3 R(L6, L7) 0.030

4 R(L7, L7) 0.028

5 R(L4, L5) 0

!32

Experimental ResultsDatarace Analysis

0%

25%

50%

75%

100%

hedc ftp weblech jspider avrora luindex sunflow xalan

Bug Bingo Total!33

152 522 30 257 978 940 958 1870

Experimental ResultsDatarace Analysis

0%

25%

50%

75%

100%

hedc ftp weblech jspider avrora luindex sunflow xalan

Bug Bingo Total!34

152 522 30 257 978 940 958 1870Only 30% of alarms to discover all bugs

Experimental ResultsInformation Flow Analysis

0%

25%

50%

75%

100%

app-324 noisy app-ca7 app-kQm tilt andors ginger app-018

Bug Bingo Total!35

110 212 393 817 352 156 437 420

Experimental ResultsInformation Flow Analysis

0%

25%

50%

75%

100%

app-324 noisy app-ca7 app-kQm tilt andors ginger app-018

Bug Bingo Total!36

110 212 393 817 352 156 437 420Only 50% of alarms to discover all bugs

Future Work

!37

path(1,7)edge(7,2) edge(7,5)

path(1,2) path(1,5) edge(5,8)

path(1,8)edge(8,3) edge(8,6)

path(1,3) path(1,6)

edge(1,7)path(1,1)

f(){ v1 = new ...v2 = id1(v1)v3 = id2(v2)assert(v3!=v1) q1}id1(v){ return v }

g(){ v4 = new ...v5 = id1(v4)v6 = id2(v5)assert(v6!=v1) q2}id2(v){ return v }

A B C ㄱC

T T 0.95 0.05

T F 0.94 0.06

F T 0.29 0.71

F F 0.01 0.99

B A ㄱA

T 0.52 0.48

F 0.01 0.99

B ㄱB

0.04 0.96

A

C

B

1. Generalizing to non-datalog static analyses

2. Transferring the learned knowledge to other programs

3. Optimizing the marginal inference solver

4. Designing more fine-grained interaction models

1 23

4

Conclusion

• First interactive alarm ranking system

• Logical + probabilistic reasoning using Bayesian network

• Hope to build AI-guided static analysis system

!38

Conclusion

Thank You

!39

• First interactive alarm ranking system

• Logical + probabilistic reasoning using Bayesian network

• Hope to build AI-guided static analysis system

top related