specification mining with few false positivesclegoues/docs/slides/tacas_09_presentation.pdf ·...
TRANSCRIPT
![Page 1: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/1.jpg)
Specification Mining With Few False Positives
Claire Le Goues Westley Weimer University of Virginia March 25, 2009
1
![Page 2: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/2.jpg)
Slide 0.5: Hypothesis
We can use measurements of the “trustworthiness” of source code to mine specifications with few
false positives.
2
![Page 3: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/3.jpg)
Slide 0.5: Hypothesis
We can use measurements of the “trustworthiness” of source code to mine specifications with few
false positives.
3
![Page 4: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/4.jpg)
Slide 0.5: Hypothesis
We can use measurements of the “trustworthiness” of source code to mine specifications with few
false positives.
4
![Page 5: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/5.jpg)
Slide 0.5: Hypothesis
We can use measurements of the “trustworthiness” of source code to mine specifications with few
false positives.
5
![Page 6: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/6.jpg)
Outline
• Motivation: Specifications • Problem: Specification Mining • Solution: Trustworthiness • Evaluation: 3 Experiments • Conclusions
6
![Page 7: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/7.jpg)
7
![Page 8: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/8.jpg)
Why Specifications?
• Modifying code, correcting defects, and evolving code account for as much as 90% of the total cost of software projects.
• Up to 60% of maintenance time is spent studying existing software.
• Specifications are useful for debugging, testing, maintaining, refactoring, and documenting software.
8
![Page 9: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/9.jpg)
Our Definition (Broadly)
A specification is a formal description of some aspect of legal program behavior.
9
![Page 10: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/10.jpg)
What kind of specification?
• We would like specifications that are simple and machine-readable
• We focus on partial-correctness specifications describing temporal properties ▫ Describes legal sequences of events, where an
event is a function call; similar to an API. • Two-state finite state machines
10
![Page 11: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/11.jpg)
Example Specification
11
Event A: Mutex.lock() Event B: Mutex.unlock()
![Page 12: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/12.jpg)
Example: Locks
1 2 1
12
Mutex.lock()
Mutex.unlock()
![Page 13: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/13.jpg)
Our Specifications
• For the sake of this work, we are talking about this type of two-state temporal specifications.
• These specifications correspond to the regular expression (ab)* ▫ More complicated patterns are possible.
13
![Page 14: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/14.jpg)
14
![Page 15: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/15.jpg)
Where do formal specifications come from? • Formal specifications are useful, but there
aren’t as many as we would like. • We use specification mining to
automatically derive the specifications from the program itself.
15
![Page 16: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/16.jpg)
Mining 2-state Temporal Specifications
• Input: program traces – a sequence of events that can take place as the program runs. ▫ Consider pairs of events that meet certain criteria. ▫ Use statistics to figure out which ones are likely
true specifications. • Output: ranked set of candidate specifications,
presented to a programmer for review and validation.
16
![Page 17: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/17.jpg)
Problem: False Positives Are Common
Event A: Iterator.hasNext() Event B: Iterator.next()
17
• This is very common behavior. • This is not required behavior. ▫ Iterator.hasNext() does not have to be followed
eventually by Iterator.next() in order for the code to be correct.
• This candidate specification is a false positive.
![Page 18: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/18.jpg)
Previous Work Benchmark LOC Candidate Specs False Positive Rate
Infinity 28K 10 90%
Hibernate 57K 51 82%
Axion 65K 25 68%
Hsqldb 71K 62 89%
Cayenne 86K 35 86%
Sablecc 99K 4 100%
Jboss 107K 114 90%
Mckoi-sql 118K 156 88%
Ptolemy2 362K 192 95%
* A
dapt
ed fr
om W
eim
er-N
ecul
a, T
AC
AS
2005
18
![Page 19: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/19.jpg)
Previous Work Benchmark LOC Candidate Specs False Positive Rate
Infinity 28K 10 90%
Hibernate 57K 51 82%
Axion 65K 25 68%
Hsqldb 71K 62 89%
Cayenne 86K 35 86%
Sablecc 99K 4 100%
Jboss 107K 114 90%
Mckoi-sql 118K 156 88%
Ptolemy2 362K 192 95%
* A
dapt
ed fr
om W
eim
er-N
ecul
a, T
AC
AS
2005
19
![Page 20: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/20.jpg)
20
![Page 21: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/21.jpg)
The Problem (as we see it)
• Let’s pretend we’d like to learn the rules of English grammar.
• …but all we have is a stack of high school English papers.
• Previous miners ignore the differences between A papers and F papers.
• Previous miners treat all traces as though they were all equally indicative of correct program behavior.
21
![Page 22: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/22.jpg)
Solution: Code Trustworthiness
• Trustworthy code is unlikely to exhibit API policy violations.
• Candidate specifications derived from trustworthy code are more likely to be true specifications.
22
![Page 23: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/23.jpg)
What is trustworthy code?
Informally… • Code that hasn’t been changed recently • Code that was written by trustworthy developers • Code that hasn’t been cut and pasted all over the
place • Code that is readable • Code that is well-tested • And so on.
23
![Page 24: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/24.jpg)
Can you firm that up a bit?
• Multiple surface-level, textual, and semantic features can reveal the trustworthiness of code ▫ Churn, author rank, copy-paste development,
readability, frequency, feasibility, density, and others.
• Our miner should believe that lock() – unlock() is a specification if it is often followed on trustworthy traces and often violated on untrustworthy ones.
24
![Page 25: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/25.jpg)
A New Miner
• Statically estimate the trustworthiness of each code fragment.
• Lift that judgment to program traces by considering the code visited along the trace.
• Weight the contribution of each trace by its trustworthiness when counting event frequencies while mining.
25
![Page 26: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/26.jpg)
Incorporating Trustworthiness
• We use linear regression on a set of previously published specifications to learn good weights for the different trustworthiness factors.
• Different weights yield different miners.
26
![Page 27: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/27.jpg)
27
![Page 28: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/28.jpg)
Experimental Questions
• Can we use trustworthiness metrics to build a miner that finds useful specifications with few false positives?
• Which trustworthiness metrics are the most useful in finding specifications?
• Do our ideas about trustworthiness generalize?
28
![Page 29: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/29.jpg)
Experimental Questions
• Can we use trustworthiness metrics to build a miner that finds useful specifications with few false positives?
• Which trustworthiness metrics are the most useful in finding specifications?
• Do our ideas about trustworthiness generalize?
29
![Page 30: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/30.jpg)
Experimental Setup: Some Definitions
• False positive: an event pair that appears in the candidate list, but a program trace may contain only event A and still be correct.
• Our normal miner balances true positives and false positives (maximizes F-measure)
• Our precise miner avoids false positives (maximizes precision)
30
![Page 31: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/31.jpg)
Experiment 1: A New Miner Normal Miner Precise Miner WN
Program Fal
se
Vio
lati
ons
Fal
se
Vio
lati
ons
Fal
se
Vio
lati
ons
Hibernate 53% 279 17% 153 82% 93
Axion 42% 71 0% 52 68% 45
Hsqldb 25% 36 0% 5 89% 35
jboss 84% 255 0% 12 90% 94
Cayenne 58% 45 0% 23 86% 18
Mckoi-sql 59% 20 0% 7 88% 69
ptolemy 14% 44 0% 13 95% 72
Total 69% 740 5% 265 89% 426
On this dataset: • Our normal miner produces 107 false positive specifications. • Our precise miner produces 1 • The previous work produces 567.
31
![Page 32: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/32.jpg)
More Thoughts On Experiment 1
• Our normal miner improves on the false positive rate of previous miners by 20%.
• Our precise miner offers an order-of-magnitude improvement on the false positive rate of previous work.
• We find specifications that are more useful in terms of bug finding: we find 15 bugs per mined specification, where previous work only found 7.
• In other words: we find useful specifications with fewer false positives.
32
![Page 33: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/33.jpg)
Experimental Questions
• Can we use trustworthiness metrics to build a miner that finds useful specifications with few false positives?
• Which trustworthiness metrics are the most useful in finding specifications?
• Do our ideas about trustworthiness generalize?
33
![Page 34: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/34.jpg)
Experiment 2: Metric Importance
• Results of an analysis of variance (ANOVA). • Shows the importance of the trustworthiness metrics. • F is the predictive power (1.0 means no power). • p is the probability that it had no effect (smaller is better).
Metric F p
Frequency 32.3 0.0000
Copy-Paste 12.4 0.0004
Code Churn 10.2 0.0014
Density 10.4 0.0013
Readability 9.4 0.0021
Feasibility 4.1 0.0423
Author Rank 1.0 0.3284
Exceptional 10.8 0.0000
Dataflow 4.3 0.0000
Same Package 4.0 0.0001
One Error 2.2 0.0288
34
![Page 35: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/35.jpg)
More Thoughts on Experiment 2
• Statically predicted path frequency has the strongest predictive power.
Metric F p
Frequency 32.3 0.0000
Copy-Paste 12.4 0.0004
Code Churn 10.2 0.0014
Density 10.4 0.0013
Readability 9.4 0.0021
Feasibility 4.1 0.0423
Author Rank 1.0 0.3284
Exceptional 10.8 0.0000
Dataflow 4.3 0.0000
Same Package 4.0 0.0001
One Error 2.2 0.0288
35
![Page 36: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/36.jpg)
More Thoughts on Experiment 2
• Statically predicted path frequency has the strongest predictive power. • Author rank has no effect on the model.
Metric F p
Frequency 32.3 0.0000
Copy-Paste 12.4 0.0004
Code Churn 10.2 0.0014
Density 10.4 0.0013
Readability 9.4 0.0021
Feasibility 4.1 0.0423
Author Rank 1.0 0.3284
Exceptional 10.8 0.0000
Dataflow 4.3 0.0000
Same Package 4.0 0.0001
One Error 2.2 0.0288
36
![Page 37: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/37.jpg)
More Thoughts on Experiment 2 Metric F p
Frequency 32.3 0.0000
Copy-Paste 12.4 0.0004
Code Churn 10.2 0.0014
Density 10.4 0.0013
Readability 9.4 0.0021
Feasibility 4.1 0.0423
Author Rank 1.0 0.3284
Exceptional 10.8 0.0000
Dataflow 4.3 0.0000
Same Package 4.0 0.0001
One Error 2.2 0.0288
• Statically predicted path frequency has the strongest predictive power. • Author rank has no effect on the model. • Previous work falls somewhere in the middle.
37
![Page 38: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/38.jpg)
Experimental Questions
• Can we use trustworthiness metrics to build a miner that finds useful specifications with few false positives?
• Which trustworthiness metrics are the most useful in finding specifications?
• Do our ideas about trustworthiness generalize?
38
![Page 39: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/39.jpg)
Experiment 3: Does it generalize?
• Previous work claimed that more input is necessarily better for specification mining.
• We hypothesized that smaller, more trustworthy input sets would yield more accurate output from previously implemented tools.
39
![Page 40: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/40.jpg)
Experiment 3: Generalizing
40
![Page 41: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/41.jpg)
Experiment 3: Generalizing
41
![Page 42: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/42.jpg)
Experiment 3: Generalizing
42
![Page 43: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/43.jpg)
Experiment 3: Generalizing
43
![Page 44: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/44.jpg)
Experiment 3: Generalizing
44
• The top 25% “most trustworthy” traces make for a much more accurate miner; the opposite effect is true for the 25% “least trustworthy” traces. • We can throw out the least trustworthy 40-50% of traces and still find the exact same specifications with a slightly lower false positive rate. • More traces != better, so long as the traces are trustworthy.
![Page 45: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/45.jpg)
Experimental Summary • We can use trustworthiness metrics to Build a Better
Miner: our normal miner improves on the false positive rate of previous work by 20%, our precise miner by an order of magnitude, while still finding useful specifications.
• Statistical techniques show that our notion of trustworthiness contributes significantly to our success.
• We can increase the precision and accuracy of previous techniques by using a trustworthy subset of the input.
45
![Page 46: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/46.jpg)
46
![Page 47: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/47.jpg)
Summary
• Formal specifications are very useful. • The previous work in specification mining yields
too many false positives for industrial practice. • We developed a notion of trustworthiness to
evaluate the likelihood that code adheres to two-state temporal specifications.
47
![Page 48: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/48.jpg)
Conclusion
A specification miner that incorporates notions of code trustworthiness can
mine useful specifications with a much lower false positive rate.
48
![Page 49: Specification Mining With Few False Positivesclegoues/docs/slides/tacas_09_presentation.pdf · Hibernate 57K 51 82% Axion 65K 25 68% Hsqldb 71K 62 89% Cayenne 86K 35 86% Sablecc 99K](https://reader034.vdocuments.us/reader034/viewer/2022042410/5f2705c5fcc1e405e03c179b/html5/thumbnails/49.jpg)
The End (questions?)
49