risk-based attack surface approximation: how much data is enough? [icse - seip 2017]
Risk-Based Attack Surface Approximation:
How Much Data is Enough?
Chris Theisen, Brendan Murphy, Kim Herzig, Laurie Williams
North Carolina State University
Microsoft Research
Introduction
What is the “Attack Surface”? Quoting the Open Web Application
Security Project…
• All paths for data and commands in a software system
• The data that travels these paths
• The code that implements and protects both
Concept used for security effort prioritization.
Introduction | Background | Methodology | Results | Conclusion
Crashes represent activity that puts the system under stress.
Stack Traces tell us what happened.
foo!foobarDeviceQueueRequest+0x68
foo!fooDeviceSetup+0x72
foo!fooAllDone+0xA8
bar!barDeviceQueueRequest+0xB6
bar!barDeviceSetup+0x08
bar!barAllDone+0xFF
center!processAction+0x1034
center!dontDoAnything+0x1030
Risk-Based Attack Surface Approximation
(RASA)
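A minimal sketch of how frames like the ones above could be turned into an attack surface approximation. It assumes every frame has the `module!function+0xOFFSET` shape shown on the slide; the helper names `parse_frame` and `approximate_attack_surface` are illustrative and not taken from the talk.

```python
import re
from collections import Counter

# Frame shape assumed from the slide: module!function+0xOFFSET
FRAME_RE = re.compile(
    r"^(?P<module>[^!\s]+)!(?P<function>[^+\s]+)\+0x(?P<offset>[0-9A-Fa-f]+)$"
)


def parse_frame(line):
    """Return (module, function, offset) for one frame, or None if it does not match."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return match["module"], match["function"], int(match["offset"], 16)


def approximate_attack_surface(traces):
    """Count, per module, how many traces it appears on.

    Any module seen on at least one trace is treated as part of the
    approximated attack surface (hypothetical helper for illustration).
    """
    seen = Counter()
    for trace in traces:
        frames = (parse_frame(line) for line in trace)
        seen.update({frame[0] for frame in frames if frame is not None})
    return seen


example_trace = [
    "foo!foobarDeviceQueueRequest+0x68",
    "foo!fooDeviceSetup+0x72",
    "foo!fooAllDone+0xA8",
    "bar!barDeviceQueueRequest+0xB6",
    "bar!barDeviceSetup+0x08",
    "bar!barAllDone+0xFF",
    "center!processAction+0x1034",
    "center!dontDoAnything+0x1030",
]

surface = approximate_attack_surface([example_trace])
print(sorted(surface))  # ['bar', 'center', 'foo']
```

Working at the file level, as this talk proposes, would additionally need a mapping from each frame to a source file, which this sketch leaves out.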
Previously…
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
[SEIP ‘15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in
Companion Proceedings of the 37th International Conference on Software Engineering (2015).
                   [SEIP ‘15] Crashes
%binaries          48.4%
%vulnerabilities   94.6%
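One way to read the two percentages is as coverage ratios: the share of all binaries that appear on at least one crash stack trace, and the share of known vulnerabilities that fall inside those binaries. The sketch below shows that calculation on made-up toy data. The function name, the inputs, and the simplification of counting vulnerable binaries rather than individual vulnerabilities are all assumptions for illustration.

```python
def coverage(items_on_traces, all_items, vulnerable_items):
    """Return (% of items on the approximated surface, % of vulnerable items covered)."""
    everything = set(all_items)
    vulnerable = set(vulnerable_items)
    on_surface = set(items_on_traces) & everything
    pct_items = 100.0 * len(on_surface) / len(everything)
    pct_vulnerable = 100.0 * len(on_surface & vulnerable) / len(vulnerable)
    return pct_items, pct_vulnerable


# Toy data, not from the study.
all_binaries = ["a.dll", "b.dll", "c.dll", "d.dll"]
seen_in_crashes = ["a.dll", "b.dll"]
known_vulnerable = ["a.dll", "c.dll"]

print(coverage(seen_in_crashes, all_binaries, known_vulnerable))  # (50.0, 50.0)
```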
Great! All done, right?
Practitioner Problems
• Previous RASA study used tens of millions of crashes.
• Previous study was per binary.
• Practitioners had some issues with it…
– “Binary prioritization isn’t actionable.”
– “We don’t have that much data!”
– “We don’t store every crash we received, we don’t
see the value in that.”
– “We don’t have historical vulnerabilities to use as a
goodness measure.”
Research Questions
• RQ1: Can the RASA approach be implemented at the
source code file level with actionable results?
• RQ2: How does random sampling of crash dump stack
traces affect RASA?
Data Sources
• Mozilla Firefox
– ~1M crashes
– Vulnerability data from Mozilla Security
Blog and bug tracker
• Windows 8.1
– ~9M crashes
– Vulnerability data from internal data
sources
Methodology - RASA
Methodology - Sampling
10% of… 20% of…
• Sample at each “level”
• Record the standard deviation of files and
vulnerabilities covered
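The sampling step above can be sketched as follows. This is only an illustration under stated assumptions: each crash is represented as the list of source files on its stack trace, and the number of repeats per level, the seed, and the names `files_in` and `sample_levels` are hypothetical rather than taken from the study.

```python
import random
import statistics


def files_in(crashes):
    """Union of all files appearing on the given crashes' stack traces."""
    files = set()
    for crash in crashes:
        files.update(crash)
    return files


def sample_levels(crashes, total_files, vulnerable_files, levels, repeats=10, seed=0):
    """For each sampling level, draw `repeats` random samples (repeats >= 2) and
    summarize the percentage of files and of vulnerable files that are covered."""
    rng = random.Random(seed)
    vulnerable = set(vulnerable_files)
    results = {}
    for level in levels:
        size = max(1, round(level * len(crashes)))
        file_pcts, vuln_pcts = [], []
        for _ in range(repeats):
            covered = files_in(rng.sample(crashes, size))
            file_pcts.append(100.0 * len(covered) / total_files)
            vuln_pcts.append(100.0 * len(covered & vulnerable) / len(vulnerable))
        results[level] = {
            "files_mean": statistics.mean(file_pcts),
            "files_stdev": statistics.stdev(file_pcts),
            "vulns_mean": statistics.mean(vuln_pcts),
            "vulns_stdev": statistics.stdev(vuln_pcts),
        }
    return results


# Toy data, not from the study: 10 crashes, 4 files in the code base.
toy_crashes = [["a.cpp", "b.cpp"]] * 5 + [["c.cpp"]] * 5
summary = sample_levels(
    toy_crashes,
    total_files=4,
    vulnerable_files={"a.cpp", "d.cpp"},
    levels=(0.1, 0.2, 1.0),
)
for level, stats in summary.items():
    print(level, stats)
```

A small standard deviation across repeats at a given level would suggest that the particular random draw matters little at that sample size.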
[Figure: files and vulnerabilities covered vs. random sample size. Legend: Files, Vulnerabilities. Axis ticks run from 12% to 17% and from 70% to 75%.]
[Figure: files and vulnerabilities covered vs. random sample size. Legend: Files, Vulnerabilities. Axis ticks run from 10% to 26% and from 30% to 46%.]
Why Does Sampling Work?
• Crashes tend not to happen in isolation.
– If something crashes once, it will likely crash again.
• For Firefox, only 6 of the vulnerable files in the data
set appeared in just one crash.
– Out of ~300 vulnerable files and 50,000 total files
• If foo.cpp crashes many times, random sampling is unlikely
to remove every occurrence of foo.cpp from the dataset.
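The intuition in the last bullet can be made concrete with a short calculation. Assuming crashes are sampled uniformly at random without replacement, the chance that a file is missing from the sample is a ratio of binomial coefficients. The numbers below are toy values chosen for illustration, not figures from the study, and `survival_probability` is a hypothetical helper.

```python
from math import comb


def survival_probability(total_crashes, file_crashes, sample_size):
    """Probability that at least one of a file's crashes lands in a uniform
    random sample of `sample_size` crashes drawn without replacement."""
    missed = comb(total_crashes - file_crashes, sample_size) / comb(
        total_crashes, sample_size
    )
    return 1.0 - missed


# Toy setting: 1,000 crashes in total, 10% sample.
for occurrences in (1, 5, 50):
    print(occurrences, survival_probability(1000, occurrences, 100))
```

A file that crashed only once survives a 10% sample with probability 0.1, while a file with many crashes is almost certain to survive, which is consistent with so few vulnerable files having a single crash occurrence.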
Future Work
• We have a list of vulnerable files; now what?
– Further prioritization to assist developers.
• We’re looking at:
– How the attack surface changes over time.
– How the complexity of the attack surface predicts
vulnerabilities.
– How proximity to the boundary of a software
system predicts vulnerabilities.
Conclusions
• “Binary prioritization isn’t actionable.”
– RASA can prioritize security effort effectively at the
source code file level.
• “We don’t have that much data!”
– Orders of magnitude less data required compared
to previous studies.
• “We don’t store every crash we received, we don’t see
the value in that.”
– A naïve approach like random sampling still works.
• “We don’t have historical vulnerabilities to use as a
goodness measure.”
– RASA addressed the previous complaints with less data
and naïve sampling; evidence it will work on new systems.
@theisencr
theisencr.github.io
Expected Graduation: May 2018
Data Science, Security Analytics, Security Education