risk-based attack surface approximation: how much data is enough? [icse - seip 2017]

Risk-Based Attack Surface Approximation: How Much Data is Enough?

Chris Theisen, Brendan Murphy, Kim Herzig, Laurie Williams

North Carolina State University

Microsoft Research

Introduction

What is the “Attack Surface”? Quoting the Open Web Application Security Project (OWASP):

• All paths for data and commands in a software system

• The data that travels these paths

• The code that implements and protects both

The concept is used to prioritize security effort.


Crashes represent activity that puts the system under stress.

Stack Traces tell us what happened.

foo!foobarDeviceQueueRequest+0x68

foo!fooDeviceSetup+0x72

foo!fooAllDone+0xA8

bar!barDeviceQueueRequest+0xB6

bar!barDeviceSetup+0x08

bar!barAllDone+0xFF

center!processAction+0x1034

center!dontDoAnything+0x1030

Risk-Based Attack Surface Approximation (RASA)
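The example frames above follow a `module!function+offset` pattern. As a minimal sketch, assuming every frame has exactly that shape (real crash-dump tooling may emit other forms), a trace can be split into its module, function, and offset as follows; `Frame`, `parse_frame`, and `parse_trace` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed frame format, taken from the example trace: module!function+0xOFFSET
FRAME_RE = re.compile(
    r"^(?P<module>[^!]+)!(?P<function>[^+]+)\+0x(?P<offset>[0-9A-Fa-f]+)$"
)


class Frame(NamedTuple):
    module: str    # binary/module name, e.g. "foo"
    function: str  # symbol name, e.g. "fooDeviceSetup"
    offset: int    # byte offset into the function


def parse_frame(line: str) -> Optional[Frame]:
    """Parse one stack-trace line; return None if it does not match the pattern."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return Frame(m["module"], m["function"], int(m["offset"], 16))


def parse_trace(text: str) -> list[Frame]:
    """Parse every recognizable frame in a multi-line stack trace, skipping the rest."""
    frames = []
    for line in text.splitlines():
        frame = parse_frame(line)
        if frame is not None:
            frames.append(frame)
    return frames


print(parse_frame("foo!fooDeviceSetup+0x72"))
# Frame(module='foo', function='fooDeviceSetup', offset=114)
```

Once frames are structured this way, the module field supports binary-level analysis and the function field is the starting point for mapping frames back to source files.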


Previously…

• The previous RASA study used tens of millions of crashes.

• The previous study worked at the binary level.


[SEIP ’15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces,” in Companion Proceedings of the 37th International Conference on Software Engineering (2015).

[SEIP ’15] results, using crashes:

• Binaries on the approximated attack surface: 48.4%

• Vulnerabilities covered: 94.6%
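One way to read the two percentages: the approximated attack surface is the set of binaries that appear in at least one crash stack trace, %binaries is that set's share of all binaries, and %vulnerabilities is the share of known vulnerabilities that fall inside it. The sketch below assumes traces are already reduced to lists of module names and that each vulnerability is attributed to a single binary; both data shapes and the function names are hypothetical, not the study's actual pipeline.

```python
def attack_surface(traces):
    """Binaries that appear in at least one crash stack trace.

    traces: iterable of crashes, each a list of module names (one per frame).
    """
    surface = set()
    for trace in traces:
        surface.update(trace)
    return surface


def coverage(surface, all_binaries, vulnerability_binaries):
    """Return (% of binaries on the surface, % of vulnerabilities on the surface).

    vulnerability_binaries: one entry per vulnerability, naming the binary it
    was attributed to (an assumed data shape).
    """
    pct_binaries = 100.0 * len(surface & all_binaries) / len(all_binaries)
    hits = sum(1 for binary in vulnerability_binaries if binary in surface)
    pct_vulns = 100.0 * hits / len(vulnerability_binaries)
    return pct_binaries, pct_vulns


traces = [["foo", "bar", "center"], ["bar", "center"]]
surface = attack_surface(traces)
print(coverage(surface, {"foo", "bar", "center", "baz", "qux"}, ["foo", "baz"]))
# (60.0, 50.0)
```

A useful approximation keeps the first number small while keeping the second large: a small share of code to review that still contains most of the vulnerabilities.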





Great! All done, right?


Practitioner Problems







• Practitioners had some issues with it…

– “Binary prioritization isn’t actionable.”







– “We don’t have that much data!”








– “We don’t store every crash we received, we don’t see the value in that.”









– “We don’t have historical vulnerabilities to use as a goodness measure.”


Research Questions

• RQ1: Can the RASA approach be implemented at the source code file level with actionable results?

• RQ2: How does random sampling of crash dump stack traces affect RASA?


Data Sources

• Mozilla Firefox

– ~1M crashes

– Vulnerability data from the Mozilla Security Blog and bug tracker

• Windows 8.1

– ~9M crashes

– Vulnerability data from internal data sources


Methodology - RASA
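RQ1 asks whether RASA works at the source-file level. A minimal sketch, assuming each `(module, function)` frame can be resolved to a source file through a symbol lookup (here a hypothetical `symbol_to_file` dictionary standing in for real debug-symbol data), collects the files seen on crash stack traces and counts how many crashes touched each one. Frames that cannot be resolved are skipped; the crash count per file is included only for illustration.

```python
from collections import Counter


def files_on_surface(traces, symbol_to_file):
    """Return a Counter of source file -> number of crashes whose trace touched it.

    traces: list of crashes, each a list of (module, function) frames.
    symbol_to_file: hypothetical lookup from (module, function) to a source file.
    The Counter's keys are the files on the approximated attack surface.
    """
    crashes_per_file = Counter()
    for trace in traces:
        # A set, so a file hit by several frames of one crash is counted once.
        files = {symbol_to_file[frame] for frame in trace if frame in symbol_to_file}
        crashes_per_file.update(files)
    return crashes_per_file


symbol_to_file = {
    ("foo", "fooDeviceSetup"): "foo/device.cpp",
    ("foo", "fooAllDone"): "foo/device.cpp",
    ("bar", "barDeviceSetup"): "bar/setup.cpp",
}
traces = [
    [("foo", "fooDeviceSetup"), ("foo", "fooAllDone"), ("center", "processAction")],
    [("bar", "barDeviceSetup"), ("foo", "fooAllDone")],
]
print(sorted(files_on_surface(traces, symbol_to_file).items()))
# [('bar/setup.cpp', 1), ('foo/device.cpp', 2)]
```

Working with files instead of binaries gives developers a list of specific files to inspect, which is the actionability concern raised by practitioners.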


Methodology - Sampling


10% of… 20% of…

• Sample at each “level”

• Record the standard deviation of files and vulnerabilities covered
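The sampling procedure can be sketched as follows. This assumes each crash has already been reduced to the set of source files on its stack trace, treats “vulnerabilities covered” as vulnerable files on the sampled surface, and uses an arbitrary number of repetitions per level; `sample_coverage` and its parameters are hypothetical.

```python
import random
import statistics


def sample_coverage(crash_files, vulnerable_files, total_files, levels, runs=10, seed=0):
    """Repeatedly sample crashes at each level and summarize coverage.

    crash_files: list of sets, each the source files on one crash's stack trace.
    vulnerable_files: set of files known to contain a vulnerability.
    total_files: number of source files in the product.
    levels: sampling fractions, e.g. [0.1, 0.2, 0.3].
    Returns {level: (mean %files, stdev %files, mean %vulns, stdev %vulns)}.
    """
    if runs < 2:
        raise ValueError("statistics.stdev needs at least two runs")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    results = {}
    for level in levels:
        k = max(1, round(level * len(crash_files)))  # crashes kept at this level
        file_pcts, vuln_pcts = [], []
        for _ in range(runs):
            sample = rng.sample(crash_files, k)  # uniform, without replacement
            surface = set().union(*sample)       # files on any sampled trace
            file_pcts.append(100.0 * len(surface) / total_files)
            vuln_pcts.append(
                100.0 * len(surface & vulnerable_files) / len(vulnerable_files)
            )
        results[level] = (
            statistics.mean(file_pcts), statistics.stdev(file_pcts),
            statistics.mean(vuln_pcts), statistics.stdev(vuln_pcts),
        )
    return results
```

A small standard deviation at a given level means that different random samples of that size yield nearly the same attack surface, which is the property the sampling experiment checks.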


[Figures: two charts of Files and Vulnerabilities (%) against Random Sample Size]

Why Does Sampling Work?

• Crashes tend not to happen in isolation.

– If something crashes once, it will likely crash again.

• For Firefox, only 6 vulnerable files in the data set appeared in just a single crash.

– Out of ~300 vulnerable files and 50,000 total files

• If foo.cpp crashes many times, random sampling is unlikely to remove every foo.cpp crash from the dataset.
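The foo.cpp argument can be made quantitative. If a sample of n crashes is drawn uniformly without replacement from N, a file that appears in k of them is dropped only when all k are left out, which happens with probability C(N−k, n)/C(N, n). The sketch below (a hypothetical helper, not taken from the study) evaluates that ratio as a running product to avoid huge binomial coefficients.

```python
def survival_probability(total_crashes: int, file_crashes: int, sample_size: int) -> float:
    """P(at least one of a file's crashes is kept) when sample_size crashes are
    drawn uniformly without replacement from total_crashes.

    Uses C(N-k, n) / C(N, n) = prod_{i<k} (N - n - i) / (N - i).
    """
    if not (0 <= file_crashes <= total_crashes and 0 <= sample_size <= total_crashes):
        raise ValueError("counts must satisfy 0 <= k <= N and 0 <= n <= N")
    p_none = 1.0
    for i in range(file_crashes):
        # Probability that the i-th crash of this file is also left out,
        # given that the previous ones were left out.
        p_none *= (total_crashes - sample_size - i) / (total_crashes - i)
    return 1.0 - p_none
```

For a sampling fraction p = n/N this is close to 1 − (1 − p)^k, so under a 10% sample a file seen in a single crash survives only about 10% of the time, while a file seen in 50 crashes survives with probability above 99%.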


Future Work

• We have a list of vulnerable files; now what?

– Further prioritization to assist developers.

• We’re looking at:

– How the attack surface changes over time.

– How the complexity of the attack surface predicts vulnerabilities.

– How proximity to the boundary of a software system predicts vulnerabilities.


Conclusions

• “Binary prioritization isn’t actionable.”

– RASA can prioritize security effort effectively at the source code file level.



• “We don’t have that much data!”

– Orders of magnitude less data were required than in previous studies.



• “We don’t store every crash we received, we don’t see the value in that.”

– A naïve approach like random sampling still works.



• “We don’t have historical vulnerabilities to use as a goodness measure.”

– The previous complaints were addressed with less data and naïve sampling, which is evidence that the approach will work on new systems.



@theisencr

theisencr.github.io

Expected Graduation: May 2018

Data Science, Security Analytics, Security Education
