Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017]


Post on 16-Mar-2018


Page 1: Risk-Based Attack Surface Approximation: How Much Data is Enough? [ICSE - SEIP 2017]

Risk-Based Attack Surface Approximation:

How Much Data is Enough?

Chris Theisen, Brendan Murphy, Kim Herzig, Laurie Williams

North Carolina State University

Microsoft Research


Introduction

What is the “Attack Surface”? Quoting the Open Web Application Security Project…

• All paths for data and commands in a software system

• The data that travels these paths

• The code that implements and protects both

The attack surface concept is used to prioritize security effort.

Introduction | Background | Methodology | Results | Conclusion


Crashes represent activity that puts the system under stress.

Stack Traces tell us what happened.

foo!foobarDeviceQueueRequest+0x68

foo!fooDeviceSetup+0x72

foo!fooAllDone+0xA8

bar!barDeviceQueueRequest+0xB6

bar!barDeviceSetup+0x08

bar!barAllDone+0xFF

center!processAction+0x1034

center!dontDoAnything+0x1030

Risk-Based Attack Surface Approximation (RASA)
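Each frame in the example trace above has the form binary!function+offset. As a minimal sketch, assuming that layout holds for every frame, the frames can be parsed and the code seen on at least one crash trace collected into an approximate attack surface. The names `Frame`, `parse_frame`, and `attack_surface` are hypothetical helpers for illustration, not part of the talk or of any RASA tooling.

```python
import re
from typing import NamedTuple


class Frame(NamedTuple):
    binary: str
    function: str
    offset: int


# Assumed frame layout: binary!function+0xOFFSET, as in the example trace.
FRAME_RE = re.compile(
    r"^(?P<binary>[^!\s]+)!(?P<function>[^+\s]+)\+0x(?P<offset>[0-9A-Fa-f]+)$"
)


def parse_frame(line: str) -> Frame:
    """Split one stack frame into binary, function, and numeric offset."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized frame: {line!r}")
    return Frame(m["binary"], m["function"], int(m["offset"], 16))


def attack_surface(traces):
    """Return every (binary, function) pair that appears on any trace."""
    surface = set()
    for trace in traces:
        for line in trace:
            frame = parse_frame(line)
            surface.add((frame.binary, frame.function))
    return surface


trace = [
    "foo!foobarDeviceQueueRequest+0x68",
    "foo!fooDeviceSetup+0x72",
    "foo!fooAllDone+0xA8",
    "bar!barDeviceQueueRequest+0xB6",
    "bar!barDeviceSetup+0x08",
    "bar!barAllDone+0xFF",
    "center!processAction+0x1034",
    "center!dontDoAnything+0x1030",
]

surface = attack_surface([trace])
binaries = sorted({binary for binary, _ in surface})
print(binaries)
```

Code that never shows up in `surface` across the available crashes would fall outside the approximation under this sketch.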


Previously…

• Previous RASA study used tens of millions of crashes.

• Previous study was per binary.

                    [SEIP ‘15] Crashes
%binaries           48.4%
%vulnerabilities    94.6%

[SEIP ‘15] Chris Theisen, Kim Herzig, Pat Morrison, Brendan Murphy, and Laurie Williams, “Approximating Attack Surfaces with Stack Traces”, in Companion Proceedings of the 37th International Conference on Software Engineering (2015).


Great! All done, right?


Practitioner Problems

• Previous RASA study used tens of millions of crashes.

• Previous study was per binary.

• Practitioners had some issues with it…

– “Binary prioritization isn’t actionable.”

– “We don’t have that much data!”

– “We don’t store every crash we received, we don’t see the value in that.”

– “We don’t have historical vulnerabilities to use as a goodness measure.”


Research Questions

• RQ1: Can the RASA approach be implemented at the source code file level with actionable results?

• RQ2: How does random sampling of crash dump stack traces affect RASA?


Data Sources

• Mozilla Firefox

– ~1M crashes

– Vulnerability data from Mozilla Security Blog and bug tracker

• Windows 8.1

– ~9M crashes

– Vulnerability data from internal data sources


Methodology - RASA



Methodology - Sampling

10% of… 20% of…

• Sample at each “level”

• Record the standard deviation of the files and vulnerabilities covered
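The sampling procedure can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each crash is reduced to the list of source files on its stack trace, the number of repeated runs per level (`runs`) and the seed are arbitrary choices, and the toy data and the names `coverage` and `sample_levels` are hypothetical.

```python
import random
import statistics


def coverage(crashes, all_files, vulnerable_files):
    """Fraction of all files, and of vulnerable files, seen in these crashes."""
    seen = set()
    for files_in_trace in crashes:
        seen.update(files_in_trace)
    file_cov = len(seen & all_files) / len(all_files)
    vuln_cov = len(seen & vulnerable_files) / len(vulnerable_files)
    return file_cov, vuln_cov


def sample_levels(crashes, all_files, vulnerable_files,
                  levels=(0.1, 0.2), runs=10, seed=0):
    """Draw `runs` random samples at each level; record mean and stdev of coverage."""
    rng = random.Random(seed)
    results = {}
    for level in levels:
        k = max(1, round(level * len(crashes)))
        file_cov, vuln_cov = [], []
        for _ in range(runs):
            sample = rng.sample(crashes, k)  # uniform, without replacement
            f, v = coverage(sample, all_files, vulnerable_files)
            file_cov.append(f)
            vuln_cov.append(v)
        results[level] = {
            "files_mean": statistics.mean(file_cov),
            "files_stdev": statistics.stdev(file_cov),
            "vulns_mean": statistics.mean(vuln_cov),
            "vulns_stdev": statistics.stdev(vuln_cov),
        }
    return results


# Hypothetical toy data: a.cpp and b.cpp crash often, c.cpp rarely, d.cpp never.
all_files = {"a.cpp", "b.cpp", "c.cpp", "d.cpp"}
vulnerable_files = {"a.cpp"}
crashes = [["a.cpp", "b.cpp"]] * 50 + [["c.cpp"]] * 5

for level, stats in sample_levels(crashes, all_files, vulnerable_files).items():
    print(level, stats)
```

A low standard deviation at a given level would suggest that any one random sample of that size gives a similar picture of the attack surface.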


[Figure: Files and Vulnerabilities plotted against Random Sample Size; axis ticks shown: 12% to 17% and 70% to 75%]


[Figure: Files and Vulnerabilities plotted against Random Sample Size; axis ticks shown: 10% to 26% and 30% to 46%]


Why Does Sampling Work?

• Crashes tend not to happen in isolation.

– If something crashes once, it will likely crash again.

• For Firefox, only 6 files in the data set with a vulnerability had only one crash occurrence.

– Out of ~300 vulnerable files and 50,000 total files

• If foo.cpp crashes many times, random sampling is unlikely to remove every foo.cpp occurrence from the dataset.
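The last point can be made concrete with a short calculation. As a sketch, assume the sample is drawn uniformly without replacement; the chance that a file is dropped entirely is then the chance that none of its crash occurrences lands in the sample. The function name `miss_probability` and the occurrence counts below are hypothetical; the ~1M total is the Firefox crash count from the Data Sources slide.

```python
def miss_probability(total_crashes: int, occurrences: int, sample_size: int) -> float:
    """Probability that a uniform sample without replacement of `sample_size`
    crashes contains none of the `occurrences` crashes that touch a file."""
    if occurrences > total_crashes - sample_size:
        return 0.0  # the sample is too large to avoid every occurrence
    p = 1.0
    for i in range(occurrences):
        # Chance that the (i+1)-th occurrence is also left out of the sample.
        p *= (total_crashes - sample_size - i) / (total_crashes - i)
    return p


total = 1_000_000          # ~1M Firefox crashes
sample = total // 10       # a 10% random sample

for occurrences in (1, 10, 50):
    print(occurrences, miss_probability(total, occurrences, sample))
```

With a 10% sample, a file that crashed only once is usually lost, while the miss probability shrinks roughly like 0.9 raised to the number of occurrences, which is why files that crash repeatedly tend to survive sampling.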


Future Work

• We have a list of vulnerable files; now what?

– Further prioritization to assist developers.

• We’re looking at:

– How the attack surface changes over time.

– How the complexity of the attack surface predicts vulnerabilities.

– How proximity to the boundary of a software system predicts vulnerabilities.


Conclusions

• “Binary prioritization isn’t actionable.”

– RASA can prioritize security effort effectively at the source code file level.

• “We don’t have that much data!”

– Orders of magnitude less data required compared to previous studies.


• “We don’t store every crash we received, we don’t see the value in that.”

– A naïve approach like random sampling still works.

• “We don’t have historical vulnerabilities to use as a goodness measure.”

– Satisfied previous complaints with less data, naïve sampling; evidence it will work on new systems.



[email protected]

@theisencr

theisencr.github.io

Expected Graduation: May 2018

Data Science, Security Analytics, Security Education