sampling dynamic dataflow analyses joseph l. greathouse advanced computer architecture laboratory...
DESCRIPTION
Simple Buffer Overflow Example 3 void foo() { int local_variables; int buffer[256]; … buffer = read_input(); … return; } Return address Local variables buffer Buffer Fill New Return address Bad Local variables If read_input() reads 200 ints If read_input() reads >256 ints bufferTRANSCRIPT
![Page 1: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/1.jpg)
Sampling Dynamic Dataflow Analyses
Joseph L. GreathouseAdvanced Computer Architecture Laboratory
University of Michigan
University of British ColumbiaJune 10, 2011
![Page 2: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/2.jpg)
2
NIST: SW errors cost U.S. ~$60 billion/year as of 2002 FBI CCS: Security Issues $67 billion/year as of 2005
>⅓ from viruses, network intrusion, etc.
Software Errors Abound
2000 2001 2002 2003 2004 2005 2006 2007 2008 (proj.)
0
3,000
6,000
9,000CERT-Cataloged Vulnerabilities
![Page 3: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/3.jpg)
3
Simple Buffer Overflow Examplevoid foo(){ int local_variables; int buffer[256]; … buffer = read_input(); … return;}
Return address
Local variables
buffer
Buffer Fill
New Return address
Bad Local variables
If read_input() reads 200 ints If read_input() reads >256 ints
buffer
![Page 4: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/4.jpg)
4
Concurrency Bugs Also Matter
if(ptr == NULL) { len=thread_local->mylen; ptr=malloc(len); memcpy(ptr, data, len);}
Thread 2mylen=large
Nov. 2010 OpenSSL Security Flaw
Thread 1mylen=small
![Page 5: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/5.jpg)
5
Concurrency Bugs Also Matter
if(ptr==NULL)if(ptr==NULL)
memcpy(ptr, data2, len2)
ptrLEAKED
TIM
E
Thread 2mylen=large
Thread 1mylen=small
∅
len2=thread_local->mylen;ptr=malloc(len2);
len1=thread_local->mylen;ptr=malloc(len1);
memcpy(ptr, data1, len1)
![Page 6: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/6.jpg)
6
One Layer of a Solution High quality dynamic software analysis
Find difficult bugs that other analyses miss
Distribute Tests to Large Populations Low overhead or users get angry
Accomplished by sampling the analyses Each user only tests part of the program
![Page 7: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/7.jpg)
7
Dynamic Dataflow Analysis
Associate meta-data with program values
Propagate/Clear meta-data while executing
Check meta-data for safety & correctness
Forms dataflows of meta/shadow information
![Page 8: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/8.jpg)
8
a += y z = y * 75
y = x * 1024 w = x + 42 Check wCheck w
Example Dynamic Dataflow Analysis
validate(x)x = read_input() Clear
a += y z = y * 75
y = x * 1024
x = read_input()
Propagate
Associate
Input
Check aCheck a
Check zCheck z
Data
Meta-data
![Page 9: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/9.jpg)
9
Distributed Dynamic Dataflow Analysis Split analysis across large populations
Observe more runtime states Report problems developer never thought to test
Instrumented Program Potential
problems
![Page 10: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/10.jpg)
10
Taint Analysis(e.g.TaintCheck)
Dynamic Bounds Checking
FP Accuracy Verification
Symbolic Execution
Data Race Detection(e.g. Helgrind)
Memory Checking(e.g. Dr. Memory)
Problem: DDAs are Slow
10-200x2-200x
10-80x
5-50x100-500x
2-300x
![Page 11: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/11.jpg)
One Solution: Sampling
11
Lower overheads by skipping some analyses
0
25
50
75
100
Overhead
Idea
lD
etec
tion
Acc
urac
y (%
)
CompleteAnalysis
NoAnalysis
![Page 12: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/12.jpg)
Sampling Allows Distribution
12
0
25
50
75
100
Overhead
Idea
lD
etec
tion
Acc
urac
y (%
)
Developer
Beta TestersEnd Users
Many users testing at little overhead see more errors than
one user at high overhead.
![Page 13: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/13.jpg)
13
a += y z = y * 75
y = x * 1024 w = x + 42
Cannot Naïvely Sample Code
Validate(x)x = read_input()
a += y
y = x * 1024
False Positive
False Negative
w = x + 42
validate(x)x = read_input() Skip Instr.
Input
Check wCheck w
Check zCheck zSkip Instr.
Check aCheck a
![Page 14: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/14.jpg)
14
Sampling must be aware of meta-data
Remove meta-data from skipped dataflows Prevents false positives
Sample Data, not Code
![Page 15: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/15.jpg)
15
a += y z = y * 75
y = x * 1024 w = x + 42
Dataflow Sampling Example
validate(x)x = read_input()
a += y
y = x * 1024
False Negative
x = read_input() Skip Dataflow
Input
Check wCheck w
Check zCheck z
Skip Dataflow
Check aCheck a
![Page 16: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/16.jpg)
16
Mechanisms for Dataflow Sampling (1) Start with demand analysis
Demand Analysis Tool
NativeApplication
InstrumentedApplication
(e.g. Valgrind)
InstrumentedApplication
(e.g. Valgrind)
Shadow Value DetectorMeta-data
No meta-data
![Page 17: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/17.jpg)
17
Mechanisms for Dataflow Sampling (2) Remove dataflows if execution is too slow
Sampling Analysis Tool
InstrumentedApplication
NativeApplication
InstrumentedApplication
Shadow Value DetectorMeta-data
Clear meta-dataOH Threshold
![Page 18: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/18.jpg)
18
Prototype Setup Taint analysis sampling system
Network packets untrusted Xen-based demand analysis
Whole-system analysis with modified QEMU Overhead Manager (OHM) is user-controlled
Xen Hypervisor
OS and Applications
App App App…
Linux
ShadowPage Table
Admin VM
Taint Analysis QEMU
Net Stack OHM
![Page 19: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/19.jpg)
19
Benchmarks Performance – Network Throughput
Example: ssh_receive Accuracy of Sampling Analysis
Real-world Security Exploits
Name Error DescriptionApache Stack overflow in Apache Tomcat JK ConnectorEggdrop Stack overflow in Eggdrop IRC botLynx Stack overflow in Lynx web browserProFTPD Heap smashing attack on ProFTPD ServerSquid Heap smashing attack on Squid proxy server
![Page 20: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/20.jpg)
20
ssh_receive
Performance of Dataflow Sampling (1)
0 20 40 60 80 1000
5
10
15
20
25
Maximum % Time in Analysis
Thro
ughp
ut (M
B/s
) Throughput with no analysis
![Page 21: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/21.jpg)
21
netcat_receive
Performance of Dataflow Sampling (2)
0 20 40 60 80 1000
10
20
30
40
50
60
Maximum % Time in Analysis
Thro
ughp
ut (M
B/s
)
Throughput with no analysis
![Page 22: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/22.jpg)
22
ssh_transmit
Performance of Dataflow Sampling (3)
0 20 40 60 80 1000
5
10
15
20
25
Maximum % Time in Analysis
Thro
ughp
ut (M
B/s
)
Throughput with no analysis
![Page 23: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/23.jpg)
23
Accuracy at Very Low Overhead Max time in analysis: 1% every 10 seconds Always stop analysis after threshold
Lowest probability of detecting exploits
Name Chance of Detecting ExploitApache 100%Eggdrop 100%Lynx 100%ProFTPD 100%Squid 100%
![Page 24: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/24.jpg)
24
netcat_receive running with benchmark
Accuracy with Background Tasks
10% 25% 50% 75% 90%0
20
40
60
80
100
0.38 0.4
ApacheEggdropLynxProFTPDSquid
Maximum Allowed Overhead %
% C
hanc
e of
Det
ectin
g E
xplo
it
![Page 25: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/25.jpg)
25
Accuracy with Background Tasks ssh_receive running in background
10% 25% 50% 75% 90%0
20
40
60
80
100
0.1 0.7 2
ApacheEggdropLynxProFTPDSquid
Maximum % Time in Analysis
% C
hanc
e of
Det
ectin
g E
xplo
it
![Page 26: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/26.jpg)
26
Conclusion & Future WorkDynamic dataflow sampling gives users aknob to control accuracy vs. performance
Better methods of sample choices Combine static information New types of sampling analysis
![Page 27: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/27.jpg)
27
BACKUP SLIDES
![Page 28: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/28.jpg)
28
Outline
Software Errors and Security Dynamic Dataflow Analysis Sampling and Distributed Analysis Prototype System Performance and Accuracy
![Page 29: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/29.jpg)
29
Detecting Security Errors Static Analysis
Analyze source, formal reasoning+ Find all reachable, defined errors– Intractable, requires expert input,
no system state Dynamic Analysis
Observe and test runtime state+ Find deep errors as they happen– Only along traversed path,
very slow
![Page 30: Sampling Dynamic Dataflow Analyses Joseph L. Greathouse Advanced Computer Architecture Laboratory University of Michigan University of British Columbia](https://reader036.vdocuments.us/reader036/viewer/2022081605/5a4d1af27f8b9ab05997f18f/html5/thumbnails/30.jpg)
30
Width Test