Accelerating Dynamic Software Analyses
Joseph L. Greathouse
Ph.D. Candidate
Advanced Computer Architecture Laboratory
University of Michigan
December 1, 2011
2
Software Errors Abound
NIST: SW errors cost U.S. ~$60 billion/year as of 2002
FBI CCS: Security Issues $67 billion/year as of 2005
  >⅓ from viruses, network intrusion, etc.
[Figure: Cataloged Software Vulnerabilities per year, 2000-2008, y-axis 0-9000; series: CVE Candidates, CERT Vulnerabilities]
3
Example of a Modern Bug
Nov. 2010 OpenSSL Security Flaw
if (ptr == NULL) {
    len = thread_local->mylen;
    ptr = malloc(len);
    memcpy(ptr, data, len);
}
Thread 1: mylen=small
Thread 2: mylen=large
ptr: ∅
4
Example of a Modern Bug
Thread 1: mylen=small; Thread 2: mylen=large; ptr starts as ∅
[Timeline, in TIME order]
Thread 1: if(ptr==NULL)
Thread 2: if(ptr==NULL)
Thread 2: len2=thread_local->mylen;
Thread 2: ptr=malloc(len2);
Thread 1: len1=thread_local->mylen;
Thread 1: ptr=malloc(len1);
Thread 1: memcpy(ptr, data1, len1)
Thread 2: memcpy(ptr, data2, len2)
ptr: LEAKED
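The interleaving above can be replayed deterministically to see why it matters. The following is a minimal sketch, not the OpenSSL code: it uses no real threads, models buffers as `bytearray` objects, and assumes the ordering in which Thread 2 allocates first and Thread 1 then overwrites `ptr`. The function name and the returned field names are hypothetical.

```python
def replay_interleaving(small_len, large_len):
    """Replay the slide's interleaving step by step (no real threads)."""
    ptr = None  # shared pointer, initially NULL

    # Both threads test ptr before either one assigns it.
    t1_sees_null = ptr is None
    t2_sees_null = ptr is None

    # Thread 2 allocates using its thread-local (large) length.
    len2 = large_len
    ptr = bytearray(len2)
    leaked = ptr  # Thread 1 is about to overwrite ptr, so this buffer leaks

    # Thread 1 allocates using its thread-local (small) length.
    len1 = small_len
    ptr = bytearray(len1)

    # Thread 1 copies len1 bytes, which fits.
    # Thread 2 then copies len2 bytes into whatever ptr names now.
    overflow_bytes = max(0, len2 - len(ptr))

    return {
        "both_saw_null": t1_sees_null and t2_sees_null,
        "leaked_bytes": len(leaked),
        "buffer_size": len(ptr),
        "copy_size": len2,
        "overflow_bytes": overflow_bytes,
    }
```

With hypothetical lengths of 16 and 256 bytes, the replay reports a 256-byte leaked buffer and a copy that exceeds the surviving buffer by 240 bytes.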
5
Dynamic Software Analysis
Analyze the program as it runs
+ System state, find errors on any executed path
– LARGE runtime overheads, only test one path
[Diagram: Developer; Program + Analysis Instrumentation → Instrumented Program → In-House Test Machine(s) (LONG run time) → Analysis Results]
6
Runtime Overheads: How Large?
Taint Analysis (e.g. TaintCheck): 2-200x
Dynamic Bounds Checking: 10-80x
Data Race Detection (e.g. Inspector XE): 2-300x
Memory Checking (e.g. MemCheck): 5-50x
Symbolic Execution: 10-200x
7
Outline
Problem Statement
Background Information
  Demand-Driven Dynamic Dataflow Analysis
Proposed Solutions
  Demand-Driven Data Race Detection
  Sampling to Cap Maximum Overheads
8
Dynamic Dataflow Analysis
Associate meta-data with program values
Propagate/Clear meta-data while executing
Check meta-data for safety & correctness
Forms dataflows of meta/shadow information
9
Example: Taint Analysis
[Diagram: the same statements shown twice, once as Data and once as Meta-data]
x = read_input()  (Input; Associate)
y = x * 1024  (Propagate)
a += y  → Check a
z = y * 75  → Check z
validate(x)  (Clear)
w = x + 42  → Check w
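The Associate / Propagate / Clear / Check steps can be sketched in a few lines. This is a toy model under stated assumptions, not TaintCheck or any real tool: it keeps one taint bit per variable name, and the class and method names are hypothetical.

```python
class TaintTracker:
    """Toy shadow state: one taint bit per variable name."""

    def __init__(self):
        self.tainted = set()

    def associate(self, dst):
        # Untrusted input arrives: attach meta-data to the destination.
        self.tainted.add(dst)

    def propagate(self, dst, *srcs):
        # The destination is tainted iff any source is tainted.
        if any(s in self.tainted for s in srcs):
            self.tainted.add(dst)
        else:
            self.tainted.discard(dst)

    def clear(self, var):
        # Validation removes the meta-data.
        self.tainted.discard(var)

    def check(self, var):
        # True means the value is safe to use.
        return var not in self.tainted


def run_example():
    t = TaintTracker()
    t.associate("x")            # x = read_input()
    t.propagate("y", "x")       # y = x * 1024
    t.propagate("a", "a", "y")  # a += y
    t.propagate("z", "y")       # z = y * 75
    t.clear("x")                # validate(x)
    t.propagate("w", "x")       # w = x + 42
    return {v: t.check(v) for v in ("w", "a", "z")}
```

In this model `w` passes its check because `x` was validated first, while `a` and `z` fail because they derive from the unvalidated `y`.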
10
Demand-Driven Dataflow Analysis
Only Analyze Shadowed Data
[Diagram: Meta-Data Detection sends accesses to Non-Shadowed Data to the Native Application and accesses to Shadowed Data to the Instrumented Application; with No meta-data, execution returns to the Native Application]
11
Finding Meta-Data
No additional overhead when no meta-data
  Needs hardware support
Take a fault when touching shadowed data
  Solution: Virtual Memory Watchpoints
[Diagram: virtual-to-physical (V→P) translations; an access to a watched page raises a FAULT]
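A minimal sketch of the watchpoint idea follows, assuming a 4096-byte page and a simulated address space. It does not call any real virtual-memory API; the class, its counters, and the returned strings are hypothetical. Because protection is per page, any access to a page that holds shadowed data takes the slow path, even if the exact address is not shadowed.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class PageWatchpoints:
    """Simulates page-granularity watchpoints on shadowed data."""

    def __init__(self):
        self.watched_pages = set()
        self.faults = 0
        self.native_accesses = 0

    def shadow(self, addr):
        # Mark the whole page containing addr as watched.
        self.watched_pages.add(addr // PAGE_SIZE)

    def access(self, addr):
        if addr // PAGE_SIZE in self.watched_pages:
            # The access faults; run it under the instrumented analysis.
            self.faults += 1
            return "instrumented"
        # No meta-data on this page: run at native speed.
        self.native_accesses += 1
        return "native"
```

When no page is watched, every access returns `"native"` and the fault counter stays at zero, which is the "no additional overhead when no meta-data" case.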
12
Results by Ho et al.
lmbench Best Case Results:

| System | Slowdown (normalized) |
|---|---|
| Taint Analysis | 101.7x |
| On-Demand Taint Analysis | 1.98x |

Results when everything is tainted:
[Figure: bar chart of Slowdown (x), y-axis 0-200, for netcat_transmit, netcat_receive, ssh_transmit, ssh_receive]
13
Outline
Problem Statement
Background Information
  Demand-Driven Dynamic Dataflow Analysis
Proposed Solutions
  Demand-Driven Data Race Detection
  Sampling to Cap Maximum Overheads
14
Software Data Race Detection
Add checks around every memory access
Find inter-thread sharing events
Synchronization between write-shared accesses? No? Data race.
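The check described above (was there synchronization between two conflicting accesses?) is commonly implemented with vector clocks. The following is a simplified happens-before sketch under that assumption; it is not the detector used in the talk, and the class and method names are hypothetical.

```python
class HappensBeforeDetector:
    """Toy vector-clock race detector for a fixed number of threads."""

    def __init__(self, n_threads):
        self.n = n_threads
        # Each thread starts with its own component at 1.
        self.clocks = [[1 if i == t else 0 for i in range(n_threads)]
                       for t in range(n_threads)]
        self.lock_clocks = {}  # lock name -> vector clock at last release
        self.last_write = {}   # variable -> (thread, clock value)
        self.last_reads = {}   # variable -> per-thread read clock values
        self.races = []

    def acquire(self, t, lock):
        # Join with the clock published by the last release of this lock.
        released = self.lock_clocks.get(lock, [0] * self.n)
        self.clocks[t] = [max(a, b) for a, b in zip(self.clocks[t], released)]

    def release(self, t, lock):
        # Publish this thread's clock, then advance its own component.
        self.lock_clocks[lock] = list(self.clocks[t])
        self.clocks[t][t] += 1

    def _ordered_after_write(self, t, var):
        w = self.last_write.get(var)
        return w is None or w[1] <= self.clocks[t][w[0]]

    def read(self, t, var):
        if not self._ordered_after_write(t, var):
            self.races.append((var, "write-read", t))
        self.last_reads.setdefault(var, [0] * self.n)[t] = self.clocks[t][t]

    def write(self, t, var):
        if not self._ordered_after_write(t, var):
            self.races.append((var, "write-write", t))
        reads = self.last_reads.get(var, [0] * self.n)
        if any(reads[u] > self.clocks[t][u] for u in range(self.n)):
            self.races.append((var, "read-write", t))
        self.last_write[var] = (t, self.clocks[t][t])
```

Feeding it two threads that each read and then write `ptr` with no lock reports races, while wrapping each thread's accesses in `acquire`/`release` of the same lock reports none. Every access pays for these checks, which is where the overhead comes from.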
15
Example of Data Race Detection
Thread 1: mylen=small; Thread 2: mylen=large
[Timeline, in TIME order]
Thread 1: if(ptr==NULL)
Thread 2: if(ptr==NULL)
Thread 2: len2=thread_local->mylen;
Thread 2: ptr=malloc(len2);
Thread 1: len1=thread_local->mylen;
Thread 1: ptr=malloc(len1);
Thread 1: memcpy(ptr, data1, len1)
Thread 2: memcpy(ptr, data2, len2)
ptr write-shared? Interleaved Synchronization?
16
SW Race Detection is Slow
[Figure: bar chart of Race Detector Slowdown (x), y-axis 0-300, for Phoenix and PARSEC benchmarks; labels recovered: histogram, linear_regression, pca, word_count, GeoMean, bodytrack, ferret, raytrace, fluidanimate, x264, dedup]
17
Inter-thread Sharing is What’s Important
“Data races ... are failures in programs that access and update shared data in critical sections” – Netzer & Miller, 1992
[Diagram over TIME: the two threads' statements from the earlier example]
if(ptr==NULL)
if(ptr==NULL)
len1=thread_local->mylen;
ptr=malloc(len1);
memcpy(ptr, data1, len1)
len2=thread_local->mylen;
ptr=malloc(len2);
memcpy(ptr, data2, len2)
Thread-local data: NO SHARING
Shared data: NO INTER-THREAD SHARING EVENTS
18
Very Little Inter-Thread Sharing
[Figure: bar chart of % Write-Sharing Events, y-axis 0-3, for Phoenix and PARSEC benchmarks: histogram, kmeans, linear_regression, matrix_multiply, pca, string_match, word_count, blackscholes, bodytrack, facesim, freqmine, raytrace, swaptions, fluidanimate, vips, x264, canneal, dedup, streamcluster]
19
Use Demand-Driven Analysis!
[Diagram: an Inter-thread Sharing Monitor watches the Multi-threaded Application; a Local Access continues natively, while Inter-thread sharing is sent to the Software Race Detector]
20
Finding Inter-thread Sharing
Virtual Memory Watchpoints?
  – ~100% of accesses cause page faults
Granularity Gap
  Per-process not per-thread
[Diagram: accesses marked FAULT, with only some of them being actual Inter-Thread Sharing]
21
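Hardware Sharing Detector
Hardware Performance Counters
  Intel’s HITM event: W→R Data Sharing
[Diagram: Core 1 and Core 2, each with a Pipeline and a Cache; cache-line states M, S, and I, with a HITM raised on a cross-core hit to a modified line. The Perf. Ctrs count the event (values shown: 0, 1, 2, and -1 leading to FAULT). Once PEBS is Armed, the Debug Store records EFLAGS, EIP, RegVals, and MemInfo, followed by a Precise Fault.]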
22
Potential Accuracy & Perf. Problems
Limitations of Performance Counters
  HITM only finds W→R Data Sharing
  Hardware prefetcher events aren’t counted
Limitations of Cache Events
  SMT sharing can’t be counted
  Cache eviction causes missed events
  False sharing, etc…
PEBS events still go through the kernel
23
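Demand-Driven Analysis on Real HW
[Flowchart]
Execute Instruction → Analysis Enabled?
Analysis Enabled? NO → HITM Interrupt? (NO: Execute Instruction; YES: Enable Analysis)
Analysis Enabled? YES → SW Race Detection → Sharing Recently? (YES: Execute Instruction; NO: Disable Analysis)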
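The flowchart can be read as a two-state controller. The sketch below is one possible reading, not the implementation from the talk. It assumes that "Sharing Recently?" means "sharing was seen within the last `quiet_limit` analyzed accesses", and that the access which triggers the HITM interrupt is not itself analyzed. The class name, `quiet_limit`, and the boolean inputs are hypothetical.

```python
class DemandDrivenController:
    """Turns the software race detector on and off around sharing."""

    def __init__(self, quiet_limit):
        self.enabled = False
        self.quiet_limit = quiet_limit  # assumed cutoff for "recently"
        self.quiet = 0                  # analyzed accesses since last sharing
        self.analyzed = 0

    def step(self, hitm_interrupt, sharing_seen):
        """Handle one executed access; return True if it was analyzed."""
        if not self.enabled:
            if hitm_interrupt:
                # Hardware saw W->R sharing: enable the software detector.
                self.enabled = True
                self.quiet = 0
            return False

        # Analysis is enabled: this access goes through the race detector.
        self.analyzed += 1
        if sharing_seen:
            self.quiet = 0
        else:
            self.quiet += 1
            if self.quiet >= self.quiet_limit:
                # No sharing recently: return to native execution.
                self.enabled = False
        return True
```

With `quiet_limit=2`, a trace of one HITM interrupt followed by one sharing access and then non-sharing accesses analyzes three accesses before the controller switches the detector off again.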
24
Performance Increases
[Figure: bar chart of Demand-driven Analysis Speedup (x), y-axis 0-20, with one bar off the scale and labeled 51x, for Phoenix and PARSEC benchmarks; labels recovered: histogram, linear_regression, pca, word_count, GeoMean, bodytrack, ferret, raytrace, fluidanimate, x264, dedup]
25
Demand-Driven Analysis Accuracy
[Figure: bar chart of Demand-driven Analysis Speedup (x), y-axis 0-20; labels recovered: histogram, linear_regression, pca, word_count, GeoMean, bodytrack, ferret, raytrace, fluidanimate, x264, dedup. Bars are annotated with fractions, in extracted order: 1/1, 2/4, 3/3, 4/4, 3/3, 4/4, 4/4, 2/4, 4/4, 4/4, 2/4]
Accuracy vs. Continuous Analysis: 97%
26
Outline
Problem Statement
Background Information
  Demand-Driven Dynamic Dataflow Analysis
Proposed Solutions
  Demand-Driven Data Race Detection
  Sampling to Cap Maximum Overheads
27
Reducing Overheads Further: Sampling
Lower overheads by skipping some analyses
[Figure: Detection Accuracy (%), y-axis 0-100, plotted against Overhead; an "Ideal" curve runs between the No Analysis and Complete Analysis endpoints]
28
Sampling Allows Distribution
[Figure: Detection Accuracy (%), y-axis 0-100, plotted against Overhead, with an "Ideal" curve; Developer, Beta Testers, and End Users are marked along it]
Many users testing at little overhead see more errors than one user at high overhead.
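One way to see why distribution helps is a simple independence argument, which is an assumption of this sketch and not a model stated in the talk: if each sampled run catches a given error with probability p, then the chance that at least one of n runs catches it is 1 - (1 - p)^n. The function name and the example numbers are hypothetical.

```python
def detection_probability(per_run_chance, runs):
    """Chance that at least one of `runs` independent runs detects the error."""
    return 1.0 - (1.0 - per_run_chance) ** runs
```

For example, under this model 1000 runs that each have a 1% chance of detection together exceed a 99% chance, while a single run with a 50% chance stays at 50%.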
29
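Cannot Naïvely Sample Code
[Diagram: the taint analysis example with the analysis of some instructions skipped (Skip Instr.)]
x = read_input()  (Input)
validate(x)
w = x + 42  → Check w
y = x * 1024
a += y  → Check a
z = y * 75  → Check z
Skipping a Clear such as validate(x) leaves stale meta-data behind: False Positive
Skipping a Propagate such as y = x * 1024 loses meta-data: False Negative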
30
Solution: Sample Data, not Code
Sampling must be aware of meta-data
Remove meta-data from skipped dataflows
  Prevents false positives
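The difference between sampling code and sampling data can be sketched on the deck's taint example. This is a toy model with one taint bit per variable; the statement order, the function name, and both parameters are assumptions made for illustration. `skip_code` skips the analysis of individual statements, while `drop_dataflow` never attaches meta-data to the input, so the whole dataflow is skipped.

```python
def run(skip_code=(), drop_dataflow=False):
    """Return the set of checked variables that are flagged as tainted."""
    tainted = set()
    program = [
        ("associate", "x", ()),          # 0: x = read_input()
        ("propagate", "y", ("x",)),      # 1: y = x * 1024
        ("propagate", "a", ("a", "y")),  # 2: a += y
        ("propagate", "z", ("y",)),      # 3: z = y * 75
        ("clear", "x", ()),              # 4: validate(x)
        ("propagate", "w", ("x",)),      # 5: w = x + 42
    ]
    for index, (op, dst, srcs) in enumerate(program):
        if index in skip_code:
            continue  # naive code sampling: this statement is not analyzed
        if op == "associate":
            if not drop_dataflow:
                tainted.add(dst)
        elif op == "clear":
            tainted.discard(dst)
        elif any(s in tainted for s in srcs):
            tainted.add(dst)
        else:
            tainted.discard(dst)
    return {v for v in ("w", "a", "z") if v in tainted}
```

Full analysis flags `a` and `z`. Skipping statement 4 also flags `w`, a false positive, and skipping statement 1 flags nothing, a false negative. Dropping the whole dataflow also flags nothing, but it can never flag a validated value, so the only possible error is a missed detection.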
31
Dataflow Sampling Example
[Diagram: the same example, sampled per dataflow (Skip Dataflow) instead of per instruction]
x = read_input()  (Input)
validate(x)
w = x + 42  → Check w
y = x * 1024
a += y  → Check a
z = y * 75  → Check z
Skipping a whole dataflow can only produce a False Negative
32
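Dataflow Sampling
Remove dataflows if execution is too slow
[Diagram: Sampling Analysis Tool. Meta-Data Detection switches between the Native Application and the Instrumented Application when Meta-data is touched; once the OH Threshold is exceeded, the tool issues Clear meta-data and execution returns to the Native Application]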
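A minimal sketch of an overhead threshold follows. It assumes the simplest possible policy: when the fraction of time spent in analysis exceeds a user-set cap, all meta-data is cleared. The actual prototype's policy is not described on the slide and may differ; the class, its fields, and the time units are hypothetical.

```python
class OverheadManager:
    """Clears meta-data when analysis takes too large a share of run time."""

    def __init__(self, max_analysis_fraction):
        self.max_fraction = max_analysis_fraction  # user-controlled cap
        self.analysis_time = 0.0
        self.total_time = 0.0
        self.tainted = set()   # stand-in for the tool's meta-data
        self.clears = 0

    def record(self, native_time, analysis_time):
        """Account for one interval; return True if meta-data was cleared."""
        self.total_time += native_time + analysis_time
        self.analysis_time += analysis_time
        over_budget = (
            self.total_time > 0
            and self.analysis_time / self.total_time > self.max_fraction
        )
        if over_budget:
            # Too slow: drop the tracked dataflows so execution goes native.
            self.tainted.clear()
            self.clears += 1
        return over_budget
```

Clearing the meta-data is what ends the slow path, because a demand-driven tool only runs the instrumented code when shadowed data is touched.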
33
Prototype Setup
Taint analysis sampling system
  Network packets untrusted
Xen-based demand analysis
  Whole-system analysis with modified QEMU
Overhead Manager (OHM) is user-controlled
[Diagram: Xen Hypervisor with a Shadow Page Table; a guest running OS and Applications (App, App, App… on Linux); an Admin VM containing Taint Analysis QEMU, Net Stack, and OHM]
34
Benchmarks
Performance – Network Throughput
  Example: ssh_receive
Accuracy of Sampling Analysis
  Real-world Security Exploits

| Name | Error Description |
|---|---|
| Apache | Stack overflow in Apache Tomcat JK Connector |
| Eggdrop | Stack overflow in Eggdrop IRC bot |
| Lynx | Stack overflow in Lynx web browser |
| ProFTPD | Heap smashing attack on ProFTPD Server |
| Squid | Heap smashing attack on Squid proxy server |
35
Performance of Dataflow Sampling
ssh_receive
[Figure: Throughput (MB/s), y-axis 0-25, vs. Maximum % Time in Analysis, x-axis 0-100, with a reference line for Throughput with no analysis]
36
Accuracy with Background Tasks
ssh_receive running in background
[Figure: bar chart of % Chance of Detecting Exploit, y-axis 0-100, vs. Maximum % Time in Analysis (10%, 25%, 50%, 75%, 90%); series: Apache, Eggdrop, Lynx, ProFTPD, Squid; data labels recovered: 0.1, 0.7, 2]
37
BACKUP SLIDES
38
Performance Difference
[Figure: bar chart of Race Detector Slowdown (x), y-axis 0-300, for Phoenix and PARSEC benchmarks; labels recovered: histogram, linear_regression, pca, word_count, GeoMean, bodytrack, ferret, raytrace, fluidanimate, x264, dedup]
39
Width Test