PRIVACY-PRESERVING COLLABORATIVE NETWORK ANOMALY DETECTION
Haakon Ringberg
Unwanted network traffic
Problem
• Attacks on resources (e.g., DDoS, malware)
• Lost productivity (e.g., instant messaging)
• Costs USD billions every year
Goal: detect & diagnose unwanted traffic
• Scale to large networks by analyzing summarized data
• Greater accuracy via collaboration
• Protect privacy using cryptography
Challenges with detection
• Data volume
  • Some commonly used algorithms analyze IP packet payload info
  • Infeasible at the edge of large networks
Challenges with detection
• Data volume
• Attacks deliberately mimic normal traffic
  • e.g., SQL injection, application-level DoS [1]
[Figure: a network guarded by an anomaly detector; speech bubbles “Let me in!” and “I’m not sure about Beasty”]
[1] Srivatsa TWEB ’08; [2] Jung WWW ’02
Challenges with detection
• Data volume
• Attacks deliberately mimic normal traffic
  • e.g., SQL injection, application-level DoS [1]
• Is it a DDoS attack or a flash crowd? [2]
  • A single network in isolation may not be able to distinguish
[Figure: a network with CNN.com and FOX.com]
[1] Srivatsa TWEB ’08; [2] Jung WWW ’02
Collaborative anomaly detection
• “Bad guys tend to be around when bad stuff happens”
[Figure: two speech bubbles, each reading “I’m just not sure about Beasty :-/”]
Collaborative anomaly detection
• “Bad guys tend to be around when bad stuff happens”
• Targets (victims) could correlate attacks/attackers [1]
• “Fool us once, shame on you. Fool us, we can’t get fooled again!” [2]
[Figure: CNN.com and FOX.com]
[1] Katti IMC ’05, Allman Hotnets ’06, Kannan SRUTI ’06, Moore INFOC ’03; [2] George W. Bush
Corporations demand privacy
• Corporations are reluctant to share sensitive data
  • Legal constraints
  • Competitive reasons
[Figure: CNN.com and FOX.com; speech bubble “I don’t want FOX to know my customers”]
Common practice
AT&T Sprint
Every network for themselves!
System architecture
• Snort-like system
• Greater scalability
• Provide as a service
• Collaboration infrastructure
• For greater accuracy
• Protects privacy
N.B. collaboration could also be performed between stub networks
[Figure: AT&T and Sprint networks]
Dissertation Overview
Topic | Providing | Technologies | Venue
Collaboration Infrastructure | Privacy of participants and suspects | Cryptography | Submitted, ACM CCS ’09
Detection at a single network | Scalable Snort-like IDS system | Machine Learning | Presented, IEEE Infocom ’09
Collaboration Effectiveness | Quantifying benefits of coll. | Analysis of Measurements | To be submitted
Chapter I: scalable signature-based detection at individual networks
Work with AT&T Labs:
• Nick Duffield
• Patrick Haffner
• Balachander Krishnamurthy
Background: packet & rule IDSes
• Intrusion Detection Systems (IDSes)
  • Protect the edge of a network
  • Leverage known signatures of traffic
  • e.g., Slammer worm packets contain “MS-SQL” (say) in payload
  • or AOL IM packets use specific TCP ports and application headers
[Figure: packet layout (IP header, TCP header, App header, Payload); enterprise network]
Background: packet and rule IDSes
• A predicate is a boolean function on a packet feature
  • e.g., TCP port = 80
• A signature (or rule) is a set of predicates
Benefits
• Leverage existing community
  • Many rules already exist
  • CERT, SANS Institute, etc.
• Classification “for free”
• Accurate (?)
Background: packet and rule IDSes
• A predicate is a boolean function on a packet feature
  • e.g., TCP port = 80
• A signature (or rule) is a set of predicates
Drawbacks
• Too many packets per second
• Packet inspection at the edge requires deployment at many interfaces
Background: packet and rule IDSes
• A predicate is a boolean function on a packet feature
  • e.g., TCP port = 80
• A signature (or rule) is a set of predicates
Drawbacks
• Too many packets per second
• Packet inspection at the edge requires deployment at many interfaces
• DPI (deep-packet inspection) predicates can be computationally expensive
Example: packet has
• Port number X, Y, or Z
• Contains pattern “foo” within the first 20 bytes
• Contains pattern “bar” within the last 40 bytes
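The predicate and signature definitions above can be sketched in a few lines of Python. This is an illustration only, not the rule engine of any real IDS: the `Packet` type, the helper names, and the concrete ports standing in for “X, Y, or Z” are all assumptions, as is the choice to require every predicate of a rule to hold.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Packet:
    """Hypothetical, heavily simplified packet: one header field plus payload."""
    dst_port: int
    payload: bytes


# A predicate is a boolean function on a packet feature.
Predicate = Callable[[Packet], bool]


def port_in(ports: Set[int]) -> Predicate:
    """Header predicate: destination port is one of the given ports."""
    return lambda p: p.dst_port in ports


def contains_in_first(pattern: bytes, n: int) -> Predicate:
    """Payload (DPI) predicate: pattern occurs within the first n bytes."""
    return lambda p: pattern in p.payload[:n]


def contains_in_last(pattern: bytes, n: int) -> Predicate:
    """Payload (DPI) predicate: pattern occurs within the last n bytes."""
    return lambda p: pattern in p.payload[-n:]


def matches(signature: List[Predicate], packet: Packet) -> bool:
    """Assumption: a signature fires only if all of its predicates hold."""
    return all(pred(packet) for pred in signature)


# Ports 80, 443, 8080 are placeholders for the slide's "X, Y, or Z".
rule = [
    port_in({80, 443, 8080}),
    contains_in_first(b"foo", 20),
    contains_in_last(b"bar", 40),
]

hit = Packet(80, b"xxfoo" + b"." * 100 + b"bar")
miss = Packet(80, b"." * 30 + b"foo" + b"." * 100 + b"bar")  # "foo" is too deep

print(matches(rule, hit), matches(rule, miss))
```

The two payload predicates have to scan bytes of every packet, which is the cost the “Drawbacks” list refers to; the port predicate only reads a header field.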
Our idea: IDS on IP flows
src IP | dst IP | src Port | dst Port | Duration | # Packets
A | B | | | 5 min | 36
… | … | … | … | … | …
How well can signature-based IDSes be mimicked on IP flows?
• Efficient
  • Only fixed-offset predicates
  • Flows are more compact
• Flow collection infrastructure is ubiquitous
• IP flows capture the concept of a connection
Idea
1. IDSes associate a “label” with every packet
2. An IP flow is associated with a set of packets
3. Our system associates the labels with flows
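A minimal sketch of the three steps above, under two assumptions that the slides do not state: a flow is keyed by the usual 5-tuple, and a flow is labeled positive if any of its packets raised the alarm. `PacketRecord` and `label_flows` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class PacketRecord(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int
    alarm: bool  # step 1: the label the IDS attached to this packet


FlowKey = Tuple[str, str, int, int, int]


def label_flows(packets: Iterable[PacketRecord]) -> Dict[FlowKey, bool]:
    """Steps 2 and 3: group packets into flows and carry the labels over."""
    labels: Dict[FlowKey, bool] = defaultdict(bool)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)
        # Assumption: one alarming packet is enough to mark the whole flow.
        labels[key] = labels[key] or p.alarm
    return dict(labels)


packets = [
    PacketRecord("A", "B", 1234, 1434, 17, False),
    PacketRecord("A", "B", 1234, 1434, 17, True),
    PacketRecord("C", "B", 5555, 80, 6, False),
]
flow_labels = label_flows(packets)
```

The resulting (flow, label) pairs are the kind of training data a flow-level classifier would need.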
Snort rule taxonomy
Rule class | Description | Example
Header-only | Inspect only IP flow header | e.g., port numbers
Meta-information | Inexact correspondence: relies on features that cannot be exactly reproduced in IP flow realm | e.g., TCP flags
Payload dependent | Inspect packet payload | e.g., “contains abc”
Simple translation
3. Our system associates the labels with flows
• Simple rule translation would capture only flow predicates
• Low accuracy or low applicability
Example: Slammer Worm
• Snort rule: dst port = MS SQL; contains “Slammer”
• Only flow predicates: dst port = MS SQL
Machine Learning (ML)
3. Our system associates the labels with flows
• Leverage ML to learn a mapping from “IP flow space” to label
  • e.g., IP flow space = src port × # packets × flow duration
  • The label takes one value if the alarm is raised and another otherwise
[Figure: flows plotted by src port and # packets]
Boosting
• Boosting combines a set of weak learners to create a strong learner
[Figure: weak learners h1, h2, h3 combined through a sign function into Hfinal]
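To make the weak-learner idea concrete, here is a small AdaBoost sketch with one-feature threshold “stumps” as weak learners, where the final classifier is the sign of a weighted vote. The slides do not say which boosting variant or weak learner was used, so AdaBoost with stumps is an assumption, and the four training flows (dst port, packet size) are invented toy data.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def stump_predict(stump: Stump, x: Sequence[float]) -> int:
    """Weak learner: +polarity if the feature reaches the threshold."""
    feat, thresh, polarity = stump
    return polarity if x[feat] >= thresh else -polarity


def best_stump(X, y, w) -> Tuple[Stump, float]:
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feat in range(len(X[0])):
        for thresh in sorted({x[feat] for x in X}):
            for polarity in (+1, -1):
                stump = (feat, thresh, polarity)
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(stump, xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds: int) -> List[Tuple[float, Stump]]:
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this learner
        ensemble.append((alpha, stump))
        # Misclassified examples gain weight, the others lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x) -> int:
    """H_final(x) = sign(sum_t alpha_t * h_t(x))."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy flows: (dst port, packet size); +1 means the rule's alarm was raised.
X = [(1434, 404), (1434, 100), (80, 404), (80, 100)]
y = [+1, -1, -1, -1]
model = adaboost(X, y, rounds=3)
```

No single stump separates these four points, since the positive class needs both the port and the size condition, but the weighted vote of three stumps does.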
Benefit of Machine Learning (ML)
• ML algorithms discover new predicates to capture a rule
  • Latent correlations between predicates
  • Capturing same subspace using different dimensions
Example: Slammer Worm
• Snort rule: dst port = MS SQL; contains “Slammer”
• Only flow predicates: dst port = MS SQL
• ML-generated rule: dst port = MS SQL; packet size = 404; flow duration
Evaluation
• Border router on OC-3 link
• Used Snort rules in place
• Unsampled NetFlow v5 and packet traces
• Statistics
  • One month, 2 MB/s average, 1 billion flows
  • 400k Snort alarms
Accuracy metrics
• Receiver Operating Characteristic (ROC)
  • Full FP vs TP tradeoff
  • But need a single number
    • Area Under Curve (AUC)
    • Average Precision (AP)
• AP of p corresponds to (1 - p) / p FP per TP
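A short worked check of the conversion, reading the slide's relation as: an AP of p corresponds to (1 - p) / p false positives per true positive. The function name is hypothetical; the two inputs are AP values that appear in the classifier accuracy table.

```python
def fp_per_tp(precision: float) -> float:
    """False positives per true positive implied by a precision value p."""
    if not 0.0 < precision <= 1.0:
        raise ValueError("precision must be in (0, 1]")
    return (1.0 - precision) / precision


for p in (0.95, 0.70):
    print(f"AP {p:.2f}: about {100 * fp_per_tp(p):.0f} FP per 100 TP")
```

These two values line up with the “5 FP per 100 TP” and “43 FP per 100 TP” callouts on the classifier accuracy slide.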
Classifier accuracy
• Training on week 1, testing on week n
• Minimal drift within a month
• High degree of accuracy for header and meta
Rule class | Week1-2 | Week1-3 | Week1-4
Header rules | 1.00 | 0.99 | 0.99
Meta-information | 1.00 | 1.00 | 0.95
Payload | 0.70 | 0.71 | 0.70
[Callouts: 5 FP per 100 TP; 43 FP per 100 TP]
Variance within payload group
• Accuracy is a function of correlation between flow and packet-level features
Rule | Average Precision
MS-SQL version overflow | 1.00
ICMP PING speedera | 0.82
NON-RFC HTTP DELIM | 0.48
Computational efficiency
Our prototype can support OC48 (2.5 Gbps) speeds:
1. Machine learning (boosting)
  • 33 hours per rule for one week of OC48
2. Classification of flows
  • 57k flows/sec on a 1.5 GHz Itanium 2
  • Line rate classification for OC48
Chapter II: Evaluating the effectiveness of collaborative anomaly detection
Work with:
• Matthew Caesar
• Jennifer Rexford
• Augustin Soule
Methodology
1. Identify attacks in IP flow traces
2. Extract attackers
3. Correlate attackers across victims
Identifying anomalous events
• Use existing anomaly detectors [1]
  • IP scans, port scans, DoS
  • e.g., IP scan is more than n IP addresses contacted
• Minimize false positives
  • Correlate with DNS BL
  • IP addresses exhibiting open proxy or spambot behavior
[1] Allan IMC ’07, Kompella IMC ’04
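The IP-scan rule of thumb (“more than n IP addresses contacted”) can be sketched as below. This is only a toy reading of that one sentence: the cited detectors are more involved, the flow records are reduced to (source, destination) pairs, and no time window is applied.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_ip_scanners(flows: Iterable[Tuple[str, str]], n: int) -> Set[str]:
    """Return sources that contacted more than n distinct destination IPs."""
    contacted: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in flows:
        contacted[src].add(dst)
    return {src for src, dsts in contacted.items() if len(dsts) > n}


# Hypothetical flow records: one scanner, one ordinary client.
flows = [("10.0.0.9", f"192.0.2.{i}") for i in range(1, 6)]
flows += [("10.0.0.7", "192.0.2.1"), ("10.0.0.7", "192.0.2.1")]
scanners = find_ip_scanners(flows, n=3)
```

A source flagged this way would then be cross-checked against the DNS blacklist to reduce false positives.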
Cooperative blocking
• A set ‘S’ of victims agree to participate
• Beasty is blocked following initial attack
• Subsequent attacks by Beasty on members of ‘S’ are deemed ineffective
[Figure: CNN and FOX; speech bubble “Beasty is very bad!”]
DHCP lease issues
• Dynamic address allocation
  • IP address first owned by Beasty
  • Then owned by innocent Tweety
• Should not block Tweety’s innocuous queries
[Figure: address 10.0.0.1 contacting CNN, marked “?”]
DHCP lease issues
• Dynamic address allocation
  • IP address first owned by Beasty
  • Then owned by innocent Tweety
• Should not block Tweety’s innocuous queries
• Update DNS BL hourly
• Block IP addresses for a period shorter than most DHCP leases [1]
[1] Xie SIGC ’07
Methodology
• IP flow traces from Géant
• DNS BL to limit FP
• Cooperative blocking of attackers for Δ hours
• Metric is fraction of potentially mitigated flows
Blacklist duration parameter Δ
• Collaboration between all hosts
• Majority of benefit can be had with small Δ
Number of participating victims
• Randomly selecting n victims to collaborate in scheme
• Reported number is the average of 10 random selections
Number of participating victims
• Collaboration between most victimized hosts
• Attackers are more likely to continue to engage in bad action “x” than a random other action
Chapter conclusion
• Repeat attacks often occur within one hour
  • Substantially less than average DHCP lease
• Collaboration can be effective
  • Attackers contact a large number of victims
  • 10k random hosts could mitigate 50%
• Some hosts are much more likely victims
  • Subsets of victims can see great improvement
Chapter III: Privacy-preserving collaborative anomaly detection
Work with:
• Benny Applebaum
• Matthew Caesar
• Michael J Freedman
• Jennifer Rexford
Privacy-Preserving Collaboration
Protect privacy of
• Participants: do not reveal who suspected whom
• Suspects: only reveal suspects upon correlation
[Figure: CNN, FOX, and Google each submit an encrypted report E( ) for secure correlation]
System sketch
• Trusted third party is a point of failure
  • Single rogue employee
  • Inadvertent data leakage
  • Risk of subpoena
[Figure: CNN, FOX, Google, and MSFT around a secure correlation service]
System sketch
• Trusted third party is a point of failure
  • Single rogue employee
  • Inadvertent data leakage
  • Risk of subpoena
• Fully distributed impractical
  • Poor scalability
  • Liveness issues
[Figure: CNN, FOX, Google, and MSFT]
Split trust
• Managed by separate organizational entities
• Honest but curious proxy, DB, participants (clients)
• Secure as long as proxy and DB do not collude
Recall:
• Participant privacy
• Suspect privacy
[Figure: CNN and FOX, Proxy, DB]
Protocol outline
1. Clients send suspect IP addrs (x)
  • e.g., x = 127.0.0.1
2. DB releases IPs above threshold
[Figure: Client / Participant sends x through the Proxy to the DB]
DB table:
x | #
1 | 23
3 | 2
But this violates suspect privacy!
Recall:
• Participant privacy
• Suspect privacy
Protocol outline
1. Clients send suspect IP addrs (x)
2. DB releases IPs above threshold
[Figure: Client / Participant sends the hash of the IP address, H(x), through the Proxy to the DB]
DB table:
H(x) | #
1 | 23
3 | 2
Still violates suspect privacy!
Recall:
• Participant privacy
• Suspect privacy
Protocol outline
1. Clients send suspect IP addrs (x)
2. IP addrs blinded w/ Fs(x)
  • Keyed hash function (PRF)
  • Key s held only by proxy
3. DB releases IPs above threshold
[Figure: Client / Participant sends the keyed hash of the IP address, Fs(x), through the Proxy to the DB]
DB table:
Fs(x) | #
Fs(1) | 23
Fs(3) | 2
Still violates suspect privacy!
Recall:
• Participant privacy
• Suspect privacy
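The keyed-hash step can be illustrated as follows. HMAC-SHA256 is used here purely as a stand-in for the PRF Fs (a later slide names Naor-Reingold), the key and function names are hypothetical, and “above threshold” is read as “count at least the threshold”, which is an assumption.

```python
import hashlib
import hmac
from collections import Counter
from typing import Iterable, Set


def blind(key: bytes, ip: str) -> str:
    """Stand-in for Fs(x): keyed hash of an IP address under the proxy's key."""
    return hmac.new(key, ip.encode(), hashlib.sha256).hexdigest()


def release_at_threshold(blinded_reports: Iterable[str], threshold: int) -> Set[str]:
    """DB side: count identical blinded values, release the frequent ones."""
    counts = Counter(blinded_reports)
    return {value for value, count in counts.items() if count >= threshold}


proxy_key = b"held-only-by-the-proxy"  # hypothetical key s
reports = [blind(proxy_key, ip)
           for ip in ["10.0.0.1", "10.0.0.1", "10.0.0.1", "10.0.0.3"]]
released = release_at_threshold(reports, threshold=3)
```

Because the hash is deterministic under one key, the DB can still correlate reports without the key. Without the key, the DB also cannot test guesses by hashing candidate IPs over the small IPv4 space, which is presumably why a plain hash H(x) does not suffice.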
Protocol outline
1. Clients send suspect IP addrs (x)
2. IP addrs blinded w/ EDB(Fs(x))
  • Keyed hash function (PRF)
  • Key s held only by proxy
3. DB releases IPs above threshold
[Figure: Client / Participant sends the encrypted keyed hash of the IP address, EDB(Fs(x)), through the Proxy to the DB]
DB table:
Fs(x) | #
Fs(1) | 23
Fs(3) | 2
But how do clients learn EDB(Fs(x))?
Recall:
• Participant privacy
• Suspect privacy
Protocol outline
1. Clients send suspect IP addrs (x)
2. IP addrs blinded w/ EDB(Fs(x))
  • Keyed hash function (PRF)
  • Key s held only by proxy
3. EDB(Fs(x)) learned through secure function evaluation
4. DB releases IPs above threshold
  • Possible to reveal IP addresses at the end
[Figure: Client / Participant holds x, Proxy holds s; the client obtains EDB(Fs(x)) and the DB receives Fs(x)]
DB table:
Fs(x) | #
Fs(1) | 23
Fs(3) | 2
Recall:
• Participant privacy
• Suspect privacy
Protocol summary
• Clients send suspect IPs
  • Learn Fs(x) using secure function evaluation
• Proxy forwards to DB
  • Randomly shuffles suspects
  • Re-randomizes encryptions
• DB correlates using Fs(x)
• DB forwards bad IPs to proxy
[Figure: Client sends EDB(Fs(3)); the DB table lists Fs(3) with count 12; the proxy recovers Ds(Fs(3)) = 3]
Architecture
• Proxy split into client-facing and decryption oracles
• Proxies and DB are fully parallelizable
[Figure: tiers labeled Clients, Client-Facing Proxies, Proxy Decryption Oracles, Front-End DB Tier, Back-End DB Storage]
Evaluation
• All components implemented
  • ~5000 lines of C++
  • Utilizing GnuPG, BSD TCP sockets, and Pthreads
• Evaluated on custom test bed
  • ~2 GHz (single, dual, quad-core) Linux machines
Algorithm | Parameter | Value
RSA / ElGamal | key size | 1024 bits
Oblivious Transfer | k | 80
AES | key size | 256
Scalability w.r.t. # IPs
• Single CPU core for DB and proxy each
Scalability w.r.t. # clients
• Four CPU cores for DB and proxy each
Scalability w.r.t. # CPU cores
• n CPU cores for DB and proxy each
Summary
• Collaboration protocol protects privacy of
  • Participants: do not reveal who suspected whom
  • Suspects: only reveal suspects upon agreement
• Novel composition of crypto primitives
  • One-way function hides IPs from DB; public key encryption allows subsequent revelation; secure function evaluation
• Efficient implementation of architecture
  • Millions of IPs in hours
  • Scales linearly with computing resources
Conclusion
1. Speed
  • ML-based architecture supports accurate and scalable Snort-like classification on IP flows
2. Accuracy
  • Collaborating against mutual adversaries
3. Privacy
  • Novel cryptographic protocol supports efficient collaboration in privacy-preserving manner
Future Work Highlights
1. ML-based Snort-like architecture
  • Cross-site: train on site A and test on site B
  • Performance on sampled flow records
2. Measurement study
  • Biased correlation results due to biased DNSBL (ongoing)
  • Rate at which information must be exchanged
  • Who should cooperate: end-points or ISPs?
3. Privacy-preserving collaboration
  • Other applications, e.g., Viacom-vs-YouTube concerns
THANK YOU!
Collaborators: Jennifer Rexford, Benny Applebaum, Matthew Caesar, Nick Duffield, Michael J Freedman, Patrick Haffner, Balachander Krishnamurthy, and Augustin Soule
Difference in rule accuracy
• Accuracy is a function of correlation between flow and packet-level features
Rule | Overall Accuracy | w/o dst port | w/o mean packet size
MS-SQL version overflow | 1.00 | 0.99 | 0.83
ICMP PING speedera | 0.82 | 0.79 | 0.06
NON-RFC HTTP DELIM | 0.48 | 0.02 | 0.22
Choosing an operating point
• X = alarms we want raised
• Z = alarms that are raised
• Y = overlap of X and Z
• Precision = Y / Z (exactness)
• Recall = Y / X (completeness)
• AP is a single number, but not the most intuitive
• Precision & recall are useful for operators
  • “I need to detect 99% of these alarms!”
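As a small worked example of these definitions, with X the set of alarms we want raised, Z the set of alarms actually raised, and Y their overlap (the alarm IDs below are made up):

```python
from typing import Hashable, Set, Tuple


def precision_recall(wanted: Set[Hashable], raised: Set[Hashable]) -> Tuple[float, float]:
    """Precision = |Y| / |Z| and recall = |Y| / |X|, where Y = X ∩ Z."""
    overlap = wanted & raised
    precision = len(overlap) / len(raised) if raised else 0.0
    recall = len(overlap) / len(wanted) if wanted else 0.0
    return precision, recall


wanted = {1, 2, 3, 4}   # X: alarms we want raised
raised = {3, 4, 5}      # Z: alarms that are raised
precision, recall = precision_recall(wanted, raised)
```

An operator who says “I need to detect 99% of these alarms!” is fixing recall at 0.99 and then asking what precision remains at that operating point.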
Choosing an operating point
Rule | Precision w/ recall = 1.00 | Precision w/ recall = 0.99
MS-SQL version overflow | 1.00 | 1.00
ICMP PING speedera | 0.02 | 0.83
CHAT AIM receive message | 0.02 | 0.11
• AP is a single number, but not the most intuitive
• Precision & recall are useful for operators
  • “I need to detect 99% of these alarms!”
Quantifying the benefit of collaboration
Effectiveness of collaboration is a function of
1. Whether different victims see the same attackers
2. Whether all victims are equally likely to be targeted
[Figure: MSNBC, FOX, CNN]
IP address blinding
• DB requires injective and one-way function on IPs
  • Cannot use simple hash
• Fs(x) is keyed hash function (PRF) on IPs
  • Key s held only by proxy
[Figure: Client holds x, obtains Fs(x), and sends EDB(Fs(x))]
Secure Function Evaluation
• IP address blinding can be split into a per-IP-bit xi problem
  • Client must learn EDB(Fs(xi))
  • Client must not learn s
  • Proxy must not learn xi
• Oblivious Transfer (OT) accomplishes this [1], [2]
• Amortized OT makes asymptotic performance equal to matrix multiplication [3]
[Figure: Client holds x, proxy holds s; client obtains EDB(Fs(x))]
[1] Naor et al. SODA ’01, [2] Freedman et al. TCC ’05, [3] Ishai et al. CRYPTO ’03
Public key encryption
• Clients encrypt suspect IPs (x)
  • First w/ proxy’s pubkey
  • Then w/ DB’s pubkey
• Forwarded by proxy
  • Does not learn IPs
• Decrypted by DB
  • Does not learn IPs
• Does not allow for DB correlation due to padding (e.g., OAEP)
[Figure: Client sends EDB(EPX(x)); after DB decryption, EPX(x) remains]
How client learns Fs(x)
• Client must learn Fs(x)
  • Client must not learn ‘s’
  • Proxy must not learn ‘x’
• Naor-Reingold PRF
  • s = { si | 1 ≤ i ≤ 32 }
  • PRF = g^(∏_{xi=1} si)
• Add randomness ui to obscure si from client
  • Message = ui * si
How client learns Fs(x)
• For each bit xi of the IP, the client learns
  • ui * si, if xi is 1
  • ui, if xi is 0
• The user also learns ∏ ui
Example:
x = (x0 = 0, x1 = 1, …, x31 = 1)
s = (s0, s1, …, s31)
Values learned toward Fs(x): u0, u1 * s1, …, u31 * s31
How client learns Fs(x)
• User multiplies together all values
  ∏_{xi=1} (ui * si) * ∏_{xi=0} ui = ∏ ui * ∏_{xi=1} si
• Divides out ∏ ui
  (∏ ui * ∏_{xi=1} si) / ∏ ui = ∏_{xi=1} si
• Acquires Fs(x) w/o having learned ‘s’
  Fs(x) = ∏_{xi=1} si
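The arithmetic on these slides can be checked with a toy implementation. Everything concrete here is an assumption for illustration: the tiny group (p = 2039, q = 1019, g = 4) is far too small for real use, the function names are hypothetical, exponents are handled modulo q, and the per-bit values are handed to the client directly instead of through oblivious transfer. The final value follows the earlier slide's PRF = g^(∏_{xi=1} si).

```python
import secrets
from math import prod
from typing import List, Tuple

P = 2039   # toy safe prime, P = 2 * Q + 1 (illustration only)
Q = 1019   # prime order of the subgroup generated by G
G = 4
BITS = 32  # one key element s_i per IPv4 address bit


def ip_to_bits(ip: str) -> List[int]:
    value = 0
    for octet in ip.split("."):
        value = (value << 8) | int(octet)
    return [(value >> i) & 1 for i in range(BITS)]


def prf(s: List[int], bits: List[int]) -> int:
    """Direct evaluation by the key holder: g^(product of s_i where x_i = 1)."""
    exponent = prod(si for si, xi in zip(s, bits) if xi) % Q
    return pow(G, exponent, P)


def proxy_messages(s: List[int]) -> Tuple[List[Tuple[int, int]], int]:
    """Proxy side: per bit, the pair (u_i, u_i * s_i), plus the product of all u_i."""
    u = [secrets.randbelow(Q - 1) + 1 for _ in s]
    pairs = [(ui, ui * si % Q) for ui, si in zip(u, s)]
    return pairs, prod(u) % Q


def client_evaluate(bits: List[int], pairs, u_product: int) -> int:
    """Client side: multiply the received values, then divide out prod(u_i)."""
    # In the protocol, OT would deliver exactly one element of each pair;
    # here the client simply picks pair[x_i].
    received = [pair[xi] for pair, xi in zip(pairs, bits)]
    exponent = prod(received) % Q * pow(u_product, -1, Q) % Q
    return pow(G, exponent, P)


key = [secrets.randbelow(Q - 1) + 1 for _ in range(BITS)]
bits = ip_to_bits("127.0.0.1")
pairs, u_product = proxy_messages(key)
blinded = client_evaluate(bits, pairs, u_product)
```

The client ends up with the same value the proxy would compute with `prf(key, bits)`, yet each `s_i` it touched was masked by a fresh random `u_i`.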
How client learns Fs(x)
• User multiplies together all values
  • Divides out ∏ ui
• Acquires Fs(x) w/o having learned ‘s’
But how does the client learn
• si * ui, if xi is 1
• ui, if xi is 0
without the proxy learning the IP x?
Oblivious Transfer (details)
Public: f(x), g(x)
1. Client sends f(x=0) and f(x=1)
  • Proxy doesn’t learn x
2. Proxy sends
  • v(0) = E_{g(f(0))}(1 + r)
  • v(1) = E_{g(f(1))}(s + r)
3. Client decrypts v(x) with g(f(x))
  • Calculates g(f(x))
  • Cannot calculate g(f(1-x))
[Figure: Client holds x and g(f(x)), proxy holds s; client sends f(0), f(1); proxy returns v(0), v(1)]
Oblivious Transfer (more details)
Preprocessing:
• Proxy chooses random c and r (at startup)
• Proxy publishes c and g^r
• Client chooses random k (for each bit)
Client:
1. Key_x = g^k
   Key_{1-x} = c * g^(-k)
2. Key_x^r = (g^r)^k
   Used to decrypt y_x
Client sends Key_0 to the proxy
Proxy:
1. Key_0^r = (Key_0)^r
   Key_1^r = c^r / Key_0^r
2. y_0 = AES_{Key_0^r}(u)
   y_1 = AES_{Key_1^r}(s * u)
Proxy sends y_0 and y_1 to the client
Oblivious Transfer (more details)
Client:
1. Key_x = g^k
   Key_{1-x} = c * g^(-k)
2. Key_x^r = (g^r)^k
   Used to decrypt y_x
Client sends Key_0 to the proxy
Proxy:
1. Key_0^r = (Key_0)^r
   Key_1^r = c^r / Key_0^r
2. y_0 = AES_{Key_0^r}(u)
   y_1 = AES_{Key_1^r}(s * u)
Proxy sends y_0 and y_1 to the client
• Proxy never learns x
• Client can calculate Key_x^r = (g^r)^k easily, but cannot calculate c^r (due to lack of r), which is needed for Key_{1-x}^r = c^r * (g^r)^(-k)
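The key exchange above can be exercised end to end with a toy sketch. Several things are assumptions made for illustration: the tiny group parameters, the class and function names, and the use of a SHA-256-derived XOR pad in place of AES (the Python standard library has no AES). Each y_b is encrypted under Key_b^r, so that Key_x^r decrypts y_x as the slide states.

```python
import hashlib
import secrets
from typing import Tuple

P, Q, G = 2039, 1019, 4  # toy group: P = 2*Q + 1, G has order Q (illustration only)


def pad(key_element: int, index: int) -> int:
    """Stand-in for AES: a 32-bit pad derived from the shared key element."""
    digest = hashlib.sha256(f"{index}:{key_element}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


class Proxy:
    def __init__(self) -> None:
        self.r = secrets.randbelow(Q - 1) + 1
        self.c = pow(G, secrets.randbelow(Q - 1) + 1, P)  # published
        self.g_r = pow(G, self.r, P)                      # published

    def respond(self, key0: int, m0: int, m1: int) -> Tuple[int, int]:
        key0_r = pow(key0, self.r, P)                             # Key_0^r
        key1_r = pow(self.c, self.r, P) * pow(key0_r, -1, P) % P  # c^r / Key_0^r
        return m0 ^ pad(key0_r, 0), m1 ^ pad(key1_r, 1)


class Client:
    def __init__(self, x: int, c: int, g_r: int) -> None:
        self.x = x
        k = secrets.randbelow(Q - 1) + 1
        key_x = pow(G, k, P)                    # Key_x = g^k
        key_other = c * pow(key_x, -1, P) % P   # Key_{1-x} = c * g^(-k)
        self.key0 = key_x if x == 0 else key_other  # only Key_0 is sent
        self.key_x_r = pow(g_r, k, P)           # Key_x^r = (g^r)^k

    def decrypt(self, y0: int, y1: int) -> int:
        return (y0, y1)[self.x] ^ pad(self.key_x_r, self.x)


def transfer(x: int, u: int, s: int) -> int:
    """One per-bit transfer: the client learns u if x = 0, s * u if x = 1."""
    proxy = Proxy()
    client = Client(x, proxy.c, proxy.g_r)
    y0, y1 = proxy.respond(client.key0, u, s * u % Q)
    return client.decrypt(y0, y1)
```

For example, `transfer(0, 7, 11)` yields the bare mask `u = 7`, while `transfer(1, 7, 11)` yields `s * u = 77`. The proxy only ever sees `Key_0`, which is a random-looking group element for either value of x.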
Other usage scenarios
1. Cross-checking certificates
  • e.g., Perspectives [1]
  • Clients = end users
  • Keys = hash of certificates received
2. Distributed ranking
  • e.g., Alexa Toolbar [2]
  • Clients = Web users
  • Keys = hash of web pages
[1] Wendlandt USENIX ’08, [2] www.alexa.com