BlindBox: Deep Packet Inspection over Encrypted Traffic
Justine Sherry, Chang Lan, Raluca Ada Popa, Sylvia Ratnasamy
Deep Packet Inspection
Many middleboxes inspect packet payloads by reconstructing the TCP bytestream.
Intrusion Prevention
Parental Filtering
Exfiltration Detection
Example: Intrusion Prevention
Alice, IPS, Bob
“Rule Generators”
ATTACK
HACKS
183237
RULES
ATTACK
Rule = description of an attack.
Challenge for DPI: Increasing use of Encryption
Alice, Bob
Intrusion Prevention
IPS
ATTACK
HACKS
183237
RULES
Bob does not get the benefits of Intrusion Prevention!
with HTTPS
ATTACK → 0xf34…
????
I want privacy!
Bob
State of the Art Solution? Man in the Middle SSL
IPS
Bob has no privacy from middlebox!
“I am Google” (fake certificate)
SECRET
Bob: Privacy + Functionality
❖ Bob has two very conflicting requirements.
❖ Privacy
❖ In-network functionality
❖ Can he have his cake and eat it too?
Yes!
BlindBox
The first system to allow DPI middleboxes to inspect traffic without decrypting the traffic.
Detects blacklisted keywords in encrypted connections.
Learns virtually nothing about connections that do not contain blacklisted keywords.
BlindBox
Strong Privacy Guarantee
Proof of security: only learns if and where suspicious substrings match.
Practical Network Performance
Achieves forwarding rates comparable to standard IDS deployments.
Wide Range of Functionality
Supports exact-match detection as well as regular expressions and scripted analysis.
Setup
“Rule Generator”
Alice Bob
Threat Model
“Rule Generator”: Generates rules correctly.
Middlebox: Runs detection functionality correctly, but curious to see traffic contents (“honest but curious” model).
Alice and Bob: Aims to protect one honest endpoint from another dishonest endpoint. Does not address two dishonest endpoints.
Users should not learn McAfee’s ruleset.
BlindBox HTTPS: Overview
Alice’s Protocol Stack
Message
SSL Encrypt SSL Decrypt
BlindBox Encrypt
SSL Traffic
ATTACK
HACKS
183237
BLACKLIST
BB Handshake
Encrypted Tokens
HACKS
BlindBox Verify
Bob’s Protocol Stack
One Slide Introduction to BB Handshake
An exchange between the middlebox and the client at the beginning of a new connection.
Middlebox learns AES_K(kw) for every keyword kw in the ruleset, iff the keyword has a signature from the rule generator.
Clients know secret key “K”.
Middlebox and Rule generator know rules.
Middlebox does not learn K.
Clients do not learn rules.
Based on two known techniques: Garbled Circuits and Oblivious Transfer.
[Figure: AES encryption of a blacklisted keyword kw (e.g. HACKS) under the secret key K yields AES_K(kw).]
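The input/output behaviour of the handshake can be sketched as a trusted function. This is only an illustration of what each party ends up with, not of Garbled Circuits or Oblivious Transfer: the function below sees both K and the keyword, which the real handshake is designed to prevent. HMAC-SHA256 stands in for both AES and the rule generator's signature, and the names `prf`, `rg_sign`, and `ideal_handshake` are hypothetical.

```python
import hashlib
import hmac
from typing import Optional

def prf(key: bytes, msg: bytes) -> bytes:
    # Stand-in for AES_key(msg); an assumption made so the sketch needs only the stdlib.
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]

def rg_sign(rg_key: bytes, keyword: bytes) -> bytes:
    # Stand-in for the rule generator's signature on a keyword (a MAC, for brevity).
    return hmac.new(rg_key, b"rule:" + keyword, hashlib.sha256).digest()

def ideal_handshake(client_key: bytes, rg_key: bytes,
                    keyword: bytes, signature: bytes) -> Optional[bytes]:
    # Middlebox output for one keyword: AES_K(kw) iff the signature verifies.
    if not hmac.compare_digest(rg_sign(rg_key, keyword), signature):
        return None
    return prf(client_key, keyword)

K = b"K" * 16        # hypothetical client secret key
RG = b"R" * 16       # hypothetical rule generator key

signed = ideal_handshake(K, RG, b"HACKS", rg_sign(RG, b"HACKS"))
forged = ideal_handshake(K, RG, b"SECRET", b"\x00" * 32)
print("signed keyword accepted:", signed is not None)
print("unsigned keyword accepted:", forged is not None)
```

The second call models a middlebox trying to obtain an encryption of a keyword the rule generator never signed; it gets nothing back.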
Alice
BlindBox Encrypt: A Deterministic Strawman
ATTACK
HACKS
183237
BLACKLIST
Middlebox holds AES_K(ATTACK)
Alice sends: EXAMPLE ATTACK
Tokens: EXAMPL, XAMPLE, AMPLE …, ATTACK
Each token is sent as AES_K(token), e.g. AES_K(EXAMPL)
Bob
Alice
A problem with the strawman design
ATTACK
HACKS
183237
BLACKLIST
I LOVE MUFFINS. WHAT IS YOUR FAVORITE MUFFIN?
MUFFIN
MUFFIN
This term appears twice in the connection!
Deterministic Encryption leaks substring frequency.
Bob
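The strawman and its leak can be reproduced in a few lines. This is a minimal sketch under stated assumptions: HMAC-SHA256 stands in for AES_K so the example needs only the standard library, the 6-character window is inferred from the slide examples, and `det_enc` and `window_tokens` are hypothetical helper names.

```python
import hashlib
import hmac
from collections import Counter

WINDOW = 6  # assumed token length, matching the 6-character examples on the slides

def det_enc(key: bytes, token: str) -> bytes:
    # Stand-in for AES_K(token): deterministic under a fixed key.
    return hmac.new(key, token.encode(), hashlib.sha256).digest()[:16]

def window_tokens(text: str, n: int = WINDOW) -> list[str]:
    # One token per character offset: EXAMPL, XAMPLE, ...
    return [text[i:i + n] for i in range(len(text) - n + 1)]

key = b"K" * 16                       # hypothetical session key
blacklist = {det_enc(key, "ATTACK")}  # the middlebox holds only encrypted keywords

msg = "I LOVE MUFFINS. WHAT IS YOUR FAVORITE MUFFIN?"
ciphertexts = [det_enc(key, t) for t in window_tokens(msg)]

# Detection works: no token of this innocent message is blacklisted.
matches = [i for i, c in enumerate(ciphertexts) if c in blacklist]

# The leak: identical tokens produce identical ciphertexts, so the
# middlebox can count repetitions without knowing the plaintext.
repeats = {c: n for c, n in Counter(ciphertexts).items() if n > 1}
print("blacklist hits:", matches)
print("distinct ciphertexts seen more than once:", len(repeats))
```

The middlebox never decrypts anything here, yet it still observes that some substring occurred twice, which is the frequency leak the slide describes.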
Searchable Encryption Approaches
Approach                                        Security   Detection Speed
Deterministic Searchable Encryption             Weak       Fast: O(log(#rules))
Randomized Searchable Enc. [Song et al. ’00]    Strong     Slow: O(#rules)
Deterministic: Store rules in a precomputed search tree, search for each token in tree.
Randomized: Compare each randomized token against EVERY keyword.
token =? f(rule1,salt)
token =? f(rule2,salt)
token =? f(rule3,salt)…
BlindBox: speed of deterministic enc, security of randomization
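The difference in detection cost can be made concrete with a small sketch. It is an illustration only: HMAC-SHA256 stands in for the encryption function f, the randomized scheme is a simplified salt-per-token construction rather than the exact scheme of Song et al., and all function names are hypothetical.

```python
import bisect
import hashlib
import hmac
import os

def prf(key: bytes, msg: bytes) -> bytes:
    # Stand-in for a keyed encryption function such as AES.
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]

key = b"K" * 16                                  # hypothetical session key
rules = [b"ATTACK", b"HACKS", b"183237"]
rule_keys = [prf(key, r) for r in rules]         # what the middlebox holds

# Deterministic: ciphertexts are fixed, so they can be indexed once
# and each token costs one binary search, O(log(#rules)).
det_index = sorted(rule_keys)

def det_match(token_ct: bytes) -> bool:
    i = bisect.bisect_left(det_index, token_ct)
    return i < len(det_index) and det_index[i] == token_ct

# Randomized: every token carries a fresh salt, so nothing can be
# precomputed and each token is tested against every rule, O(#rules).
def rand_encrypt(token: bytes) -> tuple[bytes, bytes]:
    salt = os.urandom(16)
    return salt, prf(prf(key, token), salt)

def rand_match(salt: bytes, token_ct: bytes) -> tuple[bool, int]:
    comparisons = 0
    for rk in rule_keys:
        comparisons += 1
        if hmac.compare_digest(prf(rk, salt), token_ct):
            return True, comparisons
    return False, comparisons

salt, ct = rand_encrypt(b"MUFFIN")
print("deterministic hit on HACKS:", det_match(prf(key, b"HACKS")))
print("randomized (matched, comparisons):", rand_match(salt, ct))
```

For a token that matches no rule, the randomized check always pays for a comparison against the full ruleset, which is the O(#rules) cost named in the table.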
Alice
BlindBox Encrypt: At the Client
I LOVE MUFFINS. WHAT IS YOUR FAVORITE MUFFIN?
Tokens: {“I LOVE”, “ LOVE ”, “LOVE M”, “OVE MU”, “VE MUF”, “E MUFF”, “ MUFFI”, “MUFFIN”, …}
At session start, Alice pre-computes an initial salt value salt0.
For each token, Alice encrypts the token with secret key K: AES_K(token).
Alice then encrypts the salt with AES_K(token) as the key: enc(token) = AES_{AES_K(token)}(salt0).
BlindBox Encrypt: At the Client
I LOVE MUFFINS. WHAT IS YOUR FAVORITE MUFFIN?
Token “I LOVE”: Alice sends AES_{AES_K(I LOVE)}(salt0).
Token “ LOVE ”: Alice sends AES_{AES_K( LOVE )}(salt0).
Use same salt for all tokens.
Token “MUFFIN” (first occurrence): Alice sends AES_{AES_K(MUFFIN)}(salt0).
Token “MUFFIN” (second occurrence): Alice sends AES_{AES_K(MUFFIN)}(salt1).
Set salt1 = salt0 + 1
Same encryption value never appears twice!
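The client-side steps can be sketched as follows. This is a minimal sketch, not the authors' implementation: HMAC-SHA256 stands in for AES, the per-token occurrence counter is one way to realise "salt1 = salt0 + 1" for a repeated token, and the class name `BlindBoxSender` and the integer salt encoding are assumptions.

```python
import hashlib
import hmac
from collections import defaultdict

def prf(key: bytes, msg: bytes) -> bytes:
    # Stand-in for AES_key(msg).
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]

class BlindBoxSender:
    def __init__(self, key: bytes, salt0: int):
        self.key = key
        self.salt0 = salt0
        self.counts = defaultdict(int)  # occurrences seen so far, per token

    def encrypt(self, token: str) -> bytes:
        # First occurrence uses salt0, the next one salt0 + 1, and so on.
        salt = self.salt0 + self.counts[token]
        self.counts[token] += 1
        token_key = prf(self.key, token.encode())        # AES_K(token)
        return prf(token_key, salt.to_bytes(8, "big"))   # AES_{AES_K(token)}(salt)

msg = "I LOVE MUFFINS. WHAT IS YOUR FAVORITE MUFFIN?"
tokens = [msg[i:i + 6] for i in range(len(msg) - 5)]     # assumed 6-character window

sender = BlindBoxSender(key=b"K" * 16, salt0=1000)       # hypothetical key and salt
ciphertexts = [sender.encrypt(t) for t in tokens]
print("tokens sent:", len(ciphertexts))
print("distinct ciphertexts:", len(set(ciphertexts)))
```

Because a repeated token is encrypted under a different salt each time, the two MUFFIN tokens no longer produce the same value on the wire.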
BlindBox Encrypt: Middlebox Setup
After the handshake, the client sends salt0 to the middlebox.
BLACKLIST: ATTACK, HACKS, 18724
For each keyword, the middlebox computes AES_{AES_K(keyword)}(salt0).
Store AES_K(keyword) and salt in tree.
[Figure: search tree rooted at “root”; each node holds the encrypted value of one blacklisted keyword, e.g. 0x1af…, 0xe47…, 0xe13…]
BlindBox can now detect matches in O(log(#rules))
On match:
* Respond to match
* Average rule requires three matches to detect attack.
* Then update tree.
To Update: replace the matched entry with AES_{AES_K(HACKS)}(salt + 1), e.g. 0xe46…
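The middlebox side can be sketched in the same style. Assumptions are stated up front: HMAC-SHA256 stands in for AES, a Python dict stands in for the search tree on the slides (so lookups here are hash-based rather than O(log(#rules))), and the class name `BlindBoxMiddlebox` is hypothetical. In the demo the values AES_K(keyword) are computed directly from K for brevity; in the protocol the middlebox obtains them through the handshake and never sees K.

```python
import hashlib
import hmac
from typing import Optional

def prf(key: bytes, msg: bytes) -> bytes:
    # Stand-in for AES_key(msg).
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]

def enc_salt(token_key: bytes, salt: int) -> bytes:
    return prf(token_key, salt.to_bytes(8, "big"))   # AES_{AES_K(kw)}(salt)

class BlindBoxMiddlebox:
    def __init__(self, encrypted_keywords: dict[str, bytes], salt0: int):
        # encrypted_keywords maps each rule keyword to AES_K(keyword).
        self.index = {}  # expected ciphertext -> (keyword, AES_K(keyword), salt)
        for kw, token_key in encrypted_keywords.items():
            self._insert(kw, token_key, salt0)

    def _insert(self, kw: str, token_key: bytes, salt: int) -> None:
        self.index[enc_salt(token_key, salt)] = (kw, token_key, salt)

    def inspect(self, token_ct: bytes) -> Optional[str]:
        entry = self.index.pop(token_ct, None)
        if entry is None:
            return None                               # no rule matched this token
        kw, token_key, salt = entry
        self._insert(kw, token_key, salt + 1)         # update: expect salt + 1 next
        return kw

K = b"K" * 16                                         # known to the endpoints only
mb = BlindBoxMiddlebox({"HACKS": prf(K, b"HACKS")}, salt0=1000)

first = enc_salt(prf(K, b"HACKS"), 1000)              # sender's first HACKS token
second = enc_salt(prf(K, b"HACKS"), 1001)             # sender's second HACKS token
print(mb.inspect(first), mb.inspect(second), mb.inspect(first))
```

After each match the stored entry is re-keyed with the next salt, so the middlebox stays in step with a sender that increments the salt for repeated tokens.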
Making BlindBox Encrypt Secure and Fast
Approach                                        Security   Detection Speed
Deterministic Searchable Encryption             Weak       Fast: O(log(#rules))
Randomized Searchable Enc. [Song et al. ’00]    Strong     Slow: O(#rules)
BlindBox                                        Strong     Fast: O(log(#rules))
BlindBox HTTPS: Recap
BobAlice
Message
SSL Encrypt SSL Decrypt
BlindBox Encrypt
SSL Traffic
ATTACK
HACKS
183237
BLACKLIST
BB Handshake
Encrypted Tokens
BlindBox Verify
MB receives encrypted rules
Tokens are random, but MB can still do fast, exact-match lookups.
HACKS
Supporting Regular Expressions
Today: discussed exact match detection.
In paper: how to handle regular expressions and scripts.
Key Idea: “Probable Cause Privacy”
ATTACK COORDINATES ARE (37.4225, 122.1653)
ATTACK
HACKS
183237
BLACKLIST
A weaker privacy model which allows scripted analysis.
Alice
See our paper for:
Optimizations to reduce bandwidth overhead.
Details on BB Handshake, Garbled Circuits, and Oblivious Transfer.
Detailed evaluation and comparison against alternative crypto schemes.
“Exact Match” vs “Probable Cause” Privacy Models.
See our paper or come chat with me after the session!
Evaluation Highlights: Functionality & Performance
Evaluating Functionality
Dataset                          Without probable cause   With probable cause
Document watermarking            100%                     100%
Parental filtering               100%                     100%
Snort community (HTTP)           67%                      100%
Snort Emerging Threats (HTTP)    42%                      100%
StoneSoft (McAfee) IDS           40%                      100%
LastLine IDS                     29%                      100%
Performance Highlights
Forwarding Rate
186Mbps
Comparable to Snort in existing IDS deployments.
Setup Time
97s for LastLine
Reasonable for long-lived/persistent connections ONLY
Page Load Times
+15-100%
Within normal variation depending on conn. quality.
3 orders of magnitude faster than Searchable Enc.
10 orders of magnitude faster than Functional Encryption.
Conclusion
BlindBox: the first system to allow DPI middleboxes to inspect traffic without
decrypting the traffic.
Future work: Can we generalize BlindBox to a protocol to support all middleboxes without
sacrificing privacy?
contact: [email protected] | @justinesherry
In Comparison: mcTLS and BlindBox
mcTLS:
Allows MB to read arbitrary values from fields client chooses to reveal.
BlindBox:
Allows MB to read suspicious keywords only from entire bytestream.
THIS IS AN EXAMPLE ATTACK MESSAGE!
THIS IS AN EXAMPLE ATTACK MESSAGE!
THIS IS AN EXAMPLE INNOCENT MESSAGE!
THIS IS AN EXAMPLE ATTACK MESSAGE!
Download Times
[Figure: Page Load Time (s) for YouTube, AirBnB, CNN, NYTimes, and Gutenberg, in two panels (Home Networks, Datacenter Network). Series: Whole Page: BB+TLS, Whole Page: TLS, Text/Code: BB+TLS, Text/Code: TLS.]
Bandwidth Inflation
[Figure: CDF of Tokenization Overhead Ratio. Series: Delim Tokenization : Plaintext, Window Tokenization : Plaintext, Delim Tokenization : gzip, Window Tokenization : gzip. Annotated values: 2.5x, 2.7x, 14x.]
Can’t functional encryption solve this?
• Existing schemes don’t fit our needs:
  • Wrong security model: all parties learn all of the middlebox rules
  • Missing functionality: no approach to address rules which are regular expressions
  • Prohibitive performance: Performing IDS detection over a single packet requires over 1 day of computation on our servers!*
*J. Katz, A. Sahai, B. Waters. “Predicate Encryption Supporting Disjunctions, Polynomial Equations, and Inner Products.” EUROCRYPT 2008.
Microbenchmarks
                          Vanilla HTTPS   FE Strawman   Searchable Strawman   BlindBox HTTPS
Client
  Encrypt (128 bits)      13ns            70ms          2.7µs                 69ns
  Encrypt (1500 bytes)    3µs             15s           257µs                 90µs
  Setup (1 Keyword)       73ms            N/A           N/A                   588ms
  Setup (3K Rules)        73ms            N/A           N/A                   97s
MB Detection
  1 Rule, 1 Token         NP              170ms         1.9µs                 20ns
  1 Rule, 1 Packet        NP              36s           52µs                  5µs
  3K Rules, 1 Token       NP              8.3 minutes   5.6ms                 137ns
  3K Rules, 1 Packet      NP              5.7 days      157ms                 33µs
Table 2: Connection and detection micro-benchmarks comparing Vanilla HTTPS, the functional encryption (FE) strawman, the searchable strawman, and BlindBox HTTPS. NP stands for not possible. The average rule includes three keywords.
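The speedup figures quoted in the talk can be checked against Table 2 with a little arithmetic. The sketch below uses the “3K Rules, 1 Packet” detection row and the “Encrypt (1500 bytes)” row; choosing those rows as the basis for the claims is an assumption made for this example.

```python
import math

US, MS, DAY = 1e-6, 1e-3, 86400.0   # unit conversions to seconds

# Middlebox detection, 3K rules, 1 packet (Table 2).
fe_strawman = 5.7 * DAY
searchable_strawman = 157 * MS
blindbox = 33 * US

orders_vs_searchable = math.log10(searchable_strawman / blindbox)
orders_vs_fe = math.log10(fe_strawman / blindbox)

# Client-side encryption of a 1500-byte packet (Table 2).
encrypt_ratio = (90 * US) / (3 * US)

print(f"vs searchable strawman: 10^{orders_vs_searchable:.1f}")
print(f"vs FE strawman:         10^{orders_vs_fe:.1f}")
print(f"BlindBox vs HTTPS encryption cost: {encrypt_ratio:.0f}x")
```

Under that reading, the table is consistent with roughly 3 and 10 orders of magnitude for detection and with a 30× per-packet encryption cost relative to standard HTTPS.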
Figure 4: Download time for TLS and BlindBox (BB) + TLS at 1Gbps × 10ms.
How long does the initial handshake take with the middlebox? The initial handshake to perform obfuscated rule encryption runs in time proportional to the number of rules. In the datasets we worked with, the average Protocol II rule had slightly more than 3 keywords; a typical 3000 rule IDS rule set contains between 9-10k keywords. The total client-side time required for 10k keywords was 97 seconds; for 1000 keywords, setup time was 9.5s. In a smaller ruleset of 10 or 100 keywords (which is typical in a watermark detection exfiltration device), setup ran in 650ms and 1.6 seconds, respectively. These values are dependent on the clock speed of the CPU (to generate the garbled circuits) and the network bandwidth and latency (to transmit the circuits from client to sender). Our servers have 2.6GHz cores; we assumed a middlebox on a local area network near the client with a 100µs RTT between the two and a 1Gbps connection. Garbling a circuit took 1042µs per circuit; each garbled circuit transmission is 599KB.
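A back-of-envelope estimate shows how the per-circuit figures scale with the number of keywords. It rests on assumptions that the text does not state: one garbled circuit per keyword, 1 KB = 1000 bytes, a fully utilised 1Gbps link, and no cost for latency, oblivious transfer, or anything else. It is therefore only a partial model of the setup time, not a reproduction of the measurement.

```python
GARBLE_S = 1042e-6          # seconds to garble one circuit (from the text)
CIRCUIT_BYTES = 599 * 1000  # bytes per garbled circuit, assuming 1 KB = 1000 bytes
LINK_BPS = 1e9              # 1Gbps connection (from the text)

def setup_estimate(keywords: int) -> tuple[float, float]:
    # Assumes one garbled circuit per keyword; returns (garbling, transfer) in seconds.
    garbling = keywords * GARBLE_S
    transfer = keywords * CIRCUIT_BYTES * 8 / LINK_BPS
    return garbling, transfer

for n in (1_000, 10_000):
    garbling, transfer = setup_estimate(n)
    print(f"{n} keywords: garbling {garbling:.2f}s + transfer {transfer:.2f}s "
          f"= {garbling + transfer:.2f}s")
```

Both terms grow linearly with the keyword count, which matches the statement that the handshake runs in time proportional to the number of rules. The estimate stays below the measured 9.5s and 97s, so it should be read as covering only part of the measured cost.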
Neither strawman has an appropriate setup phase that meets the requirement of not making the rules visible to the endpoints. However, one can extend these strawmen with BlindBox’s obfuscated rule encryption technique, and encrypt the rules using garbled circuits. In this case, for the scheme of Song et al., the setup cost would be similar to the one of BlindBox because their scheme also encrypts the rule keywords with AES. For the scheme of Katz et al., the setup would be much slower because one needs garbled circuits for modular exponentiation, which are huge. Based on the size of such circuits reported in the literature [16], we can compute a generous lower bound on the size of the garbled circuits and on the setup cost for this strawman: it is at least 1.8 · 10^3 times larger/slower than the setup in BlindBox.
How long are page downloads with BlindBox, excluding the setup (handshake) cost? Figure 3 shows page download times using our “typical end user” testbed with 20Mbps links.
In this figure, we show five popular websites: YouTube, AirBnB, CNN, The New York Times, and Project Gutenberg. The data shown represents the post-handshake (persistent connection) page download time. YouTube and AirBnB load video, and hence have a large amount of binary data which is not tokenized. CNN and The New York Times have a mixture of data, and Project Gutenberg is almost entirely text. We show results for both the amount of time to download the page including all video and image content, as well as the amount of time to load only the Text/Code of the page. The overheads when downloading the whole page are at most 2×; for pages with large amount of binary data like YouTube and AirBnB, the overhead was only 10-13%. Load times for Text/Code only – which are required to actually begin rendering the page for the user – are impacted more strongly, with penalties as high as 3× and a worst case of about 2×.
What is the computational overhead of BlindBox encryption, and how does this overhead impact page load times? While the encryption costs are not noticeable in the page download times observed over the “typical client” network configuration, we immediately see the cost of encryption overhead when the available link capacity increases to 1Gbps in Figure 4 – at this point, we see a performance overhead of as much as 16× relative to the baseline SSL download time. For both runs (Figs. 3 and 4), we observed that the CPU was almost continuously fully utilized to transfer data during data transmission. At 20Mbps, the encryption cost is not noticeable as the CPU can continue producing data at around the link rate; at 1Gbps, transmission with BlindBox stalls relative to SSL, as the BlindBox sender cannot encrypt fast enough to keep up with the line rate.
This result is unsurprising given the results in Table 2, showing that BlindBox takes 30× longer to encrypt a packet than standard HTTPS. This overhead can be mitigated with extra cores; while we ran with only one core per connection, tokenization can easily be parallelized.
What is the bandwidth overhead of transmitting encrypted tokens for a typical web page? Minimizing bandwidth overhead is key to client performance: less data transmitted means less cost, faster transfer times, and faster detection times. The bandwidth overhead in BlindBox depends on the number of tokens produced. The number of encrypted tokens varies widely depending on three parameters of the page being loaded: what fraction of bytes are text/code which must