
BlindBox: Deep Packet Inspection over Encrypted Traffic

Justine Sherry, Chang Lan, Raluca Ada Popa, Sylvia Ratnasamy

Deep Packet Inspection

Many middleboxes inspect packet payloads by reconstructing the TCP bytestream.

Intrusion Prevention

Parental Filtering

Exfiltration Detection

Example: Intrusion Prevention

Alice IPS Bob

“Rule Generators”

ATTACK

HACKS

183237

RULES

ATTACK

Rule = description of an attack.

Challenge for DPI: Increasing use of Encryption

Alice Bob

Intrusion Prevention

IPS

ATTACK

HACKS

183237

RULES

Bob does not get the benefits of Intrusion Prevention!

with HTTPS

ATTACK 0xf34…

????

I want privacy!

Bob

State of the Art Solution? Man in the Middle SSL

IPS

Bob has no privacy from middlebox!

“I am Google” (fake certificate)

SECRET

Bob: Privacy + Functionality

❖ Bob has two very conflicting requirements.

❖ Privacy

❖ In-network functionality

❖ Can he have his cake and eat it too?

BlindBox

The first system to allow DPI middleboxes to inspect traffic without decrypting the traffic.

Detects blacklisted keywords in encrypted connections.

Learns virtually nothing about connections that do not contain blacklisted keywords.

BlindBox

Strong Privacy Guarantee

Proof of security: only learns if and where suspicious substrings match.

Practical Network Performance

Achieves forwarding rates comparable to standard IDS deployments.

Wide Range of Functionality

Supports exact-match detection as well as regular expressions and scripted analysis.

Setup

“Rule Generator”

Alice Bob

Threat Model

“Rule Generator”: Generates rules correctly.

Middlebox: Runs detection functionality correctly, but is curious to see traffic contents (“honest but curious” model).

Endpoints (Alice, Bob): BlindBox aims to protect one honest endpoint from another, dishonest endpoint. It does not address two dishonest endpoints.

Users should not learn McAfee’s ruleset.

BlindBox HTTPS: Overview

Alice’s Protocol Stack

Message

SSL Encrypt SSL Decrypt

BlindBox Encrypt

SSL Traffic

ATTACK

HACKS

183237

BLACKLIST

BB Handshake

Encrypted Tokens

HACKS

BlindBox Verify

Bob’s Protocol Stack

One Slide Introduction to BB Handshake

An exchange between the middlebox and the client at the beginning of a new connection.

Clients know secret key “K”.

Middlebox and rule generator know rules.

Middlebox learns AES_K(kw) for every keyword kw in the ruleset, iff the keyword has a signature from the rule generator.

Middlebox does not learn K.

Clients do not learn rules.

Based on two known techniques: Garbled Circuits and Oblivious Transfer.

Diagram: keyword → AES encryption → AES_K(kw)

Alice

BlindBox Encrypt: A Deterministic Strawman

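BLACKLIST: ATTACK, HACKS, 183237

The middlebox holds AES_K(ATTACK), and likewise for the other keywords.

Alice sends Bob: EXAMPLE ATTACK

Tokens: EXAMPL, XAMPLE, AMPLE , …, ATTACK

Alice sends AES_K(EXAMPL), AES_K(XAMPLE), …, AES_K(ATTACK); the middlebox compares each against its encrypted keywords.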
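The strawman can be sketched in a few lines of Python. This is only an illustration under stated assumptions: HMAC-SHA256 (truncated to 16 bytes) stands in for AES_K because the standard library has no AES, the 6-character window is inferred from the slide's tokens, and `window_tokens`, `det_enc`, `strawman_detect`, and the key value are hypothetical names, not part of BlindBox.

```python
import hashlib
import hmac

WINDOW = 6  # assumption: window length inferred from the 6-character slide tokens


def window_tokens(text, n=WINDOW):
    """Return every n-character substring, sliding one character at a time."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def det_enc(key, token):
    """Deterministic stand-in for AES_K(token): HMAC-SHA256, truncated to 16 bytes."""
    return hmac.new(key, token.encode(), hashlib.sha256).digest()[:16]


def strawman_detect(key, message, blacklist):
    """Return (offset, keyword) for each window whose ciphertext equals a rule's."""
    # In this sketch only keywords of exactly WINDOW characters can ever match.
    enc_rules = {det_enc(key, kw): kw for kw in blacklist}
    hits = []
    for offset, token in enumerate(window_tokens(message)):
        ciphertext = det_enc(key, token)   # what Alice would send
        if ciphertext in enc_rules:        # what the middlebox would check
            hits.append((offset, enc_rules[ciphertext]))
    return hits


KEY = b"hypothetical-session-key"
BLACKLIST = ["ATTACK", "HACKS", "183237"]
print(window_tokens("EXAMPLE ATTACK")[:3])
print(strawman_detect(KEY, "EXAMPLE ATTACK", BLACKLIST))
```

The middlebox side only ever compares ciphertexts, which is what makes the lookup fast and also what causes the leak discussed next.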

Alice

A problem with the strawman design

ATTACK

HACKS

183237

BLACKLIST

I LOVE MUFFINS. WHAT IS YOUR FAVORITE MUFFIN?

MUFFIN

MUFFIN

This term appears twice in the connection!

Deterministic Encryption leaks substring frequency.
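The leak is easy to reproduce. A minimal sketch, again using HMAC-SHA256 as a stand-in for deterministic AES and a 6-character window (both assumptions, as are the function and key names): an observer who sees only ciphertexts can still count repeats.

```python
import hashlib
import hmac
from collections import Counter


def det_enc(key, token):
    """Deterministic stand-in for AES_K(token): HMAC-SHA256, truncated to 16 bytes."""
    return hmac.new(key, token.encode(), hashlib.sha256).digest()[:16]


def repeated_ciphertexts(key, message, n=6):
    """Return {ciphertext: count} for every ciphertext seen more than once."""
    tokens = [message[i:i + n] for i in range(len(message) - n + 1)]
    counts = Counter(det_enc(key, t) for t in tokens)
    return {ct: c for ct, c in counts.items() if c > 1}


KEY = b"hypothetical-session-key"
MESSAGE = "I LOVE MUFFINS. WHAT IS YOUR FAVORITE MUFFIN?"
leaks = repeated_ciphertexts(KEY, MESSAGE)

# The middlebox cannot read "MUFFIN", but it can see that one value occurs twice.
print(leaks[det_enc(KEY, "MUFFIN")])
```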

Bob

Searchable Encryption Approaches

Approach                                        Security   Detection Speed
Deterministic Searchable Encryption             Weak       Fast: O(log(#rules))
Randomized Searchable Enc. [Song et al. ’00]    Strong     Slow: O(#rules)

Deterministic: store rules in a precomputed search tree, and search for each token in the tree.

Randomized: compare each randomized token against EVERY keyword.

token =? f(rule1, salt)

token =? f(rule2, salt)

token =? f(rule3, salt) …
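The per-rule comparison above can be illustrated with a simplified randomized scheme. This is not Song et al.'s actual construction; it is a hypothetical sketch in which each token gets a fresh random salt and `f` is HMAC-SHA256, so the middlebox cannot precompute anything and has to try every rule.

```python
import hashlib
import hmac
import os


def prf(key, data):
    """Stand-in keyed function f: HMAC-SHA256 truncated to 16 bytes (not AES)."""
    return hmac.new(key, data, hashlib.sha256).digest()[:16]


def randomized_token(k, token):
    """Client side: a fresh random salt per token, sent alongside the ciphertext."""
    salt = os.urandom(16)
    return salt, prf(prf(k, token.encode()), salt)


def linear_match(rule_keys, salt, ciphertext):
    """Middlebox side: recompute f(rule, salt) for each rule until one matches."""
    comparisons = 0
    for keyword, rule_key in rule_keys.items():
        comparisons += 1
        if hmac.compare_digest(prf(rule_key, salt), ciphertext):
            return keyword, comparisons
    return None, comparisons


K = b"hypothetical-session-key"
RULES = ["ATTACK", "HACKS", "183237"]
# Computed directly from K only for this demo.
RULE_KEYS = {kw: prf(K, kw.encode()) for kw in RULES}

salt, ct = randomized_token(K, "MUFFIN")
miss = linear_match(RULE_KEYS, salt, ct)   # no rule matches: every rule was tried
salt, ct = randomized_token(K, "HACKS")
hit = linear_match(RULE_KEYS, salt, ct)
print(miss, hit)
```

A non-matching token, which is the common case, always costs one comparison per rule; that is the O(#rules) entry in the table.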

BlindBox: speed of deterministic enc, security of randomization


BlindBox Encrypt: At the Client

Alice

Tokens: {"I LOVE", " LOVE ", "LOVE M", "OVE MU", "VE MUF", "E MUFF", " MUFFI", "MUFFIN", …}

At session start, Alice pre-computes an initial salt value salt0.

For each token, Alice computes AES_K(token), encrypting the token with secret key K.

Alice then encrypts the salt with AES_K(token) as the key: enc(token) = AES_{AES_K(token)}(salt).

"I LOVE" → AES_{AES_K("I LOVE")}(salt0)

" LOVE " → AES_{AES_K(" LOVE ")}(salt0)

Use same salt for all tokens.

"MUFFIN" (first occurrence) → AES_{AES_K("MUFFIN")}(salt0)

"MUFFIN" (second occurrence) → AES_{AES_K("MUFFIN")}(salt1)

Set salt1 = salt0 + 1

Same encryption value never appears twice!
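A minimal sketch of the client-side step, assuming HMAC-SHA256 as a stand-in for both AES uses and an integer salt encoded in 8 bytes. The function names, key, and salt value are hypothetical. Each token starts at salt0, and a repeated token uses salt0 + 1, salt0 + 2, and so on.

```python
import hashlib
import hmac


def prf(key, data):
    """Stand-in for AES under `key`: HMAC-SHA256 truncated to 16 bytes."""
    return hmac.new(key, data, hashlib.sha256).digest()[:16]


def client_encrypt(k, tokens, salt0):
    """Return one ciphertext per token; repeats of a token get incremented salts."""
    seen = {}  # token -> number of times already sent in this session
    out = []
    for token in tokens:
        token_key = prf(k, token.encode())        # AES_K(token)
        salt = salt0 + seen.get(token, 0)         # salt0, then salt0 + 1, ...
        out.append(prf(token_key, salt.to_bytes(8, "big")))
        seen[token] = seen.get(token, 0) + 1
    return out


K = b"hypothetical-session-key"
SALT0 = 1000  # hypothetical initial salt
cts = client_encrypt(K, ["MUFFIN", "I LOVE", "MUFFIN"], SALT0)
print(len(set(cts)))  # both MUFFIN ciphertexts differ, so all three are distinct
```

Because the salt sequence is predictable from salt0, the receiver of salt0 can compute the next expected ciphertext for any keyword it holds AES_K(keyword) for, and for nothing else.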

BlindBox Encrypt: Middlebox Setup

After the handshake, the client sends salt0 to the middlebox.

BLACKLIST: 18724, HACKS, ATTACK

For each keyword, the middlebox computes AES_{AES_K(keyword)}(salt0), e.g. 0xae2…, 0x1af…, 0xe13…

Store AES_K(keyword) and salt in tree.

BlindBox can now detect matches in O(log(#rules))

On match:*

Respond to match.

Then update tree.

* Average rule requires three matches to detect attack.

To update: recompute AES_{AES_K(keyword)}(salt + 1) for the matched keyword (HACKS in the slide, giving 0xe46…) and replace its old entry in the tree.
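The lookup-then-update loop might look like the sketch below. Several things here are assumptions rather than BlindBox's implementation: a Python dict stands in for the search tree, HMAC-SHA256 stands in for AES, the class and function names are hypothetical, and the per-keyword keys are derived directly from K only so the demo can run (in the real protocol the middlebox obtains AES_K(keyword) through the handshake and never sees K).

```python
import hashlib
import hmac


def prf(key, data):
    """Stand-in for AES under `key`: HMAC-SHA256 truncated to 16 bytes."""
    return hmac.new(key, data, hashlib.sha256).digest()[:16]


def salted(token_key, salt):
    """Stand-in for AES_{AES_K(token)}(salt) with an integer salt."""
    return prf(token_key, salt.to_bytes(8, "big"))


def client_encrypt(k, tokens, salt0):
    """Client side: repeats of a token use salt0 + 1, salt0 + 2, ..."""
    seen, out = {}, []
    for token in tokens:
        out.append(salted(prf(k, token.encode()), salt0 + seen.get(token, 0)))
        seen[token] = seen.get(token, 0) + 1
    return out


class Middlebox:
    def __init__(self, rule_keys, salt0):
        self.rule_keys = rule_keys                    # keyword -> AES_K(keyword)
        self.salts = {kw: salt0 for kw in rule_keys}  # next expected salt per keyword
        # A dict stands in for the search tree described on the slides.
        self.table = {salted(rk, salt0): kw for kw, rk in rule_keys.items()}

    def inspect(self, ciphertext):
        """Return the matched keyword or None; on a match, re-key that entry."""
        keyword = self.table.pop(ciphertext, None)
        if keyword is None:
            return None
        self.salts[keyword] += 1                      # salt + 1
        new_ct = salted(self.rule_keys[keyword], self.salts[keyword])
        self.table[new_ct] = keyword
        return keyword


K = b"hypothetical-session-key"
RULES = ["18724", "HACKS", "ATTACK"]
rule_keys = {kw: prf(K, kw.encode()) for kw in RULES}  # demo shortcut only
mb = Middlebox(rule_keys, 1000)
stream = client_encrypt(K, ["ATTACK", "MUFFIN", "ATTACK"], 1000)
hits = [mb.inspect(ct) for ct in stream]
print(hits)
```

Re-keying on every match keeps the middlebox's expected value in step with the client's salt counter, so a second occurrence of the same keyword is still detected even though its ciphertext differs from the first.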

Making BlindBox Encrypt Secure and Fast

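Approach                                        Security   Detection Speed
Deterministic Searchable Encryption             Weak       Fast: O(log(#rules))
Randomized Searchable Enc. [Song et al. ’00]    Strong     Slow: O(#rules)
BlindBox                                        Strong     Fast: O(log(#rules))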

BlindBox HTTPS: Recap

Alice Bob

Message

SSL Encrypt SSL Decrypt

BlindBox Encrypt

SSL Traffic

ATTACK

HACKS

183237

BLACKLIST

BB Handshake

Encrypted Tokens

BlindBox Verify

MB receives encrypted rules

Tokens are random, but MB can still do fast, exact-match lookups.

HACKS

Supporting Regular Expressions

Today: discussed exact match detection.

In paper: how to handle regular expressions and scripts.

Key Idea: “Probable Cause Privacy”

ATTACK COORDINATES ARE (37.4225, 122.1653)

ATTACK

HACKS

183237

BLACKLIST

A weaker privacy model which allows scripted analysis.

Alice

See our paper for:

Optimizations to reduce bandwidth overhead.

Details on BB Handshake, Garbled Circuits, and Oblivious Transfer.

Detailed evaluation and comparison against alternative crypto schemes.

“Exact Match” vs “Probable Cause” Privacy Models.

See our paper or come chat with me after the session!

Evaluation Highlights: Functionality & Performance

Evaluating Functionality

Dataset                          Without probable cause   With probable cause
Document watermarking            100%                     100%
Parental filtering               100%                     100%
Snort community (HTTP)           67%                      100%
Snort Emerging Threats (HTTP)    42%                      100%
StoneSoft (McAfee) IDS           40%                      100%
LastLine IDS                     29%                      100%

Performance Highlights

Forwarding Rate

186Mbps

Comparable to Snort in existing IDS deployments.

Setup Time

97s for LastLine

Reasonable for long-lived/persistent connections ONLY

Page Load Times

+15-100%

Within normal variation, depending on connection quality.

3 orders of magnitude faster than Searchable Encryption.

10 orders of magnitude faster than Functional Encryption.

Conclusion

BlindBox: the first system to allow DPI middleboxes to inspect traffic without

decrypting the traffic.

Future work: Can we generalize BlindBox to a protocol to support all middleboxes without

sacrificing privacy?

contact: [email protected] | @justinesherry


In Comparison: mcTLS and BlindBox

mcTLS: Allows MB to read arbitrary values from fields the client chooses to reveal.

BlindBox: Allows MB to read suspicious keywords only, from the entire bytestream.

THIS IS AN EXAMPLE ATTACK MESSAGE!


THIS IS AN EXAMPLE INNOCENT MESSAGE!


Download Times

[Figure: two plots of page load time (s) across CNN, NYTimes, YouTube, AirBnB, and Gutenberg, labeled Home Networks and Datacenter Network. Series: Whole Page: BB+TLS, Whole Page: TLS, Text/Code: BB+TLS, Text/Code: TLS.]

Bandwidth Inflation

[Figure: CDF of tokenization overhead ratio. Series: Delim Tokenization : Plaintext, Window Tokenization : Plaintext, Delim Tokenization : gzip, Window Tokenization : gzip. Annotated values: 2.5x, 2.7x, 14x.]

Can’t functional encryption solve this?

Existing schemes don’t fit our needs:

• Wrong security model: all parties learn all of the middlebox rules.

• Missing functionality: no approach to address rules which are regular expressions.

• Prohibitive performance: performing IDS detection over a single packet requires over 1 day of computation on our servers!*

*J. Katz, A. Sahai, B. Waters. “Predicate Encryption Supporting Disjunctions, Polynomial Equations, and Inner Products.” EUROCRYPT 2008.

Microbenchmarks

                          Vanilla HTTPS   FE Strawman   Searchable Strawman   BlindBox HTTPS
Client
  Encrypt (128 bits)      13ns            70ms          2.7µs                 69ns
  Encrypt (1500 bytes)    3µs             15s           257µs                 90µs
  Setup (1 Keyword)       73ms            N/A           N/A                   588ms
  Setup (3K Rules)        73ms            N/A           N/A                   97s
MB (Detection)
  1 Rule, 1 Token         NP              170ms         1.9µs                 20ns
  1 Rule, 1 Packet        NP              36s           52µs                  5µs
  3K Rules, 1 Token       NP              8.3 minutes   5.6ms                 137ns
  3K Rules, 1 Packet      NP              5.7 days      157ms                 33µs

Table 2: Connection and detection micro-benchmarks comparing Vanilla HTTPS, the functional encryption (FE) strawman, the searchable strawman, and BlindBox HTTPS. NP stands for not possible. The average rule includes three keywords.
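As a rough worked example, the 1500-byte client encryption times in Table 2 can be turned into a per-core sending rate. This is a back-of-envelope sketch that assumes encryption is the only per-packet cost and that one core serves the connection; the function name is hypothetical.

```python
def per_core_mbps(packet_bytes, encrypt_seconds):
    """Throughput in Mbps if each packet costs encrypt_seconds on a single core."""
    return packet_bytes * 8 / encrypt_seconds / 1e6


https_mbps = per_core_mbps(1500, 3e-6)      # Vanilla HTTPS: 3µs per 1500 bytes
blindbox_mbps = per_core_mbps(1500, 90e-6)  # BlindBox HTTPS: 90µs per 1500 bytes
slowdown = 90e-6 / 3e-6

print(round(https_mbps), round(blindbox_mbps, 1), round(slowdown))
```

Under these assumptions a single BlindBox sender core lands between 20Mbps and 1Gbps, which is consistent with encryption being unnoticeable on the 20Mbps testbed and becoming the bottleneck on the 1Gbps one.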

[Figure: page load time (s), Text/Code: BB+TLS vs. Text/Code: TLS, for CNN, NYTimes, YouTube, AirBnB, and Gutenberg.]

Figure 4: Download time for TLS and BlindBox (BB) + TLS at 1Gbps×10ms.

How long does the initial handshake take with the middlebox? The initial handshake to perform obfuscated rule encryption runs in time proportional to the number of rules. In the datasets we worked with, the average Protocol II rule had slightly more than 3 keywords; a typical 3000-rule IDS ruleset contains between 9k and 10k keywords. The total client-side time required for 10k keywords was 97 seconds; for 1000 keywords, setup time was 9.5s. In a smaller ruleset of 10 or 100 keywords (which is typical in a watermark detection exfiltration device), setup ran in 650ms and 1.6 seconds, respectively. These values are dependent on the clock speed of the CPU (to generate the garbled circuits) and the network bandwidth and latency (to transmit the circuits from client to sender). Our servers have 2.6GHz cores; we assumed a middlebox on a local area network near the client with a 100µs RTT between the two and a 1Gbps connection. Garbling a circuit took 1042µs per circuit; each garbled circuit transmission is 599KB.
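The per-circuit figures above allow a rough estimate of two components of the setup time. The sketch assumes one garbled circuit per keyword, 1KB = 1000 bytes, and a fully utilized 1Gbps link; it models only garbling and transmission, not oblivious transfer or any other handshake step, so it is expected to come out below the measured totals. All names are hypothetical.

```python
GARBLE_SECONDS = 1042e-6   # per circuit, from the text
CIRCUIT_BYTES = 599e3      # per circuit; assumes 1KB = 1000 bytes
LINK_BPS = 1e9             # 1Gbps connection


def setup_components(keywords):
    """Return (garbling seconds, transmission seconds), one circuit per keyword."""
    garble = keywords * GARBLE_SECONDS
    transmit = keywords * CIRCUIT_BYTES * 8 / LINK_BPS
    return garble, transmit


garble_10k, transmit_10k = setup_components(10_000)
print(f"garbling {garble_10k:.2f}s, transmission {transmit_10k:.2f}s")
```

For 10k keywords, circuit transmission dominates garbling in this estimate, which suggests link bandwidth matters at least as much as CPU speed for setup.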

Neither strawman has an appropriate setup phase that meets the requirement of not making the rules visible to the endpoints. However, one can extend these strawmen with BlindBox’s obfuscated rule encryption technique, and encrypt the rules using garbled circuits. In this case, for the scheme of Song et al., the setup cost would be similar to that of BlindBox because their scheme also encrypts the rule keywords with AES. For the scheme of Katz et al., the setup would be much slower because one needs garbled circuits for modular exponentiation, which are huge. Based on the size of such circuits reported in the literature [16], we can compute a generous lower bound on the size of the garbled circuits and on the setup cost for this strawman: it is at least 1.8 · 10^3 times larger/slower than the setup in BlindBox.

How long are page downloads with BlindBox, excluding the setup (handshake) cost? Figure 3 shows page download times using our “typical end user” testbed with 20Mbps links.

In this figure, we show five popular websites: YouTube, AirBnB, CNN, The New York Times, and Project Gutenberg. The data shown represents the post-handshake (persistent connection) page download time. YouTube and AirBnB load video, and hence have a large amount of binary data which is not tokenized. CNN and The New York Times have a mixture of data, and Project Gutenberg is almost entirely text. We show results for both the amount of time to download the page including all video and image content, as well as the amount of time to load only the Text/Code of the page. The overheads when downloading the whole page are at most 2×; for pages with a large amount of binary data like YouTube and AirBnB, the overhead was only 10-13%. Load times for Text/Code only – which are required to actually begin rendering the page for the user – are impacted more strongly, with penalties as high as 3× and a worst case of about 2×.

What is the computational overhead of BlindBox encryption, and how does this overhead impact page load times? While the encryption costs are not noticeable in the page download times observed over the “typical client” network configuration, we immediately see the cost of encryption overhead when the available link capacity increases to 1Gbps in Figure 4 – at this point, we see a performance overhead of as much as 16× relative to the baseline SSL download time. For both runs (Figs. 3 and 4), we observed that the CPU was almost continuously fully utilized to transfer data during data transmission. At 20Mbps, the encryption cost is not noticeable as the CPU can continue producing data at around the link rate; at 1Gbps, transmission with BlindBox stalls relative to SSL, as the BlindBox sender cannot encrypt fast enough to keep up with the line rate. This result is unsurprising given the results in Table 2, showing that BlindBox takes 30× longer to encrypt a packet than standard HTTPS. This overhead can be mitigated with extra cores; while we ran with only one core per connection, tokenization can easily be parallelized.

What is the bandwidth overhead of transmitting encrypted tokens for a typical web page? Minimizing bandwidth overhead is key to client performance: less data transmitted means less cost, faster transfer times, and faster detection times. The bandwidth overhead in BlindBox depends on the number of tokens produced. The number of encrypted tokens varies widely depending on three parameters of the page being loaded: what fraction of bytes are text/code which must
