brown university - cra · -- edward snowden @ sxsw‘14. encrypted search soluons 13. usage 14 tk...
TRANSCRIPT
Encrypted Search
Seny Kamara
Brown University
2
3
4
Q: Why is this happening?
5
Big Data ► Industry and Governments want more data ► NaDonal security ► Machine learning ► Business analyDcs ► NLP ► LocaDon-based services ► …
6
Big Data
7
u More intrusive & sensitive u Photos, medical records
u Location data, email,
u browsing history, voicemails
u Greater need for security
u Harder to secure u NSA Bluffdale holds 2EBs! (2K PBs)
u Facebook holds 300PBs of photos/videos
u Vs. nation states, intelligence agencies, organized crime, insiders, …
Big Data
8
u Impossible to work with u Lose search, DBs, IR
u Find your photo among 300PBs?
u Rank results?
u End-to-end (e2e) encryption! u Reduces attack surface
u Secure small key instead of Big Data
Q: Can we search on encrypted data?
9
An InteresDng QuesDon
10
Cryptography
Data Structures
InformaDon Retrieval
Graph Theory
Databases
Combinatorial OpDmizaDon
StaDsDcs
A LucraDve QuesDon
11
► Startups ► CipherCloud ($30M+$50M) ► Navajo (Salesforce) ► SkyHigh , Vaultive, Inpher ► Bitglass, Private Machines, …
► Major Corporations ► Microsoft, IBM, ► Google, Yahoo ► Hitachi, Fujitsu
► Funding agencies ► IARPA ► DARPA ► NSF
12
“There are a lot of advancements in things like encrypted search...but in general it is a difficult problem”
-- Edward Snowden @ SXSW‘14
Encrypted Search SoluDons
13
Usage
14
tk
EDB
DB
Desiderata
15
tk
EDB
Storage leakage
Query leakage
Size of EDB
Search Dme
Size of tk
Many Approaches ► Stream ciphers [SWP01]
► BuckeDng [HILM02]
► Structured and searchable encrypDon (StE/SSE) [SWP01,CGKO06,CK10]
► Oblivious RAM (ORAM) [GO96]
► FuncDonal encrypDon (e.g., PEKS) [BCOP06]
► MulD-party computaDon (MPC) [Yao82,GMW87]
► Property-preserving encrypDon (PPE) [AKSX04,BBO06,BCLO09]
► Fully-homomorphic encrypDon [G09]
16
Tradeoffs: Efficiency vs. Security
17
Efficiency STE/SSE-based
PPE-based
FHE-based
ORAM-based
skFE-based pkFE-based
Leakage
Tradeoffs: FuncDonality vs. Efficiency
18
SK-FE-based STE/SSE-based
PPE-based FHE-based
ORAM-based
PK-FE-based
Efficiency
FuncDonality
SQL
NoSQL
Leakage
19
► TheoreDcal Cryptography [Goldwasser-Micali82,…] ► A great success story ► Helps us reason about confidenDality, integrity, … ► Focused on leakage-free cryptography
► Real-world systems security relies on tradeoffs ► No cryptographic foundaDons for tradeoffs ► Can we leak X but not Y? ► How do we model leakage?
Leakage [Curtmola-Garay-K.-Ostrovsky06, Chase-K.10, Islam-Kuzu-Kantarcioglu12, K.15]
► Leakage analysis: what is being leaked?
► Proof: prove that soluDon leaks no more
► Cryptanalysis: can we exploit the leakage?
20
Leakage analysis Proof of security Leakage cryptanalysis
ApplicaDons
21
Encrypted Search Engines ► Desktop search ► Windows search, Apple Spotlight
► Personal cloud storage ► Dropbox, OneDrive, iCloud, …
► Webmail ► Gmail, Yahoo! Mail, Outlook.com,…
22
Encrypted DBs ► Standard DBs ► DB encrypted in memory
► Cloud DBs ► DB encrypted in cloud
23
Encrypted NSA Metadata Program [K.14]
► To & from numbers, Dme of call, duraDon for all US-to-US, US-to-Foreign and Foreign-to-US calls
► NSA DB can only be queried by individual phone number (seed)
► Analyst queries must be approved by small number of NSA officials
1
3
2
1
2
3
Systems (Provably Secure)
25
Systems ► CS2 (C++) ► Microsos Research, 2012 ► Queries: single keyword search ► 16MB email collecDon in 53ms
26
► BlindSeer (C++) [IARPA] ► Columbia & Bell Labs, 2014 ► Queries: boolean ► SyntheDc dataset ► Search Dme ► For (w1 and w2): 250ms ► w1 in 1 docs ► w2 in 10K docs
Systems
27
► IBM-UCI (C++) [IARPA] ► IBM Research & UC Irvine, 2013 ► Queries: conjuncDve ► 1.3GB email collecDon ► Search Dme ► For (w1 and w2): 5ms ► w1 in 15 docs ► w2 in 1M docs
► Clusion (Java) ► Brown & Colorado St., 2016 ► Queries: Boolean ► 1.3GB email collecDon ► Search Dme ► For (w1 or w2) and (w3 or w4) in 1.5ms
► (w1 or w2) in 10 docs ► (w3 or w4) in 1M docs
Systems ► GRECS ► Microsos Research, Boston U., Harvard & Ben Gurion, 2015 ► Queries: (approximate) shortest distance on graphs ► 1.6M nodes & 11M edges ► Query Dme: 10ms
28
Conclusions ► ExciDng and acDve area of research
► Big potenDal impact in pracDce
► Lots of new research direcDons in theory and systems
► PotenDal for collaboraDon between many areas of CS ► Algorithms and data structures ► Databases ► InformaDon retrieval ► Combinatorial opDmizaDon ► StaDsDcs
29
30