monomi: practical analytical query processing over encrypted data
TRANSCRIPT
Monomi: Practical Analytical Query Processing over Encrypted Data
Network Security
Dr. Ali Fanian
Mostafa Arjmand
The Presentation Include The Following :
• INTRODUTION• SPLIT CLIENT/SERVER EXECUTION• OPTIMIZATION TECHNIQUES• DESIGNER AND PLANNER• EVALUATION• CONCLUSION
Monomi
• Securely executing analytical query over sensitive data on an untrusted database server
Problem: Want to run queries over data!
Vulnerable databaseTrusted user
Query
Response
“Give me the # of views of all adults by country”
US 1M
Italy 3K
… …
Approach 1: Fully Homomorphic Encryption
• Run any computation over encrypted data• Prohibitive overheads in practice
Slowdown order 10^9× compared to computation
on plaintext data
Approach 2: Specialized Schemes
• Cryptosystems supporting specific operations:• Equality (deterministic) [AES]• Addition [Paillier 99]• Inequality (order preserving) [Boldyreva 09]• Keyword Search [Song 00]
• These operations common in SQL queries…
Practical state of the art: CryptDB
SELECT country_DET, PAILLIER_SUM(views_HOM) FROM users_ENCRYPTEDWHERE age_OPE > 0xDEADBEEFGROUP BY country_DET
Transformed Query:SELECT country, SUM(views) FROM users WHERE age > 18GROUP BY country
Original Query:Deterministic encryption: EqualityOrder preserving encryption: InequalityPaillier cryptosystem: Addition
0xDEADBEEF = Encrypt_OPE(18)
Under attack
DB Servertransformed queryplain query
Stores encryption keys
Applicationdecrypted results encrypted results
Trusted
Encrypted DB
No client computation: CryptDB requires that all computation in a query are supported by a specialized crypto-system
Problem: OLTP ≠ OLAP
• CryptDB is designed for OLTP queries• We are interested in OLAP queries• Queries typically involve more computation• CryptDB can only support 4/22 TPC-H queries
SELECT category, SUM(cost * quantity) AS valueFROM productWHERE made_in = ‘United States’GROUP BY categoryHAVING SUM(cost * quantity) > 1000000ORDER BY value
What happens when we run this query with CryptDB?
SELECT category, SUM(cost * quantity) AS valueFROM productWHERE made_in = ‘United States’GROUP BY categoryHAVING SUM(cost * quantity) > 1000000ORDER BY value
No efficient additive + multiplicative homomorphic cryptosystem
SELECT category, SUM(cost * quantity) AS valueFROM productWHERE made_in = ‘United States’GROUP BY categoryHAVING SUM(cost * quantity) > 1000000ORDER BY value
No efficient additive + order preserving homomorphic cryptosystem
Problem: OLTP ≠ OLAPSELECT category, SUM(cost * quantity) AS valueFROM productWHERE made_in = ‘United States’GROUP BY categoryHAVING SUM(cost * quantity) > 1000000ORDER BY value
Most of the query can be executed on the server, except a few parts
CryptDB can handle queries that involve only computation supported by one of these encryption schemes
But few analytical queries fall in this category (only four out of 22 TPC-H), and for those four, CryptDB incurs significant Overhead
Solution
• Monomi: A new system for practical analytical query processing • Split client/server query execution• Pre-computation + other runtime optimizations• Query planner/designer
Monomi: Can run TPC-H with 1.24x median overhead (vs. plaintext) using these three techniques.
Encryption schemes used by MONOMI
Overall architecture of MONOMI
2 ) SPLIT CLIENT/SERVER EXECUTION
• In order to execute queries that cannot be computed on the server alone, MONOMI partitions the execution of each query
TPC-H query 11
Example split query plan for TPC-H query 11
SELECT ps_partkey,SUM(ps_supplycost * ps_availqty) AS valueFROM partsupp JOIN supplier JOIN nationWHERE n_name = :1GROUP BY ps_partkeyHAVING SUM(ps_supplycost * ps_availqty) > (
SELECT SUM(ps_supplycost * ps_availqty) * 0.0001
FROM partsupp JOIN supplier JOIN nationWHERE n_name = :1 )ORDER BY value DESC;
Can execute on server
• Can’t execute on server
• They need decrypt on clien and execute
Pre-computed (ps_supplycost * ps_availqty) (Deterministic)Execute Group by with pre-computed ps_partkey_det
0xabcdef denotes the deterministic encryption of the argument :1
MONOMI’s encryption schemes cannot support such queries directly over encrypted data
Pseudo-code for GENERATEQUERYPLAN
3 ) OPTIMIZATION TECHNIQUES
• MONOMI’s designer and planner automatically use these techniques as appropriate to achieve good performance• Without requiring the application developer to perform manual query
rewriting
Per-row precomputation
• Executing an operator on the client can require downloading a large amount of intermediate data• SUM(ps_supplycost*ps_availqty) • Paillier encryption cannot perform multiplication of two values• MONOMI employs per-row precomputation• Designer materializes an additional column in a table• These materialized expressions can then be used at query runtime
• MONOMI can materialize an additional column storing encryption of ps_supplycost*ps_availqty
• Which allows the server to compute the entire SUM()• This allows the client to download one encrypted aggregate value,
instead of downloading all ps_supplycost and ps_availqty values
Space-efficient encryption
• Database queries that involve table scans—which are common in analytic workloads—are often bottlenecked by I/O
• Precomputed (o_orderdate_det) • To allow grouping by the extracted year, the value must be encrypted
deterministically, but using the standard AES or Blowfish algorithm would produce a 128- or 64-bit ciphertext even for 8-, 16-, or 32-bit.
• To minimize ciphertext expansion for deterministic encryption of such small data types, MONOMI uses the FFX block cipher mode of operation• Which encrypts n-bit plaintexts to n-bit ciphertexts, as long as n
is less than the block cipher width (e.g., 128 bits for AES).
Grouped homomorphic addition
Conservative pre-filtering
3 ) DESIGNER AND PLANNER
• The optimizations do not always apply to every query• In particular, a greedy application of the techniques or a greedy
execution of all parts of the query on the server does not necessarily yield the best performance and can waste space• The goal of MONOMI’s designer is to decide how to encrypt the data
(physical design). • MONOMI’s planner determines how to best execute queries given a
particular physical design. • This planner is also used at runtime to choose a query plan for a new
query from the application
Input and output
• Input : Query workload Q1,Q2, ...,Qn.• Should be representative of the operations that the user is expecting to perform
over the data• Input : A sample of the data that will be loaded into the database • Is used for estimating statistics about the data• Need not be the exact data that will be eventually loaded on the server
• Input : The user can also specify a space constraint factor S• Controls how much space MONOMI’s designer can consume
• Designer returns to the user a physical design for the server• Is a set of (encrypted) columns to materialize for each table, including pre-
computed and Paillier columns
Algorithm without constraints step 1
• The designer considers all of the operations in Qi, including all of the expressions in the WHERE and HAVING clauses, any ORDER BY, etc.• For each operation, the designer determines what expression and
encryption scheme would allow that operation to execute on the server.• The set of these (value, scheme) pairs for Qi is called EncSeti, and
defined E to be the set of all possible such pairs; that is, EncSeti E .• For example, a WHERE x = :1 clause generates a (x,DET) pair, referring to the
x column, and an ORDER BY x+y clause generates a (x+y,OPE) pair, referring to a precomputed x+y value.
Algorithm without constraints step 2
• The designer invokes the planner • The planner computing power set that contains all subsets of EncSeti• It then constructs an execution plan for Qi for each element of the
power set• Where the execution plan describes what parts of the query would be
executed on the server, and what parts would be executed on the client
Algorithm without constraints step 3
• For each of the execution plans, the planner uses a cost model to estimate how much time it would take to execute that plan.
4 ) Evaluation
• How many TPC-H queries can Monomi run?• Can MONOMI efficiently execute an analytical workload over encrypted data on
an untrusted server?• How much do MONOMI’s optimization techniques, designer, and planner matter
in achieving good performance?• What are the overheads for MONOMI in terms of space and client-side CPU time?
• Setup:• TPC-H scale 10• Postgres 8.4 on Linux 2.6
• 8GB RAM, 16 cores, six 7200 RPM HDDs
Most TPC-H queries supported
• Monomi’s approach handles all TPC-H queries• Our prototype handles 19/22 due to missing SQL features (e.g. views, LIKE
’%foo%bar%’, )
• First system we know of that can do this!• CryptDB only supports 4/22
Execution time of TPC-H queries under various systems
min overhead 1.03x, median overhead 1.24x,
max overhead 2.33x
Space and CPU overheads
Amount of disk space used on the server CPU overhead
Sensitivity to designer input
Number of distinct columns in the TPC-H tables encrypted by MONOM
Security
• To understand the level of security that MONOMI provides, it is important to consider the encryption schemes chosen by MONOMI• The worst is OPE, which reveals order, followed by DET• A common use of OPE is for date fields, which may be less sensitive
• The next weakest scheme, DET• Is used in quite a few columns, but reveals only duplicates, which may be less
of a concern
CONCLUSION
• Monomi: analytics on encrypted data can be made practical!• Techniques:• Split client/server execution• Pre-computation + space-efficient encryption + pre-filtering +…• Planner/designer
THANK YOU FOR LISTENING