conclave - aarhus universitet...conclave secure multi-party computation on big data 1 nikolaj...
![Page 1: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/1.jpg)
Conclave: Secure Multi-Party Computation on Big Data
Nikolaj Volgushev, Malte Schwarzkopf, Ben Getchell,
Andrei Lapets, Mayank Varia, Azer Bestavros
Image: National Geographic
![Page 2: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/2.jpg)
Pre-presentation disclaimers/confessions
• Less focus on developing new MPC protocols
• Instead focus on integrating MPC into big data analytics domain where the current roadblocks are:
- Accessibility (data analysts are not MPC experts)
- Scalability (large data volumes)
• Semi-honest adversaries throughout! 😈
![Page 3: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/3.jpg)
🏛 🚕 🚗 🚙
How concentrated is the market for hired vehicles in NYC?
Is there healthy competition?
A dangerous monopoly forming?
![Page 4: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/4.jpg)
A solution: Secure MPC
[Diagram: 🏛, 🚕, 🚗 and 🚙 connected through Secure MPC 🔒]
How concentrated is the market for hired vehicles in NYC? 📊 ?!?
![Page 5: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/5.jpg)
[Figure: small example tables illustrating the relational operators PROJECT, SUM, and JOIN]
DECLARE TABLE trips (start_lat int, start_lon int, … );
SELECT SUM(mkt_share*mkt_share) AS hhi, SUM(trips.price) AS … WHERE start_lat = … AND …;
![Page 6: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/6.jpg)
[Diagram: 🏛, 🚕, 🚗 and 🚙 connected through Secure MPC 🔒]
175M annual trips!
![Page 7: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/7.jpg)
Does MPC scale? Aggregation: SUM
[Plot; N.B. log scale!]
![Page 8: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/8.jpg)
Does MPC scale? Join
[Plot]
![Page 9: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/9.jpg)
Our TODO list
• Make MPC more accessible to data analysts
• Make MPC scale for common analytics queries
![Page 10: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/10.jpg)
Conclave
Key insight:
For most queries, only some of the work must happen under MPC.
Automatically rewrite query to combine scalable local computation and MPC.
![Page 11: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/11.jpg)
Contributions
1. MPC query compilation: relational query compiler that optimizes for efficient MPC.
2. Automated analysis to minimize MPC use while maintaining required guarantees.
3. Hybrid operators: new MPC protocols that give the option to relax privacy requirements to further accelerate expensive operators under MPC.
4. Prototype query compiler implementation using Spark and Sharemind & performance evaluation.
![Page 12: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/12.jpg)
[Diagram: each party (🚕, 🚗, 🚙) has a Conclave compiler and a Conclave dispatcher; the dispatchers connect to Secure MPC 🔒]
DECLARE TABLE trips (start_lat int, start_lon int, … );
SELECT SUM(mkt_share*mkt_share) AS hhi, SUM(trips.price) AS … WHERE start_lat = … AND …;
![Page 13: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/13.jpg)
Relational query specification
Parties’ data treated as single relation

    # compute the Herfindahl-Hirschman Index (HHI)
    rev = (taxi_data.project(["companyID", "price"])
           .sum("local_rev", group=["companyID"], over="price")
           .project([0, "local_rev"]))
    market_size = rev.sum("total_rev", over="local_rev")
    share = (rev.join(market_size, left=["companyID"], right=["companyID"])
             .divide("m_share", "local_rev", by="total_rev"))
    hhi = (share.multiply(share, "ms_squared", "m_share")
           .sum("hhi", on="ms_squared"))
    hhi.writeToCSV()
![Page 14: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/14.jpg)
Relational query specification

    import conclave as cc
    pA, pB, pC = cc.Party("mpc.a.com"), […], cc.Party("mpc.c.org")
    schema = [Column("companyID", cc.INTEGER), … Column("price", cc.INTEGER)]
    # 3 parties each contribute inputs with the same schema
    taxi_data = cc.defineTable(schema, at=[pA, pB, pC])

    # compute the Herfindahl-Hirschman Index (HHI)
    rev = (taxi_data.project(["companyID", "price"])
           .sum("local_rev", group=["companyID"], over="price")
           .project([0, "local_rev"]))
    market_size = rev.sum("total_rev", over="local_rev")
    share = (rev.join(market_size, left=["companyID"], right=["companyID"])
             .divide("m_share", "local_rev", by="total_rev"))
    hhi = (share.multiply(share, "ms_squared", "m_share")
           .sum("hhi", on="ms_squared"))
    hhi.writeToCSV(to=[pA])
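For reference, the quantity the query computes can be sketched in plain cleartext Python, with no MPC and no Conclave API involved. The function name `hhi` and the `(company_id, price)` row layout are assumptions made for illustration; the result is expressed as a fraction between 0 and 1 rather than on the 0 to 10,000 scale.

```python
from collections import defaultdict


def hhi(trips):
    """Herfindahl-Hirschman Index of a non-empty list of (company_id, price) rows."""
    # Revenue per company (the grouped SUM in the query).
    revenue = defaultdict(int)
    for company_id, price in trips:
        revenue[company_id] += price
    # Total market size (the ungrouped SUM).
    total = sum(revenue.values())
    # Sum of squared market shares (DIVIDE, MULTIPLY, SUM).
    return sum((r / total) ** 2 for r in revenue.values())


trips = [("a", 50), ("b", 30), ("b", 20)]
print(hhi(trips))  # two companies with equal revenue
```

A market with a single company gives 1.0; many equally sized companies push the value toward 0.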
![Page 15: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/15.jpg)
Contributions
1. MPC query compilation: relational query compiler that optimizes for efficient MPC.
2. Automated analysis to minimize MPC use while maintaining required guarantees.
3. Hybrid operators: new MPC protocols that give the option to relax privacy requirements to further accelerate expensive operators under MPC.
4. Prototype query compiler implementation using Spark and Sharemind & performance evaluation.
![Page 16: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/16.jpg)
[Query DAG involving 🏛 and 🚕 🚗 🚙; every operator runs under MPC: SUM🔒, MULTIPLY🔒, DIVIDE🔒, JOIN🔒, PROJECT🔒, CONCATENATE🔒, SUM🔒, DIVIDE by 10k🔒]
proj(concat(a, b)) = concat(proj(a), proj(b))
![Page 17: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/17.jpg)
[Query DAG after applying the rule: PROJECT🔒 now appears once per party below CONCATENATE🔒; SUM🔒, MULTIPLY🔒, DIVIDE🔒, JOIN🔒, PROJECT🔒, SUM🔒 and DIVIDE by 10k🔒 are unchanged]
proj(concat(a, b)) = concat(proj(a), proj(b))
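The identity can be checked on toy data. This is a cleartext sketch only; `project`, `concat`, and the sample rows are hypothetical helpers, not Conclave functions.

```python
def project(rows, cols):
    """Keep only the listed column positions of every row."""
    return [tuple(row[c] for c in cols) for row in rows]


def concat(*relations):
    """Append the rows of all relations, preserving order."""
    return [row for rel in relations for row in rel]


a = [(1, 11, 3), (1, 12, 4)]  # rows held by one party
b = [(2, 23, 1)]              # rows held by another party
cols = [0, 1]

lhs = project(concat(a, b), cols)                 # project after combining
rhs = concat(project(a, cols), project(b, cols))  # project locally, then combine
assert lhs == rhs
```

Because each `project(a, cols)` touches only one party's rows, that party can run it locally before any data enters MPC.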
![Page 18: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/18.jpg)
[Query DAG: the three per-party PROJECT operators no longer carry 🔒; SUM🔒, MULTIPLY🔒, DIVIDE🔒, JOIN🔒, CONCATENATE🔒, SUM🔒, DIVIDE by 10k🔒 and PROJECT🔒 remain under MPC]
sum(concat(a, b)) = sum(concat(sum(a), sum(b)))
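A cleartext sketch of why this identity helps: each party can pre-aggregate its own rows, so fewer rows reach the MPC step. `group_sum`, `concat`, and the sample data are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def group_sum(rows):
    """Sum the value column per key; rows are (key, value) pairs."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return sorted(totals.items())


def concat(*relations):
    return [row for rel in relations for row in rel]


a = [("x", 1), ("y", 2), ("x", 3)]  # one party's rows
b = [("y", 5), ("z", 7)]            # another party's rows

direct = group_sum(concat(a, b))
pre_aggregated = group_sum(concat(group_sum(a), group_sum(b)))
assert direct == pre_aggregated

# Rows entering the combined (MPC) step: 5 without, 4 with local pre-aggregation.
rows_without = len(concat(a, b))
rows_with = len(concat(group_sum(a), group_sum(b)))
```

On this toy input the saving is one row; with many rows per key at each party, the reduction grows accordingly.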
![Page 19: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/19.jpg)
[Query DAG: three SUM operators without 🔒 are added next to the three PROJECT operators without 🔒; SUM🔒, MULTIPLY🔒, DIVIDE🔒, JOIN🔒, CONCATENATE🔒, SUM🔒, DIVIDE by 10k🔒 and PROJECT🔒 remain under MPC]
![Page 20: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/20.jpg)
Contributions
1. MPC query compilation: relational query compiler that optimizes for efficient MPC.
2. Automated analyses to determine which parts of a query must run under MPC.
3. Hybrid operators: new MPC protocols that give the option to relax privacy requirements to further accelerate expensive operators under MPC.
4. Prototype query compiler implementation using Spark and Sharemind & performance evaluation.
![Page 21: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/21.jpg)
[Diagram: regulator 🏛 holds (ssn, zip); two credit card companies 💳 each hold (ssn, assets)]
![Page 22: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/22.jpg)
JOIN ON ssn🔒

    import conclave as cc
    pA, pB, pC = cc.Party("mpc.ftc.gov"), cc.Party("mpc.a.com"), \
                 cc.Party("mpc.b.cash")
    demo_schema = [Column("ssn", cc.INTEGER), Column("zip", cc.INTEGER)]
    demographics = cc.defineTable(demo_schema, at=pA)
    # credit card companies trust the regulator to compute on SSNs
    bank_schema = [Column("ssn", cc.INTEGER, trust=[pA]),
                   Column("assets", cc.INTEGER)]
    scores_1 = cc.defineTable(bank_schema, at=[pB])
    …

Regulator trusted with SSN columns
[Diagram: 🏛 holds (ssn, zip); 💳 and 💳 hold (ssn, assets) and (ssn, assets), combined by CONCATENATE🔒; the (ssn) column is revealed and JOIN ON ssn🔒 becomes 🔒 HYBRID JOIN 🔒]
![Page 23: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/23.jpg)
Hybrid operators
• Two roles in hybrid operator scenario:
- Semi-trusted party (STP) may learn a specific column in the clear; does not collude with other parties
- Untrusted parties may not learn anything
• Goal:
- Outsource expensive sub-steps to STP for local processing
- Without leaking information to untrusted parties
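The division of labour in a hybrid join can be sketched as a cleartext simulation. This is one possible reading of the roles listed above, not the protocol as implemented in Conclave: no cryptography happens here, and `hybrid_join_sketch`, its parameters, and the step boundaries are assumptions. Comments mark which steps would run under MPC and which at the STP.

```python
import random


def hybrid_join_sketch(left, right, key=0, rng=None):
    """Simulate a join where only the key column is shown to the STP."""
    rng = rng or random.Random()
    # Step 1 (would run under MPC): shuffle both inputs so that row
    # positions revealed later say nothing about the original order.
    left, right = list(left), list(right)
    rng.shuffle(left)
    rng.shuffle(right)
    # Step 2: reveal only the key column to the semi-trusted party.
    left_keys = [row[key] for row in left]
    right_keys = [row[key] for row in right]
    # Step 3 (cleartext, at the STP): match keys, producing index pairs.
    positions = {}
    for j, k in enumerate(right_keys):
        positions.setdefault(k, []).append(j)
    index_pairs = [(i, j) for i, k in enumerate(left_keys)
                   for j in positions.get(k, [])]
    # Step 4 (would run under MPC): assemble result rows from the index
    # pairs; the non-key columns never leave MPC in a real system.
    return [left[i] + right[j][:key] + right[j][key + 1:]
            for i, j in index_pairs]


demographics = [(1, 10), (2, 20), (3, 30)]  # (ssn, zip)
assets = [(2, 500), (3, 700), (4, 900)]     # (ssn, assets)
joined = sorted(hybrid_join_sketch(demographics, assets))
```

The expensive matching in step 3 runs on cleartext keys at the STP, which is the sub-step the slide describes as outsourced.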
![Page 24: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/24.jpg)
| Operator | Complexity (Oblivious) | Complexity (Hybrid) | Bottleneck operation (Oblivious) | Bottleneck operation (Hybrid) |
| --- | --- | --- | --- | --- |
| Join | O(n²) comparisons | O(n+m log (n+m)) multiplications (where m is size of result) | Pair-wise comparison between all rows | Batched oblivious array access |
| Aggregation | O(n log² n) comparisons | O(n log n) multiplications | Oblivious sort | Oblivious shuffle |
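To get a feel for these growth rates, the leading terms can be evaluated for concrete input sizes. This sketch ignores constant factors and the different cost of a comparison versus a multiplication, so the numbers are only indicative; the function names are hypothetical, and the hybrid join bound is left out.

```python
import math


def oblivious_join_comparisons(n):
    return n * n


def oblivious_aggregation_comparisons(n):
    return n * math.log2(n) ** 2


def hybrid_aggregation_multiplications(n):
    return n * math.log2(n)


n = 100_000  # e.g. the largest SSN input in the credit card query
print(f"oblivious join:        {oblivious_join_comparisons(n):.2e}")
print(f"oblivious aggregation: {oblivious_aggregation_comparisons(n):.2e}")
print(f"hybrid aggregation:    {hybrid_aggregation_multiplications(n):.2e}")
```

For aggregation the two leading terms differ by a factor of log₂ n, which is about 17 at n = 100,000.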
![Page 25: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/25.jpg)
Evaluation
1. How does Conclave scale to increasingly large inputs?
2. How much does automatic MPC frontier placement reduce query runtime?
3. What impact do hybrid operators have on query runtime?
• Three parties: 3 VM Spark cluster + Sharemind endpoint at each
• Two queries:
1. Taxi market concentration: up to 1.3B trip records
2. Credit card regulation: up to 100k SSNs
![Page 26: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/26.jpg)
Taxi market concentration query
[Plot]
Five orders of magnitude!
![Page 27: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/27.jpg)
Credit card regulation query
[Plot]
Four orders of magnitude!
![Page 28: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/28.jpg)
Related work
• Mixed-mode MPC: Wysteria [S&P 2014] — custom DSL
• Query rewriting for MPC
• SMCQL [VLDB 2017]: binary public/private columns, no hybrid operators
• Opaque [NSDI 2017]: computation under SGX, focus on reducing oblivious shuffles
![Page 29: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/29.jpg)
Summary
• Conclave is a query compiler for efficient MPC on “big data”
• Automatically shrinks MPC step to be as small as possible
• New hybrid MPC-cleartext protocols speed up operators
• Scales up to 5 orders of magnitude better than pure MPC
https://github.com/multiparty/conclave
![Page 30: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/30.jpg)
Conclave Implementation
• Relational front-end
• Rewrite rules on intermediate DAG of operators
• Back-ends generate code
- Cleartext: Spark, sequential Python
- MPC: Sharemind, Obliv-C (partial support)
• ~5,000 lines of Python
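A rewrite rule on an operator DAG can be sketched as follows. This is a toy illustration, not Conclave's internal representation: the `Op` class, the `mpc` flag, and `push_project_below_concat` are hypothetical names, and the DAG is simplified to a tree.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    children: list = field(default_factory=list)
    mpc: bool = True  # whether the operator must run under MPC


def push_project_below_concat(node):
    """Rewrite project(concat(x, y, ...)) into concat(project(x), project(y), ...)."""
    node.children = [push_project_below_concat(c) for c in node.children]
    if (node.name == "project" and len(node.children) == 1
            and node.children[0].name == "concat"):
        concat = node.children[0]
        # Each new project reads a single party's input, so it can run locally.
        concat.children = [Op("project", [child], mpc=False)
                           for child in concat.children]
        return concat
    return node


def render(node):
    label = node.name + ("[mpc]" if node.mpc else "")
    if not node.children:
        return label
    return f"{label}({', '.join(render(c) for c in node.children)})"


inputs = [Op("input_a", mpc=False), Op("input_b", mpc=False)]
plan = Op("sum", [Op("project", [Op("concat", inputs)])])
rewritten = render(push_project_below_concat(plan))
print(rewritten)
```

After the rewrite, only `sum` and `concat` remain marked as MPC operators; a cleartext back-end could then run the per-party `project` operators.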
![Page 31: Conclave - Aarhus Universitet...Conclave Secure Multi-Party Computation on Big Data 1 Nikolaj Volgushev Malte Schwarzkopf Ben Getchell Andrei Lapets Mayank Varia Azer Bestavros Image:](https://reader035.vdocuments.us/reader035/viewer/2022070819/5f1a1ca46d1dc454fb018bac/html5/thumbnails/31.jpg)
Hybrid MPC-cleartext operator impact
[Plot panels: Join, Aggregation]