secure parallel computation on national-scale volumes of data · (sparse) matrix factorization:...
TRANSCRIPT
![Page 1: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/1.jpg)
Secure Parallel Computationon National-Scale Volumes of DataSahar Mazloom, Phi Hung Le, Samuel Ranellucci, S. Dov Gordon
![Page 2: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/2.jpg)
User
Learning on User Data
User
Data Model
Machine Learning Engine
User Data
2
![Page 3: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/3.jpg)
3
“Computation on Encrypted Data”
Goal: Creating methods for parties to jointly compute a function over their inputs while keeping those inputs private.
Secure Multi-Party Computation:
![Page 4: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/4.jpg)
Example of a ML algorithm(Sparse) Matrix Factorization:• De-composition of a sparse
matrix of Ratings (R), into Users’ (U), and Items’ (V) matrices.
Gradient Descend:• An optimization algorithm
used in many machine learning algorithms.
4
R VU ✕⟹
Items
Use
rs
items
Use
rs
d
d
Matrix Factorization for Recommendation Systems
![Page 5: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/5.jpg)
Matrix Factorization using Gradient Descentfor Movie Recommendation
5
R VU ✕⟹
Movies
Use
rs
Movie ProfilesUse
r Pr
ofile
s
d
d
rij ui
vj
𝑟!" ≈ 𝑢! , 𝑣"
Objective function: 𝐿 = min∑(!,")&' 𝑟!," − 𝑢! , 𝑣"( + 𝜇∑!&) 𝑢! ( + 𝜆∑"&* 𝑣"
(
User gradient: 𝛿+! = −2∑"&* 𝑣" 𝑟!" − 𝑢! , 𝑣" + 2𝜇𝑢!Movie gradient: 𝛿," = −2∑!&) 𝑢! 𝑟!" − 𝑢! , 𝑣" + 2𝜆𝑣"
![Page 6: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/6.jpg)
Distributed Graph Parallel Computation
Non-secure Frameworks:MapReduce, GraphLab, PowerGraph [Gonzalez et al. 2012]
Supported algorithms: Matrix Factorization, Histogram, PageRank, Markov Random Field Parameter Learning, Name Entity Resolution, …
Users Movies
6
![Page 7: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/7.jpg)
PowerGraph:Think as a
Vertex!Move computation
to data
GAS model of Operation
7
![Page 8: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/8.jpg)
Secure Graph Parallel Computation
• GraphSC [Nayak et al. SP’15]
• Use Oblivious Sort to hide node degree and edge structure
Users Movies
Complexity:𝑂( 𝐸 + 𝑉 𝐥𝐨𝐠𝟐 𝐸 + 𝑉 )
V vertices, E edges
Running time: 6K users, 4K movies, 1 M Ratings => 13 Hrs
8Threat model: Honest-But-Curious Adversary
![Page 9: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/9.jpg)
Primary Question [Mazloom, Gordon CCS’18]
Can we make secure computation algorithms faster
if we allow something small to be learned?
And prove the leakage is Differentially Private!
9
![Page 10: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/10.jpg)
Differentially-Oblivious Graph Parallel Computation
• OblivGraph [Mazloom, Gordon CCS’18]
• Noisy node degree by adding dummy edges• No. of dummy edges determined by DP parameters• Use Oblivious Sort to hide the edge structure
Users Movies
Shuffle
10V vertices, E’ all edges
Complexity:𝑂( 𝐸4 + 𝛼 𝑉 𝐥𝐨𝐠 𝐸′ + 𝛼 𝑉 ), 𝛼: O !"# $%!"#&
'
![Page 11: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/11.jpg)
OblivGraph [Mazloom, Gordon CCS’18]
input
Alice Bob
users
input input
11
![Page 12: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/12.jpg)
12
OblivShuffle
OblivGather
OblivApply
OblivScatter
Alice Bob
OblivGraph [Mazloom, Gordon CCS’18]
Running time: 6K users, 4K movies, 1 M Ratings => 2 Hrs
Threat model: Honest-But-Curious Adversary
![Page 13: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/13.jpg)
Current Question
Can we make these differentially private secure
computation algorithms even faster?
13
![Page 14: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/14.jpg)
Can we do better?
q Low communication MPC [Gordon et al. Asiacrypt’18]
q Differentially Private Leakage in Secure Computation [Mazloom, Gordon CCS’18]
q Graph Parallel Computation => Constructing an MPC protocol that can
14
MF on 20 million inputs < 6 mins (MovieLens dataset)Histograms on 300 million inputs in only 4 mins (Counting users in each zip code)
Running time: 6K users, 4K movies, 1 M Ratings => 25 Sec
![Page 15: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/15.jpg)
Key playing factors
ü Using 4 computation servers instead of 2 ü Linear Oblivious Shuffle instead of Quasi-Linear OblivShuffleü Fixed-Point Arithmetic Computation instead of Boolean Circuitü Secure against one malicious adversary
15
![Page 16: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/16.jpg)
Challenge 1
16
o The party that access the data should NOTlearn the shuffling pattern
Merging these construction ==> Opportunities and Challenges
ü Partition the tasks between 4 parties:Group 1: Shuffle the dataGroup 2: Access the data
Solution
![Page 17: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/17.jpg)
Challenge 2
17
o Secure against active adversary
ü Verifying the result of each operation to detect cheating behaviorSolution
![Page 18: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/18.jpg)
Alice Bob
OblivShuffle
Verify Gather
Verify Scatter
Charlotte David
Verify Shuffle
OblivGather
OblivApply
OblivScatter18
Malicious-secure 4-party Secure Parallel Computation
OblivApplyCross-Check
Improved by 230X
Improved by 880X
![Page 19: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/19.jpg)
Challenge 3
19
o Fixed-Point Arithmetic Computation
ü Truncation and handling the rounding errorSolution
![Page 20: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/20.jpg)
Cross–Check Verification after Apply phase
21
Alice & Bob compute on some masked values and truncate
Charlotte & David compute on some masked values and truncate
If their results don’t match?
The verification may Fail if data is a decimal value, and it’s NOT because
of malicious behavior!
Abort
inspired by [Gordon et al. 2018]
![Page 21: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/21.jpg)
Implementation Results
22
Implemented in C++, run experiments on AWS
Multiple benchmark algorithms, including Matrix Factorization and Histogram
4 computation servers, 32 cores each, 10 Gbps network
Input size and privacy parameters for different experiments
![Page 22: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/22.jpg)
Run Time on National-Scale Histogram Problem
23
Run time (s) for computing Histogram problem on different input sizes (LAN)Counting people in each zip code
![Page 23: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/23.jpg)
Run Time Large-Scale MF Problem
24
Run time (s) for computing Matrix Factorization problem on real-world dataset, MovieLens on different input sizes for Movie Recommendation
![Page 24: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/24.jpg)
Run Time Comparison with previous works
25
Run time comparison on this work vs. OblivGraph vs. GraphSC. Single iteration of Matrix Factorization on real- world dataset, MovieLenswith 6K users ranked 4K movies with 1M ratings
![Page 25: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/25.jpg)
Summarize
Goal: Learning on large-scale data with security and privacy
• Secure MPC for Privacy Preserving Machine Learning• Secure against one malicious corruption• Leverage Differential Privacy to improve efficiency
26
![Page 26: Secure Parallel Computation on National-Scale Volumes of Data · (Sparse) Matrix Factorization: •De-composition of a sparse matrix of Ratings (R),into Users’ (U), and Items’](https://reader033.vdocuments.us/reader033/viewer/2022052002/60159a99048d6b7a76063fe6/html5/thumbnails/26.jpg)
Thanks!Q&A
27
Secure Parallel Computationon National-Scale Volumes of DataSahar Mazloom, Phi Hung Le, Samuel Ranellucci, S. Dov Gordon
[email protected] is publicly available!