Calibrating Noise to Sensitivity in Private Data Analysis
Kobbi Nissim
BGU
With Cynthia Dwork, Frank McSherry, Adam Smith, Enav Weinreb
The Setting

[Figure: a database x = (x1, …, xn) ∈ D^n (n rows, each of domain D), held by a sanitizer San. Users (government, researchers, marketers, …) send queries and receive answers.]

"I just want to learn a few harmless global statistics."
"Can I combine these to learn some private info?"
What is privacy?

Clearly we cannot undo the harm done by others. Can we minimize the additional harm while providing utility?

Goal: whether or not I contribute my data does not affect my privacy.
Output Perturbation

[Figure: the sanitizer San holds x = (x1, …, xn); given a query f and using random coins, it returns f(x) + noise.]

San controls:
- which functions f
- the kind of perturbation
When Can I Release f(x) Accurately?

Intuition: global information is "insensitive" to individual data and is safe.

f(x1,…,xn) is sensitive if changing a few entries can drastically change its value.
Talk Outline

- A framework for output perturbation based on "sensitivity"
- Formalize "sensitivity" and relate it to privacy definitions
- Examples of sensitivity-based analysis
- New ideas
- Basic models for privacy: local vs. global, noninteractive vs. interactive
Related Work

- Relevant work in statistics, data mining, computer security, databases. Largely: no precise definitions and analysis of privacy.
- Recently: a foundational approach [DN03, EGS03, DN04, BDMN05, KMN05, CDMSW05, CDMT05, MS06, CM06, …]
- This work extends [DN03, DN04, BDMN05].
Privacy as Indistinguishability

[Figure: two databases x = (x1, x2, …, xn) and x′ = (x1, x2′, …, xn) that differ in one row. Each is run through San with its own random coins, producing a transcript (query 1, answer 1, …, query T, answer T): T(x) and T(x′). The two transcript distributions are at "distance" < ε.]
ε-Indistinguishability

A sanitizer is ε-indistinguishable if for all pairs x, x′ ∈ D^n which differ on at most one entry, for all adversaries A, and for all transcripts t:

    Pr[TA(x) = t] / Pr[TA(x′) = t] ≤ e^ε
Semantically Flavored Definitions

- Indistinguishability is easy to work with, but does not directly say what the adversary can do and learn.
- "Ideal" semantic definition: the adversary does not change his beliefs about me.
- Problem: dependencies, e.g. in the form of side information. Say you know that I am 20 pounds heavier than the average Israeli… You will learn my weight from the census results, whether or not I participate.
- Ways to get around this: assume "independence" of X1,…,Xn [DN03, DN04, BDMN05], or compare "what A knows now" vs. "what A would have learned anyway" [DM].
Incremental Risk

- Suppose the adversary has prior "beliefs" about x: a probability distribution, r.v. X = (X1,…,Xn).
- Given a transcript t, the adversary updates his "beliefs" according to Bayes' rule: the new distribution is Xi′ | T(X) = t.
Incremental Risk

Two options:
- I participate in the census (input = X)
- I do not participate (input Yi = X1,…,Xi-1,*,Xi+1,…,Xn)

Privacy: whether I participate or not does not significantly influence the adversary's posterior beliefs. For all transcripts t, for all i:

    Xi′ | T(X) = t  ≈  Xi′ | T(Yi) = t

"Proof": indistinguishability guarantees that the updates are the same up to a factor of 1 ± ε.

"Bugger! It's the same whether you participate or not."
Recall – ε-Indistinguishability

For all pairs x, x′ ∈ D^n s.t. dist(x, x′) = 1, for all transcripts t:

    Pr[TA(x) = t] / Pr[TA(x′) = t] ≤ e^ε
An Example – Sum Queries

[Figure: San holds x ∈ [0,1]^n. The user asks "Please let me know fA(x) = Σ_{i∈A} xi" and San, using random coins, returns fA(x) + noise.]
Sum Queries – Answering a Query

- x ∈ [0,1]^n, fA(x) = Σ_{i∈A} xi
- Note: |fA(x) − fA(x′)| ≤ 1
- Answer: Σ xi + Y where Y ∼ Lap(1/ε); the Laplace distribution has density h(y) ∝ e^(−ε|y|)
- Can be used as a basis for other tasks: clustering, learning, classification… [BDMN05]
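As a concrete sketch (not part of the slides), the sum-query mechanism can be written in a few lines of Python; the function name `sum_query_private` and its arguments are illustrative:

```python
import numpy as np

def sum_query_private(x, A, eps, rng=None):
    """Answer fA(x) = sum of x[i] for i in A with Lap(1/eps) noise.

    Each x[i] lies in [0, 1], so changing one entry moves the sum by
    at most 1; noise of scale 1/eps then gives eps-indistinguishability.
    """
    rng = np.random.default_rng() if rng is None else rng
    true_sum = sum(x[i] for i in A)
    return true_sum + rng.laplace(scale=1.0 / eps)

# e.g. sum_query_private([0.2, 0.9, 0.4, 1.0], A=[0, 1, 3], eps=0.5)
```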
Sum Queries – Proof of ε-Indistinguishability

Property of Lap: for all x, y: h(x)/h(y) ≤ e^(ε|x−y|)

    Pr[T(x) = t]  ∝ e^(−ε|fA(x)−t|)
    Pr[T(x′) = t] ∝ e^(−ε|fA(x′)−t|)

    Pr[T(x) = t] / Pr[T(x′) = t] ≤ e^(ε|fA(x)−fA(x′)|) ≤ e^ε

since max over neighboring x, x′ of |fA(x) − fA(x′)| = 1.

[Figure: the two Laplace densities, centered at f(x) and f(x′).]
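The density-ratio step can be checked numerically. A small sketch, with hypothetical values f(x) = 7 and f(x′) = 8 (so the answers on the two neighboring databases differ by 1):

```python
import math

def lap_density(y, scale):
    # density of the Laplace distribution with the given scale at y
    return math.exp(-abs(y) / scale) / (2 * scale)

eps = 0.5
fx, fx_prime = 7.0, 8.0  # hypothetical answers on neighboring databases
for t in [5.0, 7.5, 10.0, 100.0]:
    ratio = lap_density(t - fx, 1 / eps) / lap_density(t - fx_prime, 1 / eps)
    assert ratio <= math.exp(eps) + 1e-12  # never exceeds e^eps
```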
Sensitivity

We chose the noise magnitude to cover for max |f(x) − f(x′)|.

Sensitivity:       Sf = max over dist(x,x′)=1 of ||f(x) − f(x′)||1
Local sensitivity: LSf(x) = max over x′ with dist(x,x′)=1 of ||f(x) − f(x′)||1

[Figure: two databases x and x′ with dist(x,x′) = 1, each fed through San, which outputs f(x) + noise and f(x′) + noise respectively.]
Calibrating Noise to Sensitivity

[Figure: San holds x ∈ D^n. The user asks "Please let me know f(x)" and San, using random coins, returns f(x) + Lap(Sf/ε), where the noise density is h(y) ∝ e^(−(ε/Sf)||y||1).]
Calibrating Noise to Sensitivity – Why it Works

Sf = max over dist(x,x′)=1 of ||f(x) − f(x′)||1, and the noise density is h(y) ∝ e^(−(ε/Sf)||y||1).

Property of Lap: for all x, y: h(x)/h(y) ≤ e^((ε/Sf)||x−y||1)

    Pr[T(x) = t] / Pr[T(x′) = t] ≤ e^((ε/Sf)||f(x)−f(x′)||1) ≤ e^ε
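A minimal sketch of the general mechanism, assuming the analyst supplies the global L1 sensitivity Sf (the function name `laplace_mechanism` is ours):

```python
import numpy as np

def laplace_mechanism(f, x, sensitivity, eps, rng=None):
    """Release f(x) + Lap(sensitivity/eps) noise, per coordinate.

    `sensitivity` must upper-bound ||f(x) - f(x')||_1 over all pairs
    of databases differing in one entry; the resulting density is
    proportional to exp(-(eps/Sf) * ||y||_1).
    """
    rng = np.random.default_rng() if rng is None else rng
    value = np.atleast_1d(np.asarray(f(x), dtype=float))
    return value + rng.laplace(scale=sensitivity / eps, size=value.shape)
```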
Main Result

Theorem: If a user U is limited to T adaptive queries, each of sensitivity at most Sf, then adding iid Lap(Sf·T/ε) noise to the query answers gives ε-indistinguishability.

- The same idea works with other metrics and noise distributions.
- Which useful functions are insensitive? Arguably, all useful functions should be insensitive: statistical conclusions should not depend on small variations in the data.
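A sketch of the theorem's noise calibration for a batch of T queries (a real interactive sanitizer would answer them one at a time; this only illustrates the Lap(Sf·T/ε) scale, and the function name is ours):

```python
import numpy as np

def answer_queries(x, queries, sens, eps, rng=None):
    """Answer T queries, each of sensitivity <= sens, adding iid
    Lap(sens * T / eps) noise to each answer."""
    rng = np.random.default_rng() if rng is None else rng
    scale = sens * len(queries) / eps
    return [q(x) + rng.laplace(scale=scale) for q in queries]
```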
Using insensitive functions

Strategies:
- Use the theorem: output f(x) + Lap(Sf/ε). But Sf may be hard to analyze/compute, or high even for functions considered 'insensitive'.
- Express f in terms of insensitive functions. The resulting noise depends on the input (in form and magnitude).
Example – Expressing f in Terms of Insensitive Functions

x ∈ {0,1}^n, f(x) = (Σ xi)²

- Sf = n² − (n−1)² = 2n−1, so af = (Σ xi)² + Lap(2n/ε). If f(x) << n, the noise dominates.
- However f(x) = (g(x))² where g(x) = Σ xi, and Sg = 1. Better to query for g: get ag = Σ xi + Lap(1/ε) and estimate f(x) as (ag)² − 2(1/ε)².
- Taking ε constant results in stddev O(Σ xi).
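The two strategies can be compared in a short sketch (the numbers are illustrative; the debiasing term 2/ε² is the variance of Lap(1/ε)):

```python
import numpy as np

rng = np.random.default_rng(1)
eps, n = 0.5, 100
x = np.array([1] * 30 + [0] * 70)          # sum = 30, f(x) = 900

# Strategy 1: query f directly; Sf = 2n - 1, so the noise scale is ~2n/eps.
a_f = x.sum() ** 2 + rng.laplace(scale=(2 * n - 1) / eps)

# Strategy 2: query g(x) = sum(x) with Sg = 1, then square the noisy
# answer and subtract Var(Lap(1/eps)) = 2/eps**2 to remove the bias.
a_g = x.sum() + rng.laplace(scale=1.0 / eps)
f_est = a_g ** 2 - 2.0 / eps ** 2
```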
Useful Insensitive functions

- Means, variances, … (with appropriate assumptions on the data)
- Histograms & contingency tables
- Singular value decomposition
- Distance to a property
- Functions with low query complexity
Histograms/Contingency Tables

- x1,…,xn ∈ D, where D is partitioned into d disjoint bins b1,…,bd
- h(x) = (v1,…,vd) where vj = |{i : xi ∈ bj}|
- Sh = 2: changing one value xi changes the count vector by at most 2, irrespective of d
- Add Laplace noise Lap(2/ε) to each count
- (Can do that with sum queries, too…)
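A short sketch of the histogram release, assuming numeric data and explicit bin edges (`private_histogram` is our name for it):

```python
import numpy as np

def private_histogram(x, bins, eps, rng=None):
    """Release every bin count with iid Lap(2/eps) noise.

    Moving one data point between bins changes the count vector by
    at most 2 in L1, so the sensitivity is 2 no matter how many bins.
    """
    rng = np.random.default_rng() if rng is None else rng
    counts, _ = np.histogram(x, bins=bins)
    return counts + rng.laplace(scale=2.0 / eps, size=counts.shape)
```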
Distance to a Property

- Say P = the set of "good" databases. Distance to P = min # of points in x that must be changed to make x ∈ P.
- This always has sensitivity 1, so add Laplace noise Lap(1/ε).
- Examples: distance to being clusterable; weight of the minimum cut in a graph.

[Figure: a point x and its distance to the set P.]
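A toy instance of the pattern (the property P = "databases with at most k ones" is our choice, made for illustration):

```python
import numpy as np

def dist_to_at_most_k_ones(x, k):
    # minimum number of entries of x that must change so sum(x) <= k
    return max(0, sum(x) - k)

def private_distance(x, k, eps, rng=None):
    """Changing one row moves the distance by at most 1, so the
    distance has sensitivity 1 and Lap(1/eps) noise suffices."""
    rng = np.random.default_rng() if rng is None else rng
    return dist_to_at_most_k_ones(x, k) + rng.laplace(scale=1.0 / eps)
```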
Approximations with Low Query Complexity

Lemma: Assume an algorithm A randomly samples n points and Pr[A(x) ∈ f(x) ± λ] > (1+δ)/2. Then Sf ≤ 2λ.

Proof: Consider x, x′ that differ on point i, and let Ai be A conditioned on not choosing point i. Then
- Pr[Ai(x) ∈ f(x) ± λ | pt i not sampled] > 1/2
- Pr[Ai(x′) ∈ f(x′) ± λ | pt i not sampled] > 1/2
Since Ai(x) and Ai(x′) are identically distributed (neither sees point i), some point p in their common support is within λ of both f(x) and f(x′), so Sf ≤ 2λ.
Local Sensitivity

- Median: typically insensitive, but has large (global) sensitivity.
- LSf(x) = max over x′ with dist(x,x′)=1 of ||f(x) − f(x′)||1
- Example: f(x) = min(Σ xi, 10) where xi ∈ {0,1}. Here LSf(x) = 1 if Σ xi ≤ 10 and 0 otherwise.

[Figure: f plotted against Σ xi: it grows linearly up to 10, then stays flat.]
Local Sensitivity – First Attempt

Calibrate noise to LSf(x): answer query f by f(x) + Lap(LSf(x)/ε).
- If x1 = … = x10 = 1 and x11 = … = xn = 0: answer = 10 + Lap(1/ε)
- If x1 = … = x11 = 1 and x12 = … = xn = 0: answer = exactly 10

The noise magnitude itself may be disclosive!
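The disclosure can be seen directly in code for the capped-sum example (a sketch; `local_sens` just restates the LS formula for this f):

```python
def f(x):
    # the capped sum from the example
    return min(sum(x), 10)

def local_sens(x):
    # changing one entry shifts the capped sum only while sum(x) <= 10
    return 1 if sum(x) <= 10 else 0

ten_ones = [1] * 10 + [0] * 90
eleven_ones = [1] * 11 + [0] * 89
# f is 10 in both cases, but the noise scale LS/eps differs: nonzero
# noise for ten_ones, *no* noise for eleven_ones -- so an exactly-round
# answer of 10 itself reveals that sum(x) exceeds 10.
```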
How to Calibrate Noise to Local Sensitivity?

The noise magnitude at a point x must depend on LSf(y) for all y ∈ D^n:

    N*f(x) = max over y ∈ D^n of ( LSf(y) · e^(−ε·dist(x,y)) )

[Figure: this smoothed-out sensitivity, illustrated for the median.]
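For the toy capped-sum function from the previous slides, this smoothed quantity can be computed by brute force (our sketch; it uses the fact that LSf(y) depends only on sum(y), and a 0/1 database with sum t is reachable from x in |t − sum(x)| changes):

```python
import math

def ls_at_sum(s):
    # local sensitivity of f(x) = min(sum(x), 10) as a function of sum(x)
    return 1 if s <= 10 else 0

def n_star(x, eps):
    """N*f(x) = max over y of LSf(y) * exp(-eps * dist(x, y)),
    scanning over every achievable value t of sum(y)."""
    s, n = sum(x), len(x)
    return max(ls_at_sum(t) * math.exp(-eps * abs(t - s))
               for t in range(n + 1))
```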
Talk Outline

- A framework for output perturbation based on "sensitivity"
- Formalize "sensitivity" and relate it to privacy definitions
- Examples of sensitivity-based analysis
- New ideas
- Basic models for privacy: local vs. global, noninteractive vs. interactive
Models for Data Privacy

[Figure: individuals (You, Bob, Alice) contribute data through collection and sanitization; users (government, researchers, marketers, …) access the result.]
Models for Data Privacy – Local vs. Global

[Figure: In the local model, each individual (You, Bob, Alice) applies their own sanitizer San before collection. In the global model, data is collected first and a single San is applied during collection and sanitization (including "SFE").]
Models for Data Privacy – Interactive vs. Noninteractive

[Figure: individuals (You, Bob, Alice) contribute through collection and sanitization. In the interactive model, users then exchange queries and answers with the sanitizer; in the noninteractive model, a sanitized release is published once, with no further interaction.]
Models for Data Privacy – Summary

- Local (vs. global): no central trusted party; individuals interact directly with the (untrusted) user; individuals control their own privacy.
- Noninteractive (vs. interactive): easier distribution (web site, book, CD, …); more secure, since the data can be erased once it is processed.
- Almost all work in statistics and data mining is noninteractive!
Four Basic Models

- Local, noninteractive
- Global, interactive
- Global, noninteractive
- Local, interactive (??)

(incomparable)
Interactive vs. Noninteractive

- Local, noninteractive
- Global, interactive
- Global, noninteractive
- Local, interactive
Separating Interactive from Noninteractive

- Random samples: one can compute estimates for many statistics, with (essentially) no need to decide upon queries ahead of time. But they are not private (unless the domain and sample are small [CM06]).
- Interaction gives the power of random samples, with privacy! E.g. sum queries f(x) = Σi fi(xi), even chosen adaptively!
- Noninteractive schemes seem weaker. Intuition: privacy means one cannot answer all questions ahead of time (e.g. [DN03]); the sanitization must be tailored to specific functions.
Separating Interactive from Noninteractive

Theorem: If D = {0,1}^d, then for any private, noninteractive scheme, many sum queries cannot be learned, unless d = o(log n).

So the noninteractive model is weaker than the interactive one: it cannot emulate a random sample if the data is complex.
Local vs. Global

- Local, noninteractive
- Global, interactive
- Global, noninteractive
- Local, interactive
Separating Local from Global

- D = {0,1}^d for d = Ω(log n). View x as an n×d matrix.
- Global: rank(x) has sensitivity 1, so it can be released with low noise.
- Local: cannot distinguish whether rank(x) = k or much larger than k (for a suitable choice of d, n, k).
To sum up

- Defined privacy in terms of indistinguishability; considered semantic versions of the definitions ("crypto" with non-negligible error).
- Showed how to calibrate noise to sensitivity and to the number of queries. It seems that useful statistics should be insensitive; some commonly used functions have low sensitivity; for others – local sensitivity?
- Began to explore the relationships between the basic models.
Questions

- Which useful functions are insensitive? What would you like to compute?
- Can we get stronger results using local sensitivity? Computational assumptions? [MS06] Entropy in the data?
- How to deal with small databases?
- Privacy in a broader context: rationalizing privacy and privacy-related decisions; which types of privacy? How to decide upon the privacy parameters? …
- Handling rich data: audio, video, pictures, text, …