-
Implemen'ng Differen'al Privacy & Side-‐channel a8acks
CompSci 590.03 Instructor: Ashwin Machanavajjhala
1 Lecture 17 : 590.03 Fall 13
-
Outline • Differen'al Privacy Implementa'ons
– PINQ: Privacy Integrated Queries [McSherry SIGMOD ‘09] – Airavat: Privacy for MapReduce [Roy et al NDSS ‘10]
• A8acks on Differen'al Privacy Implementa'ons – Privacy budget, state and 'ming a8acks [Haeberlin et al SEC ‘11]
• Protec'ng against a8acks – Fuzz [Haeberlin et al SEC ‘11] – Gupt [Mohan et al SIGMOD ‘12]
Lecture 17 : 590.03 Fall 13 2
-
Differen'al Privacy • Let A and B be two databases such that B = A – {t}.
• A mechanism M sa'sfies ε-‐differen'al privacy, if for all outputs O, and all such A, B
P(M(A) = O) ≤ eε P(M(B) = O)
Lecture 17 : 590.03 Fall 13 3
-
Differen'al Privacy • Equivalently, let A and B be any two databases • Let A Δ B = (A – B) U (B – A) … or the symmetric difference
• A mechanism M sa'sfies ε-‐differen'al privacy, if for all outputs O,
P(M(A) = O) ≤ eε |A Δ B| P(M(B) = O)
Lecture 17 : 590.03 Fall 13 4
-
PINQ: Privacy Integrated Queries • Implementa'on is based on C#’s LINQ language
Lecture 17 : 590.03 Fall 13 5
[McSherry SIGMOD ‘09]
-
PINQ
• An analyst ini'ates a PINQueryable object, which in turn recursively calls other objects (either sequen'ally or in parallel).
• A PINQAgent ensures that the privacy budget is not exceeded.
Lecture 17 : 590.03 Fall 13 6
-
PINQAgent: Keeps track of privacy budget
Lecture 17 : 590.03 Fall 13 7
-
PINQ: Composi'on
• When a set of opera'ons O1, O2, … are performed sequen'ally, then the budget of the en're sequence is the sum of the ε for each opera'on.
• When the opera'ons are run in parallel on disjoint subsets of the data, the privacy budget for the all the opera'ons is the max ε.
Lecture 17 : 590.03 Fall 13 8
-
Aggrega'on Operators
Lecture 17 : 590.03 Fall 13 9
-
Aggrega'on operators Laplace Mechanism • NoisyCount • NoisySum
Exponen1al Mechanism • NoisyMedian • NoisyAverage
Lecture 17 : 590.03 Fall 13 10
-
PINQ: Transforma'on Some'mes aggregates are computed on transforma'ons on the data • Where: takes as input a predicate (arbitrary C# func'on), and
outputs a subset of the data sa'sfying the predicate • Select: Maps each input record into a different record using a C#
func'on
• GroupBy: Groups records by key values
• Join: Takes two datasets, and key values for each and returns groups of pairs of records for each key.
Lecture 17 : 590.03 Fall 13 11
-
PINQ: Transforma'ons Sensi'vity can change once transforma'ons have been applied.
• GroupBy: Removing a record from an input dataset A, can change
one group in the output T(A). Hence, |T(A) Δ T(B)| = 2 |A Δ B|
• Hence, the implementa'on of GroupBy mul'plies ε by 2 before recursively invoking the aggrega'on opera'on on each group.
• Join can have a much larger (unbounded) sensi'vity.
Lecture 17 : 590.03 Fall 13 12
-
Example
Lecture 17 : 590.03 Fall 13 13
-
Outline • Differen'al Privacy Implementa'ons
– PINQ: Privacy Integrated Queries [McSherry SIGMOD ‘09] – Airavat: Privacy for MapReduce [Roy et al NDSS ‘10]
• A8acks on Differen'al Privacy Implementa'ons – Privacy budget, state and 'ming a8acks [Haeberlin et al SEC ‘11]
• Protec'ng against a8acks – Fuzz [Haeberlin et al SEC ‘11] – Gupt [Mohan et al SIGMOD ‘12]
Lecture 17 : 590.03 Fall 13 15
-
Covert Channel • Key assump'on in differen'al privacy implementa'ons:
The querier can only observe the result of the query, and nothing else. – This answer is guaranteed to be differen'ally private.
• In prac'ce: The querier can observe other effects. – E.g, Time taken by the query to complete, power consump'on, etc. – Suppose a system takes 1 minute to answer a query if Bob has cancer and
1 micro second otherwise, then based on query 'me the adversary may know that Bob has cancer.
Lecture 17 : 590.03 Fall 13 16
-
Threat Model • Assume the adversary (querier) does not have physical access to
the machine. – Poses queries over a network connec'on.
• Given a query, the adversary can observe: – Answer to their ques1on – Time that the response arrives at their end of the connec1on – The system’s decision to execute the query or deny (since the new query
would exceed the privacy budget)
Lecture 17 : 590.03 Fall 13 17
-
Timing A8ack Func'on is_f(Record r){ if(r.name = Bob && r. disease = Cancer) sleep(10 sec); // or go into infinite loop, or throw excep3on return f(r);
} Func'on counv(){ var fs = from record in data
where (is_f(record)) print fs.NoisyCount(0.1);
}
Lecture 17 : 590.03 Fall 13 18
-
Timing A8ack Func'on is_f(Record r){ if(r.name = Bob && r. disease = Cancer) sleep(10 sec); // or go into infinite loop, or throw excep3on return f(r);
} Func'on counv(){ var fs = from record in data
where (is_f(record)) print fs.NoisyCount(0.1);
}
Lecture 17 : 590.03 Fall 13 19
If Bob has Cancer, then the query takes > 10 seconds If Bob does not have Cancer, then query takes less than a second.
-
Global Variable A8ack Boolean found = false; Func'on f(Record r){ if(found) return 1; if(r.name = Bob && r.disease = Cancer){ found = true; return 1; } else return 0;
} Func'on counv(){ var fs = from record in data
where (f(record)) print fs.NoisyCount(0.1);
}
Lecture 17 : 590.03 Fall 13 20
-
Global Variable A8ack Boolean found = false; Func'on f(Record r){ if(found) return 1; if(r.name = Bob && r.disease = Cancer){ found = true; return 1; } else return 0;
} Func'on numHealthy(){ var health = from record in data
where (f(record)) print health.NoisyCount(0.1);
}
Lecture 17 : 590.03 Fall 13 21
Typically, the Where transforma1on does not change the sensi1vity of the aggregate (each record transformed into
another value). But, this transforma1on changes the sensi1vity – if Bob has
Cancer, then all subsequent records return 1.
-
Privacy Budget A8ack Func'on is_f(Record r){ if(r.name = Bob && r.disease = Cancer){ run a sub-‐query that uses a lot of the privacy budget; } return f(r);
} Func'on counv(){ var fs = from record in data
where (f(record)) print fs.NoisyCount(0.1);
}
Lecture 17 : 590.03 Fall 13 22
-
Privacy Budget A8ack Func'on is_f(Record r){ if(r.name = Bob && r.disease = Cancer){ run a sub-‐query that uses a lot of the privacy budget; } return f(r);
} Func'on counv(){ var fs = from record in data
where (f(record)) print fs.NoisyCount(0.1);
}
Lecture 17 : 590.03 Fall 13 23
If Bob does not has Cancer, then privacy budget decreases by 0.1. If Bob has Cancer, then privacy budget decreases by 0.1 + Δ.
Even if adversary can’t query for the budget, he can detect the change in budget by coun1ng how many more queries are
allowed.
-
Outline • Differen'al Privacy Implementa'ons
– PINQ: Privacy Integrated Queries [McSherry SIGMOD ‘09] – Airavat: Privacy for MapReduce [Roy et al NDSS ‘10]
• A8acks on Differen'al Privacy Implementa'ons – Privacy budget, state and 'ming a8acks [Haeberlin et al SEC ‘11]
• Protec'ng against a8acks – Fuzz [Haeberlin et al SEC ‘11] – Gupt [Mohan et al SIGMOD ‘12]
Lecture 17 : 590.03 Fall 13 24
-
Fuzz: System for avoiding covert-‐channel a8acks
• Global variables are not supported in this language, thus ruling our state aFacks.
• Type checker rules out budget-‐based channels by sta'cally checking the sensi'vity of a query before they are executed
• Predictable query processor ensures that each microquery takes the same amount of 'me, ruling out Jming aFacks.
Lecture 17 : 590.03 Fall 13 25
-
Fuzz Type Checker
• A primi've is cri'cal if it takes db as an input.
• Only four cri'cal primi'ves are allowed in the language – No other code is allowed.
• A type system that can infer an upper bound on the sensi'vity of any program (wri8en using the above cri'cal primi'ves). [Reed et al ICFP ‘10]
Lecture 17 : 590.03 Fall 13 26
-
Handling 'ming a8acks • Each microquery takes exactly the same 'me T
• If it takes less 'me – delay the query
• If it takes more 'me – abort the query – But this can leak informa1on! – Wrong Solu1on
Lecture 17 : 590.03 Fall 13 27
-
Handling 'ming a8acks • Each microquery takes exactly the same 'me T
• If it takes less 'me – delay the query
• If it takes more 'me – return a default value
Lecture 17 : 590.03 Fall 13 28
-
Fuzz Predictable Transac'on • P-‐TRANS (λ, a, T, d)
– λ : func'on – a : set of arguments – T : Timeout – d : default value
• Implemen'ng P-‐TRANS (λ, a, T, d) requires:
– Isola'on: Func'on λ(a) can be aborted without wai'ng for any other func'on
– Preemptability: λ(a) can be aborted in bounded 'me – Bounded Dealloca'on: There is a bounded 'me needed to deallocate
resources associated with λ(a)
Lecture 17 : 590.03 Fall 13 29
-
Outline • Differen'al Privacy Implementa'ons
– PINQ: Privacy Integrated Queries [McSherry SIGMOD ‘09] – Airavat: Privacy for MapReduce [Roy et al NDSS ‘10]
• A8acks on Differen'al Privacy Implementa'ons – Privacy budget, state and 'ming a8acks [Haeberlin et al SEC ‘11]
• Protec'ng against a8acks – Fuzz [Haeberlin et al SEC ‘11] – Gupt [Mohan et al SIGMOD ‘12]
Lecture 17 : 590.03 Fall 13 30
-
GUPT
Lecture 17 : 590.03 Fall 13 31
-
GUPT: Sample & Aggregate Framework
Lecture 17 : 590.03 Fall 13 32
-
Sample and Aggregate Framework
– S = range of the output – L = number of blocks
Recall from previous lecture: Theorem [Smith STOC ‘09]: Suppose database records are drawn i.i.d. from
some probability distribu'on P, and the es'mator (func'on f) is asympto'cally normal at P. Then if L = o(√n), then the average output by the Sample Aggregate framework converges to the true answer to f.
Lecture 17 : 590.03 Fall 13 33
-
Es'ma'ng the noise • Sensi'vity of the aggrega'on
func'on = S/L – S = range of the output – L = number of blocks
• Sensi'vity is independent of the actual program f
• Therefore, GUPT avoids a>acks using privacy budget as the covert channel.
Lecture 17 : 590.03 Fall 13 34
-
Es'ma'ng the noise
• Sensi'vity of the aggrega'on func'on = S/L – S = range of the output – L = number of blocks
• Output range can be : – Specified by analyst, or – αth and (100 -‐ α)th percen'les can be es'mated using Exponen'al
Mechanism, and a Windsorized mean can be used as the aggrega'on func'on.
Lecture 17 : 590.03 Fall 13 35
-
Handling Global State a8acks • The func'on is computed on each block in an isolated execuJon
environment.
– Analyst sees only the final output, and cannot see any intermediate output or sta'c variables.
– Global variables can’t inflate the sensi'vity of the computa'on (like in the example we saw) … because the sensi'vity only depends on S and L and not on the func'on itself.
Lecture 17 : 590.03 Fall 13 36
-
Handling Timing A8acks Same is in Fuzz … • Fix some es'mate T on the maximum 'me allowed for any
computa'on (on a block) • If computa'on finishes earlier, then wait 'll 'me T elapses • If computa'on takes more 'me, stop and return a default value.
Lecture 17 : 590.03 Fall 13 37
-
Comparing the two systems GUPT
• Allows arbitrary computa'on. But, accuracy is guaranteed for certain es'mators.
• Privacy-‐budget aFack: Sensi'vity is controlled by S (output range) and L (number of blocks) that are sta'cally es'mated
• State aFack: Adversary can’t see any sta'c variables.
• Timing aFack: Time taken across all blocks is predetermined.
FUZZ
• Allows only certain cri'cal opera'ons.
• Privacy-‐budget aFack: Sensi'vity is sta'cally computed.
• State aFack: Global variables are disallowed
• Timing AFack: Time taken across all records is predetermines
Lecture 17 : 590.03 Fall 13 38
-
Summary • PINQ (and Airavat) are frameworks for differen'al privacy that
allow any programmer to incorporate privacy without needing to know how to do Laplace or Exponen'al mechanism.
• Implementa'on can disclose informa'on through side-‐channels – Timings, Privacy-‐budget and State a8acks
• Fuzz and GUPT are frameworks that disallow these a8acks by – Ensuring each query takes a bounded 'me on all records or blocks – Sensi'vity is sta'cally es'mated (rather than dynamically) – Global sta'c variables are either inaccessible to adversary or disallowed
Lecture 17 : 590.03 Fall 13 39
-
Open Ques'ons • Are these the only a8acks that can be launched against a differen'al
privacy implementa'on?
Lecture 17 : 590.03 Fall 13 40
-
Least significant bits and Laplace Mechanism
• Suppose laplace mechanism is implemented using standard floa'ng point,
• Certain outputs are more likely than others
Lecture 17 : 590.03 Fall 13 41
[Mironov CCS ‘12]
-
Least significant bits and Laplace Mechanism
• Suppose laplace mechanism is implemented using standard floa'ng point,
• Certain outputs may not appear
Lecture 17 : 590.03 Fall 13 42
[Mironov CCS ‘12]
-
Least significant bits and Laplace Mechanism
• Suppose laplace mechanism is implemented using standard floa'ng point,
• Both can happen simultaneously
Lecture 17 : 590.03 Fall 13 43
[Mironov CCS ‘12]
-
Least significant bits and Laplace Mechanism
• Sensi'vity computa'on under floa'ng point is also tricky (assume le~ to right summa'on in the following example):
Lecture 17 : 590.03 Fall 13 44
[Mironov CCS ‘12]
-
Open Ques'ons • Are these the only a8acks that can be launched against a differen'al
privacy implementa'on? – No. Laplace mechanism can leak informa'on when implemented using standard
floa'ng point arithme'c
• Current implementa'ons only simple algorithms for introducing privacy – Laplace and Exponen'al mechanisms. Op'mizing error for batches of queries and advanced techniques (e.g., sparse vector) are not implemented. Can these lead to other a8acks?
Lecture 17 : 590.03 Fall 13 45
-
References F. McSherry, “PINQ: Privacy Integrated Queries”, SIGMOD 2009 I. Roy, S. Se8y, A. Kilzer, V. Shma'kov, E. Witchel, “Airavat: Security and Privacy for
MapReduce”, NDSS 2010 A. Haeberlin, B. Pierce, A. Narayan, “Differen'al Privacy Under Fire”, SEC 2011 J. Reed, B. Pierce, M. Gaboardi, “Distance makes types grow stronger: A calculus for
differen'al privacy”, ICFP 2010 P. Mohan, A. Thakurta, E. Shi, D. Song, D. Culler, “Gupt: Privacy Preserving Data Analysis
Made Easy”, SIGMOD 2012 A. Smith, "Privacy-‐preserving sta's'cal es'ma'on with op'mal convergence rates", STOC
2011 I. Mironov, “On significance of the least significant bits for differen'al privacy ppt”, CCS
2012
Lecture 17 : 590.03 Fall 13 46