Lecture 5: CompSci 590.03, Fall 2012
Privacy Definitions: Beyond Anonymity
CompSci 590.03. Instructor: Ashwin Machanavajjhala
Announcements

• Some new project ideas have been added.
• Please meet with me at least once before you finalize your project (deadline Sep 28).
Outline

• Does k-anonymity guarantee privacy?
• L-diversity
• T-closeness
Data Publishing

[Figure: Patients 1..N contribute records r1..rN to a hospital DB; the publisher releases properties of {r1, r2, …, rN}.]

Publish information that:
• Discloses as much statistical information as possible.
• Preserves the privacy of the individuals contributing the data.
Public Information

| Zip   | Age | Nationality | Disease |
|-------|-----|-------------|---------|
| 13053 | 28  | Russian     | Heart   |
| 13068 | 29  | American    | Heart   |
| 13068 | 21  | Japanese    | Flu     |
| 13053 | 23  | American    | Flu     |
| 14853 | 50  | Indian      | Cancer  |
| 14853 | 55  | Russian     | Heart   |
| 14850 | 47  | American    | Flu     |
| 14850 | 59  | American    | Flu     |
| 13053 | 31  | American    | Cancer  |
| 13053 | 37  | Indian      | Cancer  |
| 13068 | 36  | Japanese    | Cancer  |
| 13068 | 32  | American    | Cancer  |

Zip, Age, and Nationality form the quasi-identifier.

Privacy Breach: linking identity to sensitive info.
k-Anonymity using Generalization

• Quasi-identifiers (Q-ID) can identify individuals in the population.
• A table T* is k-anonymous if every SELECT COUNT(*) FROM T* GROUP BY Q-ID is ≥ k.
• The parameter k indicates the “degree” of anonymity.
| Zip   | Age   | Nationality | Disease |
|-------|-------|-------------|---------|
| 130** | <30   | *           | Heart   |
| 130** | <30   | *           | Heart   |
| 130** | <30   | *           | Flu     |
| 130** | <30   | *           | Flu     |
| 1485* | >40   | *           | Cancer  |
| 1485* | >40   | *           | Heart   |
| 1485* | >40   | *           | Flu     |
| 1485* | >40   | *           | Flu     |
| 130** | 30-40 | *           | Cancer  |
| 130** | 30-40 | *           | Cancer  |
| 130** | 30-40 | *           | Cancer  |
| 130** | 30-40 | *           | Cancer  |
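The GROUP BY check translates directly into code; a minimal sketch in Python (the row layout and helper name are illustrative, not from the slides):

```python
from collections import Counter

def is_k_anonymous(rows, qid_indices, k):
    """Check that every Q-ID group has at least k rows, i.e. every
    SELECT COUNT(*) ... GROUP BY Q-ID is >= k."""
    groups = Counter(tuple(row[i] for i in qid_indices) for row in rows)
    return all(count >= k for count in groups.values())

# The generalized table above: (zip, age, nationality, disease)
table = (
    [("130**", "<30",   "*", d) for d in ["Heart", "Heart", "Flu", "Flu"]] +
    [("1485*", ">40",   "*", d) for d in ["Cancer", "Heart", "Flu", "Flu"]] +
    [("130**", "30-40", "*", "Cancer")] * 4
)
print(is_k_anonymous(table, qid_indices=(0, 1, 2), k=4))  # True
```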
k-Anonymity: A popular privacy definition

Complexity
– Optimal k-Anonymity is NP-hard.
– An O(log k)-approximation algorithm exists.

Algorithms
– Incognito (uses monotonicity to prune the generalization lattice)
– Mondrian (multidimensional partitioning)
– Hilbert (converts the multidimensional problem into a 1-d problem)
– …
Does k-Anonymity guarantee sufficient privacy?
Attack 1: Homogeneity

The adversary knows Bob’s quasi-identifier:

| Name | Zip   | Age | Nationality |
|------|-------|-----|-------------|
| Bob  | 13053 | 35  | ??          |

Bob falls in the (130**, 30-40) group of the 4-anonymous table above, where every tuple has the same sensitive value: Bob has Cancer.
Attack 2: Background knowledge

The adversary knows Umeko’s quasi-identifier:

| Name  | Zip   | Age | Nationality |
|-------|-------|-----|-------------|
| Umeko | 13068 | 24  | Japan       |

Umeko falls in the (130**, <30) group of the 4-anonymous table, which contains only heart disease and flu. The adversary also knows that Japanese have a very low incidence of heart disease: Umeko has Flu.
Q: How do we ensure the privacy of published data?

Method 1: Breach and Patch: identify a privacy breach, then design a new algorithm that fixes it.

• The MA Governor breach and the AOL privacy breach were caused by re-identifying individuals.
• k-Anonymity only considers the risk of re-identification.
• Adversaries with background knowledge can breach privacy even without re-identifying individuals.
Limitations of the Breach and Patch methodology

1. A data publisher may not be able to enumerate all the possible privacy breaches.
2. A data publisher does not know what other privacy breaches are possible.
Method 2: Define and Design
1. Formally specify the privacy model.
2. Derive conditions for privacy.
3. Design an algorithm that satisfies the privacy conditions.
Recall the attacks on k-Anonymity

In the 4-anonymous table above:
• Homogeneity: Bob (Zip 13053, Age 35) has Cancer.
• Background knowledge: Umeko (Zip 13068, Age 24, Japanese) has Flu, since Japanese have a very low incidence of heart disease.
3-Diverse Table

| Zip   | Age  | Nationality | Disease |
|-------|------|-------------|---------|
| 1306* | <=40 | *           | Heart   |
| 1306* | <=40 | *           | Flu     |
| 1306* | <=40 | *           | Cancer  |
| 1306* | <=40 | *           | Cancer  |
| 1485* | >40  | *           | Cancer  |
| 1485* | >40  | *           | Heart   |
| 1485* | >40  | *           | Flu     |
| 1485* | >40  | *           | Flu     |
| 1305* | <=40 | *           | Heart   |
| 1305* | <=40 | *           | Flu     |
| 1305* | <=40 | *           | Cancer  |
| 1305* | <=40 | *           | Cancer  |

Now both attacks fail: Bob has ??, and Umeko has ?? (even knowing that Japanese have a very low incidence of heart disease).

L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct sensitive values of roughly equal proportions.
L-Diversity: Privacy Beyond K-Anonymity
L-Diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct “well represented” sensitive values.
Questions:
• What kind of adversarial attacks do we guard against?
• Why is this the right definition for privacy?
  – What does the parameter L signify?
[Machanavajjhala et al ICDE 2006]
Method 2: Define and Design

1. Formally specify the privacy model: which information is sensitive? What does the adversary know? How is the disclosure quantified?
2. Derive conditions for privacy: L-Diversity.
3. Design an algorithm that satisfies the privacy conditions: L-Diverse Generalization.
Privacy Specification for L-Diversity

• The link between identity and attribute value is the sensitive information: “Does Bob have Cancer? Heart disease? Flu?” “Does Umeko have Cancer? Heart disease? Flu?”
• The adversary knows ≤ L-2 negation statements, each of the form “individual u does not have a specific disease s”, e.g., “Umeko does not have heart disease.”
  – The data publisher may not know the exact adversarial knowledge.
• Privacy is breached when identity can be linked to an attribute value with high probability: Pr[“Bob has Cancer” | published table, adv. knowledge] > t.
Calculating Probabilities

[Figure: the set of all possible worlds. Each world assigns one disease to each of the 12 individuals (Sasha, Tom, Umeko, Van, Amar, Boris, Carol, Dave, Bob, Charan, Daiki, Ellen); e.g., World 1 assigns Cancer to everyone.]

Every world represents a unique assignment of diseases to individuals.
[Figure: the published table T* has per-group disease histograms (Cancer 0, Heart 2, Flu 2), (Cancer 1, Heart 1, Flu 2), (Cancer 4, Heart 0, Flu 0). A world is consistent with T* if its assignment produces exactly these group histograms; e.g., the all-Cancer world is not consistent.]

The set of worlds consistent with T* is a subset of the set of all possible worlds.
Background knowledge B further restricts the set of consistent worlds. With B: Umeko.Disease ≠ Heart,

Pr[Umeko has Flu | B, T*] = (# worlds consistent with B, T* where Umeko has Flu) / (# worlds consistent with B, T*) = 1

since in Umeko’s group (Heart 2, Flu 2) every world consistent with B assigns Umeko Flu.
Counting the # worlds consistent with B, T* directly is tedious (and is intractable for more complex forms of B).
Theorem: The # worlds consistent with B, T* in which Umeko has Flu is proportional to the # tuples in Umeko’s group who have Flu. The posterior probability can therefore be computed from the group’s disease counts alone, without enumerating worlds.
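By this theorem, the posterior reduces to counting within the individual’s group; a small sketch (function and argument names are illustrative):

```python
def posterior(group_counts, target, negated):
    """Pr[individual has `target` | negation statements, T*]: count
    tuples in the individual's group, excluding diseases the adversary
    has ruled out via negation statements."""
    remaining = {s: n for s, n in group_counts.items() if s not in negated}
    total = sum(remaining.values())
    return remaining.get(target, 0) / total

# Umeko's group in the 4-anonymous table: Heart x2, Flu x2.
# B: "Umeko does not have heart disease."
print(posterior({"Heart": 2, "Flu": 2}, "Flu", {"Heart"}))  # 1.0
```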
We know …
• … what the privacy model is.
• … how to compute Pr[“Bob has Cancer” | T*, adv. knowledge].

Therefore, to ensure privacy, check for each individual u and each disease s:

Pr[“u has disease s” | T*, adv. knowledge about u] < t

And we are done … ?? NO. The data publisher does not know the adversary’s knowledge about u:
• Different adversaries have varying amounts of knowledge.
• An adversary may have different knowledge about different individuals.
L-Diversity: Guarding against unknown adversarial knowledge

• Limit adversarial knowledge: the adversary knows ≤ (L-2) negation statements of the form “Umeko does not have heart disease.”
• Consider the worst case: all possible conjunctions of ≤ (L-2) statements.

At least L sensitive values should appear in every group. Example (L = 5):

Cancer 10, Heart 5, Hepatitis 2, Jaundice 1

Only 4 distinct values appear, so an adversary with 3 negation statements can eliminate all but Cancer: Pr[Bob has Cancer] = 1.
Guarding against unknown adversarial knowledge (contd.)

The L distinct sensitive values in each group should also be of roughly equal proportions. Example (L = 5):

Cancer 1000, Heart 5, Hepatitis 2, Jaundice 1, Malaria 1

Five values appear, but the skew means Pr[Bob has Cancer] ≈ 1 for an adversary whose negations eliminate Heart, Hepatitis, and Jaundice.
For the group above, let t = 0.75. Privacy of individuals in the group is ensured if

#Cancer / (#Cancer + #Malaria) < 0.75

i.e., even after the worst-case L-2 = 3 negations eliminate Heart, Hepatitis, and Jaundice, the most frequent remaining value must not dominate.
Theorem: For all groups g, all s in S, and all B with |B| ≤ (L-2),

    n(g, s) / Σ_{s' ∈ S\B} n(g, s') ≤ t

is equivalent to

    n(g, s1) / (n(g, s1) + n(g, sL) + n(g, sL+1) + … + n(g, sm)) ≤ t

where s1, s2, …, sm are the sensitive values of g in decreasing order of count n(g, s), and the worst-case background knowledge is B = {s2, …, sL-1}.
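This theorem gives a check over sorted counts rather than over all adversaries; a minimal sketch of the worst-case test (helper name is illustrative):

```python
def satisfies_l_diversity(counts, l, t):
    """Worst-case check from the theorem: with counts sorted so that
    n1 >= n2 >= ... >= nm, the most damaging (L-2) negations eliminate
    s2 ... s_{L-1}.  Privacy holds iff n1 / (n1 + nL + ... + nm) <= t."""
    n = sorted(counts, reverse=True)
    if len(n) < l:               # fewer than L distinct values: breach
        return False
    denom = n[0] + sum(n[l - 1:])
    return n[0] / denom <= t
```

For example, the skewed group (Cancer 1000, Heart 5, Hepatitis 2, Jaundice 1, Malaria 1) fails for L = 5, t = 0.75, while a balanced group passes.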
Method 2: Define and Design (recap): having specified the privacy model and derived the L-Diversity conditions, the remaining step is to design an algorithm that satisfies them (L-Diverse Generalization).
Algorithms for L-Diversity

• Checking whether T* is L-Diverse is straightforward: in every group g, check the L-Diversity condition.
• Finding an L-Diverse table is a lattice search problem (NP-complete).
[Figure: the generalization lattice for Q = (Nationality, Zip), with nodes <N0, Z0>, <N1, Z0>, <N0, Z1>, <N1, Z1>, <N0, Z2>, <N1, Z2>. Example: the base table (American, 1306*), (Japanese, 1305*), (Japanese, 1485*) can be generalized either by suppressing nationality to (*, 1306*), (*, 1305*), (*, 1485*) or by coarsening zip codes to (American, 130**), (Japanese, 130**), (Japanese, 148**). Nodes higher in the lattice suppress strictly more information.]
Monotonic functions allow efficient lattice searches.
Theorem: If T satisfies L-Diversity, then any further generalization T* also satisfies L-Diversity.
• Analogous monotonicity properties have been exploited to build efficient algorithms for k-Anonymity:
  – Incognito
  – Mondrian
  – Hilbert
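The monotonicity theorem suggests a pruned bottom-up lattice search; a toy sketch (the node encoding, function name, and privacy predicate are illustrative, not Incognito itself):

```python
from itertools import product

def minimal_private_nodes(max_levels, is_private):
    """Search the generalization lattice bottom-up.  By the monotonicity
    theorem, once a node satisfies the privacy condition, every further
    generalization does too, so its strict ancestors are pruned."""
    # Nodes are tuples of per-attribute generalization levels,
    # visited in order of total generalization.
    nodes = sorted(product(*(range(m + 1) for m in max_levels)), key=sum)
    minimal, pruned = [], set()
    for node in nodes:
        if node in pruned:
            continue
        if is_private(node):
            minimal.append(node)
            for other in nodes:  # prune everything at least as general
                if other != node and all(a >= b for a, b in zip(other, node)):
                    pruned.add(other)
    return minimal

# Toy lattice <N0..N1> x <Z0..Z2>; pretend total level >= 2 is private.
print(minimal_private_nodes((1, 2), lambda n: sum(n) >= 2))  # [(0, 2), (1, 1)]
```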
Anatomy: Bucketization Algorithm [Xiao, Tao SIGMOD 2007]
L-Diversity: Summary
• Formally specified privacy model.
• Permits efficient and practical anonymization algorithms.
L-Diversity Principle: Each group of tuples sharing the same Q-ID must have at least L distinct sensitive values that are roughly of equal proportions.
(c,k) Safety [Martin et al ICDE 07]

Extends the background knowledge handled by L-Diversity [M et al ICDE 06]:
• Background knowledge is captured as a propositional formula over all tuples in the table.
• Thm: Any formula can be expressed as a conjunction of implications.
• Thm: Though checking privacy given some k implications is #P-hard, ensuring privacy against worst-case k implications is tractable.
Background Knowledge

• Adversaries may possess more complex forms of background knowledge, e.g., “If Alice has the flu, then her husband Bob very likely also has the flu.”
• In general, background knowledge can be a boolean expression over individuals and their attribute values.
Background Knowledge (contd.)

Theorem: Any boolean expression can be written as a conjunction of basic implications (see [Martin et al ICDE 07] for the precise form).
Disclosure Risk

• Suppose you publish bucketization T*. The disclosure risk is the worst-case probability of linking an individual to a sensitive value given φ and T*, where φ ranges over all boolean expressions which can be expressed as a conjunction of at most k basic implications.
Efficiently computing disclosure risk

• Disclosure is maximized when each implication is simple.
• Max disclosure can be computed in polynomial time (using dynamic programming).
t-Closeness [Li et al ICDE 07]

A different notion of privacy breach from L-Diversity [M et al ICDE 06] and (c,k) Safety [Martin et al ICDE 07]:
• Assume that the distribution of the sensitive attribute in the whole table is public information.
• Privacy is preserved only if the distribution of the sensitive attribute in each QID block is “t-close” to the distribution of the sensitive attribute in the whole table.
Bounding posterior probability alone may not provide privacy

• Bob: 52 years old, earns 11K, lives in 47909.
• Suppose the adversary knows the distribution of disease in the entire table: Pr[Bob has Flu] = 1/9.
• After the 3-diverse table is published: Pr[Bob has Flu] = 1/3.
• 1/9 → 1/3 is a large jump in probability.
T-closeness principle

The distribution of the sensitive attribute within each equivalence class should be “close” to the distribution of the sensitive attribute in the entire table.

• Closeness is measured using Earth Mover’s Distance.
Earth Mover’s Distance

[Figure: two distributions over values v1 … v5.]

Distance = cost of moving mass from v2 to v1 (f21) + cost of moving mass from v5 to v1 (f51) + …

If the values are numeric, the cost can depend not only on the amount of “earth” moved, but also on the distance it is moved (d21 and d51).
Earth Mover’s Distance (contd.)

EMD(p, q) is the minimum total cost Σ_ij f_ij d_ij over flows f_ij that move the original probability mass of distribution p into distribution q.
Personalized Privacy [Xiao et al SIGMOD 06]

Changes what counts as sensitive information relative to L-Diversity [M et al ICDE 06], (c,k) Safety [Martin et al ICDE 07], and t-closeness [Li et al ICDE 07]:
• Protects properties of sensitive attributes (e.g., any stomach-related disease).
Differential Privacy

• Allows for very powerful adversaries.
• Privacy is breached if the adversary can tell apart two tables that differ in one entry, based on the published output.
• No deterministic anonymization algorithm satisfies differential privacy.
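Differential privacy is covered in detail in lecture 8; as a minimal illustration (not from these slides), the standard Laplace mechanism randomizes a count query:

```python
import math
import random

def laplace_count(true_count, epsilon, rng=random):
    """Release a count query answer with Laplace(1/epsilon) noise.
    A count has sensitivity 1 (adding or removing one entry changes it
    by at most 1), so this satisfies epsilon-differential privacy."""
    u = rng.random() - 0.5        # u in [-0.5, 0.5)
    b = 1.0 / epsilon             # noise scale = sensitivity / epsilon
    # Inverse-CDF sampling of Laplace(0, b).
    noise = -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

Because the output is randomized, no single released value lets an adversary reliably distinguish two neighboring tables, which is exactly what a deterministic algorithm cannot achieve.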
Summary

• Adversaries can use background knowledge to learn sensitive information about individuals, even from datasets that satisfy some measure of anonymity.
• Many privacy definitions have been proposed for handling background knowledge. State of the art: differential privacy (lecture 8).
• Next class: simulatability of algorithms.
References

L. Sweeney, “k-Anonymity: A Model for Protecting Privacy”, IJUFKS 2002.
A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam, “L-Diversity: Privacy Beyond k-Anonymity”, ICDE 2006.
D. Martin, D. Kifer, A. Machanavajjhala, J. Gehrke, J. Halpern, “Worst-Case Background Knowledge”, ICDE 2007.
N. Li, T. Li, S. Venkatasubramanian, “t-Closeness: Privacy Beyond k-Anonymity and l-Diversity”, ICDE 2007.
X. Xiao, Y. Tao, “Personalized Privacy Preservation”, SIGMOD 2006.