-
Privacy in Data Analysis
Everything Data
CompSci 216 Spring 2017
-
Data and ____ ☜ your favorite subject
-
Where is all this data coming from?
• Census surveys
• IRS records
• Medical records
• Insurance records
• Search logs
• Browse logs
• Shopping histories
• Photos
• Videos
• Smartphone sensors
• Mobility trajectories
• …
Very sensitive information …
-
Q: What is your responsibility as a data analyst?
A: Ensure privacy and confidentiality of the individuals
whose data you are analyzing.
-
Outline
• How can privacy be violated ... despite best efforts?
• Methods for data analysis while limiting privacy leakage.
-
What does privacy mean?
-
On Oct 21, 2016 many users in the US could not connect to sites like Twitter …
-
Internet of Things … became a botnet
-
http://www.dailymail.co.uk/sciencetech/article-2211108/Could-phones-camera-secretly-taking-pictures-right-Hackers-use-lens-steal-private-data--build-3D-model-home.html
Cameras everywhere … what could go wrong?
-
Fixes …
• Encryption/Passwords
– Always encrypt data while in transit
– Use strong passwords (and not abc123)
• Principle of Least Privilege
– Only access the sensors/data that you absolutely need for your task
• Transparency/Privacy Policies
– Make sure the user knows what information is being collected
-
Client-Server architecture
• Client: mobile app
• Server: a machine that the mobile app talks to
– Could be a single computer
– Could be a server on the cloud
• Store data collected from various individual phones on the server
-
Security
• Communication between client and server must be encrypted
– Otherwise, it is open to a man-in-the-middle attack
• Data on the server must be protected
– Encryption at rest
– Strong passwords
-
Encryption
• SSL (Secure Sockets Layer) and its successor TLS (Transport Layer Security) are the standard protocols for encrypting data in transit; SSL itself is now deprecated in favor of TLS
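As a rough illustration of "always encrypt data in transit", the sketch below builds a client-side TLS configuration with Python's standard `ssl` module. The helper name `make_client_tls_context` and the choice of TLS 1.2 as the minimum version are assumptions made for this example, not something the slides prescribe.

```python
import ssl

def make_client_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate and hostname checks on."""
    context = ssl.create_default_context()  # loads the system's trusted CA certificates
    # Refuse legacy protocol versions; TLS 1.2 as the floor is an assumption.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

context = make_client_tls_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

A client would then wrap its socket with `context.wrap_socket(sock, server_hostname=host)`, so that the server's certificate is verified before any data is sent.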
-
Principle of Least Privilege
• Only access the sensors and data that you need
– Design the app with functionality that is appropriate for the purpose
– A survey tool has no need for accelerometer access
• Access control
– Different users should have different access privileges based on what they need the data for
-
Fine-grained policies
• App designers must try to minimize collection of sensitive data from sensors
• GPS: can change granularity of location
• Accelerometer: rather than collect raw data, only collect and store “activities”
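One simple way to "change the granularity of location" is to round coordinates before storing them. The sketch below is only an illustration: the function name `coarsen_location`, the sample coordinates, and the choice of two decimal places (roughly a kilometre in latitude) are all assumptions.

```python
from typing import Tuple

def coarsen_location(lat: float, lon: float, decimals: int = 2) -> Tuple[float, float]:
    """Round coordinates so that only a coarse location is ever stored."""
    return (round(lat, decimals), round(lon, decimals))

precise = (36.001465, -78.939133)  # hypothetical GPS reading
coarse = coarsen_location(*precise, decimals=2)
print(coarse)  # (36.0, -78.94)
```

Fewer decimals give coarser locations; the right setting depends on what the app actually needs.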
-
Transparency/Privacy Policies
-
PrivateEye: Protecting visual secrets
[Raval et al., MobiSys 2016]
-
But, ensuring privacy is a lot trickier than just privacy policies…
-
Your privacy can be leaked in non-trivial ways!
-
Example 1: Targeted Advertising
http://graphicsweb.wsj.com/documents/divSlider/media/ecosystem100730.png
-
What websites track your behavior?
http://blogs.wsj.com/wtk/
-
Does it matter … I am anonymous, right?
Source (http://xkcd.org/834/)
What if we ensure our names and other identifiers are never released?
-
Is your browser safe against tracking?
source: http://panopticlick.com
-
Example 2: Regulations explicitly disallow re-identification of individual records
Title 13
-
HIPAA safe harbor
• Names
• Location
• All elements of dates except years
• Telephone numbers
• Fax numbers
• Email addresses
• Social security numbers
• Medical record numbers
• Health plan beneficiary numbers
• Account numbers
• Certificate/license numbers
• Vehicle identifiers and serial numbers including license plates
• Device identifiers and serial numbers
• Web URLs
• Internet protocol addresses
• Biometric identifiers (i.e. retinal scans, fingerprints)
• Photos
• Any unique identifying number, characteristic or code
… must be removed from the data before publication
-
The Massachusetts Governor Privacy Breach [Sweeney, IJUFKS 2002]
Medical Data: Name, SSN, Visit Date, Diagnosis, Procedure, Medication, Total Charge
Voter List: Name, Address, Date Registered, Party Affiliation, Date Last Voted
In both Medical Data and Voter List: Zip, Birth Date, Sex
• Governor of MA uniquely identified using Zip Code, Birth Date, and Sex.
• Name linked to Diagnosis
• Quasi-Identifier (Zip, Birth Date, Sex): unique for 87% of US population
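The linkage can be sketched as a join of the two tables on the quasi-identifier. Every record below is invented for illustration, and the function name `link` is hypothetical; this is a minimal sketch of the idea, not the procedure from the cited paper.

```python
# Hypothetical records; every value below is invented for illustration.
medical = [
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "Flu"},
    {"zip": "02139", "birth_date": "1962-03-04", "sex": "F", "diagnosis": "Cancer"},
    {"zip": "02139", "birth_date": "1971-07-09", "sex": "F", "diagnosis": "Heart"},
]
voters = [
    {"name": "Voter A", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Voter B", "zip": "02139", "birth_date": "1962-03-04", "sex": "F"},
    {"name": "Voter C", "zip": "02140", "birth_date": "1980-11-30", "sex": "M"},
]
QUASI_IDENTIFIER = ("zip", "birth_date", "sex")

def link(medical_rows, voter_rows, keys=QUASI_IDENTIFIER):
    """Join the two tables on the quasi-identifier; keep only unique matches."""
    names_by_qi = {}
    for voter in voter_rows:
        names_by_qi.setdefault(tuple(voter[k] for k in keys), []).append(voter["name"])
    linked = {}
    for row in medical_rows:
        names = names_by_qi.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the patient
            linked[names[0]] = row["diagnosis"]
    return linked

reidentified = link(medical, voters)
print(reidentified)  # {'Voter A': 'Flu', 'Voter B': 'Cancer'}
```

Neither table contains a name next to a diagnosis, yet the join produces exactly that pairing.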
-
Example 3: AOL data publishing fiasco
-
AOL data publishing fiasco …
User ID     Query
Ashwin222   Uefa cup
Ashwin222   Uefa champions league
Ashwin222   Champions league final
Ashwin222   Champions league final 2013
Jun156      exchangeability
Jun156      Proof of de Finetti’s theorem
Brett12345  Zombie games
Brett12345  Warcraft
Brett12345  Beatles anthology
Brett12345  Ubuntu breeze
Austin222   Python in thought
Austin222   Enthought Canopy
-
User IDs replaced with random numbers
User ID    Query
865712345  Uefa cup
865712345  Uefa champions league
865712345  Champions league final
865712345  Champions league final 2013
236712909  exchangeability
236712909  Proof of de Finetti’s theorem
112765410  Zombie games
112765410  Warcraft
112765410  Beatles anthology
112765410  Ubuntu breeze
865712345  Python in thought
865712345  Enthought Canopy
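Replacing user IDs with random numbers hides the name but keeps every user's queries grouped together, so the search profile itself survives. The sketch below uses a short excerpt adapted from the slide; the function name `pseudonymize` and the 9-digit range are assumptions made for illustration.

```python
import secrets
from collections import defaultdict

# Short excerpt of (user ID, query) pairs adapted from the slide.
log = [
    ("Ashwin222", "Uefa cup"),
    ("Ashwin222", "Champions league final 2013"),
    ("Jun156", "exchangeability"),
    ("Brett12345", "Zombie games"),
    ("Brett12345", "Beatles anthology"),
]

def pseudonymize(rows):
    """Replace each user ID with a random number, consistently per user."""
    pseudonym = {}
    out = []
    for user, query in rows:
        if user not in pseudonym:
            candidate = secrets.randbelow(10**9)
            while candidate in pseudonym.values():  # keep pseudonyms distinct
                candidate = secrets.randbelow(10**9)
            pseudonym[user] = candidate
        out.append((pseudonym[user], query))
    return out

released = pseudonymize(log)
profiles = defaultdict(list)
for number, query in released:
    profiles[number].append(query)
print(sorted(profiles.values()))
```

Each random number still carries one person's full query history, which is what allows a profile to be matched back to an individual.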
-
Privacy Breach [NYTimes 2006]
-
Example 4: Privacy violations in social networks
http://article.wn.com/view/2012/08/28/Facebooks_new_app_bazaar_violates_punters_privacy_lobbyists/
-
Inference from Impressions: Sexual Orientation [Korolova, JPC 2011]
Ad targeting criteria: Facebook Profile + Online Data
Number of Impressions
• + Who are interested in Men: 25
• + Who are interested in Women: 0
Facebook uses private information to predict match to ad
-
Reason for Privacy Breach
• Anyone can run a campaign with strict targeting criteria
– Zip, birthdate and sex uniquely identify 87% of US population
• “Private” and “Friends only” profile info used to determine match
• Default privacy settings lead to users having many publicly visible features
– Default privacy setting for Likes, location, work place, etc. is public
-
Can Facebook release its graph?
• Suppose we release just the nodes and edges in the Facebook graph …
-
Mobile communication networks [J. Onnela et al., PNAS 07]
Sexual & Injection Drug Partners [Potterat et al., STI 02]
-
Naïve anonymization
[Figure: email communication graph with nodes Alice, Bob, Cathy, Diane, Ed, Fred, Grace]
• Consider the above email communication graph
– Each node represents an individual
– Each edge between two individuals indicates that they have exchanged emails
• Replace node identifiers with random numbers.
Attacks on Naïve Anonymization
• Alice has sent emails to three individuals only
• Only one node in the anonymized network has degree three
• Hence, Alice can re-identify herself
-
Attacks on Naïve Anonymization
• Cathy has sent emails to five individuals
• Only one node has degree five
• Hence, Cathy can re-identify herself
-
Attacks on Naïve Anonymization
• Now consider that Alice and Cathy share their knowledge about the anonymized network
• What can they learn about the other individuals?
-
Attacks on Naïve Anonymization
• First, Alice and Cathy know that only Bob has sent emails to both of them
• Bob can be identified
-
Attacks on Naïve Anonymization
• Alice has sent emails to Bob, Cathy, and Ed only
• Ed can be identified
-
Attacks on Naïve Anonymization
• Alice and Cathy can learn that Bob and Ed are connected
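The chain of inferences can be replayed in code. The edge list below is hypothetical: it is chosen only to be consistent with the statements on these slides (Alice has degree three, Cathy has degree five, Bob is their only common contact), because the figure's actual edges are not recoverable from the transcript. The function names are likewise made up for this sketch.

```python
import random

# Hypothetical edges, consistent with the slide text but not taken from the figure.
EDGES = [
    ("Alice", "Bob"), ("Alice", "Cathy"), ("Alice", "Ed"),
    ("Bob", "Cathy"), ("Bob", "Ed"), ("Bob", "Diane"),
    ("Cathy", "Diane"), ("Cathy", "Fred"), ("Cathy", "Grace"),
]

def anonymize(edges, rng):
    """Naive anonymization: replace every name with a distinct random number."""
    names = sorted({name for edge in edges for name in edge})
    ids = rng.sample(range(1000, 10000), len(names))
    secret = dict(zip(names, ids))
    return [(secret[a], secret[b]) for a, b in edges], secret

def neighbors(edges):
    """Adjacency sets of an undirected graph given as an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def attack(anon_edges):
    """Re-identify Alice, Cathy, Bob, and Ed from graph structure alone."""
    adj = neighbors(anon_edges)
    (alice,) = [n for n, nb in adj.items() if len(nb) == 3]  # unique degree 3
    (cathy,) = [n for n, nb in adj.items() if len(nb) == 5]  # unique degree 5
    (bob,) = adj[alice] & adj[cathy]    # the only contact they share
    (ed,) = adj[alice] - {bob, cathy}   # Alice's one remaining contact
    return {"Alice": alice, "Cathy": cathy, "Bob": bob, "Ed": ed,
            "Bob-Ed connected": ed in adj[bob]}

anon_edges, secret = anonymize(EDGES, random.Random(0))
guess = attack(anon_edges)
print(guess)
```

The attacker never sees `secret`; it is returned here only so the guesses can be checked against the truth.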
-
Attacks
-
Local structure is highly identifying
[Figure: Friendster network, ~4.5 million nodes; share of nodes that are well protected vs. uniquely identified when the adversary knows the node degree vs. the neighbors’ degrees]
[Hay et al., PVLDB 08]
-
Sensitive values in social networks
http://mattmckeon.com/facebook-privacy/
-
Sensitive values in social networks
• Some people are privacy conscious (like you)
• Most people are lazy and keep the default privacy settings (i.e., no privacy)
• Can infer your sensitive attributes based on the sensitive attributes of public individuals …
-
Many, many more examples of privacy violations
• GWAS studies
• Shopping histories
• Location trajectories
• …
-
Why care about privacy?
• Redlining: the practice of denying, or charging more for, services such as banking, insurance, access to health care, or even supermarkets, or denying jobs to residents in particular, often racially determined, areas.
-
Outline
• How can privacy be violated ... despite best efforts?
• Methods for data analysis while limiting privacy leakage.
-
Private data analysis problem
[Figure: Individual 1, Individual 2, Individual 3, …, Individual N send their records r1, r2, r3, …, rN to a Server, which stores them in a database DB]
Utility:
Privacy: No breach about any individual
-
Private data analysis examples
Application | Data Collector | Third Party (adversary) | Private Information | Function (utility)
Medical | Hospital | Epidemiologist | Disease | Correlation between disease and geography
Genome analysis | Hospital | Statistician / Researcher | Genome | Correlation between genome and disease
Advertising | Google / FB / Y! | Advertiser | Clicks / Browsing | Number of clicks on an ad by age/region/gender …
Social Recommendations | Facebook | Another user | Friend links / profile | Recommend other users or ads to users based on social network
Location Services | Verizon / AT&T | Verizon / AT&T | Location | Local Search
-
Privacy goals
• Avoiding linkage attacks
– K-anonymity
• Protecting against background knowledge
– L-diversity
• Composition and privacy with multiple data releases
– Differential Privacy
-
K-Anonymity
• If every row corresponds to one individual, then …
… every row should look like k-1 other rows based on the quasi-identifier attributes
-
K-Anonymity
Zip    Age  Nationality  Disease
13053  28   Russian      Heart
13068  29   American     Heart
13068  21   Japanese     Flu
13053  23   American     Flu
14853  50   Indian       Cancer
14853  55   Russian      Heart
14850  47   American     Flu
14850  59   American     Flu
13053  31   American     Cancer
13053  37   Indian       Cancer
13068  36   Japanese     Cancer
13068  32   American     Cancer
Zip    Age    Nationality  Disease
130**  40     *            Flu
130**  30-40  *            Cancer
130**  30-40  *            Cancer
130**  30-40  *            Cancer
130**  30-40  *            Cancer
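To make the definition concrete, the sketch below computes k for the original 12-row table before and after generalization. The generalization rule (3-digit zip prefix, age bands <30, 30-39, >=40, nationality suppressed) is an assumption chosen for this example and need not match the exact labels on the slide; `generalize` and `k_of` are hypothetical helper names.

```python
from collections import Counter

# (zip, age, nationality, disease) rows from the slide's original table.
ROWS = [
    ("13053", 28, "Russian", "Heart"), ("13068", 29, "American", "Heart"),
    ("13068", 21, "Japanese", "Flu"), ("13053", 23, "American", "Flu"),
    ("14853", 50, "Indian", "Cancer"), ("14853", 55, "Russian", "Heart"),
    ("14850", 47, "American", "Flu"), ("14850", 59, "American", "Flu"),
    ("13053", 31, "American", "Cancer"), ("13053", 37, "Indian", "Cancer"),
    ("13068", 36, "Japanese", "Cancer"), ("13068", 32, "American", "Cancer"),
]

def generalize(row):
    """Assumed rule: 3-digit zip prefix, three age bands, nationality suppressed."""
    zip_code, age, _nationality, disease = row
    band = "<30" if age < 30 else "30-39" if age < 40 else ">=40"
    return (zip_code[:3] + "**", band, "*", disease)

def k_of(rows):
    """Smallest group size when rows are grouped by their quasi-identifier."""
    groups = Counter(row[:3] for row in rows)
    return min(groups.values())

raw_k = k_of(ROWS)
generalized_k = k_of([generalize(row) for row in ROWS])
print(raw_k, generalized_k)  # 1 4
```

In the raw table every row is unique on (Zip, Age, Nationality), so k is 1; after generalization each row is indistinguishable from three others.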
-
K-anonymity in graphs
-
Problem: Homogeneity
Zip    Age    Nationality  Disease
130**  40     *            Flu
130**  30-40  *            Cancer
130**  30-40  *            Cancer
130**  30-40  *            Cancer
130**  30-40  *            Cancer
Name  Zip    Age  Nat.
Bob   13053  35   ??
Bob has Cancer
-
Problem: Background Knowledge
-
3-Diverse Table
-
L-Diversity
• L-diversity Principle: Every group of tuples with the same Q-ID values has ≥ L distinct “well represented” sensitive values.
• The link between identity and attribute value is the sensitive information.
“Does Bob have cancer? Heart disease? Flu?” “Does Umeko have cancer? Heart disease? Flu?”
• Privacy is breached when the attribute value can be inferred with high probability.
Pr[“Bob has cancer” | published table, adv. knowledge] > t
[Machanavajjhala et al., ICDE 2006]
ICDE Influential Paper award 2017
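A minimal check of the principle, using the simplest reading of “well represented” (counting distinct sensitive values per group); stricter readings exist and are not shown here. The first group of rows comes from the homogeneity slide, the second group is hypothetical and added only for contrast, and `distinct_l` is a made-up name.

```python
from collections import defaultdict

# Generalized rows (zip, age band, nationality, disease). The first group is
# from the slide; the second group is hypothetical, added for contrast.
TABLE = [
    ("130**", "30-40", "*", "Cancer"), ("130**", "30-40", "*", "Cancer"),
    ("130**", "30-40", "*", "Cancer"), ("130**", "30-40", "*", "Cancer"),
    ("148**", ">=40", "*", "Cancer"), ("148**", ">=40", "*", "Heart"),
    ("148**", ">=40", "*", "Flu"), ("148**", ">=40", "*", "Flu"),
]

def distinct_l(rows):
    """Distinct l-diversity: fewest distinct sensitive values in any Q-ID group."""
    values = defaultdict(set)
    for *quasi_id, sensitive in rows:
        values[tuple(quasi_id)].add(sensitive)
    return min(len(v) for v in values.values())

whole_table_l = distinct_l(TABLE)
second_group_l = distinct_l(TABLE[4:])
print(whole_table_l, second_group_l)  # 1 3
```

The table is 4-anonymous, but the all-Cancer group drags its diversity down to 1, which is exactly the homogeneity problem.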
-
Problem: Composition
-
Example: HCUPnet
-
#Hospital discharges in NJ of ovarian cancer patients, 2009
Age          #discharges  White  Black  Hispanic  Asian/Pcf Hlnder  Native American  Other  Missing
#discharges  735          535    82     58        18                *                19     22
1-17         *            *      *      *         *                 *                *      *
18-44        70           40     13     *         *                 *                *      *
45-64        330          236    31     32        *                 *                11     *
65-84        298          229    35     13        *                 *                *      *
85+          34           29     *      *         *                 *                *      *
Any count …
-
#Hospital discharges in NJ of ovarian cancer patients, 2009
Age          #discharges  White  Black  Hispanic  Asian/Pcf Hlnder  Native American  Other  Missing
#discharges  735          535    82     58        18                1                19     22
1-17         3            1      *      *         *                 *                *      *
18-44        70           40     13     *         *                 *                *      *
45-64        330          236    31     32        *                 *                11     *
65-84        298          229    35     13        *                 *                *      *
85+          34           29     *      *         *                 *                *      *
-
Can reconstruct tight bounds on rest of data
Age          #discharges  White  Black  Hispanic  Asian/Pcf Hlnder  Native American  Other  Missing
#discharges  735          535    82     58        18                1                19     22
1-17         3            1      [0-2]  [0-2]     [0-1]             [0]              [0-1]  [0-1]
18-44        70           40     13     [9-10]    [0-6]             [0]              [0-6]  [1-8]
45-64        330          236    31     32        [10]              [0]              11     [10]
65-84        298          229    35     13        [2-8]             [1]              [2-8]  [4-10]
85+          34           29     [1-3]  [1-4]     [0-1]             [0]              [0-1]  [0-1]
[Vaidya et al., AMIA 2013]
In fact, when linked with queries giving other statistics, we can figure out that exactly 1 Native American woman diagnosed with ovarian cancer went to a privately owned, not-for-profit, teaching hospital in New Jersey with more than 435 beds in 2009. Furthermore, the woman did not pay by private insurance, had a routine discharge, with a stay in the hospital of 33.5 days, with her home residence being in a county with 1 million plus residents (large fringe metro, suburbs), and her age was exactly 75 years.
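The kind of reasoning behind such bounds can be sketched with a single row constraint: the suppressed cells of a row must add up to the row total minus the published cells. The sketch assumes that suppressed cells hold counts of at most 10 (an assumption about the suppression rule, not stated in the transcript) and uses only the 45-64 row plus the Native American column total of 1. The function name `tighten` is hypothetical.

```python
def tighten(remainder, bounds):
    """Tighten [lo, hi] bounds on cells whose values must sum to `remainder`."""
    total_lo = sum(lo for lo, _ in bounds.values())
    total_hi = sum(hi for _, hi in bounds.values())
    tightened = {}
    for name, (lo, hi) in bounds.items():
        others_lo = total_lo - lo   # least the other cells can contribute
        others_hi = total_hi - hi   # most the other cells can contribute
        tightened[name] = (max(lo, remainder - others_hi),
                           min(hi, remainder - others_lo))
    return tightened

# Row "45-64": total 330, published cells 236, 31, 32, and 11.
remainder = 330 - (236 + 31 + 32 + 11)
# Assumption: a suppressed cell is at most 10; the Native American cell is at
# most 1 because that column's total is 1.
start = {"Asian/Pcf Hlnder": (0, 10), "Native American": (0, 1), "Missing": (0, 10)}
row_bounds = tighten(remainder, start)
print(row_bounds)
```

This one constraint already narrows two suppressed cells to 9 or 10. The bounds on the slide are tighter still, since they combine constraints across all rows and columns.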
-
A Lower Bound
• In order to ensure utility, a statistical database must leak some information about each individual
• We can only hope to bound the amount of disclosure
• Hence, there is a limit on number of queries that can be released
[Dinur, Nissim, PODS 2003]
Test of Time award 2013
-
Rethinking the way we release private data
• A Privacy Budget
– Analysts are given a fixed amount of privacy budget
– Can ask questions about the data as long as they do not expend the privacy budget
– Once the budget is expended, any additional queries on the data may violate privacy
-
Differential Privacy
• A mechanism to enforce privacy budgets
• Adding or removing a single record in the input database does not affect the output of the computation by more than a function of ε
• ε is called the privacy budget
[Dwork et al., TCC 2006]
Gödel Prize 2017
-
Differential Privacy
• Consider two datasets
– With Bob as one of the participants
– Without Bob
• Answers are roughly the same whether or not Bob is in the data
-
Differential Privacy
Algorithm A satisfies ε-differential privacy if:
For every pair of neighboring tables D1, D2
For every output O
$\Pr[A(D_1) = O] \le e^{\varepsilon} \, \Pr[A(D_2) = O]$
-
Meaning …
[Figure: neighboring databases D1 and D2 (Bob in the data, Bob not in the data) each map to the set of all outputs; A(D1) = O1; probabilities shown range from P[A(D1) = O1] to P[A(D2) = Ok]]
-
Meaning …
[Figure: D1, D2, and output O1; worst discrepancy in probabilities]
-
Privacy loss parameter ε
Algorithm A satisfies ε-differential privacy if:
For every pair of neighboring tables D1, D2
For every output O
$\Pr[A(D_1) = O] \le e^{\varepsilon} \, \Pr[A(D_2) = O]$
• The smaller the ε, the more the privacy (and the worse the utility)
Differential Privacy
Algorithm A satisfies ε-differential privacy if:
For every pair of neighboring tables D1, D2
For every output O
$\Pr[A(D_1) = O] \le e^{\varepsilon} \, \Pr[A(D_2) = O]$
What the adversary learns about an individual is the same even if the individual is not in the data (or lied about his/her value)
-
Composition
• If k statistics are revealed, each while satisfying ε-differential privacy, then the overall computation satisfies kε-differential privacy.
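Composition is what makes a privacy budget enforceable: each query's ε is simply added to a running total. The class below is a hypothetical sketch of such an accountant (the name `PrivacyBudget` and its interface are invented here), not a description of any particular system.

```python
class PrivacyBudget:
    """Tracks the total epsilon spent under sequential composition."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon: float) -> None:
        """Charge one query; refuse it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(4):
    budget.spend(0.25)  # k = 4 queries at epsilon = 0.25 each, so k * epsilon = 1.0
print(budget.spent)     # 1.0
```

A fifth query at ε = 0.25 would raise `RuntimeError`, which corresponds to the point where additional queries may violate privacy.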
-
Applications in the real world
Released synthetic data about where people live and work under differential privacy
[Machanavajjhala et al., ICDE 2008] [Haney et al., SIGMOD 2017]
-
Algorithm 1: Laplace Mechanism
• Researcher sends Query q to the Database
• True answer is q(D); the Database returns q(D) + η
• η is drawn from the Laplace Distribution Lap(λ): $h(\eta) \propto \exp(-|\eta| / \lambda)$
• Privacy depends on the λ parameter
• Mean: 0, Variance: $2\lambda^2$
[Figure: density of the Laplace distribution Lap(λ); x-axis from -10 to 10, y-axis from 0 to 0.6]
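A minimal sketch of the mechanism for a counting query. It assumes the standard calibration λ = sensitivity / ε with sensitivity 1 for a count (adding or removing one record changes a count by at most 1); that calibration is not stated on the slide. The function names are hypothetical, and the noise is sampled as the difference of two exponentials, which has a Laplace distribution.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample from Lap(scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Answer a counting query (sensitivity 1) with Laplace noise of scale 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
samples = [noisy_count(100, epsilon=0.5, rng=rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
# With epsilon = 0.5, lambda = 2: expect a mean near 100 and a variance near 2 * lambda**2 = 8.
print(round(mean, 1), round(variance, 1))
```

Lowering ε raises λ and therefore the variance, which is the privacy and utility trade-off in concrete terms.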
-
Adoption in the real world
Collect perturbed data from users for analysis.
[Erlingsson et al., CCS 2014] [Apple WWDC 2016]
-
Algorithm 2: Randomized Response
True values, Disease (Y/N): Y, Y, N, Y, N, N
• With probability p, report true value
• With probability 1-p, report flipped value
Reported values, Disease (Y/N): Y, N, N, N, Y, N
Can estimate the true proportion of Y in the data based on the perturbed values (since we know p)
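The estimation step works because the expected share of reported Y values is p·π + (1-p)·(1-π), where π is the true proportion; solving for π gives the estimator below. The data (10,000 people, 30% with the disease), p = 0.75, and the function names are assumptions made for this sketch. The last line uses the fact that, for p > 0.5, this scheme satisfies ε-differential privacy with ε = ln(p / (1-p)).

```python
import math
import random

def randomize(value: bool, p: float, rng: random.Random) -> bool:
    """Report the true value with probability p, the flipped value otherwise."""
    return value if rng.random() < p else not value

def estimate_proportion(reports, p: float) -> float:
    """Invert E[observed] = p*pi + (1-p)*(1-pi) to recover pi (needs p != 0.5)."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

rng = random.Random(0)
p = 0.75
truth = [i < 3000 for i in range(10000)]  # hypothetical data: 30% have the disease
reports = [randomize(v, p, rng) for v in truth]
estimate = estimate_proportion(reports, p)
epsilon = math.log(p / (1 - p))           # privacy loss of this scheme for p > 0.5
print(round(estimate, 2), round(epsilon, 2))
```

No individual report can be trusted, yet the aggregate estimate lands close to the true 30%.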
-
Provable Privacy Challenge
Can traditional algorithms for releasing/analyzing statistical databases be replaced with provably private algorithms while ensuring little loss in utility?
To learn more … see my other course or talk to me / my students
-
Summary
• “Data-driven” revolution has transformed many fields …
• … but it is the analyst’s responsibility to address the privacy problem
• Tools like differential privacy can foster ‘safe’ data collection, analysis and data sharing.