data privacy and anonymization

Posted on 28-Nov-2014


DESCRIPTION

In the world of Big Data, there has been a lot of research into creating efficient algorithms that can help us gain statistical insight from the large databases that record much of our lives. However, as our digital footprint grows, many databases that were originally considered anonymous can now be re-identified. How do we make sure that doesn't happen?

TRANSCRIPT

Big Data and Attacks on Privacy: How to Properly Anonymize Social Networks and Databases (and Keep Them That Way)

AC 298r Final Presentation

Ryan Lee and Jeffrey Wang

Obligatory Social Network Stats

http://www.mediabistro.com/alltwitter/files/2013/11/growth-of-social-media-2013.jpg

Uses of Social Data: Research

Bollen et al. (2011). CS109, Harvard Univ., Fall 2013

Christakis & Fowler (2010). Christakis & Fowler (2007).

Uses of Social Data: Marketing

Facebook.com

Bio-Rad

Chang, R., Lee, A., Ghoniem, M., Kosara, R., Ribarsky, W., Yang, J., ... & Sudjianto, A. (2008). Scalable and interactive visual analysis of financial wire transactions for fraud detection. Information visualization, 7(1), 63-76.

Uses of Social Data: Government

Challenge: Privacy

Naive Approach: Anonymization

Name | Favorite Pizza | Favorite Course
Ryan Lee | Supreme | AC298r
Jeffrey Wang | Pepperoni | AC298r
Daniel Weinstock | Anchovies | AC298r


Priority: Security

Concern: Digital Footprint

NSA Data Warehouse

Deanonymization is Possible

Sweeney, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002
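Sweeney's paper illustrates re-identification by linking a release with names removed to a public list, such as a voter roll, on shared fields. The following is a minimal sketch of that linkage (join) attack. It assumes both tables share date of birth, gender and ZIP code. All records, field names and the `link` helper are hypothetical.

```python
# Hypothetical toy data: a release with names removed, plus a public list with names.
anonymized = [
    {"dob": "1980-07-31", "gender": "F", "zip": "02138", "diagnosis": "asthma"},
    {"dob": "1975-01-02", "gender": "M", "zip": "02139", "diagnosis": "flu"},
]
public = [
    {"name": "Alice Example", "dob": "1980-07-31", "gender": "F", "zip": "02138"},
    {"name": "Bob Example", "dob": "1962-03-15", "gender": "M", "zip": "02139"},
]

# Assumed quasi-identifiers shared by both tables.
QUASI_IDENTIFIERS = ("dob", "gender", "zip")


def link(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Re-attach names to rows whose quasi-identifiers match exactly one public row."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = []
    for row in anonymized:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the row is re-identified
            matches.append((names[0], row["diagnosis"]))
    return matches


print(link(anonymized, public))  # [('Alice Example', 'asthma')]
```

Dropping the name column does not help here. The remaining columns act as a key into any other dataset that still carries names.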

Netflix Prize 2

Netflix De-anon: How They Did It

● 500,000-record dataset was super-sparse

Netflix “Anonymized” Data

Public Data (IMDb, Twitter, blogs, etc.)

Match if: time difference < threshold and movie rating difference < threshold

Names
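The matching rule above can be sketched as a score. For each anonymized user, count the publicly reviewed movies whose rating and date both fall within the thresholds, then take the highest-scoring user. The records, the helper names and the default thresholds (`day_threshold=14`, `rating_threshold=1`, i.e. exact rating match) are hypothetical choices for illustration.

```python
from datetime import date

# Hypothetical records: movie -> (rating, date rated).
anonymized_users = {
    "user_1": {"Movie A": (5, date(2005, 3, 1)), "Movie B": (2, date(2005, 3, 4)),
               "Movie C": (4, date(2005, 6, 9))},
    "user_2": {"Movie A": (3, date(2004, 1, 10)), "Movie D": (5, date(2004, 2, 2))},
}
public_reviews = {"Movie A": (5, date(2005, 3, 2)), "Movie C": (4, date(2005, 6, 7))}


def similarity(private, public, day_threshold=14, rating_threshold=1):
    """Count movies where both the rating difference and the date difference are under threshold."""
    score = 0
    for movie, (rating, when) in public.items():
        if movie in private:
            p_rating, p_when = private[movie]
            if (abs(p_rating - rating) < rating_threshold
                    and abs((p_when - when).days) < day_threshold):
                score += 1
    return score


def best_match(users, public, **thresholds):
    """Return the best-scoring user id and every user's score."""
    scores = {uid: similarity(r, public, **thresholds) for uid, r in users.items()}
    return max(scores, key=scores.get), scores


print(best_match(anonymized_users, public_reviews))  # ('user_1', {'user_1': 2, 'user_2': 0})
```

Narayanan and Shmatikov's published attack is more careful. It weights rare movies more heavily and accepts a match only when it clearly stands out from the runner-up. This sketch omits both refinements.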

Surnames in Genomic Sequences

TACATA is a real last name...

“Anonymized” Cell Phone Data

de Montjoye, Y. A., Hidalgo, C. A., Verleysen, M., & Blondel, V. D. (2013). Unique in the Crowd: The privacy bounds of human mobility. Scientific Reports, 3.
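The cited paper measures "unicity": the fraction of users whose trace is uniquely pinned down by p points drawn at random from that trace. It reports that four spatio-temporal points suffice to single out 95% of individuals. The following is a rough Monte Carlo sketch of that measure. The traces, the (tower, hour) point format and the `unicity` helper are hypothetical.

```python
import random

# Hypothetical traces: each user is a set of (cell_tower, hour) points.
traces = {
    "u1": {("t1", 8), ("t2", 9), ("t3", 18)},
    "u2": {("t1", 8), ("t2", 9), ("t4", 18)},
    "u3": {("t1", 8), ("t5", 12), ("t3", 18)},
}


def unicity(traces, p, trials=1000, seed=0):
    """Estimate how often p random points from a random user's trace match only that user."""
    rng = random.Random(seed)
    users = list(traces)
    unique = 0
    for _ in range(trials):
        user = rng.choice(users)
        points = set(rng.sample(sorted(traces[user]), p))  # p points from this user's trace
        matching = [u for u in users if points <= traces[u]]  # traces containing all p points
        if matching == [user]:
            unique += 1
    return unique / trials


print(unicity(traces, 1), unicity(traces, 3))
```

In this toy data a single point rarely identifies anyone, while the full three-point trace always does. That is the same trend the paper reports at scale.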

Defenses (lol JK)

K-Anonymity

Sweeney, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002

A Tough Problem

DOB, Gender, and ZIP Code is enough to uniquely identify 87% of US Citizens

Sweeney, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002
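The 87% figure measures how many people have a (DOB, gender, ZIP) combination that no one else shares. Here is a minimal sketch of that measurement on a hypothetical four-row table. It is not the study's data and does not reproduce its figure.

```python
from collections import Counter

# Hypothetical (DOB, gender, ZIP) tuples; only the quasi-identifiers matter here.
records = [
    ("1980-07-31", "F", "02138"),
    ("1980-07-31", "F", "02138"),
    ("1975-01-02", "M", "02139"),
    ("1962-03-15", "M", "02139"),
]


def fraction_unique(rows):
    """Share of rows whose quasi-identifier combination appears exactly once."""
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)


print(fraction_unique(records))  # 0.5
```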

Solution?

First | Last | Age | Race
Harry | Stone | 34 | African American
John | Reyser | 36 | Caucasian
Beatrice | Stone | 34 | African American
John | Delgado | 22 | Hispanic

Sweeney, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002

Solution: Suppression and Generalization


k=2: Polynomial Solution! (Simplex Matching)

k>=3: NP-Hard (Graph Decomposition)

Sweeney, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 2002
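Suppression and generalization can be sketched on the First/Last/Age/Race table from the slides. The sketch makes several assumptions:

* First and last names are suppressed outright.
* (Age, Race) are the quasi-identifiers.
* Age is generalized into 10-year ranges.
* Any row still in a group smaller than k is suppressed.

The helper names and the range width are hypothetical, and this greedy choice is not an optimal anonymization.

```python
from collections import Counter

# The table from the slides: (First, Last, Age, Race).
table = [
    ("Harry", "Stone", 34, "African American"),
    ("John", "Reyser", 36, "Caucasian"),
    ("Beatrice", "Stone", 34, "African American"),
    ("John", "Delgado", 22, "Hispanic"),
]


def generalize(row, width=10):
    """Suppress names and coarsen age into a range such as '30-39'."""
    _, _, age, race = row
    low = (age // width) * width
    return ("*", "*", f"{low}-{low + width - 1}", race)


def k_of(rows):
    """k = size of the smallest group sharing the same (Age, Race) values."""
    return min(Counter((age, race) for _, _, age, race in rows).values())


def anonymize(rows, k, width=10):
    """Generalize every row, then suppress rows whose group is still smaller than k."""
    generalized = [generalize(r, width) for r in rows]
    counts = Counter((age, race) for _, _, age, race in generalized)
    return [r for r in generalized if counts[(r[2], r[3])] >= k]


print(k_of(table))          # 1
print(anonymize(table, 2))  # two ('*', '*', '30-39', 'African American') rows
```

Generalizing age alone leaves Reyser and Delgado in groups of one. Suppressing those two rows yields a 2-anonymous release, at the cost of half the data. Choosing the cheapest such combination is the optimization problem whose complexity is summarized above.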

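Differential Privacy

● Whether or not a user chooses to participate in the database, the probability of any output changes by at most a factor of e^ε: Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] for any databases D, D′ that differ in one user

Dwork, ICALP, 2006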
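The definition says nothing about how to achieve it. One standard way, not covered on the slide, is the Laplace mechanism of Dwork, McSherry, Nissim and Smith (2006). A counting query changes by at most 1 when one user is added or removed. Adding noise drawn from Laplace(0, 1/ε) to it therefore satisfies ε-differential privacy. The sketch below assumes that counting-query setting, and the helper names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Epsilon-DP count: sensitivity is 1, so add Laplace noise with scale 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(0)
print(private_count(range(100), lambda x: x < 30, epsilon=1.0, rng=rng))  # near 30, noisy
```

Smaller ε gives a stronger guarantee but noisier answers, and each additional query spends more of the privacy budget.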

Anonymity in Social Networks

Peter S. Bearman, James Moody, and Katherine Stovel, Chains of affection: The structure of adolescent romantic and sexual networks, American Journal of Sociology 110, 44-91 (2004).

http://www-personal.umich.edu/~mejn/networks/addhealth.gif

High School Dating Network

Information-rich Network Structure

Backstrom, L., & Kleinberg, J. (2013). Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook. arXiv preprint arXiv:1310.6753.

Attacks on Social Networks

● Passive: find yourselves in the released graph

● Active: structural steganography (plant a subgraph before release)

http://www.cse.psu.edu/~asmith/courses/privacy598d/www/lec-notes/Attacking%20Social%20Network%20FINAL.pdf

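● Planted subgraph must be unique: no other isomorphic subgraph in the network

● Planted subgraph must have no nontrivial automorphism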
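For the active attack, the planted subgraph must be recognizable without ambiguity once it is found. If it had a nontrivial automorphism, the attacker could not tell which planted node is which. This is a brute-force sketch of that check. It is only practical for tiny subgraphs (it tries all n! relabelings), and the helper name is hypothetical.

```python
from itertools import permutations


def has_nontrivial_automorphism(nodes, edges):
    """Return True if some non-identity relabeling of the nodes maps the edge set onto itself."""
    nodes = list(nodes)
    edge_set = {frozenset(e) for e in edges}
    for perm in permutations(nodes):
        if perm == tuple(nodes):
            continue  # skip the identity relabeling
        mapping = dict(zip(nodes, perm))
        if {frozenset((mapping[u], mapping[v])) for u, v in edges} == edge_set:
            return True
    return False


# A path 1-2-3 can be flipped end to end, so it is symmetric.
print(has_nontrivial_automorphism([1, 2, 3], [(1, 2), (2, 3)]))  # True
# A triangle 2-3-6 with tails of different lengths has no symmetry.
print(has_nontrivial_automorphism(range(1, 7), [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 6)]))  # False
```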

Obfuscating Social Networks

Zhou and Pei, KAIS, 2011

Step 1: Construct Min-DFS Tree for Neighborhood

Zhou and Pei, KAIS, 2011
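The point of Step 1 is to give every vertex's neighborhood a code that depends only on its structure. Two vertices then get the same code exactly when their neighborhoods are isomorphic. Zhou and Pei use minimum DFS codes for this. This sketch swaps in a much simpler brute-force canonical form instead: the smallest sorted edge list over all relabelings of the neighbors. It runs in factorial time and is only for tiny neighborhoods. The graph and the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical undirected graph as adjacency sets.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}


def neighborhood(graph, v):
    """Neighbors of v and the edges among them (v itself excluded)."""
    nbrs = sorted(graph[v])
    edges = {(a, b) for a in nbrs for b in graph[a] if b in graph[v] and a < b}
    return nbrs, edges


def canonical_code(nbrs, edges):
    """Isomorphism-invariant code: neighbor count plus the smallest relabeled edge list."""
    best = None
    for perm in permutations(range(len(nbrs))):
        label = dict(zip(nbrs, perm))
        code = tuple(sorted(tuple(sorted((label[a], label[b]))) for a, b in edges))
        if best is None or code < best:
            best = code
    return (len(nbrs), best)


def group_by_neighborhood(graph):
    """Group vertices whose neighborhoods are isomorphic."""
    groups = defaultdict(list)
    for v in graph:
        groups[canonical_code(*neighborhood(graph, v))].append(v)
    return dict(groups)


print(group_by_neighborhood(graph))
```

In this toy graph, {a, b, e, f} share one neighborhood shape and {c, d} share another. Every vertex is therefore indistinguishable from at least one other by its neighborhood alone. Step 2 works on vertices that do not yet fall into large enough groups like these.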

2 Useful Properties

1. Social networks follow a power-law degree distribution

2. Social Networks typically have a small diameter (6 degrees of separation)

Step 2: Anonymize Similar Vertices

Zhou and Pei, KAIS, 2011

Step 3: ??? => Step 4: Profit!

Zhou and Pei, KAIS, 2011

thanks

bye
