![Page 1: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/1.jpg)
Anonymity and Privacy Issues: Re-identification
Yimeng Zhang
12/4/07
![Page 2: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/2.jpg)
Index
• Views on Privacy of Social Media
• Overview of Re-identification
• You Are What You Say: Privacy Risks of Public Mentions, Frankowski et al., SIGIR 06
![Page 3: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/3.jpg)
Improper Use of Personal Information Online
![Page 4: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/4.jpg)
Top Privacy Concerns
![Page 5: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/5.jpg)
Remaining Anonymous
![Page 6: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/6.jpg)
True Information Provided While Registering
![Page 7: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/7.jpg)
Ability to Remain Anonymous
![Page 8: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/8.jpg)
Importance of Controlling Personal Information
![Page 9: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/9.jpg)
Specifying Who Can View Personal Information
![Page 10: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/10.jpg)
Conclusion
• Around 40% of people would like to remain anonymous on social media or social networking sites
• Most people provide their true personal information while registering
• Most people think it is important to have control of their personal information online
Re-identification Techniques can identify the users of an anonymous dataset
![Page 11: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/11.jpg)
Privacy Loss through Re-identification
• Re-identification: linking datasets that have explicit identifiers to datasets without explicit identifiers through common attributes
• Datasets without explicit identifiers
– Public data made anonymous by users
– Public data released by research groups (after suitable anonymizing)
– Public data from government agencies (census)
People wish to keep private
![Page 12: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/12.jpg)
Example of Re-identification
Medical data made public by the Group Insurance Commission of Massachusetts
Voter registration list of Massachusetts, purchased for only $20
Sweeney, 2002
87% of the 1990 US population are likely to be uniquely identified based only on ZIP code, birth date, and sex
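The linkage on this slide can be sketched as a join on the shared attributes. This is a minimal illustration, not Sweeney's procedure: every record, name, and field name below is invented, and a match counts as a re-identification only when exactly one named person shares the attributes.

```python
# Toy linkage attack: join an "anonymous" table to an identified one
# on the shared quasi-identifiers (ZIP code, birth date, sex).
# All records and field names here are invented for illustration.

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

medical_records = [  # no names, but quasi-identifiers are present
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "M", "diagnosis": "A"},
    {"zip": "02139", "birth_date": "1962-03-15", "sex": "F", "diagnosis": "B"},
]

voter_list = [  # names plus the same quasi-identifiers
    {"name": "Person One", "zip": "02138", "birth_date": "1950-01-01", "sex": "M"},
    {"name": "Person Two", "zip": "02139", "birth_date": "1971-07-04", "sex": "F"},
]


def key(record):
    """Project a record onto the quasi-identifier columns."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(anonymous, identified):
    """Return (name, anonymous_record) pairs with a unique identified match."""
    by_key = {}
    for person in identified:
        by_key.setdefault(key(person), []).append(person)
    matches = []
    for record in anonymous:
        candidates = by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match is a re-identification
            matches.append((candidates[0]["name"], record))
    return matches


reidentified = link(medical_records, voter_list)
```

In this toy data only the first medical record links to a name; the second has no voter with the same birth date.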
![Page 13: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/13.jpg)
The Rebus Form
Medical data + voter list = Governor’s medical records!
From Frankowski, SIGIR06
![Page 14: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/14.jpg)
Example of face identification
Facebook: profiles with explicit identifiers
Friendster: profiles without explicit identifiers
A face recognizer links the two sets of profiles: identity violation!
Gross and Acquisti, WPES 05
![Page 15: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/15.jpg)
You Are What You Say: Privacy Risks of Public Mentions
Dan Frankowski, Dan Cosley, Shilad Sen, Loren Terveen, John Riedl
University of Minnesota
SIGIR 2006
![Page 16: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/16.jpg)
Main Idea
• People can be identified by their preferences and what they talk about
– Reviews of books, movies, songs
– Mentions on forums or blogs
– Friend list on Facebook
– Wish or purchase list on Amazon
• Method for Re-identification
– Datasets are represented in Sparse Relation Spaces
– Re-identification can be done by matching two Sparse Relation Spaces
![Page 17: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/17.jpg)
Sparse Relation Space
• Relates people to items
• Sparse: few relationships are recorded per person
• A dataset that can be represented in a Sparse Relation Space is vulnerable
i1 i2 i3 …
p1 X
p2 X
p3 X
…
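One way to hold such a matrix in code is a mapping from each person to the set of items they are related to. This is only a sketch: the slide does not show which cells contain an X, so the placement below is illustrative, and the helper names `density` and `related` are hypothetical.

```python
# A sparse relation space stored as person -> set of related items.
# Only recorded relationships are kept; absent pairs are implicit.
# Which person relates to which item is an invented placement.

relation_space = {
    "p1": {"i1"},
    "p2": {"i3"},
    "p3": {"i2"},
}


def density(space):
    """Fraction of all possible (person, item) pairs that are recorded."""
    items = set().union(*space.values())
    possible = len(space) * len(items)
    recorded = sum(len(related_items) for related_items in space.values())
    return recorded / possible if possible else 0.0


def related(space, person, item):
    """True if the relation (an X in the matrix) is recorded."""
    return item in space.get(person, set())
```

A low `density` value is what makes the space "sparse": most person and item pairs have no recorded relationship.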
![Page 18: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/18.jpg)
Research Questions
• Risks of dataset release
– What are the risks to user privacy when releasing a dataset?
• Altering the dataset
– How can dataset owners alter the dataset to preserve user privacy?
• Self defense
– How can users protect their own privacy?
![Page 19: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/19.jpg)
Experiment Dataset: MovieLens
Dataset 1: Movie Ratings
– Users do not allow them to be revealed
– Released for research use as an “Anonymous Dataset”
Dataset 2: Movie Reviews
– Public
![Page 20: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/20.jpg)
Feature of the dataset
• Both ratings and mentions follow a power law
• Important feature for real world sparse relation space
[Figure: Number of ratings of an item by percentile. x-axis: item percentile (0% to 100%); y-axis: number of ratings (0 to 60,000)]
Frankowski, SIGIR 06
![Page 21: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/21.jpg)
Evaluation Measure
• Input: the ratings dataset and the mentions by user t (from the mentions dataset)
• Re-identification algorithm
• Output: top k ratings users, ranked by the likelihood that they are user t
• k-identified: t is among the k users returned by the algorithm
• k-identification rate: the fraction of users who are k-identified
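The measure can be sketched in a few lines, assuming the algorithm's output for each target user is available as a ranked list of candidate users. The function names and the example results are hypothetical.

```python
def is_k_identified(target, ranked_users, k):
    """True if the target is among the first k users of the ranked list."""
    return target in ranked_users[:k]


def k_identification_rate(results, k):
    """results maps each target user to the ranked list returned for them."""
    if not results:
        return 0.0
    hits = sum(is_k_identified(t, ranked, k) for t, ranked in results.items())
    return hits / len(results)


# Hypothetical outputs of some re-identification algorithm.
example_results = {
    "t1": ["t1", "u7", "u3"],
    "t2": ["u5", "t2", "u1"],
    "t3": ["u2", "u9", "u4"],
}
```

With these made-up results, one of three targets is 1-identified and two of three are 2-identified.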
![Page 22: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/22.jpg)
Set Intersection Algorithm for Re-identification
• Likely list: users in the ratings database who have rated every movie mentioned by user t
• Problem
– Users mention movies but do not rate them
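A minimal sketch of the set intersection idea, assuming ratings are stored as a mapping from user to the set of movies they rated. The user and movie names are placeholders.

```python
def likely_list(mentions, ratings_by_user):
    """Users whose rated movies include every movie mentioned by the target."""
    mentioned = set(mentions)
    return sorted(
        user for user, rated in ratings_by_user.items() if mentioned <= rated
    )


# Invented example data.
ratings_by_user = {
    "u1": {"m1", "m2", "m3"},
    "u2": {"m1", "m3"},
    "u3": {"m2"},
}
```

The problem on the slide shows up directly: if the target mentions one movie they never rated, the subset test fails for their own ratings record and they drop out of the likely list.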
![Page 23: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/23.jpg)
TF-IDF Algorithm
• Mentions of a user: vector of the movies the user mentioned
• Ratings of a user: vector of the movies the user rated
• Likelihood: TF-IDF cosine similarity
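The ranking can be sketched as follows. The slide does not give the exact weighting, so this sketch assumes a binary term frequency (a movie is either present or not) and idf(m) = log(N / number of users who rated m); all function names are hypothetical.

```python
import math


def idf_weights(ratings_by_user):
    """idf(m) = log(N / number of users who rated m)."""
    n_users = len(ratings_by_user)
    counts = {}
    for rated in ratings_by_user.values():
        for movie in rated:
            counts[movie] = counts.get(movie, 0) + 1
    return {movie: math.log(n_users / c) for movie, c in counts.items()}


def tfidf_vector(movies, idf):
    """Binary term frequency: the weight is the idf if the movie is present."""
    return {m: idf[m] for m in movies if m in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[m] for m, w in a.items() if m in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_users(mentions, ratings_by_user):
    """Ratings users ordered from most to least similar to the mentions."""
    idf = idf_weights(ratings_by_user)
    target = tfidf_vector(mentions, idf)
    scored = [
        (cosine(target, tfidf_vector(rated, idf)), user)
        for user, rated in ratings_by_user.items()
    ]
    return [user for _, user in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

Under this weighting a movie rated by everyone gets weight zero, so only the less common movies contribute to the similarity.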
![Page 24: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/24.jpg)
Scoring Algorithm
Scoring:
• Emphasize mentions of rarely rated movies
• De-emphasize the number of ratings a user has
Score for one mention (movie) m of a user:
Fraction of users who have not rated m
Score for a user:
Product of the scores for all mentions of this user
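A sketch of this scoring under stated assumptions: ratings are a mapping from user to the set of movies they rated, and a candidate who has not rated a mentioned movie receives a small constant `miss_score`. The slide does not say how that case is scored, so both the constant and its default value are assumptions made here so that a single unrated mention does not zero out the product.

```python
def mention_score(user, movie, ratings_by_user, miss_score=0.05):
    """Score one mentioned movie for one candidate ratings user."""
    rated_by = sum(1 for rated in ratings_by_user.values() if movie in rated)
    if movie in ratings_by_user[user]:
        # Fraction of users who have NOT rated the movie:
        # rarely rated movies give a score close to 1.
        return 1.0 - rated_by / len(ratings_by_user)
    # Assumed penalty for a mention the candidate has not rated.
    return miss_score


def user_score(user, mentions, ratings_by_user, miss_score=0.05):
    """Product of the per-mention scores for one candidate user."""
    score = 1.0
    for movie in mentions:
        score *= mention_score(user, movie, ratings_by_user, miss_score)
    return score


def rank_by_score(mentions, ratings_by_user):
    """Candidates ordered from highest to lowest score."""
    return sorted(
        ratings_by_user,
        key=lambda u: (-user_score(u, mentions, ratings_by_user), u),
    )
```

Because the product runs only over the target's mentions, a candidate gains nothing from having rated many other movies, which matches the second design goal on the slide.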
![Page 25: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/25.jpg)
Scoring Algorithm with Ratings
• Suppose we have a magic analyzer that can guess the rating of a movie from the mention
– E.g., using the context of that mention
• Algorithms
– ExactRating: the analyzer can perfectly determine the rating
– FuzzyRating: the analyzer can guess the rating value within +/-1
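One hedged way to fold a guessed rating into the per-mention score is to count only ratings that agree with the guess. This is an illustration, not the paper's formula: the `tolerance` parameter (0 for the exact variant, 1 for the within +/-1 variant), the `miss_score` constant, and the function names are all assumptions.

```python
def rating_matches(guessed, actual, tolerance):
    """True if the actual rating is within `tolerance` of the guess."""
    return abs(guessed - actual) <= tolerance


def mention_score_with_rating(user, movie, guessed, ratings,
                              tolerance=0, miss_score=0.05):
    """ratings maps user -> {movie: rating}.

    Only users whose rating of the movie agrees with the guessed rating
    count as having "rated" it, which makes each mention more selective.
    """
    n_matching = sum(
        1 for rated in ratings.values()
        if movie in rated and rating_matches(guessed, rated[movie], tolerance)
    )
    own = ratings[user].get(movie)
    if own is not None and rating_matches(guessed, own, tolerance):
        return 1.0 - n_matching / len(ratings)
    # Assumed penalty when the candidate's rating is absent or disagrees.
    return miss_score
```

A wider tolerance admits more candidates per mention, so the within +/-1 variant would be expected to discriminate less sharply than the exact one.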
![Page 26: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/26.jpg)
Percent of users identified by different algorithms
![Page 27: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/27.jpg)
1-identification rate
![Page 28: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/28.jpg)
RQ2: Altering the dataset
• How can dataset owners alter the dataset they release to preserve user privacy?
• Data Suppression
– Algorithm: drop rarely rated movies
– Not a big problem for industry, but harmful for research
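Dropping rarely rated movies can be sketched as a filter over the ratings. The threshold `min_ratings` is a hypothetical parameter; the slide does not say what cutoff was used.

```python
def suppress_rare_movies(ratings_by_user, min_ratings):
    """Return a copy of the ratings without movies rated fewer than
    `min_ratings` times."""
    counts = {}
    for rated in ratings_by_user.values():
        for movie in rated:
            counts[movie] = counts.get(movie, 0) + 1
    kept = {movie for movie, c in counts.items() if c >= min_ratings}
    return {user: rated & kept for user, rated in ratings_by_user.items()}
```

The cost for research is visible in the output: every rating of a dropped movie disappears from the released dataset.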
![Page 29: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/29.jpg)
Dataset level Suppression
Does not work!
![Page 30: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/30.jpg)
RQ3: Self Defence
• How can users protect their own privacy?
• Suppression
– Do not mention rarely rated movies
• Misdirection
– Mention items they have not rated
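The two user-level defenses can be sketched as transformations of a user's mention set. Assumptions: `rating_counts` maps each movie to how many users rated it, `min_ratings` and `n_decoys` are hypothetical parameters, and choosing the most-rated unrated movies as decoys is one possible strategy, not necessarily the one evaluated in the paper.

```python
def suppress_mentions(mentions, rating_counts, min_ratings):
    """Suppression: keep only mentions of movies with enough ratings."""
    return {m for m in mentions if rating_counts.get(m, 0) >= min_ratings}


def misdirect(mentions, own_ratings, rating_counts, n_decoys):
    """Misdirection: add mentions of movies the user has not rated,
    picking the most-rated ones first."""
    unrated = [
        m for m in rating_counts
        if m not in own_ratings and m not in mentions
    ]
    decoys = sorted(unrated, key=lambda m: (-rating_counts[m], m))[:n_decoys]
    return set(mentions) | set(decoys)
```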
![Page 31: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/31.jpg)
User Level Suppression
Does not work!
![Page 32: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/32.jpg)
Misdirection
Works when users mention popular items
![Page 33: Anonymity and Privacy Issues --- re-identification Yimeng Zhang 12/4/07](https://reader033.vdocuments.us/reader033/viewer/2022061612/5697c0301a28abf838cdacc8/html5/thumbnails/33.jpg)
Conclusion
• Simple data mining algorithms can identify users who mention items in a sparse relation space and think they are anonymous
– Uses of the algorithms: e.g., finding paper reviewers (future work of Frankowski)
– Privacy risks for users of social media sites
• It is hard to preserve privacy
– Don’t reveal private information even if the setting seems to be anonymous