Exploring Linkability of User Reviews
Mishari Almishari and Gene Tsudik
University of California, Irvine
Roadmap
1. Introduction
2. Data Set & Problem Settings
3. Linkability Results & Improvements
4. Discussion
5. Future Work & Conclusion
Motivation
Increasing Popularity of Reviewing Sites
Yelp, more than 39M visitors and 15M reviews in 2010
Example
[Figure: sample review with its category and rating labeled]
Motivation
Rising awareness of privacy
Motivation
How is it applied?
Traceability/Linkability
Linkability of Ad hoc Reviews
Linkability of Several Accounts
Goal
Assess the linkability in user reviews
Data Set
• 1 Million Reviews
• 2000 Users
• Each user has more than 300 reviews
Problem Settings
[Figure: identified records (IRs) and anonymous records (ARs)]
IR: Identified Record
AR: Anonymous Record
Problem Formulation
• Anonymous Record (AR)
• Identified Records (IRs)
• Matching Model
• Top-X Linkability, X: 1 and 10
• Anonymous record sizes: 1, 5, 10, 20, …, 60
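The Top-X linkability measure can be sketched in a few lines. This assumes the usual reading of the term: an AR counts as linked if the IR of its true author appears among the X best-ranked IRs. The function name `top_x_linkability` and the `rankings` layout are hypothetical, chosen only for illustration.

```python
def top_x_linkability(rankings, x):
    """Fraction of ARs whose true author is among the top-x ranked IRs.

    rankings: dict mapping each AR's true author to the list of candidate
    authors, best match first (hypothetical layout, not from the slides).
    """
    hits = sum(1 for author, ranked in rankings.items() if author in ranked[:x])
    return hits / len(rankings)


# Toy example with three anonymous records.
rankings = {
    "u1": ["u1", "u2", "u3"],  # true author ranked first
    "u2": ["u3", "u2", "u1"],  # true author ranked second
    "u3": ["u1", "u2", "u3"],  # true author ranked third
}
for x in (1, 2, 3):
    print(x, top_x_linkability(rankings, x))
```

In the toy data, one of three ARs is linked at Top-1, two at Top-2, and all three at Top-3.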
Problem Settings
Methodologies
(1) Naïve Bayesian Model: decreasing sorted list of IRs
(2) Kullback-Leibler Divergence (KLD): increasing sorted list of IRs
Maximum-Likelihood Estimation
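A minimal sketch of the two rankings over letter distributions. Several details are assumptions rather than taken from the slides: the add-one smoothing (so that unseen letters do not produce log(0)), the KLD direction D(AR || IR), and all function names.

```python
import math
import string
from collections import Counter

ALPHABET = string.ascii_lowercase


def letter_distribution(text, smoothing=1.0):
    """Smoothed relative letter frequencies (smoothing is an assumption)."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values()) + smoothing * len(ALPHABET)
    return {ch: (counts[ch] + smoothing) / total for ch in ALPHABET}


def nb_log_likelihood(ar_text, ir_dist):
    """Log-probability of the AR's letters under one IR's distribution."""
    return sum(math.log(ir_dist[ch]) for ch in ar_text.lower() if ch in ALPHABET)


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) over the alphabet."""
    return sum(p[ch] * math.log(p[ch] / q[ch]) for ch in ALPHABET)


def rank_nb(ar_text, ir_texts):
    """IRs sorted by decreasing Naive Bayes log-likelihood."""
    dists = {user: letter_distribution(t) for user, t in ir_texts.items()}
    return sorted(dists, key=lambda u: nb_log_likelihood(ar_text, dists[u]),
                  reverse=True)


def rank_kld(ar_text, ir_texts):
    """IRs sorted by increasing divergence from the AR's distribution."""
    ar_dist = letter_distribution(ar_text)
    dists = {user: letter_distribution(t) for user, t in ir_texts.items()}
    return sorted(dists, key=lambda u: kld(ar_dist, dists[u]))


ir_texts = {"alice": "aaaa bbbb aaaa", "bob": "zzzz yyyy zzzz"}
print(rank_nb("aaab", ir_texts))
print(rank_kld("aaab", ir_texts))
```

The two sort orders differ because a higher likelihood means a better match, while a lower divergence means a better match.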
Tokens
• Unigram: 26 values
  “privacy”: “p”, “r”, “i”, “v”, “a”, “c”, “y”
• Digram: 676 values
  “privacy”: “pr”, “ri”, “iv”, “va”, “ac”, “cy”
• Rating: 5 values
• Category: 28 values
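The lexical tokens above can be extracted as follows. This is a sketch under two assumptions not stated on the slide: case is ignored, and digrams do not span word boundaries. The function names are hypothetical.

```python
import string
from collections import Counter

LETTERS = set(string.ascii_lowercase)


def unigram_counts(text):
    """Count the letters a-z (26 possible values), ignoring case."""
    return Counter(ch for ch in text.lower() if ch in LETTERS)


def digram_counts(text):
    """Count adjacent letter pairs within each word (26 * 26 = 676 values)."""
    counts = Counter()
    word = []
    # The trailing space flushes the last word.
    for ch in text.lower() + " ":
        if ch in LETTERS:
            word.append(ch)
        else:
            counts.update(a + b for a, b in zip(word, word[1:]))
            word = []
    return counts


print(list(unigram_counts("privacy")))  # ['p', 'r', 'i', 'v', 'a', 'c', 'y']
print(list(digram_counts("privacy")))   # ['pr', 'ri', 'iv', 'va', 'ac', 'cy']
```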
Unigram Results
[Figure: NB, Unigram. Linkability ratio (LR) vs. anonymous record size]
Size 60: LR 83% (Top-1), LR 96% (Top-10)
Digram Results
[Figure: NB, Digram. Linkability ratio (LR) vs. anonymous record size]
Size 20: LR 97% (Top-1)
Size 10: LR 88% (Top-1)
Improvement (1): Combining Lexical and Non-Lexical Tokens
[Figure: NB Model. Linkability ratio vs. anonymous record size]
Gain, up to 20%
Size 60, 83% to 96%
Size 30, 60% to 80%
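One simple way to combine lexical and non-lexical tokens in a Naive Bayes model is to treat all token types as independent and add their log-likelihoods. The sketch below assumes exactly that, plus add-one smoothing. The slides do not spell out the combination rule, so the names and data layout are hypothetical.

```python
import math
from collections import Counter

# Number of possible values per token type, as listed on the Tokens slide.
N_VALUES = {"unigram": 26, "rating": 5, "category": 28}


def smoothed_log_prob(token, counts, n_values, smoothing=1.0):
    """Log of the add-one smoothed relative frequency (assumed smoothing)."""
    total = sum(counts.values()) + smoothing * n_values
    return math.log((counts[token] + smoothing) / total)


def combined_nb_score(ar_tokens, ir_counts):
    """Sum log-likelihoods over all token types (independence assumption).

    ar_tokens: dict token type -> list of tokens observed in the AR
    ir_counts: dict token type -> Counter of tokens observed in the IR
    """
    score = 0.0
    for kind, tokens in ar_tokens.items():
        for tok in tokens:
            score += smoothed_log_prob(tok, ir_counts[kind], N_VALUES[kind])
    return score


# Two IRs with identical letter statistics but different ratings/categories.
ir_a = {"unigram": Counter("goodfood"), "rating": Counter([5, 5, 4]),
        "category": Counter(["restaurants", "restaurants"])}
ir_b = {"unigram": Counter("goodfood"), "rating": Counter([1, 1, 2]),
        "category": Counter(["automotive"])}
ar = {"unigram": list("good"), "rating": [5], "category": ["restaurants"]}

print(combined_nb_score(ar, ir_a) > combined_nb_score(ar, ir_b))
```

In this toy case the letters alone cannot separate the two IRs; the rating and category tokens break the tie.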
What about Restricting Identified Record (IR) Size?
[Figure: two panels, NB Model and KLD Model. Linkability ratio vs. anonymous record size]
• Affected by IR size
• Performed better for smaller IR
• Size 20 or less, improved
Improvement (2): Matching All IRs At Once
[Figure: diagram of nodes v1 to v16 with ✔ and ✖ marks]
Matching All Results
[Figure: two panels, Restricted IR and Full IR. Linkability ratio vs. anonymous record size]
• Gain, up to 16%: size 30, from 74% to 90%
• Gain, up to 23%: size 20, from 35% to 55%
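The idea of matching all records jointly, rather than one AR at a time, can be illustrated with a greedy one-to-one assignment. This is only an illustrative stand-in: the slides do not give the actual procedure, and the function name and score layout are hypothetical.

```python
def greedy_one_to_one(scores):
    """Assign each AR to a distinct IR, best-scoring pairs first.

    scores: dict mapping (ar, ir) pairs to a match score, higher is better
    (for a distance such as KLD, negate it first).
    """
    assignment = {}
    used_irs = set()
    ranked_pairs = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for (ar, ir), _score in ranked_pairs:
        if ar not in assignment and ir not in used_irs:
            assignment[ar] = ir
            used_irs.add(ir)
    return assignment


# Matched independently, both ARs would pick u1. Matched jointly, a2 falls
# back to u2 because u1 is already taken by the stronger pair.
scores = {
    ("a1", "u1"): 0.9, ("a1", "u2"): 0.1,
    ("a2", "u1"): 0.8, ("a2", "u2"): 0.7,
}
print(greedy_one_to_one(scores))  # {'a1': 'u1', 'a2': 'u2'}
```

The design choice here is that a confident match removes its IR from the candidate pool, which can correct weaker matches that would otherwise collide with it.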
Improvement (3): For Small IR Size
Changing it to: 0.5 + Review Length
[Figure: linkability ratio vs. anonymous record size]
Size 10, 89% to 92%
Size 7, 79% to 84%
Gain up to 5%
Discussion
o Unigram and Scalability
  o 26 vs. 676
  o 59 vs. 676
  o Less than 10%
o Prolific Users
  o In the long run, users will become prolific
o Anonymous Record Size
  o A set of 60 reviews, less than 20% of minimum contribution
o Detecting Spam Reviews
Future Work
o Improving more for Small ARs
o Other Probabilistic Models
o Using Stylometry
o Review Anonymization
o Exploring Linkability in other Preference Databases
Conclusion
o Extensive Study to Assess Linkability of User Reviews
  o For large set of users
  o Using very simple features
o Users are very exposed even with simple features and large number of authors
Takeaway Point: Reviews can be accurately de-anonymized using alphabetical letter distributions
Questions?