dsybil: optimal sybil-resistance for recommendation systems

30
DSybil: Optimal Sybil- DSybil: Optimal Sybil- Resistance Resistance for Recommendation Systems for Recommendation Systems Haifeng Yu National University of Singapore Chenwei Shi National University of Singapore Michael Kaminsky Intel Research Pittsburgh Phillip B. Gibbons Intel Research Pittsburgh Feng Xiao National University of Singapore

Upload: odele

Post on 14-Jan-2016

47 views

Category:

Documents


4 download

DESCRIPTION

DSybil: Optimal Sybil-Resistance for Recommendation Systems. Haifeng Yu National University of Singapore Chenwei Shi National University of Singapore Michael Kaminsky Intel Research Pittsburgh Phillip B. Gibbons Intel Research Pittsburgh Feng Xiao National University of Singapore. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: DSybil: Optimal Sybil-Resistance for Recommendation Systems

DSybil: Optimal Sybil-ResistanceDSybil: Optimal Sybil-Resistancefor Recommendation Systemsfor Recommendation Systems

Haifeng Yu National University of Singapore

Chenwei Shi National University of Singapore

Michael Kaminsky Intel Research Pittsburgh

Phillip B. Gibbons Intel Research Pittsburgh

Feng Xiao National University of Singapore

Page 2: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Attacks on Recommendation SystemsAttacks on Recommendation Systems

Netflix, Amazon, Razor, Digg, YouTube, …

Attacker may cast misleading votes

To be more effective Bribe other users

Compromise other users

Ultimate form: Sybil attack

Haifeng Yu, National University of Singapore 2

Page 3: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Haifeng Yu, National University of Singapore 3

Sybil AttackSybil Attack

launchsybilattack

malicious

“Post at random intervals to make it look like real people”

“Supports multiple random proxies to make posts look like they came from visitors across the world”

“Multithreaded comment blaster with account rotation”

honest automated sybil attack for $147

Page 4: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Haifeng Yu, National University of Singapore 4

Background: Defending Against Sybil AttackBackground: Defending Against Sybil Attack

Tie identities to human beings based on credentials (e.g., passport) Privacy concerns, etc.

Resource challenges Vulnerable to attacks from botnets

Social-network-based defense SybilGuard [SIGCOMM’06], SybilLimit [Oakland’08], SybilInfer

[NDSS’09], SumUp [NSDI’09]

Better guarantees

Sybil defense widely considered challenging: >1000 papers acknowledging sybil attack, most without having a solution

Page 5: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Byzantine consensus

n/3

DHT n/4

… …

Recommendation systems

n/500

Rec Systems Are More VulnerableRec Systems Are More Vulnerable

On an avg Digg object, only 1 out of every 500 honest users vote

n/500 sybil identities are sufficient to out-vote the honest voters

Haifeng Yu, National University of Singapore 5

# sybil identities we can tolerate

(n identities total)

Page 6: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Social-network-based Defenses Not Social-network-based Defenses Not Sufficiently Strong For Rec SystemsSufficiently Strong For Rec Systems

Lower bound on all social-network-based approaches Applicable to SybilGuard, SybilLimit, SybilInfer,

SumUp, etc…

Compromising a degree-10 node creates 10 sybil identities

To create n/500 sybil identities: Compromise 1 node out of every 5000 honest nodes is sufficient

Haifeng Yu, National University of Singapore 6

Page 7: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Alternative: Leverage History and TrustAlternative: Leverage History and Trust

Ancient idea: Adjust “trust” to an identity based on its historical behavior

Numerous heuristics proposed -- target a few fixed attack strategies No guarantees beyond the few strategies targeted

Attacker is intelligent and will adapt arms race

Haifeng Yu, National University of Singapore 7

Page 8: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Our ResultsOur Results DSybil: A novel defense mechanism

Based on feedback and trust

Loss (# of bad recommendations) is provably

even under worst-case attack

We prove that DSybil’s loss is optimal

Experimental results (from 1-year Digg trace): High-quality recommendation under potential sybil

attack (with optimal strategy) from million-node botnet

Haifeng Yu, National University of Singapore 8

)log( MDO

D : Dimension of the objects (< 10 in Digg)

M : Max # of sybil identities voting on each obj

Page 9: DSybil: Optimal Sybil-Resistance for Recommendation Systems

OutlineOutline

Background and our contribution

Trust-based approaches – The obvious, the subtle, and the challenge

Main component of DSybil: DSybil’s recommendation algorithm

Experimental results

Haifeng Yu, National University of Singapore 9

Page 10: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Subtle Aspects of Using TrustSubtle Aspects of Using Trust

1. How to identify “correct” but “non-helpful” votes? Vote for a good object that already has many votes --

this additional vote is “non-helpful”

Sybil identities may gain trust for free

Determine the “contribution amount” by voting order does not work – see paper

Haifeng Yu, National University of Singapore 10

A good object

my vote: this obj is good!

sybil

identity

gain trust for

free

Page 11: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Subtle Aspects of Using TrustSubtle Aspects of Using Trust

2. How to assign initial trust to new identities? Positive initial trust for all: Invites whitewashing

“Trial period of 5 votes” not effective

Cast 5 “correct” votes and then cheat

3. How exactly to grow trust? Multiplicatively? Additively?

4. How exactly to make recommendations? Pick obj with most votes? Probabilistically? How

about negative votes?…..

Haifeng Yu, National University of Singapore 11

Page 12: DSybil: Optimal Sybil-Resistance for Recommendation Systems

The Central ChallengeThe Central Challenge Numerous design choices -- fundamental

tension between Giving trust to honest identities

Not giving trust to sybil identities casting “correct” votes (who may cause damage later)

Impossible to explore all design alternatives Our approach: Directly design an optimal algorithm

Needs to strike the optimal balance

Haifeng Yu, National University of Singapore 12

Page 13: DSybil: Optimal Sybil-Resistance for Recommendation Systems

DSybil’s Key Insights DSybil’s Key Insights

Key #1: Leverage typical

voting behavior of

honest users Heavy-tail distribution

Exist very active users

who cast many votes

Key #2: If user is already getting “enough help”, then do not give out more trust Enables us to strike an optimal balance

Haifeng Yu, National University of Singapore 13

# votes cast (on various objs)

% o

f us

ers

cast

ing

x vo

tes

all log-scale

Page 14: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Objects to be recommended are either good or bad (e.g., Digg)

DSybil is personalized Each user may have different subjective opinions

Different users may get different recommendations

From now on, always with respect to a user Alice

Run by either Alice or a central server

Haifeng Yu, National University of Singapore 14

System Model and Attack ModelSystem Model and Attack Model

Page 15: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Haifeng Yu, National University of Singapore 15

Each round has a pool of objects DSybil recommends one object for Alice to consume Alice provide feedbacks after consumption DSybil adjust trust based on feedback

See paper for generalizations…

2 good objs 2 bad objs

System Model and Attack ModelSystem Model and Attack Model

DSybil does not know which are good

Page 16: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Haifeng Yu, National University of Singapore 16

Other identities have cast votes DSybil only use positive votes

We prove that using negative votes will not help…

Each identity cast at most one vote/object

At most M (e.g. 1010) sybil identities voting on each object

EF

G H

2 good objs 2 bad objs

H

System Model and Attack ModelSystem Model and Attack Model

Page 17: DSybil: Optimal Sybil-Resistance for Recommendation Systems

DSybil Rec Algorithm: Classifying ObjectsDSybil Rec Algorithm: Classifying Objects

Haifeng Yu, National University of Singapore 17

E: 0.2

total : 0.4

F: 0.2

total : 0.2

G: 0.2

total : 0.2

H: 0.2

total : 0.2

2 good objs 2 bad objs

H: 0.2

Reminder: Trust is always with respect to Alice (how much Alice “trusts” the given identity)

Each identity starts with initial trust 0.2 -- Fix later…

An object is overwhelming if total trust ≥ C C = 1.0

Page 18: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Rounds without Overwhelming ObjectsRounds without Overwhelming Objects

1. Recommend: Uniformly random obj

Haifeng Yu, National University of Singapore 18

E: 0.2

total : 0.4

F: 0.2

total : 0.2

G: 0.2

total : 0.2

H: 0.2

total : 0.2

2 good objs 2 bad objs

H: 0.2

trust to G: 0.2 0.2trust to E: 0.2 0.2

trust to F: 0.2 0.2

2. Adjust trust after feedback: If obj bad, multiply trust of voters by 0 ≤ < 1

If obj good, multiply trust of voters by > 1

Additive increase would result in linear (instead of logarithmic) loss…

Recommend obj with largest total trust would result in linear (instead of logarithmic) loss…

Page 19: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Defining Guides and DimensionDefining Guides and Dimension

Haifeng Yu, National University of Singapore 19

Guides: Honest users with same/similar “opinion” with Alice Never/seldom votes for bad objects

Dimension: # of guides needed to “cover” large fraction (e.g., 60%) of the good objects -- Called critical guides

X X YZ

W

Dimension = 2; Critical guides = {X, Y} or {X, W}

DSybil does not know who are the guides (critical guides) or what the dimension is

Page 20: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Key #1: Leverage Small DimensionKey #1: Leverage Small Dimension

Haifeng Yu, National University of Singapore 20

Dimension is typically small in practice – results later…

Small dimension Will encounter critical guides frequently when picking random objects Trust to critical guides quickly grow to C

This will result in overwhelming objects…

Page 21: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Rounds with Overwhelming ObjectsRounds with Overwhelming Objects

1. Recommend: Arbitrary overwhelming obj Will confiscate sufficient trust if object is bad…

2. Adjust trust after feedback: If obj bad, multiply trust of the voters by 0 ≤ < 1

If obj good, no additional trust given out

Haifeng Yu, National University of Singapore 21

E: 1.0

total : 1.2

H: 0.2G: 0.2

total : 0.4

F: 1.0

total : 1.0

1 good obj 2 bad objs

H: 0.2

trust to E: 1.0 1.0

trust to H: 0.2 0.2

Page 22: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Key #2: Identify Whether Help is SufficientKey #2: Identify Whether Help is Sufficient

Consumes good overwhelming object = Alice already has “sufficient help”

Thus do not give out additional trust Prevent sybil identities from getting trust “for free”

May hurt honest identities (But remember this is optimal…)

Haifeng Yu, National University of Singapore 22

Page 23: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Omitted DetailsOmitted Details No “free” initial trust given out when Alice is

getting “sufficient help”

Proof for loss even under worst-case attack

Optimality

Alternative designs/tweaks Most will break optimality

Haifeng Yu, National University of Singapore 23

)log( MDO

Page 24: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Results on DimensionResults on Dimension One-year Digg dataset with half-million users

Pessimistically assuming guides are only 2% of the honest users -- see paper for other settings…

To cover 60% of good objs, need only 3 guides

Robustness: Remove the previous 3 guides – 5 guides to cover 60%

Remove top 100 heaviest voters – 5 guides needed to cover 60%

See paper for more…

Relates to heavy-tail distribution of votes cast by individual users – see paper Exist very active users who cast many votes

Similar heavy-tail distribution observed in 4 other datasets

Haifeng Yu, National University of Singapore 24

Page 25: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Results on Loss (Based on Digg Dataset) Results on Loss (Based on Digg Dataset)

Attack capacity: Max 10 billion sybil voters on any obj In Digg, avg # honest voters on each obj is only ~1,000

Fraction of bad recommendations (under worst-case attack): 12%

Growing defense: 5% if user has used DSybil for a week before attack starts If attack starts at random point, applies to 51/52 = 98%

users

1-minute computational puzzle per week 10 billion identities needs a million-node botnet

Haifeng Yu, National University of Singapore 25

Page 26: DSybil: Optimal Sybil-Resistance for Recommendation Systems

ConclusionConclusion

Defending against sybil attacks is challenging It is even harder in the context of rec systems

DSybil: Provable and optimal loss Almost no previous approaches provide provable

guarantees against worst-case attack

DSybil key insights: Leverage small dimension of the voting pattern

Carefully identify when help is already “sufficient”

Haifeng Yu, National University of Singapore 26

)log( MDO

Page 27: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Haifeng Yu, National University of Singapore 27

Page 28: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Haifeng Yu, National University of Singapore 28

Which object to pick?

Page 29: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Central Question Answered by This WorkCentral Question Answered by This Work

Haifeng Yu, National University of Singapore 29

Can trust sufficiently diminish the influence of sybil identities in recommendation systems?

Aim for provable guarantees under all attack strategies (including worst-case attack from intelligent attacker)

Short answer: YES!

Page 30: DSybil: Optimal Sybil-Resistance for Recommendation Systems

Our ResultsOur Results DSybil: A novel defense mechanism

Growing defense: If the user has used DSybil for some time before the attack starts, loss will be even smaller

Experimental results (from one-year trace of Digg): High-quality recommendation even under potential sybil attack (with optimal strategy) from a million-node botnet

Haifeng Yu, National University of Singapore 30