
Robust Spammer Detection by Nash Reinforcement Learning

Yingtong Dou (UIC), Guixiang Ma (Intel Labs), Sihong Xie (Lehigh), Philip S. Yu (UIC)

ACM SIGKDD '20, August 23-27, Virtual Event, CA, USA

ydou5@uic.edu

Paper: http://arxiv.org/abs/2006.06069
Slides: http://ytongdou.com/files/kdd20slides.pdf
Code: https://github.com/YingtongDou/Nash-Detect

Outline

• Background: review spam and spamming campaigns
• Highlight: previous works vs. our work
• Methodology I: practical goals of spammers and defenders
• Methodology II: robust training of spam detectors (Nash-Detect)
• Experiments: training and deployment performance of Nash-Detect
• Conclusion & Future Work

Robust Spammer Detection by Nash Reinforcement Learning, KDD 2020

Background: Fake Reviews are Prevalent

• Nearly 40% of the reviews on Amazon are fake [1]
• Yelp hides suspicious reviews and alerts consumers

[1] J. Swearingen. 2017. Amazon Is Filled With Sketchy Reviews. Here's How to Spot Them. https://slct.al/2TBXDpT

Images from https://upserve.com/restaurant-insider/five-key-reasons-shouldnt-buy-yelp-reviews/ and http://greyenlightenment.com/detecting-fake-amazon-reviews/

Background: Spamming Campaign

• Dishonest merchants can easily buy high-quality fake reviews online
• Machine-generated fake reviews look very authentic [1]

[1] P. Kaghazgaran, M. Alfifi, and J. Caverlee. 2019. Wide-Ranging Review Manipulation Attacks: Model, Empirical Study, and Countermeasures. In CIKM.

Images from https://mopeak.com/buy-android-reviews/ and http://faculty.cs.tamu.edu/caverlee/pubs/kaghazgaran19cikm.pdf

Background: Review Spam Detection

• To detect fake reviews, three major types of spam detectors have been proposed: text-based, behavior-based, and graph-based detectors

Background: Base Spam Detectors

• MRF-based detectors: GANG, SpEagle
• SVD-based detector: fBox
• Dense-block-based detector: Fraudar
• Behavior-based detector: Prior

Highlight: Previous Works vs. Our Work

• Previous works: static dataset; accuracy-based evaluation metric; fixed spamming pattern; single detector
• Our work: dynamic game between spammer and defender; practical evaluation metric; evolving spamming strategies; ensemble of multiple detectors

Methodology I: Turning Reviews into Business Revenue

• On Yelp, a product's rating is correlated with its revenue [1]
• We estimate revenue from ratings to measure the practical effect of spamming

[1] M. Luca. 2016. Reviews, Reputation, and Revenue: The Case of Yelp.com. HBS Working Paper (2016).
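Luca's finding can be made concrete with a small sketch. It assumes the commonly cited result from that work that a one-star rating increase is associated with roughly a 5-9% revenue increase; the function name and the 7%-per-star elasticity below are illustrative choices, not numbers from the slides:

```python
# Hedged sketch: estimate the revenue impact of a rating change, assuming
# a fixed percentage gain per star (here 7%, inside Luca's reported 5-9%
# range). Names and the linear model are illustrative simplifications.

def estimated_revenue(base_revenue: float, old_rating: float,
                      new_rating: float, pct_per_star: float = 0.07) -> float:
    """Scale revenue by pct_per_star for each star of rating change."""
    return base_revenue * (1.0 + pct_per_star * (new_rating - old_rating))

# A product earning $10,000/month whose rating is pushed from 3.5 to 4.5
# stars gains roughly 7%, i.e. about $10,700/month under this model.
print(estimated_revenue(10_000, 3.5, 4.5))
```

Under this toy model, the spammer's incentive is directly visible: every fractional star bought with fake reviews translates into estimated revenue.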

Practical Effect is Better than Recall

• We run five detectors individually against five attacks
• Even when detector recall is high (>0.7), the practical effect is not reduced

[Figure: Practical Effect vs. Recall for SpEagle, Fraudar, GANG, Prior, and fBox on YelpChi, YelpNYC, and YelpZip]

Methodology I: Spammer's Practical Goal

• To promote a product, the practical goal of the spammer is to maximize the practical effect (PE)
• The PE (equation on slide) compares the revenue after the attack against the revenue before the attack, with the attack composed according to the spamming strategy weights
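A minimal sketch of the practical effect as described here: the revenue gained by the campaign, i.e. estimated revenue after the attack minus revenue before it, over the target products. The data layout and names are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch of the practical effect (PE): total revenue gain the
# spamming campaign achieves over its target products. How revenue is
# estimated is abstracted away; the dict layout is illustrative.

def practical_effect(revenue_before: dict, revenue_after: dict) -> float:
    """PE = sum over target products of (revenue after - revenue before)."""
    return sum(revenue_after[p] - revenue_before[p] for p in revenue_before)

before = {"p1": 100.0, "p2": 80.0}
after  = {"p1": 112.0, "p2": 83.0}   # ratings boosted by fake reviews
print(practical_effect(before, after))  # 15.0
```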

Methodology I: Defender's Practical Goal

• The defender needs to minimize the practical effect
• We combine detector prediction results with the practical effect to formulate a cost-sensitive loss (equation on slide), where the cost of false negatives is weighted by the practical effect and the detectors' predictions are combined through detector weights
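A minimal sketch of such a cost-sensitive loss, under stated assumptions: undetected (false-negative) fake reviews are charged by the practical effect they contribute, and base detector scores are combined through detector weights. The thresholding, data layout, and names are illustrative, not the paper's exact loss:

```python
# Hedged sketch of a cost-sensitive defender loss: fake reviews that the
# weighted detector ensemble fails to flag (false negatives) contribute
# their practical-effect cost. All names and the 0.5 threshold are
# illustrative assumptions.

def ensemble_score(detector_scores, detector_weights):
    """Weighted combination of the base detectors' scores for one review."""
    return sum(w * s for w, s in zip(detector_weights, detector_scores))

def defender_loss(reviews, detector_weights, threshold=0.5):
    """Sum the practical-effect cost of fake reviews that slip through."""
    loss = 0.0
    for scores, is_fake, pe_cost in reviews:
        caught = ensemble_score(scores, detector_weights) >= threshold
        if is_fake and not caught:   # false negative: pay its PE cost
            loss += pe_cost
    return loss

# Two detectors; one fake review escapes and costs 2.0 in practical effect
reviews = [([0.9, 0.8], True, 3.0),   # fake, caught
           ([0.2, 0.3], True, 2.0),   # fake, missed -> contributes 2.0
           ([0.1, 0.2], False, 0.0)]  # genuine review
print(defender_loss(reviews, [0.5, 0.5]))  # 2.0
```

Minimizing this loss over the detector weights is what pushes the defender toward configurations that shrink the spammer's practical effect.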

Methodology II: A Minimax-Game Formulation

Minimax Game Objective: (equation on slide)

• The objective function is not differentiable
• Our solution: multi-agent non-cooperative reinforcement learning with SGD optimization
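Schematically, the game described above pits the defender's detector weights against the spammer's strategy weights; writing the spamming-strategy weights as $\sigma$ and the detector weights as $\theta$ (assumed symbols, a paraphrase rather than the paper's exact objective):

```latex
\min_{\theta} \; \max_{\sigma} \;\; \mathrm{PE}(\sigma, \theta)
```

The spammer maximizes the practical effect over its mixture of base attacks, while the defender minimizes it over the detector ensemble weights.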

Methodology II: Train a Robust Detector (Nash-Detect)
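The training loop behind this slide can be sketched as follows. This is a simplified stand-in under stated assumptions, not the paper's exact algorithm: each episode the spammer samples an attack from its current mixed strategy, both sides observe the resulting practical effect, and the two weight vectors are nudged in opposite directions (the multiplicative update rule and the toy `practical_effect` interface are illustrative):

```python
# Hedged sketch of a Nash-Detect-style training loop: per episode, the
# spammer samples a base attack from its mixture, the defender upweights
# detectors that keep the practical effect (PE) low, and the spammer
# reinforces attacks that achieve a high PE. Updates are illustrative.
import random

ATTACKS = ["IncBP", "IncDS", "IncPR", "Random", "Singleton"]
DETECTORS = ["GANG", "SpEagle", "fBox", "Fraudar", "Prior"]

def normalize(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

def nash_detect(practical_effect, episodes=50, lr=0.5, seed=0):
    """practical_effect(attack, detector) -> float (lower = better defense)."""
    rng = random.Random(seed)
    attack_mix = {a: 1.0 for a in ATTACKS}   # spammer's mixed strategy
    det_w = {d: 1.0 for d in DETECTORS}      # defender's detector weights
    for _ in range(episodes):
        attack_mix, det_w = normalize(attack_mix), normalize(det_w)
        # Spammer samples one base attack from its current mixture
        attack = rng.choices(ATTACKS, weights=[attack_mix[a] for a in ATTACKS])[0]
        pes = {d: practical_effect(attack, d) for d in DETECTORS}
        # Defender: multiplicatively upweight detectors with low PE
        for d in DETECTORS:
            det_w[d] *= 1.0 + lr / (1.0 + pes[d])
        # Spammer: reinforce the sampled attack in proportion to its PE
        avg_pe = sum(det_w[d] * pes[d] for d in DETECTORS) / len(DETECTORS)
        attack_mix[attack] *= 1.0 + lr * avg_pe
    return normalize(det_w)
```

With a toy `practical_effect` in which one detector consistently suppresses every attack, the loop concentrates the defender's weight on that detector, mirroring the detector-importance curves shown later in the talk.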

Methodology II: Base Spamming Strategies

• IncBP: add reviews with minimum suspiciousness based on belief propagation on an MRF
• IncDS: add reviews with minimum density on the graph composed of accounts, reviews, and products
• IncPR: add reviews with minimum prior suspicion scores computed from behavior features
• Random: randomly add reviews
• Singleton: add reviews with new accounts
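The shared idea behind the Inc* strategies is evasion by minimum suspiciousness. A minimal sketch of the IncPR flavor, with illustrative account names and scores (the real strategy recomputes behavior-based priors over the review graph):

```python
# Hedged sketch of the IncPR idea: post fake reviews from the controlled
# accounts whose behavior-based prior suspicion scores are lowest, so the
# new reviews blend in. Scores and account names are illustrative.

def inc_pr(prior_scores: dict, n_reviews: int) -> list:
    """Pick the n least-suspicious controlled accounts to post from."""
    ranked = sorted(prior_scores, key=prior_scores.get)  # ascending suspicion
    return ranked[:n_reviews]

scores = {"acct1": 0.9, "acct2": 0.2, "acct3": 0.5, "acct4": 0.1}
print(inc_pr(scores, 2))  # ['acct4', 'acct2']
```

IncBP and IncDS follow the same template with the suspiciousness measure swapped out (belief-propagation posteriors and graph density, respectively).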

Experimental Settings

• Dataset statistics and spamming attack settings:

Dataset    # Accounts  # Products  # Reviews  # Controlled elite accounts  # Target products  # Posted fake reviews
YelpChi    38,063      201         67,395     100                          30                 450
YelpNYC    160,225     923         359,052    400                          120                1,800
YelpZip    260,277     5,044       608,598    700                          600                9,000

• The spammer controls elite and new accounts
• The defender removes the top-k most suspicious reviews

Experiments: A Fixed Detector's Vulnerability

[Figure: practical effect of IncBP, IncDS, IncPR, Random, and Singleton against Fraudar]

• Against a fixed detector (Fraudar), the spammer can switch to the spamming strategy with the maximum practical effect (IncDS)

Experiments: Nash-Detect Training Process

[Figure: attack mixture parameters of IncBP, IncDS, IncPR, Random, and Singleton over 50 training episodes on YelpChi, YelpNYC, and YelpZip]

• The Singleton attack is less effective than the other four attacks

Experiments: Nash-Detect Training Process

[Figure: detector importance of SpEagle, Fraudar, GANG, Prior, and fBox over 50 training episodes on YelpChi, YelpNYC, and YelpZip]

• Nash-Detect converges smoothly to the optimal detector importance

[Figure: practical effect over 50 training episodes on YelpChi, YelpNYC, and YelpZip, comparing Nash-Detect against the single detectors]

Experiments: Nash-Detect Training Process

• The practical effect of the detectors configured by Nash-Detect is always less than the worst-case performance

Experiments: Nash-Detect Performance in Deployment

[Figure: practical effect in deployment on YelpChi, YelpNYC, and YelpZip, comparing Equal-Weights, Nash-Detect, GANG, Prior, SpEagle, fBox, and Fraudar]

Conclusion: Key Takeaways

• A new evaluation metric: the practical effect
• New spamming strategies
• A new adversarial training algorithm: Nash-Detect

Future Work

• Investigate attacks on and defenses of deep-learning spam detection methods
• Apply the Nash-Detect framework to other review systems and applications
• Develop advanced attack generation techniques aware of the state of the review system

