Robust Spammer Detection by Nash Reinforcement Learning

Yingtong Dou (UIC), Guixiang Ma (Intel Labs), Sihong Xie (Lehigh), Philip S. Yu (UIC)

ACM SIGKDD ’20, August 23-27, Virtual Event, CA, USA

[email protected]

Paper: http://arxiv.org/abs/2006.06069
Slides: http://ytongdou.com/files/kdd20slides.pdf
Code: https://github.com/YingtongDou/Nash-Detect



Outline

• Background: review spam and spamming campaigns

• Highlight: previous works vs. our work

• Methodology I: practical goals of spammers and defenders

• Methodology II: robust training of spam detectors (Nash-Detect)

• Experiments: the training and deployment performance of Nash-Detect

• Conclusion & Future Works

Robust Spammer Detection by Nash Reinforcement Learning, KDD 2020

Fake Reviews are Prevalent

• Nearly 40% of reviews on Amazon are fake [1]

• Yelp hides suspicious reviews and alerts consumers

[1] J. Swearingen. 2017. Amazon Is Filled With Sketchy Reviews. Here’s How to Spot Them. https://slct.al/2TBXDpT

Images from https://upserve.com/restaurant-insider/five-key-reasons-shouldnt-buy-yelp-reviews/ and http://greyenlightenment.com/detecting-fake-amazon-reviews/

Spamming Campaign

• Dishonest merchants can easily buy high-quality fake reviews online

• Machine-generated fake reviews can look highly authentic [1]

[1] P. Kaghazgaran, M. Alfifi, and J. Caverlee. 2019. Wide-Ranging Review Manipulation Attacks: Model, Empirical Study, and Countermeasures. In CIKM.

Images from https://mopeak.com/buy-android-reviews/ and http://faculty.cs.tamu.edu/caverlee/pubs/kaghazgaran19cikm.pdf

Review Spam Detection

• To detect fake reviews, three major types of spam detectors have been proposed: text-based detectors, behavior-based detectors, and graph-based detectors

Base Spam Detectors

• MRF-based detectors: GANG, SpEagle

• SVD-based detector: fBox

• Dense-block-based detector: Fraudar

• Behavior-based detector: Prior

Previous Works vs. Our Work

• Previous works: static dataset; accuracy-based evaluation metric; fixed spamming pattern; single detector

• Our work: dynamic game between spammer and defender; practical evaluation metric; evolving spamming strategies; ensemble of multiple detectors

Turning Reviews into Business Revenues

• On Yelp, a product’s rating is correlated with its revenue [1]

Revenue Estimation & Practical Effect

[1] M. Luca. 2016. Reviews, reputation, and revenue: The case of Yelp.com. HBS Working Paper (2016).

Practical Effect is Better than Recall

• We run five detectors individually against five attacks

• Even when detector recall is high (>0.7), the practical effect is not reduced

[Figure: practical effect vs. recall on YelpChi, YelpNYC, and YelpZip; x-axis: Recall; y-axis: Practical Effect; detectors: SpEagle, Fraudar, GANG, Prior, fBox]

Spammer’s Practical Goal

• To promote a product, the practical goal of the spammer is to maximize the practical effect (PE).

• Spammer’s goal: maximize the spamming practical effect, i.e. the revenue after attacks minus the revenue before attacks, over the spamming strategy weights.
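The spammer's goal above can be sketched in a few lines of Python. The slide describes the practical effect as the revenue after attacks minus the revenue before attacks. The revenue model used here (linear in the average rating, with made-up coefficients `beta` and `alpha`) is a hypothetical stand-in for illustration, not the revenue estimator used by the authors.

```python
from statistics import mean

def revenue(ratings, beta=1.0, alpha=0.0):
    """Hypothetical revenue model: linear in the product's average rating."""
    return beta * mean(ratings) + alpha

def practical_effect(ratings_before, fake_ratings, beta=1.0, alpha=0.0):
    """Revenue after attacks minus revenue before attacks."""
    after = revenue(list(ratings_before) + list(fake_ratings), beta, alpha)
    before = revenue(ratings_before, beta, alpha)
    return after - before

# Toy example: two 5-star fake reviews added to a product with four real reviews.
pe = practical_effect([3, 4, 2, 3], [5, 5])
print(round(pe, 4))
```

With no fake reviews the practical effect is zero, which is the baseline the spammer tries to move away from.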

Defender’s Practical Goal

• We combine the detectors’ prediction results with the practical effect to formulate a cost-sensitive loss

• The defender needs to minimize the practical effect

• The loss involves the cost of false negatives, the prediction results of detectors, and the detector weights
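A minimal sketch of a cost-sensitive loss of this kind follows. It assumes, for illustration only, that the detectors' suspiciousness scores are combined by a weighted sum passed through a sigmoid, and that each missed fake review is charged a false-negative cost; the function names and the toy numbers are hypothetical and not taken from the slides.

```python
import math

def ensemble_spam_probability(scores, weights):
    """Sigmoid of the weighted sum of per-detector suspiciousness scores."""
    z = sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))

def cost_sensitive_loss(fake_reviews, weights):
    """Average of false-negative cost times negative log-probability of 'spam'.

    fake_reviews: list of (detector_scores, false_negative_cost) pairs.
    """
    total = 0.0
    for scores, fn_cost in fake_reviews:
        p_spam = ensemble_spam_probability(scores, weights)
        total += fn_cost * -math.log(p_spam)
    return total / len(fake_reviews)

# Toy data: two fake reviews scored by three hypothetical detectors.
reviews = [([0.9, 0.2, 0.7], 2.0), ([0.1, 0.3, 0.2], 0.5)]
print(cost_sensitive_loss(reviews, [1.0, 1.0, 1.0]))
```

Reviews with a larger false-negative cost contribute more to the loss, so the detector weights are pushed toward catching the fake reviews that matter most.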

A Minimax-Game Formulation

• Minimax game objective: the spammer maximizes the practical effect while the defender minimizes it

• The objective function is not differentiable

• Our solution: multi-agent non-cooperative reinforcement learning and SGD optimization
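One way to write a minimax objective of this kind is sketched below. Here p denotes the spamming strategy weights, q the detector weights, and PE the practical effect, as on the preceding slides. The set of target products V_T and the sum over it are notation assumed here for illustration, not taken from the slide.

```latex
\min_{q} \; \max_{p} \; \sum_{v \in V_T} \mathrm{PE}(v;\, p, q)
```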

Train a Robust Detector - Nash-Detect
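The alternating structure of such training can be illustrated with a toy loop. This sketch swaps in a simple multiplicative-weights update for both players; it is not the authors' reinforcement-learning and SGD procedure. The payoff table, learning rates, and function names are made up for illustration.

```python
import random

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

def train(payoff, episodes=50, lr_attack=0.1, lr_defend=0.1, seed=0):
    """Toy alternating updates for a spammer mixture p and detector weights q.

    payoff[a][d] is a made-up practical effect of attack a when only
    detector d is used; the ensemble's effect is the q-weighted average.
    """
    rng = random.Random(seed)
    n_attacks, n_detectors = len(payoff), len(payoff[0])
    p = [1.0 / n_attacks] * n_attacks
    q = [1.0 / n_detectors] * n_detectors
    for _ in range(episodes):
        # Spammer samples one strategy from the current mixture.
        a = rng.choices(range(n_attacks), weights=p)[0]
        effect = sum(q[d] * payoff[a][d] for d in range(n_detectors))
        # Spammer: raise the sampled strategy's weight in proportion to its reward.
        p[a] *= 1.0 + lr_attack * effect
        p = normalize(p)
        # Defender: shift weight away from detectors that leave a large effect.
        q = normalize([q[d] * (1.0 - lr_defend * payoff[a][d])
                       for d in range(n_detectors)])
    return p, q

payoff = [[0.2, 0.8], [0.6, 0.3], [0.1, 0.1]]  # made-up numbers in [0, 1)
p, q = train(payoff)
print(p, q)
```

Both players keep a normalized weight vector, so p can be read as an attack mixture and q as relative detector importance after each episode.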

Base Spamming Strategies

• IncBP: add reviews with minimum suspiciousness based on belief propagation on an MRF

• IncDS: add reviews with minimum densities on a graph composed of accounts, reviews, and products

• IncPR: add reviews with minimum prior suspicious scores computed from behavior features

• Random: randomly add reviews

• Singleton: add reviews with new accounts
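The two simplest strategies in the list above, Random and Singleton, can be sketched as follows. How fake reviews are assigned to target products is not stated on the slide, so choosing a target uniformly at random is an assumption here, as are the function names and the account naming scheme.

```python
import random

def random_attack(controlled_accounts, target_products, n_reviews, rng):
    """Random: post each fake review from a randomly chosen controlled account."""
    return [(rng.choice(controlled_accounts), rng.choice(target_products))
            for _ in range(n_reviews)]

def singleton_attack(target_products, n_reviews, rng, prefix="new_account_"):
    """Singleton: post each fake review from a brand-new account."""
    return [(f"{prefix}{i}", rng.choice(target_products))
            for i in range(n_reviews)]

rng = random.Random(0)
print(random_attack(["a1", "a2"], ["p1", "p2"], 3, rng))
print(singleton_attack(["p1", "p2"], 3, rng))
```

Each attack returns (account, product) pairs; the other three strategies would additionally need detector or graph scores to pick the least suspicious placements.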

Experimental Settings

• Dataset statistics and spamming attack settings

Dataset   # Accounts  # Products  # Reviews  # Controlled elite accounts  # Target products  # Posted fake reviews
YelpChi   38063       201         67395      100                          30                 450
YelpNYC   160225      923         359052     400                          120                1800
YelpZip   260277      5044        608598     700                          600                9000

• The spammer controls elite and new accounts

• The defender removes the top k suspicious reviews
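The defender's removal step can be sketched as a simple ranking operation. This assumes each review already has a suspiciousness score from some detector; the function name and the toy scores are hypothetical.

```python
def remove_top_k(reviews, scores, k):
    """Drop the k reviews with the highest suspiciousness scores."""
    ranked = sorted(range(len(reviews)), key=lambda i: scores[i], reverse=True)
    removed = set(ranked[:k])
    return [r for i, r in enumerate(reviews) if i not in removed]

kept = remove_top_k(["r1", "r2", "r3", "r4"], [0.1, 0.9, 0.4, 0.8], k=2)
print(kept)  # ['r1', 'r3']
```

Because only the ranking matters, a fake review survives whenever at least k other reviews score higher, regardless of its absolute score.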

Fixed Detector’s Vulnerability

[Figure: practical effect of each spamming strategy (IncBP, IncDS, IncPR, Random, Singleton) against Fraudar; x-axis: Spamming Strategy; y-axis: Practical Effect]

• For a fixed detector (Fraudar), the spammer can switch to the spamming strategy with the maximum practical effect (IncDS)

Nash-Detect Training Process

[Figure: attack mixture parameter vs. episode on YelpChi, YelpNYC, and YelpZip; x-axis: Episode; y-axis: Attack Mixture Parameter; strategies: IncBP, IncDS, IncPR, Random, Singleton]

• The Singleton attack is less effective than the other four attacks.

Nash-Detect Training Process

[Figure: detector importance vs. episode on YelpChi, YelpNYC, and YelpZip; x-axis: Episode; y-axis: Detector Importance; detectors: SpEagle, Fraudar, GANG, Prior, fBox]

• Nash-Detect can find the optimal detector importance smoothly

Nash-Detect Training Process

[Figure: practical effect vs. episode on YelpChi, YelpNYC, and YelpZip; x-axis: Episode; y-axis: Practical Effect; curves: fBox, SpEagle, Fraudar, Prior, GANG, Nash-Detect]

• The practical effect of detectors configured by Nash-Detect is always lower than the worst-case performance

Nash-Detect Performance in Deployment

[Figure: deployment results on YelpChi, YelpNYC, and YelpZip, comparing Equal-Weights, Nash-Detect, GANG, Prior, SpEagle, fBox, and Fraudar]

Key Takeaways

• New metric

• New spamming strategies

• New adversarial training algorithm

Future Works

• Investigate attacks on and defenses of deep learning spam detection methods

• Apply the Nash-Detect framework to other review systems and applications

• Develop advanced attack generation techniques that are aware of the state of the review system
