
Robust Spammer Detection by Nash Reinforcement Learning

Yingtong Dou (UIC) Guixiang Ma (Intel Labs)

ACM SIGKDD ’20, August 23-27th, Virtual Event, CA, USA

[email protected]

Paper: http://arxiv.org/abs/2006.06069
Slides: http://ytongdou.com/files/kdd20slides.pdf
Code: https://github.com/YingtongDou/Nash-Detect

Sihong Xie (Lehigh) Philip S. Yu (UIC)


Outline

• Background: review spam and spamming campaigns

• Highlight: previous works vs. our work

• Methodology I: practical goals of spammers and defenders

• Methodology II: robust training of spam detectors (Nash-Detect)

• Experiments: the training and deployment performance of Nash-Detect

• Conclusion & Future Works


Fake Reviews are Prevalent

• Nearly 40% of reviews on Amazon are fake [1]

• Yelp hides suspicious reviews and alerts consumers

Images from https://upserve.com/restaurant-insider/five-key-reasons-shouldnt-buy-yelp-reviews/ and http://greyenlightenment.com/detecting-fake-amazon-reviews/

[1] J. Swearingen. 2017. Amazon Is Filled With Sketchy Reviews. Here’s How to Spot Them. https://slct.al/2TBXDpT


Spamming Campaign

• Dishonest merchants can easily buy high-quality fake reviews online

• Machine-generated fake reviews can look highly authentic [1]

Images from https://mopeak.com/buy-android-reviews/ and http://faculty.cs.tamu.edu/caverlee/pubs/kaghazgaran19cikm.pdf

[1] P. Kaghazgaran, M. Alfifi, and J. Caverlee. 2019. Wide-Ranging Review Manipulation Attacks: Model, Empirical Study, and Countermeasures. In CIKM.


Review Spam Detection

• To detect fake reviews, three major types of spam detectors have been proposed: text-based detectors, behavior-based detectors, and graph-based detectors

Base Spam Detectors

• MRF-based detectors: GANG, SpEagle

• SVD-based detector: fBox

• Dense-block-based detector: Fraudar

• Behavior-based detector: Prior

Previous Works vs. Our Work

• Previous works:
  • Static dataset
  • Accuracy-based evaluation metric
  • Fixed spamming pattern
  • Single detector

• Our work:
  • Dynamic game between spammer and defender
  • Practical evaluation metric
  • Evolving spamming strategies
  • Ensemble of multiple detectors

Turning Reviews into Business Revenues

• On Yelp, a product’s rating is correlated with its revenue [1]

Revenue Estimation & Practical Effect

[1] M. Luca. 2016. Reviews, reputation, and revenue: The case of Yelp.com. HBS Working Paper (2016).

Practical Effect is Better than Recall

• We run five detectors individually against five attacks

• When detector recalls are high (>0.7), the practical effects are not reduced

[Figure: Practical Effect vs. Recall on YelpChi, YelpNYC, and YelpZip; legend: SpEagle, Fraudar, GANG, Prior, fBox]

Spammer’s Practical Goal

• To promote a product, the practical goal of the spammer is to maximize the practical effect (PE).

• Spamming practical effect: revenue after attacks minus revenue before attacks, as a function of the spamming strategy weights
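The practical effect described on this slide can be sketched as follows. Only the "revenue after attacks minus revenue before attacks" structure comes from the slide; the linear revenue model (`estimate_revenue`, with its `slope` and `intercept` parameters) is a hypothetical stand-in for the talk's revenue estimator, and the ratings are made-up illustrative values.

```python
from statistics import mean


def estimate_revenue(ratings, slope=1.0, intercept=0.0):
    """Hypothetical revenue model: linear in the product's average rating."""
    return slope * mean(ratings) + intercept


def practical_effect(ratings_before, fake_ratings, slope=1.0, intercept=0.0):
    """Revenue after the attack minus revenue before the attack."""
    before = estimate_revenue(ratings_before, slope, intercept)
    after = estimate_revenue(
        list(ratings_before) + list(fake_ratings), slope, intercept
    )
    return after - before


# Illustrative values only: four genuine ratings, then four 5-star fakes.
genuine = [3, 4, 2, 3]
fakes = [5, 5, 5, 5]
pe = practical_effect(genuine, fakes)
print(pe)
```

Under this toy model, fake reviews that the defender removes before they reach the average contribute nothing to the practical effect, which is why the metric can diverge from recall.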

Defender’s Practical Goal

• The defender needs to minimize the practical effect

• We combine detector prediction results with the practical effect to formulate a cost-sensitive loss

• Terms of the loss: the cost of false negatives, the prediction results of detectors, and the detector weights
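A minimal sketch of a cost-sensitive loss built from the three ingredients named on this slide: a false-negative cost, the detectors' prediction results, and the detector weights. The specific form used here (sigmoid of a weighted sum, weighted negative log-likelihood) is an assumption for illustration and may differ from the loss in the paper; `ensemble_score`, `cost_sensitive_loss`, and all numeric values are hypothetical.

```python
import math


def ensemble_score(detector_scores, weights):
    """Sigmoid of the weighted sum of per-detector suspiciousness scores."""
    z = sum(w * s for w, s in zip(weights, detector_scores))
    return 1.0 / (1.0 + math.exp(-z))


def cost_sensitive_loss(fake_reviews, weights):
    """Mean of fn_cost * (-log P(spam)) over known fake reviews.

    Each item is (detector_scores, fn_cost), where fn_cost is the assumed
    cost of missing that review, e.g. its share of the practical effect.
    """
    total = 0.0
    for detector_scores, fn_cost in fake_reviews:
        total += fn_cost * -math.log(ensemble_score(detector_scores, weights))
    return total / len(fake_reviews)


# Illustrative values: two fake reviews scored by two detectors.
reviews = [([0.9, 0.2], 2.0), ([0.1, 0.8], 0.5)]
loss_equal = cost_sensitive_loss(reviews, [1.0, 1.0])
loss_tilted = cost_sensitive_loss(reviews, [3.0, 1.0])
print(loss_equal, loss_tilted)
```

Here the first review is costlier to miss and is caught mainly by the first detector, so giving that detector more weight lowers the loss.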

A Minimax-Game Formulation

• The minimax game objective is not differentiable

• Our solution: multi-agent non-cooperative reinforcement learning and SGD optimization

Train a Robust Detector - Nash-Detect
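The alternating training idea (a spammer learning by reinforcement, a defender learning by gradient descent) can be illustrated on a toy zero-sum game. This is not the Nash-Detect algorithm itself: the `PAYOFF` matrix is invented, the spammer uses a plain REINFORCE-style update, and the defender descends the gradient of the same payoff with respect to softmax detector weights.

```python
import math
import random

# Hypothetical practical effect of attack a (row) when only detector d
# (column) is used. All values are invented for illustration.
PAYOFF = [
    [4.0, 1.0],  # attack 0 mostly evades detector 0
    [1.0, 4.0],  # attack 1 mostly evades detector 1
    [0.5, 0.5],  # attack 2 is weak against both detectors
]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=2000, lr_attack=0.05, lr_defend=0.05, seed=0):
    rng = random.Random(seed)
    attack_logits = [0.0] * len(PAYOFF)
    detector_logits = [0.0] * len(PAYOFF[0])
    for _ in range(episodes):
        p = softmax(attack_logits)    # spammer's strategy mixture
        q = softmax(detector_logits)  # defender's detector weights
        a = rng.choices(range(len(p)), weights=p)[0]
        # Practical effect of the sampled attack under the current ensemble.
        reward = sum(q[d] * PAYOFF[a][d] for d in range(len(q)))
        # Spammer: REINFORCE step that raises the log-probability of the
        # sampled attack in proportion to its reward.
        for i in range(len(p)):
            indicator = 1.0 if i == a else 0.0
            attack_logits[i] += lr_attack * reward * (indicator - p[i])
        # Defender: gradient descent on the same reward w.r.t. its logits.
        for d in range(len(q)):
            grad = q[d] * (PAYOFF[a][d] - reward)
            detector_logits[d] -= lr_defend * grad
    return softmax(attack_logits), softmax(detector_logits)


mixture, detector_weights = train()
print(mixture, detector_weights)
```

In this toy game the third attack is dominated, so its mixture weight shrinks during training, while the two remaining attacks and the two detectors keep adapting to each other.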

Base Spamming Strategies



• IncBP: add reviews with minimum suspiciousness based on belief propagation on MRF

• IncDS: add reviews with minimum densities on graph composed of accounts, reviews, and products

• IncPR: add reviews with minimum prior suspicious scores computed by behavior features

• Random: randomly add reviews

• Singleton: add reviews with new accounts
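The two simplest strategies above, Random and Singleton, can be sketched on a toy review list, together with sampling a strategy from a weighted mixture. The review representation `(account, product, rating)`, the fixed 5-star rating, the `new_<id>` account naming, and all function names are assumptions made for this sketch; IncBP, IncDS, and IncPR need a detector in the loop and are not shown.

```python
import random


def random_attack(controlled_accounts, target, n_reviews, rng):
    """Random: post reviews from randomly chosen controlled accounts."""
    return [(rng.choice(controlled_accounts), target, 5) for _ in range(n_reviews)]


def singleton_attack(target, n_reviews, existing_accounts):
    """Singleton: every fake review comes from a newly created account."""
    new_accounts = []
    next_id = 0
    while len(new_accounts) < n_reviews:
        name = f"new_{next_id}"
        if name not in existing_accounts:
            new_accounts.append(name)
        next_id += 1
    return [(account, target, 5) for account in new_accounts]


def sample_strategy(strategy_weights, rng):
    """Draw one strategy name according to the mixture weights."""
    names = list(strategy_weights)
    weights = [strategy_weights[name] for name in names]
    return rng.choices(names, weights=weights)[0]


rng = random.Random(0)
fake_random = random_attack(["elite_1", "elite_2"], "prod_A", 5, rng)
fake_single = singleton_attack("prod_A", 3, {"new_0", "elite_1"})
chosen = sample_strategy({"Random": 0.0, "Singleton": 1.0}, rng)
print(fake_random, fake_single, chosen)
```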

Experimental Settings

• Dataset statistics and spamming attack settings

Dataset   # Accounts   # Products   # Reviews   # Controlled elite accounts   # Target products   # Posted fake reviews
YelpChi   38063        201          67395       100                           30                  450
YelpNYC   160225       923          359052      400                           120                 1800
YelpZip   260277       5044         608598      700                           600                 9000

• The spammer controls elite and new accounts

• The defender removes the top-k most suspicious reviews
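The defender's top-k removal step can be sketched as below. The function name `remove_top_k`, the list-based inputs, and the scores are illustrative assumptions; how the suspiciousness scores are produced is left to the detectors.

```python
def remove_top_k(reviews, scores, k):
    """Drop the k reviews with the highest suspiciousness scores."""
    ranked = sorted(range(len(reviews)), key=lambda i: scores[i], reverse=True)
    removed = set(ranked[:k])
    return [review for i, review in enumerate(reviews) if i not in removed]


reviews = ["r0", "r1", "r2", "r3"]
scores = [0.1, 0.9, 0.4, 0.7]  # illustrative suspiciousness scores
kept = remove_top_k(reviews, scores, k=2)
print(kept)  # ['r0', 'r2']
```

Any fake review that survives this filter still counts toward the product's rating, which is what the practical effect measures.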

Fixed Detector’s Vulnerability

[Figure: Practical Effect of each spamming strategy (IncBP, IncDS, IncPR, Random, Singleton) against Fraudar]

• For a fixed detector (Fraudar), the spammer can switch to the spamming strategy with the max practical effect (IncDS)

Nash-Detect Training Process

[Figure: Attack Mixture Parameter vs. Episode on YelpChi, YelpNYC, and YelpZip; legend: IncBP, IncDS, IncPR, Random, Singleton]

• The Singleton attack is less effective than the other four attacks.

[Figure: Detector Importance vs. Episode on YelpChi, YelpNYC, and YelpZip; legend: SpEagle, Fraudar, GANG, Prior, fBox]

• Nash-Detect can find the optimal detector importance smoothly

[Figure: Practical Effect vs. Episode on YelpChi, YelpNYC, and YelpZip; legend: fBox, SpEagle, Fraudar, Prior, GANG, Nash-Detect]

• The practical effect of detectors configured by Nash-Detect is always lower than the worst-case performance

Nash-Detect Performance in Deployment

[Figure: deployment results on YelpChi, YelpNYC, and YelpZip comparing Equal-Weights, Nash-Detect, GANG, Prior, SpEagle, fBox, and Fraudar]

Key Takeaways


• New metric

• New spamming strategies

• New adversarial training algorithm



Future Works


• Investigate the attacks and defenses of deep learning spam detection methods

• Apply the Nash-Detect framework to other review systems and applications

• Develop advanced attack generation techniques aware of the states of the review system
