Case study on Yelp spamming

Upload: harshitha-chidananda-murthy

Post on 19-Feb-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Case study on Yelp spamming

Case Study On Yelp

Page 2: Case study on Yelp spamming

On the temporal dynamics of opinion spamming: case studies on Yelp

Santosh K C, Arjun Mukherjee
WWW 2016

-Harshitha Chidananda

Page 3: Case study on Yelp spamming

Introduction

• Online business boom
• Online reviews are important!
  • Enhance/defame products
  • Influence the buyer
• Online reviews dominance
• Spammers
• The problem of opinion spam has been widespread and has attracted a lot of research attention

Page 4: Case study on Yelp spamming

Problem

Service         Fraud
Credit-card     0.2%
Fake reviews    20%

• Deliberate attempts
• Promote/demote
• Target products/services
• Fake reviews
• Fake profiles

Page 5: Case study on Yelp spamming

Related Work

• Notable works include:
  • Detecting individual spammers
  • Group spammers
  • Detecting rating behaviors
  • Unexpected association rules
  • Linguistic approaches
  • Semi-supervised methods

Page 6: Case study on Yelp spamming

Challenges

• Temporal dynamics not clearly understood

How does spamming operate on a daily basis?

What are the dominant spamming policies?

How do spam injection rates vary as the popularity of entities varies?

What factors are temporally correlated with opinion spamming?

How effectively can we predict the long-term future popularity and average rating of an entity in the presence of deception?

How accurately can future deception be predicted?

Are there specific spamming policies that spammers employ?

What kinds of changes happen in the dynamics of truthful ratings on entities?

How does buffered spamming operate for entities that need spamming to retain threshold popularity, and reduced spamming for entities that are more successful?

Page 7: Case study on Yelp spamming
Page 8: Case study on Yelp spamming

Review on Yelp

Page 9: Case study on Yelp spamming

Contributions

• Spamming Policies
• Causal Modeling of Deceptive Ratings
• Predicting Deceptive Ratings
• Predicting Truthful Popularity and Rating

Process

• Analyze the time-series of fake ratings
• Similar patterns observed
• These indicate the presence of spamming policies used by spammers

Page 10: Case study on Yelp spamming

Overview

• Reveals two interesting spamming trends
  • Buffered spamming
  • Reduced spamming

2 types of restaurants:

• More successful and consequently in less need of spamming
• Need spamming to retain threshold popularity

Page 11: Case study on Yelp spamming

Dataset

• Truthful and fake (spam) reviews
• 70 popular restaurants
• Chicago
• 5 year time span: from the date of the first review to 5 years from the start
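The 5-year window described on this slide can be sketched as a simple date filter. This is a minimal illustration, not the authors' preprocessing: the function name `within_five_years` and the sample dates are made up for the example, and the window is assumed to be half-open (the fifth anniversary itself is excluded).

```python
from datetime import date

def within_five_years(review_dates):
    """Keep reviews dated within 5 years of the earliest review (assumed half-open window)."""
    if not review_dates:
        return []
    start = min(review_dates)
    try:
        end = start.replace(year=start.year + 5)
    except ValueError:
        # The first review fell on Feb 29 and the target year is not a leap year.
        end = start.replace(year=start.year + 5, day=28)
    return [d for d in review_dates if start <= d < end]

# Illustrative dates only, not taken from the dataset.
dates = [date(2006, 3, 1), date(2009, 7, 15), date(2011, 2, 28), date(2011, 3, 1)]
window = within_five_years(dates)
```

Here the last date falls exactly on the fifth anniversary of the first review, so it is left out of `window`.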

Page 12: Case study on Yelp spamming

Yelp as a reference Dataset

• Implements review filtering
• Maintained by dedicated anti-fraud team
• Research on Yelp’s filtering methods shows reliability

Page 13: Case study on Yelp spamming

Studies revealed that the majority (~75%) of spam is focused on promotion as opposed to demotion. Hence the paper focuses on promotion spamming.

Page 14: Case study on Yelp spamming

Dynamics of Spamming Policies

• 10 Modalities
• 3 Spamming policies

Page 15: Case study on Yelp spamming

Spamming Policies

Early
Wait for truthful reviews for the initial period of 5 months, then start spamming

Mid
Don’t exhibit spam injection until the 14th month

Late
Start promotion spamming only after the 30th month
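One way to read the three policies is by the month in which fake reviews first appear. The sketch below labels a restaurant from its monthly fake-review counts. It is a simple threshold rule assumed for illustration, not the procedure the paper used to discover the policies: the function names and the cut-offs (month 14 for mid, month 31 for late) are assumptions derived from the months quoted on this slide.

```python
def spam_onset_month(monthly_fake_counts):
    """Return the 1-based index of the first month with a fake review, or None."""
    for month, count in enumerate(monthly_fake_counts, start=1):
        if count > 0:
            return month
    return None

def label_policy(onset, mid_start=14, late_start=31):
    """Map an onset month to a policy label; the cut-offs are assumptions."""
    if onset is None:
        return "none"   # no fake reviews observed at all
    if onset >= late_start:
        return "late"   # spamming starts only after the 30th month
    if onset >= mid_start:
        return "mid"    # no spam injection until the 14th month
    return "early"      # spamming starts after a short truthful-only period

# Illustrative counts: five spam-free months, then fake reviews appear.
counts = [0, 0, 0, 0, 0, 2, 1]
policy = label_policy(spam_onset_month(counts))
```

With these illustrative counts the onset month is 6, so the restaurant is labeled `early`.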

Page 16: Case study on Yelp spamming
Page 17: Case study on Yelp spamming

Causal Modeling of Deceptive Ratings

• 3 dominant trends of spam injection
  • Early
  • Mid
  • Late
• Characterize based on
  • Truthful like ratings
  • Truthful dislike ratings
  • Truthful review count
• Time-series comparison of truthful and deceptive reviews
  • Buffered Spamming
  • Reduced Spamming

Rating dynamics of truthful reviews can potentially determine the future deception rates for each restaurant

Page 18: Case study on Yelp spamming

Buffered Spamming

How do restaurants deal with their waning popularity and growth of dislike ratings?

• Proactively inject deceptive reviews?
• Deceptive like/average ratings increase as truthful like/average ratings decrease

Page 19: Case study on Yelp spamming

Buffered Spamming

• A buffer action at work which adjusts the spamming rate by injecting deceptive like reviews

[Figure: truthful ratings vs. deceptive like ratings]
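The buffering pattern (deceptive like ratings rising as truthful like ratings fall) would show up as a negative correlation between the two weekly series. The sketch below computes a Pearson correlation by hand. It is only an illustrative check, not the causal analysis from the paper, and the two series are made-up numbers.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Made-up weekly counts: truthful likes fall while deceptive likes rise.
truthful_like = [12, 10, 9, 7, 5, 4]
deceptive_like = [1, 2, 2, 4, 5, 6]
r = pearson(truthful_like, deceptive_like)
```

A strongly negative `r` on real data would be consistent with buffering, though correlation alone does not establish that spam is injected in response to the truthful ratings.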

Page 20: Case study on Yelp spamming

Reduced Spamming

• Case: when a restaurant maintains decent popularity and rating
• Is there a reduction in the spam injection rate, as they have a better standing already?
• Shows a pattern where spam injection rates are reduced when the truthful reviews are in favor

[Figure: truthful ratings vs. deceptive like ratings]

Page 21: Case study on Yelp spamming

Predicting Dynamics of Deceptive Ratings

• Truthful ratings are harbingers
• Vector Auto Regression (VAR) model used to predict next week’s deceptive like rating
  • Lag of 1 week
  • Lag of 2 weeks
• Prediction for window sizes of
  • 10 weeks
  • 20 weeks
  • 30 weeks
• Spamming Policies
  • Buffered
  • Reduced
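To make the VAR idea concrete, here is a minimal lag-1 sketch in plain Python: each series at week t is regressed on an intercept and both series at week t-1, and the fitted coefficients give next week's values. It is an assumption-laden illustration rather than the paper's model: it handles only two series and a 1-week lag, the helper names are hypothetical, and the 10-week input is synthetic data generated from known coefficients so the fit can be checked. In practice a library implementation of VAR would normally be used.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: series may be constant or collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_var1(truthful, deceptive):
    """Least-squares fit of a lag-1 VAR; returns [intercept, a_truthful, a_deceptive] per series."""
    if len(truthful) != len(deceptive):
        raise ValueError("series must have equal length")
    # One regressor row per week t >= 1: intercept plus both series at week t-1.
    rows = [[1.0, truthful[t - 1], deceptive[t - 1]] for t in range(1, len(truthful))]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coefs = []
    for series in (truthful, deceptive):
        xty = [sum(r[i] * y for r, y in zip(rows, series[1:])) for i in range(3)]
        coefs.append(solve_linear(xtx, xty))  # normal equations for this series
    return coefs

def predict_next(coefs, truthful, deceptive):
    """One-step-ahead forecast [truthful_next, deceptive_next]."""
    last = [1.0, truthful[-1], deceptive[-1]]
    return [sum(c * v for c, v in zip(row, last)) for row in coefs]

# Synthetic 10-week window generated from known coefficients (illustrative only).
truthful, deceptive = [10.0], [1.0]
for _ in range(9):
    t, d = truthful[-1], deceptive[-1]
    truthful.append(1.0 + 0.5 * t + 0.1 * d)
    deceptive.append(2.0 - 0.3 * t + 0.4 * d)

coefs = fit_var1(truthful, deceptive)
next_truthful, next_deceptive = predict_next(coefs, truthful, deceptive)
```

Because the synthetic series contain no noise, the fit should recover the generating coefficients closely. Real weekly rating series are noisy, and a 2-week lag would add two more regressors per equation.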

Page 22: Case study on Yelp spamming

• Early spamming is harder to predict

• The error rate for buffered spamming is high since it uses a complicated algorithm

Page 23: Case study on Yelp spamming

• Early spamming is harder to predict

Imminent Truthful Popularity

Page 24: Case study on Yelp spamming

Predicting Truthful Popularity and Rating

• Do deceptive reviews affect a restaurant’s popularity and average ratings?
• Training
  • 10 weeks
• Prediction
  • 6 months
• Only truthful reviews used
• 4 feature families used

Page 25: Case study on Yelp spamming
Page 26: Case study on Yelp spamming

Prediction Results - Popularity

• Popularity refers to the total number of reviews in a time period

Mean Absolute Error
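The two quantities on this slide can be written down directly: popularity as a count of reviews in a period, and mean absolute error as the average absolute gap between actual and predicted values. The sketch below is illustrative only. The function names, the half-open period convention, and all numbers are assumptions, not values from the paper.

```python
from datetime import date

def popularity(review_dates, start, end):
    """Number of reviews dated in the half-open period [start, end)."""
    return sum(1 for d in review_dates if start <= d < end)

def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Made-up review dates and predictions, for illustration only.
reviews = [date(2010, 1, 5), date(2010, 1, 20), date(2010, 2, 3)]
january = popularity(reviews, date(2010, 1, 1), date(2010, 2, 1))

actual_popularity = [40, 55, 62]
predicted_popularity = [44, 50, 62]
mae = mean_absolute_error(actual_popularity, predicted_popularity)  # (4 + 5 + 0) / 3
```

A lower MAE means the predicted popularity (or rating) tracks the actual values more closely.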

Page 27: Case study on Yelp spamming

Prediction Results - Rating

• Model performs better as the features Opinion Lexicon, N-Grams and Aspect Sentiment Lexicon are added, in both popularity and rating prediction
• Natural language signals are helpful

Mean Absolute Error

Page 28: Case study on Yelp spamming

How reliable are Yelp’s filtered reviews?

• Significant increase in mean absolute error (MAE) upon adding reviews filtered by Yelp, across all policies
• Reviews filtered by Yelp
  • Imparted noise
  • Harmful to popularity/rating prediction
  • Not representative of truthful experiences
• Yelp’s filter, although it may not be perfect, is reasonably reliable

Page 29: Case study on Yelp spamming

Strengths and Weaknesses

Strength
• New approach
• Good results
• Used large set of reviews
• Good features selected

Weakness
• Lack of term explanation
• Graph less explained

Page 30: Case study on Yelp spamming

Open Issues

• Applicable to other popular review websites?
• Demographic impact?
• Deeper into NLP

Page 31: Case study on Yelp spamming

My thoughts and Conclusion

My thoughts
• In depth analysis of temporal dynamics of opinion spamming
• Used to check the influence of deceptive ratings on true ratings
• Validate Yelp’s filtering process

Conclusion
• Time series analysis
• Deceptive and true ratings well correlated
• 3 dominant spamming policies
  • Early
  • Mid
  • Late
• 2 spamming policies
  • Reduced
  • Buffered

Page 32: Case study on Yelp spamming

Thank You!

Questions?