Detection of Spam Tipping Behaviour on Foursquare

Anupama Aggarwal, Prof P. Kumaraguru “PK”, Prof J. Almeida*

Indraprastha Institute of Information Technology (IIIT-Delhi, India); * Universidade Federal de Minas Gerais (UFMG, Brazil)

Upload: precog

Post on 17-Dec-2014

Category:

Technology


DESCRIPTION

In this paper, we present what is, to our knowledge, the first effort to identify and analyze different patterns of tip spamming activity on Foursquare, with the goal of developing automatic tools to detect users who post spam tips (tip spammers). A manual investigation of a real dataset collected from Foursquare led us to identify four categories of spamming behavior: Advertising/Spam, Self-promotion, Abusive and Malicious. We then applied machine learning techniques to a selected set of user, social and tip content features associated with each user to develop automatic detection tools. Our experimental results indicate that we can not only distinguish legitimate users from tip spammers with high accuracy (89.76%) but also correctly identify a large fraction (at least 78.88%) of spammers in each identified category.

TRANSCRIPT

Page 1: Detection of Spam Tipping Behaviour on Foursquare

Detection of Spam Tipping Behaviour on Foursquare

Anupama Aggarwal¶, Prof P. Kumaraguru “PK”¶, Prof J. Almeida*

¶ Indraprastha Institute of Information Technology (IIIT-Delhi, India)

* Universidade Federal de Minas Gerais (UFMG, Brazil)

Page 2: Detection of Spam Tipping Behaviour on Foursquare

Foursquare 101

‣ Location Based Social Network

‣ 33 Million Users *

‣ 3.5 Billion check-ins *

‣ 31% of mobile social media users use Foursquare *

* As of January 2013

Page 3: Detection of Spam Tipping Behaviour on Foursquare

Foursquare 101

[Screenshot of the Foursquare app with labels: Your Last Check-in, Venue, Friends Activity, Friends Suggestions, Venue Suggestions]

Tip: Suggested Activity for a Venue

Tip can be Liked or Saved

Location Sharing OSN

Page 4: Detection of Spam Tipping Behaviour on Foursquare

Spam Tips

‣ Tips unrelated to Venue

Advertising / Marketing

Scam / Phishing

Page 5: Detection of Spam Tipping Behaviour on Foursquare

Spam according to Foursquare ToS

‣ Tips with links to websites selling software, realtor contact info, a listing for your business, or other promotion

‣ Tips with inappropriate language or negativity directed at another person

‣ Unauthorized or unsolicited advertising, junk

Page 6: Detection of Spam Tipping Behaviour on Foursquare

Contributions

‣ Characterizing irregular user behaviour

  ‣ We observed different categories of spam users

  ‣ We characterize features distinguishing these spam users

‣ Automatic detection of spammers

  ‣ Distinguish between spam and legitimate Foursquare users

  ‣ Cluster spam users into different categories according to their behaviour

Page 7: Detection of Spam Tipping Behaviour on Foursquare

Data Crawling

2,400,594 tips

613,298 users

Page 8: Detection of Spam Tipping Behaviour on Foursquare

Observed Categories of Spam Users

‣ Marketing: These users post tips to promote and advertise a specific product / brand / venue / external URL

‣ Malicious: These users post external URLs in Tips that lead to spam / phishing / malware websites

‣ Abusive / Derogatory: These users try to defame or bad-mouth another person

‣ Self Promotion: These users try to draw attention to themselves

Page 9: Detection of Spam Tipping Behaviour on Foursquare

Ground Truth Data

Annotation Portal

2,000 Legitimate users

1,900 Spammers

Page 10: Detection of Spam Tipping Behaviour on Foursquare

Features used to detect Spammers

‣ User Attributes

  ‣ Properties of the Foursquare user's profile and check-ins

‣ Social Attributes

  ‣ Friends network of the Foursquare user under inspection

‣ Content Attributes

  ‣ Details about Tips posted by the Foursquare user

Page 11: Detection of Spam Tipping Behaviour on Foursquare

Features used

Category            χ2 rank  Feature
User Attributes     1        Number of Tips
User Attributes     3        Ratio of Check-ins and Tips
User Attributes     4        Number of Check-ins
User Attributes     5        Number of Badges
User Attributes     11       Number of Mayorships
User Attributes     12       Ratio of Check-ins and Badges
User Attributes     15       Number of Photos posted
Social Attributes   6        Number of Friends
Content Attributes  2        Similarity score of Tips
Content Attributes  7        Number of URLs posted
Content Attributes  8        Average number of words in Tips
Content Attributes  9        Average number of characters in Tips
Content Attributes  10       Ratio of number of likes and number of Tips
Content Attributes  13       Average number of spam words in Tips
Content Attributes  14       Average number of phone-numbers posted in Tips
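
The χ2 rank orders features by how strongly each one is associated with the spammer / legitimate label. The slides do not say how the statistic was computed, so the following is only a minimal sketch under an assumption: each feature is reduced to a yes/no flag and scored with the Pearson χ2 statistic of its 2x2 contingency table against the label. The feature names and the toy data are hypothetical.

```python
def chi_squared_2x2(feature_flags, labels):
    """Pearson chi-squared statistic for a binary feature against a binary label."""
    # observed[f][y] counts users with feature flag f and label y.
    observed = [[0, 0], [0, 0]]
    for flag, label in zip(feature_flags, labels):
        observed[int(flag)][int(label)] += 1
    total = sum(sum(row) for row in observed)
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def rank_features(features, labels):
    """Return feature names sorted from most to least label-dependent."""
    scores = {name: chi_squared_2x2(flags, labels) for name, flags in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data (not from the slides): 1 = spammer, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "many_tips": [1, 1, 1, 1, 0, 0, 0, 0],   # perfectly aligned with the label
    "posts_urls": [1, 1, 1, 0, 1, 0, 0, 0],  # partially aligned
    "has_photo": [1, 0, 1, 0, 1, 0, 1, 0],   # independent of the label
}
ranking = rank_features(features, labels)
```

A higher statistic means the feature's distribution differs more between the two classes, which is why it would be placed nearer the top of a ranking like the one above.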

Page 12: Detection of Spam Tipping Behaviour on Foursquare

Few Observations

‣ Spammers post same/similar Tips on multiple venues

‣ A large fraction of spam Tips contain URLs

‣ Spam Tips may also have phone numbers

‣ Legitimate users have more Friends

‣ Spammers have very few Friends but large number of Tips
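
The content-related observations above can be turned into per-user numbers. The sketch below is illustrative only: the slides do not define the similarity score, so mean pairwise Jaccard similarity over words is assumed here, and both regular expressions are loose, hypothetical patterns rather than the ones used in the study.

```python
import re
from itertools import combinations

URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
# Loose, hypothetical phone pattern: a digit, at least five more digits or
# separators (space, dot, dash, parentheses), then a closing digit.
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{5,}\d(?!\w)")


def jaccard(a, b):
    """Word-level Jaccard similarity between two tips (assumed measure)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def content_features(tips):
    """Compute a few content attributes for one user's list of tip texts."""
    n = len(tips)
    pairs = list(combinations(tips, 2))
    similarity = sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0
    return {
        "similarity": similarity,
        "num_urls": sum(len(URL_RE.findall(t)) for t in tips),
        "avg_phone_numbers": sum(len(PHONE_RE.findall(t)) for t in tips) / n if n else 0.0,
        "avg_words": sum(len(t.split()) for t in tips) / n if n else 0.0,
    }


# Hypothetical tips, invented for illustration.
spam_tips = [
    "Cheap loans call 555-123-4567 http://example.com/loans",
    "Cheap loans call 555-123-4567 http://example.com/loans",
]
legit_tips = ["Try the masala dosa", "Great coffee and free wifi"]

spam_features = content_features(spam_tips)
legit_features = content_features(legit_tips)
```

In this toy case the repeated spam tip yields a similarity of 1.0 together with non-zero URL and phone counts, while the two unrelated legitimate tips score 0.0 on all three.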

Page 13: Detection of Spam Tipping Behaviour on Foursquare

Relation between Tips and Check-ins

[Plot of Check-ins against Tips, with an annotated region: Irregular User Behaviour]

Page 14: Detection of Spam Tipping Behaviour on Foursquare

Tips Distribution

[Plots: distribution of Tips for Legitimate users and for Spammers]

Page 15: Detection of Spam Tipping Behaviour on Foursquare

Classification Results

Algorithm      Precision (Spam)  Precision (Safe)  Recall (Spam)  Recall (Safe)  Accuracy
KNN            83.2%             86.6%             86.3%          83.5%          84.89%
Decision Tree  88.1%             89.2%             88.3%          85.8%          89.53%
Random Forest  89.3%             90.2%             88.3%          90.3%          89.76%
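
For readers unfamiliar with the column headings, the sketch below shows how per-class precision and recall and overall accuracy follow from a binary confusion matrix, taking "spam" as the positive class. The counts are hypothetical and are not taken from the slides; the evaluation protocol behind the reported numbers is not described here.

```python
def binary_metrics(tp, fp, fn, tn):
    """Metrics with 'spam' as the positive class and 'safe' as the negative class.

    tp: spammers predicted as spam      fp: safe users predicted as spam
    fn: spammers predicted as safe      tn: safe users predicted as safe
    """
    return {
        "precision_spam": tp / (tp + fp),
        "recall_spam": tp / (tp + fn),
        "precision_safe": tn / (tn + fn),
        "recall_safe": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for 100 spammers and 100 safe users.
metrics = binary_metrics(tp=80, fp=10, fn=20, tn=90)
```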

Page 16: Detection of Spam Tipping Behaviour on Foursquare

Detection of Spam Classes

‣ Expectation-Maximization (EM) clustering

‣ Spammers Categories -

  ‣ Advertising / Marketing

  ‣ Self Promotion

  ‣ Abusive

  ‣ Malicious
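
The slides name EM clustering but not the model or tool used. As a rough illustration of the idea only, the sketch below fits a one-dimensional Gaussian mixture with EM, which is an assumed simplification: real user features are multi-dimensional, and the function names, initialisation and toy data are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, k, iterations=50, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM; return parameters and hard labels."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic initialisation: means spread across the sorted data.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    resp = []

    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj == 0:
                continue
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, min_var)  # floor keeps the density finite
            weights[j] = nj / n

    # Hard assignment: the component with the highest responsibility.
    assignments = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, weights, assignments


# Hypothetical feature values forming two well-separated groups.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 10.0, 10.3, 9.7, 10.1, 9.9]
means, variances, weights, assignments = em_gmm_1d(data, k=2)
```

Unlike k-means, EM keeps soft responsibilities during fitting, so a user who sits between two behaviour patterns contributes partially to both components until the final hard assignment.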

Page 17: Detection of Spam Tipping Behaviour on Foursquare

Detection of Spam Classes

‣ Clustering Accuracy for spammer categories -

Advertising 88.23%

Self-Promotion 87.23%

Abusive 78.88%

Malicious 0%
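
How the per-category clustering accuracy was measured is not spelled out in the slides. One possible reading, sketched below as an assumption, is to name each cluster after its most common ground-truth category and then report, for each category, the fraction of its users that fall in a cluster carrying its name. Under this reading a category that never dominates any cluster scores 0%. The labels and cluster ids are hypothetical.

```python
from collections import Counter, defaultdict


def per_category_accuracy(true_labels, cluster_ids):
    """Fraction of each category's users that land in a cluster named after it."""
    members = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster][label] += 1
    # Each cluster is named after its most common true category.
    cluster_name = {c: counts.most_common(1)[0][0] for c, counts in members.items()}
    totals = Counter(true_labels)
    correct = Counter()
    for label, cluster in zip(true_labels, cluster_ids):
        if cluster_name[cluster] == label:
            correct[label] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical ground truth and cluster assignments, invented for illustration.
true_labels = ["Advertising"] * 4 + ["Self-Promotion"] * 3 + ["Malicious"]
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 0]
accuracy = per_category_accuracy(true_labels, cluster_ids)
```

In the toy data the single Malicious user is absorbed into the Advertising-dominated cluster, so that category's accuracy is 0.0 even though the other two are recovered well.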

Page 18: Detection of Spam Tipping Behaviour on Foursquare

Conclusion

‣ Analyzed spammers' behaviour on Foursquare

‣ We obtained an accuracy of 89.76% with a Random Forest classifier when distinguishing spammers from legitimate users

‣ We classified the spammers into four broad categories

‣ We were able to detect users belonging to the Advertising, Self-promotion and Abusive categories with accuracies of 88.23%, 87.23% and 78.88% respectively

Page 19: Detection of Spam Tipping Behaviour on Foursquare

Future Work

‣ Refine our methodology by use of other classification algorithms

‣ Use multiclass classification to detect users in any of the spam categories

‣ Correlation of content and the URLs posted by different users can help us in identifying several spam campaigns on Foursquare

Page 20: Detection of Spam Tipping Behaviour on Foursquare

Thank You!

Questions ?

Page 21: Detection of Spam Tipping Behaviour on Foursquare

For any further information, please write [email protected]

precog.iiitd.edu.in