detecting fake engagement on instagram


Upload: precog

Post on 22-Jan-2018

TRANSCRIPT

Page 1: Detecting Fake Engagement on Instagram

Detecting Fake Engagement on Instagram

Indira Sen

linkedin/in/indira-sen-8a6068140

@drealcharbar fb.com/indira.sen.31

Dr. Ponnurangam Kumaraguru (Chair)

1

Page 2: Detecting Fake Engagement on Instagram

Thesis Committee

- Dr. Anwitaman Datta, NTU Singapore

- Mr. Nitendra Rajput, InfoEdge

- Dr. Ponnurangam Kumaraguru, IIIT Delhi (Chair)

2

Page 3: Detecting Fake Engagement on Instagram

Likes on Instagram

3,363 likes

3

Page 4: Detecting Fake Engagement on Instagram

Likes on Instagram

1,008 likes

4

Page 5: Detecting Fake Engagement on Instagram

Why is Engagement Important on Instagram?

5

Page 6: Detecting Fake Engagement on Instagram

Why Fake Likes?

- ‘Influencers’ are compensated based on engagement: likes and comments

- This creates an incentive to artificially inflate engagement metrics through purchased likes, like markets, or like-back networks

- Inflated like counts fool potential brands or advertisers into hiring ‘unworthy’ influencers

6

Page 7: Detecting Fake Engagement on Instagram

Motivation

- Influencer marketing: $1B industry

- Fake influencers landed deals over $500

Page 8: Detecting Fake Engagement on Instagram

Core Thesis Question

- How do we automatically detect fraudulent likes on Instagram?

Organic likes: likers who engage with content; genuine reach

Inorganic likes: likers bought from marketplaces; artificial reach

- Understanding properties of genuine liking behaviour B : {b1, b2, …, bn}
- Reducing the effect of likes which do not match B

8

Page 9: Detecting Fake Engagement on Instagram

Thesis Outline

- Research Aim
- Data Collection
- Analysis of Fake Likes
- Machine Learning Classifier to Detect Fake Likes
- Estimating Reach of Users
- Conclusion

9

Page 10: Detecting Fake Engagement on Instagram

What is a Like Instance?

- Given a poster S whose post p has been liked by liker L, we define a like instance as the tuple (L, p, S)
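The tuple (L, p, S) maps naturally onto a small record type. A minimal sketch, assuming plain string identifiers; the field names and the example values are hypothetical and not taken from the thesis:

```python
from typing import NamedTuple


class LikeInstance(NamedTuple):
    """One like, following the slide's tuple (L, p, S)."""
    liker: str   # L: account that liked the post
    post: str    # p: identifier of the liked post
    poster: str  # S: account that published the post


# Hypothetical identifiers, for illustration only.
like = LikeInstance(liker="user_a", post="post_123", poster="user_b")
liker, post, poster = like  # unpacks in the same (L, p, S) order
```

Keeping the poster in the tuple, even though it is implied by the post, makes poster-level features easy to attach to each like.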

10

Page 11: Detecting Fake Engagement on Instagram

Research Aim

- Find the features of liker L, post p and poster S that determine the probability of liker L genuinely liking that particular post p.

- Identify true reach of poster by determining fake likes received on the posted content.

11

Page 12: Detecting Fake Engagement on Instagram

Possible Reasons for Genuine Liking

Homepage: followees’ posts

Explore: Instagram’s recommendations

Likes of followees

12

Page 13: Detecting Fake Engagement on Instagram

Possible Reasons for Genuine Liking

Based on photos you liked

Based on people you follow

Similar to accounts you interact with

Explore

13

Page 14: Detecting Fake Engagement on Instagram

Possible Reasons For Genuine Liking

- Poster is a followee

- Poster is a followee of a followee

- Topical interests in common

14

Page 15: Detecting Fake Engagement on Instagram

How to get Fake Likes

- Marketplaces

- Like Back collusion networks

- Link Farming hashtags

- Bots

Page 16: Detecting Fake Engagement on Instagram

Architecture Diagram

1) Liker meta and last 18 posts
2) Poster meta and last 18 posts
3) Post meta

[Figure: architecture diagram. Labels: Fake Likes, Other Likes, Features, Training Data, Machine Learning Model, Random unknown Likes, Features, Fake, Not Fake, α, 1 - α]

Page 17: Detecting Fake Engagement on Instagram

Data Collection: Fake Likes

[Figure: data collection pipeline. Fake side labels: Purchased Fake Likes; Honeypot; victim?; Fake Likes 1: likes given by honeypot victims; Likes on videos with views = 0; Fake Likes 2. Other side labels: Instagram Featured users; snowball sample to 1M; random sample of 500; honeypot victim? (not); Other Likes]

Page 18: Detecting Fake Engagement on Instagram

Data Collection: Fake Likes

- Honeypots to trap fake likers bought through a service

- If a user falls for the honeypot, we monitor their liking behaviour

Honeypot

18

Page 19: Detecting Fake Engagement on Instagram

Data Collection: Fake Likes

[Figure: data collection pipeline, same diagram as on Page 17]

Page 20: Detecting Fake Engagement on Instagram

Data Collection: Other Likes

[Figure: data collection pipeline, same diagram as on Page 17]

Page 21: Detecting Fake Engagement on Instagram

Data Collection: Other Likes

- Randomly sample 500 users from 1M users who are not honeypot victims

         #Likes   #Posts   #Likers   #Posters
Fake     10,417    8,408       500      7,715
Other    11,810   11,644       500      7,631

21

Page 22: Detecting Fake Engagement on Instagram

Thesis Outline

- Research Aim
- Data Collection
- Analysis of Fake Likes
- Machine Learning Classifier to Detect Fake Likes
- Estimating Reach of Users
- Conclusion

22

Page 23: Detecting Fake Engagement on Instagram

Understanding Fake Likes

- Hypotheses indicative of fake liking behaviour

- Validate with 2 sample KS test

- Network effect:
  - Liker is follower of poster
  - Liker is follower of follower of poster
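The two-sample KS test compares the empirical distributions of one feature across the fake and other like sets. A minimal pure-Python sketch of the test statistic only (no p-value); the function name and sample values are made up for illustration, and in practice a library routine such as `scipy.stats.ks_2samp` would normally be used:

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # Both empirical CDFs are step functions that only change at data points,
    # so checking every observed value is enough to find the largest gap.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical feature values (not thesis data): fully separated samples.
fake_values = [0.0, 0.0, 0.1, 0.2]
other_values = [0.3, 0.5, 0.6, 0.9]
statistic = ks_statistic(fake_values, other_values)  # 1.0 for these samples
```

A statistic near 1 means the feature separates the two sets almost completely; a value near 0 means the distributions are nearly identical.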

23

Page 24: Detecting Fake Engagement on Instagram

Liker is Follower of Poster

- Green edges: liker relationship

- Red edges: liker - follower relationship

- Other likes have a higher proportion of follower-likers
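The two network hypotheses can be checked directly on a follow graph. A minimal sketch, assuming the graph is available as a dict mapping each account to the set of accounts it follows; the `follows` structure and account names are hypothetical:

```python
def is_follower(follows, liker, poster):
    """True if the liker follows the poster directly."""
    return poster in follows.get(liker, set())


def is_follower_of_follower(follows, liker, poster):
    """True if the liker follows some account that itself follows the poster."""
    return any(
        poster in follows.get(middle, set())
        for middle in follows.get(liker, set())
    )


# Toy graph, for illustration only: a -> b -> s, and c follows nobody.
follows = {"a": {"b"}, "b": {"s"}, "c": set()}

direct = is_follower(follows, "b", "s")                # b follows s
two_hop = is_follower_of_follower(follows, "a", "s")   # a follows b, b follows s
unrelated = is_follower_of_follower(follows, "c", "s")  # c has no path to s
```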

[Figure: two network graphs, one for Other Likes and one for Fake Likes]

Page 25: Detecting Fake Engagement on Instagram

Network Effects

- 90% of fake like instances have only 0.25 of followee likes

[Figure: plot annotated at 90% and 56%]

Page 26: Detecting Fake Engagement on Instagram

Interest Overlap

- A user will like a post if she shares topical interests with the post

- Affinity: the lower the affinity, the higher the overlap

26

Page 27: Detecting Fake Engagement on Instagram

Extracting Topics

- Bio, post text and post image
- Wikification and Densecap for images

27

Page 28: Detecting Fake Engagement on Instagram

Extracting Topics

- Bio, post text and post image
- Wikification and Densecap for images

[Figure: example post with image topics and post caption topics]

Page 29: Detecting Fake Engagement on Instagram

Interest Overlap

- A user will like a post if she shares topical interests with the post

- Affinity

- Non-commutative

Page 30: Detecting Fake Engagement on Instagram

Affinity

- Affinity outperforms Jaccard distance in terms of discernibility

- Post image topics are strong indicators of genuine liking
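For reference, the Jaccard distance used as the baseline can be sketched as follows. The affinity formula itself is not reproduced in this transcript, so only the baseline is shown; the topic sets are hypothetical:

```python
def jaccard_distance(topics_a, topics_b):
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical sets, 1.0 means disjoint."""
    a, b = set(topics_a), set(topics_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here: two empty sets count as identical
    return 1.0 - len(a & b) / len(union)


# Hypothetical topic sets, for illustration only.
liker_topics = {"food", "travel", "pets"}
post_topics = {"food", "fashion"}
distance = jaccard_distance(liker_topics, post_topics)  # 1 shared of 4 total -> 0.75
```

Jaccard distance is symmetric and only counts exact topic matches, which is one way to read why a non-commutative, semantics-aware affinity measure could discriminate better.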

30

Page 31: Detecting Fake Engagement on Instagram

- Our metric is able to capture semantic relationships between entities, unlike traditional distance metrics

- 90% of other likes have an average affinity of 0.5

- 90% of fake likes have an average affinity of 0.74

31

Page 32: Detecting Fake Engagement on Instagram

Other Features

- Celebrities tend to get more likes (engagement)
- Genuine likers will keep coming back: repeated likers
- Link Farming hashtags: #like4like, #l4l, #like2follow
- Topical hashtags
- Posting activity of liker (Badri et al, CIKM’16) and poster
- Profile picture of liker: egghead profiles (cheap to create)
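The link-farming hashtag feature could be computed from a post caption along these lines. A minimal sketch: the tag set comes from the slide, while the function names and the regular expression are assumptions:

```python
import re

# Link-farming hashtags listed on the slide (stored without the leading '#').
LINK_FARMING_TAGS = {"like4like", "l4l", "like2follow"}


def hashtags(caption):
    """Lower-cased hashtags found in a caption, without the '#'."""
    return {tag.lower() for tag in re.findall(r"#(\w+)", caption)}


def has_link_farming_tag(caption):
    """True if the caption carries at least one link-farming hashtag."""
    return bool(hashtags(caption) & LINK_FARMING_TAGS)


# Hypothetical captions, for illustration only.
flagged = has_link_farming_tag("Sunset at the beach #L4L #beach")
clean = has_link_farming_tag("Sunset at the beach #beach")
```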

32

Page 33: Detecting Fake Engagement on Instagram

Automatic Detection of Fake Likes

- Using features described and a set of ML classifiers

- Fake likes : Other likes ratio → 1:2

- SVM RBF kernel gives best performance
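For readers unfamiliar with the kernel named above, the RBF kernel scores the similarity of two feature vectors as exp(-gamma * ||x - y||^2). A minimal sketch; the `gamma` value and the vectors are illustrative assumptions, not the thesis settings:

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Identical vectors give the maximum similarity of 1.0;
# similarity decays towards 0 as the vectors move apart.
same = rbf_kernel((0.0, 0.0), (0.0, 0.0))
near = rbf_kernel((1.0, 0.0), (0.0, 0.0))
far = rbf_kernel((3.0, 4.0), (0.0, 0.0))
```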

33

Page 34: Detecting Fake Engagement on Instagram

Classification Model

- Performance

- Manually look at 100 false negatives and find that 70 of them had high topical overlap

- Liker interest set was small: affinity metric limitation

         Precision   Recall   F1-score
0        0.93        0.96     0.945
1        0.895       0.825    0.86
total    0.92        0.925    0.92
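The per-class F1-scores in the table follow from the precision and recall columns through the harmonic mean, F1 = 2PR / (P + R). A short check using the reported values (the function name is illustrative):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_class_0 = round(f1_score(0.93, 0.96), 3)    # 0.945, matches the table
f1_class_1 = round(f1_score(0.895, 0.825), 2)  # 0.86, matches the table
print(f1_class_0, f1_class_1)
```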

34

Page 35: Detecting Fake Engagement on Instagram

In the Wild Experiment

- 134,669 random like instances

- Categorize posts into: food, fashion, outdoors, merchandise, people, gadgets, pets, captioned

- We find 8,557 fake likes

- Manually analyze 100 of these and find 78 to be fake
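The figures on this slide reduce to two simple ratios, reading the sample size as 134,669 like instances. A small worked calculation (variable names are illustrative):

```python
total_like_instances = 134_669  # like instances in the in-the-wild sample
flagged_fake = 8_557            # instances the classifier labelled fake
checked, confirmed = 100, 78    # manual check of flagged instances

flagged_share = flagged_fake / total_like_instances
sample_precision = confirmed / checked

print(f"{flagged_share:.2%} flagged as fake")              # 6.35% flagged as fake
print(f"{sample_precision:.0%} confirmed in manual check")  # 78% confirmed in manual check
```

The 78% figure is an estimate from a 100-instance manual check, so it carries sampling uncertainty.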

Page 36: Detecting Fake Engagement on Instagram

Thesis Outline

- Research Aim
- Data Collection
- Analysis of Fake Likes
- Machine Learning Classifier to Detect Fake Likes
- Estimating Reach of Users
- Conclusion

36

Page 37: Detecting Fake Engagement on Instagram

Reach Estimation

- Enable advertisers to make better decisions
- Reduce the effect of fake likes a poster may have received
- Measure deviation in reach
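The transcript does not give the formula for deviation in reach. One plausible reading, stated here purely as an assumption, is the share of a poster's likes that the classifier flags as fake; the function name and example predictions are hypothetical:

```python
def reach_deviation(like_is_fake):
    """Share of a poster's likes flagged as fake (an ASSUMED definition)."""
    total = len(like_is_fake)
    if total == 0:
        return 0.0
    return sum(like_is_fake) / total


# Hypothetical classifier outputs for one poster's likes (True = flagged fake).
predictions = [True, False, False, True]
deviation = reach_deviation(predictions)                # 0.5 under this definition
adjusted_likes = len(predictions) - sum(predictions)    # 2 likes remain after removing flagged ones
```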

37

Page 38: Detecting Fake Engagement on Instagram

Who receives fake likes?

- Users posting about merchandise, outdoors (including travel posts) and people (posts containing faces) have highest deviation from the projected reach.

38

Page 39: Detecting Fake Engagement on Instagram

Who receives fake likes?

- Categories with the highest deviation: merchandise, outdoors (including travel posts) and people

- Most posters do not have high deviation, while some users have very high deviation

Page 40: Detecting Fake Engagement on Instagram

Do Popular Users have more Fake Likes?

- No: users with lower follower counts, who may be trying to gain a following, have higher deviation

40

‘Micro Influencers’ have higher deviation

Page 41: Detecting Fake Engagement on Instagram

Conclusion

- Automated method to detect fake like instances

- Performs well to identify unseen fake likes on Instagram.

- Find true reach of a user

- Helps advertisers and brands identify users with genuine, meaningful reach

41

Page 42: Detecting Fake Engagement on Instagram

Challenges, Limitations and Future Work

- Availability of labeled data, approximations using honeypot

- Data collection constraints, integrate network features

- Improve affinity, improve precision (dynamic features)

- Fine-grained topical recommendations for brands and advertisers

Page 43: Detecting Fake Engagement on Instagram

Acknowledgement

- Anupama Aggarwal, PhD Scholar, IIIT Delhi
- Committee members
- Srishti Gupta, Divyansh Agarwal, Neha Jawalkar, Sonu Gupta, Kushagra Bhargava
- Siddharth Singh, Shiven Mian
- Members of Precog
- Family and friends

43

Page 45: Detecting Fake Engagement on Instagram

Thanks! Any questions?

You can find me at: [email protected]