

Link Farming in Twitter

Pawan Goyal

CSE, IITKGP

July 31, 2014

Pawan Goyal (IIT Kharagpur) Link Farming in Twitter July 31, 2014 1 / 17


Reference

Saptarshi Ghosh, Bimal Viswanath, Farshad Kooti, Naveen Kumar Sharma, Korlam Gautam, Fabricio Benevenuto, Niloy Ganguly, and Krishna P. Gummadi. 2012. Understanding and Combating Link Farming in the Twitter Social Network. Proceedings of the 21st International World Wide Web Conference (WWW), Lyon, France.


Link Farming

Link Farming in Web
Websites exchange reciprocal links with other sites to improve their ranking by search engines

Why is link farming an issue?
Search engines rank websites / webpages based on graph metrics such as Pagerank / HITS

High in-degree helps to get a high Pagerank

A form of spam
Heavily penalized by search engines


Link Farming in Twitter

Twitter as a Web within the Web
Vast amount of information and real-time news

Twitter search is becoming more and more common

Search engines rank users by follower-rank or Pagerank to decide whose tweets to return as search results

$I_j = \mu \cdot \sum_{\forall k \neq j} I_k \cdot \tilde{M}_{j,k} + \frac{1-\mu}{|S|}$

High indegree (no. of followers) is seen as a metric of influence

Link Farming in Twitter
Spammers follow other users and attempt to get them to follow back
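
The ranking formula on this slide can be iterated directly. The sketch below is only an illustration under assumptions that are not stated in the lecture: I_j is read as the influence score of user j, µ as a damping factor, M̃_{j,k} as 1/outdegree(k) when k follows j (0 otherwise), and |S| as the number of users. The function name and the toy graph are hypothetical. It compares a spammer's score with and without one follow-back.

```python
def influence_scores(follows, mu=0.85, iterations=50):
    """Iterate I_j = mu * sum_{k != j} I_k * M[j][k] + (1 - mu) / |S|.

    follows maps each user to the set of users they follow (toy input).
    M[j][k] is assumed to be 1/outdegree(k) if k follows j, else 0.
    """
    users = sorted(set(follows) | {v for vs in follows.values() for v in vs})
    n = len(users)
    scores = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new = {u: (1.0 - mu) / n for u in users}
        for k in users:
            out = follows.get(k, set()) - {k}  # ignore self-follows (k != j)
            if not out:
                continue  # users who follow nobody pass on no score in this sketch
            share = mu * scores[k] / len(out)
            for j in out:
                new[j] += share
        scores = new
    return scores


# Hypothetical toy graph: spammer "s" follows three ordinary users.
no_follow_back = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "s": {"a", "b", "c"}}
# Same graph, but "a" reciprocates the spammer's follow.
with_follow_back = {"a": {"b", "s"}, "b": {"c"}, "c": {"a"}, "s": {"a", "b", "c"}}

plain = influence_scores(no_follow_back)
farmed = influence_scores(with_follow_back)
print(plain["s"], farmed["s"])
```

With no followers the spammer only receives the constant (1 − µ)/|S| term; a single reciprocated link already lifts the score above that floor, which is the incentive behind link farming.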


Link farming in Web and Twitter

Motivation is similar
Higher indegree will give better ranks in search results

Who engages in link farming?
Web - spammers

Twitter - spammers, and also many legitimate, popular users

Additional factors in Twitter
‘Following back’ is considered a social etiquette


Link farmers in Twitter

Idea: start with spammers
Study how spammers acquire social links

Large amounts of spam in Twitter
Spam-URLs get much higher clickthrough rates than spam-URLs in email

Spammers are successfully acquiring social links and social influence


Dataset

Twitter dataset collected at MPI-SWS, Germany
Complete snapshot of Twitter as of August 2009

54 million users, 1.9 billion social links

Identifying spammers
Attempt to crawl each user’s profile page - if the user is suspended, the crawl leads to http://twitter.com/suspended

379,340 accounts suspended during Aug 2009 - Feb 2011

Suspension - either due to spam activity or long inactivity

41,352 suspended accounts posted at least one blacklisted URL shortened by bit.ly or tinyurl; these are the spammers studied
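
The filtering step above can be sketched as follows. This is a hedged illustration, not the study's actual pipeline: the input structures (a set of suspended account ids, a mapping from account id to posted URLs, and a set of blacklisted short URLs) and all names are assumptions made for the example.

```python
from urllib.parse import urlparse

# Host names of the two shorteners mentioned on the slide.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com"}


def is_shortened(url):
    """True if the URL points at one of the shortening services."""
    return urlparse(url).netloc.lower() in SHORTENER_HOSTS


def identify_spammers(suspended, posted_urls, blacklist):
    """Keep suspended accounts that posted at least one blacklisted short URL.

    suspended:   set of account ids known to be suspended (hypothetical input)
    posted_urls: dict mapping account id -> list of URLs the account posted
    blacklist:   set of short URLs assumed to be already flagged as spam
    """
    spammers = set()
    for account in suspended:
        for url in posted_urls.get(account, []):
            if is_shortened(url) and url in blacklist:
                spammers.add(account)
                break  # one blacklisted URL is enough
    return spammers


# Toy data: account 4 posted a bad URL but was never suspended.
suspended = {1, 2, 3}
posted_urls = {
    1: ["http://bit.ly/bad1"],
    2: ["http://example.com/x"],
    3: ["http://tinyurl.com/ok"],
    4: ["http://bit.ly/bad1"],
}
blacklist = {"http://bit.ly/bad1"}
spammers = identify_spammers(suspended, posted_urls, blacklist)
print(spammers)
```

Requiring both conditions (suspension and a blacklisted URL) excludes accounts that were suspended only for long inactivity.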


Terminology for spammers’ links

Spam-targets: users followed by spammers

Spam-followers: users who follow spammers

Targeted followers: spam-target as well as spam-follower

Non-targeted followers: follow spammers without being targeted
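
These four definitions map onto simple set operations. A minimal sketch, assuming the follow graph is given as (follower, followee) pairs and the spammer set is known; the function name and the toy edges are hypothetical.

```python
def classify_links(follow_edges, spammers):
    """Derive the four user groups from (follower, followee) pairs."""
    spam_targets = set()    # users followed by at least one spammer
    spam_followers = set()  # users who follow at least one spammer
    for follower, followee in follow_edges:
        if follower in spammers:
            spam_targets.add(followee)
        if followee in spammers:
            spam_followers.add(follower)
    return {
        "spam_targets": spam_targets,
        "spam_followers": spam_followers,
        # followed by a spammer AND following a spammer
        "targeted_followers": spam_targets & spam_followers,
        # following a spammer without having been followed by one
        "non_targeted_followers": spam_followers - spam_targets,
    }


# Toy graph with a single spammer "s".
edges = [("s", "a"), ("s", "b"), ("a", "s"), ("c", "s"), ("a", "b")]
groups = classify_links(edges, spammers={"s"})
print(groups)
```

Here "a" is a targeted follower (followed by "s" and following back), while "c" follows the spammer without having been targeted.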


Link farming by spammers

Spammers farm links at large scale
Over 13 million users (27% of total) targeted by 41,352 spammers (0.08% of total)

1.3 million spam-followers
82% are targeted → spammers get most links by reciprocation


Link farming makes spammers influential

Spammers get more followers than an average Twitter user

Some spammers acquire very high Pageranks: 304 within top 100,000 (0.18% of all users)


Who are the spam-followers?

Non-targeted spam-followers
Mostly sybils / hired help of spammers

Most have now been suspended by Twitter (9,725 among top 10,000 having links to spammers)

Targeted spam-followers
Ranked on the basis of number of links to spammers

60% of the follow-links acquired by spammers come from the top 100,000 targeted followers
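
The ranking and the "share of follow-links from the top k" statistic can be sketched as below. This is an assumption-laden illustration: it takes the denominator to be all follow-links that point at spammers, and the function name, inputs, and toy data are hypothetical.

```python
from collections import Counter


def top_k_link_share(follow_edges, spammers, targeted, k):
    """Rank targeted followers by links to spammers; return top-k and their share.

    follow_edges: iterable of (follower, followee) pairs
    spammers:     set of known spammer ids
    targeted:     set of targeted followers (users also followed by spammers)
    """
    links_to_spammers = Counter()
    total_links = 0
    for follower, followee in follow_edges:
        if followee in spammers:
            total_links += 1  # every follow-link acquired by a spammer
            if follower in targeted:
                links_to_spammers[follower] += 1
    if total_links == 0:
        return [], 0.0
    ranked = links_to_spammers.most_common(k)
    share = sum(count for _, count in ranked) / total_links
    return [user for user, _ in ranked], share


# Toy data: "a" follows three spammers, "b" and "c" follow one each.
edges = [("a", "s1"), ("a", "s2"), ("a", "s3"), ("b", "s1"), ("c", "s1"), ("d", "x")]
top_users, share = top_k_link_share(
    edges, spammers={"s1", "s2", "s3"}, targeted={"a", "b"}, k=1
)
print(top_users, share)
```

In the toy data the single top follower accounts for 3 of the 5 links acquired by spammers, the same kind of concentration the slide reports at much larger scale.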


Who are the top link-farmers?

Analyzed the status of the top 100,000 link farmers (July, 2011)

76% still exist and have not been suspended by Twitter

235 verified as real, well-known users

Much higher indegree as well as outdegree compared to spammers

Most of their tweets contain valid URLs


Who are the top link-farmers?

Highly influential users
Rank within top 5% as per Pagerank, follower-rank, retweet-rank

Mostly social marketers, entrepreneurs, ...

Want to promote some online business/website

Heavily interconnected with each other - density 0.018 ($10^{-7}$ for the whole graph)

Aim: to acquire social capital
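
As a rough check on the whole-graph figure, the sketch below computes density from the dataset sizes quoted earlier (54 million users, 1.9 billion links). It assumes density means directed links divided by n(n − 1), which the slides do not state explicitly; the function name is hypothetical.

```python
def directed_density(num_nodes, num_links):
    """Fraction of possible directed links that are present: m / (n * (n - 1))."""
    if num_nodes < 2:
        return 0.0
    return num_links / (num_nodes * (num_nodes - 1))


# Dataset sizes from the Dataset slide.
whole_graph = directed_density(54_000_000, 1_900_000_000)
print(f"{whole_graph:.2e}")
```

Under this definition the whole graph comes out on the order of 10^-7, so a density of 0.018 among the top link-farmers is several orders of magnitude higher.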


Combating the problem

Not practical for Twitter to suspend / blacklist top link-farmers

Solution
Strategy to disincentivize users from following / reciprocating to unknown people

Penalize users for following spammers

Collusionrank: inverse of Pagerank
Negatively bias a small set of known spammers

Propagate negative scores from spammers to spam-followers


Collusionrank
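
The content of this slide did not survive extraction. The sketch below is only one possible reading of the idea described on the previous slide (negatively bias known spammers, then propagate negative scores to the users who follow them). The update rule, the decay factor `alpha`, the fixed iteration count, and all names are assumptions, not the algorithm shown in the lecture.

```python
def collusionrank(followings, known_spammers, alpha=0.85, iterations=50):
    """Propagate negative scores from known spammers to the users who follow them.

    followings maps each user to the set of users they follow (toy input).
    A user's score is built from the scores of the accounts they follow,
    each divided by that account's follower count (assumed update rule).
    """
    users = set(followings) | {v for vs in followings.values() for v in vs}
    follower_count = {u: 0 for u in users}
    for followed in followings.values():
        for v in followed:
            follower_count[v] += 1
    # Static bias: known spammers share a total score of -1, everyone else 0.
    bias = {
        u: (-1.0 / len(known_spammers) if u in known_spammers else 0.0)
        for u in users
    }
    scores = dict(bias)
    for _ in range(iterations):
        new = {}
        for u in users:
            received = sum(
                scores[v] / follower_count[v] for v in followings.get(u, ())
            )
            new[u] = alpha * received + (1.0 - alpha) * bias[u]
        scores = new
    return scores


# Toy graph: "fan" follows a known spammer, "bystander" follows a celebrity.
toy = {"fan": {"spam"}, "bystander": {"celebrity"}, "spam": set()}
cr = collusionrank(toy, known_spammers={"spam"})
print(cr)
```

In this toy run the follower of the spammer ends up with a negative score while the bystander stays at zero, which is the penalty for following spammers that the previous slide calls for.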


Pagerank+Collusionrank

Computed Collusionrank considering 600 known spammers

Rank users by Pagerank + Collusionrank

→ Effectively filters out spammers and link-farmers (top spam-followers) from top ranks
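
The combined ranking can be sketched as a plain sum of the two scores. This is a minimal illustration with made-up numbers; it assumes both scores are already on comparable scales and that Collusionrank scores are zero or negative, neither of which is spelled out on the slide. The function name is hypothetical.

```python
def rank_users(pagerank, collusionrank_scores):
    """Order users by Pagerank + Collusionrank, highest combined score first."""
    combined = {
        user: score + collusionrank_scores.get(user, 0.0)
        for user, score in pagerank.items()
    }
    return sorted(combined, key=lambda user: combined[user], reverse=True)


# Made-up scores: "farmer" has the highest Pagerank but a large negative penalty.
pagerank = {"news": 0.30, "farmer": 0.35, "reader": 0.10}
penalties = {"farmer": -0.40}
ranking = rank_users(pagerank, penalties)
print(ranking)
```

The link-farmer tops the list on Pagerank alone but drops to the bottom once the penalty is added, while users without a penalty keep their relative order.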
