social networks of spammershero/preprints/social...total emails received at project honey pot trap...
TRANSCRIPT
![Page 1: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/1.jpg)
Social Networks of Spammers
Alfred O. Hero, III
Department of Electrical Engineering and Computer ScienceUniversity of Michigan
September 22, 2008
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 1 / 54
![Page 2: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/2.jpg)
Outline
1 IntroductionObjectivesHarvesting and SpammingSocial Networks
2 MethodologyCommunity DetectionSimilarity Measures
3 Results
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 2 / 54
![Page 3: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/3.jpg)
Introduction
Outline
1 IntroductionObjectivesHarvesting and SpammingSocial Networks
2 MethodologyCommunity DetectionSimilarity Measures
3 Results
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 3 / 54
![Page 4: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/4.jpg)
Introduction
Acknowledgements
My co-authors on “Revealing social networks of spammersthrough spectral clustering,” ICC09 (submitted)
Kevin Xu, Yilun Chen, Peter Woolf - University of MichiganMark Kliger - Medasense Biometrics, Inc
Other collaboratorsJohn Bell, Nitin Nayar - University of MichiganMatthew Prince, Eric Langheinrich, Lee Holloway - UnspamTechnologiesMatt Roughan, Olaf Maennel - University of Adelaide
SponsorsNational Science Foundation CCR-0325571Office of Naval Research N00014-08-1065NSERC graduate fellowship program
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 4 / 54
![Page 5: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/5.jpg)
Introduction
Acknowledgements
My co-authors on “Revealing social networks of spammersthrough spectral clustering,” ICC09 (submitted)
Kevin Xu, Yilun Chen, Peter Woolf - University of MichiganMark Kliger - Medasense Biometrics, Inc
Other collaboratorsJohn Bell, Nitin Nayar - University of MichiganMatthew Prince, Eric Langheinrich, Lee Holloway - UnspamTechnologiesMatt Roughan, Olaf Maennel - University of Adelaide
SponsorsNational Science Foundation CCR-0325571Office of Naval Research N00014-08-1065NSERC graduate fellowship program
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 4 / 54
![Page 6: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/6.jpg)
Introduction
Acknowledgements
My co-authors on “Revealing social networks of spammersthrough spectral clustering,” ICC09 (submitted)
Kevin Xu, Yilun Chen, Peter Woolf - University of MichiganMark Kliger - Medasense Biometrics, Inc
Other collaboratorsJohn Bell, Nitin Nayar - University of MichiganMatthew Prince, Eric Langheinrich, Lee Holloway - UnspamTechnologiesMatt Roughan, Olaf Maennel - University of Adelaide
SponsorsNational Science Foundation CCR-0325571Office of Naval Research N00014-08-1065NSERC graduate fellowship program
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 4 / 54
![Page 7: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/7.jpg)
Introduction Objectives
Objectives
Objectives of this studyTo reveal social networks of spammers
Identifying communities of spammersFinding characteristics or “signatures” of communities
To understand temporal dynamics of spammers’ behaviorDetecting changes in social structure
MotivationCurrent anti-spam methods
Content filteringIP address blacklisting
Allows us to fight spam from another perspective by usingspammers’ social structure
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 5 / 54
![Page 8: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/8.jpg)
Introduction Objectives
Objectives
Objectives of this studyTo reveal social networks of spammers
Identifying communities of spammersFinding characteristics or “signatures” of communities
To understand temporal dynamics of spammers’ behaviorDetecting changes in social structure
MotivationCurrent anti-spam methods
Content filteringIP address blacklisting
Allows us to fight spam from another perspective by usingspammers’ social structure
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 5 / 54
![Page 9: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/9.jpg)
Introduction Objectives
Background
Much of past research on spam has focussed on scalar analysis
Spam/phishing content structural analysis [Chandrasekaran,Narayanan, Uphadhyaya CSC06]Server lifetime and reachability analysis [Duan,Gopalan, Yuan,ICC07]Spam botnet behavior patterns [Ramachandran, Feamster,SIGCOMM06]Honeypot summary statistics [Prince, Holloway, Langheinrich,Dahl, Keller EAS05]
•We perform analysis of spammer interactions over entire spam cycle
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 6 / 54
![Page 10: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/10.jpg)
Introduction Objectives
Background
Much of past research on spam has focussed on scalar analysis
Spam/phishing content structural analysis [Chandrasekaran,Narayanan, Uphadhyaya CSC06]Server lifetime and reachability analysis [Duan,Gopalan, Yuan,ICC07]Spam botnet behavior patterns [Ramachandran, Feamster,SIGCOMM06]Honeypot summary statistics [Prince, Holloway, Langheinrich,Dahl, Keller EAS05]
•We perform analysis of spammer interactions over entire spam cycle
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 6 / 54
![Page 11: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/11.jpg)
Introduction Harvesting and Spamming
The Spam Cycle
Two phases of the spam cycleHarvesting: collecting email addresses from web sites using spambotsSpamming: sending large amounts of emails to collectedaddresses using spam servers
Spammers conceal their identity (IP address) in spamming phaseby using public SMTP servers, open proxies, botnets, etc.Key assumption: spammer IP address in harvesting phase isclosely related to actual location
Previous study found harvester IP address more closely related toactual spammer than spam server IP address (Prince et. al, 2005)We treat the harvester as the spam source
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 7 / 54
![Page 12: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/12.jpg)
Introduction Harvesting and Spamming
The Spam Cycle
Two phases of the spam cycleHarvesting: collecting email addresses from web sites using spambotsSpamming: sending large amounts of emails to collectedaddresses using spam servers
Spammers conceal their identity (IP address) in spamming phaseby using public SMTP servers, open proxies, botnets, etc.Key assumption: spammer IP address in harvesting phase isclosely related to actual location
Previous study found harvester IP address more closely related toactual spammer than spam server IP address (Prince et. al, 2005)We treat the harvester as the spam source
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 7 / 54
![Page 13: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/13.jpg)
Introduction Harvesting and Spamming
The Spam Cycle
Two phases of the spam cycleHarvesting: collecting email addresses from web sites using spambotsSpamming: sending large amounts of emails to collectedaddresses using spam servers
Spammers conceal their identity (IP address) in spamming phaseby using public SMTP servers, open proxies, botnets, etc.Key assumption: spammer IP address in harvesting phase isclosely related to actual location
Previous study found harvester IP address more closely related toactual spammer than spam server IP address (Prince et. al, 2005)We treat the harvester as the spam source
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 7 / 54
![Page 14: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/14.jpg)
Introduction Harvesting and Spamming
Harvestor Email Address Collection
Spam bot
Email addresses:[email protected]@abc.com
Links:john.html
www.def.com
HTML Source of www.abc.com
[email protected]@abc.com
...
Spammer’s Database
Email addresses:[email protected]
Links:kevin.html
Email addresses:[email protected]
Links:www.ghi.com
HTML Source of john.html HTML Source of www.def.com
...
How harvesters acquire email addresses using spam bots
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 8 / 54
![Page 15: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/15.jpg)
Introduction Harvesting and Spamming
The Path of Spam
The path of spam: from an email address on a web page to your inbox
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 9 / 54
![Page 16: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/16.jpg)
Introduction Harvesting and Spamming
Project Honey Pot
Network of decoy web pages (“honey pots”) with trap emailaddressesAll email received is spamUnique email address generated at each visitVisitor (harvester) IP address is trackedWhen spam is received, we know the harvester IP address inaddition to the spam server IP address
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 10 / 54
![Page 17: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/17.jpg)
Introduction Harvesting and Spamming
Project Honey Pot
Network of decoy web pages (“honey pots”) with trap emailaddressesAll email received is spamUnique email address generated at each visitVisitor (harvester) IP address is trackedWhen spam is received, we know the harvester IP address inaddition to the spam server IP address
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 10 / 54
![Page 18: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/18.jpg)
Introduction Harvesting and Spamming
Project Honey Pot
Network of decoy web pages (“honey pots”) with trap emailaddressesAll email received is spamUnique email address generated at each visitVisitor (harvester) IP address is trackedWhen spam is received, we know the harvester IP address inaddition to the spam server IP address
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 10 / 54
![Page 19: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/19.jpg)
Introduction Harvesting and Spamming
Project Honey Pot
Network of decoy web pages (“honey pots”) with trap emailaddressesAll email received is spamUnique email address generated at each visitVisitor (harvester) IP address is trackedWhen spam is received, we know the harvester IP address inaddition to the spam server IP address
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 10 / 54
![Page 20: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/20.jpg)
Introduction Harvesting and Spamming
Project Honey Pot
Network of decoy web pages (“honey pots”) with trap emailaddressesAll email received is spamUnique email address generated at each visitVisitor (harvester) IP address is trackedWhen spam is received, we know the harvester IP address inaddition to the spam server IP address
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 10 / 54
![Page 21: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/21.jpg)
Introduction Harvesting and Spamming
Project Honey Pot Statistics (as of Sept. 16, 2008)
Spam Trap Addresses Monitored: 29,765,172Spam Trap Monitoring Capability: 272,870,000,000Spam Servers Identified: 29,712,922Harvesters Identified: 52,069
www.projecthoneypot.org
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 11 / 54
![Page 22: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/22.jpg)
Introduction Harvesting and Spamming
Total Emails By Month
0
0.5
1
1.5
2
2.5x 10
6
2005
−09
2006
−03
2006
−09
2007
−03
2007
−09
Month
# of
em
ails
Total emails received at Project Honey Pot trap addresses by month
Outbreak of spam observed in October 2006 consistent withmedia reports
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 12 / 54
![Page 23: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/23.jpg)
Introduction Harvesting and Spamming
Total Active Harvesters By Month
0
1000
2000
3000
4000
5000
6000
7000
2005
−09
2006
−03
2006
−09
2007
−03
2007
−09
Month
# of
har
vest
ers
Total active harvesters tracked by Project Honey Pot by month
Increase in harvesters in October 2006 not as significant asincrease in number of spam emails
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 13 / 54
![Page 24: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/24.jpg)
Introduction Harvesting and Spamming
Harvestor-to-server degree distribution: May 2006
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 14 / 54
![Page 25: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/25.jpg)
Introduction Harvesting and Spamming
Harvestor-to-server degree distribution: Oct 2006
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 15 / 54
![Page 26: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/26.jpg)
Introduction Harvesting and Spamming
Phishing
Phishing is an attempt to fraudulently acquire sensitive informationby appearing to represent a trustworthy entityProject Honey Pot is an excellent data source for studyingphishing emails
Trap email address cannot, for example, sign up for a PayPalaccountAll emails supposedly received from financial institutions can beclassified as phishing
We classify an email as a phishing email if its subject containscommon phishing words
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 16 / 54
![Page 27: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/27.jpg)
Introduction Harvesting and Spamming
Phishing
Phishing is an attempt to fraudulently acquire sensitive informationby appearing to represent a trustworthy entityProject Honey Pot is an excellent data source for studyingphishing emails
Trap email address cannot, for example, sign up for a PayPalaccountAll emails supposedly received from financial institutions can beclassified as phishing
We classify an email as a phishing email if its subject containscommon phishing words
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 16 / 54
![Page 28: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/28.jpg)
Introduction Harvesting and Spamming
Phishing
Phishing is an attempt to fraudulently acquire sensitive informationby appearing to represent a trustworthy entityProject Honey Pot is an excellent data source for studyingphishing emails
Trap email address cannot, for example, sign up for a PayPalaccountAll emails supposedly received from financial institutions can beclassified as phishing
We classify an email as a phishing email if its subject containscommon phishing words
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 16 / 54
![Page 29: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/29.jpg)
Introduction Harvesting and Spamming
Phishing Statistics
Define a phishing level for each harvester as
Phishing level =# of phishing emails sent
total # of emails sentLabel harvesters with phishing level > 0.5 as phishersOctober 2006 statistics
4.5% of emails were phishing emails23% of harvesters were phishers
0 0.2 0.4 0.6 0.8 10
500
1000
1500
2000
Phishing level
# of
har
vest
ers
Histogram of harvesters’ phishing levels from October 2006
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 17 / 54
![Page 30: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/30.jpg)
Introduction Social Networks
Social Networks
Social network: social structure consisting of actors and tiesActors represent individualsTies represent relationships between individuals
School friendships Scientific collaborations
Moody, 2001 Girvan and Newman, 2002
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 18 / 54
![Page 31: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/31.jpg)
Methodology
Outline
1 IntroductionObjectivesHarvesting and SpammingSocial Networks
2 MethodologyCommunity DetectionSimilarity Measures
3 Results
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 19 / 54
![Page 32: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/32.jpg)
Methodology
Graph of Harvester Interactions
Represent network of harvesters by undirected weighted graphG = (V ,E ,W )
V : set of vertices (harvesters)E : set of edges between harvestersW : matrix of edge weights (adjacency matrix of graph)
Edge weights represent strength of connection between twoharvestersTotal weights of edges between two sets of harvesters A,B ⊂ V isdefined by
links(A,B) =∑i∈A
∑j∈B
wij
Degree of a set A is defined by
deg(A) = links(A,V )
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 20 / 54
![Page 33: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/33.jpg)
Methodology
Graph of Harvester Interactions
Represent network of harvesters by undirected weighted graphG = (V ,E ,W )
V : set of vertices (harvesters)E : set of edges between harvestersW : matrix of edge weights (adjacency matrix of graph)
Edge weights represent strength of connection between twoharvestersTotal weights of edges between two sets of harvesters A,B ⊂ V isdefined by
links(A,B) =∑i∈A
∑j∈B
wij
Degree of a set A is defined by
deg(A) = links(A,V )
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 20 / 54
![Page 34: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/34.jpg)
Methodology
Graph of Harvester Interactions
Represent network of harvesters by undirected weighted graphG = (V ,E ,W )
V : set of vertices (harvesters)E : set of edges between harvestersW : matrix of edge weights (adjacency matrix of graph)
Edge weights represent strength of connection between twoharvestersTotal weights of edges between two sets of harvesters A,B ⊂ V isdefined by
links(A,B) =∑i∈A
∑j∈B
wij
Degree of a set A is defined by
deg(A) = links(A,V )
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 20 / 54
![Page 35: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/35.jpg)
Methodology Community Detection
Community Detection
Characteristics of a communityHigh similarity between actors within communityLow similarity between actors in different communities
Formulate community detection as a graph partitioning problemDivide the graph into clustersMaximize edge weights within clusters (association)Minimize edge weights between clusters (cut)
Using edge weights normalized by group sizes results in bettergroups
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 21 / 54
![Page 36: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/36.jpg)
Methodology Community Detection
Community Detection
Characteristics of a communityHigh similarity between actors within communityLow similarity between actors in different communities
Formulate community detection as a graph partitioning problemDivide the graph into clustersMaximize edge weights within clusters (association)Minimize edge weights between clusters (cut)
Using edge weights normalized by group sizes results in bettergroups
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 21 / 54
![Page 37: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/37.jpg)
Methodology Community Detection
Community Detection
Characteristics of a communityHigh similarity between actors within communityLow similarity between actors in different communities
Formulate community detection as a graph partitioning problemDivide the graph into clustersMaximize edge weights within clusters (association)Minimize edge weights between clusters (cut)
Using edge weights normalized by group sizes results in bettergroups
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 21 / 54
![Page 38: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/38.jpg)
Methodology Community Detection
Normalized Cut and Association
Normalized cut of a graph partition ΓKV is defined as
KNcut(ΓKV ) =
1K
K∑i=1
links(Vi ,V\Vi)
deg(Vi)
Normalized association of ΓKV is defined as
KNassoc(ΓKV ) =
1K
K∑i=1
links(Vi ,Vi)
deg(Vi)
KNcut(ΓKV ) + KNassoc(ΓK
V ) = 1 so minimizing normalized cutsimultaneously maximizes normalized associationWe try to maximize normalized association
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 22 / 54
![Page 39: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/39.jpg)
Methodology Community Detection
Normalized Cut and Association
Normalized cut of a graph partition ΓKV is defined as
KNcut(ΓKV ) =
1K
K∑i=1
links(Vi ,V\Vi)
deg(Vi)
Normalized association of ΓKV is defined as
KNassoc(ΓKV ) =
1K
K∑i=1
links(Vi ,Vi)
deg(Vi)
KNcut(ΓKV ) + KNassoc(ΓK
V ) = 1 so minimizing normalized cutsimultaneously maximizes normalized associationWe try to maximize normalized association
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 22 / 54
![Page 40: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/40.jpg)
Methodology Community Detection
Normalized Cut and Association
Normalized cut of a graph partition ΓKV is defined as
KNcut(ΓKV ) =
1K
K∑i=1
links(Vi ,V\Vi)
deg(Vi)
Normalized association of ΓKV is defined as
KNassoc(ΓKV ) =
1K
K∑i=1
links(Vi ,Vi)
deg(Vi)
KNcut(ΓKV ) + KNassoc(ΓK
V ) = 1 so minimizing normalized cutsimultaneously maximizes normalized associationWe try to maximize normalized association
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 22 / 54
![Page 41: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/41.jpg)
Methodology Community Detection
The Discrete Optimization Problem
Represent graph partition ΓKV by matrix X = [x1,x2, . . . ,xK]
xi: column indicator vector with ones in the rows corresponding toharvesters in cluster iDegree matrix D = diag(W1M)Rewrite links and deg as
links(Vi ,Vi) = xiT Wxi
deg(Vi) = xiT Dxi
KNassoc maximization problem becomes
maximize KNassoc(X ) =1K
K∑i=1
xiT Wxi
xiT Dxi
subject to X ∈ {0,1}M×K
X1K = 1M
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 23 / 54
![Page 42: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/42.jpg)
Methodology Community Detection
The Discrete Optimization Problem
Represent graph partition ΓKV by matrix X = [x1,x2, . . . ,xK]
xi: column indicator vector with ones in the rows corresponding toharvesters in cluster iDegree matrix D = diag(W1M)Rewrite links and deg as
links(Vi ,Vi) = xiT Wxi
deg(Vi) = xiT Dxi
KNassoc maximization problem becomes
maximize KNassoc(X ) =1K
K∑i=1
xiT Wxi
xiT Dxi
subject to X ∈ {0,1}M×K
X1K = 1M
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 23 / 54
![Page 43: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/43.jpg)
Methodology Community Detection
The Discrete Optimization Problem
Represent graph partition ΓKV by matrix X = [x1,x2, . . . ,xK]
xi: column indicator vector with ones in the rows corresponding toharvesters in cluster iDegree matrix D = diag(W1M)Rewrite links and deg as
links(Vi ,Vi) = xiT Wxi
deg(Vi) = xiT Dxi
KNassoc maximization problem becomes
maximize KNassoc(X ) =1K
K∑i=1
xiT Wxi
xiT Dxi
subject to X ∈ {0,1}M×K
X1K = 1M
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 23 / 54
![Page 44: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/44.jpg)
Methodology Community Detection
The Discrete Optimization Problem
Represent graph partition ΓKV by matrix X = [x1,x2, . . . ,xK]
xi: column indicator vector with ones in the rows corresponding toharvesters in cluster iDegree matrix D = diag(W1M)Rewrite links and deg as
links(Vi ,Vi) = xiT Wxi
deg(Vi) = xiT Dxi
KNassoc maximization problem becomes
maximize KNassoc(X ) =1K
K∑i=1
xiT Wxi
xiT Dxi
subject to X ∈ {0,1}M×K
X1K = 1M
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 23 / 54
![Page 45: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/45.jpg)
Methodology Community Detection
Spectral Clustering
KNassoc maximization problem has exponential complexity evenfor K = 2Define Z = X (X T DX )−1/2
Reformulate problem
maximize KNassoc(Z ) =1K
tr(Z T WZ )
subject to Z T DZ = IK
Relax Z into continuous domainSolve generalized eigenvalue problem
W zi = λD zi
Form optimal continuous partition matrix Z = [ z1, z2, . . . , zK ] anddiscretize to get near global-optimal solution (Yu and Shi, 2003)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 24 / 54
![Page 46: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/46.jpg)
Methodology Community Detection
Spectral Clustering
KNassoc maximization problem has exponential complexity evenfor K = 2Define Z = X (X T DX )−1/2
Reformulate problem
maximize KNassoc(Z ) =1K
tr(Z T WZ )
subject to Z T DZ = IK
Relax Z into continuous domainSolve generalized eigenvalue problem
W zi = λD zi
Form optimal continuous partition matrix Z = [ z1, z2, . . . , zK ] anddiscretize to get near global-optimal solution (Yu and Shi, 2003)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 24 / 54
![Page 47: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/47.jpg)
Methodology Community Detection
Spectral Clustering
KNassoc maximization problem has exponential complexity evenfor K = 2Define Z = X (X T DX )−1/2
Reformulate problem
maximize KNassoc(Z ) =1K
tr(Z T WZ )
subject to Z T DZ = IK
Relax Z into continuous domainSolve generalized eigenvalue problem
W zi = λD zi
Form optimal continuous partition matrix Z = [ z1, z2, . . . , zK ] anddiscretize to get near global-optimal solution (Yu and Shi, 2003)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 24 / 54
![Page 48: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/48.jpg)
Methodology Community Detection
Spectral Clustering
KNassoc maximization problem has exponential complexity evenfor K = 2Define Z = X (X T DX )−1/2
Reformulate problem
maximize KNassoc(Z ) =1K
tr(Z T WZ )
subject to Z T DZ = IK
Relax Z into continuous domainSolve generalized eigenvalue problem
W zi = λD zi
Form optimal continuous partition matrix Z = [ z1, z2, . . . , zK ] anddiscretize to get near global-optimal solution (Yu and Shi, 2003)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 24 / 54
![Page 49: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/49.jpg)
Methodology Community Detection
Spectral Clustering
KNassoc maximization problem has exponential complexity evenfor K = 2Define Z = X (X T DX )−1/2
Reformulate problem
maximize KNassoc(Z ) =1K
tr(Z T WZ )
subject to Z T DZ = IK
Relax Z into continuous domainSolve generalized eigenvalue problem
W zi = λD zi
Form optimal continuous partition matrix Z = [ z1, z2, . . . , zK ] anddiscretize to get near global-optimal solution (Yu and Shi, 2003)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 24 / 54
![Page 50: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/50.jpg)
Methodology Community Detection
Spectral Clustering
KNassoc maximization problem has exponential complexity evenfor K = 2Define Z = X (X T DX )−1/2
Reformulate problem
maximize KNassoc(Z ) =1K
tr(Z T WZ )
subject to Z T DZ = IK
Relax Z into continuous domainSolve generalized eigenvalue problem
W zi = λD zi
Form optimal continuous partition matrix Z = [ z1, z2, . . . , zK ] anddiscretize to get near global-optimal solution (Yu and Shi, 2003)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 24 / 54
![Page 51: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/51.jpg)
Methodology Community Detection
Choosing the Number of Clusters
How do we choose the number of clusters?Heuristic for spectral clustering: look at the gap betweeneigenvalues of the Laplacian matrix of the graph (von Luxburg,2007)
0 2 4 6 8 100
0.02
0.04
0.06
0.08
Gap between 7thand 8th eigenvalues
Ten smallest eigenvalues of Laplacian matrix
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 25 / 54
![Page 52: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/52.jpg)
Methodology Similarity Measures
Choosing Edge Weights
Edge weights wij represent strength of connection betweenharvesters i and jWe cannot observe direct relationships between harvestersUse indirect relationships to determine edge weights
Similarity in spam server usageSimilarity in temporal spammingSimilarity in temporal harvesting
Choice of similarity measure determines topology of the graphPoor choice could lead to detecting no community structureCreate coincidence matrix H as intermediate step to creatingadjacency matrix W
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 26 / 54
![Page 53: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/53.jpg)
Methodology Similarity Measures
Choosing Edge Weights
Edge weights wij represent strength of connection betweenharvesters i and jWe cannot observe direct relationships between harvestersUse indirect relationships to determine edge weights
Similarity in spam server usageSimilarity in temporal spammingSimilarity in temporal harvesting
Choice of similarity measure determines topology of the graphPoor choice could lead to detecting no community structureCreate coincidence matrix H as intermediate step to creatingadjacency matrix W
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 26 / 54
![Page 54: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/54.jpg)
Methodology Similarity Measures
Choosing Edge Weights
Edge weights wij represent strength of connection betweenharvesters i and jWe cannot observe direct relationships between harvestersUse indirect relationships to determine edge weights
Similarity in spam server usageSimilarity in temporal spammingSimilarity in temporal harvesting
Choice of similarity measure determines topology of the graphPoor choice could lead to detecting no community structureCreate coincidence matrix H as intermediate step to creatingadjacency matrix W
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 26 / 54
![Page 55: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/55.jpg)
Methodology Similarity Measures
Choosing Edge Weights
Edge weights wij represent strength of connection betweenharvesters i and jWe cannot observe direct relationships between harvestersUse indirect relationships to determine edge weights
Similarity in spam server usageSimilarity in temporal spammingSimilarity in temporal harvesting
Choice of similarity measure determines topology of the graphPoor choice could lead to detecting no community structureCreate coincidence matrix H as intermediate step to creatingadjacency matrix W
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 26 / 54
![Page 56: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/56.jpg)
Methodology Similarity Measures
Choosing Edge Weights
Edge weights wij represent strength of connection betweenharvesters i and jWe cannot observe direct relationships between harvestersUse indirect relationships to determine edge weights
Similarity in spam server usageSimilarity in temporal spammingSimilarity in temporal harvesting
Choice of similarity measure determines topology of the graphPoor choice could lead to detecting no community structureCreate coincidence matrix H as intermediate step to creatingadjacency matrix W
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 26 / 54
![Page 57: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/57.jpg)
Methodology Similarity Measures
Choosing Edge Weights
Edge weights wij represent strength of connection betweenharvesters i and jWe cannot observe direct relationships between harvestersUse indirect relationships to determine edge weights
Similarity in spam server usageSimilarity in temporal spammingSimilarity in temporal harvesting
Choice of similarity measure determines topology of the graphPoor choice could lead to detecting no community structureCreate coincidence matrix H as intermediate step to creatingadjacency matrix W
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 26 / 54
![Page 58: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/58.jpg)
Methodology Similarity Measures
Similarity in Spam Server Usage
Spammers need spam servers to send emailsCommon usage of spam servers between harvesters may indicatesocial connectionCreate bipartite graph of harvesters and spam serversChoose edge weights based on correlation in spam server usage
Harvester A Harvester B Harvester C
Spam Server 1
Spam Server 2
Spam Server 3
Spam Server 4
Spam Server 5
Spam Server 6
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 27 / 54
![Page 59: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/59.jpg)
Methodology Similarity Measures
Similarity in Spam Server Usage
Spammers need spam servers to send emailsCommon usage of spam servers between harvesters may indicatesocial connectionCreate bipartite graph of harvesters and spam serversChoose edge weights based on correlation in spam server usage
Harvester A Harvester B Harvester C
Spam Server 1
Spam Server 2
Spam Server 3
Spam Server 4
Spam Server 5
Spam Server 6
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 27 / 54
![Page 60: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/60.jpg)
Methodology Similarity Measures
Similarity in Spam Server Usage
Spammers need spam servers to send emailsCommon usage of spam servers between harvesters may indicatesocial connectionCreate bipartite graph of harvesters and spam serversChoose edge weights based on correlation in spam server usage
Harvester A Harvester B Harvester C
Spam Server 1
Spam Server 2
Spam Server 3
Spam Server 4
Spam Server 5
Spam Server 6
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 27 / 54
![Page 61: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/61.jpg)
Methodology Similarity Measures
Similarity in Spam Server Usage
Spammers need spam servers to send emailsCommon usage of spam servers between harvesters may indicatesocial connectionCreate bipartite graph of harvesters and spam serversChoose edge weights based on correlation in spam server usage
Harvester A Harvester B Harvester C
Spam Server 1
Spam Server 2
Spam Server 3
Spam Server 4
Spam Server 5
Spam Server 6
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 27 / 54
![Page 62: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/62.jpg)
Methodology Similarity Measures
Similarity in Spam Server Usage Coincidence Matrix
Create coincidence matrix H between harvesters and spamservers
H =
[pij
djei
]M,N
i,j=1
pij : the number of emails sent using spam server j to emailaddresses collected by harvester idj : the total number of emails sent by spam server jei : the total number of email addresses collected by harvester iEntries of incidence matrix represent harvester i ’s percentage ofusage of spam server j per address he has acquired
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 28 / 54
![Page 63: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/63.jpg)
Methodology Similarity Measures
Temporal Similarity
Common temporal patterns of activity may also indicate socialconnectionLook at number of emails sent or email addresses collected asfunction of timeDiscretize time into 1-hour intervals
0 100 200 300 400 500 600 700 8000
50
100
150
0 100 200 300 400 500 600 700 8000
50
100
150
Time (in 1−hour bins)
# of
em
ails
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 29 / 54
![Page 64: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/64.jpg)
Methodology Similarity Measures
Sample Temporal Histograms
0 400 8000
5
10
Time (in 1−hour bins)
# of
em
ails
0 400 8000
50
100
150
0 400 8000
10
20
30
0 400 8000
5
10
Sample temporal spamming histograms representing four types ofdistributions
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 30 / 54
![Page 65: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/65.jpg)
Methodology Similarity Measures
Temporal Similarity Coincidence Matrices
Choose edge weights based on correlation in number of emailssent during each time intervalSimilarity in temporal spamming
Create coincidence matrix H between harvesters and discretizedtime intervals
H =
[sij
ei
]M,N
i,j=1
sij : number of emails sent by harvester i during j th time intervalei : the total number of email addresses collected by harvester i
Similarity in temporal harvestingCreate coincidence matrix H between harvesters and discretizedtime intervals
H = [aij ]M,Ni,j=1
aij : number of addresses collected by harvester i during j th timeinterval
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 31 / 54
![Page 66: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/66.jpg)
Methodology Similarity Measures
Temporal Similarity Coincidence Matrices
Choose edge weights based on correlation in number of emailssent during each time intervalSimilarity in temporal spamming
Create coincidence matrix H between harvesters and discretizedtime intervals
H =
[sij
ei
]M,N
i,j=1
sij : number of emails sent by harvester i during j th time intervalei : the total number of email addresses collected by harvester i
Similarity in temporal harvestingCreate coincidence matrix H between harvesters and discretizedtime intervals
H = [aij ]M,Ni,j=1
aij : number of addresses collected by harvester i during j th timeinterval
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 31 / 54
![Page 67: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/67.jpg)
Methodology Similarity Measures
Temporal Similarity Coincidence Matrices
Choose edge weights based on correlation in number of emailssent during each time intervalSimilarity in temporal spamming
Create coincidence matrix H between harvesters and discretizedtime intervals
H =
[sij
ei
]M,N
i,j=1
sij : number of emails sent by harvester i during j th time intervalei : the total number of email addresses collected by harvester i
Similarity in temporal harvestingCreate coincidence matrix H between harvesters and discretizedtime intervals
H = [aij ]M,Ni,j=1
aij : number of addresses collected by harvester i during j th timeinterval
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 31 / 54
![Page 68: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/68.jpg)
Methodology Similarity Measures
Creating the Adjacency Matrix
From coincidence matrix H we can obtain a matrix ofunnormalized pairwise similarities S = HHT
Normalize S to obtain matrix of normalized pairwise similaritiesN = D−1/2SD−1/2
D = diag(S)Scales similarities so each harvester’s self-similarity is 1Ensures each harvester is equally important
Connect harvesters to their k nearest neighbors according tosimilarities in N to form adjacency matrix W
Results in sparser adjacency matrixHow to choose k?Heuristic: Choose k = log n to start and increase as necessary toavoid artificially disconnecting components (von Luxburg, 2007)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 32 / 54
![Page 69: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/69.jpg)
Methodology Similarity Measures
Creating the Adjacency Matrix
From coincidence matrix H we can obtain a matrix ofunnormalized pairwise similarities S = HHT
Normalize S to obtain matrix of normalized pairwise similaritiesN = D−1/2SD−1/2
D = diag(S)Scales similarities so each harvester’s self-similarity is 1Ensures each harvester is equally important
Connect harvesters to their k nearest neighbors according tosimilarities in N to form adjacency matrix W
Results in sparser adjacency matrixHow to choose k?Heuristic: Choose k = log n to start and increase as necessary toavoid artificially disconnecting components (von Luxburg, 2007)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 32 / 54
![Page 70: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/70.jpg)
Methodology Similarity Measures
Creating the Adjacency Matrix
From coincidence matrix H we can obtain a matrix ofunnormalized pairwise similarities S = HHT
Normalize S to obtain matrix of normalized pairwise similaritiesN = D−1/2SD−1/2
D = diag(S)Scales similarities so each harvester’s self-similarity is 1Ensures each harvester is equally important
Connect harvesters to their k nearest neighbors according tosimilarities in N to form adjacency matrix W
Results in sparser adjacency matrixHow to choose k?Heuristic: Choose k = log n to start and increase as necessary toavoid artificially disconnecting components (von Luxburg, 2007)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 32 / 54
![Page 71: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/71.jpg)
Results
Outline
1 IntroductionObjectivesHarvesting and SpammingSocial Networks
2 MethodologyCommunity DetectionSimilarity Measures
3 Results
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 33 / 54
![Page 72: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/72.jpg)
Results
Similarity in Spam Server Usage
Results from October 2006 using similarity in spam server usage(visualization created using Cytoscape)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 34 / 54
![Page 73: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/73.jpg)
Results
Alternate View Colored By Phishing Level
Results from October 2006 colored by phishing level
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 35 / 54
![Page 74: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/74.jpg)
Results
Distribution of Phishers in Clusters
Distribution of phishers in clusters from October 2006 results
Label 1 2 3 4 5 6 7 8
Cluster size 1040 188 77 68 68 35 29 26
# of phishers 17 0 10 0 65 28 24 0
% of phishers 1.63 0 13.0 0 95.6 80 82.8 0
Label 9 10 11 12 13 14 15 16
Cluster size 19 16 14 14 14 11 11 11
# of phishers 18 16 13 1 12 9 0 11
% of phishers 94.7 100 92.9 7.14 85.7 81.8 0 100
Very few phishers in large, loosely-connected clusterMany small, tightly-connected clusters have high concentration ofphishers
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 36 / 54
![Page 75: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/75.jpg)
Results
Cluster Validation Indices
Rand index: measure of agreement between clustering resultsand labels
Rand index =a + d
a + b + c + d
a: number of pairs of nodes with same label and in same clusterb: number of pairs with same label but in different clustersc: number of pairs with different labels but in the same clusterd : number of pairs with different labels and in different clusters
Adjusted Rand index: Rand index corrected for chance (Hubertand Arabie, 1985)Expected adjusted Rand index for random clustering result is 0Label clusters as phishing clusters if ratio of phishers toharvesters > 0.5Use phisher or non-phisher as label for each harvesterLook for agreement between harvester labels and cluster labels
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 37 / 54
![Page 76: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/76.jpg)
Results
Cluster Validation Indices
Rand index: measure of agreement between clustering resultsand labels
Rand index =a + d
a + b + c + d
a: number of pairs of nodes with same label and in same clusterb: number of pairs with same label but in different clustersc: number of pairs with different labels but in the same clusterd : number of pairs with different labels and in different clusters
Adjusted Rand index: Rand index corrected for chance (Hubertand Arabie, 1985)Expected adjusted Rand index for random clustering result is 0Label clusters as phishing clusters if ratio of phishers toharvesters > 0.5Use phisher or non-phisher as label for each harvesterLook for agreement between harvester labels and cluster labels
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 37 / 54
![Page 77: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/77.jpg)
Results
Cluster Validation Indices
Rand index: measure of agreement between clustering resultsand labels
Rand index =a + d
a + b + c + d
a: number of pairs of nodes with same label and in same clusterb: number of pairs with same label but in different clustersc: number of pairs with different labels but in the same clusterd : number of pairs with different labels and in different clusters
Adjusted Rand index: Rand index corrected for chance (Hubertand Arabie, 1985)Expected adjusted Rand index for random clustering result is 0Label clusters as phishing clusters if ratio of phishers toharvesters > 0.5Use phisher or non-phisher as label for each harvesterLook for agreement between harvester labels and cluster labels
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 37 / 54
![Page 78: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/78.jpg)
Results
Cluster Validation Indices
Rand index: measure of agreement between clustering resultsand labels
Rand index =a + d
a + b + c + d
a: number of pairs of nodes with same label and in same clusterb: number of pairs with same label but in different clustersc: number of pairs with different labels but in the same clusterd : number of pairs with different labels and in different clusters
Adjusted Rand index: Rand index corrected for chance (Hubertand Arabie, 1985)Expected adjusted Rand index for random clustering result is 0Label clusters as phishing clusters if ratio of phishers toharvesters > 0.5Use phisher or non-phisher as label for each harvesterLook for agreement between harvester labels and cluster labels
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 37 / 54
![Page 79: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/79.jpg)
Results
Cluster Validation Indices
Rand index: measure of agreement between clustering resultsand labels
Rand index =a + d
a + b + c + d
a: number of pairs of nodes with same label and in same clusterb: number of pairs with same label but in different clustersc: number of pairs with different labels but in the same clusterd : number of pairs with different labels and in different clusters
Adjusted Rand index: Rand index corrected for chance (Hubertand Arabie, 1985)Expected adjusted Rand index for random clustering result is 0Label clusters as phishing clusters if ratio of phishers toharvesters > 0.5Use phisher or non-phisher as label for each harvesterLook for agreement between harvester labels and cluster labels
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 37 / 54
![Page 80: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/80.jpg)
Results
Cluster Validation Indices
Rand index: measure of agreement between clustering resultsand labels
Rand index =a + d
a + b + c + d
a: number of pairs of nodes with same label and in same clusterb: number of pairs with same label but in different clustersc: number of pairs with different labels but in the same clusterd : number of pairs with different labels and in different clusters
Adjusted Rand index: Rand index corrected for chance (Hubertand Arabie, 1985)Expected adjusted Rand index for random clustering result is 0Label clusters as phishing clusters if ratio of phishers toharvesters > 0.5Use phisher or non-phisher as label for each harvesterLook for agreement between harvester labels and cluster labels
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 37 / 54
![Page 81: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/81.jpg)
Results
Validation Indices For Similarity in Spam Server Usage
Validation indices for similarity in spam server usage results
Year 2006 2007
Month July October January April July
Rand index 0.884 0.936 0.923 0.937 0.880
Adj. Rand index 0.759 0.847 0.803 0.802 0.618
Very high Rand and adjusted Rand indices indicates goodagreement between labels and clustering resultsResults highly unlikely to be caused by chance
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 38 / 54
![Page 82: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/82.jpg)
Results
Top Subject Lines in Phishing Clusters
Cluster 5
Subject line Hits
Password Change Required 126
Question from eBay Member 69
Credit Union Online R© $50 Reward Survey 47
PayPal Account 42
PayPal Account - Suspicious Activity 40
Cluster 9
Subject line Hits
Notification from Billing Department 49
IMPORTANT: Notification of limited accounts 25
PayPal Account Review Department 22
Notification of Limited Account Access 13
A secondary e-mail address has been added to your 11
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 39 / 54
![Page 83: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/83.jpg)
Results
Top Subject Lines in Non-Phishing Clusters
Cluster 2
Subject line Hits
tthemee 6893
St ock 6 6729
Notification 4516
Access granted to send emails to 4495
Thanks for joining 4405
Cluster 4
Subject line Hits
Make Money by Sharing Your Life with Friends and F 1027
Premiere Professional & Executive Registries Invit 750
Texas Land/Golf is the Buzz 459
Keys to Stock Market Success 408
An Entire Case of Fine Wine plus Exclusive Gift fo 367
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 40 / 54
![Page 84: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/84.jpg)
Results
Venn Diagram of Phishers’ Life Times
Venn diagram of phishers’ life times as percentage of total (1805 totalphishers)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 41 / 54
![Page 85: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/85.jpg)
Results
Venn Diagram of Non-Phishers’ Life Times
Venn diagram of non-phishers’ life times as percentage of total (4801total non-phishers)
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 42 / 54
![Page 86: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/86.jpg)
Results
Findings from Similarity in Spam Server Usage
Clustering divides spammers into communities of mostly phishersand mostly non-phishersEmpirical evidence that phishers tend to form small groups andshare resourcesPhishers have shorter life times than non-phishersDiscovered community structure is highly unlikely by chance
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 43 / 54
![Page 87: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/87.jpg)
Results
Findings from Similarity in Spam Server Usage
Clustering divides spammers into communities of mostly phishersand mostly non-phishersEmpirical evidence that phishers tend to form small groups andshare resourcesPhishers have shorter life times than non-phishersDiscovered community structure is highly unlikely by chance
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 43 / 54
![Page 88: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/88.jpg)
Results
Findings from Similarity in Spam Server Usage
Clustering divides spammers into communities of mostly phishersand mostly non-phishersEmpirical evidence that phishers tend to form small groups andshare resourcesPhishers have shorter life times than non-phishersDiscovered community structure is highly unlikely by chance
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 43 / 54
![Page 89: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/89.jpg)
Results
Findings from Similarity in Spam Server Usage
Clustering divides spammers into communities of mostly phishersand mostly non-phishersEmpirical evidence that phishers tend to form small groups andshare resourcesPhishers have shorter life times than non-phishersDiscovered community structure is highly unlikely by chance
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 43 / 54
![Page 90: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/90.jpg)
Results
Similarity in Temporal Spamming
Results from October 2006 using similarity in temporal spamming
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 44 / 54
![Page 91: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/91.jpg)
Results
Temporal Spamming Histograms
0 200 400 600 8000
200
400
Time (in 1−hour bins)
# of
em
ails
0 200 400 600 8000
200
400
0 200 400 600 8000
200
400
0 200 400 600 8000
200
400
0 200 400 600 8000
200
400
0 200 400 600 8000
200
400
0 200 400 600 8000
200
400
0 200 400 600 8000
200
400
0 200 400 600 8000
200
400
0 200 400 600 8000
200
400
Temporal spamming histograms of ten harvesters from same cluster
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 45 / 54
![Page 92: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/92.jpg)
Results
Statistics from Similarity in Temporal Spamming
Average temporal spamming correlation coefficients between twoharvesters in the aforementioned group
Year 2006 2007
Month July October January April July
ρavg 0.979 0.988 0.933 0.950 0.937
IP addresses of all harvesters in this group have 208.66.195/24prefixThese harvesters are among the heaviest spammers in eachmonthWe discovered several other groups with coherent temporalbehavior and similar IP addresses
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 46 / 54
![Page 93: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/93.jpg)
Results
Statistics from Similarity in Temporal Spamming
Average temporal spamming correlation coefficients between twoharvesters in the aforementioned group
Year 2006 2007
Month July October January April July
ρavg 0.979 0.988 0.933 0.950 0.937
IP addresses of all harvesters in this group have 208.66.195/24prefixThese harvesters are among the heaviest spammers in eachmonthWe discovered several other groups with coherent temporalbehavior and similar IP addresses
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 46 / 54
![Page 94: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/94.jpg)
Results
Statistics from Similarity in Temporal Spamming
Average temporal spamming correlation coefficients between twoharvesters in the aforementioned group
Year 2006 2007
Month July October January April July
ρavg 0.979 0.988 0.933 0.950 0.937
IP addresses of all harvesters in this group have 208.66.195/24prefixThese harvesters are among the heaviest spammers in eachmonthWe discovered several other groups with coherent temporalbehavior and similar IP addresses
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 46 / 54
![Page 95: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/95.jpg)
Results
Statistics from Similarity in Temporal Spamming
Average temporal spamming correlation coefficients between twoharvesters in the aforementioned group
Year 2006 2007
Month July October January April July
ρavg 0.979 0.988 0.933 0.950 0.937
IP addresses of all harvesters in this group have 208.66.195/24prefixThese harvesters are among the heaviest spammers in eachmonthWe discovered several other groups with coherent temporalbehavior and similar IP addresses
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 46 / 54
![Page 96: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/96.jpg)
Results
Similarity in Temporal Harvesting
Results from July 2006 using similarity in temporal harvesting
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 47 / 54
![Page 97: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/97.jpg)
Results
Temporal Harvesting Histograms
0 200 400 600 80005
10
Time (in 1−hour bins)
# of
em
ail a
ddre
sses
col
lect
ed
0 200 400 600 80005
10
0 200 400 600 80005
10
0 200 400 600 80005
10
0 200 400 600 80005
10
0 200 400 600 80005
10
0 200 400 600 80005
10
0 200 400 600 80005
10
0 200 400 600 80005
10
0 200 400 600 80005
10
Temporal spamming histograms of 208.66.195/24 group of harvesters
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 48 / 54
![Page 98: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/98.jpg)
Results
Statistics from Similarity in Temporal Harvesting
Average temporal harvesting correlation coefficients between twoharvesters in the 208.66.195/24 group
Year 2006
Month May June July August September
ρavg 0.579 0.645 0.661 0.533 0.635
Correlation is not as high as with temporal spammingLower correlation is expected due to randomness of addressacquisition timesResults still indicate high behavioral correlationAll harvesting was done between May and September 2006
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 49 / 54
![Page 99: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/99.jpg)
Results
Statistics from Similarity in Temporal Harvesting
Average temporal harvesting correlation coefficients between twoharvesters in the 208.66.195/24 group
Year 2006
Month May June July August September
ρavg 0.579 0.645 0.661 0.533 0.635
Correlation is not as high as with temporal spammingLower correlation is expected due to randomness of addressacquisition timesResults still indicate high behavioral correlationAll harvesting was done between May and September 2006
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 49 / 54
![Page 100: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/100.jpg)
Results
Statistics from Similarity in Temporal Harvesting
Average temporal harvesting correlation coefficients between twoharvesters in the 208.66.195/24 group
Year 2006
Month May June July August September
ρavg 0.579 0.645 0.661 0.533 0.635
Correlation is not as high as with temporal spammingLower correlation is expected due to randomness of addressacquisition timesResults still indicate high behavioral correlationAll harvesting was done between May and September 2006
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 49 / 54
![Page 101: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/101.jpg)
Results
Statistics from Similarity in Temporal Harvesting
Average temporal harvesting correlation coefficients between twoharvesters in the 208.66.195/24 group
Year 2006
Month May June July August September
ρavg 0.579 0.645 0.661 0.533 0.635
Correlation is not as high as with temporal spammingLower correlation is expected due to randomness of addressacquisition timesResults still indicate high behavioral correlationAll harvesting was done between May and September 2006
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 49 / 54
![Page 102: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/102.jpg)
Results
Statistics from Similarity in Temporal Harvesting
Average temporal harvesting correlation coefficients between twoharvesters in the 208.66.195/24 group
Year 2006
Month May June July August September
ρavg 0.579 0.645 0.661 0.533 0.635
Correlation is not as high as with temporal spammingLower correlation is expected due to randomness of addressacquisition timesResults still indicate high behavioral correlationAll harvesting was done between May and September 2006
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 49 / 54
![Page 103: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/103.jpg)
Results
Findings from Temporal Similarity
We discover several groups with coherent temporal behavior andsimilar IP addressesIn particular, a group of ten heavy spammers with 208.66.195/24IP address prefix
Indicates that these computers are very close geographicallyEither the same spammer or a group of spammers in same physicallocation
Highly likely that these groups are coordinated
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 50 / 54
![Page 104: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/104.jpg)
Results
Findings from Temporal Similarity
We discover several groups with coherent temporal behavior andsimilar IP addressesIn particular, a group of ten heavy spammers with 208.66.195/24IP address prefix
Indicates that these computers are very close geographicallyEither the same spammer or a group of spammers in same physicallocation
Highly likely that these groups are coordinated
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 50 / 54
![Page 105: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/105.jpg)
Results
Findings from Temporal Similarity
We discover several groups with coherent temporal behavior andsimilar IP addressesIn particular, a group of ten heavy spammers with 208.66.195/24IP address prefix
Indicates that these computers are very close geographicallyEither the same spammer or a group of spammers in same physicallocation
Highly likely that these groups are coordinated
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 50 / 54
![Page 106: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/106.jpg)
Results
Findings from Temporal Similarity
We discover several groups with coherent temporal behavior andsimilar IP addressesIn particular, a group of ten heavy spammers with 208.66.195/24IP address prefix
Indicates that these computers are very close geographicallyEither the same spammer or a group of spammers in same physicallocation
Highly likely that these groups are coordinated
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 50 / 54
![Page 107: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/107.jpg)
Results
Findings from Temporal Similarity
We discover several groups with coherent temporal behavior andsimilar IP addressesIn particular, a group of ten heavy spammers with 208.66.195/24IP address prefix
Indicates that these computers are very close geographicallyEither the same spammer or a group of spammers in same physicallocation
Highly likely that these groups are coordinated
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 50 / 54
![Page 108: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/108.jpg)
Results
Border Gateway Protocol
Border Gateway Protocol (BGP) is the core routing protocol at thehighest level in the InternetRouters on edge of autonomous systems (ASes) send updatesbetween themselves about connectivity within their AS
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 51 / 54
![Page 109: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/109.jpg)
Results
BGP Life Span
BGP life span of a spam server is roughly the amount of time it isconnected to the rest of the InternetIt has been observed that some spam servers have short BGP lifespans, perhaps to remain untraceable (Ramachandran andFeamster, 2006)Are harvesters which use short-lived spam servers tightlyconnected?
Few spam servers have short BGP life spansNo significant correlation found between BGP life span andphishing level
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 52 / 54
![Page 110: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/110.jpg)
Results
BGP Life Span
BGP life span of a spam server is roughly the amount of time it isconnected to the rest of the InternetIt has been observed that some spam servers have short BGP lifespans, perhaps to remain untraceable (Ramachandran andFeamster, 2006)Are harvesters which use short-lived spam servers tightlyconnected?
Few spam servers have short BGP life spansNo significant correlation found between BGP life span andphishing level
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 52 / 54
![Page 111: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/111.jpg)
Results
BGP Life Span
BGP life span of a spam server is roughly the amount of time it isconnected to the rest of the InternetIt has been observed that some spam servers have short BGP lifespans, perhaps to remain untraceable (Ramachandran andFeamster, 2006)Are harvesters which use short-lived spam servers tightlyconnected?
Few spam servers have short BGP life spansNo significant correlation found between BGP life span andphishing level
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 52 / 54
![Page 112: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/112.jpg)
Results
BGP Life Span
BGP life span of a spam server is roughly the amount of time it isconnected to the rest of the InternetIt has been observed that some spam servers have short BGP lifespans, perhaps to remain untraceable (Ramachandran andFeamster, 2006)Are harvesters which use short-lived spam servers tightlyconnected?
Few spam servers have short BGP life spansNo significant correlation found between BGP life span andphishing level
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 52 / 54
![Page 113: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/113.jpg)
Results
BGP Life Span
BGP life span of a spam server is roughly the amount of time it isconnected to the rest of the InternetIt has been observed that some spam servers have short BGP lifespans, perhaps to remain untraceable (Ramachandran andFeamster, 2006)Are harvesters which use short-lived spam servers tightlyconnected?
Few spam servers have short BGP life spansNo significant correlation found between BGP life span andphishing level
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 52 / 54
![Page 114: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/114.jpg)
Summary
Summary
Clustering using similarity in spam server usage revealscommunities of phishers and non-phishers
Phishing is a phenotype: most harvestors are either phishers ornon-phishersPhishers form gangs: they share resources in isolated closely knitcommunities.
Clustering using temporal similarity reveals coordinated groups ofharvestersSpammer social network patterns might be used for detection andinterdiction
Future workStatistical latent variable models for spammer community discoveryClustering based on combinations of similarity measuresEvolutionary models for community behavior
The dual problem: discovery of spam server communities.
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 53 / 54
![Page 115: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/115.jpg)
Summary
Summary
Clustering using similarity in spam server usage revealscommunities of phishers and non-phishers
Phishing is a phenotype: most harvestors are either phishers ornon-phishersPhishers form gangs: they share resources in isolated closely knitcommunities.
Clustering using temporal similarity reveals coordinated groups ofharvestersSpammer social network patterns might be used for detection andinterdiction
Future workStatistical latent variable models for spammer community discoveryClustering based on combinations of similarity measuresEvolutionary models for community behavior
The dual problem: discovery of spam server communities.
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 53 / 54
![Page 116: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/116.jpg)
Summary
Summary
Clustering using similarity in spam server usage revealscommunities of phishers and non-phishers
Phishing is a phenotype: most harvestors are either phishers ornon-phishersPhishers form gangs: they share resources in isolated closely knitcommunities.
Clustering using temporal similarity reveals coordinated groups ofharvestersSpammer social network patterns might be used for detection andinterdiction
Future workStatistical latent variable models for spammer community discoveryClustering based on combinations of similarity measuresEvolutionary models for community behavior
The dual problem: discovery of spam server communities.
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 53 / 54
![Page 117: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/117.jpg)
Summary
Summary
Clustering using similarity in spam server usage revealscommunities of phishers and non-phishers
Phishing is a phenotype: most harvestors are either phishers ornon-phishersPhishers form gangs: they share resources in isolated closely knitcommunities.
Clustering using temporal similarity reveals coordinated groups ofharvestersSpammer social network patterns might be used for detection andinterdiction
Future workStatistical latent variable models for spammer community discoveryClustering based on combinations of similarity measuresEvolutionary models for community behavior
The dual problem: discovery of spam server communities.
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 53 / 54
![Page 118: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/118.jpg)
Summary
Summary
Clustering using similarity in spam server usage revealscommunities of phishers and non-phishers
Phishing is a phenotype: most harvestors are either phishers ornon-phishersPhishers form gangs: they share resources in isolated closely knitcommunities.
Clustering using temporal similarity reveals coordinated groups ofharvestersSpammer social network patterns might be used for detection andinterdiction
Future workStatistical latent variable models for spammer community discoveryClustering based on combinations of similarity measuresEvolutionary models for community behavior
The dual problem: discovery of spam server communities.
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 53 / 54
![Page 119: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/119.jpg)
Summary
Summary
Clustering using similarity in spam server usage revealscommunities of phishers and non-phishers
Phishing is a phenotype: most harvestors are either phishers ornon-phishersPhishers form gangs: they share resources in isolated closely knitcommunities.
Clustering using temporal similarity reveals coordinated groups ofharvestersSpammer social network patterns might be used for detection andinterdiction
Future workStatistical latent variable models for spammer community discoveryClustering based on combinations of similarity measuresEvolutionary models for community behavior
The dual problem: discovery of spam server communities.
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 53 / 54
![Page 120: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/120.jpg)
Summary
Summary
Clustering using similarity in spam server usage revealscommunities of phishers and non-phishers
Phishing is a phenotype: most harvestors are either phishers ornon-phishersPhishers form gangs: they share resources in isolated closely knitcommunities.
Clustering using temporal similarity reveals coordinated groups ofharvestersSpammer social network patterns might be used for detection andinterdiction
Future workStatistical latent variable models for spammer community discoveryClustering based on combinations of similarity measuresEvolutionary models for community behavior
The dual problem: discovery of spam server communities.
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 53 / 54
![Page 121: Social Networks of Spammershero/Preprints/Social...Total emails received at Project Honey Pot trap addresses by month Outbreak of spam observed in October 2006 consistent with media](https://reader034.vdocuments.us/reader034/viewer/2022042117/5e95906a15db0a427c021c30/html5/thumbnails/121.jpg)
References
References
1 M. Girvan and M. E. J. Newman, “Community Structure in Socialand Biological Networks,” National Academy of Sciences (2002).
2 L. Hubert and P. Arabie, “Comparing Partitions,” Journal ofClassification (1985).
3 J. Moody, “Race, School Integration, and Friendship Segmentationin America,” American Journal of Sociology (2001).
4 M. Prince et al., “Understanding How Spammers Steal Your E-MailAddress: An Analysis of the First Six Months of Data from ProjectHoney Pot,” 2nd Conference on Email and Anti-Spam (2005).
5 U. von Luxburg, “A Tutorial on Spectral Clustering,” Statistics andComputing, (2007).
6 S. Yu and J. Shi, “Multiclass Spectral Clustering,” 9th IEEEInternational Conference on Computer Vision (2003).
A. Hero (University of Michigan) Social Networks of Spammers September 22, 2008 54 / 54