Network-Level Spam and Scam Defenses
Nick Feamster, Georgia Tech
with Anirudh Ramachandran, Shuang Hao, Maria Konte, Alex Gray, Jaeyeon Jung, Santosh Vempala
Spam: More than Just a Nuisance
• 95% of all email traffic
  – Image and PDF spam (PDF spam ~12%)
• As of August 2007, one in every 87 emails constituted a phishing attack
• Targeted attacks on the rise
  – 20k-30k unique phishing attacks per month
Source: CNET (January 2008), APWG
Approach: Filter
• Prevent unwanted traffic from reaching a user’s inbox by distinguishing spam from ham
• Question: What features best differentiate spam from legitimate mail?
  – Content-based filtering: What is in the mail?
  – IP address of sender: Who is the sender?
  – Behavioral features: How is the mail sent?
Content Filters: Chasing a Moving Target
[Figure: examples of spam embedded in images, PDFs, Excel sheets, and even mp3s]
Problems with Content Filtering
• Customized emails are easy to generate: Content-based filters need fuzzy hashes over content, etc.
• Low cost of evasion: Spammers can easily alter features of an email’s content
• High cost to filter maintainers: Filters must be continually updated as content-changing techniques become more sophisticated
Another Approach: IP Addresses
• Problem: IP addresses are ephemeral
• Every day, 10% of senders are from previously unseen IP addresses
• Possible causes
  – Dynamic addressing
  – New infections
Our Idea: Network-Based Filtering
• Filter email based on how it is sent, in addition to simply what is sent.
• Network-level properties are less malleable
  – Network/geographic location of sender and receiver
  – Set of target recipients
  – Hosting or upstream ISP (AS number)
  – Membership in a botnet (spammer, hosting infrastructure)
Why Network-Level Features?
• Lightweight: Don’t require inspecting details of packet streams
  – Can be done at high speeds
  – Can be done in the middle of the network
• Robust: Perhaps more difficult to change some network-level features than message contents
Challenges
• Understanding network-level behavior
  – What network-level behaviors do spammers have?
  – How well do existing techniques work?
• Building classifiers using network-level features
  – Key challenge: Which features to use?
  – Two algorithms: SNARE and SpamTracker
• Building the system
  – Dynamism: Behavior itself can change
  – Scale: Lots of email messages (and spam!) out there
• Applications to phishing and scams
Data: Spam and BGP
• Spam Traps: Domains that receive only spam
• BGP Monitors: Watch network-level reachability
[Figure: spam traps at Domain 1 and Domain 2]
17-Month Study: August 2004 to December 2005
Finding: BGP “Spectrum Agility”
• Hijack IP address space using BGP
• Send spam
• Withdraw IP address
A small club of persistent players appears to be using this technique.
Common short-lived prefixes and ASes:
  61.0.0.0/8   AS 4678
  66.0.0.0/8   AS 21562
  82.0.0.0/8   AS 8717
Duration: ~10 minutes
Somewhere between 1-10% of all spam (some clearly intentional, others might be flapping)
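The hijack, send, withdraw pattern suggests one simple check: flag prefixes whose route announcements are withdrawn after only a few minutes. The following is a minimal sketch, not the study's method. The record layout `(prefix, announce_time, withdraw_time)`, the function name, the 15-minute threshold, and the timestamps are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def short_lived_announcements(events, max_lifetime=timedelta(minutes=15)):
    """Return (prefix, lifetime) for announcements withdrawn within max_lifetime.

    events: iterable of (prefix, announce_time, withdraw_time) tuples
    (a hypothetical record format, e.g. derived from BGP monitor logs).
    """
    flagged = []
    for prefix, announced, withdrawn in events:
        lifetime = withdrawn - announced
        # Ignore malformed records where the withdrawal precedes the announcement.
        if timedelta(0) <= lifetime <= max_lifetime:
            flagged.append((prefix, lifetime))
    return flagged

# Made-up timestamps for illustration only.
events = [
    ("61.0.0.0/8", datetime(2005, 1, 1, 12, 0), datetime(2005, 1, 1, 12, 10)),
    ("192.0.2.0/24", datetime(2005, 1, 1, 9, 0), datetime(2005, 1, 3, 9, 0)),
]
flagged = short_lived_announcements(events)
```

A real detector would also need to correlate the flagged windows with spam arrival times, since short-lived routes can also result from ordinary route flapping.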
Spectrum Agility: Big Prefixes?
• Flexibility: Client IPs can be scattered throughout dark space within a large /8
  – Same sender usually returns with different IP addresses
• Visibility: Route typically won’t be filtered (nice and short)
How Well do IP Blacklists Work?
• Completeness: The fraction of spamming IP addresses that are listed in the blacklist
• Responsiveness: The time for the blacklist to list the IP address after the first occurrence of spam
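Both metrics can be computed directly from two timestamp maps. This is a minimal sketch under assumed inputs: `first_spam_at` maps each spamming IP to the time of its first observed spam, and `listed_at` maps each blacklisted IP to its listing time. Both names and the sample values are hypothetical.

```python
def completeness(spam_ips, listed_at):
    """Fraction of spamming IP addresses that appear in the blacklist."""
    spam_ips = set(spam_ips)
    if not spam_ips:
        return 0.0
    return len(spam_ips & set(listed_at)) / len(spam_ips)

def responsiveness(first_spam_at, listed_at):
    """Per-IP delay (seconds) between first spam and blacklist listing.

    IPs that were never listed are omitted; an IP listed before its
    first observed spam gets a delay of 0.
    """
    delays = {}
    for ip, first_seen in first_spam_at.items():
        if ip in listed_at:
            delays[ip] = max(0, listed_at[ip] - first_seen)
    return delays

# Made-up timestamps (seconds) using documentation-range addresses.
first_spam_at = {"192.0.2.1": 100, "192.0.2.2": 200, "192.0.2.3": 300, "192.0.2.4": 400}
listed_at = {"192.0.2.1": 160, "192.0.2.2": 150, "192.0.2.3": 900}

coverage = completeness(first_spam_at, listed_at)
delays = responsiveness(first_spam_at, listed_at)
```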
Completeness and Responsiveness
• 10-35% of spam is unlisted at the time of receipt
• 8.5-20% of these IP addresses remain unlisted even after one month
Data: Trap data from March 2007, Spamhaus from March and April 2007
Problems with IP Blacklists
• IP addresses of senders have considerable churn
• Based on ephemeral identifier (IP address)
  – More than 10% of all spam comes from IP addresses not seen within the past two months
    • Dynamic renumbering of IP addresses
    • Stealing of IP addresses and IP address space
    • Compromised machines
• Often require a human to notice/validate the behavior
  – Spamming is compartmentalized by domain and not analyzed across domains
Outline
• Understanding the network-level behavior
  – What behaviors do spammers have?
  – How well do existing techniques work?
• Classifiers using network-level features
  – Key challenge: Which features to use?
  – Two algorithms: SNARE and SpamTracker
• System: SpamSpotter
  – Dynamism: Behavior itself can change
  – Scale: Lots of email messages (and spam!) out there
• Application to phishing and scams
Finding the Right Features
• Goal: Sender reputation from a single packet?
  – Low overhead
  – Fast classification
  – In-network
  – Perhaps more evasion resistant
• Key challenge
  – What features satisfy these properties and can distinguish spammers from legitimate senders?
Set of Network-Level Features
• Single-Packet
  – Geodesic distance
  – Distance to k nearest senders
  – Time of day
  – AS of sender’s IP
  – Status of email service ports
• Single-Message
  – Number of recipients
  – Length of message
• Aggregate (Multiple Message/Recipient)
Sender-Receiver Geodesic Distance
90% of legitimate messages travel 2,200 miles or less
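One way to approximate the sender-receiver distance is the haversine great-circle formula applied to geolocated coordinates. The sketch below assumes the sender and receiver IPs have already been mapped to latitude/longitude by some geolocation database (not shown). The function name and sample coordinates are illustrative, and the talk does not say which distance formula was used.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def geodesic_miles(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in miles between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))

# Approximate city coordinates for Atlanta and San Francisco (illustrative).
distance = geodesic_miles(33.749, -84.388, 37.7749, -122.4194)
```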
Density of Senders in IP Space
For spammers, k nearest senders are much closer in IP space
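A density feature of this kind can be sketched as the average numeric distance from a sender's IPv4 address to its k nearest neighbors among other observed senders. This is an illustrative sketch only. The function name, the default `k`, and the use of a plain average are assumptions rather than details given in the talk.

```python
import ipaddress

def avg_distance_to_k_nearest(sender_ip, other_ips, k=20):
    """Average numeric distance in IPv4 space to the k nearest other senders.

    Returns None when there are no other senders to compare against.
    """
    target = int(ipaddress.IPv4Address(sender_ip))
    distances = sorted(
        abs(int(ipaddress.IPv4Address(ip)) - target)
        for ip in other_ips
        if ip != sender_ip
    )
    nearest = distances[:k]
    if not nearest:
        return None
    return sum(nearest) / len(nearest)

# Made-up sender addresses for illustration.
others = ["10.0.0.1", "10.0.0.9", "10.0.1.5", "192.0.2.1"]
density_k2 = avg_distance_to_k_nearest("10.0.0.5", others, k=2)
density_k3 = avg_distance_to_k_nearest("10.0.0.5", others, k=3)
```

Smaller values indicate a sender surrounded by many other senders in nearby address space.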
Local Time of Day at Sender
Spammers “peak” at different local times of day
Combining Features: RuleFit
• Put features into the RuleFit classifier
• 10-fold cross validation on one day of query logs from a large spam filtering appliance provider
• Comparable performance to Spamhaus
  – Incorporating into the system can further reduce FPs
• Using only network-level features
• Completely automated
Benefits of Whitelisting
Whitelisting top 50 ASes: False positives reduced to 0.14%
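The effect of an AS whitelist on the false positive rate can be illustrated with a small calculation. This sketch assumes labeled records of the form `(sender_asn, is_spam, flagged_by_classifier)`. The record format, the function name, and the AS numbers (taken from the documentation range) are hypothetical.

```python
def false_positive_rate(messages, whitelist_asns=frozenset()):
    """Fraction of legitimate messages still flagged after whitelisting.

    messages: iterable of (sender_asn, is_spam, flagged_by_classifier).
    A flagged legitimate message from a whitelisted AS is not counted
    as a false positive, because the whitelist overrides the classifier.
    """
    legitimate = 0
    false_positives = 0
    for asn, is_spam, flagged in messages:
        if is_spam:
            continue
        legitimate += 1
        if flagged and asn not in whitelist_asns:
            false_positives += 1
    return false_positives / legitimate if legitimate else 0.0

# Made-up records for illustration.
messages = [
    (64500, False, True),
    (64500, False, False),
    (64501, False, True),
    (64502, False, False),
    (64503, True, True),
]
fpr_before = false_positive_rate(messages)
fpr_after = false_positive_rate(messages, whitelist_asns={64500})
```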
Outline
• Understanding the network-level behavior
  – What behaviors do spammers have?
  – How well do existing techniques work?
• Building classifiers using network-level features
  – Key challenge: Which features to use?
  – Algorithms: SpamTracker and SNARE
• System (SpamSpotter)
  – Dynamism: Behavior itself can change
  – Scale: Lots of email messages (and spam!) out there
Deployment: Real-Time Blacklist
Approach
• As mail arrives, lookups are received at the blacklist (BL)
• Queries provide a proxy for sending behavior
• Train based on received data
• Return score
Design Choice: Augment DNSBL
• Expressive queries
  – Spamhaus: $ dig 55.102.90.62.zen.spamhaus.org
    • Ans: 127.0.0.3 (=> listed in exploits block list)
  – SpamSpotter: $ dig receiver_ip.receiver_domain.sender_ip.rbl.gtnoise.net
    • e.g., dig 120.1.2.3.gmail.com.-.1.1.207.130.rbl.gtnoise.net
    • Ans: 127.1.3.97 (SpamSpotter score = -3.97)
• Also a source of data
  – Unsupervised algorithms work with unlabeled data
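For reference, a conventional DNSBL lookup name is the sender's IPv4 octets in reverse order followed by the blacklist zone, which is the form of the Spamhaus query shown on the slide. The sketch below only builds the name and checks whether an answer falls in 127.0.0.0/8, and it performs no DNS lookup. The function names are illustrative. The SpamSpotter query layout and the way its score is encoded in the answer are not reproduced here, because the slide does not specify them fully.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build a conventional DNSBL lookup name: reversed IPv4 octets + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed_answer(answer):
    """DNSBLs conventionally signal a listing with an address in 127.0.0.0/8."""
    return ipaddress.IPv4Address(answer) in ipaddress.ip_network("127.0.0.0/8")

# Reproduces the query name from the Spamhaus example on the slide.
name = dnsbl_query_name("62.90.102.55", "zen.spamhaus.org")
listed = is_listed_answer("127.0.0.3")
```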
Challenges
• Scalability: How to collect and aggregate data, and form the signatures without imposing too much overhead?
• Dynamism: When to retrain the classifier, given that sender behavior changes?
• Reliability: How should the system be replicated to better defend against attack or failure?
• Evasion resistance: Can the system still detect spammers when they are actively trying to evade?
Latency
Performance overhead is negligible.
Sampling
Relatively small samples can achieve low false positive rates
Possible Improvements
• Accuracy
  – Synthesizing multiple classifiers
  – Incorporating user feedback
  – Learning algorithms with bounded false positives
• Performance
  – Caching/Sharing
  – Streaming
• Security
  – Learning in adversarial environments
Spam Filtering: Summary
• Spam increasing, spammers becoming agile
  – Content filters are falling behind
  – IP-based blacklists are evadable
    • Up to 30% of spam not listed in common blacklists at receipt; ~20% remains unlisted after a month
• Complementary approach: behavioral blacklisting based on network-level features
  – Key idea: Blacklist based on how messages are sent
  – SNARE: Automated sender reputation
    • ~90% accuracy of existing blacklists with lightweight features
  – SpamSpotter: Putting it together in an RBL system
  – SpamTracker: Spectral clustering
    • Catches significant amounts of spam faster than existing blacklists
Phishing and Scams
• Scammers host Web sites on dynamic scam hosting infrastructure
  – Use DNS to redirect users to different sites when the location of the sites moves
• State of the art: Blacklist URL
• Our approach: Blacklist based on network-level fingerprints
Konte et al., “Dynamics of Online Scam Hosting Infrastructure”, PAM 2009
Online Scams
• Often advertised in spam messages
• URLs point to various point-of-sale sites
• These scams continue to be a menace
  – As of August 2007, one in every 87 emails constituted a phishing attack
• Scams often hosted on bullet-proof domains
• Problem: Study the dynamics of online scams, as seen at a large spam sinkhole
Online Scam Hosting is Dynamic
• A URL received in an email message may point to different sites over time
• Maintains agility as sites are shut down, blacklisted, etc.
• One mechanism for hosting sites: fast flux
Mechanism for Dynamics: “Fast Flux”
Source: HoneyNet Project
Summary of Findings
• What are the rates and extents of change?
  – Different from legitimate load balancing
  – Different across different scam campaigns
• How are dynamics implemented?
  – Many scam campaigns change DNS mappings at all three locations in the DNS hierarchy
    • A, NS, IP address of NS record
• Conclusion: Might be able to detect based on monitoring the dynamic behavior of URLs
Data Collection Method
• Three months of spamtrap data
  – 384 scam hosting domains
  – 21 unique scam campaigns
• Baseline comparison: Alexa “top 500” Web sites
Top 3 Spam Campaigns
• Some campaigns hosted by thousands of IPs
• Most scam domains exhibit some type of flux
• Sharing of IP addresses across different roles (authoritative NS and scam hosting)
Rates of Change
• How (and how quickly) do DNS-record mappings change?
• Rates of change are much faster than for legitimate load-balanced sites.
  – Scam domains change on shorter intervals than their TTL values.
• Domains for different scam campaigns exhibit different rates of change.
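The comparison between change intervals and TTLs can be sketched from repeated DNS observations of a domain. The code below assumes a time-sorted list of `(timestamp_seconds, set_of_ips)` samples for one record type. The input format, function names, and sample values are hypothetical, and the measured intervals can only be as fine as the polling interval.

```python
def change_intervals(observations):
    """Gaps (seconds) between successive changes of the observed record set.

    observations: time-sorted list of (timestamp_seconds, iterable_of_ips).
    The first observation counts as the starting point.
    """
    points = []
    previous = None
    for timestamp, ips in observations:
        ips = frozenset(ips)
        if previous is None or ips != previous:
            points.append(timestamp)
        previous = ips
    return [later - earlier for earlier, later in zip(points, points[1:])]

def fraction_shorter_than_ttl(intervals, ttl):
    """Fraction of change intervals that are shorter than the record's TTL."""
    if not intervals:
        return 0.0
    return sum(1 for gap in intervals if gap < ttl) / len(intervals)

# Made-up observations for one domain, polled every 300 seconds or so.
observations = [
    (0, {"198.51.100.1"}),
    (300, {"198.51.100.1"}),
    (600, {"198.51.100.2"}),
    (900, {"198.51.100.3"}),
    (1500, {"198.51.100.3"}),
    (1800, {"198.51.100.1"}),
]
intervals = change_intervals(observations)
share = fraction_shorter_than_ttl(intervals, ttl=800)
```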
Rates of Change
• Domains that exhibit fast flux change more rapidly than legitimate domains
• Rates of change are inconsistent with actual TTL values
Rates of change are much faster than for legitimate load-balanced sites.
Time Between Record Changes
Fast-flux domains tend to change much more frequently than legitimately hosted sites
Rates of Change by Campaign
Domains for different scam campaigns exhibit different rates of change.
Rates of Accumulation
• How quickly do scams accumulate new IP addresses?
• Rates of accumulation differ across campaigns
• Some scams only begin accumulating IP addresses after some time
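An accumulation curve of this kind can be computed by counting distinct IP addresses seen so far for a campaign. The sketch below assumes a time-sorted list of `(timestamp, ip)` observations. The function name and sample data are illustrative only.

```python
def cumulative_unique_ips(observations):
    """Return (timestamp, distinct_ip_count) points, one per newly seen IP.

    observations: time-sorted iterable of (timestamp, ip) pairs.
    """
    seen = set()
    curve = []
    for timestamp, ip in observations:
        if ip not in seen:
            seen.add(ip)
            curve.append((timestamp, len(seen)))
    return curve

# Made-up observations for one campaign.
observations = [
    (0, "198.51.100.1"),
    (10, "198.51.100.1"),
    (20, "198.51.100.2"),
    (30, "203.0.113.7"),
    (40, "198.51.100.2"),
]
curve = cumulative_unique_ips(observations)
```

A campaign that only begins accumulating addresses late would show a flat start to this curve followed by a rise.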
Rates of Accumulation
Location
• Where in IP address space do hosts for scam sites operate?
• Scam networks use a different portion of the IP address space than legitimate sites
  – 30/8 to 60/8: lots of legitimate sites, no scam sites
• Sites that host scam domains (both sites and authoritative DNS) are more widely distributed than those for legitimate sites
Location: Many Distinct Subnets
Scam sites appear in many more distinct networks than legitimate load-balanced sites.
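Counting distinct networks can be sketched by grouping hosting IPs by their enclosing prefix. The /24 granularity below is an assumption, since the slide does not state which prefix length was used. The function name and addresses are illustrative.

```python
import ipaddress

def distinct_subnets(ips, prefix_len=24):
    """Set of enclosing networks (default /24) covering the given IPv4 addresses."""
    return {
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in ips
    }

# Made-up hosting addresses from documentation ranges.
scam_ips = ["198.51.100.7", "203.0.113.9", "192.0.2.44", "198.51.100.200"]
legit_ips = ["192.0.2.10", "192.0.2.20"]

scam_subnet_count = len(distinct_subnets(scam_ips))
legit_subnet_count = len(distinct_subnets(legit_ips))
```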
Conclusion
• Scam campaigns rely on a dynamic hosting infrastructure
• Studying the dynamics of that infrastructure may help us develop better detection methods
• Dynamics
  – Rates of change differ from legitimate sites, and differ across campaigns
  – Dynamics implemented at all levels of DNS hierarchy
• Location
  – Scam sites distributed across distinct subnets
Data: http://www.gtnoise.net/scam/fast-flux.html
TR: http://www.cc.gatech.edu/research/reports/GT-CS-08-07.pdf
References
• Anirudh Ramachandran and Nick Feamster, “Understanding the Network-Level Behavior of Spammers”, ACM SIGCOMM, August 2006
• Anirudh Ramachandran, Nick Feamster, and Santosh Vempala, “Filtering Spam with Behavioral Blacklisting”, ACM CCS, November 2007
• Shuang Hao, Nick Feamster, Alex Gray, and Sven Krasser, “SNARE: Spatio-temporal Network-level Automatic Reputation Engine”, USENIX Security, August 2009
• Maria Konte, Nick Feamster, and Jaeyeon Jung, “Dynamics of Online Scam Hosting Infrastructure”, Passive and Active Measurement, April 2009