1
Authors: Anirudh Ramachandran, Nick Feamster, and Santosh Vempala
Publication: ACM Conference on Computer and Communications Security 2007
Presenter: Melvin Rodriguez for CAP 6133, Spring’08
Filtering Spam with
Behavioral Blacklisting
2
What is Spam / Spamming
– Indiscriminate sending of unsolicited bulk messages
- Different types of spam exist
Why is it used?
– Reaches potential customers / markets
– Little or no operating cost for senders
E-mail spam: unsolicited bulk email
Source: Wikipedia -http://en.wikipedia.org/wiki/Spam_%28Monty_Python%29
3
The problem with Spam
– Users waste time and resources
– Users receive multiple unwanted emails
  - Promoting events
  - Advertising new products / sales
  - Promoting services
– Spammers use increasingly sophisticated techniques
– More resources are needed
  - Increased server storage and bandwidth capacity
  - Increased time to manage items
Spam consumes needed resources and increases costs
Source: Wurd -http://www.wurd.com/cl_email_faq_spam.php
4
A 2007 study by Osterman Research Inc.
- A growing proportion of spam is generated by zombies
that are part of enormous botnets of infected computers.
- Symantec reported in March 2007 that it had discovered
more than six million zombies worldwide.
- More than 80% of spam today is generated by zombies
- Spam campaigns are constantly changing strategies
Source: Osterman Research Inc - http://www.ostermanresearch.com/whitepapers/or_sym0607.pdf
Spam – very hard to control
5
How to solve the problem
– Spam filters
Blacklisting
– Publicizing known IP addresses that send spam
– Issues
  - Need to know what to block / filter
  - Need constant updates
  - Need to adapt to changes in spam campaigns
Behavioral Blacklisting – SpamTracker
– Classifies senders based on their sending behavior rather than IP identity
– Spammers with similar sending patterns share a behavioral “fingerprint”
Behavioral spam filter: based on sending behavior
6
Behavioral Blacklisting – SpamTracker
– Clusters email senders based on the domains they target
– Builds “blacklist clusters” from known spammers
– Tracks sending patterns of other senders
– Uses fast spectral clustering algorithms
– Two phases:
  - Clustering (spectral): groups senders with similar behavior across their target domains
    - gather initial data and create clusters
  - Classification: assign a value and compare
    - obtain sending patterns from servers
    - compare the computed value to known patterns
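The classification phase above can be illustrated with a minimal sketch. It assumes the clustering phase has already grouped known spammers, and it swaps in plain cosine similarity against a per-cluster average pattern as the "value" to compare; the slides do not specify the exact measure, so that choice is an assumption. All function names, IP addresses, and domains here are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def sending_vectors(log, domains):
    """Turn (sender_ip, target_domain) log entries into per-sender count vectors."""
    index = {d: i for i, d in enumerate(domains)}
    vectors = defaultdict(lambda: [0.0] * len(domains))
    for ip, domain in log:
        vectors[ip][index[domain]] += 1.0
    return dict(vectors)

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fingerprint(cluster_vectors):
    """Average sending pattern of one cluster of known spammers (assumed form)."""
    n = len(cluster_vectors)
    return [sum(col) / n for col in zip(*cluster_vectors)]

def spam_score(vector, fingerprints):
    """Highest similarity between a sender's pattern and any cluster fingerprint."""
    return max(cosine(vector, f) for f in fingerprints)

# Hypothetical data: two known spammers that both target a.example and b.example.
domains = ["a.example", "b.example", "c.example"]
known_spam_log = [
    ("192.0.2.1", "a.example"), ("192.0.2.1", "b.example"),
    ("192.0.2.2", "a.example"), ("192.0.2.2", "b.example"),
]
cluster = list(sending_vectors(known_spam_log, domains).values())
fingerprints = [fingerprint(cluster)]

# A new sender with the same targeting pattern scores close to 1;
# a sender that only mails c.example scores 0.
similar = spam_score([3.0, 3.0, 0.0], fingerprints)
unrelated = spam_score([0.0, 0.0, 5.0], fingerprints)
```

A sender whose score exceeds some threshold would be treated as behaving like a known spam cluster, regardless of whether its IP address has been seen before. The threshold itself would need tuning against false positives.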
7
Spam Tracker – High Level Design
8
Conclusion
– New spam detection technique using “behavioral blacklisting”
– Classifies email based on senders' sending patterns
– Creates email “blacklist clusters”
9
Contributions
– Improvements in detecting email spam
– Uses new algorithms to detect and classify email spam
– Capable of detecting new spam senders earlier than existing processes
10
Weaknesses
– Dependent on the number of data sources
  - Limited number of data collection points
– Limited testing pool of domains
– Process sequence is not clearly depicted
– Lack of integration with existing spam systems
11
How to Improve
– More testing needed to analyze false positives
– Increase the number of data collection points
– Add additional features to the algorithms
– Add integration capabilities with existing email spam services
– Add the missing diagrams discussed in the paper
– Present the process sequence in more detail
12
Back-Up Slides
13
Origins of the use of the word SPAM
– "Spam" is a popular Monty Python sketch, first televised in 1970. In the sketch, two customers are trying to order breakfast from a menu that includes the processed meat product in almost every dish. The term spam (in electronic communication and, as of 2007, general slang) is derived from this sketch.
Source: Wikipedia -http://en.wikipedia.org/wiki/Spam_%28Monty_Python%29
14
Top 12 countries spreading spam around the globe in 2007:
– USA - 28.4%
– South Korea - 5.2%
– China (including Hong Kong) - 4.9%
– Russia - 4.4%
– Brazil - 3.7%
– France - 3.6%
– Germany - 3.4%
– Turkey - 3.%
– Poland - 2.7%
– Great Britain - 2.4%
– Romania - 2.3%
– Mexico - 1.9%
– Other countries - 33.9% [8]
Source: Wikipedia -http://en.wikipedia.org/wiki/Spam_%28Monty_Python%29
15
Trace          Date Range           Fields
Organization   Mar. 1 – 31, 2007    Received time, remote IP, targeted domain, whether rejected
Blacklist      Apr. 1 – 30, 2007    IP address (or range), time of listing

Data sets used in evaluation.

Our primary data is a set of email logs from a provider (“Organization”) that hosts and manages mail servers for over 115 domains. The trace also contains an indication of whether it rejected the SMTP connection or not. We also use the full database of Spamhaus [37] for one month, including all additions that happened within the month (“Blacklist”), to help us evaluate the performance of SpamTracker relative to existing blacklists. We choose the Blacklist traces for the time period immediately after the email traces end so that we can discover the first time an IP address, unlisted at the time email from it was observed in the Organization trace, was added to the Blacklist trace.
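The comparison described above, finding when a sender seen in the Organization trace was later added to the Blacklist trace, could be sketched roughly as follows. The record layout (an IP or CIDR range paired with a listing time) follows the fields named in the table, but the function names, addresses, and timestamps are hypothetical, and this is not the evaluation code used in the paper.

```python
from datetime import datetime
from ipaddress import ip_address, ip_network

def earliest_listing(ip, blacklist):
    """Earliest listing time of any blacklist entry (IP or range) covering `ip`."""
    addr = ip_address(ip)
    times = [listed for net, listed in blacklist if addr in ip_network(net)]
    return min(times) if times else None

def listing_delay(ip, seen_at, blacklist):
    """Time between first sighting in the email trace and blacklist listing.

    Returns None if the IP was never listed or was already listed when seen.
    """
    listed = earliest_listing(ip, blacklist)
    if listed is None or listed <= seen_at:
        return None
    return listed - seen_at

# Hypothetical blacklist entries: (IP address or range, time of listing).
blacklist = [
    ("192.0.2.0/24", datetime(2007, 4, 3, 12, 0)),
    ("198.51.100.7", datetime(2007, 4, 10, 0, 0)),
]

# A sender seen on Mar. 30 whose range was listed on Apr. 3.
delay = listing_delay("192.0.2.44", datetime(2007, 3, 30, 12, 0), blacklist)
never = listing_delay("203.0.113.5", datetime(2007, 3, 30, 12, 0), blacklist)
```

Under these assumptions, `delay` is the window during which a conventional blacklist would have missed the sender, which is the interval a behavioral approach aims to shorten.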