Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks
Yehonatan Cohen, Daniel Gordon and Danny Hendler, DIMVA 2013
Ben-Gurion University



Talk outline

• Preliminaries
• ErDOS: An Early Detection Scheme for Outgoing Spam
• Evaluation
• Conclusions and Future Work

Preliminaries: Spam

Unsolicited mail, typically sent in large quantities

Hazards:
• Malware distribution
• Phishing
• Resource consumption
• Poor user experience

Detection may be attempted when:
• Mail is sent (outgoing spam detection)
• Mail is received (incoming spam detection)

Outgoing spam detection

Spam can be blocked before it leaves the Email Service Provider (ESP)

Advantages:
• Reduces load on the ESP's infrastructure
• Prevents damage to the ESP's reputation
• Detection may be based on hosted accounts' activity

Outgoing spam filtering techniques

Content-based filtering: learn and identify textual patterns typical of spam messages
• May be tricked by manipulating spam content:
  o Image-based spam
  o Random string insertion (hash busters)
• Non-negligible false negative rate
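Why random string insertion works against naive content matching can be seen with a minimal sketch. It assumes a deliberately simple, hypothetical filter that fingerprints each message body with an exact SHA-256 digest and blocks digests of known spam; real content-based filters are more sophisticated, so this only illustrates the evasion idea.

```python
import hashlib
import secrets


def fingerprint(body: str) -> str:
    """Exact SHA-256 digest of a message body (hypothetical naive filter)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def add_hash_buster(body: str) -> str:
    """Append a random token, as a spammer's 'hash buster' would."""
    return f"{body}\n{secrets.token_hex(8)}"


# Fingerprints of previously seen spam bodies.
known_spam = {fingerprint("Cheap meds, click here!")}


def is_blocked(body: str) -> bool:
    return fingerprint(body) in known_spam


original = "Cheap meds, click here!"
mutated = add_hash_buster(original)
print(is_blocked(original))  # True: exact copy of known spam
print(is_blocked(mutated))   # False: one random line changes the digest
```

Every spam copy carries a different token, so an exact-match fingerprint never fires twice; this is one reason the slide lists a non-negligible false negative rate.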

Outgoing spam filtering techniques (cont'd)

Inter-account communication pattern analysis:
• Models accounts' behaviour
• Based on inter-account social interactions
• Typically utilizes machine-learning techniques
• May leverage ESP account identification
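Such analysis typically starts from a directed communication graph whose nodes are accounts and whose edges link senders to recipients. A minimal Python sketch, assuming each log record is a simple (sender, recipient) pair (a hypothetical format; the ESP's log schema is not described in the talk):

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str]  # (sender, recipient): hypothetical log format


def build_graph(records: Iterable[Record]) -> Counter:
    """Map each directed edge (sender, recipient) to its message count."""
    return Counter(records)


def degrees(edges: Counter) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (out-degree, in-degree): number of distinct contacts per account."""
    out_deg: Dict[str, int] = defaultdict(int)
    in_deg: Dict[str, int] = defaultdict(int)
    for sender, recipient in edges:
        out_deg[sender] += 1
        in_deg[recipient] += 1
    return dict(out_deg), dict(in_deg)


log = [("alice", "bob"), ("bob", "alice"), ("alice", "bob"),
       ("spam1", "bob"), ("spam1", "carol"), ("spam1", "dave")]
edges = build_graph(log)
out_deg, in_deg = degrees(edges)
print(edges[("alice", "bob")])                   # 2 messages on this edge
print(out_deg["spam1"], in_deg.get("spam1", 0))  # 3 0
```

Counting distinct contacts rather than raw messages is one design choice among several; with it, a spam burst to many recipients shows up as a high out-degree with no incoming edges.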

Our goals

Devise an effective detector of outgoing spammers for large ESPs (the ErDOS detector)
• Emphasis on early detection: detect spammers before the content-based filter does
• Short training periods: highly adaptive to changing spamming patterns

Most relevant related work

Lam & Yeung, CEAS 2007
• Introduce "social-network"-based outgoing spam detection
• Use the k-NN classifier
• Relatively small dataset (Enron)
• Labeling based on simulated spammer accounts

Tseng & Chen, CSE 2009
• Use the same set of features
• Use an SVM classifier
• Larger, non-ESP dataset (university email server)
• Incremental model update
• Labeling based on pure accounts
• Account identification based on the "from" header field

Comparison with data sets of previous work

                 Our data set                      NTU         Enron
                 4 days (in/out)   26 days (out)
#mails           9.86E7            2.13E8          2.86E6      5.17E5
#accounts        5.63E7            5.81E7          6.37E5      3.67E4
#edges           7.40E7            1.29E8          -           3.68E5
time period      4 days            26 days         10 days     3.5 years
contents         spam & ham                        spam & ham  ham

Our data set:
• Collected by a very large ESP
• Consists of incoming and outgoing log files
  o 4 days of bidirectional data + 22 days of outgoing traffic only
• Both incoming and outgoing messages are labeled as spam/ham by a content-based detector


Talk outline

• Preliminaries
• ErDOS: An Early Detection Scheme for Outgoing Spam
  o Computation flow
  o Features
• Evaluation
• Conclusions and Future Work

The ErDOS detector: computation flow

• Pre-processing: compute account feature values based on a single day of email logs
• Determine the accounts' classification (spammer or legitimate), yielding the classified data set
• Undersampling: extract all spammers and an equal number of legitimate accounts as the training set
• Build a rotation forest classification model from the training set
• Assign scores to the remainder of the accounts (those not in the training set) using the classification model
• Construct a suspect-accounts list of configurable size from the scored accounts
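The flow above (undersample, train, score the remainder, keep the top of the list) can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: the talk uses a rotation forest, which is not in the Python standard library, so a simple nearest-centroid scorer is swapped in here; the feature vectors, account names, and function names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]


def undersample(labels: Dict[str, bool], seed: int = 0) -> List[str]:
    """All spammer accounts plus an equal-size random sample of legitimate ones."""
    spammers = sorted(a for a, spam in labels.items() if spam)
    legit = sorted(a for a, spam in labels.items() if not spam)
    rng = random.Random(seed)
    return spammers + rng.sample(legit, min(len(spammers), len(legit)))


def centroid(rows: List[Features]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def train(features: Dict[str, Features], labels: Dict[str, bool],
          training_set: List[str]) -> Callable[[Features], float]:
    """STAND-IN for rotation forest: a nearest-centroid scorer.

    Score = distance to legitimate centroid - distance to spammer centroid,
    so a higher score means "more spammer-like".
    """
    spam_c = centroid([features[a] for a in training_set if labels[a]])
    legit_c = centroid([features[a] for a in training_set if not labels[a]])

    def score(x: Features) -> float:
        return math.dist(x, legit_c) - math.dist(x, spam_c)

    return score


def suspects(features: Dict[str, Features], labels: Dict[str, bool],
             list_size: int, seed: int = 0) -> List[str]:
    """Score accounts outside the training set; return the top `list_size`."""
    training = set(undersample(labels, seed))
    model = train(features, labels, sorted(training))
    remainder = [a for a in features if a not in training]
    remainder.sort(key=lambda a: model(features[a]), reverse=True)
    return remainder[:list_size]


# Toy day of data; features are (IOR, #outgoing). Labels: True = tagged spammer.
features = {
    "s1": (0.0, 40.0), "s2": (0.1, 50.0),
    "l1": (1.0, 5.0), "l2": (1.2, 4.0), "l3": (0.9, 6.0),
    "l4": (0.05, 44.0),  # not (yet) tagged, but behaves like a spammer
}
labels = {a: a.startswith("s") for a in features}
print(suspects(features, labels, list_size=1))
```

Because every tagged spammer goes into the training set, the scored remainder contains only accounts the content-based detector has not flagged that day, which is where early detections can come from. In this toy run, `l4` appears on the list only when the random sample leaves it out of the training set.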


ErDOS features: IOR

Legitimate users:
• Maintain social interactions
• Often belong to mailing lists (which adds to their incoming mail)
Spammers:
• Their sent messages are seldom replied to

An account's IOR (incoming/outgoing ratio) = #incoming mails / #outgoing mails
• A low IOR is characteristic of spammers
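A minimal Python sketch of the IOR feature over one day of logs, assuming each record is a (sender, recipient) pair (a hypothetical format). Accounts that sent nothing are skipped, since their IOR is undefined; how ErDOS handles that case is not stated in the talk, so this is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def ior(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """IOR = #incoming / #outgoing mails, for every account that sent mail."""
    incoming: Counter = Counter()
    outgoing: Counter = Counter()
    for sender, recipient in records:
        outgoing[sender] += 1
        incoming[recipient] += 1
    # Assumption: accounts with zero outgoing mail are omitted (IOR undefined).
    return {acct: incoming[acct] / n for acct, n in outgoing.items()}


day_log = [("alice", "bob"), ("bob", "alice"), ("alice", "carol"),
           ("carol", "alice"), ("spam1", "bob"), ("spam1", "carol"),
           ("spam1", "dave"), ("spam1", "erin")]
print(ior(day_log))  # {'alice': 1.0, 'bob': 2.0, 'carol': 2.0, 'spam1': 0.0}
```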

ErDOS features: IOR (cont'd)

[Figure not recoverable from the transcript]

ErDOS features: IOR versus CR

Communication Reciprocity (CR):
• Fraction of recipients who responded to an account's emails
• Defined by Gomes et al.
• IOR is superior to CR for short training periods
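For comparison, CR as defined on the slide can be sketched the same way. This assumes (sender, recipient) records and counts a recipient as having "responded" if it sent any mail back to the account within the same log window; the exact reply-matching rule of Gomes et al. is not given here, so treat that as an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def communication_reciprocity(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """CR(a) = |recipients of a who also mailed a| / |recipients of a|."""
    sent_to: Dict[str, Set[str]] = defaultdict(set)
    for sender, recipient in records:
        sent_to[sender].add(recipient)
    # .get() avoids inserting keys into the defaultdict while iterating over it.
    return {
        acct: sum(1 for r in rcpts if acct in sent_to.get(r, ())) / len(rcpts)
        for acct, rcpts in sent_to.items()
    }


day_log = [("alice", "bob"), ("bob", "alice"), ("alice", "carol"),
           ("carol", "alice"), ("spam1", "bob"), ("spam1", "carol"),
           ("spam1", "dave"), ("spam1", "erin")]
print(communication_reciprocity(day_log))
# {'alice': 1.0, 'bob': 1.0, 'carol': 1.0, 'spam1': 0.0}
```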

ErDOS features: IEBC

IEBC (Internal/External Behaviour Consistency)
• An account can send emails to, and receive emails from:
  o Internal addresses (accounts hosted by the ESP)
  o External addresses
• Legitimate accounts show a correlation between their internal and external IOR; spammers show less of one
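The talk does not give the IEBC formula, so the Python sketch below is a hypothetical per-account proxy for the idea: compute an account's IOR separately over internal and external peers and measure how far apart the two are. The add-one smoothing, the log-ratio gap, the `is_internal` test, and the example domains are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple


def smoothed_ratio(incoming: int, outgoing: int) -> float:
    """(in + 1) / (out + 1): add-one smoothing (an assumption) keeps it defined."""
    return (incoming + 1) / (outgoing + 1)


def iebc_gap(records: Iterable[Tuple[str, str]],
             is_internal: Callable[[str], bool]) -> Dict[str, float]:
    """Hypothetical IEBC proxy: |log(internal IOR) - log(external IOR)|.

    0 means the two ratios agree; larger values mean less consistent
    behaviour. Only ESP-hosted (internal) accounts are scored.
    """
    counts: Counter = Counter()  # key: (account, "in"/"out", peer_is_internal)
    for sender, recipient in records:
        counts[(sender, "out", is_internal(recipient))] += 1
        counts[(recipient, "in", is_internal(sender))] += 1
    hosted = {acct for acct, _direction, _scope in counts if is_internal(acct)}
    gaps = {}
    for a in hosted:
        internal = smoothed_ratio(counts[(a, "in", True)], counts[(a, "out", True)])
        external = smoothed_ratio(counts[(a, "in", False)], counts[(a, "out", False)])
        gaps[a] = abs(math.log(internal) - math.log(external))
    return gaps


def is_internal(addr: str) -> bool:
    return addr.endswith("@esp.example")  # hypothetical ESP domain


log = [
    ("alice@esp.example", "bob@esp.example"), ("bob@esp.example", "alice@esp.example"),
    ("alice@esp.example", "friend@other.example"), ("friend@other.example", "alice@esp.example"),
    ("spam@esp.example", "x1@other.example"), ("spam@esp.example", "x2@other.example"),
    ("spam@esp.example", "x3@other.example"),
]
print(iebc_gap(log, is_internal))
```

Here `alice@esp.example` gets a gap of 0 (balanced both inside and outside the ESP), while `spam@esp.example`, which only sends to external addresses, gets a gap of log 4.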

ErDOS features: #outgoing messages

Number of outgoing messages
• Spamming accounts send more emails than legitimate ones
• Insufficient on its own for detecting low-volume spammers

ErDOS: Sender Accounts' Characteristics

A large fraction of spammers' incoming mail is spam!
• Legitimate accounts seldom send emails to spamming accounts
• Dictionary attacks may cause spammers to spam each other
Therefore: analyse senders' characteristics
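One simple way to turn this observation into a per-account value is the fraction of an account's incoming messages that the content-based detector tagged as spam. A minimal Python sketch, assuming hypothetical (sender, recipient, tagged_as_spam) records; the actual sender-based features used by ErDOS may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (sender, recipient, tagged_as_spam): hypothetical record format; the tag would
# come from the content-based detector's verdict on each message.
Record = Tuple[str, str, bool]


def incoming_spam_fraction(records: Iterable[Record]) -> Dict[str, float]:
    """Per recipient: fraction of its incoming messages tagged as spam."""
    total: Counter = Counter()
    spam: Counter = Counter()
    for _sender, recipient, is_spam in records:
        total[recipient] += 1
        spam[recipient] += int(is_spam)
    return {acct: spam[acct] / n for acct, n in total.items()}


log = [
    ("bob", "alice", False), ("carol", "alice", False),
    ("spam2", "spam1", True), ("spam3", "spam1", True),  # spammers spamming each other
    ("spam2", "dave", True), ("bob", "dave", False),
]
print(incoming_spam_fraction(log))  # {'alice': 0.0, 'spam1': 1.0, 'dave': 0.5}
```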

Page 20: Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks

Yehonatan Cohen, Daniel Gordon and Danny Hendler, DIMVA 2013

Preliminaries ErDOS: An Early Detection Scheme for Outgoing Spam Evaluation Conclusions and Future Work

Talk outline

Accuracy for single-day training

Evaluate the accuracy attained for single-day logs
• Email accounts are classified based on the tags of the content-based detector
• True Positive (TP) and False Positive (FP) values are averaged over the 4 available days of bidirectional data

              ErDOS   LY-knn   MailNET
TP (%)        71      76.3     22.6
FP (%)        8.9     47.8     44.2
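The averaging described above can be sketched in Python. This assumes the standard definitions (TP rate = flagged spammers / all spammers, FP rate = flagged legitimate accounts / all legitimate accounts, both in percent), with the content-based detector's tags as ground truth; the talk does not spell out the formulas, and the toy data is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Set, Tuple


def tp_fp_rates(flagged: Set[str], labels: Dict[str, bool]) -> Tuple[float, float]:
    """TP and FP rates in percent. labels: True = tagged spammer (ground truth)."""
    spammers = {a for a, is_spam in labels.items() if is_spam}
    legit = set(labels) - spammers
    tp = 100.0 * len(flagged & spammers) / len(spammers)
    fp = 100.0 * len(flagged & legit) / len(legit)
    return tp, fp


def averaged_over_days(days: List[Tuple[Set[str], Dict[str, bool]]]) -> Tuple[float, float]:
    """Average (TP, FP) over several days, each evaluated independently."""
    rates = [tp_fp_rates(flagged, labels) for flagged, labels in days]
    return mean(tp for tp, _ in rates), mean(fp for _, fp in rates)


labels = {"s1": True, "s2": True, "l1": False, "l2": False}
days = [({"s1", "l1"}, labels), ({"s1", "s2"}, labels)]
print(averaged_over_days(days))  # (75.0, 25.0)
```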

Early detection evaluation

Early detection: spamming accounts detected before the content-based detector flags them
• Suspected by the detector, but send messages tagged as spam only on later days
• Evaluation uses all 26 days of data

Early detection quality criteria:
• e-Precision: the fraction of early-detected accounts in the suspects list
• Enrichment Factor (EF): the ratio between the detector's e-Precision and that of a random accounts list
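Both criteria are straightforward to compute once each account's first spam-tagged day is known. A minimal Python sketch, assuming a hypothetical `first_spam_day` map (accounts never tagged are absent) and taking the random list's expected e-Precision to be the early-detection rate of the whole population:

```python
from typing import Dict, List, Optional


def is_early_detection(account: str, first_spam_day: Dict[str, int], day: int) -> bool:
    """True if the account's first spam-tagged message comes after `day`."""
    first: Optional[int] = first_spam_day.get(account)
    return first is not None and first > day


def e_precision(accounts: List[str], first_spam_day: Dict[str, int], day: int) -> float:
    """Fraction of `accounts` that are early detections relative to `day`."""
    hits = sum(is_early_detection(a, first_spam_day, day) for a in accounts)
    return hits / len(accounts)


def enrichment_factor(suspects: List[str], population: List[str],
                      first_spam_day: Dict[str, int], day: int) -> float:
    """Suspects' e-Precision over the expected e-Precision of a random list."""
    return (e_precision(suspects, first_spam_day, day)
            / e_precision(population, first_spam_day, day))


population = [f"a{i}" for i in range(10)]
first_spam_day = {"a1": 3, "a7": 2, "a9": 0}  # a9 was already tagged on day 0
suspect_list = ["a1", "a7", "a3", "a4"]        # produced from day-0 logs
print(e_precision(suspect_list, first_spam_day, 0))  # 0.5
print(enrichment_factor(suspect_list, population, first_spam_day, 0))
```

In this toy example, 2 of the 10 accounts are early-detectable (base rate 0.2) and 2 of the 4 suspects are (0.5), giving an EF of 2.5.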

Early detection

Early detection results, averaged over 4 days:

                   ErDOS's suspects   Entire population
#accounts          100                100
Early detections   9                  0.53
e-Precision        0.09               0.0053

Prior art's early detection results compared to ErDOS:

                   ErDOS    LY-knn   MailNET
e-Precision        0.09     0.012    0.025
EF                 16.9     2.3      4.7
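The EF row follows from the e-Precision values: each detector's e-Precision is divided by the entire population's 0.0053. A quick arithmetic check in Python using the figures reported above:

```python
# Values copied from the early detection results above.
random_list_e_precision = 0.0053  # entire population
e_precision = {"ErDOS": 0.09, "LY-knn": 0.012, "MailNET": 0.025}
reported_ef = {"ErDOS": 16.9, "LY-knn": 2.3, "MailNET": 4.7}

ef = {name: p / random_list_e_precision for name, p in e_precision.items()}
for name, value in ef.items():
    print(f"{name}: EF = {value:.2f} (reported {reported_ef[name]})")
# ErDOS: EF = 16.98 (reported 16.9)
# LY-knn: EF = 2.26 (reported 2.3)
# MailNET: EF = 4.72 (reported 4.7)
```

The LY-knn and MailNET values match after rounding; the small ErDOS difference (16.98 vs. 16.9) is presumably due to rounding of the displayed e-Precision values.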

Early detection (cont'd)

e-Precision for varying suspects list lengths:
[Figure not recoverable from the transcript]


Conclusions and Future Work

Conclusions:
• Outgoing spam detection for ESPs has its own unique nature
• Content-based filtering is not enough
• Early detection of spamming accounts can be achieved by combining a content-based filter with a network-level detector

Future work:
• Enhance ErDOS's early detection performance with additional features
• An expert detector for low-volume spammers, based on ErDOS's computation flow and features