Anomaly-based spam filtering - SECRYPT 2011

Post on 14-Apr-2017


Carlos Laorden

WHAT YOU GOT, THEN?

SPAM, EGG, SPAM, SPAM, BACON AND SPAM.

SPAM, SPAM, SPAM, BAKED BEANS AND SPAM.

ANYTHING WITHOUT SPAM?

I DON’T LIKE SPAM!!

UGH!

Meet the real SPiced hAM

Monty Python’s Flying Circus

Something that repeats and repeats until it becomes annoying

It is a real problem for Information Security

Billions in daily productivity losses

Infected computers

Stolen credentials

We must fight

Anti-spam methods

Pre-sending: new protocols, increase sending costs, increase risks for spammers

Post-sending: e-mail sender, e-mail content

Usually supervised approaches

Significant labelling work is needed

But is this possible?

I mean, is this possible...

YES

Anomaly Detection

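[Figure: the words "this", "word", "has", "no", "interest" shown next to the dataset names SpamAssassin and Ling Spam]

Datasets: SpamAssassin, Ling Spam

[Figure: e-mails D1 to D11 plotted as points in a space with term axes t1, t2 and t3; two points marked "?" are e-mails still to be classified]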
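The term axes (t1, t2, t3) and the points D1 to D11 suggest a vector space representation in which each e-mail becomes a vector of term counts. The following is a minimal sketch of that idea, not the authors' implementation: the stop-word list, the tokenizer and the function names are all assumptions made for illustration, and plain term frequency is used because the slides do not state the weighting scheme.

```python
from collections import Counter

# Hypothetical list of "words of no interest"; the slides do not give the real one.
STOP_WORDS = {"this", "has", "no", "a", "the", "is"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def build_vocabulary(texts):
    """Sorted list of every term seen in the texts; position i plays the role of axis t_i."""
    return sorted({term for text in texts for term in tokenize(text)})


def to_vector(text, vocabulary):
    """Term-frequency vector of one e-mail over the given vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


emails = ["Cheap pills, cheap watches!", "Meeting agenda for the project"]
vocab = build_vocabulary(emails)
vectors = [to_vector(e, vocab) for e in emails]
```

Every e-mail is mapped onto the same vocabulary, so all vectors have the same length and can be compared with a distance measure.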

Anomaly detection

[Figure: distance d between an e-mail under test and a known e-mail; is d > threshold?]

Distance measures: Manhattan distance, Euclidean distance
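The two distance measures named on this slide can be written down directly. This sketch assumes e-mails are already represented as equal-length numeric vectors; the function names are illustrative.

```python
import math


def manhattan(u, v):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    """Euclidean (L2) distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

For the vectors [0, 0] and [3, 4], the Manhattan distance is 7 and the Euclidean distance is 5.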

Anomaly detection

[Figure: an e-mail under test (?) at distances d from several known e-mails]

Combination rules: minimum distance, maximum distance, mean distance

Distance measures: Manhattan distance, Euclidean distance

10 different thresholds
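An e-mail under test has one distance to each known e-mail, so the minimum, maximum and mean rules reduce those distances to a single deviation value. A minimal sketch under that reading follows; the names `deviation`, `RULES` and `reference_set` are assumptions, not taken from the slides.

```python
import math
from statistics import mean


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# The three combination rules named on the slide.
RULES = {"minimum": min, "maximum": max, "mean": mean}


def deviation(sample, reference_set, distance=euclidean, rule="minimum"):
    """Combine the distances from `sample` to every reference vector into one value."""
    distances = [distance(sample, ref) for ref in reference_set]
    return RULES[rule](distances)


reference_set = [[0, 0], [3, 4]]
sample = [0, 0]
```

With this reference set, the sample's distances are 0 and 5, so the minimum rule gives 0, the maximum rule gives 5 and the mean rule gives 2.5.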

Anomaly detection

[Figure: the distance d is compared with the threshold; d < threshold and d > threshold give the two possible outcomes]
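The decision step compares the deviation with a threshold, and the evaluation repeats this for 10 thresholds. The sketch below assumes that a deviation above the threshold marks an e-mail as anomalous and that the thresholds are evenly spaced between the smallest and largest observed deviation. The slides state neither choice, so both are assumptions, as are the function names.

```python
def is_anomalous(deviation, threshold):
    """Assumed decision rule: a deviation above the threshold is an anomaly."""
    return deviation > threshold


def candidate_thresholds(deviations, count=10):
    """`count` evenly spaced thresholds from the smallest to the largest deviation."""
    low, high = min(deviations), max(deviations)
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def sweep(deviations, thresholds):
    """For each threshold, record which deviations are flagged as anomalous."""
    return [(t, [is_anomalous(d, t) for d in deviations]) for t in thresholds]


deviations = [0.0, 9.0]
thresholds = candidate_thresholds(deviations)
results = sweep(deviations, thresholds)
```

Raising the threshold flags fewer e-mails, which is the trade-off between precision and recall that a threshold sweep explores.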

Results

SpamAssassin

            Manhattan                      Euclidean
            Prec.    Rec.     F-Meas.      Prec.    Rec.     F-Meas.
Mean        91.03%   92.85%   91.93%       76.14%   97.77%   85.61%
Maximum     69.61%   99.89%   82.05%       72.99%   97.66%   83.54%
Minimum     95.40%   93.86%   94.62%       92.10%   94.00%   93.04%


Ling Spam

            Manhattan                      Euclidean
            Prec.    Rec.     F-Meas.      Prec.    Rec.     F-Meas.
Mean        79.18%   73.54%   76.26%       92.82%   91.58%   92.20%
Maximum     76.23%   74.29%   75.25%       85.95%   79.29%   82.49%
Minimum     65.82%   74.38%   69.84%       87.51%   93.13%   90.23%
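The three reported measures are related: assuming F-Meas. is the usual harmonic mean of precision and recall (F1), it can be recomputed from the other two columns. This sketch uses the standard definitions; treating spam as the positive class and the function names are assumptions.

```python
def precision(true_pos, false_pos):
    """Fraction of e-mails flagged as spam that really are spam."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Fraction of actual spam e-mails that were flagged."""
    return true_pos / (true_pos + false_neg)


def f_measure(prec, rec):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * prec * rec / (prec + rec)


# SpamAssassin, Manhattan distance, mean rule: Prec. 91.03%, Rec. 92.85%
f_mean_manhattan = round(f_measure(91.03, 92.85), 2)
```

The recomputed value is 91.93, which agrees with the F-Meas. reported for that row and is consistent with the F1 reading.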


Suitable for coping with the amount of unclassified spam e-mails

Will we see the END of spam?

95%

“Solution to spam”

Cut their billing systems?

