do humans beat computers at pattern recognition

Do humans beat computers at pattern recognition? Andra Miloiu Costina Spam Analyst

Upload: bitdefender

Post on 18-Nov-2014


Page 1: Do Humans Beat Computers At Pattern Recognition

Do humans beat computers at pattern recognition?

Andra Miloiu Costina

Spam Analyst

Page 2: Do Humans Beat Computers At Pattern Recognition

Do humans beat computers at pattern recognition?

NO YES

What do you think?

Page 3: Do Humans Beat Computers At Pattern Recognition

What is the correct answer?

Page 4: Do Humans Beat Computers At Pattern Recognition

NO!

Page 5: Do Humans Beat Computers At Pattern Recognition

NO!

Page 6: Do Humans Beat Computers At Pattern Recognition

Each time we answered “NO”, one of the following automated signature mechanisms was designed:

- Pattern extraction;

- Lines detection;

- Cluster based rules generation;

- Automated signatures creation;

NO!

Page 7: Do Humans Beat Computers At Pattern Recognition

Why aren’t we all on a beach?

Page 8: Do Humans Beat Computers At Pattern Recognition

PATTERN EXTRACTION

- Short description: The mechanism is conceptually divided into four steps:
1. Finding groups of similar emails (layout based filtering);
2. Extracting information for each group (a pattern discovery algorithm);
3. Determining the utility of each extracted feature (a version of the Relief algorithm);
4. Fitting the pieces together to create the signatures (a genetic algorithm).

- Pattern extraction mechanism: similar to Teiresias and a basic suffix tree;

- Pros & cons:
+ It was among the first methods of automated pattern extraction that we designed;
- It was difficult to use, and an analyst would have finished the signature a lot faster;

- Stats: It increased our detection rate by 2%.
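The second step (pattern discovery within a group of similar emails) can be illustrated with a much simpler stand-in than Teiresias or a suffix tree: counting fixed-length substrings that are shared by several emails in the group. This is only a sketch of the idea; the function name, the substring length and the support threshold are assumptions, not part of the original system.

```python
from collections import Counter


def frequent_substrings(emails, length=12, min_support=2):
    """Return fixed-length substrings found in at least min_support emails."""
    support = Counter()
    for text in emails:
        # Use a set so each email counts at most once per substring.
        seen = {text[i:i + length] for i in range(len(text) - length + 1)}
        support.update(seen)
    return {s: n for s, n in support.items() if n >= min_support}


# Hypothetical group of similar emails produced by the first step.
group = [
    "Buy replica watches today at low prices",
    "Hello! Buy replica watches now",
    "Cheap meds shipped overnight",
]
patterns = frequent_substrings(group, length=12, min_support=2)
```

A real pattern discovery algorithm would also merge overlapping substrings and allow wildcards; this sketch only shows the notion of support across a group.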

Page 9: Do Humans Beat Computers At Pattern Recognition

What did we do next?

…LINES DETECTION

Page 10: Do Humans Beat Computers At Pattern Recognition

LINES DETECTION(1)

- What did spam look like at that time?

Almost a year and a half ago, spam waves took a new turn. Spam messages shrank to 1 or 2 spammy lines and one URL.

Page 11: Do Humans Beat Computers At Pattern Recognition

LINES DETECTION(2)

Waves of this type came in such large numbers that they affected our response time, so we decided to implement a system that would sign these spams in a shorter period of time.

Page 12: Do Humans Beat Computers At Pattern Recognition

LINES DETECTION

-Short description:

Basically, the mechanism worked in three steps:
1. A pattern, represented by a relevant text line, was extracted;
2. Each line was associated with its number of occurrences, and the lines were sorted in descending order;
3. Automated signatures were created for the top relevant lines.

- Pattern extraction mechanism:

Based on a predefined set of keywords, the program would extract the lines containing relevant information;
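The three steps above can be sketched roughly as follows. The keyword set, the normalization (lowercasing and collapsing whitespace) and all function names are assumptions made for illustration; the slides do not describe the original implementation at this level of detail.

```python
from collections import Counter

# Hypothetical keyword set; the real one is not given in the slides.
KEYWORDS = ("replica", "watches", "viagra")


def relevant_lines(message, keywords=KEYWORDS):
    """Yield normalized lines that contain at least one keyword."""
    for line in message.splitlines():
        normalized = " ".join(line.lower().split())
        if normalized and any(k in normalized for k in keywords):
            yield normalized


def top_lines(messages, top_n=3, keywords=KEYWORDS):
    """Count in how many messages each relevant line occurs, most frequent first."""
    counts = Counter()
    for message in messages:
        counts.update(set(relevant_lines(message, keywords)))
    return counts.most_common(top_n)


wave = [
    "Hi there\nBuy Replica Watches\nhttp://example.com/a",
    "Hello\nbuy replica  watches\nhttp://example.com/b",
    "Dear friend\nBest replica deals\nhttp://example.com/c",
]
ranked = top_lines(wave)
```

The top entries of `ranked` are the lines for which a signature would then be generated.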

Page 13: Do Humans Beat Computers At Pattern Recognition

LINES DETECTION

For instance:

Page 14: Do Humans Beat Computers At Pattern Recognition

LINES DETECTION

- While in use, this system improved our response time by 6.4% and helped us sign a series of spam waves that would otherwise have taken an analyst much longer to handle.

- The cause of death (C.O.D.) was mainly the decreasing number of spam waves bearing the same relevant phrases in more than 40% of the cases. The different statements used to express the same point, “Buy Replica Watches”, made us change our perspective on how to create lasting signatures.

Page 15: Do Humans Beat Computers At Pattern Recognition

RIGHT NOW…

CLUSTER BASED RULES GENERATION

&

AUTOMATED SIGNATURES CREATION

Page 16: Do Humans Beat Computers At Pattern Recognition

CLUSTER BASED RULES GENERATION

- Short description:
1. Mails are clustered;
2. The clusters are seen by an analyst;
3. The analyst adds a simple content related pattern and creates the signature;

- Pattern extraction mechanism: In comparison with the previously described system, which was entirely based on the content of a spam message, the cluster based rules rely on patterns belonging to the email’s template, such as the body summary, the date format, the number of URLs or the number of separators found in the subject.
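A minimal sketch of template based grouping, assuming three of the features named above (date format, number of URLs, number of separators in the subject). The body summary is left out, and emails are grouped by exact feature equality, which is a simplification: a real clustering would likely use a similarity measure. The regular expressions and the email dictionary layout are assumptions.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")
SEPARATOR_RE = re.compile(r"[-_|:;,./]")  # assumed set of subject separators


def date_format(date_header):
    """Reduce a Date header to its shape: digits become 9, letters become A."""
    shape = re.sub(r"\d", "9", date_header)
    return re.sub(r"[A-Za-z]", "A", shape)


def template_features(subject, date_header, body):
    """Features that describe the template of an email, not its content."""
    return (
        date_format(date_header),
        len(URL_RE.findall(body)),
        len(SEPARATOR_RE.findall(subject)),
    )


def cluster(emails):
    """Group email ids by identical template features."""
    clusters = defaultdict(list)
    for email in emails:
        key = template_features(email["subject"], email["date"], email["body"])
        clusters[key].append(email["id"])
    return dict(clusters)


emails = [
    {"id": 1, "subject": "Deal - today", "date": "Tue, 18 Nov 2014 10:01:02 +0200",
     "body": "Look http://example.com/x"},
    {"id": 2, "subject": "Sale - now", "date": "Wed, 19 Nov 2014 11:22:33 +0200",
     "body": "See http://example.org/y"},
    {"id": 3, "subject": "Meeting notes", "date": "2014-11-18 10:01",
     "body": "No links here."},
]
groups = cluster(emails)
```

Emails 1 and 2 share the same template features although their content differs, which is the property the cluster based rules rely on.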

Page 17: Do Humans Beat Computers At Pattern Recognition

CLUSTER BASED RULES GENERATION

- Pros & Cons: The great advantage of this system is its universal applicability. There are no messages that can’t be clustered, because the predefined set of features is calculated for each email.

The features based on the email’s template alone are not enough to mark an email as spam, as more and more of these messages copy the template used by regular/legit emails. Hence we are working on new features that will allow the cluster based rules to tag emails as spam without the intervention of an analyst.

Page 18: Do Humans Beat Computers At Pattern Recognition

AUTOMATED SIGNATURES CREATION

-Short description:

Until a few months ago we considered that an automated pattern extraction mechanism wouldn’t be very efficient, given the current variety found in spam belonging to the same wave.

Simplified, the process has 5 steps:
1. Extracts patterns from a pool of spam;
2. Sorts them by the number of occurrences;
3. Creates automated signatures;
4. Tests the newly created signatures;
5. Sends them for an FP test;
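The last two steps (testing the new signatures and the FP test) could look roughly like this. The sketch assumes that a signature is a plain case-insensitive substring and that a pool of legitimate mail is available for the FP test; both are assumptions, as are the function names and the `min_hits` threshold.

```python
def matches(signature, message):
    """Assumed signature semantics: case-insensitive substring match."""
    return signature.lower() in message.lower()


def evaluate(signature, spam_pool, legit_pool):
    """Count spam hits and false positives for one signature."""
    hits = sum(matches(signature, m) for m in spam_pool)
    false_positives = sum(matches(signature, m) for m in legit_pool)
    return {"signature": signature, "hits": hits, "false_positives": false_positives}


def candidates_for_review(signatures, spam_pool, legit_pool, min_hits=2):
    """Keep signatures that hit enough spam and no legitimate mail."""
    results = [evaluate(s, spam_pool, legit_pool) for s in signatures]
    return [r for r in results if r["hits"] >= min_hits and r["false_positives"] == 0]


spam = [
    "Buy replica watches now",
    "BUY REPLICA WATCHES cheap",
    "Your invoice is attached, click here",
]
legit = ["Your invoice is attached", "Lunch at noon?"]
signatures = ["buy replica watches", "your invoice is attached"]
kept = candidates_for_review(signatures, spam, legit)
```

Only the signatures in `kept` would be passed on; the second candidate is dropped because it also matches a legitimate email.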

Page 19: Do Humans Beat Computers At Pattern Recognition

AUTOMATED SIGNATURES CREATION

- Pattern extraction mechanism: Whereas the line extraction mechanism relied on a set of keywords to define the relevant phrases, this system extracts almost all the lines from a spam message (body and headers). Afterwards it eliminates the patterns that contain only HTML tags and the lines shorter than a predefined threshold.
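A sketch of this filtering, under the assumption that a "pattern" is a single stripped line and that "contains only HTML tags" means nothing is left once the tags are removed. The tag regular expression, the threshold value and the function names are illustrative choices.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")


def candidate_lines(message, min_length=10):
    """Yield lines that are not tag-only and not shorter than min_length."""
    for line in message.splitlines():
        line = line.strip()
        if not TAG_RE.sub("", line).strip():
            continue  # empty line or a line made only of HTML tags
        if len(line) < min_length:
            continue  # shorter than the predefined threshold
        yield line


def rank_patterns(pool, min_length=10):
    """Count in how many messages each candidate line occurs, most frequent first."""
    counts = Counter()
    for message in pool:
        counts.update(set(candidate_lines(message, min_length)))
    return counts.most_common()


pool = [
    "Subject: Luxury watches inside\n<html><body>\n<p>Get yours at half price</p>\nHi\n</body></html>",
    "Subject: Luxury watches inside\n<div>\n<p>Get yours at half price</p>\nOk",
]
ranked = rank_patterns(pool)
```

Unlike the keyword based line detection, no keyword list is needed here: every line that survives the two filters is a candidate pattern.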

- Pros & Cons:
+ Helps decrease the reaction time;
+ Doesn’t create FPs;
- It still needs an analyst to validate the resulting signatures;

Page 20: Do Humans Beat Computers At Pattern Recognition

Overview

All these systems are a step closer toward a fully automated mechanism of creating signatures.

The most important advantages are a better reaction time and an increase in the detection rate of 5%-10%.

There are no FPs, as all the systems in use are overseen by analysts, who make the final decision on whether a signature is good or not.

Page 21: Do Humans Beat Computers At Pattern Recognition

NO!

What methods of automated pattern recognition have you developed?

Page 22: Do Humans Beat Computers At Pattern Recognition

Do humans beat computers at pattern recognition?

NO

YES

What do you think?

Page 23: Do Humans Beat Computers At Pattern Recognition

If (YES) { ANALYSTS RULE }

Page 24: Do Humans Beat Computers At Pattern Recognition

Short description:

We are a team of 10 people, full of enthusiasm and the desire to put an end to spam.

What makes us great?

Our enhanced sense for recognizing patterns.

ANALYSTS TEAM

Page 25: Do Humans Beat Computers At Pattern Recognition

- Pros & Cons

+ We can find a pattern in any given spam;
+ We know when it is safe to say “This is spam”;
+ We adapt to any situation;
+ We can predict how certain spam waves will evolve and be proactive about it;
+ We can maintain a detection rate of over 97%;

- We are expensive;
- We have a longer reaction time;
- We sometimes make mistakes… we’re only human after all;

ANALYSTS TEAM

Page 26: Do Humans Beat Computers At Pattern Recognition

Automated pattern extraction mechanisms

- Shorter reaction time;
- Work only for some spam waves;
- Are less expensive;

Analysts team

- Longer reaction time;
- Can extract a pattern for any spam wave;
- Cost a lot;

A few... conclusions

Page 27: Do Humans Beat Computers At Pattern Recognition

Q&A

Andra [email protected]