polygraph: automatically generating signatures for polymorphic worms

Polygraph: Automatically Generating Signatures for Polymorphic Worms Authors: James Newsome (CMU), Brad Karp (Intel Research), Dawn Song (CMU) Presenter: Abhishek Karnik


Post on 06-Jan-2016


Page 1: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Polygraph: Automatically Generating Signatures for Polymorphic Worms

Authors: James Newsome (CMU), Brad Karp (Intel Research), Dawn Song (CMU)

Presenter: Abhishek Karnik

Page 2: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Background

IDSes block Internet worm flows using signatures based on a worm's payload. Signatures match strings at fixed payload offsets, at arbitrary payload offsets, or via regular expressions.

Signatures are generated manually by experts after hours or days of observation.

Researchers have recently turned their attention to automating this slow process [Honeycomb, Autograph, EarlyBird].

Automated signatures are produced by extracting common byte patterns across different suspicious flows.

Page 3: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Previous Automated Methods

Signatures are based on a single contiguous substring of sufficient length from a worm's payload.

Assumptions:

There exists a single payload substring that remains invariant across worm connections and is specific to the worm.

The invariant string is long enough to be specific, so it does not occur in any non-worm payloads.
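
To make the single-substring assumption concrete, here is a minimal sketch of a signature generator that returns the longest contiguous substring shared by every suspicious payload. It is an illustration only, not the method used by Honeycomb, Autograph, or EarlyBird; the function name and the sample payloads are hypothetical.

```python
def longest_common_substring(payloads):
    """Return the longest contiguous byte string present in every payload.

    Brute force, for illustration only: candidates are taken from the
    shortest payload, longest first.
    """
    shortest = min(payloads, key=len)
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in p for p in payloads):
                return candidate
    return shortest[:0]  # empty result: nothing in common


# Hypothetical payloads: the worm body is invariant in the first pool and
# re-encoded per instance in the second.
static_pool = [b"AAAexploitBBB", b"CCexploitDDDD", b"exploitEE"]
polymorphic_pool = [b"GET \x01\x02 HTTP", b"GET \x7a\x33 HTTP"]

print(longest_common_substring(static_pool))
print(longest_common_substring(polymorphic_pool))
```

For the second pool the only shared substrings are short protocol fragments, which also occur in innocuous traffic. That is the situation in which a single-substring signature stops being specific.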

Page 4: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Motivation

Future worms may be polymorphic and may therefore evade signatures based on a single substring.

Polymorphic obfuscators are available that leave nearly no multi-byte regions in common across their outputs.

Page 5: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Goal of Polygraph

Present algorithms that automatically generate signatures suited to matching polymorphic worm payloads.

Evaluate these algorithms to demonstrate that Polygraph produces signatures with low false negative and false positive rates.

Page 6: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Assumptions

A worm must exploit one or more specific server software vulnerabilities.

A real-world exploit contains multiple disjoint invariant substrings that appear in all variant payloads.

Invariant bytes include protocol framing bytes, which make the server branch down the code path where the vulnerability exists, and possibly the value that overwrites a jump address.

Page 7: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Approach – Exploits

Within a worm there are three classes of bytes: invariant bytes, wildcard bytes, and code bytes.

Over 15 software vulnerabilities spanning various operating systems and applications were surveyed. Nearly all require invariant content in any exploit.

There are two sources of invariant content: the invariant exploit framing and invariant overwrite values.

Page 8: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Approach - Examples Apache-Knacker exploit

Page 9: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Approach - Examples Lion Exploit

Page 10: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Architecture

The flow classifier reassembles flows and, for flows sharing the same IP and port number, sorts them into an innocuous pool and a suspicious pool.

Page 11: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Architecture

Anomalous or suspicious traffic is identified using honeypots or port-scan activity.

Assumptions for the flow classifier:

Classification may introduce noise.

The flow classifier does not distinguish between different worms, so the suspicious pool may contain a mixture of worms, which may or may not be polymorphic.

Page 12: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Signature Generator Goals

Signature quality: few false positives on innocuous traffic and few false negatives on worm instances.

Efficient signature generation.

Efficient signature matching.

Small signature sets: generate only a small number of signatures.

Robustness against noise and multiple worms.

Robustness against evasion and subversion.

Page 13: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Signature Algorithms

All signatures are built from substrings called tokens; each signature consists of one or more tokens.

The following algorithms extract and analyze tokens, which are then used to create signatures.

Token extraction eliminates the irrelevant parts of suspicious flows.

Preprocessing:

Extract the distinct substrings of minimum length α that occur in at least K of the n samples in the suspicious pool, using longest-substring algorithms.

Represent each suspicious flow as a sequence of tokens and discard the rest of the payload.
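
The preprocessing step can be sketched as follows. This is a brute-force illustration under stated assumptions, not the efficient algorithm the authors use: `extract_tokens`, `to_token_sequence`, and the sample flows are hypothetical names and data, and "distinct" is simplified to mean that a token is dropped when a longer token containing it occurs in exactly the same samples.

```python
from collections import defaultdict


def extract_tokens(samples, min_len=2, k=3):
    """Substrings of length >= min_len that occur in at least k samples."""
    seen_in = defaultdict(set)  # substring -> indices of samples containing it
    for idx, sample in enumerate(samples):
        for i in range(len(sample)):
            for j in range(i + min_len, len(sample) + 1):
                seen_in[sample[i:j]].add(idx)
    frequent = {t: ids for t, ids in seen_in.items() if len(ids) >= k}
    # Simplified distinctness rule (an assumption of this sketch): drop a
    # token if a longer frequent token covers it in the same set of samples.
    maximal = [
        t for t, ids in frequent.items()
        if not any(t != u and t in u and ids == frequent[u] for u in frequent)
    ]
    return sorted(maximal, key=lambda t: (-len(t), t))


def to_token_sequence(sample, tokens):
    """Reduce a flow to the tokens it contains, in order of appearance."""
    hits = []
    for t in tokens:
        start = sample.find(t)
        while start != -1:
            hits.append((start, t))
            start = sample.find(t, start + 1)
    return [t for _, t in sorted(hits)]


flows = ["xxGET yyHTTPzz", "aGET bbbHTTPc", "GET qHTTP"]
tokens = extract_tokens(flows, min_len=3, k=3)
print(tokens)
print([to_token_sequence(f, tokens) for f in flows])
```

The parameters `min_len` and `k` play the roles of α and K. The quadratic enumeration of substrings is only practical for toy inputs.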

Page 14: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Signature Algorithms

Conjunction signatures:

A signature that matches when all tokens in its set are found in the payload, in any order.

Matching multiple invariant tokens is more specific than matching a single token.

The signature is the set of tokens.

Token-subsequence signatures:

A signature that consists of an ordered sequence of tokens.

Can be expressed as a regular expression.

The signature is the ordered subsequence of tokens that is present in every sample in the suspicious pool.
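
The difference between the two signature types shows up most clearly at matching time. The sketch below assumes tokens are byte strings; the function names and example payloads are hypothetical.

```python
import re


def matches_conjunction(payload, tokens):
    """Conjunction signature: every token must appear, in any order."""
    return all(t in payload for t in tokens)


def subsequence_regex(tokens):
    """Build the regular expression .*t1.*t2.*...*tn.* for ordered tokens."""
    body = b".*".join(re.escape(t) for t in tokens)
    # DOTALL lets '.' also match newline bytes inside the payload.
    return re.compile(b".*" + body + b".*", re.DOTALL)


def matches_subsequence(payload, tokens):
    """Token-subsequence signature: tokens must appear in the given order."""
    return subsequence_regex(tokens).search(payload) is not None


signature = [b"GET ", b"HTTP/1.1\r\n"]
in_order = b"GET /index.html HTTP/1.1\r\nHost: a\r\n"
reordered = b"HTTP/1.1\r\nsomething GET /"

print(matches_conjunction(in_order, signature), matches_subsequence(in_order, signature))
print(matches_conjunction(reordered, signature), matches_subsequence(reordered, signature))
```

The reordered payload satisfies the conjunction signature but not the token-subsequence signature, which is why the ordered form is the more specific of the two.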

Page 15: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Signature Algorithms

String 1: xxonexxxtwox

String 2: oneyyyyytwoyyy

The longest common subsequence is "onetwo", found by string alignment:

x x o n e x x x - - t w o x - -

- - o n e y y y y y t w o y y y

The resulting regular expression is ".*one.*two.*".

An alignment is scored by adding 1 for each matching character and subtracting a gap penalty Wg for each gap. With Wg = 0.8, ".*o.*n.*e.*z.*" scores 4 - 3 × 0.8 = 1.6, whereas ".*two.*" scores 3 - 0 × 0.8 = 3.
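
The scoring rule can be checked with a few lines of code. This sketch assumes a gap is counted once between each pair of adjacent tokens and uses a gap penalty of 0.8, as the arithmetic on the slide implies. The aligned rows are a cleaned-up reading of the slide's alignment, and the helper names are hypothetical.

```python
def tokens_from_alignment(row_a, row_b, gap="-"):
    """Collect maximal runs of columns where both aligned rows agree."""
    tokens, current = [], ""
    for a, b in zip(row_a, row_b):
        if a == b and a != gap:
            current += a
        elif current:
            tokens.append(current)
            current = ""
    if current:
        tokens.append(current)
    return tokens


def alignment_score(tokens, gap_penalty=0.8):
    """+1 per matched character, minus gap_penalty per gap between tokens."""
    matched = sum(len(t) for t in tokens)
    gaps = max(len(tokens) - 1, 0)
    return matched - gap_penalty * gaps


tokens = tokens_from_alignment("xxonexxx--twox--", "--oneyyyyytwoyyy")
print(tokens, ".*" + ".*".join(tokens) + ".*")

# The two candidates compared on the slide:
print(round(alignment_score(["o", "n", "e", "z"]), 2))  # four 1-byte tokens
print(round(alignment_score(["two"]), 2))               # one 3-byte token
```

Penalizing gaps favors a few long contiguous tokens over many scattered single-byte matches, even when the scattered match covers more characters.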

Page 16: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Signature Algorithms

Bayes signatures:

A probabilistic matching method.

A signature consists of a set of tokens, each associated with a score, and an overall threshold.

Matching and construction are less rigid than for conjunction and token-subsequence signatures.

Signatures can be learned from suspicious pools that contain samples of unrelated worms and innocuous flows.

A flow is classified by the distribution from which its token set is more likely to have been generated.

Page 17: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Signature Algorithms

The classifier compares the ratio Pr[worm|x] / Pr[~worm|x] against a threshold.

The threshold is set so that the classifier reports a positive only when the sample's score is sufficiently far from the decision boundary, which helps in handling noise.

Each token is assigned a score based on its probability of coming from a given pool.

The scores are added together, and if the total exceeds the threshold the sample is classified as a worm.
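
A minimal sketch of this scoring scheme follows. It assumes each token's score is the log of the ratio between its frequency in the suspicious pool and in the innocuous pool, with add-one smoothing to avoid division by zero. The smoothing, the function names, and the sample data are assumptions of the sketch and are not taken from the slides.

```python
import math


def train_bayes(suspicious, innocuous, tokens):
    """Assign each token a log-likelihood-ratio score (smoothed)."""
    scores = {}
    for t in tokens:
        p_worm = (sum(t in s for s in suspicious) + 1) / (len(suspicious) + 2)
        p_benign = (sum(t in s for s in innocuous) + 1) / (len(innocuous) + 2)
        scores[t] = math.log(p_worm / p_benign)
    return scores


def classify(payload, scores, threshold):
    """Sum the scores of the tokens present; flag as worm above threshold."""
    total = sum(score for t, score in scores.items() if t in payload)
    return total > threshold


suspicious = [b"GET /a\xff\xbf HTTP", b"GET /b\xff\xbf HTTP", b"GET /c\xff\xbf HTTP"]
innocuous = [b"GET /index HTTP", b"GET /img HTTP", b"POST /x HTTP"]
scores = train_bayes(suspicious, innocuous, [b"GET ", b"\xff\xbf", b"HTTP"])

print(classify(b"GET /zz\xff\xbf HTTP", scores, threshold=1.0))
print(classify(b"GET /home HTTP", scores, threshold=1.0))
```

Tokens that are common in both pools, such as the protocol framing here, end up with scores near zero, so only tokens that are rare in innocuous traffic push a flow over the threshold.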

Page 18: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Generating Multiple Signatures

Suspicious flows could contain more than one type of worm.

The suspicious pool is divided into clusters, each containing similar flows.

A signature is output for each cluster.

Quality of clusters: clusters should be neither too general nor too specific.

Page 19: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Hierarchical Clustering

Used for the token-subsequence and conjunction algorithms.

Given s clusters initially, s signatures are generated.

Iteratively merge clusters, with each merge producing a more sensitive signature.

Determine what the merged signature would be and use innocuous flows to estimate its false positive rate.

The lower the false positive rate, the more specific the signature and the more similar the two clusters.

Stop merging when merging any two clusters would give a high false positive rate, or when only one cluster remains.
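
The merge loop can be sketched for conjunction signatures as follows. The sketch assumes each flow has already been reduced to a set of tokens, that a cluster's signature is the intersection of its members' token sets, and that "high" means a false positive rate above a chosen `max_fp`. All names, the threshold, and the sample data are hypothetical.

```python
from itertools import combinations


def signature(cluster):
    """Conjunction signature: tokens shared by every flow in the cluster."""
    return frozenset.intersection(*cluster)


def false_positive_rate(sig, innocuous):
    """Fraction of innocuous flows that contain every token of sig."""
    if not innocuous:
        return 0.0
    return sum(sig <= flow for flow in innocuous) / len(innocuous)


def hierarchical_cluster(samples, innocuous, max_fp=0.1):
    clusters = [[frozenset(s)] for s in samples]  # one cluster per sample
    while len(clusters) > 1:
        # Pick the pair whose merged signature has the fewest false positives.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: false_positive_rate(
                signature(clusters[p[0]] + clusters[p[1]]), innocuous),
        )
        merged = clusters[i] + clusters[j]
        if false_positive_rate(signature(merged), innocuous) > max_fp:
            break  # even the best merge is too general
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return [signature(c) for c in clusters]


samples = [{"GET", "\xff\xbf", "HTTP"}, {"GET", "\xff\xbf", "HTTP", "Host"},
           {"DNS", "\x01\x02"}, {"DNS", "\x01\x02", "bind"}]
innocuous = [{"GET", "HTTP"}, {"GET", "HTTP", "Host"}, {"DNS"}, {"POST", "HTTP"}]
for sig in hierarchical_cluster(samples, innocuous):
    print(sorted(sig))
```

With this data the two HTTP-like flows merge, the two DNS-like flows merge, and the final merge is rejected because the combined signature would be empty and match every innocuous flow.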

Page 20: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Experiments

Parameters: K = 3, α = 2, minimum cluster size = 3.

Network traces: Intel Research Pittsburgh in October 2004, and DNS traces from a major academic institution.

Platform: Intel Pentium III running Linux 2.4.20.

Page 21: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Results – Apache-Knacker

Page 22: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Results – BIND Lion Exploit

Page 23: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Polymorphic with Noise

Page 24: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Polymorphic with Noise

Page 25: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Conclusions

Polygraph works for polymorphic worms.

Content variability is limited by the nature of the software vulnerability.

Use multiple disjoint strings that are invariant across copies of a worm.

Accurate signatures can be generated automatically for polymorphic worms.

Low false positive rates were demonstrated with real exploits on real traffic traces.

Page 26: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Strengths

A new concept in the area of intrusion detection that merits further exploration.

A well-written paper that covers nearly all relevant aspects and provides three algorithms.

Page 27: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Weaknesses

Vulnerable to overtraining attacks and long-tail attacks.

Page 28: Polygraph: Automatically Generating Signatures for Polymorphic Worms

Potential Extensions

Applying Polygraph to a distributed IDS

Adapting to IPv6