

Over recent years, the amount of malicious code (viruses, worms, Trojans, etc.) sent through the internet has increased sharply.

Because of this growth, viruses are renewed and improved much faster than the anti-virus software sold today is updated. Our solution focuses on the signature generation process. We have developed an automatic system whose goal is to extract simple, unique and optimal signatures for malicious files, so that any IDS/IPS can neutralize hostile code in real time. In addition, we have developed an evaluation environment whose objective is to determine the best configuration for generating an optimal signature for malicious files.

Language: IDE: Operating System:

Ido Levin, Ofir Nissel, Yotam Katzman

Academic Advisor: Dr. Yuval Elovici

Professional Advisor: Mr. Asaf Shabtai

Extraction methods:

The Interactive Disassembler (IDA): IDA is a commercial disassembler widely used for reverse engineering; that is, it receives a binary file and translates it back into assembly code. Using a dedicated plug-in, IDA can identify, extract and normalize all the functions in the file.

Data mining: A classifier is trained on a set of byte segments, each labeled as a function start, a function end, or neither. It then classifies byte segments from a suspicious file in the same way. The predicted starts and ends mark the function boundaries, which lets us extract functions from a given file.

Selection methods:

Random Selector: Chooses a signature at random from the candidates.

Minimum Entropy Selector: Calculates the entropy of each candidate and selects the one with the minimum entropy.

Cluster Selector: This selector groups the candidates into clusters by their distance from each other and scores each cluster by the chance that it contains the best signature. Each cluster receives a score reflecting this chance, given by the ClusterScore formula below.

Probability Selector: The key idea is to estimate the probability that each candidate signature will match a randomly chosen block of bytes in the code of a randomly chosen program.

Select one or more signatures whose estimated false-positive probability is the lowest among all candidates and is below a pre-defined threshold.

Cluster Selector score:

• Cs denotes the cluster size in bytes

• Fs denotes the file size

• Fc denotes the number of functions in the cluster

• T denotes the total number of functions in the file

• Fl denotes the sum of the function lengths in the cluster

$$\mathrm{ClusterScore} = \frac{C_s}{F_s} \cdot \frac{F_c}{T} \cdot \frac{F_l}{C_s}$$
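To make the role of each quantity concrete, here is a minimal Python sketch of the cluster score. It assumes the three ratios Cs/Fs, Fc/T and Fl/Cs are multiplied together, which is one reading of the formula; the `Cluster` class and the `cluster_score` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """Hypothetical container for one cluster of candidate functions."""
    size_bytes: int              # Cs: cluster size in bytes
    function_lengths: List[int]  # lengths of the functions in the cluster


def cluster_score(cluster: Cluster, file_size: int, total_functions: int) -> float:
    """Score a cluster as (Cs / Fs) * (Fc / T) * (Fl / Cs).

    Assumption: the three ratios are combined by multiplication.
    """
    cs = cluster.size_bytes
    fc = len(cluster.function_lengths)   # Fc: functions in the cluster
    fl = sum(cluster.function_lengths)   # Fl: sum of function lengths
    if cs <= 0 or file_size <= 0 or total_functions <= 0:
        raise ValueError("sizes and counts must be positive")
    return (cs / file_size) * (fc / total_functions) * (fl / cs)
```

Under this reading, a cluster scores higher when it covers more of the file, holds a larger share of the file's functions, and is densely filled with function bytes.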

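Minimum Entropy Selector:

• Let S be a string/signature, and let C be the set of distinct characters in S.

• Sc is a character c in S.

• |Sc| is the number of times Sc appears in S, and |S| is the length of S.

• The entropy of S is as follows:

$$E(S) = -\sum_{c \in C} \frac{|S_c|}{|S|} \log_2 \frac{|S_c|}{|S|}$$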
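The entropy definition can be sketched in a few lines of Python. This is only an illustration: it treats a signature as a `bytes` object, and the function names `entropy` and `select_min_entropy` are assumptions, not names from the project.

```python
import math
from collections import Counter
from typing import Iterable


def entropy(signature: bytes) -> float:
    """Shannon entropy (in bits) of the byte values in a signature."""
    if not signature:
        return 0.0
    length = len(signature)          # |S|
    counts = Counter(signature)      # |Sc| for every distinct byte c
    return -sum((k / length) * math.log2(k / length) for k in counts.values())


def select_min_entropy(candidates: Iterable[bytes]) -> bytes:
    """Return the candidate with the lowest entropy (first one on ties)."""
    return min(candidates, key=entropy)
```

For example, `entropy(b"abab")` is 1 bit, because the two byte values each occur half of the time.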

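Probability Selector estimate:

• For a given sequence of S bytes B = B1B2…BS, estimate the probability p(B) that B occurs in a large body of normal, uninfected code:

• TS - the number of S-byte sequences in a large corpus of uninfected programs (so T3 is the number of 3-byte sequences)

• f(B) - the number of occurrences of B among those sequences

$$p(B_1 B_2 \ldots B_S) \approx \frac{f(B_1 B_2 B_3)\, f(B_2 B_3 B_4) \cdots f(B_{S-2} B_{S-1} B_S)}{f(B_2 B_3)\, f(B_3 B_4) \cdots f(B_{S-2} B_{S-1})\; T_3}$$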
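The estimate chains 3-byte counts in the numerator and the overlapping 2-byte counts in the denominator. The following Python sketch shows one way it could be computed from a corpus of benign files held in memory; `ngram_counts` and `estimate_probability` are hypothetical helper names, and a real corpus would likely need log-space arithmetic to avoid overflow for long signatures.

```python
from collections import Counter
from typing import Iterable


def ngram_counts(corpus: Iterable[bytes], n: int) -> Counter:
    """Count every n-byte sequence over all files in the corpus."""
    counts = Counter()
    for data in corpus:
        for i in range(len(data) - n + 1):
            counts[data[i:i + n]] += 1
    return counts


def estimate_probability(b: bytes, trigrams: Counter, bigrams: Counter, t3: int) -> float:
    """Estimate p(B) from 3-byte and 2-byte counts, as in the formula above."""
    if len(b) < 3:
        raise ValueError("the estimate needs at least 3 bytes")
    numerator = 1.0
    for i in range(len(b) - 2):           # f(B1B2B3) ... f(B_{S-2}B_{S-1}B_S)
        numerator *= trigrams[b[i:i + 3]]
    if numerator == 0:
        return 0.0                        # some trigram never occurs in the corpus
    denominator = float(t3)
    for i in range(1, len(b) - 2):        # f(B2B3) ... f(B_{S-2}B_{S-1})
        denominator *= bigrams[b[i:i + 2]]
    return numerator / denominator
```

A selector built on this would compute the estimate for every candidate and keep those with the lowest value below the chosen threshold.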

In general, the Signature Builder operates as follows: build a common functions library (CFL); given a malicious file, extract its functions and filter out the common ones using the CFL; and finally choose, from the remaining functions (the candidates), the best one to act as the malicious file's signature. The system can extract functions from malware using several algorithms, and it provides a signature for each malware file.

Flow diagram stages: Initialize Configuration, CFL Handling, Receive File from Client, Initialize the system, Extracting Functions, Filter Common Functions, Generate Candidates, Select Best Candidate, Return Signature.
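The stages from extraction to the returned signature can be sketched as a single Python function. This is a simplified illustration under several assumptions: functions are plain `bytes`, the CFL is a set matched exactly (the real system uses a similarity threshold), candidates are simply truncated to the maximum signature length, and `build_signature` with all of its parameter names is hypothetical.

```python
from typing import Callable, List, Optional, Set


def build_signature(
    malware: bytes,
    extract_functions: Callable[[bytes], List[bytes]],
    cfl: Set[bytes],
    select: Callable[[List[bytes]], bytes],
    max_signature_length: int,
) -> Optional[bytes]:
    """Extract functions, drop common ones, and pick one candidate."""
    # Extracting Functions: delegate to the configured extractor.
    functions = extract_functions(malware)
    # Filter Common Functions: exact match against the CFL (an assumption).
    unique = [f for f in functions if f not in cfl]
    # Generate Candidates: cap each remaining function at the maximum length.
    candidates = [f[:max_signature_length] for f in unique if f]
    if not candidates:
        return None  # no signature could be generated for this file
    # Select Best Candidate and Return Signature.
    return select(candidates)
```

Passing the extractor and the selector as callables mirrors the fact that both are configuration choices in the system.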

Evaluation Environment: evaluates the different configurations of the Signature Builder in order to judge the quality of the signatures they produce. The main idea is to check whether the signature of a malicious file appears in a control group of benign files. A good signature for a malicious file should not appear in any benign file.

The output consists of the following:

• Processed - The number of malware files for which the system managed to generate a signature.

• Processed (%) - Processed / Total Malware Files.

• Signature Hits - The number of unique malware files whose signature produced at least one false alarm (FA).

• Signature Hits (%) - Signature Hits / Processed.

• Unique Signature - The number of unique signatures that did not produce an FA.

• Different Files - The number of distinct files in the control group that have at least one hit.

• Different Files (%) - Different Files / Total Control Group Files.
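The output fields above could be computed along the following lines. This Python sketch assumes signatures and benign files are held in memory as `bytes` and that a hit is a plain substring match; the `evaluate` function and its dictionary keys are illustrative names only.

```python
from typing import Dict, Optional


def evaluate(signatures: Dict[str, Optional[bytes]],
             control_group: Dict[str, bytes]) -> Dict[str, float]:
    """Compute the evaluation outputs for one configuration.

    signatures maps a malware file name to its signature (None if none
    was generated); control_group maps a benign file name to its content.
    """
    def pct(part: int, whole: int) -> float:
        return 100.0 * part / whole if whole else 0.0

    produced = {name: sig for name, sig in signatures.items() if sig}
    hit_malware, hit_benign = set(), set()
    for name, sig in produced.items():
        for benign_name, data in control_group.items():
            if sig in data:                  # false alarm on a benign file
                hit_malware.add(name)
                hit_benign.add(benign_name)
    clean = {sig for name, sig in produced.items() if name not in hit_malware}
    return {
        "processed": len(produced),
        "processed_pct": pct(len(produced), len(signatures)),
        "signature_hits": len(hit_malware),
        "signature_hits_pct": pct(len(hit_malware), len(produced)),
        "unique_signatures": len(clean),
        "different_files": len(hit_benign),
        "different_files_pct": pct(len(hit_benign), len(control_group)),
    }
```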

Each configuration consists of the following inputs:

• CFL size in MB

• Maximum signature length in bytes

• Function similarity threshold

• Offset size in bytes

• Function extractor

• Function selector
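One way to enumerate configurations for the evaluation environment is a small parameter grid. The Python sketch below is an assumption about how this might be organized: the `Configuration` class, its field names, and the example values in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class Configuration:
    """One combination of the six inputs listed above (names are illustrative)."""
    cfl_size_mb: int
    max_signature_length: int
    similarity_threshold: float
    offset_size: int
    function_extractor: str
    function_selector: str


def configuration_grid(**options: Sequence) -> List[Configuration]:
    """Build every combination of the given option values."""
    names = list(options)
    return [
        Configuration(**dict(zip(names, values)))
        for values in product(*(options[name] for name in names))
    ]
```

For instance, two CFL sizes, two extractors ("ida", "data_mining") and four selectors ("random", "entropy", "cluster", "probability"), with single values for the other inputs, yield 16 configurations to evaluate.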