methods of impvin string effciency


Improving the Efficiency of Network Intrusion Detection Systems

B. Tech Project Report

Submitted in partial fulfillment of the requirements for the degree of

Bachelor of Technology

Nakul Aggarwal

Roll No: 02005022

under the guidance of

Prof. Om Damani

&

Prof. Krithi Ramamritham

Department of Computer Science and Engineering

Indian Institute of Technology Bombay

May 3, 2006



BTech Project Approval Sheet

I hereby state that contents of this work are mine. Any substantially borrowed material (cut-pasted or otherwise) including figures, tables and sketches have been duly acknowledged.

Nakul Aggarwal (Roll no: 02005022)

Date :

I hereby give my approval for the B.Tech Project Report titled Improving The Efficiency of Intrusion Detection Systems by Nakul Aggarwal (02005022) to be submitted.

Prof. Om Damani

Prof. Krithi Ramamritham

Date :


Acknowledgments

I would like to express my sincere gratitude towards my guides Prof. Om Damani and Prof. Krithi Ramamritham for their invaluable, consistent support and guidance. They have been generous enough to let me pursue the work of my interest.

Nakul Aggarwal,
May 2006
IIT Bombay.


Abstract

Network intrusion detection systems have become standard components of security infrastructures. The elements central to intrusion detection are: the resources to be protected in a target network (computer systems, file systems, network information, etc.); models that characterize the normal or legitimate behavior of the network; and techniques that compare actual network activity with the established models and identify activity that is abnormal or intrusive.

There are two approaches to intrusion detection, depending on whether prior information about attacks is available. In the first, knowledge of earlier intrusions is used to decide whether new flows are intrusive. In the second, the normal behavior of a network is learned first, and new flows are then classified as normal or intrusive. Here we look at some of the approaches, the algorithms, and the issues that remain unsolved. We then examine how IDSs can be evaded by overflowing their network buffers with out-of-order packets, and propose a solution. In addition, implementing inline and adaptive clustering mechanisms for anomaly detection at high traffic rates has been a limitation of anomaly detection approaches. ADWICE was the first effort in this field, but since it uses a distance-based clustering mechanism it suffers from inefficient clustering. We propose attaching additional density-based statistical variables to each cluster to improve the efficiency.


Contents

1 Introduction

2 Misuse Detection
2.1 Approaches to Misuse Detection

3 Algorithms in Misuse Detection
3.1 Boyer Moore
3.2 Knuth-Morris-Pratt
3.3 Aho Corasick
3.4 Bloom Filters
3.5 NFA/DFA at hardware level

4 Snort
4.1 Introduction
4.2 Snort Rules
4.3 Architecture of String Matching
4.4 Working Model of the code
4.5 Some More about Snort Powers
4.5.1 Preprocessors
4.5.2 Inline Mode
4.6 Multi-Pattern String matching Algorithms in Snort
4.6.1 Boyer-Moore Multi-pattern String Matching
4.6.2 Wu-Manber Multi-pattern String Matching
4.6.3 Aho-Corasick Multi-pattern Matching
4.6.4 Aho/Corasick with Sparse Matrix Implementation
4.6.5 SFKSearch using Tries

5 Bro

6 Issues with Pattern Matching

7 Issue of Out of Order packets
7.1 Solution
7.2 Example
7.3 Limitations

8 Anomaly Detection
8.1 Approaches to Anomaly Detection

9 Clustering Algorithms for Anomaly Detection
9.1 BIRCH - Balanced Iterative Reducing and Clustering
9.2 DBSCAN - Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

10 ADWICE-TRAD

11 Conclusion and Future Work
11.1 Future Work


Chapter 1

Introduction

There has been a significant rise in the number of network attacks: break-ins that use techniques as simple as buffer overflows, new worms that bring whole networks down, and attacks on web servers through exploitation of software bugs or DoS attacks. Because of the growing amount of personal information at stake on networks and ever-expanding internet/intranet threats, much work is going on to combat these attacks. Intrusion detection is primarily concerned with detecting illegal activities and acquisitions of privileges that cannot be detected by information flow and access control modules. Intrusion detection can be of two types: Pattern Matching or Anomaly Detection. In pattern matching, the system inspects network traffic for matches against exact, precisely described patterns, while anomaly detection learns the normal network traffic and then detects network intrusions by classifying the real traffic as normal or anomalous.

Increasing network utilization and the weekly increase in the number of critical application-layer exploits imply that IDS designers must find ways to speed up their attack analysis techniques when monitoring a fully saturated network, while keeping the number of false positives low. Studies on empirical data indicate that the number of signatures (each of which represents a unique malicious activity) has grown around 2.5 times in the last 3 years [20]. In addition, tens of vulnerabilities in various software packages are disclosed each day on security-related lists, newsgroups and the Bugtraq mailing list.

Signature matching is the core of malicious traffic/event detection engines, independent of where in the network it is deployed, i.e. whether at the network perimeter (typically known as the Demilitarized Zone (DMZ)), at network level (NIDS) or at host level (HIDS). Implementations exist as software products, hardware chips and pattern matching engines. Some of the most popular software NIDS are Snort, Bro, Dragon IDS, etc. Signature matching engines at hardware level implement the signatures with the help of lookup tables (LUTs), TCAMs and NFAs/DFAs, and pattern matching is done in the router itself for each packet, maintaining session flow information per IP.

Snort is one of the most widely deployed IDS tools. Statistics say that signature matching is the most computationally intensive part of an IDS. In Snort, up to 70% of the total execution time goes into this process, which clearly reflects the amount of work that still needs to be done. Also, beyond plain pattern matching, stateful pattern matching raises the issues of running out of memory and excessive CPU usage, so much work still needs to be done in this field. Pattern matching for network security and


Chapter 2

Misuse Detection

Misuse detection aims to detect well-known attacks, as well as slight variations of them, by characterizing the rules that govern these attacks. Systems based on this approach use different models, such as state transition analysis or a more formal pattern classification. By its nature, misuse detection tends to have a low number of false positives but is unable to detect attacks that lie beyond its knowledge. Some example signatures:

1. IP packets that exceed the maximum legal length (65535 octets)

2. /User-Agent\:[^\n]+PassSickle/i
This is an example signature for capturing packets that contain the Trojan horse PassSickle.
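To make the second example concrete, here is a minimal Python sketch of how such a signature could be applied to a payload. It assumes the character class in the signature is the negated class `[^\n]` (anything up to the end of the header line); the function name `is_suspicious` and both sample payloads are hypothetical and only serve the illustration.

```python
import re

# Assumed reading of the PassSickle signature: "User-Agent:" followed by
# any characters on the same line, then the string "PassSickle".
# The trailing /i of the signature maps to re.IGNORECASE.
SIGNATURE = re.compile(r"User-Agent\:[^\n]+PassSickle", re.IGNORECASE)

def is_suspicious(payload: str) -> bool:
    """Return True if the payload matches the signature anywhere."""
    return SIGNATURE.search(payload) is not None

# Hypothetical payloads, invented for this example.
benign = "GET / HTTP/1.1\r\nUser-Agent: Mozilla/5.0\r\nHost: example.org\r\n\r\n"
malicious = "GET / HTTP/1.1\r\nUser-Agent: foo PassSickle 1.0\r\n\r\n"

print(is_suspicious(benign), is_suspicious(malicious))
```

Because `[^\n]+` cannot cross a line break, the signature only fires when the marker string appears on the User-Agent line itself.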

2.1 Approaches to Misuse Detection

Misuse detection approaches can be classified into the following categories:

Signature Analysis

Association Rules

State Transition Analysis

Data Mining Approaches

Misuse detection systems have knowledge of both normal and anomalous data, and new flows are classified into one of the two categories using one of the approaches listed above. The anomalous data is represented by signatures, as in the example above; all data that matches no signature is considered normal.

Signature analysis or Pattern Matching is the technique of matching the data against a predefined ruleset or set of signatures, using any of the pattern matching algorithms discussed in Chapter 3.

Association Rules or Expert systems define intrusions as a set of rules and corresponding actions, which are fired whenever some rule matches.


State Transition Analysis Here the known intrusions are defined as a deterministic finite state machine with some end nodes; every event may take the machine to the next state, depending on the transition input. Bro (refer to Chapter 5) is an example of this kind of approach, where each match of some signature, flag, etc. triggers the correlation engine, which makes a transition on the state machine.

Data Mining Approaches use statistical classification techniques like Naive Bayes, decision trees, neural networks, genetic algorithms, etc. to classify new events/flows as normal or anomalous. Being statistical, this requires some data to build the models against which new data is matched. Hence, learning data in which flows are pre-labelled as normal and/or anomalous is first fed into the machine to build the models, and these learned models are then used for further classification.

While misuse detection is the most widely deployed mechanism for NIDS, it suffers from the following flaws, which have led to the search for more efficient techniques for intrusion detection. Some of the limitations are:

1. Since it uses a predefined set of signatures, it cannot detect new threats or intrusions.

2. Over the last few years, networks have seen a large variety of intrusions. The resulting large signature set leads to a large number of false positives and requires considerable human effort to tune the signature set to one's network needs and requirements.

3. Updating of signatures. These systems need to be regularly updated to the newest ruleset from the respective sites to combat each day's new attacks.

4. Signature obfuscation. Here an attack eludes the NIDS by exploiting the fact that a signature does not cover all instances of the attack. For example, given the signature "blaster", the NIDS can easily be evaded by malicious packets that contain the string "mlaster" instead.

5. Other IDS evasion and invasion techniques. There is a large set of these, thoroughly discussed in [12]. For example, the signature matching ruleset can be evaded by inserting, between the packets that contain the main string, an additional packet with an arbitrary string and a low TTL. The extra substring prevents the matching engine from matching, but the end host is still affected since it never receives the additional low-TTL packet.

6. Latency in development. Systems of this type involve heavy manual effort throughout their lifetime: installation, optimizing the ruleset, regularly updating signatures, and checking the alerts and hence the intrusions.

7. Association rules suffer from all of the above, with the additional overhead of the clumsiness that arises as the number of attributes to match keeps increasing.

Rather than looking for a superset of misuse detection able to detect every intrusion, people looked for ways to remove these limitations. This gave rise to anomaly detection techniques, which are able to detect new intrusions and do not suffer from large-signature-set issues (since they do not use any signature set).


Misuse detection via signature matching is the most widely accepted approach because of the large research base, which provides a constant and up-to-date flow of signatures, typically within hours of a new vulnerability, exploit or worm being detected. One of the most widely deployed NIDS tools is Snort, which also uses this approach. A detailed study of the Snort architecture, techniques, algorithms and code is given in Chapter 4.

Pattern matching matches the input flow against a given set of signatures. Signatures can involve flag matching, IP addresses, or content in the payload (which is present in most signatures). Hence string matching algorithms such as Boyer-Moore are most commonly deployed as part of these NIDS. Let us look at some of the common pattern matching algorithms.


Chapter 3

Algorithms in Misuse Detection

Here we discuss some of the basic, must-know algorithms for string or pattern matching. These include

1. Simple string matching

Boyer-Moore

Knuth-Morris-Pratt (KMP)

2. State Machine Matching

Aho/Corasick

3. Hardware Solutions

Bloom Filters and Extended Bloom Filters

NFA/DFA implementation at hardware level

3.1 Boyer Moore

Main features

performs the comparisons from right to left;

preprocessing phase in $O(m + \sigma)$ time and space complexity, where $\sigma$ is the size of the character set and $m$ is the size of the pattern;

searching phase in $O(mn)$ time complexity (for example, searching for $a^m$ in $a^n$);

$3n$ text character comparisons in the worst case when searching for a non-periodic pattern, and $n$ in the average case;

$O(n/m)$ best performance (for example, searching for $a^{m-1}b$ in $b^n$).

The Boyer-Moore algorithm [5] uses two different heuristics for determining the maximum possible shift distance in case of a mismatch: the "bad character" and the "good suffix" heuristics. The bad character heuristic works as follows: if a text character fails to match the corresponding pattern character, the pattern is shifted so that the mismatching text character is aligned with the rightmost position at which it appears inside the pattern. The good suffix heuristic works as follows: if a mismatch is found in the middle of the pattern, the pattern is shifted to the next occurrence of the matched suffix in the pattern. Both heuristics can lead to a shift distance of m. For the bad character heuristic this is the case if the first comparison causes a mismatch and the corresponding text symbol does not occur in the pattern at all. For the good suffix heuristic this is the case if only the first comparison was a match, but that symbol does not occur elsewhere in the pattern. With the help of the preprocessed "bad character" and "good suffix" values, the shift to apply is the maximum of the two.

The preprocessing for the good suffix heuristic is rather difficult to understand and to implement. Therefore, some versions of the Boyer-Moore algorithm leave the good suffix heuristic out. The argument is that the bad character heuristic would be sufficient and the good suffix heuristic would not save many comparisons. However, this is not true for small alphabets.
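As an illustration of the simplified variant mentioned above, the following Python sketch implements Boyer-Moore with the bad character heuristic only; the good suffix heuristic is deliberately left out. The function names `bad_char_table` and `bm_search` are chosen for this example and do not come from any particular implementation.

```python
def bad_char_table(pattern):
    """Map each character to the index of its rightmost occurrence in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}

def bm_search(text, pattern):
    """Return the start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    last = bad_char_table(pattern)
    hits = []
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare from right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            hits.append(s)
            s += 1
        else:
            # Align the mismatching text character with its rightmost
            # occurrence in the pattern; always move at least one step.
            s += max(1, j - last.get(text[s + j], -1))
    return hits

print(bm_search("here is a simple example", "example"))
```

When the mismatching text character does not occur in the pattern at all, `last.get` returns -1 and the shift becomes `j + 1`, which reaches the full pattern length m if the very first comparison fails.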

3.2 Knuth-Morris-Pratt

Main features

Performs the comparisons from left to right;

Preprocessing phase in $O(m)$ space and time complexity;

Searching phase in $O(n + m)$ time complexity (independent of the alphabet size).

This was a significant improvement in memory requirements over string matching based on finite state machines. It precomputes an auxiliary function (an array of m entries) that contains the information about jumping from the current state to the next state. During the string matching process, this function gives the optimum shift needed in the case of a mismatch. The optimum shift depends on the longest prefix of the pattern that is also a suffix of the part of the pattern matched so far.
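The auxiliary function and its use during the search can be sketched in Python as follows. The names `prefix_function`, `kmp_search` and `pi` are assumptions made for this example; `pi[i]` holds the length of the longest proper prefix of the pattern that is also a suffix of `pattern[:i + 1]`.

```python
def prefix_function(pattern):
    """pi[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    pi = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = pi[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        pi[i] = k
    return pi

def kmp_search(text, pattern):
    """Return the start positions of all occurrences of pattern in text."""
    if not pattern:
        return []
    pi = prefix_function(pattern)
    hits = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = pi[k - 1]
    return hits

print(prefix_function("ababaca"))
print(kmp_search("abababca", "abab"))
```

The text pointer `i` never moves backwards, which is where the linear search time comes from.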

3.3 Aho Corasick

Main Features

Performs the comparison from left to right;

Searching phase in $O(n)$ time complexity;

Preprocessing phase has $O(m\sigma)$ space requirements, where $\sigma$ is the alphabet size.

The Aho/Corasick string matching automaton for a given finite set $P$ of patterns is a (deterministic) finite automaton $G$ accepting the set of all words containing a word of $P$. $G$ consists of the following components:

1. a finite set $Q$ of states

2. a finite alphabet $\Sigma$

3. a transition function $\delta : Q \times \Sigma \to Q \cup \{\mathit{fail}\}$

4. an initial state $q_0$ in $Q$

5. a set $F$ of final states


The transition table is built during preprocessing; for each state it records which state to jump to for each character of the alphabet. The matcher simply traverses the string to be matched, making transitions via the transition function, which tells it which state to jump to for each character. Whenever it reaches a state in F, the engine reports a match. For matching a single simple string it does not perform very well, but when there are multiple patterns, or when pattern matching is done at the regular-expression level, it is one of the best options.
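A compact way to see the automaton at work is the Python sketch below. It is an illustration only: it stores the transitions in per-state dictionaries rather than a full table, and it resolves the "fail" case through explicit failure links. The names `build_automaton`, `search`, `goto`, `fail` and `out` are assumptions made for this example.

```python
from collections import deque

def build_automaton(patterns):
    """Build goto transitions, failure links and output sets for the patterns."""
    goto = [{}]   # goto[state][char] -> next state; state 0 is the initial state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns recognized on entering each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first traversal: shallower states get their links first.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out

def search(text, automaton):
    """Return (start, pattern) for every pattern occurrence in text."""
    goto, fail, out = automaton
    hits = []
    s = 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits

automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))
```

Each text character is consumed exactly once regardless of how many patterns are loaded, which is why the approach suits large signature sets.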

3.4 Bloom Filters

A Bloom filter is a space-efficient randomized data structure used for concisely representing a set in order to support approximate membership queries. The space efficiency is achieved at the cost of a small probability of false positives. This means that a Bloom filter could wrongly accept some entry even if it does not belong to the set under consideration. However, wise selection of the filter's parameters can guarantee a small false positive probability.

Given a string X, the Bloom filter computes k hash functions on it, producing k hash values ranging from 1 to m. It then sets k bits in an m-bit long vector at the addresses corresponding to the k hash values. The same procedure is repeated for all the members of the set. This process is called "programming" of the filter. The query process is similar to programming: a string whose membership is to be verified is input to the filter, and the Bloom filter generates k hash values using the same hash functions it used to program the filter. The bits in the m-bit long vector at the locations corresponding to the k hash values are looked up. If at least one of these k bits is found not set, then the string is declared to be a nonmember of the set. If all the bits are found to be set, then the string is said to belong to the set with a certain probability. Therefore, a Bloom filter can produce false positives, where an item is accepted although it does not actually belong to the set.

Lately there have been many improvements to this technology as well, with modifications leading to counting Bloom filters, compressed Bloom filters, Bloomier filters, etc.
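The programming and query steps of a plain Bloom filter can be sketched in Python as below. This is a software illustration under stated assumptions, not a hardware design: the k hash functions are derived from SHA-256 with a per-function prefix, positions run from 0 to m - 1 rather than 1 to m, and one byte is used per bit for readability. The class name `BloomFilter` and its parameters are chosen for this example.

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m = m                   # number of bits in the vector
        self.k = k                   # number of hash functions
        self.bits = bytearray(m)     # one byte per bit, for clarity

    def _positions(self, item):
        """Yield the k bit positions for an item (assumed hash construction)."""
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        """Programming: set the k bits that belong to the item."""
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        """Query: a single unset bit proves the item was never added."""
        return all(self.bits[pos] for pos in self._positions(item))

signatures = BloomFilter(m=1024, k=3)
signatures.add("blaster")
print("blaster" in signatures)
```

A positive answer only means "possibly in the set", so in a matching engine it would still have to be confirmed by an exact comparison.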

3.5 NFA/DFA at hardware level

Sidhu and Prasanna [18] were the first to implement NFA matching on programmable logic, using $O(n^2)$ logic while still providing $O(1)$ access time. They implemented a One-Hot Encoding (OHE) scheme, where one flip-flop is associated with each state and at any time only one is active. Combinational logic associated with each flip-flop ensures that this 1 bit is transferred to the flip-flop corresponding to the next state in the DFA. To fit the existing patterns into logic, first a DFA is formed and then an NFA. Each transition is then mapped to this flip-flop structure. By handling the transitions in the NFA through providing the same input to the next state as well, and by using LUTs to compare the input character, they are able to map the patterns onto FPGAs.

The reported results are striking: for string matching on an 11 MB file, the reported CPU time and maximum memory usage were 0.34 sec and 580 KB respectively, while the same matching done by a software DFA matching engine took 87309.38 sec and 229 MB respectively.

There have been many modifications and advancements to this approach since this initial effort.


Chapter 4

Snort

4.1 Introduction

Snort can perform real-time packet logging, protocol analysis, and content searching/matching. It can be used to detect a variety of attacks and probes such as stealth port scans, CGI-based attacks, Address Resolution Protocol (ARP) spoofing, and attacks on daemons with known weaknesses. Snort utilizes descriptive rules to determine what traffic it should monitor and a modularly designed detection engine to pinpoint attacks in real time. When an attack is identified, Snort can take a variety of actions to alert the systems administrator to the threat. In its first releases Snort used brute-force matching, which was very slow. The first boost to signature matching came with the implementation of the Boyer-Moore pattern matching algorithm. Snort has come a long way since these initial implementations; we will see some of the later ones soon.

4.2 Snort Rules

A sample Snort rule can be written as:

    alert udp $EXTERNAL_NET any -> $HOME_NET 177 (msg:"MISC xdmcp query"; content:"|00 01 00 03 00 01 00|"; reference:arachnids,476; classtype:attempted-recon; sid:517; rev:1;)

This rule can be broken down into 2 parts: the Rule Header (everything up to the first parenthesis) and the Rule Options (everything inside the parentheses). Rule headers form the RTNs (Rule Tree Nodes) in the Snort matching architecture, while rule options form the OTNs (Option Tree Nodes). How this helps in matching is covered in the next section.
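The split into header and options can be illustrated with a few lines of Python. This is a simplified sketch, not Snort's parser: it assumes a single-line rule, a seven-field header, and no semicolons inside option values. The function name `split_rule` and the dictionary keys are chosen for this example.

```python
RULE = ('alert udp $EXTERNAL_NET any -> $HOME_NET 177 '
        '(msg:"MISC xdmcp query"; content:"|00 01 00 03 00 01 00|"; '
        'reference:arachnids,476; classtype:attempted-recon; sid:517; rev:1;)')

def split_rule(rule):
    """Split a rule into its header fields and its list of (key, value) options."""
    header_text, _, option_text = rule.partition("(")
    action, proto, src, sport, direction, dst, dport = header_text.split()
    header = {
        "action": action, "proto": proto,
        "src": src, "src_port": sport,
        "direction": direction,
        "dst": dst, "dst_port": dport,
    }
    options = []
    # Drop the closing parenthesis, then split on the option separator.
    for item in option_text.rstrip().rstrip(")").split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(":")
            options.append((key.strip(), value.strip()))
    return header, options

header, options = split_rule(RULE)
print(header)
print(options)
```

Rules that share the same `header` dictionary are the ones that would end up under a common header node, each contributing its own option list.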

4.3 Architecture of String Matching

Snort contains a global variable, RuleList, which has four RTN head nodes. The four heads correspond to the four protocols TCP, ICMP, IP and UDP, and each is the head of an RTN linked list. Each rule in the rules file is added to the respective list. Since many rules share the same Rule Header, each RTN node contains a pointer to the head of an OTN linked list that holds all the rules with the same RTN header. Each OTN node further contains other flags that need to be matched (for example, that the ACK flag should be set); these checks are performed before string matching, to avoid unnecessary pattern matching when even the flags do not match. When the flags do match, the engine calls the stored function pointer to perform any other necessary checks and the string matching, using one of the string matching algorithms.
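The layered lookup described above (protocol, then header, then per-rule checks) can be modelled in Python roughly as follows. This is a simplified sketch and not Snort's actual data structure: dictionaries and lists stand in for the linked lists, headers are reduced to (protocol, source port, destination port), and the rules with sids 1 and 2, the flag value and the packet are hypothetical.

```python
from collections import defaultdict

PROTOCOLS = ("tcp", "udp", "icmp", "ip")

def build_rule_lists(rules):
    """Group rules as protocol -> header (RTN) -> list of option sets (OTNs)."""
    rule_lists = {proto: defaultdict(list) for proto in PROTOCOLS}
    for header, options in rules:
        rule_lists[header[0]][header].append(options)
    return rule_lists

def match_packet(rule_lists, packet):
    """Return the sids of all rules whose header, flags and content match."""
    sids = []
    for header, otns in rule_lists[packet["proto"]].items():
        _, src_port, dst_port = header
        # Header mismatch: every OTN under this RTN is skipped at once.
        if src_port not in ("any", packet["sport"]):
            continue
        if dst_port not in ("any", packet["dport"]):
            continue
        for otn in otns:
            # Cheap flag check first; string matching only if it passes.
            if otn.get("flags") and otn["flags"] != packet.get("flags"):
                continue
            if otn["content"] in packet["payload"]:
                sids.append(otn["sid"])
    return sids

rules = [
    (("tcp", "any", 80), {"sid": 1, "content": b"/etc/passwd"}),
    (("tcp", "any", 80), {"sid": 2, "content": b"cmd.exe", "flags": "A"}),
    (("udp", "any", 177), {"sid": 517, "content": bytes.fromhex("00010003000100")}),
]
rule_lists = build_rule_lists(rules)
packet = {"proto": "tcp", "sport": 40000, "dport": 80, "flags": "A",
          "payload": b"GET /scripts/cmd.exe HTTP/1.0"}
print(match_packet(rule_lists, packet))
```

The two TCP rules share one header entry, so the header comparison is done once for both of them.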

By default, the Wu-Manber string matching algorithm is used. But Snort also contains implementations of a large number of other pattern matching engines, including a Modified Wu-Manber Style Multi-Pattern Matcher, the SFK matching engine, Aho/Corasick, optimizations of Aho/Corasick, a sparse matrix implementation of Aho/Corasick, etc. We will discuss some of these algorithms in Section 4.6.

This RuleList is built during the initialization of the engine. But lately the number of rules in the Snort rule database has exceeded the 3000 mark, so that the above 3-dimensional linked lists [RTN, OTN, function pointers] are not able to work at high network speeds. Therefore one more optimization has been made: a fast packet classification engine has been implemented, adding a 4th dimension to the above structure.

This fourth dimension is port-based classification of rules, and it is done before the RTN lists are created. That is, the ruleset is classified by port after being split by the four protocols mentioned above. The authors assume that, given the port values (source and destination), a rule can be dropped into one of the following classes.

1. Unique Source Port

2. Unique Destination Port

3. Unique Source and Destination Port

4. Generic (source and destination ports can take any value)

Each class now has an array of linked lists of size MAX_PORT (64*1024). This allows O(1) mapping of a rule to its appropriate list on the basis of its port value. This additional dimension speeds up string matching, since the number of rules to be matched against the incoming traffic is greatly reduced.
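The four port classes and the O(1) lookup can be sketched in Python as below. This is an illustrative model rather than Snort's code: Python lists replace the linked lists, the class with both ports fixed is kept in a dictionary keyed by the port pair, and the class name `PortGroups`, its methods and the rule ids other than 517 are assumptions made for the example.

```python
MAX_PORT = 64 * 1024

class PortGroups:
    def __init__(self):
        self.src_only = [[] for _ in range(MAX_PORT)]  # unique source port
        self.dst_only = [[] for _ in range(MAX_PORT)]  # unique destination port
        self.src_dst = {}                              # unique source and destination port
        self.generic = []                              # both ports are "any"

    def add_rule(self, sid, sport=None, dport=None):
        """Drop a rule into exactly one class; None stands for 'any' port."""
        if sport is not None and dport is not None:
            self.src_dst.setdefault((sport, dport), []).append(sid)
        elif sport is not None:
            self.src_only[sport].append(sid)
        elif dport is not None:
            self.dst_only[dport].append(sid)
        else:
            self.generic.append(sid)

    def candidates(self, sport, dport):
        """Rules that still have to be checked for a packet with these ports."""
        return (self.src_only[sport] + self.dst_only[dport]
                + self.src_dst.get((sport, dport), []) + self.generic)

groups = PortGroups()
groups.add_rule(517, dport=177)          # the xdmcp rule: destination port 177
groups.add_rule(1)                       # generic rule
groups.add_rule(2, sport=1234)
groups.add_rule(3, sport=53, dport=53)
print(groups.candidates(40000, 177))
```

Each lookup is a constant number of index operations, and only the returned candidates proceed to header, flag and string matching.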

4.4 Working Model of the code

Snort's architecture is focused on performance, simplicity, and flexibility. There are three primary subsystems that make up Snort: the packet decoder, the detection engine, and the logging and alerting subsystem.

These subsystems ride on top of the libpcap promiscuous packet sniffing library, which provides a portable packet sniffing and filtering capability. Program configuration, rules parsing, and data structure generation take place before the sniffer section is initialized, keeping the amount of per-packet processing to the minimum required to achieve the base program functionality.


4.5 Some More about Snort Powers

4.5.1 Preprocessors

With the arrival of the term "Anomaly Detection", there was high demand for it in Snort as well. Since rule-based matching is done in the detection engine, protocol anomaly detection and many other functions that are independent of rules come under the preprocessor category. However, each added preprocessor demands more processing time, affecting the main strength of Snort, i.e. fast rule-based matching. Hence the authors added this functionality as modular plug-ins, similar to modules in the Linux kernel, which can be deactivated whenever they are not needed or when they affect Snort's performance.

Preprocessors are pluggable components of Snort, introduced in version 1.5. They are located just after the protocol analysis module and before the detection engine, and do not depend on rules. They are called whenever a packet arrives, but just once; the detection plugins, on the other hand, do depend on rules and may be applied many times for a single packet. SPPs (Snort Preprocessors) can be used in different ways: they can look for a specific behavior (portscan, flowportscan), provide support for further analysis (like flow), or just collect certain information (like perfmonitor).

Hola Anonimo has given a very basic tutorial [1] on how to write a preprocessor plugin for Snort. Some of the major goals of preprocessors were:

1. decreasing the number of false positives,

2. adding anomaly detection techniques to Snort, and

3. improving pattern matching when a pattern extends over multiple segments or fragments.

We now look at the last two of these goals.

Anomaly Detection Anomaly detection preprocessors include both protocol anomaly detection (via protocol-specific preprocessors like arpspoof, telnet_decode, etc.) and the more advanced statistical approaches to anomaly detection via the Spade plugin (a brief description is given in Appendix A).

Pattern Matching over Multiple Packets This is achieved through the Stream4 and Frag2 preprocessors. The former adds TCP statefulness and session reassembly, so that connection status and information can be stored; this provides more information in alerts, removes unnecessary checks, and allows patterns that extend over multiple packets to be checked. The latter prevents IDS evasion and invasion via fragmented packets [15].

4.5.2 Inline Mode

Inline mode is an optional mode in Snort which actually increases the processing speed of Snort. In this mode it works at the same level as iptables: each packet is first processed by Snort and only then passed on to the Linux kernel, which avoids the significant overhead of kernel processing in cases where the packet needs to be dropped.

Snort inline obtains packets from iptables instead of libpcap and then uses new rule types to help iptables pass or drop packets based on Snort rules.


4.6 Multi-Pattern String matching Algorithms in Snort

1. Boyer-Moore

2. Wu-Manber

3. Aho/Corasick

4. Sparse matrix with Aho/Corasick

5. Tries in SFKSearch

4.6.1 Boyer-Moore Multi-pattern String Matching

This is the same algorithm discussed earlier in Section 3.1, except that there is a large number of patterns rather than just one. Matching is still done sequentially, one pattern at a time.

4.6.2 Wu-Manber Multi-pattern String Matching

This is the default string matching algorithm used in Snort. It improves on Boyer-Moore in 2 respects. (It is assumed that all patterns have the same length and that each pattern is broken into substrings of equal length: if there are k patterns of length m, blocks of length b = log(mk) are formed; in practice, however, b = 2 or 3.)

1. A SHIFT table, which is a b-byte shift table preprocessed during initialization (all possible b-character strings over the given alphabet are considered). Blocks of characters are thus matched by mapping them to unique integer values. The table is used to determine how many characters of the text can be shifted (skipped) while the text is scanned. Only when the shift value indicates a matched fragment of a pattern (i.e. value 0) is pattern matching performed.

2. Rather than matching against all the patterns, the algorithm exploits hash functions: a HASH table is built initially and every pattern is placed in the appropriate table entry. The construction of the hash table is quite interesting, since the hash value of each pattern is computed from its first b-length substring, i.e. from the prefix of the pattern.

An ambiguity arises when the SHIFT table reports a match but there is no corresponding entry in the hash table (since the hash depends only on the first b-character substring of the pattern); in that case only 1 character is skipped.

The reported matching times are nearly two times faster than GNU grep. The scanning operation was also shown to be in $O(bN/m)$, where $N$ is the size of the input.
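The interplay of the SHIFT table and the hash buckets can be illustrated with the simplified Python sketch below. It makes several assumptions that differ from a production matcher: Python dictionaries replace the fixed-size tables, only the first m characters of each pattern (m being the shortest pattern length) feed the tables, and candidates are bucketed by the last block of that m-character prefix, as in the original Wu-Manber formulation, rather than by the first block as described above. All function names are chosen for this example.

```python
def build_tables(patterns, b=2):
    """Build the SHIFT table and the candidate buckets for block size b."""
    m = min(len(p) for p in patterns)   # window length
    if m < b:
        raise ValueError("every pattern must be at least b characters long")
    default = m - b + 1                 # shift for blocks that occur in no pattern
    shift = {}
    buckets = {}
    for p in patterns:
        for j in range(m - b + 1):
            block = p[j:j + b]
            # Distance from the end of this block to the end of the window.
            shift[block] = min(shift.get(block, default), m - b - j)
        buckets.setdefault(p[m - b:m], []).append(p)
    return m, shift, default, buckets

def wm_search(text, patterns, b=2):
    """Return (start, pattern) for every pattern occurrence in text."""
    m, shift, default, buckets = build_tables(patterns, b)
    hits = []
    i = m - 1                           # index of the last character of the window
    while i < len(text):
        block = text[i - b + 1:i + 1]
        s = shift.get(block, default)
        if s == 0:
            # Possible match: verify only the patterns in this bucket.
            start = i - m + 1
            for p in buckets[block]:
                if text.startswith(p, start):
                    hits.append((start, p))
            i += 1
        else:
            i += s
    return hits

print(wm_search("ushers and his heirs", ["hers", "his", "she"]))
```

Most windows end in a block that occurs in no pattern, so the scan advances by `m - b + 1` characters at a time and full comparisons are rare.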

4.6.3 Aho-Corasick Multi-pattern Matching

The Aho/Corasick implementation first forms a combined DFA for all patterns. Since this is done once during initialization, there is no overhead of forming a DFA for each pattern, nor of traversing the patterns (individually or as a set); for each new character just one step has to be taken. But the memory overheads are huge. Also, the state held at each step is large, because there are multiple copies of active DFAs: a new DFA gets activated at each new input character in addition to the existing ones. Of course some also drop out, but the difference is huge.

The power of the algorithm is that it is unaffected by variance in the size of the patterns, and its worst-case and average-case performance are the same.

4.6.4 Aho/Corasick with Sparse Matrix Implementation

The enhanced Aho-Corasick design uses an optimized vector storage design for storing the transition table. This memory-efficient variant uses sparse matrix storage to reduce the memory requirements and further improve performance on large pattern groups. The author [13] reports 1.2 to 1.7 times faster performance with the sparse-matrix version and 1.5 to 2.5 times with the full-matrix version.

Sparse-Row format
Vector: 0 0 0 2 4 0 0 0 6 0 7 0 0 0 0 0 0
Sparse-Row Storage: 8 4 2 5 4 9 6 11 7

Now, for each DFA state, rather than having a 256-entry vector in which most values are 0, a sparse representation stores each transition element together with its value. Clearly, O(1) transition time is no longer possible in this implementation, since the new vector has to be traversed to find the transition element. The memory requirements go down by a factor of four, which is quite significant. The author also discusses some other compact representations, namely Compressed Sparse Vector Format, Banded-Row Format and CSR Matrix Format.
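The conversion and the resulting lookup cost can be sketched in Python as follows. The sketch assumes the layout suggested by the example: a leading count of the values that follow, then (index, value) pairs with 1-based indices, so that the ninth entry of the vector is stored as the pair (9, 6). The function names `to_sparse_row` and `lookup` are chosen for this illustration.

```python
def to_sparse_row(vector):
    """Encode a vector as [count, index1, value1, index2, value2, ...]."""
    pairs = []
    for index, value in enumerate(vector, start=1):   # 1-based positions
        if value != 0:
            pairs += [index, value]
    return [len(pairs)] + pairs

def lookup(sparse_row, index):
    """Find the value stored for an index by scanning the pairs; 0 if absent."""
    count = sparse_row[0]
    for i in range(1, count + 1, 2):
        if sparse_row[i] == index:
            return sparse_row[i + 1]
    return 0

vector = [0, 0, 0, 2, 4, 0, 0, 0, 6, 0, 7, 0, 0, 0, 0, 0, 0]
row = to_sparse_row(vector)
print(row)
print(lookup(row, 9), lookup(row, 1))
```

The scan in `lookup` is the price paid for the smaller table: a transition now costs time proportional to the number of stored pairs instead of a single array access.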

4.6.5 SFKSearch using Tries

The term trie comes from the word retrieval. A trie is a k-ary position tree. It is constructed from input strings, i.e. the input is a set of n strings S1, S2, ..., Sn, where each Si consists of symbols from a finite alphabet set Σ and has a unique terminal symbol $. This algorithm is used for low-memory situations in Snort. The algorithm builds a trie. Each level in the trie is a sequential list of sibling nodes, each of which contains a pointer to matching rules, a character that must be matched to traverse to its child node, and a pointer to the next sibling node. The algorithm uses a bad character shift table to advance through the search text until it encounters a possible start of a match string, at which point it traverses the trie looking for matches. If there is a match between the character in the current node and the current character in the packet, the algorithm follows the child pointer and increments the packet character pointer. Otherwise, it follows the sibling pointer until it reaches the end of the list, at which point it recognizes that no further matches are possible. If matching fails, the algorithm backtracks to the point at which the match started and considers matches starting from the next character in the packet.

While its worst-case performance is quite poor in comparison to Aho-Corasick, its low memory requirements make it an appropriate substitute at times.
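The child and sibling pointer traversal described above can be sketched as follows. This is a simplified illustration and not Snort's SFKSearch code: the names `Node`, `insert` and `search` are hypothetical, and the bad character shift table is left out, so every text position is tried as a possible start.

```python
class Node:
    def __init__(self, char):
        self.char = char       # character needed to enter this node
        self.child = None      # first node of the next level
        self.sibling = None    # next alternative on the same level
        self.rules = []        # patterns that end at this node

def insert(root, pattern):
    node = root
    for ch in pattern:
        cur = node.child
        while cur is not None and cur.char != ch:
            cur = cur.sibling
        if cur is None:
            # Not on this level yet: link a new node at the list head.
            cur = Node(ch)
            cur.sibling = node.child
            node.child = cur
        node = cur
    node.rules.append(pattern)

def search(root, text):
    hits = []
    for start in range(len(text)):       # backtrack to the next start on failure
        node, pos = root, start
        while pos < len(text):
            cur = node.child
            while cur is not None and cur.char != text[pos]:
                cur = cur.sibling        # walk the sibling list
            if cur is None:
                break                    # end of list: no further match here
            hits.extend((start, rule) for rule in cur.rules)
            node, pos = cur, pos + 1     # follow the child pointer
    return hits

root = Node(None)
for pattern in ["he", "hers", "she"]:
    insert(root, pattern)
hits = search(root, "ushers")
```

Each node stores only three pointers and a character, which is where the memory saving comes from; the price is the restart from the next text position whenever a partial match fails.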


Chapter 5

Bro

Bro [2] is a Unix-based Network Intrusion Detection System (NIDS). Bro monitors network traffic and detects intrusion attempts based on the traffic characteristics and content. Bro detects intrusions by comparing network traffic against rules describing events that are deemed troublesome. These rules might describe activities (e.g., certain hosts connecting to certain services), what activities are worth alerting on (e.g., attempts to connect to a given number of different hosts constitute a scan), or signatures describing known attacks or access to known vulnerabilities. If Bro detects something of interest, it can be instructed either to issue a log entry or to initiate the execution of an operating system command. The main aim of this IDS is to combat two major shortcomings of the Snort engine, namely high false alarm rates and string matching time. For the former, the authors designed the concept of context-based pattern matching, where additional context is provided by:

1. Regular expressions for signatures rather than strings.

2. Providing the alert engine with a notion of connection state and knowledge.

In their design, for every matched pattern or rule, rather than generating an alert, an event is generated and passed to another component, named the policy script component, which at an abstract level correlates these events to find the possibility of an attack. But matching a large number of patterns each time is quite intensive, especially when two engines are running simultaneously. To combat this problem, they have implemented DFA matching as the pattern matching algorithm, which also strengthens their patterns, since these are now more robust to false positives. Since DFA construction requires quite a large amount of memory, they have used the approach of on-the-fly generation of the DFA as given in [7], and have also implemented a memory-bounded DFA engine so that an algorithmic attack on the matching engine itself does not affect the other engine.

They have compared their approach with Snort and also reported some interesting results.

The reported matching time was quite similar for the without-cache implementation of the Bro engine and for Snort.

The alerts and signatures in Bro were much more informative as compared to Snort, eliminating a large number of false positives.

Because of its context-based matching engine, Bro has an inbuilt capability to deal with TCP reassembly and fragmentation issues.


An important question is then: why is Snort the most widely used tool? There is no definitive answer available anywhere; the following arguments are my own inferences:

With the implementation of efficient string matching algorithms, Snort outperforms Bro in running time by a large margin.

Snort has a large and regularly updated signature database, which is the most important reason for its usage.

Even though Bro signatures are more context-specific, without regular updates of the signatures and better categorization (with the ever increasing signature set), the performance goes down.

The memory requirements of Bro are quite high, since it uses a DFA matching engine.


Chapter 6

Issues with Pattern Matching

Other than the choice of pattern matching algorithm, there are a lot of other issues that also need to be considered before choosing any one of them. Of course, fast matching is the natural criterion for the decision, but there are other issues to be kept in mind, such as fighting false positives. For example, in some cases it is possible that a payload contains a pattern for a buffer overflow attack via the telnet application protocol, but what if there was no active telnet session between the two hosts? Another issue is: what if the pattern is split over multiple packets? Some of the issues with respect to the choice of algorithm and the limitations of signature matching are stated below.

Memory vs Speed

Signature format

Session-Based and Application Level Signature Matching

State Holding issues in cases of pattern extending over multiple packets

Packet Fragmentation Issues.

Getting packet dumps or testing data set? (other than attack tools and DARPA set.)

One always needs to compromise between memory requirements and speed. As we can see in the existing algorithms themselves, Aho-Corasick provides O(1) time per input character for pattern matching but requires quite a large amount of memory for the storage of the state machine, while other string matching algorithms such as Boyer-Moore can lead to O(mn) time requirements in cases of algorithmic attacks. One must trade off one against the other depending upon the constraints.

Most IDSs, except a few, use a byte or character based string as the pattern representation format. This is also needed because the most commonly used algorithms are Boyer-Moore, KMP etc. But if state-machine matching is being deployed, then a regular expression can provide a better pattern, one which is more informative and more specific to the attack it is identifying. Other than these, most Snort rules contain multiple patterns with different offset and depth values, which can be expressed very well in a single regular expression using basic regular expression operators like . and *. [19] provides some examples as well. Also, Bro keeps its patterns in regex (regular expression) format itself.
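As an illustration of folding several positional string checks into one regular expression, consider the sketch below. The rule is hypothetical and is not taken from the Snort rule set or from [19]: it looks for the bytes USER within the first four bytes of the payload, followed somewhere later by root. The function `match_two_contents` and the pattern `SINGLE_REGEX` are assumed names.

```python
import re

def match_two_contents(payload):
    """Two separate byte-string checks with positional constraints."""
    # First content: must be found inside bytes 0..3 of the payload.
    first = payload.find(b"USER", 0, 4)
    if first != 0:
        return False
    # Second content: anywhere after the first one.
    return payload.find(b"root", first + 4) != -1

# The same constraints written as a single regular expression.
# DOTALL lets '.' also cross newline bytes inside the payload.
SINGLE_REGEX = re.compile(rb"^USER.*root", re.DOTALL)

def match_regex(payload):
    return SINGLE_REGEX.match(payload) is not None

samples = [b"USER root\r\n", b"PASS root\r\n", b"USER guest\r\n", b"xUSER root"]
results = [(match_two_contents(p), match_regex(p)) for p in samples]
```

Both forms agree on these samples; the regular expression has the advantage that a state-machine engine can evaluate it in one pass together with all other signatures.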

[19] also discusses stateful packet matching, where the IDS stores information about the context of the traffic between two peers. This provides more accurate pattern matching results, but the overheads involved are massive because of the information that needs to be stored about the content of the traffic for a large number of flows. On top of this, one can also provide application level pattern matching to obtain even better results.

One of the most important issues with IDS systems is the state holding issue, which can be described as the amount of information that needs to be stored for each flow passing through the IDS. In the case of pattern matching over individual packets this is of little concern, since it does not even come into the picture. But with the advent of attacks split over multiple packets, pattern matching has turned into packet stream matching: a pattern now needs to be matched over multiple packets, demanding more memory for storing information about session flows and packets, the partially matched patterns, other flow-specific data structures, etc. Although Snort has a preprocessor, namely Stream4, to counter this issue, the same questions apply to this plugin as well. For how long does the information need to be stored before it is dropped? (It should not be the case that the IDS declares a timeout and drops the session information while the destination host still keeps waiting, or vice versa.) And what is the maximum number of sessions that can be stored, given that the information to be stored can vary from flow to flow?

Continuing the above discussion, the issue of fragmented packets [9], [12], [15], [14] complicates the situation even more, since some new issues come into the picture, such as:

Out-of-order arrival of TCP segments

Re-transmitted segments

Overlapping TCP packets, hence issues with reassembly

Missing fragments in between, or losing the state of the connection while the connection is still alive

How much data should be buffered (TCP window)

Varying TTL of the fragments for evasion of the NIDS. If the NIDS believes a packet was received when in fact it did not reach the end-system, then its model of the end-system's protocol state will be incorrect. If the attacker can find ways to systematically ensure that some packets will be received and some not, the attacker may be able to evade the NIDS.
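The reassembly problem caused by overlapping segments in the list above can be illustrated with a small sketch. It assumes two hypothetical overlap policies, "first" (the earliest-arriving byte at an offset wins) and "last" (the latest-arriving byte wins); real operating systems use more varied rules, and the function name `reassemble` is an assumption.

```python
def reassemble(segments, policy):
    """Rebuild a payload from (offset, data) pairs given in arrival order.

    policy "first": keep the byte that arrived first at each offset.
    policy "last":  let a later segment overwrite earlier bytes.
    Gaps are not modelled; known bytes are simply joined in offset order.
    """
    buffer = {}
    for offset, data in segments:
        for i, byte in enumerate(data):
            pos = offset + i
            if policy == "last" or pos not in buffer:
                buffer[pos] = byte
    return bytes(buffer[pos] for pos in sorted(buffer))

# The second segment overlaps the last two bytes of the first.
segments = [(0, b"safe"), (2, b"XYdata")]
seen_by_first_policy = reassemble(segments, "first")
seen_by_last_policy = reassemble(segments, "last")
```

If the NIDS applies one policy and the destination host the other, the two see different byte streams for the same packets, which is exactly the gap an evasion attack tries to exploit.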

The authors of [17] have examined the character and effects of fragmented IP traffic as monitored on highly aggregated Internet links. They show the amount of fragmented packets in normal Internet traffic and characterize and classify them by statistics, protocol and application layer. They show that the amount of fragmented packet traffic on Internet links is less than 1%, but there are two caveats: first, they are measuring at the Internet level with good connection speeds, and second, what if the traffic is fragmented specifically for an attack? These issues raise some new questions beyond the existing ones. Because different operating systems have unique methods of fragment reassembly, if an intrusion detection system uses a single "one size fits all" reassembly method, it may not reassemble and process the packets the same way the destination host does. An attack that successfully exploits these differences in fragment reassembly can cause the IDS to miss the malicious traffic and fail to alert. Many of these issues have been solved heuristically in existing tools, and the above mentioned papers themselves discuss a few of them. Snort even contains a preprocessor plugin, Frag2, for most of these issues, with some assumptions: if the next few fragments do not arrive within the next 30 seconds, the packet will be dropped, and one can (or needs to) specify the end host system's OS so that OS-specific reassembly is done for that session. Some


Chapter 7

Issue of Out of Order packets

In the last section, we saw some of the limitations of existing NIDS systems; handling of out-of-order packets is one of them. In most current implementations of intrusion detection systems, out-of-order packets need to be stored until all the fragments/segments have been received. Then the packet is re-assembled and transmitted to the destination. Since this involves temporary storage of the fragments, one can easily evade the IDS by constant bombardment with never-ending fragments. Currently IDSs handle this issue by limiting the number of fragments per flow and by setting a timeout value for each fragmented packet, so that it is dropped as soon as the timeout has elapsed after the packet's arrival.

Since logging the packets (and hence the blockage of network buffers) also affects the other modules of the system, we propose a solution in which one need not store the out-of-order packets, i.e. as we keep receiving fragments they are pushed to the destination instantly. However, we make one assumption: the fragment size should always be greater than the largest signature in the signature set.

7.1 Solution

Consider the Aho-Corasick algorithm of pattern matching, which involves building a deterministic finite automaton from the signature set (we simply call this the DFA) and then traversing it for the incoming traffic payload. Now consider another DFA; let us call it the RDFA. Define a new signature set which is formed by reversing all of the signatures of the original signature set. The RDFA is constructed in the same way as the original DFA, except that the new signature set generated in the last step is used.

How this works

We claim that using the two DFAs we are able to do the matching (under the above assumption) without storing the fragmented packets. For each input packet payload, do the transitions on the original DFA, and for the reverse of the packet payload, do the transitions on the RDFA. Now store pointers to the intermediate states of both of these DFAs (these are stored anyway in the stream based pattern matching methodology). When the next fragment comes, we move on the respective DFAs from the stored states. There are the following possible cases:

Fragments are in order

We have seen fragments up to sequence number n, and now some fragment with sequence number n + i (where i > 1) arrives.

We have seen fragments with sequence number n, n+2. Now comes the fragment withsequence number n + 1.

Implications of the Assumption

Our assumption that the fragment payload size will be greater than the largest signature size in our signature set implies that no signature (if it exists in the flow) will extend across more than 2 fragments.

Proof: Let us say that the largest signature size is k; then the fragment payload size is >= k bytes. Now, if a pattern starts matching at position i in fragment 0 (any n can be taken; 0 is used without loss of generality), it can go at most up to index i + k, and the total payload size of two fragments will be >= 2k. Since i



Reverse Signature Set = {olleh, ehs}

Stream flow (payloads of packets) = {whatshel, lomg}

Now, DFAs will be as shown in figures below:

Figure 7.2: DFA and RDFA (respectively) for the above example.

Now, if the first packet arrives first, the DFA will report the match for signature1 and will be in state 3, while the RDFA will be in state 0. As the second packet arrives, the DFA will report another match as it crosses the o in the payload, and at the end it will be in state 0. Otherwise, if the second packet arrives first, the DFA will be in state 0 while the RDFA will be in state 2. As the first packet arrives, the RDFA will report the matches for both signatures and end up in state 0, as does the DFA.
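The example can be replayed with a small sketch. It assumes that the original signature set is {hello, she} (the reverse of the set shown above), uses a hypothetical `Automaton` class built on failure links instead of a fully expanded DFA, and shows the in-order case on the forward automaton and the out-of-order case on the reverse automaton separately, not the complete proposed scheme.

```python
from collections import deque

class Automaton:
    """Aho-Corasick automaton whose scan can resume from a saved state."""

    def __init__(self, patterns):
        self.goto, self.fail, self.out = [{}], [0], [set()]
        for pat in patterns:
            s = 0
            for ch in pat:
                if ch not in self.goto[s]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append(set())
                    self.goto[s][ch] = len(self.goto) - 1
                s = self.goto[s][ch]
            self.out[s].add(pat)
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, nxt in self.goto[s].items():
                queue.append(nxt)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] |= self.out[self.fail[nxt]]

    def scan(self, text, state=0):
        """Consume text from the given state; return (matches, end state)."""
        found = set()
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found |= self.out[state]
        return found, state

signatures = ["hello", "she"]                       # assumed original set
dfa = Automaton(signatures)
rdfa = Automaton([s[::-1] for s in signatures])     # {olleh, ehs}
first, second = "whatshel", "lomg"

# In-order arrival: only the saved DFA state links the two payloads.
m1, dfa_state = dfa.scan(first)
m2, _ = dfa.scan(second, dfa_state)
in_order = m1 | m2

# Out-of-order arrival: the later payload comes first, so each payload
# is fed reversed to the RDFA, again resuming from the saved state.
r1, rdfa_state = rdfa.scan(second[::-1])
r2, _ = rdfa.scan(first[::-1], rdfa_state)
out_of_order = {m[::-1] for m in r1 | r2}
```

In both arrival orders the two signatures are found while only one integer state per automaton is carried between packets, so no payload has to be buffered.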

7.3 Limitations

The assumption itself is a limitation. While [17] shows that fragmented packets are quite rare, and our assumption will hold in most cases, we still regard it as a limitation.

Snort with one DFA ends up using around 58 MB of memory for the DFA; with two DFAs this almost doubles. So the saving in network buffers is traded for a requirement of more memory.

We looked at different ways of reducing the huge memory requirements of our proposed solution, such as merging the 2 DFAs, keeping 2 transition tables rather than 2 DFAs, using suffix trees etc., but none of them worked: some are inefficient in terms of matching speed, while others can lead to wrong results.


Chapter 8

Anomaly Detection

Anomaly detection is a key element of intrusion detection in which perturbations of normal behavior suggest the presence of intentionally or unintentionally induced attacks, faults, defects etc. Anomaly detection approaches build models of normal data and detect deviations from the normal model in observed data. Anomaly detection applied to intrusion detection and computer security has been an active area of research since it was originally proposed in 1987. Most anomaly detection algorithms require a set of purely normal data to train the model, and they implicitly assume that anomalies can be treated as patterns not observed before. Since an outlier may be defined as a data point which is very different from the rest of the data, based on some measure (which can be distance based or density based), a large number of clustering algorithms from the fields of databases and data mining have been applied here.

Some of the commonly employed algorithms belonging to this class are Nearest Neighbor (NN), Distance to the k-th Nearest Neighbor, Mahalanobis-distance Based Outlier, Density Based Local Outliers (LOF) [3], Unsupervised Support Vector Machines, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). [10] discusses all of these in good detail, with their relative pros and cons and their performance on the DARPA [8] data set as well as on real data, and indicates that LOF is the best among these approaches. One of the well-known anomaly detection tools, MINDS [6], also uses this approach in its NIDS.

Some popularly used Anomaly Detection Tools/Products:

MINDS (Minnesota Network Intrusion Detection System), using the LOF approach for the learning model.

ADWICE [4], in collaboration with SafeGuard, using the BIRCH clustering algorithm

LANCOPE (a commercial Behavioral Network Anomaly Detection product)

SPADE plug-in, for the open source IDS Snort, inspects recorded data for anomalousbehavior based on a computed score.


8.1 Approaches to Anomaly Detection

Anomaly detection involves two parts, namely building the normal profile of the network and scoring new flows on a scale of 0 to 100 (0 being normal, 100 being anomalous). The normal profile can be built in one of two ways:

1. Using one of the clustering algorithms, like BIRCH, where all the learning data points are clustered first; when new data arrives, it is tested for the possibility of falling into one of the clusters, and is otherwise declared an outlier.

2. Using measures such as Local Outlier Factor, Nearest Neighbour etc. Here, rather than clustering the data points, some features or statistics are calculated over each of the points; when new data arrives, it is matched with the nearest data points (nearness also being defined by these measures) and scored as normal or anomalous.
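The second approach can be sketched with the distance to the k-th nearest neighbour as the measure. This is a minimal illustration and not the scoring used by any of the tools above: the function name, the value of k, the cut-off of 1.0 and the toy training points are all assumptions, and the raw distance is used directly without mapping it to the 0 to 100 scale.

```python
import math

def kth_neighbor_distance(point, training, k):
    """Distance from point to its k-th nearest training point.

    Every training point has to be kept and visited, which is the
    storage and computation cost associated with this approach.
    """
    distances = sorted(math.dist(point, other) for other in training)
    return distances[k - 1]

# Hypothetical two-feature training data gathered from normal traffic.
training = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

normal_score = kth_neighbor_distance((0.5, 0.5), training, k=2)
outlier_score = kth_neighbor_distance((10.0, 10.0), training, k=2)
is_anomalous = outlier_score > 1.0   # assumed cut-off for this toy data
```

A point in the middle of the training data gets a small distance, while a point far from all training points gets a large one and is flagged.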

The latter techniques have mainly been deployed in anomaly detection systems at the system level, where intrusion on a host is the prime concern. Some have tried to use them at the network level also. But they have the shortcoming that they need to store all the data points from the learning data, and when new data points arrive, heavy computations on both the existing data and the new data point are required to get good results. Efficient data structures are deployed to avoid these computations for the existing data points, but at least for new data points they remain quite heavy.

For the former approach, that is clustering for anomaly detection, we look at some of the algorithms and their scalability issues in the next chapter.

Some of the limitations of anomaly detection systems are:

1. They work best when you have properly labelled normal data, where normal is defined as the regular traffic features of the network. Gathering or capturing the normal data for a network is not always feasible for a variety of reasons, such as the topology of the network, or an attack or scan going on while the data is assumed to be normal.

2. Since these are based on statistical analysis, false positive rates are much higher than with pattern matching techniques.

3. The algorithms or techniques used are not scalable and fast enough to keep up with the gigabit network requirements of these days. They are not fast enough because the statistical processing involves heavy computations on each incoming packet, and a large feature set (which implies a high-dimensional data set) makes computation even more expensive. Scalability is an issue since these systems depend on the network traffic behavior, and today's networks have diverse and at times differing requirements.

4. Selection of features for defining the network behavior from the packets is still developing. A set which can be said to properly and completely define the network behavior is still not available.


5. Detection of application level exploits in network anomaly detection systems is still in a developing phase (i.e. no product does this), hence any new buffer overflow, SQL injection or similar exploit is still undetectable by these systems (since the most commonly defined features capture the network behavior from the headers or the flags of the packets).

6. Lack of adaptiveness to changing network behavior

People do try to provide solutions for the various shortcomings. In the case of normal data, for example, one can use the pattern matching engine to detect network attacks, scans etc. if any are going on, and then collect the data over long periods of time, since networks may have different requirements at different times of day or on different days of the week.

Event correlation engines have been developed for correlating the various events/alerts after a threshold or rule is found to be violated. But rather than looking for a superset of misuse detection able to detect every intrusion, people looked at removing its limitations, which gave rise to anomaly detection techniques. These are able to detect new intrusions and do not suffer from large signature set issues (since they do not use any signature set).

People have tried to work on the expensive (in CPU and memory terms) and time-consuming computation of these systems. This includes applications of techniques such as SVD (Singular Value Decomposition) and PCA (Principal Component Analysis), which reduce the dimensions of the data in such a way that the results do not differ much from those obtained with the complete set of dimensions. Also, clustering algorithms have been proposed which work by exploring the dense sub-dimensions of the data rather than working on the large data set in all its dimensions, and the results are positive.

ADWICE has looked at the last flaw and has even developed a system which is adaptive to network behavior. Its clustering algorithm is a very popular scalable clustering algorithm from the database field. Let us look at some of the clustering algorithms with their minuses and positives. When looking for a clustering algorithm for anomaly detection, we are looking for one with the following properties:

Clustering should be unique, which implies that the clusters returned as output should be independent of the order of the input data points.

Clustering should be as accurate as possible. Accuracy with a distance based approach comes at the cost of time, while a density based approach (where the tradeoff is large memory requirements) provides much better accuracy and can be deployed to handle all cases, i.e. data with equally sized clusters, data with some very dense clusters and some sparse ones, data with some small dense clusters and some large sparse ones, and vice versa.

The clustering algorithm should be adaptive, that is, new data points can be fed into the appropriate clusters and/or clusters can be modified even at testing time.


It should be able to classify new input points as normal or anomalous efficiently and quickly (keeping in mind gigabit requirements).

(This is optional.) Having a memory and space efficient clustering algorithm would be helpful for converting the product into an inline one.


Chapter 9

Clustering Algorithms for Anomaly Detection

A lot of research has been going on in the fields of databases and data mining on large-scale, scalable and efficient clustering algorithms, but in anomaly detection we have the additional requirement of speed, which should be much higher than in databases.

Clustering algorithms can be broadly classified into three categories:

1. Partitioning Algorithms (eg. K-Means)

2. Grid Based Algorithms (eg. CLIQUE)

3. Hierarchical Algorithms

Agglomerative (eg. BIRCH, DBSCAN etc)

Divisive

Partitioning approaches try to optimize a function such that the space is divided into k partitions and each point is in the best possible partition. Grid based approaches slice the n-dimensional space into small cells and then form the dense clusters (this also helps with dimension reduction).

Hierarchical clustering algorithms initially group all data points into the same cluster and then keep partitioning it as dense clusters start forming (the divisive approach), and vice versa for the agglomerative approach. Let us look at some of the clustering algorithms.

9.1 BIRCH - Balanced Iterative Reducing and Clustering

BIRCH [21] is one of the fastest running clustering algorithms, with a running time of O(N). It is divided into 4 phases, of which the last three are optional and only fine-tune the phase 1 clustering. The following constants are to be fed to the algorithm:

T, that is the size of cluster.

B, the branching factor of the tree.


P, the memory size available to this process.

L, the maximum number of clusters at each leaf node.

It maintains a balanced tree structure in which each node has a maximum of B children. All the clusters are at the leaf nodes of the tree. Initially the tree is empty and, let us say, T = 0. As new data points keep arriving, each one traverses the tree to find the appropriate leaf node where it can fit, and then looks for the best match among the clusters in that leaf node. If it can fit into any of the clusters, it is inserted there; otherwise a new cluster is formed. The fit of a data point is defined by a distance based measure (which can be Manhattan, Euclidean etc.), and the cluster statistics are updated after insertion. If the formation of a cluster increases the cluster count of the leaf beyond L, the leaf is split into 2 leaves with a parent above them, and the clusters are assigned to the appropriate leaf nodes. Also, if at some time the memory cap P is reached, T is increased so that the cluster sizes grow and more points can be fitted into each cluster, thereby reducing the cluster count and freeing up some memory.

Positive Points of this algorithm:

Running time is O(n), which is much better than that of other algorithms; also, the additional phases clean up some of the errors from phase 1.

Memory efficient (hence it can easily be built into an inline product)

Because of efficient tree data structure, classifying new data points is easier.

Negative Points of this algorithm:

The clustering is not unique, i.e. the clustering results depend upon the order of the data points. This is because it is possible that two small dense clusters are joined to form one cluster if the data points arrive alternately from the two clusters or in some similar order (assuming that T is large enough to encapsulate both clusters).

It uses distance based measures for all calculations, which are known to be less accurate when clusters with different densities and sizes exist.

Some data points may be assigned to the wrong clusters because of the limitations of distance based calculations.

It requires a large number of input parameters.

The clusters formed are spherical, which may lead to a large number of false positives.
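The phase 1 insertion step and its order dependence can be sketched as below. This is a deliberately reduced illustration and not BIRCH as published in [21]: clusters are kept in a flat list without the tree, B, L and P are ignored, only the point count and linear sum are stored, and a point is absorbed when its distance to the nearest centroid is at most T. The names `ClusterFeature`, `insert` and `cluster_points` are hypothetical.

```python
import math

class ClusterFeature:
    """Cluster summary: number of points and per-dimension linear sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def add(self, point):
        # Statistics are updated incrementally; points are not stored.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]

def insert(clusters, point, threshold):
    """Absorb point into the nearest cluster if it fits, else open a new one."""
    nearest = min(clusters,
                  key=lambda cf: math.dist(cf.centroid(), point),
                  default=None)
    if nearest is not None and math.dist(nearest.centroid(), point) <= threshold:
        nearest.add(point)
    else:
        clusters.append(ClusterFeature(point))

def cluster_points(points, threshold):
    clusters = []
    for point in points:
        insert(clusters, point, threshold)
    return clusters

# The same four one-dimensional points in two different arrival orders.
order_a = [(0.0,), (1.0,), (2.0,), (3.0,)]
order_b = [(1.0,), (2.0,), (0.0,), (3.0,)]
count_a = len(cluster_points(order_a, threshold=1.0))
count_b = len(cluster_points(order_b, threshold=1.0))
```

With the same threshold the first order yields two clusters and the second yields three, which is the non-uniqueness listed among the negative points.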

9.2 DBSCAN - Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise

DBSCAN [11], an O(N log N) time clustering algorithm, uses density as the similarity measure between data points rather than distance based formulas. It iterates just once over all the data points for all the clusters, and the additional log N time in each step makes it an O(N log N) algorithm. But, compared to BIRCH, it has just two input parameters namely


Chapter 10

ADWICE-TRAD

ADWICE [4] is an adaptive anomaly detection algorithm which uses BIRCH [21] as the clustering algorithm for learning the normal data and then classifying new data as anomalous or normal. As already seen in the last chapter, BIRCH suffers from a number of shortcomings. Here we try to reduce the number of false positives by modifying the threshold calculation and the cluster bounds. The original BIRCH algorithm uses the same constant threshold (named T) for each of the clusters, which increases whenever we run out of the given amount of memory, so as to merge some clusters and free some memory.

Fixing the same threshold for all clusters is unfair to many of them. For example, consider a cluster with all points near the center of the cluster and the cluster's threshold T. This cluster can include some bad points which are near the boundary. Hence, fixing the same threshold for all clusters is not appropriate; rather, it should depend on cluster properties such as the distribution of points, the density of the cluster etc. Hence, we propose a density based mechanism for deciding the cluster size and threshold, which we name ADWICE-TRAD.

BIRCH uses distance based measures for clustering, according to which all clusters have the same threshold size, T. For a new point to be included in a cluster, its distance from the center of the cluster has to be less than T. So, define the inclusion region as the spherical region of radius T around the center of the cluster. Currently, the inclusion region is independent of the current density of the cluster and is the same for all clusters.

But if a cluster is dense, its inclusion region should be smaller and should depend on the current radius of the cluster rather than on some predefined fixed threshold, while for a sparse cluster the inclusion region should be relatively large.

So, the inclusion of a new point in a cluster should depend on the density of the cluster (i.e. the number of points in the cluster and its current radius). Mathematically, the measurements will be made on the basis of two more variables, t and R', which are explained below.

R' (an additional statistical variable that needs to be stored with each cluster, in the cluster feature set) is different for each of the clusters and depends on the current number of points in it and its current radius $R(CF_i)$:

$$R'(CF_i) = R(CF_i)\left(1 + \frac{c}{f_n(n, d)}\right)$$

here,

d = dimension of the data points.

n = number of points inside this cluster.

$f_n(n, d)$ = some function of n and d.

c = some constant.

i.e. R' = its current radius + current radius multiplied by some constant and divided by some function of n.

The function $f_n$ can be $\log_d(n)$ or just $\log(n)$. So, threshold requirement should be $R(CF_i)$
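The effect of the density term can be checked numerically with a small sketch. It takes f_n(n, d) = log(n), one of the two choices named above, and assumes c = 1 and n >= 2 (so that the logarithm is positive); the function name `inclusion_radius` and the sample numbers are hypothetical. The adjusted radius is written R' here.

```python
import math

def inclusion_radius(radius, n, c=1.0):
    """Density-adjusted radius R' = R * (1 + c / log(n)).

    radius: current cluster radius R(CF_i)
    n:      number of points currently in the cluster (assumed >= 2)
    c:      constant from the formula (value 1.0 is an assumption)
    """
    if n < 2:
        raise ValueError("log(n) must be positive, so n >= 2 is assumed")
    return radius * (1 + c / math.log(n))

# Two clusters with the same current radius but different densities.
sparse = inclusion_radius(2.0, n=10)
dense = inclusion_radius(2.0, n=1000)
```

For the same current radius, the cluster with more points gets the smaller inclusion region, which is the behaviour argued for above: dense clusters stay tight while sparse ones are allowed to grow.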



Bibliography

[1] http://afrodita.unicauca.edu.co/cbedon/snort/spp kickstart.html.

[2] http://bro-ids.org.

[3] Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. LOF: identifying density-based local outliers. In SIGMOD '00: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pages 93-104, New York, NY, USA, 2000. ACM Press.

[4] Kalle Burbeck and Simin Nadjm-Tehrani. ADWICE - anomaly detection with real-time incremental clustering. In ICISC, pages 407-424, 2004.

[5] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, 1990.

[6] L. Ertoz, E. Eilertson, A. Lazarevic, P. Tan, J. Srivastava, V. Kumar, and P. Dokas. The MINDS - Minnesota Intrusion Detection System, in Next Generation Data Mining. MIT/AAAI Press, 2004.

[7] J. Heering, P. Klint, and J. Rekers. Incremental generation of lexical scanners. ACM Trans. Program. Lang. Syst., 14(4):490-520, 1992.

[8] http://www.ll.mit.edu/IST/ideval/. DARPA Intrusion Detection Evaluation, 1999.

[9] C. A. Kent and J. C. Mogul. Fragmentation considered harmful. WRL Technical Report 87/3, 1987.

[10] Aleksandar Lazarevic, Aysel Ozgur, Levent Ertoz, Jaideep Srivastava, and Vipin Kumar. A comparative study of anomaly detection schemes in network intrusion detection. In SIAM International Conference on Data Mining, 2003.

[11] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD '96). AAAI Press, 1996.


[12] M. Handley, V. Paxson, and C. Kreibich. Network intrusion detection: Evasion, traffic normalization, and end-to-end protocol semantics. Proc. of the 10th USENIX Security Symposium (Security '01), 2001.

[13] Marc Norton. Optimizing pattern matching for intrusion detection, 2004.

[14] Judy Novak. Target-based fragmentation reassembly, April, 2005.

[15] Thomas H. Ptacek and Timothy N. Newsham. Insertion, evasion, and denial of service: Eluding network intrusion detection. Technical report, Secure Networks, Inc., Suite 330, 1201 5th Street S.W, Calgary, Alberta, Canada, T2R-0Y6, 1998.

[16] Shai Rubin, Somesh Jha, and Barton P. Miller. Automatic generation and analysis of NIDS attacks. In ACSAC '04: Proceedings of the 20th Annual Computer Security Applications Conference (ACSAC'04), pages 28-38, Washington, DC, USA, 2004. IEEE Computer Society.

[17] C. Shannon, D. Moore, and K. Claffy. Characteristics of fragmented IP traffic on Internet links, 2001.

[18] R. Sidhu and V. Prasanna. Fast regular expression matching using fpgas, 2001.

[19] R. Sommer and V. Paxson. Enhancing byte-level network intrusion detection signatures with context, 2003.

[20] N. Tuck, T. Sherwood, B. Calder, and G. Varghese. Deterministic memory-efficient string matching algorithms for intrusion detection, 2004.

[21] Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH: an efficient data clustering method for very large databases. pages 103-114, 1996.
