a framework for an adaptive intrusion

A Framework for an Adaptive IntrusionDetection System using Bayesian Network

Farah JemiliDr. Montaceur ZaghdoudPr. Mohamed Ben Ahmed

RIADI Laboratory, ENSI, Manouba UniversityManouba 2010, Tunisia

Abstract-The goal of a network-based intrusion detection II. INTRUSION DETECTION SYSTEMsystem (IDS) is to identify malicious behavior that targets a There are two general methods of detecting intrusions intonetwork and its resources. Intrusion detection parameters arenumerous and in many cases they present uncertain and computer and network systems: anomaly detection andimprecise causal relationships which can affect attack types. A signature recognition [2]. Anomaly detection techniquesBayesian Network (BN) is known as graphical modeling tool used establish a profile of the subject's normal behavior (normto model decision problems containing uncertainty. In this paper, profile), compare the observed behavior of the subject with itsa BN is used to build automatic intrusion detection system based norm profile, and signal intrusions when the subject'son signature recognition. The goal is to recognize signatures of observed behavior differs significantly from its norm profile.known attacks, match the observed behavior with those knownsignatures, and signal intrusion when there is a match. A major Signature recognition techniques recognize signatures ofdifficulty of this system is that intrusions signatures change over known attacks, match the observed behavior with those knownthe time and the system must be retrained. An IDS must be able signatures, and signal intrusions when there is a match.to adapt to these changes. The goal of this paper is to provide a An IDS installed on a network is like a burglar alarmframework for an adaptive intrusion detection system that uses system installed in a house. Through various methods, bothBayesian network. detect when an intruder/burglar is present. Both systems issueIndex Terms Adaptive intrusion detection, bayesian network, some type of warning in case of detection of presence oflearning algorithm, learning dataset, inference. intrusion/burglar.

Systems which use misuse-based techniques contain aI. INTRODUCTION number of attack descriptions, or 'signatures', that are

Jntrusion detection can be defined as the process of matched against a stream of audit data looking for evidence of

identifying malicious behavior that targets a network and its the modeled attacks. The audit data can be gathered from the

resources [1]. network, from the operating system, or from application logMalicious behavior is defined as a system or individual files [2]. Experimentation conducted in this research work is

., based on DARPA KDD'99 data set.action which tries to use or access to computer system withoutauthorization (i.e., crackers) and the privilege excess of those III DARPA'99 DATA SETwho have legitimate access to the system (i.e., the insiderthreat). MIT Lincoln Lab's DARPA intrusion detection evaluationThe proliferation of heterogeneous computer networks has datasets have been employed to design and test intrusion

serious implications for the intrusion detection problem. detection systems. The KDD 99 intrusion detection datasetsForemost among these implications is the increased are based on the 1998 DARPA initiative, which providesopportunity for unauthorized access that is provided by the designers of intrusion detection systems (IDS) with anetwork's connectivity. benchmark on which to evaluate different methodologies

Intrusion detection is not an easy task due to the vastness of [3][12][13].the network activity data and the need to regularly update the To do so, a simulation is made of a factitious militaryIDS to be adapted to unknown attack methods. network consisting of three 'target' machines running various

Nowadays, completely protect a network from attacks is operating systems and services. Additional three machines arebeing a very hard task. Even heavily protected networks are then used to spoof different IP addresses to generate traffic.sometimes penetrated, and an Adaptive Intrusion Detection Finally, there is a sniffer that records all network traffic using

the TCP dump format. The total simulated period is sevenSystem seems to be essential and is a key component in weeks [13]. Packet information in the TCP dump file iscomputer and network security. summarized into connections. Specifically, "a connection is a

sequence of TCP packets starting and ending at some welldefined times, between which data flows from a source IPaddress to a target IP address under some well definedprotocol" [13].

1-4244-1330-3/07/$25.OO 02007 IEEE. 66

DARPA KDD'99 dataset represents data as rows of TCP/IP IV. BAYESIAN NETWORKdump where each row consists of computer connection which A Bayesian network is a graphical modeling tool used tois characterized by 41 features. model decision problems containing uncertainty. It is a

Features are grouped into four categories: directed acyclic graph where each node represents a discrete* Basic Features: Basic features can be derived random variable of interest. Each node contains the states of

from packet headers without inspecting the the random variable that it represents and a conditionalpayload. probability table (CPT) which give conditional probabilities of

* Content Features: Domain knowledge is used to this variable such as realization of other connected variables,assess the payload of the original TCP packets. based upon Bayes rule:This includes features such as the number of failed P ( B / A) P (A / B ) P (B) 7login attempts; P (A )

* Time-based Traffic Features: These features are The CPT of a node contains probabilities of the node beingdesigned to capture properties that mature over a 2 in a specific state given the states of its parents. The parent-second temporal window. One example of such a child relationship between nodes in a Bayesian networkfeature would be the number of connections to the indicates the direction of causality between the correspondingsame host over the 2 second interval; variables. That is, the variable represented by the child node is

* Host-based Traffic Features: Utilize a historical causally dependent on the ones represented by its parentswindow estimated over the number of connections [4][14].- in this case 100 - instead of time. Host based Several researchers have been interested by using Bayesianfeatures are therefore designed to assess attacks, network to develop intrusion detection systems. Axelsson inwhich span intervals longer than 2 seconds. [5] wrote a well-known paper that uses the Bayesian rule of

In this study, we used KDD'99 dataset which is counting conditional probability to point out the implications of thealmost 494019 of training connections. Based upon a base-rate fallacy for intrusion detection. It clearlydiscriminate analysis, we used data about only important demonstrates the difficulty and necessity of dealing with falsefeatures (the 9th first features): alerts.

* Protocol type: type of the protocol, e.g. tcp, udp, Kruegel in [1] presented a model that simulates anetc. intelligent attacker using Bayesian techniques to create a plan

* Service: network service on the destination, e.g., of goal-directed actions. An event classification scheme ishttp, telnet, etc. proposed based on Bayesian networks. Bayesian networks

* Land: 1 if connection is from/to the same improve the aggregation of different model outputs and allowhost/port; 0 otherwise. one to seamlessly incorporate additional information.

* Wrongfragment: number of wrong" fragments. Johansen in [6] suggested that a Bayesian system whichNum,jailedlogins: number of failed login provides a solid mathematical foundation for simplifying a

attempts. seemingly difficult and monstrous problem that today's* Logged in: 1 if successfully logged in; 0 Network IDS fail to solve. He added that Bayesian Network

otherwe.- ' IDS should differentiate between attacks and the normalnetwork activity by comparing metrics of each network traffic* Root_shell: 1 ifroot shell is obtained; 0 otherwise. sample.

* Is guest login: 1 if the login is a "guest" login; 0otherwise. V. BAYESIAN NETWORK LEARNING ALGORITHM

To these features, we added the "attack type". Indeed eachtraining connection is labelled as either normal, or as an attack Methods for learning bayesian graphical models can bewith specific type. partitioned into at least two general elasses of methods:DARPA'99 base counts 38 attacks which can be gathered in constraint-based search and Bayesian methods.

four main categories: The constraint-based approaches [7] [8] search the data for* Denial of Service (dos): Attacker tries to prevent conditional independence relations from which it is in

legitimate users from using a service, principle possible to deduce the Markov equivalence class of* Remote to Local (r21) Attacker does not have an the underlying causal graph. Two notable constraint basedaemoutoteviLoctimmr2l)Attache, hoesnce triesto algorithms are the PC algorithm which assumes that no hiddenaigain variables are present and the FCI algorithm which is capableaccess.* User to Root (u2r): Attacker has local access to of learning something about the causal relationships even

assuming there are latent variables present in the data [7].rivieges. Bayesian methods [9] utilize a search-and-score procedure* Prbe:Attakertrie togaininfrmaton boutthe to search the space of DAGs, and use the posterior density as a

target host. ~~~~~~scoring function. There are many variations on Bayesiantargethost. ~~~~~methods, however, most research has focused on theapplication of greedy heuristics, combined with techniques to

67

avoid local maxima in the posterior density (e.g., greedy IfPnew > Pold Thensearch with random restarts or best first searches). Pold := Pnew,;

Both constraint-based and Bayesian approaches have pai(x) : =pai(x1) U /z4;advantages and disadvantages. Constraint-based approaches Else OK:=false;are relatively quick and possess the ability to deal with latentvariables. However, constraint-based approaches rely on an We ordered network variables as follows: protocole_type,arbitrary significance level to decide independencies. sevice, land, wrongfragment, numfailed logins, logged in,

Bayesian methods can be applied even with very little data root-shell, is_guest_login, attack type.where conditional independence tests are likely to break down. We had chosen the number 4 as the upper limit of nodeBoth approaches have the ability to incorporate background parents. Learning result is shown in Figurel:knowledge in the form of temporal ordering, or forbidden orforced arcs. Also, Bayesian approaches are capable of dealingwith incomplete records in the database. The most seriousdrawback to the Bayesian approaches is the fact that they are R U0

relatively slow.In this paper, we are dealing with incomplete records in the

database so we opted for the Bayesian approach andparticularly for the K2 algorithm.K2 learning algorithm showed high performance in many

research works. The principle of K2 algorithm, proposed byfCooper and Herskovits, is to define a database of variables:xi, ...,I Xn, and to build an acyclic graph directed (DAG) basedon the calculation of local score [10]. Variables constitutenetwork nodes. Arcs represent "causal" relationships between - JFO1flSvariables.

Algorithm K2 used in learning step needs:* A given order between variables roo "a JSizegI* and the number of parents, u of the node.

K2 algorithm proceeds by starting with a single node (thefirst variable in the defined order) and then incrementally adds a-c °connection with other nodes which can increase the wholeprobability of network structure, calculated using the g Fig. 1. Bayesian Network

function. A requested new parent which does not increase VI. JUNUIUN I KLL INILKLINUL AUUKI I HtVInode probability can not be added to the node parent set. The most common method to perform discrete exact

qi (r ri inference is the Junction Tree algorithm developed by Jenseng(xi,pai(xi)) 171

(r - 1)! LIJNijk![1]j= i( i 1 k=1 The idea of this procedure is to construct a data structure

Where, for each variable xi; ri is the number of possible called a junction tree which can be used to calculate any queryinstantiations; N is the number of cases in the database; wi is through message passing on the tree.the j-th instantiation of pai in the database; qi is the number of The first step of JT algorithm creates an undirected graphpossible instantiations for pai; Nijk is the number of cases in D from an input DAG through a procedure called moralization.for which xi takes the value xik with pai instantiated to wij; Moralization keeps the same edges, but drops the direction,Nij is the sum ofNijk for all values of k. and then connects the parents of every child. Junction tree

Execution time is in the order O(Nu2n2r) with r being the construction follows four steps:maximum value for ri [10]. * JT Inference Stepl: Choose a node ordering.

A. K2 Algorithm Note that node ordering will make a difference inInput:A set of variables xl,... x,~ a given order among the topology of the generated tree. An optimalInput:~ ~ ~ > > node ordering with respect to the junction tree iS

them, an upper limit u on the number of parents for a node, a NP-hard to find.database on x1,..., xn.

Output: a DAG with oriented arcs. * JT Inference Step2: Loop through the nodes inFor i* Ito n do the ordering. For each node Xi, create a set Si ofpai(xJ =0; OK: true; all its neighbours. Delete the node Xi from thePold:= g(x1, pa('x1)); moralized graph.While OKand pa1(x1) < u do * JT Inference Step3: Build a graph by letting eachLet z be the node in the set ofpredecessors of xi that does not Si be a node. Connect the nodes with weightedbelong topa1(x1) which maximizes g(x1,pa4'x1) K) /); undirected edges. The weight of an edge goingPnew:= g(xi, pai(xi) U) z}); from Si to Sj is ISiOnSj 1

68

* JT Inference Step4: Let the junction tree be the can be classified into different classes. Classes of knownmaximal-weight spanning tree ofthe cluster graph. attacks and another class that will be defined for the

connections not initially registered in the learning dataset andVII. PROBLEM DESCRIPTION have been recognized by the system as intrusions.

A major shortcoming of current IDSs that employ Bayesian Therefore, the system will also be able to provide us with

network is that they can give a series of false alarms in cases the attack types of known intrusions. This is very important inof a noticeable systems environment modification. There can order to be able to take the necessary security measures.be two types of false alarms in classifying activities in case of Figure 2 presents an architecture for our framework:any deviation from normal patterns: false positives and falsenegatives.

False positive alarms are issued when normal behaviors are (2 classes: learningincorrectly identified as abnormal and false negative alarms Normal/Intrusion datasetare issued when abnormal behaviors are incorrectly identifiedas normal. Though it is important to keep both types of falsealarm rates as low as possible, the false negative alarms shouldbe the minimum to ensure the security ofthe system. LearningprocessTo overcome this limitation, an IDS must be capable of (K2 algorithm)

adapting to the changing conditions typical of an intrusion ..detection environment. It is important that an IDS should have , above thr-sold,automatic adaptability to new conditions. Otherwise, an IDS Bayesian noupdvmay start to lose its edge. Such adaptability can be achieved BaNetwork lPlby using incremental dataset which is automatically updated.

The goal of an IDS based on signature recognition is torecognize signatures of known attacks, match the observedbehavior with those known signatures, and signal intrusion New Junction Tree Connectionwhen there is a match. A major difficulty of this system is that connection Inference recognitionintrusions signatures change over the time and the systemmust be retrained. An IDS must be able to adapt to these Fig. 2. The framework for the adaptive intrusion detection systemchanges. The goal of this paper is to provide a framework foran adaptive intrusion detection system that uses Bayesian IX. EXPERIMENTATION RESULTSnetwork. The dataset used for experimentation is DARPA KDD'99

which contains signatures of normal connections andVIII.EAFRAMEWORKFORANADAPTIVECINTIONOSYsignatures of 38 known attacks gathered in four main classes:

dos, r21, u2r and probe. The class "other" contains detectedIn this section we propose a framework for an adaptive intrusions that are not initially registered in KDD'99 dataset.

intrusion detection system using bayesian network. The The main criterion that we have considered in thelearning dataset can be updated by adding new intrusions experimentation of our system is the detection rate.signatures. Detection rate is defined as the number of examples

Learning dataset contains signatures of normal connections correctly classified by our system divided by the total numberand signatures of several types of known attacks. The oftest examples.framework process begins with classifying connections of the 1) First step: Connections in DARPA'99 are classified intolearning dataset into 2 classes: Normal/Intrusion by using the 2 classes; each connection of the DARPA'99 dataset isassociations' rules. This will accelerate both the learning labeled as either normal or intrusion.process from the dataset and the deduction process about any TABLE Inew connection (normal or intrusion). DETECTION RATE (1)

In the case of a connection which is not registered in the Connection Detectionlearning dataset, the system will find low probability degrees(below threshold) for this connection to be either normal or Normalintrusion. Normal connections signatures being registered in (58714) 87.68 %the dataset, we are going to consider intrusion every Intrusionconnection that is not labeled as normal in the dataset. Thus, (61960) 88.64%the suspect connection will be added to the learning dataset

and be labeled as intrusion. ~2) Second Step: Intrusions ot the L)AI{t-A'99 dataset can beThe class Intrusion of the learning dataset iS therefore an clsife in 4'nw lse:DS,Poig 2 n

U2R. The class "other" iS defined for the connections notincremental class, automatically updated by ouosstm.intilystm reitrdi.AP'9 aae n aebeIn a second step, intrusions of the updated learning dataset reonzdaitusnsbthsyem

69

TABLE2 [10] Sanguesa R., Cortes U. Learning causal networks from data: a surveyDETECTION RATE (2) and a new algorithm for recovering possibilistic causal networks. Al

Intrusion Detection Communications 10, 31-61, 1997.[11] Jensen Frank, Jensen Finn V. and Dittmer Soren L. From influence

diagrams to junction trees. Proceedings ofUAI, 1994.DOS [12] ISTG. The 1998 intrusion detection off-line evaluation plan. MIT

(61960) 88.64% Lincoln Lab., Information Systems Technology Group, 1998.htt:/ww.lI .mit.edu/IST/ideva1/docs/ 1 998/id98-eval 11 .txt

Probing 99.15% [13] Kayacik, G. H., Zincir-Heywood, A. N. Analysis of Three Intrusion(827) . Detection System Benchmark Datasets Using Machine Learning

R2L Algorithms, Proceedings of the IEEE ISI 2005 Atlanta, USA, May,(30) 20.88 % 2005.(30) [14] Peter Spirtes, Thomas Richard-son, and Christopher Meek. LearningU2R Bayesian networks with discrete variables from data. In Proceedings of(15) 6.66% the First International Conference on Knowledge Discovery and Data

Mining, pages 294-299, 1995.Other [15] A. Valdes and K. Skinner. Adaptive, Model-based Monitoring for Cyber(200) 66.5l1o Attack Detection. In Proceedings of RAID 2000, Toulouse, France,

October 2000.Table 1 snows a nig;n perrormance OI our system In [16] David Maxwell Chickering. Learning equivalence classes of Bayesian

detection ofNormal and Intrusion connections. network structures. In Proceedings of the Twelfth Annual Conference onTable 2 shows a high performance of our system in Uncertainty in Artificial Intelligence (UAI-96), pages 150-57, Portland,Table 2 sowsa hig performace of oursystem in

Oregon, 1996.detection of DOS, Probing and Other connections. The low [17] David Heckerman. Bayesian networks for data mining. Data Mining andperformance of our system in detection of R2L and U2R Knowledge Discovery, 1998.connections may be explained by the low proportions of the [18] F. Jemili, M. Zaghdoud and M. Ben Ahmed. DIDFAST.BN: Distributed

R2L and U2Rtraining connections. Intrusion Detection and Forecasting Multiagent System using BayesianR2L and U2R training connections. Network. In Proceedings of the ICTTA'06 International Conference onInformation & Communication Technologies: from theory to

X. CONCLUSION applications, 2006.[19] N. Arfaoui, F. Jemili, M. Zaghdoud and M. Ben Ahmed. Comparative

In this paper, we outlined a framework for an adaptive Study between Bayesian Network and Possibilistic Network in Intrusionintrusion detection system using bayesian network. Bayesian Detection. In Proceedings of the SECRYPT'06 International Conferencenetworks provide automatic detection capabilities, they learn on Security and Cryptography, 2006.from audit data and can detect both normal and abnormalconnections.

Our system demonstrated a high performance whendetecting Intrusions. This system can be improved byintegrating an expert system which is able to providerecommendations based on attack types.

Another alternative consists of using possibilistic networksrather than bayesian networks to better qualitatively representintrusion risk evaluation.

REFERENCES[1] Kruegel Christopher, Darren Mutz William, Robertson Fredrik Valeur.

Bayesian Event Classification for Intrusion Detection Reliable SoftwareGroup University of California, Santa Barbara,, 2003.

[2] Brian C. Rudzonis. Intrusion Prevention: Does it Measure up to theHype? SANS GSEC Practical vl.4b, 2003.

[3] DARPA. Knowledge Discovery in Databases, 1999. DARPA archive.Task Descriptionhttp://www.kdd.ics.uci.edu/databases/kddcuip99/task.htm

[4] Jensen F. Bayesian Networks and Decision Graphs. Springer, NewYork, USA, 2001.

[5] Axelsson S. The Base-Rate Fallacy and its Implications for theDifficulty of Intrusion Detection. In 6th ACM Conference on Computerand Communications Security, 1999.

[6] Johansen Krister and Lee Stephen. Network Security: Bayesian NetworkIntrusion Detection (BNIDS) May 3, 2003.

[7] Peter Spirtes, Clark Glymour, and Richard Scheines. Causation,Prediction, and Search. Springer Verlag, New York, 1993.

[8] Thomas S. Verma and Judea Pearl. Equivalence and synthesis of causalmodels. In P.P. Bonissone, M. Henrion, L.N. Kanal, and J.F. Lemmer,editors, Uncertainty in Artificial Intelligence 6, pages 255-268. ElsevierScience Publishers B.V. (North Holland), 1991.

[9] Gregory F. Cooper and Edward Herskovits. A Bayesian method for theinduction of probabilistic networks from data. Machine Learning, 1992.

70

a framework for an adaptive intrusion

Documents