[IEEE 2008 33rd IEEE Conference on Local Computer Networks (LCN 2008) - Montreal, QB, Canada (2008.10.14-2008.10.17)] 2008 33rd IEEE Conference on Local Computer Networks (LCN)

A two-stage aggregation/thresholding scheme for multi-model anomaly-based approaches

Karim Tabia, CRIL CNRS UMR 8188, Artois University, France. Email: [email protected]
Salem Benferhat, CRIL CNRS UMR 8188, Artois University, France. Email: [email protected]
Yassine Djouadi, Computer Science Department, Tizi Ouzou University, Algeria. Email: [email protected]

Abstract—This paper deals with anomaly score aggregation and thresholding in multi-model anomaly-based approaches, which require multiple detection models and profiles in order to characterize the different aspects of normal activities. Most works focus on profile/model definition, while critical issues related to anomaly measuring, aggregating and thresholding have not received similar attention. In this paper, we address in particular the issue of anomaly scoring and aggregation, which is a recurring problem in multi-model anomaly-based approaches. We propose a two-stage aggregation/thresholding scheme particularly suitable for multi-model anomaly-based approaches. The basic idea of our scheme is that anomalous behaviors induce either intra-model anomalies or inter-model ones. Our scheme is designed for real-time detection of both intra-model and inter-model anomalies. More precisely, we propose local thresholding in order to detect intra-model anomalies and use a Bayesian network in order to, on the one hand, extract inter-model regularities and, on the other hand, serve as an aggregating function for computing the overall anomaly score associated with each analyzed audit event. Our experimental studies, carried out on recent and real http traffic, show for instance that most Web-based attacks induce only intra-model anomalies and can be effectively detected in real-time. Moreover, this scheme significantly improves the detection rate of Web-based attacks involving inter-model anomalies.
I. INTRODUCTION

Intrusion detection [2] is concerned with real-time or off-line analysis of the events happening in information systems in order to reveal unauthorized resource use, access attempts or any suspicious activity. Intrusion detection systems (IDSs) are either misuse-based, such as SNORT [19], anomaly-based, such as EMERALD [17], or a combination of both approaches in order to exploit their mutual complementarities [22]. The first approach is based on known attack signatures, whereas the latter focuses on the normal activity profile. Misuse-based IDSs are known to achieve very high detection rates on known attacks, but they are ineffective in detecting new ones. Anomaly approaches build profiles or models representing normal behaviors and detect intrusions by comparing current system activities with the learnt profiles. In practice, anomaly-based IDSs are efficient in detecting new attacks but cause high false alarm rates, which really encumbers their application in real environments. In fact, configuring anomaly-based systems to acceptable false alarm rates results in failure to detect most malicious activities. However, the main advantage of anomaly detection lies in its potential capacity to detect both new and unknown (previously unseen) as well as known attacks.

Several anomaly-based systems use models and profiles [13][20][17][14] to represent normal behaviors of networks, hosts, users, programs, etc. In most anomaly-based IDSs, the anomaly score relative to a given audit event (network packet, system call, application log record, etc.) often depends on several local deviations measuring how anomalous the audit event is with respect to the different normal profiles and detection models [14]. Critical issues in anomaly detection are normal profile/model definition, and anomaly scoring and thresholding.
The first issue is concerned with extracting and selecting the best features to analyze in order to effectively detect anomalies. The second issue is also critical, as it provides the anomaly scores determining whether audit events should be flagged normal or anomalous. The bad tradeoff between detection rates and the underlying false alarm rates characterizing most anomaly-based IDSs is partly due to problems in anomaly measuring, aggregating and thresholding methods. In this paper, we first address the drawbacks of existing methods for anomaly score measuring, aggregating and thresholding. Then, we propose a two-stage scheme for anomaly score aggregation and thresholding suitable for multi-model anomaly-based approaches. The proposed scheme aims at effectively detecting both intra-model anomalies and inter-model ones. It combines local thresholding, in order to detect intra-model anomalies, with a Bayesian network that is used, on the one hand, to learn inter-model regularities and, on the other hand, to compute the overall anomaly scores. The proposed scheme is particularly designed to overcome most existing methods' drawbacks. We carried out experimental studies on real and recent http traffic and several Web-based attacks.

The rest of this paper is organized as follows: Section II briefly presents anomaly scoring and points out drawbacks of existing methods for anomaly measuring, aggregating and thresholding. A two-stage scheme for anomaly score aggregation and thresholding is proposed in Section III. In Section IV, we detail the Bayesian network-based scheme for aggregating local anomaly scores. Section V presents our experimental studies. Finally, Section VI concludes this paper.

978-1-4244-2413-9/08/$25.00 ©2008 IEEE




II. MULTI-MODEL ANOMALY-BASED APPROACHES

Multi-model anomaly-based approaches rely on several models/profiles in order to represent the different characteristics of normal activities of users, programs, hosts, etc. For instance, the authors in [14] use six detection models in order to characterize the different aspects of http requests. Hence, an anomaly scoring function evaluating the deviation of a given audit event¹ with respect to the learnt models/profiles first computes the local deviations of this audit event from each detection model/profile, then aggregates the obtained local deviations in order to compute an overall anomaly score, which is used to decide whether the event at hand is normal or anomalous. Designing a multi-model anomaly-based approach requires designing the following elements:

1) Profiles/Detection models: Effective detection models/profiles ideally consist of "all" the features/aspects that can show differences between normal activities and abnormal ones. Note that the most common form of audit events used in statistical-based IDSs is multivariate audit records describing network packets, connections, system calls, application log records, etc. These audit records involve different data types, among which continuous and categorical data are common. Note that in order to characterize normal activities, detection models/profiles often take into account the deployed security policy and historical data. For instance, an anomaly-based approach may involve a detection model/profile to reflect the fact that the security policy states that files containing confidential data must not be accessed by way of Web accesses. As for historical data, it allows extracting regularities and trends characterizing past activities. For example, Web server log records are used to learn long-term models/profiles characterizing request lengths, character distributions, etc.

2) Anomaly scoring measures: They are functions computing anomaly scores for every analyzed event. According to a fixed or learned threshold, the anomaly score associated with an event allows flagging it normal or anomalous. To compute such anomaly scores, anomaly scoring measures use the following functions:

a) Set of "individual" (or "local") anomaly scoring measures: They are functions that individually evaluate the normality of an audit event with respect to the normal profiles. For example, in [15] three statistical profiles represent normal http and DNS requests: a Request type profile, a Request length profile and a Character distribution profile. Three anomaly scoring functions are then used to compute the local anomaly scores. The most widely used anomaly measures are distance measures (widely used in outlier detection [1] and clustering [7]), probability measures [20], density measures [6] and entropy measures [16]. Note that a multi-model anomaly-based approach may use different anomaly measures within the same system.

¹In the intrusion detection field, an audit event can be a packet or a connection in the case of network-oriented intrusion detection, a system log record in the case of host-oriented intrusion detection, a Web server log record in the case of Web-oriented intrusion detection, etc.

b) Aggregating functions: Aggregating functions are used to fuse all the individual anomaly scores into a single anomaly score, which is used to decide whether the analyzed event is normal or anomalous. Namely, the global anomaly score AS for an audit event E is computed using an aggregating function G which aggregates all the local anomaly scores AS_Mi(E) relative to the corresponding profiles/models Mi:

AS(E) = G(AS_M1(E), AS_M2(E), ..., AS_Mn(E))    (1)

In practice, aggregating functions range from simple sum-based methods [10][15] to complex models such as Bayesian networks [12][20].

3) Anomaly thresholding: Thresholding is needed to transform a numeric anomaly score into a symbolic value (Normal or Abnormal) such that an alert can be raised. Namely, thresholding is done by specifying value intervals for both normal and anomalous behaviors. It is important to note that only a few works have addressed anomaly thresholding issues [18]. In fact, some authors just use a single value [15][20] to fix the borderline between normal and abnormal behaviors, while others use a range of values to fix this limit and flag events as Normal, Abnormal or Unknown. In practice, thresholds are often fixed according to the false alarm rate which must not be exceeded. Thresholds can be statically or dynamically set. The advantage of dynamically fixing a threshold is the ability to reassign its value in such a way as to limit the amount of triggered alerts.
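As an illustration, the scoring/aggregating/thresholding pipeline above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the scores, weights and threshold are hypothetical, and a plain weighted sum plays the role of the aggregating function G of Equation (1):

```python
# Sketch of Equation (1): an aggregating function G combining local
# anomaly scores AS_Mi(E) into a global score AS(E), then thresholded
# into a symbolic label. All numbers below are illustrative.

def aggregate(local_scores, weights=None):
    """Weighted sum of local anomaly scores (one per detection model)."""
    if weights is None:
        weights = [1.0] * len(local_scores)
    return sum(w * s for w, s in zip(weights, local_scores))

def flag(local_scores, threshold, weights=None):
    """Transform the numeric global score into Normal/Abnormal."""
    return "Abnormal" if aggregate(local_scores, weights) > threshold else "Normal"

# Hypothetical local scores from e.g. length, character-distribution
# and token-finder models:
scores = [0.2, 0.1, 0.9]
print(aggregate(scores))
print(flag(scores, threshold=1.0))
```

Any of the sum-based methods cited above fits this shape; only the choice of G and of the weights changes.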

Figure 1 shows a typical architecture of a multi-model anomaly-based approach.

Fig. 1. A typical multi-model anomaly-based approach



It is clear that the effectiveness of anomaly-based approaches strongly depends on profile/model definition and anomaly scoring measure relevance. In order to illustrate our ideas, we use a simple but widely used Web-based anomaly approach developed by Kruegel & Vigna [13]. These authors proposed a multi-model approach to detect Web-based attacks relying on six detection models (Attribute length, Character distribution, Structural inference, Token finder, Attribute presence or absence, and Attribute order). During the detection phase, the six models output anomaly scores which are aggregated using a weighted sum. Recently, this model has been examined in depth in [9].

A. Drawbacks of existing schemes for anomaly measuring and aggregating

Most multi-model anomaly-based approaches use sum-based aggregating functions [12]. Such methods are "simplistic" and ineffective. In fact, most existing aggregating methods suffer from several problems:

1) Weighting local anomaly scores is often done in a "questionable" way. For example, the authors in [15] neither explained how they assign weights nor why they use the same weighting for http and DNS requests.

2) The accumulation phenomenon, where several small local anomaly scores, once summed, produce a high global anomaly score.

3) The averaging phenomenon, where a very high local anomaly score, once aggregated, results in a low global anomaly score.

4) Commensurability problems are encountered when the different detection model outputs do not share the same scale. Some anomaly scores then carry much more weight in the overall score than the others.

5) Ignoring the inter-model dependencies existing between the different detection models: most used aggregating functions assume that the detection models are independent. This results in wasting valuable information that could be exploited to detect anomalies.

6) Real-time detection capabilities: the decision to raise an alert is taken on the basis of the global anomaly score, which requires computing all the local anomaly scores and then aggregating them. This causes several problems, especially regarding effectiveness. For example, when analyzing buffer-overflow attacks, the request length can be sufficient and there is no need to compute the other anomaly scores. Moreover, in buffer-overflow attacks, the request is often segmented over several packets which are reassembled at the destination host. However, such an attack can be detected given the first packets of the request, and there is no need to wait for all the packets in order to detect such an anomaly.

7) Handling missing inputs: missing data is an important issue that existing systems have not dealt with conveniently. In fact, many intentional or accidental causes can provoke the loss of some data pieces. For example, in gigabit networks, a network packet sniffer may drop packets. How, then, can the model proposed in [15], when applied to network traffic, deal with a request if the sniffer dropped the packet containing the request method? The problem is how to analyze audit events given that some inputs are missing.
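The accumulation and averaging phenomena (drawbacks 2 and 3) can be reproduced with a few hypothetical numbers; the scores and the threshold below are illustrative only:

```python
# Numeric illustration of the accumulation phenomenon (many benign
# local scores sum to an alarming global score) and the averaging
# phenomenon (one alarming score is diluted by many benign ones).

def sum_aggregate(scores):
    return sum(scores)

def mean_aggregate(scores):
    return sum(scores) / len(scores)

threshold = 1.0

# Accumulation: no single model is alarming, yet the sum fires an alert.
benign = [0.2] * 6
print(sum_aggregate(benign) > threshold)   # false alarm

# Averaging: one model screams, but the mean stays under the threshold.
attack = [5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(mean_aggregate(attack) > threshold)  # missed attack
```

Both failure modes come from collapsing the score vector too early, which is what motivates the two-stage scheme below.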

In the following, we propose an anomaly score aggregation/thresholding scheme particularly designed to overcome the limitations of existing methods.

III. A TWO-STAGE SCHEME FOR ANOMALY SCORE AGGREGATION AND THRESHOLDING

In this section, we detail the new scheme for aggregating anomaly scores and thresholding, particularly suitable for multi-model anomaly-based approaches.

A. Anomalous behaviors

The premise of anomaly-based approaches is the assumption that attacks induce abnormal behaviors. There are different possibilities regarding how anomalous events affect and manifest through elementary features. For instance, anomalous events can take the form of an anomalous (new or outlier) value in a feature, an anomalous combination of normal values, or an anomalous sequence of events. Accordingly, alerts raised by a multi-model anomaly-based approach can be caused by two anomaly categories:

• Intra-model anomalies: They are anomalous behaviors affecting one single model. Namely, the anomaly evidence is obvious only through one detection model. For example, in the Kruegel & Vigna model, there are buffer-overflow attacks which heavily affect the length model without affecting the other models. Similarly, directory traversal and cross-site scripting attacks only affect the character distribution model. The anomaly score computed using the affected model should then suffice to detect such attacks.

• Inter-model anomalies: They are anomalies that affect the regularities and correlations existing between the different models. In the Kruegel & Vigna model, the authors pointed out correlations between the Length model and the Character distribution one. Audit events violating such regularities are then possibly anomalous. For example, in a Web-based multi-model approach using Request length and Attribute number detection models, there are inter-model regularities between the Length model and the Argument number model. Generally, when an http request has an oversized length, it often involves a large number of arguments and parameters. However, a large request without any argument is potentially a buffer-overflow attack.

It is obvious that intra-model anomalies can be detected without aggregating the different anomaly scores. This is interesting because such anomalies can be detected in real-time.
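The inter-model regularity used as an example above (an oversized request normally carries many arguments) can be sketched as a single explicit check; the length cut-off is a hypothetical value, not one from the paper:

```python
# Sketch of one inter-model regularity: long requests normally carry
# many arguments, so a long request with no arguments violates the
# Length/Argument-number regularity (possible buffer overflow).
# The cut-off below is illustrative.

LONG_REQUEST = 1000  # bytes; hypothetical cut-off for "oversized"

def inter_model_anomaly(request_length, num_arguments):
    """Flag requests violating the length/argument-count regularity."""
    return request_length > LONG_REQUEST and num_arguments == 0

print(inter_model_anomaly(4096, 0))   # long request, no arguments
print(inter_model_anomaly(4096, 12))  # length explained by arguments
print(inter_model_anomaly(200, 0))    # short request
```

The Bayesian network of Section IV generalizes this idea: instead of hand-coding such rules, the regularities are learnt from normal training data.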



B. Local and global thresholding

Given that anomalous events can either affect detection models individually or violate regularities existing between detection models, we propose a two-stage thresholding scheme aiming at raising an alert whenever an anomalous behavior occurs, be it intra-model or inter-model. In fact, any anomaly revealed by a detection model is sufficient to raise an alert, even if the other detection models have not yet returned their anomaly scores. This is the idea motivating the two-stage thresholding scheme. Namely, each detection model Mi has its own anomaly threshold T_Mi. During the detection phase, once the input data for detection model Mi is available, the system can trigger an alert whenever the anomaly score As_Mi(E) exceeds the corresponding threshold T_Mi. If no intra-model anomaly is detected, then we need to look for inter-model anomalies.

• Intra-model thresholding: In order to detect intra-model anomalies, we fix for each detection model Mi a local anomaly threshold in the following way:

Threshold_Mi = Max_{Ei∈TS}(As_Mi(Ei)) ∗ θ    (2)

The threshold Threshold_Mi associated with detection model Mi is set to the maximum among all the anomaly scores computed on the normal audit events Ei of the training set TS. θ denotes a discounting/enhancing factor used to control the detection rate and the underlying false alarm rate. If no intra-model anomaly is detected, then we need to check for inter-model anomalies.

• Global thresholding: Similarly to intra-model thresholding, a threshold can be fixed for the global anomaly score as follows:

Threshold = Max_{Ai∈TS}(As(Ai)) ∗ θ    (3)

Note that the term AS(Ai) denotes the output of the anomaly score aggregating function, while the term Ai denotes a training anomaly score record. In order to control the detection rate/false alarm rate tradeoff, one can use the discounting/enhancing parameter θ.

Note that the motivation for setting the anomaly thresholds to the maximum among all the anomaly scores computed on normal training behaviors is to detect any event whose anomaly score exceeds all the normal behavior scores used to build the detection models. This maximum-based thresholding is intuitive and does not require any assumption about the anomaly scores. In fact, the greatest anomaly score on training behaviors is the one associated with normal but unusual behavior. Behaviors having a greater anomaly score are then possibly anomalous.
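A minimal sketch of this maximum-based local thresholding (Equation 2) used as the first detection stage; the training score records and the value of θ are hypothetical:

```python
# Sketch of Equation (2): the threshold of each detection model Mi is
# the maximum anomaly score observed on normal training events, scaled
# by theta. Stage 1 fires on any local threshold violation.
# Training records below are hypothetical.

def local_thresholds(training_records, theta=1.0):
    """training_records: per-event score records [s_M1, ..., s_Mn]."""
    n = len(training_records[0])
    return [max(rec[i] for rec in training_records) * theta for i in range(n)]

def intra_model_alert(event_scores, thresholds):
    """Stage 1: alert as soon as any local threshold is exceeded."""
    return any(s > t for s, t in zip(event_scores, thresholds))

training = [[0.1, 0.3], [0.2, 0.5], [0.4, 0.2]]  # normal score records
thr = local_thresholds(training, theta=1.1)
print(intra_model_alert([0.9, 0.1], thr))  # intra-model anomaly on model 1
print(intra_model_alert([0.3, 0.4], thr))  # defer to the global (stage 2) check
```

Events passing stage 1 unflagged are then checked against the global threshold of Equation (3) in the second stage.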

Local and global thresholding can be combined in order to exploit their advantages:

• Real-time detection: With local thresholding, every intra-model anomaly is detected without waiting for the other detection model results.

• Handling missing inputs: Missing inputs only affect the models requiring these inputs. The remaining models can then work normally and detect intra-model anomalies.

• Intra-model and inter-model anomaly detection: As we will see in the experimental studies, combining local and global thresholding allows detecting both intra-model and inter-model anomalies more effectively.

IV. BAYESIAN NETWORK-BASED AGGREGATION

In this section, we first show how to learn inter-model regularities from empirical normal training data. Then we show how to use the learnt Bayesian network in order to compute the overall anomaly scores associated with audit events.

A. Bayesian networks

Bayesian networks are powerful graphical models for representing and reasoning under uncertainty [11]. They consist of a graphical component, a DAG (Directed Acyclic Graph), and a quantitative probabilistic one. The graphical component allows an easy representation of domain knowledge in the form of an influence network (vertices represent events while edges represent the "influence" relationships existing between these events). The probabilistic component expresses the uncertainty relative to the influence relationships between domain variables using conditional probability tables (CPTs). Learning Bayesian networks requires training data to learn the network structure and compute the conditional probability tables. Note that the structure can also be specified by an expert according to his domain knowledge.

Several works have used Bayesian networks for anomaly detection [8][20][23]. For instance, the authors in [12] used a Bayesian network in order to assess the anomalousness of system calls. In our case, the main advantages of Bayesian networks are their learning capabilities, which for instance allow automatically extracting inter-model regularities, and their very effective inference capacities. Moreover, Bayesian networks can combine a user-supplied structure with empirical data. The other interesting advantage of using a Bayesian network is to use it as an aggregating function which takes into account inter-model regularities and naturally copes with both the weighting and commensurability problems. In fact, using a Bayesian network to evaluate the normality of a given anomaly score record does not require any transformation of the anomaly score record to be analyzed.

B. Training the Bayesian network: Extracting inter-model regularities

In order to automatically extract inter-model regularities from a normal training set, we train a Bayesian network on historical normal audit events. It is important to note that in order to learn these regularities, we need to train the Bayesian network not directly on the audit events (which may be unstructured data such as raw http requests) but on the



anomaly score records relative to the training audit events (the outputs of the different detection models).

1) Training set for learning inter-model regularities: Here we explain the process of preparing the training set that will be used to learn inter-model regularities. Given a data set of m normal audit events Ei, we build an equivalent data set of m normal anomaly score records A1, A2, ..., Am, where each record Ai is composed of all the local anomaly scores; namely, Ai = (ai1, ..., ain) corresponds to the local anomaly scores of the normal audit event Ei with respect to the detection models M1, ..., Mn and the anomaly measures As_M1, ..., As_Mn respectively. Learning a Bayesian network from these anomaly vectors will then capture the inter-model regularities involved in the training set. Once the inter-model regularities are learnt, it is possible to detect audit events that violate these regularities. As shown in Figure 2, the normal training audit events are first transformed into normal anomaly score records composing the training set. From this data set, a Bayesian network is learnt using a Bayesian network learning algorithm such as K2 [4].

Fig. 2. Training set preparation

Figure 2 shows the training data preparation step where normal audit events (in this example, http requests) are transformed into anomaly score records. Namely, for each training http request, there will be n local anomaly scores corresponding to the deviations of this request from the n detection models (each training http request will be represented by its anomaly score record). Henceforth, this data set can be used for training a Bayesian network in order to automatically learn the inter-model regularities existing between the selected detection models.
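The preparation step of Figure 2 can be sketched as follows; the two toy "detection models" (length deviation and digit ratio) and the nominal request length are hypothetical stand-ins for the paper's real models:

```python
# Sketch of Figure 2: each normal training request is replaced by its
# anomaly score record (one local score per detection model). The two
# toy models and the nominal length of 20 are illustrative only.

def length_score(request):
    """Toy model: relative deviation from a nominal request length."""
    return abs(len(request) - 20) / 20.0

def digit_score(request):
    """Toy model: fraction of digit characters in the request."""
    return sum(c.isdigit() for c in request) / max(len(request), 1)

MODELS = [length_score, digit_score]

def to_score_record(request):
    """Transform one audit event into its anomaly score record."""
    return [model(request) for model in MODELS]

requests = ["GET /index.html HTTP/1.1", "GET /a?id=42 HTTP/1.1"]
dataset = [to_score_record(r) for r in requests]
print(dataset)  # one n-dimensional score record per training request
```

The resulting `dataset` is exactly the kind of anomaly-vector table on which the Bayesian network structure and CPTs are then learnt.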

2) Learning inter-model regularities from normal training data: Structure learning from empirical data is an active research topic. It is concerned with learning the best structure (graph or network topology) given a data set of training examples [11]. There are two categories of Bayesian network learning algorithms: scoring-based structure learning algorithms, which search for the structure that best fits the training data by maximizing a score such as the Bayesian [5] and MDL [21] scores, and IC-based (Conditional Independence) algorithms, where the structure is learnt by searching for the conditional independence relationships existing between domain variables. The learnt network structure represents the inter-model regularities while the conditional probability tables quantify the inter-model influences. Ideally, all the strong correlations existing between the variables representing the detection models are extracted from the training data. For example, Figure 3 shows a Bayesian network learnt from the normal anomaly score records corresponding to a set of attack-free http requests.

Fig. 3. Example of a Bayesian network showing extracted inter-model regularities

Note that the Bayesian network of Figure 3 is built with the well-known K2 algorithm [4] using a hill-climbing search strategy. Each node in this network represents a detection model (in this example, there are 9 detection models designed to reveal Web-based anomalies) while the arcs represent the influence relationships existing between these detection models. Some of these relationships are natural and intuitive, such as the relationship between the number of parameters and the number of arguments (in the training set, these numbers are in most cases equal) and the relationship between the request length and the URL length (the URI is a part of an http request). Other influence relationships are less intuitive, but they are empirically significant correlations.

3) Computing the global anomaly score using the Bayesian network: Once the Bayesian network is built, it can be used to compute the probability of any anomaly score record. We first compute the different anomaly scores; then, using the Bayesian network, we compute the probability of the current anomaly vector. Semantically, the normality of an audit event Ei is proportional to the probability of the corresponding anomaly vector. In fact, anomaly score records which are similar to the training ones will be associated with high probabilities



while anomaly score records which significantly differ from the training anomaly vectors will be associated with zero or very low probabilities.

Using a Bayesian network as an aggregating function has several advantages. First, the inter-model regularities are taken into account when computing the overall anomaly score. Moreover, the Bayesian network-based aggregation overcomes the problems related to commensurability and weighting issues. Since the Bayesian network is learnt on normal anomaly score records, then whatever the outputs of the different local detection models are, they are easily and efficiently aggregated by the Bayesian network. Using a Bayesian network, the probability of a given anomaly score record Ai = (Ai1, ..., Ain) is computed using the chain rule [11] as follows:

p(Ai) = ∏_{j=1..n} p(Aij / parent(Aij)),    (4)

where term parent(Aij) denotes the parent nodes of variable Aij in the Bayesian network. Using the learnt structure and conditional probability tables (CPTs), it is easy and fast to compute the probability of any anomaly score record, which is proportional to the normality of the audit event at hand. Let us now see how to set the global anomaly thresholds.

From an anomaly-based approach point of view, every anomaly score exceeding all normal training anomaly scores should trigger an alert. The global anomaly threshold can then be fixed as follows:

Threshold = (MaxAi∈TS (1 − p(Ai))) ∗ θ    (5)

Term p(Ai) in Equation 5 denotes the probability degree computed using the Bayesian network. This threshold flags as anomalous any event having a probability degree smaller than that of the most improbable training event in the training set TS. Term θ denotes a discounting/enhancing factor allowing one to control the detection rate/false alarm rate tradeoff.
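The chain-rule scoring of Equation 4 and the thresholding of Equation 5 can be sketched together. The two-node network, its CPTs and the training records below are hypothetical, purely for illustration of the mechanism.

```python
# Equation 4: p(A) = prod_j p(A_j | parent(A_j)), then Equation 5's threshold
# computed from the worst-scoring (most improbable) training record.

def joint_prob(record, parents, cpt):
    """Chain-rule probability of a full anomaly-score record."""
    p = 1.0
    for var, pa in parents.items():
        cfg = tuple(record[v] for v in pa)
        p *= cpt[var][cfg][record[var]]
    return p

# Toy two-node network over binned anomaly scores (0 = low, 1 = high).
parents = {'A1': [], 'A2': ['A1']}
cpt = {
    'A1': {(): {0: 0.9, 1: 0.1}},
    'A2': {(0,): {0: 0.95, 1: 0.05}, (1,): {0: 0.2, 1: 0.8}},
}

training = [{'A1': 0, 'A2': 0}, {'A1': 0, 'A2': 0}, {'A1': 1, 'A2': 1}]
theta = 1.0  # no discounting/enhancing, as in the experiments below
threshold = max(1.0 - joint_prob(r, parents, cpt) for r in training) * theta

def is_anomalous(record):
    """Flag events less probable than the most improbable training event."""
    return 1.0 - joint_prob(record, parents, cpt) > threshold
```

Here a record with a high score on A1 but a low score on A2 violates the learnt inter-model regularity (A2 usually follows A1) and is flagged, while the frequent all-low record passes.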

V. EXPERIMENTAL STUDIES

In order to evaluate our aggregating/thresholding scheme, we use a multi-model approach designed to detect Web-based anomalies. We selected a subset of the detection models and features that we designed to detect attacks against server-side and client-side Web applications [3]. The selected detection models are built on real and recent attack-free http traffic and evaluated on real and simulated http traffic involving normal data as well as several Web-based attacks.

A. Detection model definition

In [3], we proposed a set of detection models and classification features including basic features of http connections as well as derived features summarizing past http connections. Note that the detection models' inputs are directly extracted from network packets instead of from Web application logs. Processing the whole http traffic is the only way to detect suspicious activities and attacks targeting either server-side or client-side Web applications. The detection model features are grouped into four categories [3]:

1) Request general features: These features provide general information on http requests. Examples of such features are the request method, request length, etc.

2) Request content features: These features search for particularly suspicious patterns in http requests. The number of non-printable/meta-characters, the number of directory traversal patterns, etc. are examples of features describing request content.

3) Response features: Response features are computed by analyzing the http response to a given request. Examples of these features are the response code, response time, etc.

4) Request history features: These are statistics about past connections, given that several Web attacks such as flooding, brute-force attacks and Web vulnerability scans perform through several repetitive connections. Examples of such features are the number/rate of connections issued by the same source host and requesting the same/different URIs, the inter-request time interval, etc.
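The request-history features (category 4) can be computed with a sliding window over the connection stream. The sketch below assumes a simple stream of (timestamp, source host, URI) tuples and uses feature names from Table I; it is illustrative, not the paper's extractor.

```python
# Windowed request-history features: count, over the last `window` seconds,
# requests from the same source host and requests for the same URL.
from collections import deque

class HistoryFeatures:
    def __init__(self, window=60.0):
        self.window = window
        self.past = deque()  # (timestamp, src_host, uri), oldest first

    def update(self, ts, src, uri):
        # Age out connections older than the window, then compute statistics
        # over the remaining past connections.
        while self.past and ts - self.past[0][0] > self.window:
            self.past.popleft()
        same_host = sum(1 for _, s, _ in self.past if s == src)
        same_host_diff_uri = sum(1 for _, s, u in self.past
                                 if s == src and u != uri)
        same_url = sum(1 for _, _, u in self.past if u == uri)
        self.past.append((ts, src, uri))
        return {'Num-Req-Same-Host': same_host,
                'Num-Req-Same-URL': same_url,
                'Num-Req-Same-Host-Diff-URI': same_host_diff_uri}
```

A flood or vulnerability scan from one host drives Num-Req-Same-Host and Num-Req-Same-Host-Diff-URI far above their normal near-zero values, which is exactly what the numeric anomaly measure below penalizes.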

Note that in our experiments, we selected the detection models of Table I.

TABLE I
SELECTED DETECTION MODELS

Name                        Description                                                       Type
Req-length                  Request length                                                    Integer
URI-length                  URI length                                                        Integer
Req-method                  Request method (GET, POST, HEAD, ...)                             Nominal
Req-resource-type           Type of requested resource (html, cgi, php, exe, ...)             Nominal
Num-param                   Number of parameters                                              Integer
Num-arg                     Number of arguments                                               Integer
Num-NonPrintChars           Number of special and meta-characters and shell codes in the      Integer
                            http request (x86, carriage return, semicolon, ...)
Resp-Code                   Response code to the http request (200, 404, 500, ...)            Nominal
Response-time               Time elapsed since the corresponding http request                 Real
Script-type                 Type of script included in the response (Java, Visual Basic, ...) Nominal
Num-Req-Same-Host           Number of requests issued by the same source host during the      Integer
                            last 60 seconds
Num-Req-Same-URL            Number of requests with the same URL during the last 60 seconds   Integer
Num-Req-Same-Host-Diff-URI  Number of requests issued by the same source and requesting       Integer
                            different URLs during the last 60 seconds
Inter-Req-Interval          Inter-request time interval                                       Real
Http-Error-Rate             Rate of http responses with error codes during the last           Real
                            60 seconds

In our experiments, numeric features are modeled by their means µ and standard deviations σ, while nominal features are represented by the frequencies of their possible values.
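A minimal sketch of this profile-learning step follows, assuming the representations just described: (µ, σ) for numeric features and a value-frequency table for nominal ones. The function name and data layout are choices made here for illustration.

```python
# Learn a per-feature profile from attack-free training values:
# numeric features -> mean and standard deviation; nominal -> value frequencies.
import statistics

def learn_profile(values, numeric):
    if numeric:
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)  # population std dev over training data
        return {'mu': mu, 'sigma': sigma}
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return {v: c / n for v, c in counts.items()}
```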

B. Local anomaly scoring measures

During the detection phase, the anomaly score associated with a given http connection relies on the local anomaly scores of the connection features with respect to the learnt detection models. We use different anomaly measures according to each profile type (numeric, nominal) and its distribution in the training



data. It is important to note that most numeric features in the training set have exponential rather than Gaussian distributions. For example, Figure 4 shows the distributions of the URI-length and Num-Req-Same-Host features in the training set we used in our experiments.

Fig. 4. Example of training distributions

Figure 4 clearly shows that the distributions of the URI-length and Num-Req-Same-Host features are not Gaussian. Indeed, the distribution of URI-length involves a large number of normal requests where the length of the URI is very low. As for Num-Req-Same-Host's distribution, Figure 4 shows that most normal requests are characterized by values near zero, meaning that most source hosts do not issue several successive http requests to the same destination host within the selected time window (in our experiments, we used a 60-second time window for computing Request history features). These findings suggest that abnormal behaviors such as buffer-overflow attacks, vulnerability scans or flooding will cause very large values. Accordingly, the local anomaly measures used to compute the local anomaly scores take into account the distribution type empirically obtained from the normal training set. In order to compute the anomaly score of a given feature Fi with respect to the corresponding detection model Mi, we consider two cases:

• if Fi is numerical, then the anomaly score is computed as

follows:

AsMi(Fi) = e^((Fi − µi) / σi)    (6)

Terms µi and σi denote respectively the mean and standard deviation of feature Fi in normal data; σi is used as a normalization parameter. Note that only exceeding values cause high anomaly scores. Intuitively, if the value of Fi is less than, equal to or close to the average µi, then the anomaly score will be negligible. Otherwise, the wider the margin, the greater the anomaly score will be.

• if Fi is a nominal feature, then the anomaly score is computed according to the improbability of the value of Fi in the normal training data. Namely,

AsMi(Fi) = −log(p(Fi)) (7)

Term p(Fi) denotes the frequency of Fi's value in the normal training data. Intuitively again, the more exceptional (unusual) the value of Fi is in the training data, the higher the anomaly score will be. Conversely, frequent and usual values will be associated with low anomaly scores.
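The two local measures (Equations 6 and 7) are short enough to state directly in code; this sketch assumes the (µ, σ) and frequency profiles described above. The epsilon floor for unseen nominal values is a choice made here, not specified in the paper.

```python
# Local anomaly measures for a single feature against its learnt profile.
import math

def numeric_score(value, mu, sigma):
    """Equation 6: only values well above the mean get large scores."""
    return math.exp((value - mu) / sigma)

def nominal_score(value, freqs, eps=1e-6):
    """Equation 7: -log of the training frequency of the observed value.
    eps guards against values never seen in training (an assumption here)."""
    return -math.log(freqs.get(value, eps))
```

At the mean, the numeric score is e^0 = 1; ten standard deviations above it, the score explodes, which matches the observation that buffer overflows and floods produce very large feature values.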

C. Training and testing data

Our experimental studies are carried out on real http traffic collected on a University campus during 2007. Note that this traffic includes both inbound and outbound http connections. We extracted the http traffic and preprocessed it into connection records using only packet payloads. As for attacks, we simulated most of the attacks involved in [9], which is to our knowledge the most extensive and up-to-date open Web-attack data set.

TABLE II
TRAINING/TESTING DATA SET DISTRIBUTION

                         Training data        Testing data
Class                    Number      %        Number      %
Normal connections       55342       100%     61378       58.41%
Buffer overflow          –           –        18          0.02%
Value misinterpretation  –           –        2           0.001%
Poor management          –           –        3           0.001%
Flooding                 –           –        12485       11.88%
Vulnerability scan       –           –        31152       29.64%
Cross Site Scripting     –           –        6           0.01%
SQL injection            –           –        14          0.01%
Command injection        –           –        9           0.01%
Other input validation   –           –        46          0.04%
Total                    55342       100%     105084      100%

Note that in order to train the Bayesian network, the normal http requests (second column in Table II) are first transformed into a training set composed of the anomaly score records corresponding to these training requests. As for the attacks of Table II, they are categorized according to the vulnerability category involved in each attack. Regarding attack effects, the attacks of Table II include denial of service, vulnerability scans, information leaks, and unauthorized and remote access [9]. It is important to note that the attacks involved in Table II were the most frequent Web-based attacks during 2007, the year the training and testing sets were built.
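The transformation of training requests into anomaly score records can be sketched as a simple mapping from per-feature values to per-model scores; the scoring lambdas and feature names below are toy placeholders, not the paper's actual models.

```python
# Turn one http connection record into an anomaly-score record: one local
# score per detection model. `models` maps feature name -> scoring function.

def to_score_record(request, models):
    return {name: score(request[name]) for name, score in models.items()}

# Hypothetical scorers, only to show the shape of the transformation.
models = {'Req-length': lambda v: max(0.0, v - 100) / 100.0,
          'Num-param': lambda v: float(v > 5)}
record = to_score_record({'Req-length': 250, 'Num-param': 2}, models)
```

Applying this to every attack-free request yields the score records on which the Bayesian network's structure and CPTs are learnt.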

D. Comparison of thresholding and aggregation schemes

Table III compares, on one hand, the results of a sum-based aggregation using a single global threshold, local thresholding alone, and a sum-based aggregation combined with local and global thresholding. On the other hand, we evaluate the Bayesian network-based2 approach using a single global threshold, and the combination of local and global thresholding with Bayesian network-based aggregation. Note that all the anomaly thresholds are computed on normal training data and we do not use any discounting/enhancing parameter θ (θ=1).

Firstly, Table III shows that our scheme performs better than the reference sum-based scheme. Moreover, it is important to note that most attacks induce only intra-model anomalies and can be detected without any aggregation. In fact, the combination of the sum-based scheme with local thresholding significantly enhances the detection rates without triggering higher false alarm rates. Similarly, Bayesian network-based aggregation using global thresholding achieves better results regarding detection rates and false alarm rate. Note that

2Note that structure learning is performed using the K2 algorithm [4].



TABLE III
EVALUATION OF DIFFERENT AGGREGATION/THRESHOLDING SCHEMES ON http TRAFFIC

Audit event class        Sum-based   Local     Sum aggreg +   Bayes     Bayes aggreg +
                         aggreg      thresh    local thresh   aggreg    local thresh
Normal connections       99.94%      97.37%    97.37%         99.79%    99.66%
Buffer overflow          16.67%      94.44%    94.44%         27.78%    94.44%
Value misinterpretation  100%        100%      100%           50%       100%
Poor management          100%        100%      100%           66.67%    100%
Flooding                 95.46%      99.62%    99.62%         86.22%    99.93%
Vulnerability scan       0.00%       51.84%    51.84%         83.06%    90.56%
Cross Site Scripting     0.00%       100%      100%           100%      100%
SQL injection            0.00%       100%      100%           100%      100%
Command injection        0.00%       100%      100%           100%      100%
Other input validation   2.17%       86.96%    86.96%         23.91%    91.30%
Total                    69.72%      84.16%    84.16%         93.20%    97.02%

the best results are achieved by the Bayesian network-based aggregation combined with local and global thresholding (see detection rates over normal connections and Web attacks). This is due to the fact that this scheme detects both intra-model anomalies and violations of the inter-model regularities learnt by the Bayesian network.

VI. CONCLUSION

This paper dealt with anomaly scoring, thresholding and aggregating issues in multi-model anomaly detection approaches. We proposed a two-stage aggregating/thresholding scheme suitable for detecting intra-model and inter-model anomalies in real time. The basic idea of our scheme is that anomalous behaviors either affect a single detection model or violate regularities existing between the different detection models. The proposed scheme combines local thresholding in order to detect intra-model anomalies in real time and global thresholding in order to detect violations of inter-model regularities. The latter are directly extracted from attack-free training data using a Bayesian network, which is used during the detection phase to compute the overall anomaly score associated with each analyzed audit event. Experimental studies, carried out on real and recent http traffic, showed that most Web-related attacks only induce intra-model anomalies and can be detected in real time using local thresholding. Future works will explore the application of our scheme to detect anomalies and attacks when input data is uncertain or missing.

VII. ACKNOWLEDGMENTS

This work is supported by a French project entitled PLACID (Probabilistic graphical models and Logics for Alarm Correlation in Intrusion Detection).

REFERENCES

[1] Fabrizio Angiulli, Stefano Basta, and Clara Pizzuti. Distance-based detection and prediction of outliers. IEEE Trans. on Knowl. and Data Eng., 18(2):145–160, 2006.

[2] Stefan Axelsson. Intrusion detection systems: A survey and taxonomy.Technical Report 99-15, Chalmers Univ., 2000.

[3] S. Benferhat and K. Tabia. Classification features for detecting server-side and client-side web attacks. In SEC 2008: 23rd International Security Conference, Milan, Italy, 2008.

[4] Gregory F. Cooper and Edward Herskovits. A Bayesian method for constructing Bayesian belief networks from databases. In Proceedings of the seventh conference (1991) on Uncertainty in artificial intelligence, pages 86–94, San Francisco, CA, USA, 1991. Morgan Kaufmann Publishers Inc.

[5] Gregory F. Cooper and Edward Herskovits. A Bayesian method for the induction of probabilistic networks from data. Mach. Learn., 9(4):309–347, 1992.

[6] Levent Ertöz, Eric Eilertson, Aleksandar Lazarevic, Pang-Ning Tan, Vipin Kumar, Jaideep Srivastava, and Paul Dokas. MINDS - Minnesota Intrusion Detection System.

[7] Gerhard Münz, Sa Li, and Georg Carle. Traffic anomaly detection using k-means clustering, 2007.

[8] Vaibhav Gowadia, Csilla Farkas, and Marco Valtorta. PAID: A probabilistic agent-based intrusion detection system. Computers & Security, 24(7):529–545, 2005.

[9] Kenneth L. Ingham and Hajime Inoue. Comparing anomaly detectiontechniques for http. In RAID, pages 42–62, 2007.

[10] Javitz and Valdes. The NIDES statistical component: Description and justification, March 1993.

[11] F. V. Jensen. An Introduction to Bayesian Networks. UCL press, London,1996.

[12] Christopher Kruegel, Darren Mutz, William Robertson, and Fredrik Valeur. Bayesian event classification for intrusion detection. In ACSAC '03: Proceedings of the 19th Annual Computer Security Applications Conference, page 14, Washington, DC, USA, 2003.

[13] Christopher Kruegel and Giovanni Vigna. Anomaly detection of web-based attacks. In CCS '03: Proceedings of the 10th ACM conference on Computer and communications security, pages 251–261, New York, NY, USA, 2003.

[14] Christopher Kruegel, Giovanni Vigna, and William Robertson. A multi-model approach to the detection of web-based attacks. volume 48, pages717–738, New York, NY, USA, 2005.

[15] Christopher Krügel, Thomas Toth, and Engin Kirda. Service specific anomaly detection for network intrusion detection. In SAC '02: Proceedings of the 2002 ACM symposium on Applied computing, pages 201–208, New York, NY, USA, 2002.

[16] Wenke Lee and Dong Xiang. Information-theoretic measures for anomaly detection. In SP '01: Proceedings of the 2001 IEEE Symposium on Security and Privacy, page 130, Washington, DC, USA, 2001.

[17] Peter G. Neumann and Phillip A. Porras. Experience with EMERALD to date. In First USENIX Workshop on Intrusion Detection and Network Monitoring, pages 73–80, Santa Clara, California, April 1999.

[18] Sathish Alampalayam P. Kumar, Anup Kumar, and S. Srinivasan. Statistical based intrusion detection framework using six sigma technique.

[19] Snort. Snort: The open source network intrusion detection system.http://www.snort.org. 2002.

[20] Stuart Staniford, James A. Hoagland, and Joseph M. McAlerney. Practical automated detection of stealthy portscans. J. Comput. Secur., 10(1-2):105–136, 2002.

[21] J. Suzuki. A construction of Bayesian networks from databases based on an MDL scheme, pages 266–273. San Mateo, CA: Morgan Kaufmann.

[22] Elvis Tombini, Herve Debar, Ludovic Me, and Mireille Ducasse. A serial combination of anomaly and misuse IDSes applied to http traffic. In ACSAC '04: Proceedings of the 20th Annual Computer Security Applications Conference, pages 428–437, Washington, DC, USA, 2004.

[23] Alfonso Valdes and Keith Skinner. Adaptive, model-based monitoring for cyber attack detection. In RAID '00: Proceedings of the Third International Workshop on Recent Advances in Intrusion Detection, pages 80–92, London, UK, 2000.
