real time intrusion prediction based on optimized alerts with … · real time intrusion prediction...

Real Time Intrusion Prediction based onOptimized Alerts with Hidden Markov Model

Alireza Shameli Sendi, Michel Dagenais, Masoume JabbarifarDepartment of Computer and Software Engineering, Ecole Polytechnique de Montreal, Montreal, Canada

Email: {alireza.shameli-sendi, michel.dagenais, masoume.jabbarifar}@polymtl.ca

Mario CoutureDefence Research & Development Canada Valcartier, Quebec, Canada

Email: [email protected]

Abstract— Cyber attacks and malicious activities are rapidlybecoming a major threat to proper secure organization.Many security tools may be installed in distributed systemsand monitor all events in a network. Security managers oftenhave to process huge numbers of alerts per day, produced bysuch tools. Intrusion prediction is an important technique tohelp response systems reacting properly before the networkis compromised. In this paper, we propose a frameworkto predict multi-step attacks before they pose a serioussecurity risk. Hidden Markov Model (HMM) is used toextract the interactions between attackers and networks.Since alerts correlation plays a critical role in prediction,a modulated alert severity through correlation concept isused instead of just individual alerts and their severity.Modulated severity generates prediction alarms for the mostinteresting steps of multi-step attacks and improves theaccuracy. Our experiments on the Lincoln Laboratory 2000data set show that our algorithm perfectly predicts multi-step attacks before they can compromise the network.

Index Terms— Intrusion, Prediction, Response System, Cor-relation, Hidden Markov Model

I. INTRODUCTION

Intrusion detection system (IDS) monitors networkevents for detecting malicious activities or any attempt tobreak into or compromise a system. IDSs often providepoor quality alerts, which are insufficient to support therapid identification of ongoing anomalies or predict thenext goal or step of anomaly [13]. Also, poor quality alertsneedlessly cause the system to be declared unhealthy, pos-sibly triggering high impact prevention responses. Thus,designing an alert optimization component is needed [14].There are two different approaches for alerts correlation:1) Alert Filtering approach: In the first, filtering, theidea is selecting just true alerts from raw alerts that aregenerated by detection components. There are many tech-niques like clustering, classification and frequent-patternmining to implement filtering approach. 2) Alert SeverityModulating approach: In the second approach, the ideais modulating the quality of alerts [2]. The Alert Filteringapproach causes false negatives in prediction but preventsthe application of high impact reactions to the network bythe response component. The Alert Severity Modulatingapproach insures that we have better prediction and abetter security model for the network.

Intrusion Response System (IRS), is the next levelof security technology [3]. Its mission is running goodstrategies to prevent anomaly growth and returning asystem to the healthy mode. It provides security at allsystem levels, such as operating system kernel and net-work data packets [4]. Although many IRSs have beenproposed, designing good strategies for effective responseof anomalies has always been a concern. A trade-offbetween system performance degradation and maximumsecurity is needed [5]. According to the level or degree ofautomation, intrusion response systems can be categorizedas: notification systems, manual response systems andautomated response systems [4,6,9]. Automated responsesystems try to be fully automated using decision-makingprocesses without human intervention. The major problemin this approach is the possibility of executing an improperresponse in case of problem. Automated response systemscan be divided into: 1) Static model: maps an alertto a predefined response. This model is easy to buildbut the major weakness is that the response measuresare predictable. 2) Dynamic model: responses are basedon multiple factors such as system state, attack metrics(frequency, severity, confidence, etc.) and network policy.In other words, the response to an attack may not be thesame depending for instance on the targeted host. Onedrawback of this model is that it does not learn anythingfrom attack to attack, so the intelligence level remainsthe same until the next upgrade. 3) Cost-sensitive: is aninteresting technique that tries to attune intrusion damageand response cost. To measure intrusion damage, a riskassessment component is needed. The big challenge incost-sensitive model is that the risk assessment must beonline and cost factor (risk index) has to be updated overtime [6,7,8,9].

In this context, our contributions include: (1) defininga framework for predicting sophisticated multi-step at-tacks and preventing them by running appropriate setsof responses, using HMM for reducing training time andmemory usage, (2) in contrast to previous models thatuse Alert Filtering approach to correlate alerts, we haveused a novel approach named Alert Severity Modulatingto predict the most interesting steps of multi-step attacks,and (3) our framework can be applied in a real network

JOURNAL OF NETWORKS, VOL. 7, NO. 2, FEBRUARY 2012 311

© 2012 ACADEMY PUBLISHERdoi:10.4304/jnw.7.2.311-321

to predict any kind of DDoS attacksThis paper is organized as follows: first, we will discuss

related work and several existing methods for predictionwill be introduced. The proposed model is illustratedin Section III. In Section IV, experimental results arepresented. Conclusion and future work will be discussedin Section V.

II. RELATED WORK

A number of different approaches that predict multi-step attacks have been proposed. Some researchers placethe prediction algorithm in the detection component. Forexample, Feng et al. [15] believe that existing solutionsare only able to detect after an intrusion has occurred,either partially or fully. Therefore, it is hard to blockattacks in real time. They have proposed a predictionfunction, based on Dynamic Bayesian Networks lookingat system calls, with IDS concepts for predicting the goalsof intruders.

Other researchers have worked on prediction algorithmsbased on detection output. In this method, detectioncomponents are distributed across a network and sendalerts to the prediction component. Of course, there areaggregation and correlation components, between detec-tion and prediction components, to reduce the number offalse IDS alerts.

Yu and Frincke [2] proposed Hidden Colored Petri-Net (HCPN) to predict intruder’s next goal. Previously,researchers used alert correlation to extract true alertsfrom alerts generated by the detection component. This isthe Alert Filtering approach to alert correlation. They havetaken a different approach. Because multi-step attacksactions are unknown but may be partially detected andreported as alerts, the task of alert correlation is not tofind good alerts. All alerts can be useful in prediction.They proposed a method to improve the quality of alertsfor prediction. Our alert optimization component hasthe same features and differs from the Alert Filteringapproach.

Haslum et al [12] proposed a model based on HMMto predict the next step of an anomaly. In this model,distributed system attacks are simulated in four steps.Based on observations from all IDSs in the network, thesystem mode can be moved among states. Thus, eachtime, prediction of the next goal can be estimated by theprobability of each state. However, this model needs tobe tested in a real network.

For modeling the interactions between attackers andnetworks, our technique closely relates to [12]. Theirmodel is based on the output of alert aggregation thatfilters alerts and selects just true alerts from raw alertsgenerated by detection components. Our approach usesthe concepts of modulating the severity of alerts, like[2]. We focus on the severity of alerts and propose anovel algorithm to modulate alert severity by correlationof alerts that are sent by distributed detection components.However, their model does not predict distributed Denialof service (DDoS) attacks while ours can.

Another distinguishing feature that separates our modelfrom previous models is that it can be applied to predictmulti-step attacks performed over a long period, and alertsoptimization helps us to predict DDoS attacks before itmakes a computer resource unavailable to its intendedusers.

III. PROPOSED MODEL

Figure 1 illustrates the basic architecture of the pro-posed model. The following actions would be performedin this architecture:

• Data Gathering: the data gathering componentcaptures network traffic and computer activity andextracts necessary information for the detection com-ponents

• Detection: the detection components try to detectmalicious activities and send alerts to the alertsoptimization component

• Alerts Optimization: alerts optimization modulatesthe severity of alerts through correlation to get betterprediction

• Prediction: the prediction component will attemptto make a prediction of a possible future problembased on the alert observation

• Response: according to the result of the predictioncomponent and problem characteristics, the responsecomponent can prepare an appropriate set of re-sponses to run on the network for preventing theproblem growth and returning the system to thehealthy mode. To obtain the benefits of an automatedresponse system, two major sections are considered:

1) Organization: in the organization section, wetry to select the best set of plans (IP block-ing, TCP Reset, dropping packets, delete files,killing process, run virus check, shutdown, ap-plying patch, change all passwords, ...) [18]based on our strategy (Confidentiality, Integrityand Availability). Our strategy relies on theevaluation of the positive effects of the re-sponses based on their impact on the confi-dentiality, integrity and availability metrics. Wealso take into account the negative impacts onthe other resources in terms of availability. Forexample, after running a response which blocksa specific subnet, a web server under attackis no longer at risk, but the availability of theservice has been decreased.

2) Execution: in the execution section, we have torun our sequence of responses on the networkfor preventing the problem growth and return-ing the system to the healthy mode. Beforeapplying, we need to order the responses basedon positive effect and negative impact.

A. Alerts Optimization

Unfortunately, detection components generate hugenumbers of alerts. Also, in distributed systems, this prob-lem is very complicated. As Figure 2 shows, the first

312 JOURNAL OF NETWORKS, VOL. 7, NO. 2, FEBRUARY 2012

© 2012 ACADEMY PUBLISHER

Figure 1: Architecture of the proposed model.

idea that many researchers have used is selecting truealerts from the raw alerts and then sending these to theprediction component (Alert Filtering approach). It causesmore false negatives in prediction and does not seem toproduce good results in practice. The second idea thatwe have used is Alert Severity Modulating approach thatincreases alerts severity exponentially through correlation.By using correlation concepts among alerts, we havemodulated the alerts severity before sending these to theprediction component.

Figure 2: Comparison of Alert Filtering approach andAlert Severity Modulating approach.

There are many methods to improve the quality ofalerts. In this paper, we focus on severity of alerts andpropose a novel algorithm to modulate it by correlationof alerts that are sent by distributed detection components.Our alerts optimization has two parts:

1) Correlation: Zhu and Ghorbani [13] have proposeda model to extract attack strategies. In this tech-nique, an Alert Correlation Matrix (ACM) is usedto store correlation strengths of any two types ofalerts. In this section, an ACM is defined. This

matrix has the correlation strength between twotypes of alert and is very important in attack pre-diction. Indicating the correlation weights in ACMis difficult and needs knowledge about all alerts,it must be obtained by training process or definedby a security expert. Classification of alerts is use-ful when detection components generate numerousalerts. However, classification reduces precision andcauses more false negatives in prediction. Figure 3shows the ACM. For example, w(1,2) means thatafter the occurrence of alert1, alert2 has w(1,2)

probability of occurring.

Figure 3: Alert Correlation Matrix.

2) Optimization: in this section, a function is usedto increase the severity of alerts. If we use theunmodified severity we get false negatives in pre-diction. Thus, we need a function to increase alertseverity exponentially. This function begins withthe unmodified severity for each alert. We presentFormula 1 to calculate each alert severity.



Alert.severity = Alert.severity ∗ eF∗NK∗A

1.25 ≤ K1 ≤ F ≤ 1001 ≤ A,N

(1)

• N is frequency of alert.• F is alert effect. It is extracted from the ACM.• A is acceptable number of alert per day and

can be calculated based on Acceptable Alertper Day (AAD) matrix.

• K is a constant parameter and can controlprediction occurrence. A large K increases thecorrelation effect. In next subsection, we willsee how the alert severity directly affects theprediction algorithm.

B. Prediction Component

As we know, IDS or detection components usuallygenerate a large number of alerts. Thus, the output ofIDS is a data stream. Stream data is temporally ordered,fast changing, potentially infinite and massive. There isnot enough time to store stream data and rescan thewhole data as static data [19,20,23]. There are sometechniques like clustering, classification and frequent-pattern mining for static data. Using these algorithms instreaming mode presents many challenges. One challengeis scanning static data multiple times, which is impossiblein streaming mode. Also, the big challenge in streamingmode is that one frequent pattern may not be frequent overtime. The Hidden Markov Model (HMM) algorithm is oneof the best ways to tackle this weakness. HMM workswell dealing with streaming inputs. HMM is a statisticalMarkov Model with unobserved state. As another view,HMM is a simplest model of Dynamic Bayesian Network.In HMM, the states are not visible but the output isdependent on the states that are visible. It is fast andcan be useful to assess risk and predict future attacks inintrusion detection systems [21,22].

In the following paragraphs, the elements of HMM aredescribed. An HMM is characterized by the following:

1) States: the system is assumed to be in one of thefollowing states. The states used in this paper aresimilar to the states used in [11]:

• Normal: indicating that system is working welland there is no malicious activity or any attemptto break into the system

• Attempt: indicating that malicious activitiesare attempted against the system

• Progress: indicating that intrusion has beenstarted and is now progressing

• Compromise: indicating that intrusion success-fully compromised the system

We use N, A, P and C to represent them, so Si ={s1 = N, s2 = A, s3 = P, s4 = C}. In Figure 4,the relationship among states is shown.

Figure 4: Hidden Markov Models states for prediction.

2) Observations: Oi = {O1, O2, O3, ..., On} observa-tions are real output from the system being mod-eled. Observations cause the system model to moveamong states. In this case, alerts from detectioncomponents are our observations. We consider theseverity of alerts as observation. Each alert has threepriorities: low, medium and high. However, we donot use the real severity for observations. Afterreceiving the real severity that has three levels, wemap it after alert optimization to the four priorities:low, medium, high, very high. In Figure 2, youcan see our model to map the real severity to theincreased severity using an exponential function.

3) State Transition Probability (Λ): the state transi-tion probability matrix describes the probability ofmoving among states.

4) Observation Transition Probability (Φ): the ob-servation transition probability matrix describes theprobability of moving among observations.

5) Initial State Distribution (Π): it describes theprobability of states when our framework starts.

We will now describe the prediction model in details.As seen in Figure 1, all detection components sendalerts to the alert optimization component. The alertoptimization component increases the alert severity usingan exponential function. The increased severity of alertsis sent to the prediction component as observation. Foreach observation, HMM moves among states and theprobability of being in each state will be updated. Thecomputation needed to update the state distribution isbased on Equation 19 and 27 in [25] and algorithm 1 in[12]. Figure 5 shows the pseudo-code of intrusion predic-tion. First, a new alert severity has to be calculated basedon the alert information with alert severity function. Thus,N, F and A parameters are calculated by three functionsthat are indicated in lines 6, 7 and 8. N is the frequency ofalert that can be calculated by CalculateAlertFrequencyfunction. The Alert correlation matrix (ACM) is usedto calculate the alert effect by the CalculateAlertEffectfunction, as will be explained in the next section. A isthe acceptable number of alerts per day and can be cal-culated based on the CalculateAcceptableAlertFrequencyfunction. Of course, the Acceptable Alert per Day (AAD)matrix must be initialized before running the algorithm.After identifying the alert severity, we will try to up-



Figure 5: Prediction Algorithm.

date the current state distribution. Obs ix indicates theobservation index. For the first observation index, somecalculation is needed, and for the next observation anothercalculation [12,25]. Finally, the compromise state statusis very important for prediction. If it is over 95 percent,it indicates that the distributed system will very likely becompromised in a near future.

IV. EXPERIMENT RESULTS

A. Lincoln Laboratory Scenario (LLDDOS1.0)

The proposed prediction algorithm has been tested us-ing the DARPA 2000 dataset [16]. It consists of two multi-step attack scenarios. We have used the first scenarioto test our model. This data set has a multi-step attackthat tries to install distributed denial of service (DDoS)software in any computer in the target network. Thisattack has 5 steps and takes about three hours. Finally,three computers are compromised. Table I shows the 5steps goal.

We have used the RealSecure IDS to generate analert log file [17]. RealSecure produces 919 alerts byplaying back the ”Inside-tcpdump” of LLDDOS1.0. TableII shows that RealSecure with these alerts can detect thesteps. Unfortunately, the first step can not be detected byRealSecure.

B. Model Parameters

Before starting our framework, we have to initializesome parameters:

• Alert optimization parameters: in this section twomatrices must be initialized: ACM and ADD. As

you see in Table III, RealSecure produces 19 typesof alerts in LLDOS1.0 and we have used thesevalues for the AAD parameter. To initialize ACM,we have used [13]. These correlation weights inACM were obtained during the training process andincrementally updated in this process with a formulathat depends on the number of times that these twotypes of alerts have been directly correlated. Theeffect column shows each alert severity obtained byFormula 2. Alert severity used in this formula isfrom [10] and is shown in Table IV. We have usednormalized columns in our algorithm.

F (Alerti) =

19∑j=1

W(i,j) ∗ Severityj (2)

• HMM parameters: first, at the start of monitoring,Π = {1.0, 0.0, 0.0, 0.0}. It means that the system isin the normal state with 100% probability. Secondly,we have to initialize the state transition probability.Finally, the observation probability matrix has to bespecified.

Λ =

N A P CNAPC

0.9990.001

00

0.0010.9840.001

0

00.0150.9840.001

00

0.0150.999

(3)



TABLE I.: The Five Steps of the DARPA Attack Scenario

Step Name Time Goal

1 IP sweep 9:45 to 09:52 The attacker sends ICMP echo-requests in this sweep and listens for ICMP echo-replies todetermine which hosts are ”up”

2 Sadmind Ping 10:08 to 10:18 The hosts discovered in the previous step are probed to determine which hosts arerunning the ”sadmind” remote administration tool. This tells the attacker which hosts mightbe vulnerable to the exploit that he/she has

3 Break into 10:33 to 10:34 The attacker then tries to break into the hosts found to be running the sadmind service inthe previous step. Breakins via the sadmind vulnerability

4 Installation 10:50 Installation of the trojan mstream DDoS software on three hosts

5 Launch 11:27 Launching the DDoS

TABLE II.: The RealSecure Alerts Related to Each Step

Step Name Alerts

1 IP sweep No alert is generated for this step2 Sadmind Ping Sadmind ping3 Break into Sadmind Amslverify Overflow, Admind4 Installation Rsh, MStream Zombie5 Launch Stream DOS

TABLE III.: Acceptable Alert per Day (AAD) Matrix

ID Alert Name Acceptable Frequency

1 Sadmind Ping 102 TelnetTerminaltype 10003 Email Almail Overflow 104 Email Ehlo 10000005 FTP User 106 FTP Pass 107 FTP Syst 108 HTTP Java 19 HTTP Shells 110 Admind 111 Sadmind Amslverify Overflow 112 Rsh 113 Mstream Zombie 114 HTTP Cisco 115 SSH Detected 1016 Email Debug 117 TelnetXdisplay 318 TelnetEnvAll 1019 Stream DoS 1

Φ =

L M H VHNAPC

0.40.30.20.1

0.30.40.30.2

0.20.20.40.3

0.10.10.10.4

(4)

C. Results

Figure 6 shows the total prediction for the full durationof the Lincoln Laboratory data set with K= 3.5. Asmentioned, our HMM is based on four states (Normal,Attempt, Progress and Compromise). In this diagram, youcan see the four states status simultaneously when theattacker tries to break into the hosts. Normal state showsonline prediction of the network being healthy in a nearfuture. In this diagram we can see when a system ispredicted not healthy in a near future. Our system adjuststhe state with attackers’ progress. When the attacker gets

appropriate results in a multi-step attack, system movesfrom Normal state to the Attempt state and so on. Whenthe probability of Normal state is down, it means theprobability of other states are up.

As we have mentioned, this multi-step attack takesabout three hours and has five steps. You can see theapproximate time periods of all steps conducted by theintruder (based on RealSecure IDS output): 2788-3211sec. [4:46:23-4:53:26] (Step 2), 4377-4400 sec. [5:12:52-5:13:15] (Step 3), 5355 sec. [5:29:10] (Step 4) and 7573sec. [6:06:08] (Step 5). As mentioned in Table I, in thefourth step, attacker installs the trojan mstream DDoSsoftware on three hosts. Eventually, in step 5 the attackerlaunches the DDoS. Thus, our prediction component hasto send an alarm to the response component before step4. Let us see how our prediction algorithm works.

The alert optimization component sends an alert with



TAB

LE

IV.:

Ale

rtC

orre

latio

nM

atri

x

Ale

rtSe

veri

tyL

owL

owM

ediu

mL

owL

owM

ediu

mL

owH

igh

Hig

hH

igh

Hig

hM

ediu

mH

igh

Hig

hL

owH

igh

Low

Low

Med

ium

Ale

rtID

12

34

56

78

910

1112

1314

1516

1718

19E

ffec

tN

orm

aliz

e

10.

30.

010.

010.

010.

010.

010.

010.

010.

019.

38.

166.

272.

370.

010.

010.

010.

690.

690.

0173

.94

0.72

8

21.

7529

.87

19.5

413

9.09

16.1

19.2

916

.11

0.01

0.01

3.79

2.17

2.33

0.6

0.01

3.49

1.29

0.01

0.01

0.01

312.

413.

077

30.

8745

.74

34.8

422

8.07

29.5

225

.27

24.8

612

.25

0.65

1.68

1.12

1.23

0.32

0.98

0.92

1.1

0.01

0.01

0.01

507

4.99

4

41.

7578

2.27

628.

2535

33.9

355

0.71

528.

8552

7.02

13.4

90.

654.

372.

22.

390.

624.

1617

.87

48.3

10.

010.

010.

0179

53.9

778

.35

50.

019.

035.

3229

.57

19.7

127

.31

26.2

10.

010.

010.

010.

010.

010.

010.

010.

271.

480.

010.

010.

0115

4.74

1.52

4

60.

019.

031.

0929

.05

20.7

919

.71

27.3

30.

010.

010.

010.

010.

010.

010.

010.

271.

480.

010.

010.

0113

0.34

1.28

3

70.

018.

761.

0929

.05

21.2

220

.15

19.3

50.

010.

010.

010.

010.

010.

010.

010.

271.

480.

010.

010.

0112

5.82

1.23

9

80.

0111

.82

0.01

29.3

43.

063.

063.

0611

.53

3.88

0.01

0.01

0.01

0.01

0.01

0.01

0.01

0.01

0.01

0.01

99.8

80.

983

90.

010.

010.

010.

010.

010.

010.

010.

640.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

012.

260.

022

100.

70.

010.

010.

010.

010.

010.

010.

010.

0130

.79

31.5

372

.87

26.2

50.

010.

010.

014.

114.

120.

0142

0.61

4.14

3

110.

010.

010.

010.

010.

010.

010.

010.

010.

0131

.422

.74

32.1

416

.96

0.01

0.01

0.01

4.11

4.12

0.01

286.

052.

817

120.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

0120

.88

2.79

0.01

0.01

0.01

3.29

3.29

0.01

57.0

10.

561

130.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

019.

970.

010.

010.

010.

010.

010.

013.

250.

032

140.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

350.

010.

010.

010.

010.

011.

050.

01

150.

010.

880.

012.

640.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

017.

660.

075

160.

011.

160.

014.

240.

280.

280.

280.

010.

010.

010.

010.

010.

010.

010.

010.

270.

010.

010.

017.

610.

074

170.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

011.

080.

010.

010.

010.

010.

690.

014.

260.

041

180.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

011.

080.

010.

010.

010.

010.

010.

013.

580.

035

190.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010.

010

0.37

0.00

3



Alert Severity Modulating approach to the prediction com-ponent. Figure 7 illustrates the output of alert optimizationcomponent for the full duration of the Dataset. Thus,prediction component receives optimized alerts and eachstate calculates its probability. At the start of monitoring,the system is in the Normal state with 100% probabilityand other states are zero. The sum of all values at eachtime must be 100%. Our prediction is based on theprobability calculated in the Compromise state. Whenthe probability is over 95%, it means an intrusion isgoing to happen in the near future. The first predictionwas calculated at 4310 seconds, 67 seconds before theattacker does all the work in third step. The secondprediction was calculated at 5323 seconds. It happened 32seconds before the fourth step. The third prediction wascalculated at 6101, 25 minutes before the fifth step. Thus,the administrator can manually apply a set of responsesto mitigate the attack or we can connect the predictioncomponent to an automated intrusion response system todo that automatically.

Also, Table V shows the total number of alerts thatare generated by RealSecure IDS until each prediction.The initial alert severity column illustrates the initial valuerelated to each alert. The optimized alert severity columnshows how Formula 1 works in the alert optimizationcomponent for each type of alert.

As we discussed before, alert optimization modulatedalert severity over time with Formula 1. There is aconstant parameter (K) in this formula for which we canevaluate the effect on the prediction algorithm, as illus-trated in Figure 8. In fact, K is the prediction controller.As shown in Figure 8, there are a few predictions closelyspaced in time. Because of this close spacing, they areconsidered as a region. As seen in Figure 8, when K=2.5 there are two regions and with K= 3.5, there are threeregions within a few minutes. In Figure 8, we can see thatwith K= 3.5, a prediction happens before each of step 3,4 and 5 and the result is most interesting. For K= 2.5,two predictions happen, the first related to the third stepand the second related to the fourth step.

If K is big, the prediction component is more sensitiveand sends more alarms to the response component. Inthis case, the response component can apply responsesmore frequently. It means, there are more chances to repelattack if we could not stop the progress of attack. We maystill want to set K to a higher value to avoid missing anattack and to have more time to evaluate the risk indexand select more appropriately the level of response.

In the prediction view, we have two types of intrusionresponse systems: Reactive and Proactive [4,24]. In theReactive approach, all responses are delayed until theintrusion is detected. Since the reactive responses areapplied when an incident is detected, the system is in anunhealthy state from before the detection of the maliciousactivity until the application of the reactive responses.Sometimes, it is difficult to return the system to thehealthy state. This type of IRS is not useful for high secu-rity. For example, suppose the attacker was successful in

accessing a database, illegally reading critical informationand after that the IDS sends an alarm about a detectedmalicious activity. In this case, a reactive response is notuseful because the critical information has been disclosed.In summary, we have designed a Proactive IRS that canpredict different kinds of DDoS attacks often minutesbefore it happens.

V. CONCLUSION

In this paper, we presented an architecture to predictintrusions and trigger good response strategies. A novelalert correlation is used to decrease false negatives inprediction. Our experimental results on the DARPA 2000data set have shown that our model can perfectly predictdistributed denial of service attacks and has a poten-tial to detect multi-step attacks missed by the detectioncomponent. Several future research directions are worthinvestigating to improve our model. First, we would liketo study how to update the ACM based on predictionanalysis results. For example, the correlation strengthbetween two types of alerts can be updated by receivinghints from the prediction component. However, the ACMshould not be updated every time because the attackercould run impractical actions in the first step of an attack,increase the correlation strength between two or morealerts and consequently cause incorrect predictions.

Secondly, we want to add a risk assessment componentin our model. Risk assessment is the process of identifyingand characterizing risk. The result of risk assessment isvery important to minimize the impact on system healthwhen anomaly has been detected. Finally, we plan tointerface our system to live data center network data.

ACKNOWLEDGMENT

The support of the Natural Sciences and EngineeringResearch Council of Canada (NSERC), Ericsson SoftwareResearch and Defence Research and Development Canada(DRDC) is gratefully acknowledged.

REFERENCES

[1] F. Xiao, S. Jin, and X. Li, ”A Novel Data Mining-BasedMethod for Alert Reduction and Analysis,” Journal ofNetwork, vol. 5, 2010, pp. 88-97.

[2] D. Yu, and D. A. Frincke, ”Improving the quality of alertsand predicting intruder’s next goal with Hidden ColoredPetri-Net,” Computer Networks, vol. 51, 2007, pp. 632-654.

[3] K. Scarfone, and P. Mell, ”Guide to Intrusion Detectionand Prevention Systems,” Technical Report NIST SP 800-94, National Institute of Standards and Technology, 2007.

[4] N. Stakhanova, S. Basu, and J. Wong, ”Taxonomy ofIntrusion Response Systems,” Journal of Information andComputer Security, vol. 1, 2007, pp. 169-184.

[5] D. B. Payne, and H. G. Gunhold, ”Policy-based securityconfiguration management application to intrusion detectionand prevention,” 2009 IEEE International Conference onCommunications, Dresden, Germany, 2009.

[6] A. Curtis, and J. Carver, ”Adaptive agent-based intrusionresponse,” Ph.D thesis, Texas A&M University, USA, 2001.

[7] W. Lee, W. Fan, and M. Miller, ”Toward cost-sensitivemodeling for intrusion detection and response,” Journal ofComputer Security, vol. 10, 2002, pp. 5-22.



TAB

LE

V.:

Tota

lpr

edic

tion

resu

ltan

dou

tput

ofal

ert

optim

izat

ion

for

the

all

pred

ictio

nsin

DA

RPA

data

set

with

K=

3.5

Firs

tPr

edic

tion

Reg

ion

(431

0se

c.)

Seco

ndPr

edic

tion

Reg

ion

(532

3se

c.)

Thi

rdPr

edic

tion

Reg

ion

(610

1se

c.)

IDA

lert

Nam

eIn

itial

Ale

rtSe

veri

tyTo

tal

repe

titio

nin

Dat

aSe

tO

ptim

ized

Ale

rtSe

veri

tyTo

tal

repe

titio

nO

ptim

ized

Ale

rtSe

veri

tyTo

tal

repe

titio

nO

ptim

ized

Ale

rtSe

veri

tyTo

tal

repe

titio

n

1Sa

dmin

dPi

ngL

ow(1

)3

Low

(1)

3-

0-

0

2Te

lnet

Term

inal

type

Low

(1)

126

Hig

h(3

)55

Ver

yH

igh

(4)

8V

ery

Hig

h(4

)2

3E

mai

lA

lmai

lO

verfl

owM

ediu

m(2

)38

Ver

yH

igh

(4)

9V

ery

Hig

h(4

)8

-0

4E

mai

lE

hlo

Low

(1)

522

Low

(1)

195

Med

ium

(2)

77M

ediu

m(2

)36

5FT

PU

ser

Low

(1)

49V

ery

Hig

h(4

)14

Ver

yH

igh

(4)

1V

ery

Hig

h(4

)4

6FT

PPa

ssM

ediu

m(2

)49

Ver

yH

igh

(4)

14V

ery

Hig

h(4

)1

Ver

yH

igh

(4)

4

7FT

PSy

stL

ow(1

)44

Ver

yH

igh

(4)

11V

ery

Hig

h(4

)1

Ver

yH

igh

(4)

4

8H

TT

PJa

vaH

igh

(3)

8V

ery

Hig

h(4

)8

-0

-0

9H

TT

PSh

ells

Hig

h(3

)15

Ver

yH

igh

(4)

15-

0-

0

10A

dmin

dH

igh

(3)

17V

ery

Hig

h(4

)9

Ver

yH

igh

(4)

8-

0

11Sa

dmin

dA

msl

veri

fyO

verfl

owH

igh

(3)

14V

ery

Hig

h(4

)6

Ver

yH

igh

(4)

8-

0

12R

shM

ediu

m(2

)17

-0

Ver

yH

igh

(4)

17-

0

13M

stre

amZ

ombi

eH

igh

(3)

6-

0H

igh

(3)

1H

igh

(3)

2

14H

TT

PC

isco

Hig

h(3

)2

-0

-0

Hig

h(3

)2

15SS

HD

etec

ted

Low

(1)

4-

0-

0-

0

16E

mai

lD

ebug

Hig

h(3

)2

-0

-0

-0

17Te

lnet

Xdi

spla

yL

ow(1

)1

-0

-0

-0

18Te

lnet

Env

All

Low

(1)

1-

0-

0-

0

19St

ream

DoS

Med

ium

(2)

1-

0-

0-

0To

tal

919

320

124

56



Figure 6: Total prediction result and HMM states status for DARPA data set with K= 3.5.

Figure 7: The output of alert optimization component for the full duration of the Dataset with K= 3.5.

[8] D. B. Payne, and H. G. Gunhold, ”Evaluating the Impactof Automated Intrusion Response Mechanisms,” Proceed-ings of the 18th Annual Computer Security ApplicationsConference, Los Alamitos, USA, 2002.

[9] C. P. Mu, and Y. Li, ”An intrusion response decision-makingmodel based on hierarchical task network planning,” Expertsystems with applications, vol. 37, 2010, pp. 2465-2472.

[10] RealSecure Signatures Reference Guide. Internet SecuritySystems, http://documents.iss.net/literature/RealSecure/RS Signatures 6.0.pdf.

[11] K. Haslum, A. Abraham, and S. Knapskog, ”Dips: Aframework for distributed intrusion prediction and preven-tion using hidden markov models and online fuzzy riskassessment,” In Third International Symposium on Infor-

mation Assurance and Security, 2007, pp. 183-188.[12] K. Haslum, M. E. G. Moe, and S. J. Knapskog, ”Real-

time intrusion prevention and security analysis of networksusing HMMs,” 33rd IEEE Conference on Local ComputerNetworks, Montreal, Canada, 2008.

[13] B. Zhu, and A. A. Ghorbani, ”Alert correlation for ex-tracting attack strategies,” International Journal of NetworkSecurity, vol. 3, 2006, pp. 244-258.

[14] C. Kruegel, F. Valeur, and G. Vigna, ”Alert Correlation,”in Intrusion Detection and Correlation, first edition, vol. 14,Ed. New York: Springer, 2005, pp. 29-35.

[15] L. Feng, W. Wang, L. Zhu, and Y. Zhang, ”Predicting in-trusion goal using dynamic Bayesian network with transferprobability estimation,” Journal of Networks and Computer



Figure 8: Compromised state output for DARPA data set for three different values of K (with K= 2.5 and K= 3.5).

Applications, vol. 32, n. 3, 2009, pp. 721-732.[16] MIT Lincoln Laboratory, 2000 darpa intrusion detection

scenario specific data sets, 2000.[17] North Carolina State University Cyber Defense Lab-

oratory, Tiaa: A toolkit for intrusion alert analysis,http://discovery.csc.ncsu.edu/software/correlator/ver0.4/index.html.

[18] S. Tanachaiwiwat, K. Hwang, and Y. Chen, ”Adaptive In-trusion Response to Minimize Risk over Multiple NetworkAttacks,” ACM Trans on Information and System Security,2002.

[19] J. Han, and M. Kamber, ”Data Mining: Concepts andTechniques,” 2nd ed., San Francisco: Elsevier, 2006.

[20] M. Gaber, A. Zaslavsky, and S. Krishnaswamy, ”MiningData Streams: A Review,” ACM SIGMOD Record, vol. 34,2005.

[21] J. Han, H. Cheng, D. Xin, and X. Yan, ”Frequent patternmining: Current status and future directions,” Data Miningand Knowledge Discovery, 2007.

[22] W. Li, and Z. Guo, ”Hidden Markov Model Based RealTime Network Security Quantification Method,” nswctc,International Conference on Networks Security, WirelessCommunications and Trusted Computing, pp. 94-100, 2009.

[23] C. Aggarwal, J. Han, J. Wang, and P. Yu, ”A Frame-work for Projected Clustering of High Dimensional DataStreams,” Proceedings of the 30th VLDB Conference,Toronto, Canada, 2004.

[24] N. B. Anuar, M. Papadaki, S. Furnell, and N. Clarke, ”Aninvestigation and survey of response options for intrusionresponse systems,” Information Security for South Africa,pp. 1-8, 2010.

[25] L. R. Rabiner, ”A tutorial on hidden markov models andselected applications in speech recognition,” Readings inspeech recognition, pp. 267-296, 1990.



real time intrusion prediction based on optimized alerts with … · real time intrusion prediction...

Documents