NETWORK INTRUSION DETECTION SYSTEMS
USING RANDOM FORESTS ALGORITHM
by
Jiong Zhang
A thesis submitted to the
School of Computing
in conformity with the requirements for
the degree of Master of Science
Queen’s University
Kingston, Ontario, Canada
December 2005
Copyright © Jiong Zhang, 2005
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Library and Archives Canada
Published Heritage Branch
395 Wellington Street, Ottawa ON K1A 0N4, Canada
ISBN: 978-0-494-15337-6

NOTICE: The author has granted a non-exclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or non-commercial purposes, in microform, paper, electronic and/or any other formats.

The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis. While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.
Abstract
With the tremendous growth of network-based services and sensitive information on networks, the number and severity of network-based computer attacks have increased significantly. Security technologies alone cannot completely prevent breaches of security. Therefore, intrusion detection is an important component of network security. However, many current intrusion detection systems are rule-based systems, which have limited ability to detect novel intrusions. Moreover, encoding rules is time-consuming and depends heavily on the system builder's deep understanding of known intrusions.

In this thesis, we propose new systematic frameworks that apply a data mining algorithm called random forests in misuse, anomaly, and hybrid network-based intrusion detection systems. In misuse detection, patterns of intrusions are built automatically by random forests over training data. Intrusions are then detected by matching network activities against the patterns. In anomaly detection, novel intrusions are detected using the outlier detection mechanism of random forests. After building the patterns of network services by random forests, outliers related to the patterns are determined by the outlier detection algorithm. The hybrid detection system improves detection performance by combining misuse and anomaly detection. Misuse detection can detect known intrusions with a high detection rate and a low false positive rate. The
anomaly detection can detect some unknown intrusions. The hybrid system combines the advantages of both techniques.

We evaluate our approaches on the KDD'99 dataset. The experimental results show that the performance of our misuse approach is better than the best KDD'99 result. The results also indicate that our anomaly detection approach achieves a higher detection rate at low false positive rates than other reported unsupervised anomaly detection approaches. The evaluation demonstrates that the hybrid system can improve the overall performance of the above-mentioned intrusion detection systems.
Acknowledgments
I would like to express my gratitude to all those who made it possible for me to complete this thesis.

In particular, I would like to thank my supervisor, Dr. Mohammad Zulkernine, for his supervision and encouragement throughout my research. Without his guidance and help, the work presented in this thesis would not have been possible. I have learned a lot from him and have been highly impressed with his hard work and dedication.

I would like to thank Dr. David Skillicorn for his helpful suggestions on my research and comments on my paper.

I want to thank my family, especially my wife, for their support and encouragement.

I want to thank all members of the Queen's Reliable Software Technology (QRST) research group for the great time working together.

I also want to thank the faculty, staff, and my classmates in the School of Computing for their help.

This research has been supported and funded by Bell Canada through Bell University Laboratories (BUL), and Mathematics of Information Technology and Complex Systems (MITACS).
Contents

Abstract
Acknowledgments
Contents
List of Tables
List of Figures

1 Introduction
  1.1 Motivation
  1.2 Overview
  1.3 Summary of contributions
  1.4 Thesis organization

2 Background and related work
  2.1 Random forests
  2.2 Intrusion detection
    2.2.1 Misuse detection
    2.2.2 Anomaly detection
    2.2.3 Hybrid detection
  2.3 Data mining based detection
    2.3.1 ADAM
    2.3.2 MADAM ID
    2.3.3 JAM
  2.4 Datasets
    2.4.1 DARPA dataset
    2.4.2 KDD'99 dataset

3 Misuse detection
  3.1 Mining patterns of intrusions
    3.1.1 Overview of the framework
    3.1.2 Optimization for random forests
    3.1.3 Imbalanced intrusions
    3.1.4 Feature selection
  3.2 Experiments and results
    3.2.1 Dataset and preprocessing
    3.2.2 Performance comparison on balanced and imbalanced dataset
    3.2.3 Selection of important features
    3.2.4 Parameter optimization for random forests
    3.2.5 Distribution of error rates
    3.2.6 Speed performance of detection
    3.2.7 Evaluation and discussion
    3.2.8 Implementation
  3.3 Summary

4 Anomaly detection
  4.1 Detecting outliers
    4.1.1 Overview of the framework
    4.1.2 Mining patterns of network services
    4.1.3 Unsupervised outlier detection
  4.2 Experiments and results
    4.2.1 Dataset and preprocessing
    4.2.2 Evaluation and discussion
    4.2.3 Experiments on the detection performance over different datasets
    4.2.4 Experiment on the detection performance over minority intrusions
    4.2.5 Implementation
  4.3 Summary

5 Combination of misuse and anomaly detection
  5.1 Misuse detection versus anomaly detection
  5.2 Approaches to combine misuse and anomaly detection
  5.3 Architecture of the hybrid system
  5.4 Experiments and results
    5.4.1 Dataset and preprocessing
    5.4.2 Evaluation and discussion
    5.4.3 Implementation
  5.5 Summary

6 Conclusion and future work
  6.1 Conclusion
  6.2 Limitations and future work

Bibliography

List of Tables

2.1 Intrusions in the 1998 DARPA dataset
2.2 The features in the KDD'99 dataset
3.1 Numbering of the attack categories
3.2 Performance on the balanced dataset compared to the original dataset
3.3 Cost matrix
3.4 Performance comparison on the KDD'99 dataset
4.1 The oob error rates for parameter optimization in the anomaly detection experiments
4.2 The performance of each algorithm over the KDD'99 dataset
4.3 The optimal parameters of random forests
5.1 The oob error rates for parameter optimization in the hybrid approach experiment

List of Figures

2.1 An example of a decision tree
2.2 The training phase of ADAM
2.3 Discovering intrusions with ADAM
3.1 Architecture of the misuse based NIDS
3.2 Variable importance of the features in the misuse approach experiment
3.3 Performance with different values for parameter Mtry of random forests
3.4 Distribution of the oob error rate
3.5 Average oob error rate for different Mtry
3.6 Speed measurement of detection
4.1 The framework of the unsupervised anomaly NIDS
4.2 The outlier-ness of the 1% attack dataset
4.3 The ROC curve for the 1% attack dataset
4.4 The outlier-ness of the 2% attack dataset
4.5 The outlier-ness of the 5% attack dataset
4.6 The outlier-ness of the 10% attack dataset
4.7 The ROC curves for the different datasets
4.8 The outlier-ness of the minority attack dataset
4.9 The ROC curve for the minority attack dataset
5.1 Framework of anomaly detection followed by misuse detection
5.2 Framework of the parallel approach
5.3 Framework of misuse detection followed by anomaly detection
5.4 Architecture of the hybrid system
5.5 Variable importance of the features in the hybrid approach experiment
5.6 Outlier-ness of the anomaly test set
Chapter 1
Introduction
1.1 Motivation
Computer networks provide people with news, email, online shopping, and online banking. More and more sensitive information, such as credit card details and personal information, is stored on computer networks. With the tremendous growth of network-based services and sensitive information on networks, network security is more important than ever. Although a wide range of security technologies such as information encryption, access control, and intrusion prevention are used to protect network-based systems, many intrusions still go undetected. For example, firewalls cannot prevent internal attacks. According to the CSI/FBI Computer Crime and Security Survey, total losses for 2004 were $141,496,560 [1]. Moreover, most of the losses caused by intrusions are not reported. Intrusion Detection Systems (IDSs) can detect intrusions automatically by monitoring the activities of networks or systems, instead of relying on security experts to analyze those activities. Thus, intrusion detection systems play a vital role in network security.
Currently, many NIDSs (Network Intrusion Detection Systems) such as Snort [4] are rule-based systems, which employ misuse detection techniques and have limited extensibility for novel attacks. Their performance relies heavily on the rules identified by security experts. In rule-based systems, security experts analyze traffic data and develop rules to specify intrusions. However, the amount of network traffic is huge, and some intrusions are very difficult to specify using rules. Therefore, the process of encoding rules is expensive and slow. Another problem of rule-based systems is high maintenance cost. Security staff have to modify existing rules or deploy new rules manually using a specific rule-driven language, and if the rules are deployed in different kinds of systems, different rule-driven languages are needed. To overcome the limitations of rule-based systems, a number of IDSs employ data mining techniques. Data mining is the analysis of (often large) observational data sets to find patterns or models that are both understandable and useful to the data owner [18]. Data mining can efficiently extract patterns of intrusions for misuse detection, establish profiles of normal network activities for anomaly detection, and build classifiers to detect attacks, especially from vast amounts of audit data. Data mining-based systems are more flexible and easier to deploy: security experts only need to label the audit data to indicate intrusions instead of hand-coding rules for them. Over the past several years, a growing number of research projects have applied data mining to intrusion detection with different algorithms [19, 8, 6]. For instance, MADAM ID [19] and ADAM [8] employ an association rules algorithm.

There are two major intrusion detection techniques: misuse detection and anomaly detection. Misuse detection discovers attacks based on patterns extracted from known intrusions [9]. Anomaly detection identifies attacks based on significant deviations
from the established profiles of normal activities [16]. Misuse detection has a low false positive rate but cannot detect novel attacks. Anomaly detection can detect unknown attacks but usually has a high false positive rate. To combine the advantages of both misuse detection and anomaly detection, many hybrid approaches have been proposed [8, 33, 7]. The major challenge of a hybrid system is to build a framework that can effectively incorporate both anomaly and misuse detection.
1.2 Overview
To address the problems associated with existing approaches in network intrusion detection, this thesis proposes new systematic frameworks that apply the random forests algorithm in misuse detection, anomaly detection, and hybrid detection (a combination of misuse and anomaly detection).

The random forests algorithm is an ensemble classification and regression approach whose accuracy is unsurpassed among current data mining algorithms [12]. The random forests algorithm has been used extensively in different applications. For instance, it has been applied to prediction [17, 28], probability estimation [35], and pattern analysis in multimedia information retrieval and bioinformatics [36]. However, to the best of our knowledge, the random forests algorithm has not been applied to automatic intrusion detection.

Accuracy is critical to developing effective NIDSs, since a high false positive rate or a low detection rate will make an NIDS unusable. To improve detection performance, we also propose methods to address the issues of imbalanced intrusions and feature selection in the mining process, as discussed below.
One of the challenges in intrusion detection systems is feature selection. Many
algorithms are sensitive to the number of features. Hence, feature selection is essential for improving the detection rate. Moreover, the raw data of network traffic is usually audited in tcpdump format, which is not suitable for detection. IDSs must construct features from the raw data, and feature construction from tcpdump-format data involves a lot of computation. Thus, feature selection can help reduce the computational cost of feature construction by reducing the number of features. However, in many current data mining-based IDSs, feature selection is based on domain knowledge or intuition. We use the feature selection capability of the random forests algorithm, because the algorithm can estimate which features are important in the classification.
Another challenge of intrusion detection is imbalanced intrusions. Some intrusions such as Denial of Service (DoS) [25] generate many more connections than others (e.g., User to Root). Most data mining algorithms try to minimize the overall error rate, but this increases the error rate on minority intrusions. However, in real-world network environments, the minority attacks are more dangerous than the majority attacks. Therefore, we need to improve the detection performance for the minority intrusions.
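One simple family of sampling techniques for this problem can be sketched as follows: down-sample over-represented classes and over-sample (with replacement) under-represented ones until all classes have equal size. This is a generic illustration of the idea, with invented data, not the specific sampling scheme used later in the thesis.

```python
import random
from collections import Counter

def balance_by_sampling(rows, labels, seed=0):
    """Equalise class sizes: down-sample each majority class and
    over-sample (with replacement) each minority class to a common size."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    # target size = average class size
    target = int(sum(len(v) for v in by_class.values()) / len(by_class))
    out_rows, out_labels = [], []
    for y, group in by_class.items():
        if len(group) >= target:
            sample = rng.sample(group, target)                    # down-sample
        else:
            sample = [rng.choice(group) for _ in range(target)]   # over-sample
        out_rows.extend(sample)
        out_labels.extend([y] * target)
    return out_rows, out_labels

# Toy example: 90 DoS connections vs. 10 User-to-Root connections.
rows = [[i] for i in range(100)]
labels = ["dos"] * 90 + ["u2r"] * 10
balanced_rows, balanced_labels = balance_by_sampling(rows, labels)
```

After balancing, a classifier no longer minimizes overall error by simply ignoring the rare class.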
Anomaly detection is a critical issue in Network Intrusion Detection Systems (NIDSs). Many NIDSs employ misuse detection techniques, which have limited extensibility for novel attacks. To detect novel attacks, many anomaly detection systems have been developed. Most of them are based on supervised approaches [8, 26, 34]. For instance, ADAM [8] employs an association rules algorithm for intrusion detection. ADAM builds a profile of normal activities over attack-free training data, and then detects
attacks with the previously built profile. The problem with ADAM is its high dependency on training data for normal activities. However, attack-free training data is difficult to come by, since there is no guarantee that all attacks can be prevented in real-world networks. In fact, one of the most popular ways to undermine anomaly-based IDSs is to incorporate some intrusive activities into the training data [32]. An IDS trained on data containing intrusions will lose the ability to detect those kinds of intrusions. Another problem of supervised anomaly-based IDSs is a high false positive rate when the network environment or services change. Since training data only contains historical activities, the profile of normal activities can only include historical patterns of normal behavior. Therefore, new activities due to changes in the network environment or services will deviate from the previously built profile and be detected as attacks, increasing the number of false positives.
To overcome the limitations of supervised anomaly-based systems, a number of IDSs employ unsupervised approaches [16, 31, 21]. Unsupervised anomaly detection does not need attack-free training data. It detects attacks by determining unusual activities in the data under two assumptions [21]:

• The majority of activities are normal.

• Attacks statistically deviate from normal activities.

The unusual activities are outliers that are inconsistent with the remainder of the data set [10]. Thus, outlier detection techniques can be applied to unsupervised anomaly detection. Indeed, outlier detection has been used in a number of practical applications such as credit card fraud detection, voting irregularity analysis, and severe weather prediction [23].
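Under those two assumptions, even a very simple statistical score can separate outliers: measure how far each point lies from the bulk of the data. The toy sketch below uses plain distance-to-mean (not the proximity-based measure that random forests actually provide) purely to illustrate the principle, with invented data.

```python
def outlier_scores(points):
    """Score each point by its Euclidean distance from the sample mean,
    scaled by the mean distance. Because most points are normal
    (assumption 1) and attacks deviate statistically (assumption 2),
    points with large scores are outlier candidates."""
    dim = len(points[0])
    mean = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    dists = [sum((p[i] - mean[i]) ** 2 for i in range(dim)) ** 0.5
             for p in points]
    avg = sum(dists) / len(dists)
    return [d / avg for d in dists]

# Toy example: four clustered "normal" points and one far-away point.
points = [[0, 0], [1, 0], [0, 1], [1, 1], [10, 10]]
scores = outlier_scores(points)
```

The far-away point receives by far the largest score, so thresholding the score flags it as an outlier without any labeled training data.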
We propose an approach that uses the outlier detection technique provided by the random forests algorithm for anomaly intrusion detection. The main challenge of anomaly intrusion detection is to reduce false positives. The outlier detection technique is effective at reducing the false positive rate while maintaining a desirable detection rate.

In hybrid detection, we propose a framework to combine misuse and anomaly detection. The hybrid system therefore not only achieves the high performance of misuse detection, but can also detect novel intrusions.
1.3 Summary of contributions
In this thesis, we apply the random forests algorithm to network intrusion detection. We present approaches to employ and optimize the random forests algorithm in misuse detection, anomaly detection, and hybrid detection. The major contributions of the thesis are as follows:

• Propose new systematic frameworks that employ the random forests algorithm for network intrusion detection. To the best of our knowledge, the random forests algorithm has not been applied in NIDSs, especially for anomaly detection systems [37, 39, 38].

• Apply sampling techniques and a feature selection algorithm in misuse detection to improve the performance of the NIDS. The sampling techniques increase the detection rate of minority intrusions. The feature selection technique improves the overall detection performance [37].

• Employ a new service-based unsupervised outlier detection approach in the anomaly NIDS. The outlier function provided by the random forests algorithm is used
in anomaly detection. By building patterns of network services, the algorithm determines outliers relative to the built patterns. The proposed approach does not need attack-free training data, which is difficult to obtain in real-world network environments [39].

• Combine misuse detection and anomaly detection. Misuse detection has a high detection rate with a low false positive rate, but cannot detect novel intrusions. Anomaly detection can detect novel intrusions. Therefore, the combination of misuse and anomaly detection improves the overall performance of NIDSs [38].
1.4 Thesis organization
The thesis is organized as follows. In Chapter 2, we introduce intrusion detection, the random forests algorithm, and the datasets used in intrusion detection. We also discuss related work, especially data mining-based detection systems.

In Chapter 3, we describe in detail misuse detection using the random forests algorithm. We explain the approaches used to improve the detection performance of the misuse detection system, and we show the experimental results in that chapter.

In Chapter 4, we discuss the framework of the anomaly detection and show how to apply the random forests algorithm in unsupervised anomaly detection. The performance evaluations are also presented.

In Chapter 5, we propose a framework to combine misuse and anomaly detection. The architecture of the proposed hybrid system is explained in detail, and we also evaluate the hybrid system.
Finally, we summarize our work and outline our future research plans in Chapter 6. We also discuss the limitations of the presented approaches.
Chapter 2
Background and related work
2.1 Random forests
A decision tree has a root node connected by successive links to other nodes [13]. These nodes are similarly connected until reaching leaf nodes that have no further connected nodes. An example of a decision tree is shown in Figure 2.1.

Random forests [12] is an ensemble of un-pruned classification or regression trees. The random forests algorithm generates many classification trees. Each tree is constructed from a different bootstrap sample of the original data using a tree classification algorithm, in the following steps:

1. If the number of training cases is N, the algorithm samples N cases at random with replacement from the original data. The chosen cases are used to construct the tree.
2. If there are M features in the training set, the algorithm chooses m features
from them at random at each node. The value of m is held constant by a parameter of the algorithm. At a node, the algorithm evaluates each of the chosen features as a candidate split of the node, and the best feature is selected to split the node in the tree. The best feature is the one that makes the cases reaching the immediate descendant nodes as pure as possible. The process is repeated recursively for each node of the tree.
Figure 2.1: An example of a decision tree [13]
After the forest is formed, a new object that needs to be classified is put down each tree in the forest for classification. Each tree gives a vote that indicates the tree's decision about the class of the object. The forest chooses the class with the most votes for the object.
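The training and voting procedure above can be sketched in miniature. The code below builds a "forest" of one-level trees (decision stumps) with bootstrap sampling, a random feature subset of size m at the split, and Gini impurity as the purity measure, then classifies by majority vote. Real random forests grow full un-pruned trees, so this is only an illustrative toy with invented data.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a label multiset (0 = pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_stump(rows, labels, feature_ids):
    """Among m candidate features, find the (feature, threshold) split
    minimising the weighted Gini impurity of the two children."""
    best, best_score = None, float("inf")
    for f in feature_ids:
        for t in {r[f] for r in rows}:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best_score = score
                # majority class on each side becomes the stump's prediction
                best = (f, t,
                        Counter(left).most_common(1)[0][0] if left else labels[0],
                        Counter(right).most_common(1)[0][0] if right else labels[0])
    return best

def train_forest(rows, labels, n_trees=25, m=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample of N cases, drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_rows = [rows[i] for i in idx]
        boot_labels = [labels[i] for i in idx]
        # Step 2: random subset of m features considered at the split.
        feats = rng.sample(range(n_features), m)
        forest.append(best_stump(boot_rows, boot_labels, feats))
    return forest

def predict(forest, row):
    """Each stump votes; the forest returns the majority class."""
    votes = [(lo if row[f] <= t else hi) for f, t, lo, hi in forest]
    return Counter(votes).most_common(1)[0][0]

# Toy one-feature data: low values are normal, high values are attacks.
rows = [[0.0], [1.0], [8.0], [9.0]]
labels = ["normal", "normal", "attack", "attack"]
forest = train_forest(rows, labels, n_trees=25, m=1, seed=0)
```

Individual stumps trained on unlucky bootstrap samples can vote wrongly, but the majority vote across the ensemble is robust, which is the point of the method.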
The main features of the random forests algorithm are as follows [12]:

• Its accuracy is unsurpassed among current data mining algorithms.
• It runs efficiently on large data sets with many features, which makes it well suited for network intrusion detection: the volume of network traffic is huge and network activities are complex, so network traffic datasets are large and have many features.

• It can give estimates of which features are important.

• It has no nominal data problem and does not over-fit.

• It can handle unbalanced datasets.

• It provides an effective approach to estimating missing data and maintains accuracy when a large proportion of the data is missing.
• It can detect outliers using proximities between pairs of cases.
In the random forests algorithm, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test error. Since each tree is constructed from a bootstrap sample, approximately one-third of the cases are left out of each bootstrap sample and not used in training. These cases are called out-of-bag (oob) cases, and they are used to get a running unbiased estimate of the classification error as trees are added to the forest.
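The "one-third" figure comes from the probability that a given case is never drawn in N samples with replacement: (1 - 1/N)^N, which approaches e^-1 ≈ 0.368 for large N. A quick standalone simulation (not thesis code) confirms it:

```python
import random

def oob_fraction(n_cases, n_trees=2000, seed=1):
    """Average fraction of cases left out of a bootstrap sample of size
    n_cases drawn with replacement, i.e. the out-of-bag cases."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        in_bag = {rng.randrange(n_cases) for _ in range(n_cases)}
        total += (n_cases - len(in_bag)) / n_cases
    return total / n_trees

frac = oob_fraction(100)  # close to (1 - 1/100)**100 ≈ 0.366
```

Because each case is out-of-bag for roughly a third of the trees, averaging those trees' votes on it yields an error estimate computed entirely on data the voting trees never saw.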
2.2 Intrusion detection
An Intrusion Detection System (IDS) detects attacks by observing activities on a
variety of system and network sources [15]. There are two main types of intrusion
detection systems: host-based IDS and network-based IDS [8, 6]. Network Intrusion
Detection Systems (NIDSs) detect attacks by observing various network activities,
while Host-based Intrusion Detection Systems (HIDSs) detect intrusions in individual
hosts. An NIDS examines the output of a packet sniffer or network switch. A sniffer is
a program that reads raw packets off a local network segment. An NIDS can monitor
more targets on a network, and can detect some attacks that HIDSs miss. HIDSs do
not see packet headers, so they cannot detect some types of attacks. For example,
many IP-based denial of service (DoS) attacks can only be detected by NIDSs, since
NIDSs can look at the packet headers as they travel across networks. Moreover, NIDSs
do not rely on host operating systems as detection sources, but HIDSs require specific
operating systems to function properly. Some hybrid IDSs use both host-based and
network-based systems to detect intrusions [19].
The techniques used in intrusion detection can also be divided into two major
approaches: misuse detection and anomaly detection [8]. The following subsections
briefly explain the two approaches.
2.2.1 Misuse detection
Misuse detection identifies intrusions by searching for known patterns of attacks. The
current commercial NIDSs employ this strategy. A disadvantage of misuse detection
is that it cannot detect unknown attacks. Different techniques have been used for
misuse detection, such as expert systems, signature analysis, state-transition analysis,
and data mining.
The expert system uses a set of rules to describe intrusions [9]. Audit events
are translated into facts that carry their semantic significance in the expert system.
Then, an inference engine can draw conclusions using these rules and facts.
State transition analysis expresses attacks with a set of goals and transitions based
on state transition diagrams [9]. Any event that triggers an intrusion state will be
detected as an intrusion.
Signature analysis describes attacks using signatures that can be found in the audit
trail [9]. Any activity that matches a signature will be flagged as an attack.
In recent years, much data mining-based research has been proposed for
intrusion detection [9]. Data mining is an effective way to extract useful and previously
unnoticed models or patterns from large data sources. The models or patterns
can be represented in various forms, such as rules, decision trees, instance-based examples,
and neural nets. Many data mining algorithms have been employed in misuse
detection. For example, the association rules algorithm is used by MADAM ID (Mining
Audit Data for Automated Models for Intrusion Detection) [19], ADAM (Audit
Data Analysis and Mining) [8], and IDDM (Intrusion Detection Using Data Mining
Techniques) [6]. Decision trees and fuzzy association rules have also been employed in intrusion
detection [30, 24], and the neural network algorithm has been used to improve the performance
of IDSs [22].
2.2.2 Anomaly detection
Since misuse detection cannot detect unknown attacks, anomaly detection is used to
address this shortcoming. Various anomaly detection approaches have been proposed
and implemented.
Unsupervised anomaly detection in NIDSs, as discussed below, is a new research
area [21]. Eskin et al. [16] investigated three algorithms in unsupervised anomaly
detection: cluster-based estimation, k-nearest neighbor, and one-class SVM (Support
Vector Machine). Other researchers [31, 21] apply clustering approaches in unsupervised
NIDSs. We employ the outlier detection of random forests in unsupervised
anomaly detection.
Supervised anomaly detection has been studied extensively. Supervised anomaly
detection uses attack-free training data to build profiles of normal activities. After
that, it uses the deviation from the profiles to detect intrusions. ADAM [8] builds the
profile of normal behavior from attack-free training data and represents the profile
as a set of association rules. At run-time, ADAM detects suspicious connections
according to the profile. Other supervised approaches have also been applied to anomaly
detection, such as fuzzy data mining and genetic algorithms [26], neural networks
[11, 29], and SVM [34].
Statistical methods and expert systems are also applied in supervised anomaly
detection [9]. Statistical methods build profiles of normal user and system behavior
from a number of samples. Activities are then compared against the profiles, and
deviations are flagged as abnormal. Expert systems describe normal behavior of
users and systems by a set of rules, and then apply the rules to detect anomalous
behaviors.
2.2.3 Hybrid detection
A hybrid detection system combines misuse detection and anomaly detection. It can
detect both known and unknown intrusions.
The Next Generation Intrusion Detection Expert System (NIDES) developed by
SRI [7], is a hybrid intrusion detection system. NIDES performs real-time monitoring
of user activity on multiple target systems connected on a network. It consists of a
misuse detection component as well as an anomaly detection component. The rule-
based misuse component employs expert rules to define known intrusive activities.
The anomaly component is based on a statistical approach, and it flags activities as
attacks if they deviate significantly from the expected behaviors. By combining a
statistical component and an expert system component, NIDES increases the chance
of detecting intrusions that either component alone might miss.
In our proposed hybrid system, the misuse component uses random forests for
classification in intrusion detection. The anomaly component is based on the outlier
detection provided by random forests.
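As a rough illustration of proximity-based outlier scoring: Breiman defines the proximity of two cases as the fraction of trees in which they land in the same leaf, and the raw outlier score of a case as the number of cases divided by the sum of squared proximities to the other cases of its class. The toy proximity matrix below is fabricated, and the normalization step is omitted:

```python
def outlier_scores(prox, labels):
    """Raw outlier score per case: N / sum of squared proximities between
    the case and the other cases of the same class."""
    n = len(labels)
    scores = []
    for i in range(n):
        s = sum(prox[i][j] ** 2 for j in range(n) if j != i and labels[j] == labels[i])
        scores.append(n / s if s else float("inf"))
    return scores

# Toy 4-case proximity matrix (fabricated); case 3 is only weakly
# connected to the rest of its class, so it should stand out
prox = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.1],
    [0.8, 0.7, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
labels = ["normal", "normal", "normal", "normal"]
scores = outlier_scores(prox, labels)
print(max(range(4), key=lambda i: scores[i]))  # case 3 scores highest
```

In a real forest the proximities come from running all training cases down every tree, not from a hand-written matrix.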
2.3 Data mining based detection
2.3.1 ADAM
ADAM (Audit Data Analysis and Mining) [8] is one of the most widely known
projects in the field. It is an on-line network-based IDS. ADAM can detect known
attacks as well as unknown attacks.
ADAM uses association rules in detection. Association rule mining, one of the classic data mining
algorithms, is easy to understand. It searches for all possible frequent associations
among the given set of features, although it usually also generates many useless rules that cannot
effectively describe user and system activities. The goal of association rules is to
gather necessary knowledge about the nature of the audit data. An association rule
is expressed as:
X ⇒ Y [s, c]
• X and Y are sets of attribute-values
• X ∩ Y = ∅
• s (support): the percentage of dataset records that satisfy the conjunction of X and
Y.
• c (confidence): the conditional probability that a record satisfies Y, provided it
satisfies X.
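Support and confidence can be computed directly from their definitions; the records and attribute values below are invented purely for illustration:

```python
def support_confidence(records, X, Y):
    """Support: fraction of records satisfying both X and Y.
    Confidence: fraction of records satisfying X that also satisfy Y.
    X and Y are sets of (attribute, value) pairs; records are dicts."""
    holds = lambda r, items: all(r.get(a) == v for a, v in items)
    n_xy = sum(1 for r in records if holds(r, X) and holds(r, Y))
    n_x = sum(1 for r in records if holds(r, X))
    return n_xy / len(records), (n_xy / n_x if n_x else 0.0)

# Fabricated connection records for illustration
records = [
    {"service": "http", "flag": "SF"},
    {"service": "http", "flag": "S0"},
    {"service": "ftp",  "flag": "SF"},
    {"service": "http", "flag": "SF"},
]
s, c = support_confidence(records, {("service", "http")}, {("flag", "SF")})
print(s, c)  # support 0.5, confidence 2/3
```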
ADAM does not use the packet payload; it only uses the packet header. ADAM uses
TCP connections as the basic item-set. Connections are obtained from the raw packet
data of an audit trail. The item-set is defined as a 6-tuple:
R(Ts; Src:IP; Src:Port; Dst:IP; Dst:Port; FLAG)
• Ts: the beginning time of a connection
• Src:IP: source IP
• Src:Port: source port
• Dst:IP: destination IP
• Dst:Port: destination port
• FLAG: status of a TCP connection
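The item-set maps naturally onto a record type; the Python field names and sample values below are our own illustration, not part of ADAM:

```python
from collections import namedtuple

# The 6-tuple item-set modeled as a named tuple (field names and values
# are hypothetical, chosen for readability)
Connection = namedtuple("Connection",
                        ["ts", "src_ip", "src_port", "dst_ip", "dst_port", "flag"])

conn = Connection(ts=1134500000.0, src_ip="192.168.0.5", src_port=42311,
                  dst_ip="10.0.0.9", dst_port=80, flag="SF")
print(conn.dst_port)  # 80
```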
The framework of ADAM has two phases: a training phase and an on-line phase. In
the training phase, as shown in Figure 2.2, the attack-free training data is fed to a
module that performs off-line association rule discovery. The output
of this module is a rule-based profile of normal activities. After that, the produced
profile is input to another module called “on-line single level and domain-level
mining”. This module runs a dynamic on-line algorithm for association rules.
The training data containing attacks is fed into the module, which then
outputs suspicious hot items. Along with feature selection, the suspicious hot items
are labeled as false alarms or attacks. The labeled data is fed into the classifier builder
to train the classifier.
In the on-line phase, shown in Figure 2.3, the test data is fed into
the system. With the built profile, the on-line single level and domain-level mining
module can find suspicious hot items. These suspicious items are classified as false
alarms, attacks, or unknown attacks by the trained classifier. The unknown attacks
are the suspicious items that cannot be classified as false alarms or known attacks.
Figure 2.2: The training phase of ADAM [8]
Figure 2.3: Discovering intrusions with ADAM [8]
There are some issues that need to be solved in ADAM:
• Threshold tuning. It is important to obtain good thresholds for declaring a
connection suspicious.
• Profile building.
• Dependency on training data. Obtaining training data is not easy.
Our hybrid system has two phases (an on-line phase and an off-line phase), similar to ADAM.
However, our system does not need attack-free data to detect novel intrusions, thanks to
the outlier detection. Attack-free data is critical for ADAM. Because of the high complexity of
the outlier detection, our system detects anomalies in the off-line phase, whereas ADAM can
detect anomalies in the on-line phase. Besides, we use the random forests algorithm instead
of the association rules used by ADAM. Random forests are more accurate and efficient
on large datasets than association rules, although association rules are more understandable.
2.3.2 MADAM ID
MADAM ID (Mining Audit Data for Automated Models for Intrusion Detection)
[19] is one of the best known data mining projects in intrusion detection. It uses
data mining algorithms to compute activity patterns from system audit data and
extracts predictive features from the patterns. It is an off-line IDS that produces anomaly
and misuse intrusion models. Association rules and frequent episodes are applied
in MADAM ID. Association rules are used to find intra-audit record patterns, and
the frequent episodes algorithm is used to find inter-audit record patterns. However,
MADAM ID relies heavily on intrusion detection expert knowledge. Expert knowledge
is used not only to prune the number of rules produced by association and frequent
episode mining, but also to construct features.
Compared to MADAM ID, our system can detect known intrusions in real time,
while MADAM ID can only detect intrusions in off-line mode. We apply the random
forests algorithm in our system instead of the association rules and frequent episodes
algorithms. Although MADAM ID uses data mining techniques, it still relies heavily
on expert knowledge.
2.3.3 JAM
JAM (Java Agents for Meta-learning) [20] is a distributed, scalable, and portable
agent-based data mining system. The main target of JAM is fraud and intrusion
detection in financial information systems. Meta-learning is one of its key techniques,
used to combine and integrate separately learned classifiers or models. Hence, the distributed
agents can exchange models.
Compared to JAM, our system is a centralized system. The system builds patterns
and detects intrusions in a single central location. Thus, there is no need to maintain
separate agents scattered on computers at each location. However, sending all data
to a single place will increase the volume of network traffic. The agents in JAM can
process the data locally and then exchange models.
2.4 Datasets
2.4.1 DARPA dataset
Under the sponsorship of the Defense Advanced Research Projects Agency (DARPA) and
the Air Force Research Laboratory (AFRL), MIT Lincoln Laboratory has collected and
distributed datasets for the evaluation of computer network intrusion detection
systems [25, 3]. The DARPA dataset is the most popular dataset used to test and
evaluate a large number of IDSs. The data can be used for both host-based and
network-based systems. An environment was set up to simulate a typical U.S. Air
Force LAN, and the raw TCP/IP dump data was acquired from this environment.
The DARPA dataset includes three sets: the 1998 DARPA Intrusion Detection Evaluation
Data Sets, the 1999 DARPA Intrusion Detection Evaluation Data Sets, and the 2000
DARPA Intrusion Detection Scenario Specific Data Sets. The 1998 datasets contain
seven weeks of training data and two weeks of test data. The 1999 datasets contain
three weeks of training data and two weeks of test data. The 2000 datasets contain
one day of data to address specific scenarios.
The attacks in the datasets fall into four categories:
• DoS: Denial of Service, e.g., syn flood.
• R2L: Unauthorized access from a remote machine, e.g., guessing password.
Table 2.1: Intrusions in the 1998 DARPA dataset [27]
Attack Class        OS: Solaris      OS: SunOS        OS: Linux
Denial of Service   Apache2          Apache2          Apache2
                    Back             Back             Back
                    Mail bomb        Mail bomb        Mail bomb
                    Neptune          Neptune          Neptune
                    Ping of death    Ping of death    Ping of death
                    Process table    Process table    Process table
                    Smurf            Smurf            Smurf
                    Syslogd          Syslogd          Syslogd
                    UDP storm        UDP storm        UDP storm
Remote to User      Dictionary       Dictionary       Dictionary
                    Ftp-write        Ftp-write        Ftp-write
                    Guest            Guest            Guest
                    Phf              Phf              Imap
                    Xlock            Xlock            Named
                    Xnsnoop          Xnsnoop          Phf
                                                      Sendmail
                                                      Xlock
                                                      Xnsnoop
User to Super-user  Eject            Load module      Perl
                    Ffbconfig        Ps               Xterm
                    Fdformat
Probing             Ps               Ip sweep         Ip sweep
                    Ip sweep         Mscan            Mscan
                    Mscan            Nmap             Nmap
                    Nmap             Saint            Saint
                    Saint            Satan            Satan
                    Satan
• U2R: Unauthorized access to root privileges, e.g., various “buffer overflow”
attacks.
• Probing: surveillance and other probing, e.g., port scanning.
As an example, Table 2.1 lists 32 different intrusions in the 1998 datasets [27].
2.4.2 KDD’99 dataset
The KDD’99 dataset is a subset of the DARPA dataset prepared by Sal Stolfo and Wenke
Lee [14]. The data was preprocessed by extracting 41 features from the tcpdump data
in the 1998 DARPA datasets. The KDD’99 dataset can be used without further time-consuming
preprocessing, and IDSs can be compared with each other by working on
this dataset. The 41 features are listed in Table 2.2 [27].
Table 2.2: The features in the KDD’99 dataset [27]

#   Feature name                 Description
1   duration                     Length (# of seconds) of the connection.
2   protocol_type                Type of the protocol, e.g. tcp, udp, etc.
3   service                      Network service on the destination, e.g., http, telnet, etc.
4   flag                         Normal or error status of the connection.
5   src_bytes                    # of data bytes from source to destination.
6   dst_bytes                    # of data bytes from destination to source.
7   land                         1 if connection is from/to the same host/port; 0 otherwise.
8   wrong_fragment               # of wrong fragments.
9   urgent                       # of urgent packets.
10  hot                          # of hot indicators.
11  num_failed_logins            # of failed login attempts.
12  logged_in                    1 if successfully logged in; 0 otherwise.
13  num_compromised              # of compromised conditions.
14  root_shell                   1 if root shell is obtained; 0 otherwise.
15  su_attempted                 1 if su root command attempted; 0 otherwise.
16  num_root                     # of root accesses.
17  num_file_creations           # of file creation operations.
18  num_shells                   # of shell prompts.
19  num_access_files             # of operations on access control files.
20  num_outbound_cmds            # of outbound commands in an ftp session.
21  is_host_login                1 if the login belongs to the hot list; 0 otherwise.
22  is_guest_login               1 if the login is a guest login; 0 otherwise.
23  count                        # of connections to the same host as the current one during the past two seconds.
24  srv_count                    # of connections to the same service as the current connection during the past two seconds.
25  serror_rate                  % of connections that have SYN errors to the same host during the past two seconds.
26  srv_serror_rate              % of connections that have SYN errors to the same service during the past two seconds.
27  rerror_rate                  % of connections that have REJ errors to the same host during the past two seconds.
28  srv_rerror_rate              % of connections that have REJ errors to the same service during the past two seconds.
29  same_srv_rate                % of connections to the same service during the past two seconds.
30  diff_srv_rate                % of connections to different services during the past two seconds.
31  srv_diff_host_rate           % of connections to different hosts during the past two seconds.
32  dst_host_count               # of connections to the same host as the current connection in the past 100 connections.
33  dst_host_srv_count           # of connections to the same service as the current connection in the past 100 connections.
34  dst_host_same_srv_rate       % of connections to the same service in the past 100 connections.
35  dst_host_diff_srv_rate       % of connections to different services in the past 100 connections.
36  dst_host_same_src_port_rate  % of connections from the same source port in the past 100 connections.
37  dst_host_srv_diff_host_rate  % of connections to different hosts in the past 100 connections.
38  dst_host_serror_rate         % of connections that have SYN errors to the same host in the past 100 connections.
39  dst_host_srv_serror_rate     % of connections that have SYN errors to the same service in the past 100 connections.
40  dst_host_rerror_rate         % of connections that have REJ errors to the same host in the past 100 connections.
41  dst_host_srv_rerror_rate     % of connections that have REJ errors to the same service in the past 100 connections.
The KDD’99 dataset includes the full training set, the 10% training set, and
the test set. The full training set has 4,898,431 connections. In the TCP protocol, a
connection is established before two hosts on a network can communicate with each
other; after the data transfer finishes, the connection is closed. Thus, a TCP connection
consists of multiple packets. For the UDP protocol, each connectionless packet is also treated
as a connection. The 10% training set has 494,020 connections. It contains all the minority
classes (U2R and R2L) of the full training set and part of the majority classes (Normal, DoS,
and Probing). The test set contains 311,029 connections.
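Since each record is one connection ending with a label, class proportions can be checked with a simple tally. A sketch over a few fabricated rows laid out like KDD’99 records (41 comma-separated features plus a trailing label; the feature values are invented for illustration):

```python
import csv
import io
from collections import Counter

row = "0," * 35  # 35 zero-valued filler features (fabricated)
# 6 leading features + 35 fillers + label = 41 features and a label per record
sample = io.StringIO(
    "0,tcp,http,SF,215,45076," + row + "normal.\n" +
    "0,icmp,ecr_i,SF,1032,0," + row + "smurf.\n" +
    "0,icmp,ecr_i,SF,1032,0," + row + "smurf.\n"
)
labels = Counter(r[-1] for r in csv.reader(sample))
print(labels)  # Counter({'smurf.': 2, 'normal.': 1})
```

With the real files, the same tally over the 10% training set would reveal the class imbalance discussed in Chapter 3.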
Chapter 3
Misuse detection
In this chapter, we describe our approach to applying the random forests algorithm in
misuse detection. We first describe the architecture of the proposed misuse-based
NIDS (Network Intrusion Detection System). Then, we illustrate our solutions for
building detection patterns with high performance for intrusion detection. Finally,
we discuss our experimental results.
3.1 Mining patterns of intrusions
In this section, we first describe the architecture of the NIDS, and then illustrate our
solutions for imbalanced intrusions, feature selection, and optimization of the random
forests algorithm.
3.1.1 Overview of the framework
The proposed framework applies data mining techniques to build patterns for network
intrusion detection. The architecture of the proposed NIDS is shown in Figure 3.1.
There are two phases in the framework: an off-line phase and an on-line phase. The
system builds patterns of intrusions in the off-line phase and detects intrusions in the
on-line phase.
In the off-line phase, labeled training data is fed into the off-line preprocessor
module. After preprocessing, feature vectors are stored in a database. The Pattern
Builder module retrieves the training data from the database and builds the patterns
of intrusions. The Pattern Builder module employs the feature selection algorithm,
handles imbalanced intrusions, and builds the patterns with the random forests
algorithm using optimal parameters. After mining the patterns of intrusions, the patterns
are deployed to the Detector module.
Figure 3.1: Architecture of the misuse-based NIDS
In the on-line phase, sensors capture packets from network traffic. A sensor is
installed on each network segment and can capture all traffic on that segment.
The features for each connection are constructed by
the on-line preprocessors from the captured network traffic. The connections are
stored in the database and can be retrieved by the Detector module. There,
the connections are classified as intrusions or normal traffic using
the patterns built in the off-line phase. Finally, the system raises an alert when it
detects any intrusion.
3.1.2 Optimization for random forests
The error rate of a forest depends on the correlation between any two trees and
the strength of each tree in the forest. Increasing the correlation increases the error
rate of the forest. The strength of a tree is determined by the error rate of the
tree: increasing the strength decreases the error rate of the forest. When the forest is
grown, random features are selected out of all the features in the training
data, and the best split on these random features is used to split each node of the tree.
The number of random features (Mtry) is held constant. Reducing (increasing) Mtry
reduces (increases) both the correlation and the strength. The number of features
employed in splitting each node is thus the primary tuning parameter.
To improve the performance of random forests, this parameter should be optimized.
We use the training data to find the optimal value of the parameter Mtry; the
minimum error rate corresponds to the optimal value. Therefore, we use different
values of Mtry to build forests and evaluate their error rates. Then,
we select the value corresponding to the minimum error rate to build the pattern.
There are two ways to evaluate the error rate. One is to split the dataset into
a training part and a test part: we can employ the training part to build the forest,
and then use the test part to calculate the error rate. The other way is to use the oob
(out of bag) error estimate. Because the random forests algorithm calculates the oob
error during the training phase, we do not need to split the training data. We choose
the oob error estimate, since it is more effective to learn from the whole training
dataset.
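The Mtry search is therefore a one-dimensional sweep: build a forest per candidate value and keep the value with the lowest oob error. A sketch with a hypothetical oob_error callback; the error figures below are invented stand-ins for real training runs:

```python
def choose_mtry(oob_error, candidates):
    """Return the Mtry candidate with the lowest oob error.
    `oob_error(mtry)` is assumed to train a forest with that Mtry on the
    training data and return its oob error estimate."""
    return min(candidates, key=oob_error)

# Hypothetical oob errors, standing in for forests trained on real data
fake_errors = {2: 0.051, 4: 0.032, 6: 0.028, 8: 0.030, 12: 0.041}
best = choose_mtry(fake_errors.get, [2, 4, 6, 8, 12])
print(best)  # 6: the candidate with the minimum oob error
```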
3.1.3 Imbalanced intrusions
Intrusions are imbalanced; in other words, some intrusions produce many more connections
than others. The random forests algorithm tries to minimize the overall error
rate by lowering the error rate on majority classes (e.g., majority intrusions) while
increasing the error rate of minority classes (e.g., minority intrusions) [12]. However,
the damage cost of the minority intrusions is much higher than the cost of the majority
intrusions. Thus, for imbalanced intrusions, we need to improve the detection
rate of the minority intrusions while maintaining a reasonable overall detection rate.
There are two solutions to deal with the imbalanced intrusions problem. One is
to set different weights for different intrusions, assigning the minority intrusions
higher weights. Although the overall error rate goes up, the error rate of the minority
intrusions is reduced. The random forests algorithm supports this method through
its weight parameters. The other method is to use sampling techniques:
over-sampling the minority intrusions and down-sampling the majority intrusions.
Since network traffic is huge, down-sampling the majority classes (e.g., normal
traffic and Denial of Service) can speed up building the patterns significantly
by reducing the size of the datasets. Over-sampling minority intrusions (e.g., User
to Root and Remote to Local) can raise their weight and thus decrease their error rate.
Therefore, we combine over-sampling and down-sampling in our NIDS to solve the
imbalanced intrusions problem, instead of using the first solution.
3.1.4 Feature selection
The raw audit data of network traffic is not suitable for intrusion detection. Hence,
feature construction is needed to extract a set of features which can detect intrusions
effectively. Usually, the construction is based on each connection. There are three
types of features for network connection records used in NIDSs [19]:
• Intrinsic features. Intrinsic features describe the basic information of connections,
such as the duration, service, source and destination host, port, and flag.
• Traffic features. These features are based on statistics, such as the number of
connections to the same host as the current connection within a time window.
• Content features. These features are constructed from the payload of traffic
packets instead of the packet headers, such as the number of failed logins, whether
logged in as root, and the number of accesses to control files.
Feature selection is one of the critical steps in building NIDSs. The number of
intrinsic features is fixed, since it depends on the information in the packet
header. However, traffic features and content features can be constructed using different
methods. Hundreds of traffic and content features can be designed, while only
some of them are essential for separating intrusions from normal traffic. Unessential
features not only increase computational cost, but also increase the error rate, especially
for algorithms that are sensitive to the number of features. “Deciding
upon the right set of features is difficult and time consuming” [20]. Currently, features
are designed by security experts. Thus, we need an approach that can automate
feature selection. We employ the variable importance calculated by the random forests
algorithm for feature selection. The features with a higher value of variable importance
have more effect on classification. Therefore, we choose the features with the highest
values of variable importance in the NIDS.
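The idea behind variable importance can be sketched as a permutation test: score a model, randomly shuffle one feature's values across the records, and measure the drop in accuracy (the real algorithm does this per tree on the oob cases and averages the results; here a single hypothetical classifier stands in, with invented feature names and data):

```python
import random

def permutation_importance(predict, X, y, feature, rng):
    """Raw importance of `feature`: accuracy before minus accuracy after
    randomly permuting that feature's values across the records."""
    acc = sum(predict(r) == t for r, t in zip(X, y)) / len(y)
    shuffled = [r[feature] for r in X]
    rng.shuffle(shuffled)
    X_perm = [dict(r, **{feature: v}) for r, v in zip(X, shuffled)]
    acc_perm = sum(predict(r) == t for r, t in zip(X_perm, y)) / len(y)
    return acc - acc_perm

# Hypothetical classifier that only looks at src_bytes
predict = lambda r: "attack" if r["src_bytes"] > 1000 else "normal"
X = [{"src_bytes": v, "duration": 0} for v in (10, 2000, 5, 3000)]
y = ["normal", "attack", "normal", "attack"]
print(permutation_importance(predict, X, y, "duration", random.Random(0)))
# 0.0: shuffling a feature the model ignores costs no accuracy
```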
3.2 Experiments and results
In this section, we summarize our experimental results for building intrusion detection
patterns over the KDD’99 dataset. We first describe the experiments using sampling
techniques for imbalanced intrusions and the random forests algorithm to select
features. Then, we present the experiments on parameter optimization for the random
forests algorithm, the distribution of error rates, and the speed performance of detection.
Finally, we evaluate our approach and compare our results with the best result of the
KDD’99 contest [14].
3.2.1 Dataset and preprocessing
The KDD’99 dataset can be used without further time-consuming preprocessing, and
different NIDSs can be compared with each other by working on the same dataset. Therefore,
we carry out our experiments on the KDD’99 dataset and compare our results
with the best result of the KDD’99 contest.
The 10% training set of the KDD’99 contains all the minority classes, such as U2R
(User to Root) and R2L (Remote to Local), and part of the majority classes of the
full training set. This is effectively a down-sampling of the majority classes such as Normal,
Table 3.1: Numbering of the attack categories [14]
0  Normal
1  Probe
2  DoS
3  U2R
4  R2L
DoS (Denial of Service), and Probing. Hence, we use only the 10% training dataset
in our experiments. The task of the KDD’99 contest was to build a classifier capable
of distinguishing between four kinds of intrusions and normal traffic, numbered as one
of five classes (see Table 3.1).
3.2.2 Performance comparison on balanced and imbalanced datasets
The original dataset (the 10% training set) is imbalanced (e.g., DoS has 391,458
connections but U2R has only 52 connections). To make a balanced training set, we
down-sample the Normal and DoS classes by randomly selecting 10% of the connections
belonging to Normal and DoS from the original dataset. We also over-sample U2R
and R2L by replicating their connections. The resulting balanced training set, with 60,620
connections, is much smaller than the original one.
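The recipe above (keep a random 10% of the Normal and DoS records, replicate the U2R and R2L records) can be sketched as follows; the replication factor of 20 is a hypothetical choice, not a figure from the thesis:

```python
import random

def rebalance(records, down_classes, up_classes, down_frac, up_factor, rng):
    """Down-sample majority classes, keeping each record with probability
    `down_frac`; over-sample minority classes by replicating each record
    `up_factor` times. `records` is a list of (features, label) pairs."""
    out = []
    for rec in records:
        _, label = rec
        if label in down_classes:
            if rng.random() < down_frac:
                out.append(rec)
        elif label in up_classes:
            out.extend([rec] * up_factor)
        else:
            out.append(rec)
    return out

rng = random.Random(0)
data = [({}, "DoS")] * 1000 + [({}, "U2R")] * 5   # fabricated toy data
balanced = rebalance(data, {"Normal", "DoS"}, {"U2R", "R2L"}, 0.10, 20, rng)
dos = sum(1 for _, lbl in balanced if lbl == "DoS")
u2r = sum(1 for _, lbl in balanced if lbl == "U2R")
print(dos, u2r)  # roughly 100 DoS records kept; exactly 100 U2R copies
```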
The first experiment compares the detection performance of the patterns built on the
original training set with those built on the balanced training set obtained by sampling.
The experiment is carried out using the default parameter values of the
random forests algorithm in WEKA (Waikato Environment for Knowledge Analysis)
[5]: 66% of the samples as training data, 34% of the samples as test data, 10 trees in the forest,
and 6 random features to split the nodes. The main objective of the experiment is to
Table 3.2: Performance on the balanced dataset compared to the original dataset

Performance                      Original dataset   Balanced dataset
Overall error rate               1.92%              0.05%
Time to build pattern            1975 seconds       65 seconds
True positive rate (Class 0)     0.948              0.999
True positive rate (Class 1)     0.989              0.994
True positive rate (Class 2)     1                  1
True positive rate (Class 3)     0.862              1
True positive rate (Class 4)     0.83               1
False positive rate (Class 0)    0.011              0
False positive rate (Class 1)    0                  0
False positive rate (Class 2)    0                  0
False positive rate (Class 3)    0                  0
False positive rate (Class 4)    0.01               0
compare the performance differences between the balanced and the original datasets,
not to compare the effect of the parameters. As a result, for the sake of convenience,
we just use the default values of the parameters for both datasets. Table 3.2
lists the overall error rate for classification, the time to build the pattern, the
true positive rate for all classes, and the false positive rate for all classes. As
shown in the table, the sampling techniques can improve the performance, especially
the detection rate (true positive rate) of the minority classes (Class 3 and Class 4),
and can reduce the time to build the patterns dramatically.
3.2.3 Selection of important features
The second experiment is to select the most important features. There are 41 features
in the KDD'99 dataset, numbered from 1 to 41. They cover all three types of features
in NIDSs: intrinsic features, traffic features, and content features. We employ the
feature selection algorithm supported by the random forests algorithm to calculate
the value of variable importance. To estimate the importance of variable m, the
number of votes for the correct class is counted using the oob cases in every tree.
Then, the number of correct votes is counted again after randomly permuting
the values of variable m in the oob cases. The average of the margin between these
two numbers over all the trees in the forest is the raw importance score for variable
m. The raw score is divided by its standard error to get a z-score, which is the value
of variable importance for variable m. Figure 3.2 on the next page plots the values
of variable importance for all five categories, sorted in decreasing order. The
features are listed in Table 2.2 on page 22. The figure shows that the variable
importance values of the last 3 features (Features 7, 20, and 21) are much less than
the other values. Therefore, we select the remaining 38 most important features to build
the patterns for intrusion detection.
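The z-score computation just described can be sketched in plain Java. The per-tree vote counts below are hypothetical, and the helper names are ours, not the thesis code; the raw score is the mean margin over trees, and the z-score divides it by the standard error of that mean.

```java
public class VariableImportance {

    // Raw importance: mean over trees of (oob correct votes before permuting
    // feature m) minus (oob correct votes after permuting feature m).
    public static double rawImportance(int[] before, int[] after) {
        double sum = 0;
        for (int t = 0; t < before.length; t++) sum += before[t] - after[t];
        return sum / before.length;
    }

    // z-score: raw importance divided by the standard error of the mean margin.
    public static double zScore(int[] before, int[] after) {
        int n = before.length;
        double mean = rawImportance(before, after);
        double var = 0;
        for (int t = 0; t < n; t++) {
            double d = before[t] - after[t] - mean;
            var += d * d;
        }
        double se = Math.sqrt(var / (n - 1)) / Math.sqrt(n); // standard error of the mean
        return mean / se;
    }

    public static void main(String[] args) {
        int[] before = {50, 52, 51, 49};  // hypothetical oob correct-vote counts per tree
        int[] after  = {40, 41, 42, 39};  // same counts after permuting one feature
        System.out.printf("raw=%.2f z=%.2f%n",
                rawImportance(before, after), zScore(before, after));
    }
}
```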
Feature 3 (service type, such as http, telnet, and ftp) is the most important feature
for detecting intrusions. It means that the intrusions are sensitive to service type.
Feature 7 (land) indicates whether a connection is from/to the same host. According
to domain knowledge, it is the most discriminating feature for land attacks. However,
land attacks belong to DoS and have much fewer connections than other types of DoS.
After down-sampling the DoS attacks, the land attacks are almost excluded from the
balanced dataset. Therefore, Feature 7 is not important for improving the detection
rate of DoS attacks. Feature 20 (the number of outbound commands in an FTP session)
and Feature 21 (hot login, indicating whether it is a hot login) do not show any
variation for intrusion detection in the training set.
The above analysis suggests that feature selection can help choose features to
detect intrusions without special domain knowledge. However, the method has a high
dependence on training sets.
[Plot: variable importance of each feature, horizontal axis from about -10 to 20]

Figure 3.2: Variable importance of the features in the misuse approach experiment
3.2.4 Parameter optimization for random forests
To improve the detection rate, we optimize the number of random features (Mtry).
We build the forest with different values of Mtry (5, 10, 15, 20, 25, 30, 35, and 38)
over the balanced training set, then plot the oob error rate and the time to build
the pattern corresponding to each value of Mtry. As Figure 3.3 shows, the oob error
rate reaches its minimum when Mtry is 15, 25, or 30. Besides, increasing Mtry increases
the time to build the pattern. Thus, we choose 15 as the optimal value, which reaches
the minimum oob error rate and costs the least time among these three values.
[Plot: oob error rate (left axis) and time to build the pattern in seconds (right axis) against Mtry]

Figure 3.3: Performance with different values for parameter Mtry of random forests
3.2.5 Distribution of error rates
To build patterns of intrusions, the random forests algorithm samples cases at random
with replacement. The features to split the nodes of trees are also selected at random.
Thus, the oob error rate is different at each run, even with the same value of Mtry.
For this reason, we carry out an experiment to analyze the distribution of the oob
error rate.
Listing 3.1: The pseudo-code of the program for the experiment on the distribution
of the error rates

Get the filename of the dataset
Construct instances from the file
Set labels for instances
FOR each Mtry in (5, 10, 15, 20, 25, 30, 35, 38)
    FOR each seed in (1 .. 20)
        Set the number of trees, seed, and Mtry for the random forests module
        Build pattern using the instances
        Calculate the oob error rate
    END FOR
END FOR
[Plot: oob error rate against run number (1 to 20), one curve per Mtry value (5, 10, 15, 20, 25, 30, 35, 38)]
Figure 3.4: Distribution of the oob error rate
We developed a Java program for this experiment. The pseudo-code of the program
is shown in Listing 3.1. The random function in Java only generates pseudo-random
numbers: with the same seed, we will get the same result for each run. Therefore,
we need to set a different seed for the random function at each run.
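The seeding issue can be seen directly in plain Java: the same seed always reproduces the same draw, so independent runs must each get their own seed. The class and method names here are illustrative.

```java
import java.util.Random;

public class SeedDemo {

    // Draw one pseudo-random number from a generator with the given seed.
    public static int firstDraw(long seed) {
        return new Random(seed).nextInt(100); // same seed -> same draw, every time
    }

    public static void main(String[] args) {
        // Identical seeds give identical results, so a run would repeat itself;
        // distinct seeds (e.g., 1 .. 20 in Listing 3.1) give distinct runs.
        System.out.println(firstDraw(1) == firstDraw(1)); // true
    }
}
```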
In this experiment, we use the balanced dataset as training data and set the
number of trees to 50. The experimental result is shown in Figure 3.4 on the previous
page. The figure shows that the oob error rate is different at each run, but the change
is not significant. For example, the error rate stays low when the value of Mtry is 15.
Figure 3.5 shows the average of the oob error rates. The average of the error rates
reaches its minimum when the value of Mtry is 15.
[Plot: average oob error rate against Mtry (5 to 38)]
Figure 3.5: Average oob error rate for different Mtry
3.2.6 Speed performance of detection
Since network traffic is huge, high-speed detection is important for NIDSs.
A large volume of network traffic may overwhelm a NIDS with low processing speed.
The detection speed also determines the deployment of NIDSs. The proposed NIDS
is a centralized system: the captured data comes from sensors installed in different
network segments, so the number of sensors the system can serve depends on the speed
performance of the NIDS.
We carry out an experiment to measure the speed performance of detection. We
developed a Java program for this experiment. The pseudo-code of the program is
shown in Listing 3.2.
Listing 3.2: The pseudo-code of the speed measurement program

Get the filename of the dataset
Construct instances from the file
Set labels for instances
FOR each number of trees in (10, 20, 30, 40, 50)
    FOR each Mtry in (5, 10, 15, 20, 25, 30, 35)
        Set the number of trees and Mtry for the random forests module
        Build patterns using the instances
        Get the starting time of the classification
        FOR each instance
            Classify the instance using the built patterns
        END FOR
        Get the ending time of the classification
        Total time = end time - start time
        Time to process each connection = total time / the number of instances
    END FOR
END FOR
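The timing part of Listing 3.2 can be sketched in self-contained Java. The built forest is replaced here by a trivial stand-in classifier, since the point is only the per-connection timing arithmetic:

```java
import java.util.Random;

public class SpeedMeasure {

    // Stand-in for the built random-forest patterns: any per-connection classifier.
    public interface Classifier { int classify(double[] connection); }

    // Returns the average classification time per connection, in milliseconds.
    public static double msPerConnection(Classifier c, double[][] connections) {
        long start = System.nanoTime();            // starting time of the classification
        for (double[] conn : connections) c.classify(conn);
        long total = System.nanoTime() - start;    // total time = end time - start time
        return (total / 1e6) / connections.length; // time to process each connection
    }

    public static void main(String[] args) {
        Random rnd = new Random(0);
        double[][] data = new double[60620][5];    // same number of rows as the balanced dataset
        for (double[] row : data)
            for (int i = 0; i < row.length; i++) row[i] = rnd.nextDouble();
        Classifier dummy = conn -> conn[0] > 0.5 ? 1 : 0; // trivial classifier, harness only
        System.out.printf("%.6f ms/connection%n", msPerConnection(dummy, data));
    }
}
```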
The run-time experimental environment is as follows: dataset (60,620 instances),
CPU (Pentium 4, 3.00 GHz), RAM (0.99 GB), language (Java), JVM (version 1.4.1).
The experimental result is shown in Figure 3.6 on the next page. The figure shows
that the speed depends more on the number of trees in the forest than on the value of
Mtry. Increasing the number of trees increases the time to process each connection,
while increasing the value of Mtry decreases the time slightly. The average time is
about 0.014 milliseconds per connection. Thus, the Detector Module of the NIDS can
process about 71,655 connections per second in the experimental environment.
[Plot: time to process each connection (ms) against Mtry, one curve per forest size (10, 20, 30, 40, 50 trees)]
Figure 3.6: Speed measurement of detection
3.2.7 Evaluation and discussion
Different misclassifications have different levels of consequences. For example,
misclassifying R2L as Normal is more dangerous than misclassifying DoS as Normal. We
use the cost matrix published in the KDD'99 [14], shown in Table 3.3, to
measure the damage of misclassification. M_ij denotes the number of samples in Class
i misclassified as Class j, and C_ij indicates the corresponding cost in the cost
matrix. Let N be the total number of samples. The cost that indicates the average
Table 3.3: Cost matrix [14]

          Normal   Probe   DoS   U2R   R2L
Normal    0        1       2     2     2
Probe     1        0       2     2     2
DoS       2        1       0     2     2
U2R       3        2       2     0     2
R2L       4        2       2     2     0
Table 3.4: Performance comparison on the KDD'99 dataset

Experiments                            Overall error rate   Cost     Time (seconds)
Best KDD result                        7.29%                0.2331   Not provided
Experiment without feature selection   7.19%                0.2306   491
Experiment with feature selection      7.07%                0.2282   423
damage of misclassification for each connection is computed as:

    cost = Σ_{i,j} M_ij × C_ij / N        (3.1)
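Equation 3.1 is straightforward to implement. The sketch below hard-codes the Table 3.3 cost matrix; the confusion matrix in `main` is a hypothetical example, not the thesis results:

```java
public class MisclassificationCost {

    // KDD'99 cost matrix C[i][j]: cost of classifying a Class-i connection as Class j.
    // Row/column order: Normal, Probe, DoS, U2R, R2L (Table 3.3).
    static final int[][] C = {
        {0, 1, 2, 2, 2},
        {1, 0, 2, 2, 2},
        {2, 1, 0, 2, 2},
        {3, 2, 2, 0, 2},
        {4, 2, 2, 2, 0},
    };

    // cost = sum_ij M[i][j] * C[i][j] / N, where M is the confusion matrix
    // and N is the total number of test connections (equation 3.1).
    public static double averageCost(int[][] m) {
        long weighted = 0, n = 0;
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[i].length; j++) {
                weighted += (long) m[i][j] * C[i][j];
                n += m[i][j];
            }
        return (double) weighted / n;
    }

    public static void main(String[] args) {
        // Hypothetical confusion matrix for a tiny test set of 135 connections.
        int[][] m = {
            {90, 2, 3, 0, 5},
            {1, 9, 0, 0, 0},
            {2, 0, 8, 0, 0},
            {1, 0, 0, 4, 0},
            {3, 0, 0, 0, 7},
        };
        System.out.printf("average cost = %.4f%n", averageCost(m));
    }
}
```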
Similar to the KDD'99 contest, we evaluate our approach with the same test
dataset, which contains 311,029 examples. We carry out our experiment with 50
trees and 15 random features (optimized in the previous experiments). First, we
build patterns on the balanced training set using all 41 features. Then, we build
patterns using the 38 most important features. The evaluation results of the patterns
are reported in Table 3.4, along with the best result of the KDD'99 contest. The
overall error rate is the ratio of misclassified connections to the total connections
in the test set.
There were 24 participants in total in the KDD'99 contest [14]. The best result
of the contest, listed in Table 3.4, was achieved by an ensemble of decision trees.
The experimental results show that our approach provides a lower overall error rate
and cost than the best KDD'99 result, even without feature selection. The results
also show that the overall error rate, cost, and time to build patterns are all
reduced by selecting the most important features. Thus, feature selection can improve
the performance of intrusion detection.
3.2.8 Implementation
In our experiments, we use the random forests algorithm implemented in WEKA
(Waikato Environment for Knowledge Analysis) [5]. WEKA is a Java package which
contains machine learning algorithms for data mining tasks. However, WEKA does
not implement the variable importance function of the random forests algorithm.
Therefore, we also use the FORTRAN 77 program [2] developed by Leo Breiman and
Adele Cutler to calculate the variable importance for feature selection.

We also developed a tool for the experiments on the distribution of error rates
and on the speed performance of detection. Without the tool, we would have to run
WEKA many times for each experiment, and each run involves many manual operations,
such as setting parameters. With the tool, we can carry out these two experiments
automatically instead of manually.
3.3 Summary
In this chapter, we employ the random forests algorithm in misuse-based NIDSs
to improve detection performance. To increase the detection rate of the minority
intrusions, we build a balanced dataset by over-sampling the minority classes and
down-sampling the majority classes. The random forests algorithm can build patterns
more efficiently over the balanced dataset, which is much smaller than the original
one. The experiments have shown that the approach can reduce the time to build
patterns dramatically and increase the detection rate of the minority intrusions.
Instead of selecting features based on domain knowledge, we select features
automatically according to their variable importance calculated by the random forests
algorithm. With the feature selection algorithm, deciding upon the right set of
features becomes easy and automated. Although the approach reduces the dependency
on domain knowledge in feature selection, it increases the dependency on training
sets.
From the experiments on various numbers of random features, we obtain the optimal
value to improve the performance of random forests. The evaluation on the KDD'99
test set shows that the performance of our approach is better than the best result
of the KDD'99 contest, reducing both the overall error rate and the cost of
misclassification.
Chapter 4
Anomaly detection
In this chapter, we apply the random forests algorithm in anomaly detection. We
describe the framework of the anomaly-based NIDS (Network Intrusion Detection
System) and illustrate the approach to detect outliers using the random forests
algorithm. Finally, we discuss our experimental results.
4.1 Detecting outliers
In this section, we first present an overview of the proposed framework of the NIDS.
Then, we describe how to build patterns of network services and discuss our
unsupervised approach to detect outliers using the random forests algorithm. In the
unsupervised approach, there is no need for attack-free training data, while
supervised approaches need attack-free data to build profiles of normal activities.
4.1.1 Overview of the framework
The proposed framework applies the random forests algorithm to detect novel
intrusions. The framework is shown in Figure 4.1. The NIDS captures the network
traffic and constructs the dataset by pre-processing. After that, service-based
patterns are built over the dataset using the random forests algorithm. With the
built patterns, we can find outliers related to each pattern, and the system raises
alerts when outliers are detected. After capturing the network traffic, the
processing is off-line: due to the high computational requirements of the outlier
detection algorithm, on-line processing is not suitable in a real network environment.
Network Traffic → Pre-Processing → Dataset → Pattern Building → Outlier Detection → Alerts
Figure 4.1: The framework of the unsupervised anomaly NIDS
4.1.2 Mining patterns of network services
Network traffic can be categorized by services (e.g., http, telnet, and ftp). Each
network service has its own pattern. Therefore, we can build patterns of network
services using the random forests algorithm. However, the random forests algorithm
is supervised, so we need datasets labeled by network services. Since the information
about network services is in the network packets, network traffic can be labeled by
service automatically instead of by time-consuming manual processing. Actually, many
datasets used to evaluate NIDSs can be labeled by network services with little
effort. For example, one of the features in the KDD'99 dataset is the service type,
which can be used as the label.
Before building the patterns, we need to optimize the parameters of the random
forests algorithm. The number of features employed in splitting each node of each
tree (Mtry) is the primary tuning parameter. To improve the performance of the
random forests algorithm, this parameter should be optimized. Another parameter
is the number of trees in a forest.
We use the dataset to find the optimal value of Mtry and the number of trees.
The minimum error rate corresponds to the optimal values. Therefore, we use
different values of Mtry and different numbers of trees to build forests, and
evaluate the error rate of each forest. Then, we select the values corresponding
to the minimum error rate to build the patterns of the services.
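This selection step amounts to picking the grid point with the lowest error rate. A minimal sketch in Java, using a few hypothetical (trees, Mtry, oob error) triples rather than a full evaluation grid:

```java
public class ParameterSearch {

    // grid[k] = {trees, Mtry}; oob[k] = oob error rate of the forest built with grid[k].
    // Returns the index of the parameter pair with the minimum oob error rate.
    public static int best(int[][] grid, double[] oob) {
        int best = 0;
        for (int k = 1; k < oob.length; k++)
            if (oob[k] < oob[best]) best = k;
        return best;
    }

    public static void main(String[] args) {
        int[][] grid = {{10, 5}, {25, 40}, {50, 40}};  // (trees, Mtry) candidates
        double[] oob = {0.00886, 0.00602, 0.00615};    // hypothetical oob error rates
        int k = best(grid, oob);
        System.out.println(grid[k][0] + " trees, Mtry = " + grid[k][1]); // 25 trees, Mtry = 40
    }
}
```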
4.1.3 Unsupervised outlier detection
We can detect intrusions by finding unusual activities, or outliers. There are two
types of outliers in the proposed NIDS. The first type is an activity that deviates
significantly from the others in the same network service. The second type is an
activity whose pattern belongs to a service other than its own. For instance, if an
http activity is classified as the ftp service, the activity is determined to be an
outlier.
The random forests algorithm uses proximities to find outliers, whose proximities
to all other cases in the entire data are generally small. The proximities are one of
the most useful tools in random forests [12]. After the forest is constructed, all
cases in the dataset are put down each tree in the forest. If cases k and n are in
the same leaf of a tree, their proximity is increased by one. Finally, the proximities
are normalized by dividing by the number of trees.
For a dataset with N cases, the proximities originally form an N×N matrix, so the
complexity of the calculation is N×N. Datasets of network traffic are huge, so the
calculation needs a lot of memory and CPU time. To improve the performance, we
modify the algorithm that calculates the proximities. As mentioned above, if a
service activity is classified as another service, it is determined to be an outlier.
Therefore, we do not care about the proximity between two cases that belong to
different services. Let S_i denote the number of cases in service i. The complexity
is then reduced to Σ_i S_i × S_i after the modification.
With respect to the random forests algorithm, outliers can be defined as the cases
whose proximities to the other cases in the dataset are generally small [12].
Outlier-ness indicates a degree of being an outlier and can be calculated from the
proximities. class(k) = j denotes that case k belongs to class j, and prox(n, k)
denotes the proximity between cases n and k. The average proximity from case n in
class j to case k (the rest of the data in class j) is computed as:

    P(n) = Σ_{class(k)=j} prox²(n, k)        (4.1)

N denotes the number of cases in the dataset. The raw outlier-ness of case n is
defined as:

    N / P(n)        (4.2)

In each class, the median and the absolute deviation of all raw outlier-ness values
are calculated. The median is subtracted from each raw outlier-ness, and the result
of the subtraction is divided by the absolute deviation to get the final
outlier-ness. If the
outlier-ness of a case is large, the proximity is small, and the case is determined as
an outlier.
To detect outliers in a dataset of network traffic, we build patterns of services
over the dataset. Then, we calculate the proximity and outlier-ness for each
activity. An activity that exceeds a specified threshold of outlier-ness is
determined to be an outlier.
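The proximity and raw outlier-ness computations of equations 4.1 and 4.2 can be sketched in plain Java. The leaf assignments below are a toy example, not thesis data, and the class/method names are ours:

```java
public class OutlierScore {

    // prox(n, k): fraction of the trees in which cases n and k land in the same leaf.
    public static double proximity(int[][] leaf, int n, int k) {
        int same = 0;
        for (int[] tree : leaf) if (tree[n] == tree[k]) same++;
        return (double) same / leaf.length;
    }

    // Raw outlier-ness of case n: N divided by the summed squared proximities
    // to the other cases of the same service (equations 4.1 and 4.2).
    public static double rawOutlierness(int[][] leaf, int[] service, int n) {
        int N = service.length;
        double p = 0;
        for (int k = 0; k < N; k++) {
            if (k == n || service[k] != service[n]) continue; // skip other services
            double pr = proximity(leaf, n, k);
            p += pr * pr;
        }
        return N / p;
    }

    public static void main(String[] args) {
        // leaf[t][n] = leaf reached by case n in tree t (a toy 3-tree forest, 4 cases).
        int[][] leaf = {
            {0, 0, 1, 0},
            {2, 2, 0, 2},
            {1, 1, 1, 2},
        };
        int[] service = {0, 0, 0, 0}; // all four cases belong to the same service
        for (int n = 0; n < 4; n++)   // case 2, which shares few leaves, scores highest
            System.out.printf("case %d: %.2f%n", n, rawOutlierness(leaf, service, n));
    }
}
```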
4.2 Experiments and results
In this section, we summarize our experimental results on detecting intrusions using
the unsupervised outlier detection technique. We first describe the datasets used in
the experiments. Then we evaluate our approach in a way similar to that used
in [16, 21]. To evaluate the detection performance of our approach under different
numbers of attacks, we also carry out experiments over different datasets. The
performance of our approach in detecting the minority intrusions is evaluated by
another experiment.
4.2.1 Dataset and preprocessing
The full training set, one of the KDD'99 datasets, has 4,898,431 connections. The
dataset contains attacks and is labeled by attack type. Since our approach is
unsupervised, the dataset does not satisfy the needs of our experiments; we must
remove the labels that indicate the types of attacks from the dataset.

To generate new datasets for our experiments, we first separate the dataset into
two pools according to the labels. One contains normal connections; the other
contains intrusive connections. Then, we remove all the labels from the pools.
However, we need the data labeled by services to build patterns of services, so we
use the service feature in the dataset as the label. As a result, all the data
contains 40 features and is labeled by services.
For our experiments, we choose the five most popular network services: ftp, http,
pop, smtp, and telnet. By selecting the ftp, pop, and telnet normal connections, 5%
of the http normal connections, and 10% of the smtp normal connections, we generate
a dataset called the normal dataset, which contains 47,426 normal connections.
Finally, by injecting anomalies from the pool of attacks into the normal dataset, we
generate four new datasets: the 1%, 2%, 5%, and 10% datasets. The 1% (2%, 5%, and
10%) dataset means that 1% (2%, 5%, and 10%) of the connections in the dataset are
attacks.
4.2.2 Evaluation and discussion
We carry out the first experiment over the 1% attack dataset in a way similar to
that used in [16, 21]. We first optimize the parameters (Mtry and the number of
trees) of the random forests algorithm by feeding the dataset into the NIDS. The
NIDS builds patterns of the network services with different values of the
parameters, and then calculates the oob error rates. The oob error rates are shown
in the third column of Table 4.1. The values of the parameters (the number of trees
and Mtry) corresponding to the lowest oob error rate are the optimized ones.
Table 4.1: The oob error rates for parameter optimization in the anomaly detection
experiments

Trees  Mtry  1% dataset  2% dataset  5% dataset  10% dataset  Minority
10     5     0.00886     0.01323     0.01884     0.03237      0.00187
10     10    0.00745     0.01137     0.01885     0.03242      0.00187
10     15    0.00728     0.01084     0.01889     0.0325       0.00188
10     20    0.00673     0.01038     0.01889     0.03252      0.00188
10     25    0.0065      0.01012     0.0189      0.03255      0.00189
10     30    0.00624     0.01007     0.0189      0.03256      0.00189
10     35    0.00625     0.0099      0.0189      0.03258      0.00189
10     40    0.00631     0.00986     0.01892     0.03263      0.00189
15     5     0.00874     0.01306     0.01895     0.03266      0.0019
15     10    0.00748     0.0112      0.01895     0.03267      0.0019
15     15    0.00718     0.01065     0.01895     0.03269      0.00191
15     20    0.00665     0.01008     0.01896     0.03275      0.00191
15     25    0.00637     0.00997     0.01896     0.03281      0.00191
15     30    0.00615     0.00987     0.01898     0.03282      0.00192
15     35    0.0062      0.00972     0.01899     0.03295      0.00194
15     40    0.00615     0.00978     0.019       0.03299      0.00197
20     5     0.00894     0.01299     0.01901     0.03302      0.00197
20     10    0.00753     0.0112      0.01901     0.03309      0.00199
20     15    0.00714     0.01085     0.01902     0.0331       0.002
20     20    0.0067      0.0101      0.01902     0.0331       0.00202
20     25    0.00647     0.00993     0.01903     0.0332       0.00203
20     30    0.00622     0.00979     0.01906     0.03322      0.00203
20     35    0.00614     0.00967     0.01907     0.03351      0.00205
20     40    0.00612     0.0097      0.01908     0.03356      0.00208
25     5     0.00884     0.01301     0.0191      0.03361      0.00209
25     10    0.00746     0.01121     0.01911     0.03361      0.00209
25     15    0.00701     0.01089     0.01912     0.03363      0.00209
25     20    0.00658     0.01014     0.01912     0.03368      0.00211
25     25    0.00637     0.00988     0.01914     0.03371      0.00212
25     30    0.0061      0.00978     0.0192      0.03374      0.00212
25     35    0.00605     0.00976     0.01921     0.03389      0.00213
25     40    0.00602     0.00968     0.01923     0.03402      0.00224
30     5     0.00898     0.01294     0.01924     0.03406      0.00225
30     10    0.00738     0.01128     0.01953     0.03412      0.00226
30     15    0.00698     0.01091     0.01966     0.03415      0.00226
30     20    0.00652     0.01023     0.01966     0.03422      0.00226
30     25    0.00635     0.00996     0.01969     0.03424      0.00227
30     30    0.00613     0.00979     0.01971     0.03425      0.0023
30     35    0.0061      0.00983     0.01973     0.0343       0.0023
30     40    0.00608     0.00976     0.01979     0.03432      0.00232
35     5     0.00906     0.01297     0.0198      0.03439      0.00236
35     10    0.00741     0.01135     0.01982     0.03441      0.00237
35     15    0.00702     0.01096     0.01997     0.03541      0.00237
35     20    0.00662     0.01025     0.02042     0.03548      0.00239
35     25    0.00643     0.00996     0.02044     0.03551      0.00241
35     30    0.00621     0.0098      0.02045     0.03556      0.00241
35     35    0.00616     0.00987     0.02045     0.0356       0.00242
35     40    0.00616     0.00982     0.02046     0.03562      0.00243
40     5     0.00913     0.01298     0.02049     0.03563      0.00244
40     10    0.00752     0.01138     0.02049     0.03573      0.00247
40     15    0.00704     0.01092     0.0205      0.03585      0.0025
40     20    0.00661     0.01031     0.02053     0.03603      0.00261
40     25    0.00642     0.01002     0.02086     0.03707      0.00268
40     30    0.0062      0.00989     0.02115     0.03753      0.00276
40     35    0.00617     0.0099      0.02122     0.03754      0.00276
40     40    0.00615     0.00986     0.02125     0.0376       0.00276
45     5     0.00907     0.01309     0.02125     0.03763      0.0028
45     10    0.00752     0.0115      0.02126     0.03765      0.00283
45     15    0.00704     0.01086     0.02127     0.03765      0.00283
45     20    0.00658     0.01035     0.02134     0.03766      0.00284
45     25    0.00639     0.01004     0.02136     0.0377       0.00289
45     30    0.00616     0.00989     0.02137     0.0378       0.00305
45     35    0.00615     0.00991     0.02306     0.03985      0.00446
45     40    0.00613     0.00987     0.02324     0.03986      0.00448
50     5     0.00912     0.01306     0.02325     0.04008      0.00449
50     10    0.00753     0.01149     0.02328     0.04021      0.0045
50     15    0.00703     0.01083     0.0233      0.04038      0.00453
50     20    0.00656     0.01034     0.02337     0.0405       0.00458
50     25    0.00638     0.00996     0.0234      0.0405       0.0046
50     30    0.00618     0.00988     0.02343     0.04051      0.00461
50     35    0.00615     0.0099      0.02344     0.04052      0.00471
50     40    0.00615     0.00984     0.02346     0.04065      0.00476
With the optimized parameters (25 trees and Mtry = 40), we build the patterns of the
network services. Over the built patterns, the NIDS calculates the outlier-ness of
each connection. Figure 4.2 plots the outlier-ness of the 1% attack dataset. Since
the attacks are injected at the beginning of the dataset, the figure shows that the
outlier-ness of the attacks is much higher than that of most normal activities. Some
normal activities also have high outlier-ness, which leads to false positives. The
NIDS raises an alert if the outlier-ness of a connection exceeds a specified
threshold.
[Plot: outlier-ness against connection index]

Figure 4.2: The outlier-ness of the 1% attack dataset
We evaluate the performance of our system by the detection rate and the false
positive rate. The detection rate is the number of attacks detected by the system
divided by the number of attacks in the dataset. The false positive rate is the
number of normal connections that are misclassified as attacks divided by the number
of normal connections in the dataset. We can evaluate the performance by varying the
threshold of outlier-ness.
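The two rates at a single alert threshold can be sketched in plain Java; the scores and labels in `main` are hypothetical, not experimental data:

```java
public class RocPoint {

    // Given outlier-ness scores and ground-truth attack flags, compute the
    // detection rate and false positive rate at one alert threshold.
    public static double[] ratesAt(double[] score, boolean[] isAttack, double threshold) {
        int tp = 0, fp = 0, attacks = 0, normals = 0;
        for (int i = 0; i < score.length; i++) {
            if (isAttack[i]) {
                attacks++;
                if (score[i] > threshold) tp++; // attack detected
            } else {
                normals++;
                if (score[i] > threshold) fp++; // normal flagged as attack
            }
        }
        return new double[]{(double) tp / attacks, (double) fp / normals};
    }

    public static void main(String[] args) {
        double[] score = {9.1, 7.4, 0.8, 1.2, 6.0, 0.3}; // hypothetical outlier-ness values
        boolean[] attack = {true, true, false, false, false, false};
        double[] r = ratesAt(score, attack, 5.0);
        // prints "detection rate=1.00, false positive rate=0.25"
        System.out.printf("detection rate=%.2f, false positive rate=%.2f%n", r[0], r[1]);
    }
}
```

Sweeping the threshold and recording one such pair per value traces out the ROC curve discussed next.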
In intrusion detection, the ROC (Receiver Operating Characteristic) curve is often
used to measure the performance of IDSs. The ROC curve is a plot of the detection
rate against the false positive rate. Figure 4.3 plots the ROC curve showing the
relationship between the detection rates and the false positive rates over the
dataset.
[Plot: detection rate (%) against false positive rate (%)]

Figure 4.3: The ROC curve for the 1% attack dataset
The result indicates that our system can achieve a high detection rate with a low
false positive rate. Compared to other unsupervised anomaly-based systems [16, 21],
our system has better performance over the KDD'99 dataset when the false positive
rate is low. Table 4.2 lists some results from Eskin et al. [16], whose experiments
were carried out over a dataset containing 1 to 1.5% attacks and 98.5 to 99% normal
instances. The results from the others show the detection rates
Table 4.2: The performance of each algorithm over the KDD'99 dataset [16]

Algorithm   Detection rate   False positive rate
Cluster     93%              10%
Cluster     66%              2%
Cluster     47%              1%
Cluster     28%              0.5%
K-NN        91%              8%
K-NN        23%              6%
K-NN        11%              4%
K-NN        5%               2%
SVM         98%              10%
SVM         91%              6%
SVM         67%              4%
SVM         5%               3%
are reduced significantly when the false positive rate is low (below 1%). Although
our experiments are carried out under different conditions, Figure 4.3 shows that our
system still maintains relatively high detection rates when the false positive rate
is low. For example, the detection rate is 95% when the false positive rate is 1%.
When the false positive rate is reduced to 0.1%, the detection rate is still over
60%.
4.2.3 Experiments on the detection performance over different datasets
To evaluate our system under different numbers of attacks, we carry out the
experiments over the 1%, 2%, 5%, and 10% attack datasets.
We first optimize the parameters (Mtry and the number of trees) of the random
forests algorithm by feeding the datasets into the NIDS. The NIDS builds patterns of
the network services with different values of the parameters, and then calculates the
Table 4.3: The optimal parameters of random forests

Dataset       Trees   Mtry
1% dataset    25      40
2% dataset    20      35
5% dataset    10      5
10% dataset   10      5
oob error rates, which are shown in Table 4.1. The values corresponding to the
lowest oob error rate are the optimized ones. The optimal parameters for each
dataset are listed in Table 4.3.
With the optimized parameters, we build the patterns of the network services.
Over the built patterns, the NIDS calculates the outlier-ness of each connection.
Figure 4.4, Figure 4.5, and Figure 4.6 plot the outlier-ness of the 2%, 5%, and 10%
attack datasets, and Figure 4.7 plots the ROC curves for each dataset. The result
shows that the performance tends to decrease as the number of attacks increases.
Thus, the performance of anomaly detection depends on the proportion of attacks in
the dataset.
[Plot: outlier-ness against connection index]
Figure 4.4: The outlier-ness of the 2% attack dataset
Figure 4.5: The outlier-ness of the 5% attack dataset

Figure 4.6: The outlier-ness of the 10% attack dataset
Figure 4.7: The ROC curves for the different datasets (1%, 2%, 5%, and 10% attacks; x-axis: false positive rate)
4.2.4 Experiment on the detection performance over minority intrusions
Minority intrusions are more difficult to detect than majority intrusions. Since minority intrusions have far fewer connections, the above experiments cannot show the performance of the NIDS in detecting minority intrusions. Therefore, we carry out an experiment to evaluate the performance of detecting minority intrusions using outlier detection.
By injecting minority intrusions from the pool of attacks into the normal dataset,
we generate the minority attack dataset. We first optimize the parameters by feeding
the dataset into the NIDS. As shown in the seventh column of Table 4.1 on page 48,
the optimal number of trees is 10, and the optimal value of Mtry is 5.
Figure 4.8 plots the outlier-ness of the minority attack dataset. There are 57
intrusions in the dataset. Since the attacks are injected at the beginning of the
dataset, the figure shows that the outlier-ness of some attacks is much higher than
that of most normal activities. Figure 4.9 on the next page plots the ROC curve
showing the relationship between the detection rates and the false positive rates over
the dataset.
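Such an ROC curve can be traced by sweeping a threshold over the outlier-ness ranking: every distinct score is tried as a cut-off, and each cut-off yields one (false positive rate, detection rate) point. The following Python sketch is ours, for illustration only, and is not the thesis's plotting code:

```python
import numpy as np

def roc_points(outlier_ness, is_attack):
    """(false positive rate, detection rate) pairs obtained by using
    each distinct outlier-ness value as a detection threshold."""
    scores = np.asarray(outlier_ness, dtype=float)
    y = np.asarray(is_attack, dtype=bool)
    points = []
    for t in np.unique(scores)[::-1]:      # highest threshold first
        flagged = scores >= t              # connections declared intrusions
        dr = (flagged & y).sum() / y.sum()
        fpr = (flagged & ~y).sum() / (~y).sum()
        points.append((fpr, dr))
    return points
```

Connections with outlier-ness above the threshold are declared intrusions; lowering the threshold trades a higher detection rate for a higher false positive rate.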
Figure 4.8: The outlier-ness of the minority attack dataset
The result indicates that the detection rate of minority intrusions is lower than
in the experiments over the different datasets in the previous subsection. However,
the result is still impressive: the detection rate reaches 65% when the false positive
rate is 1%. In the KDD’99 contest, the detection rate of minority intrusions was
much lower than that of majority intrusions, even using misuse detection [14].
Figure 4.9: The ROC curve for the minority attack dataset
4.2.5 Implementation
We develop a Java program to implement our anomaly detection approach using
WEKA (the Waikato Environment for Knowledge Analysis) [5]. WEKA is an
open-source Java package that contains machine learning algorithms for data mining
tasks. However, WEKA does not implement the outlier detection function of the
random forests algorithm. Therefore, we modify the source code of WEKA to
implement outlier detection.
4.3 Summary
In this chapter, we propose a new framework for an unsupervised anomaly NIDS, based
on the outlier detection technique of the random forests algorithm. The framework
builds the patterns of network services over datasets labeled by service. With the
built patterns, the framework detects attacks in the datasets using the outlier
detection algorithm.
Because the datasets used in NIDSs are large, the process of detecting outliers
is very time-consuming and requires a large amount of memory. To improve the
performance, we modify the original outlier detection algorithm to reduce its
computational complexity, under the assumption that each network service has its
own pattern of normal activities.
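The per-service computation can be sketched as follows. This is an illustrative Python sketch, not the thesis's modified WEKA code: scikit-learn's forest supplies the leaf assignments, proximity between two connections is taken as the fraction of trees in which they share a leaf, and Breiman's outlier-ness formula is applied within each service class only. The function name `outlier_ness` and all parameter values are ours:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def outlier_ness(X, services, n_trees=10):
    """Per-service outlier-ness from random forest proximities.
    Raw outlier-ness of i is n / (sum of squared proximities to
    connections of the same service), then each service's scores are
    normalised by their median and mean absolute deviation."""
    X, services = np.asarray(X), np.asarray(services)
    forest = RandomForestClassifier(n_estimators=n_trees, max_features="sqrt",
                                    random_state=0).fit(X, services)
    leaves = forest.apply(X)                 # (n_samples, n_trees) leaf ids
    n = len(X)
    raw = np.empty(n)
    for i in range(n):
        same = np.flatnonzero(services == services[i])
        same = same[same != i]
        # proximity(i, j) = fraction of trees where i and j share a leaf
        prox = (leaves[same] == leaves[i]).mean(axis=1)
        raw[i] = n / max(float((prox ** 2).sum()), 1e-12)
    out = np.empty(n)
    for s in np.unique(services):            # normalise within each service
        idx = services == s
        med = np.median(raw[idx])
        dev = np.mean(np.abs(raw[idx] - med))
        out[idx] = (raw[idx] - med) / max(dev, 1e-12)
    return out
```

Restricting the proximity sums to connections of the same service is what reduces the cost: each connection is compared only against its own service's connections rather than against the whole dataset.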
Compared to supervised approaches, our approach breaks the dependency on
attack-free training datasets. The experimental results over the KDD’99 dataset
confirm the effectiveness of our approach using the unsupervised detection technique.
The performance of our system is comparable to that of other reported unsupervised
anomaly detection approaches. In particular, our approach achieves a higher
detection rate when the false positive rate is low. This is especially significant for
NIDSs, since a high false positive rate makes an NIDS useless.
The results also show that the performance tends to degrade with an increasing
number of attack connections. That is a general problem of unsupervised systems. The
experiment on minority intrusions indicates that minority intrusions are more difficult
to detect than majority intrusions by the anomaly approach.
Chapter 5
Combination of misuse and anomaly detection
In this chapter, we present a framework that combines the misuse and anomaly detection
approaches described in the previous chapters. We first discuss different ways of
combining misuse and anomaly detection in hybrid systems. Then, we describe the
architecture of the proposed hybrid system. Finally, an experiment is conducted to
evaluate our approach.
5.1 Misuse detection versus anomaly detection
Misuse detection determines intrusions by patterns or signatures that represent
attacks. Thus, misuse-based systems can detect known attacks, much like virus
detection systems, but they cannot detect unknown attacks [33]. Most NIDS (Network
Intrusion Detection System) products depend on misuse detection, since misuse detection
usually has a higher detection rate and a lower false positive rate than anomaly detection.
CHAPTER 5. COMBINATION OF MISUSE AND ANOMALY DETECTION 61
Another advantage of misuse detection is its high detection speed, due to the low
complexity of the detection algorithms. Anomaly detection usually has high
computational complexity, especially for unsupervised approaches such as clustering,
the outlier detection of the random forests algorithm, and Self-Organizing Maps (SOM).
Therefore, misuse detection is more suitable for on-line detection than anomaly detection.
Anomaly detection identifies observed activities that deviate significantly from
normal usage as intrusions. Thus, anomaly detection can detect unknown intrusions,
which cannot be addressed by misuse detection. The critical technique in anomaly
detection is building profiles of normal usage. If the profiles are defined too broadly,
some attacks might not be detected, leading to a low detection rate. On the other
hand, if the profiles are defined too narrowly, some normal activities might be flagged
as intrusions, raising false alarms. Currently, there is no effective way to define
normal profiles that achieve a high detection rate and a low false positive rate at the
same time. Although anomaly detection does not require prior knowledge of intrusions
and can detect new intrusions, it may not be able to describe what an attack is.
5.2 Approaches to combining misuse and anomaly detection
To address the problems of misuse and anomaly detection, many intrusion detection
systems combine both techniques, aiming to reach the accuracy of a misuse detection
system while retaining the ability to deal with new attacks. There are three ways to
combine misuse and anomaly detection:
1. Anomaly detection followed by misuse detection. Suspicious activities are
selected from observed data by anomaly detection. Then, misuse detection is
used to detect intrusions among the suspicious activities.
2. Parallel approach. Misuse and anomaly detection are applied in parallel.
3. Misuse detection followed by anomaly detection. First, misuse detection is
applied. Then, anomaly detection is used to detect intrusions missed by misuse
detection.
Anomaly detection followed by misuse detection
Figure 5.1 shows the framework of the first approach. First, observed activities are
fed into the anomaly detection component, which produces suspicious items that
deviate from the built normal profile. Then, the misuse detection component
identifies intrusions among the suspicious items. Items that match patterns of
attacks are determined to be known attacks. Items that match patterns of false
alarms are determined to be normal activities. The others are determined to be
unknown attacks.
Figure 5.1: Framework of anomaly detection followed by misuse detection
ADAM [8] applies this approach in an NIDS. The on-line single-level and domain-level
mining module in ADAM uses an anomaly detection technique to produce suspicious
items. The classifier module, which uses a misuse technique, classifies the suspicious
items into false alarms, attacks, and unknown attacks. The approach is also used in
ADFSC (Anomaly Detection First Serial Combination) by Elvis et al. in [33].
In this approach, the anomaly detection component should have a high detection
rate, since intrusions it misses cannot be detected by the follow-up misuse detection
component. The misuse detection component should be able to identify false alarms.
The false positive rate can be reduced by excluding the false alarms from the
suspicious items.
Parallel approach
Figure 5.2: Framework of the parallel approach
Figure 5.2 shows the framework of the parallel approach. Observed activities are fed
to the misuse detection component and the anomaly detection component in parallel.
Each component produces a set of suspicious items. The correlation component
analyzes these two sets to detect intrusions. The parallel approach has been used in
NIDES (the Next-Generation Intrusion Detection Expert System) [7].
Misuse detection followed by anomaly detection
Figure 5.3 shows the framework of the third approach. First, observed activities are
fed to the misuse detection component, which detects known attacks by matching the
signatures or patterns of attacks. The other items (uncertain items) that do not
match any signature or pattern are fed to the anomaly detection component to detect
unknown intrusions. The anomaly detection component should have a low false
positive rate; otherwise, the overall false positive rate of the hybrid system will be
high, and a high false positive rate makes the detection system useless.
Figure 5.3: Framework of misuse detection followed by anomaly detection
Rationale for choosing misuse detection followed by anomaly detection
Since our anomaly approach performs better when the false positive rate is low,
the third approach is more suitable than the others. For the first approach, the
anomaly detection component must have a very high detection rate, while a low false
positive rate is not critical, and the misuse detection component needs the ability to
identify false alarms to reduce the overall false positive rate. The high complexity
of our anomaly detection does not match the high-speed detection of our misuse
detection, so combining the two with the parallel approach makes real-time detection
impossible.
Moreover, the experimental results for anomaly detection show that its performance
tends to degrade with an increasing number of attack connections, a general problem
of unsupervised systems. Some attacks, such as DoS (Denial of Service), produce a
large number of connections, which may undermine an unsupervised anomaly
detection system. To overcome this problem, we use the third approach: the misuse
approach detects known attacks, and by removing them, the number of attacks in the
datasets for unsupervised anomaly detection can be reduced significantly.
Another reason to use the third approach is that misuse detection by the random
forests algorithm is fast. Thus, the hybrid system can detect known intrusions in real
time and unknown intrusions off-line. The low speed of the anomaly detection makes
real-time detection impossible using the first and second approaches.
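The chosen "misuse first" pipeline can be sketched end to end. This Python sketch is illustrative only: a scikit-learn RandomForestClassifier plays the misuse stage, and IsolationForest stands in for the thesis's proximity-based outlier detection; the function name `hybrid_detect` and all parameter values are ours.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

def hybrid_detect(X_train, y_train, X, contamination=0.01):
    """Misuse detection followed by anomaly detection (sketch).

    Stage 1: a supervised forest trained on labeled data
    (1 = known intrusion, 0 = normal) flags known attacks.
    Stage 2: connections it passes as normal are re-scored by an
    unsupervised outlier detector. Returns two boolean masks:
    (known attacks, novel attacks)."""
    misuse = RandomForestClassifier(n_estimators=15, random_state=0)
    misuse.fit(X_train, y_train)
    known = misuse.predict(X) == 1            # stage 1: known attacks

    rest_idx = np.flatnonzero(~known)         # stage 2: scan the remainder
    iso = IsolationForest(contamination=contamination, random_state=0)
    novel_rest = iso.fit_predict(X[rest_idx]) == -1

    novel = np.zeros(len(X), dtype=bool)
    novel[rest_idx[novel_rest]] = True
    return known, novel
```

Removing stage-1 detections before stage 2 is exactly the point argued above: the anomaly stage then sees far fewer attack connections, which is the regime where outlier detection works.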
5.3 Architecture of the hybrid system
The proposed hybrid system combines misuse detection and anomaly detection in an
NIDS. The architecture of the system is shown in Figure 5.4 on the next page.
The system consists of two components: the misuse detection component and the
anomaly detection component. The misuse detection component employs the random
forests algorithm for misuse detection. The anomaly detection component implements
detection using the outlier detection provided by the random forests algorithm.
Figure 5.4: Architecture of the hybrid system
There are two phases in the framework: an off-line phase and an on-line phase.
In the off-line phase, the system builds patterns of intrusions for the misuse
detection component and detects unknown intrusions using the anomaly detection
component. In the on-line phase, it detects known intrusions using the misuse
detection component.
In the off-line phase, the Intrusion Pattern Builder module is trained on labeled
data and outputs patterns of intrusions to the Misuse Detector module.
In the on-line phase, network traffic is captured and fed to the Misuse Detector.
The Misuse Detector raises an alarm to the Misuse Alarmer module if any connection
matches an intrusion pattern; the Alarmer module then delivers the alarms to security
analysts. If a connection does not match any intrusion pattern, it is sent to the
Anomaly Database module, which stores data for the anomaly detection component.
In the off-line phase, the system can detect novel intrusions using the anomaly
detection component. First, the Service Pattern Builder module retrieves data from
the anomaly database to build patterns of network services, and outputs the built
patterns to the Outlier Detector module. With the patterns, the Outlier Detector
retrieves the data from the anomaly database and uses the outlier detection technique
to detect attacks. If it detects any attack, it raises alarms to the Anomaly Alarmer
module. The Anomaly Alarmer can deliver the alarms to security analysts. It can
also store newly detected intrusions in the training database, so that new patterns of
these intrusions can be built for misuse detection.
5.4 Experiments and results
5.4.1 Dataset and preprocessing
We evaluate our hybrid approach over the KDD’99 dataset. The full training set
of KDD’99 is labeled by type of intrusion. To calculate the detection rate and
false positive rate easily, we change the label to 1 or 0 (1 if the connection is an
intrusion; 0 otherwise) instead of the types. Then, we choose the five most popular
network services: ftp, http, pop, smtp, and telnet. By selecting the ftp, http, pop,
smtp, and telnet connections, we generate our training set, which contains 16,919
connections after down-sampling the normal connections. The test set of KDD’99 is
processed in the same way, except that normal connections are not down-sampled;
our test set contains 49,838 connections. The training set is used to build patterns of
intrusions for misuse detection. The test set is used to evaluate our hybrid approach.
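With the labels mapped to 1/0, the two evaluation metrics reduce to simple ratios; a minimal helper (ours, for illustration):

```python
def detection_rates(y_true, y_pred):
    """Detection rate = detected intrusions / all intrusions;
    false positive rate = normal connections flagged / all normals.
    Labels and predictions are 1 (intrusion) or 0 (normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return tp / pos, fp / neg
```

This is why the 1/0 relabeling is convenient: both rates become counts over the binary labels, independent of the original attack types.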
5.4.2 Evaluation and discussion
To improve the performance of misuse detection, we employ the feature selection
algorithm to calculate the variable importance values over the training set. The
result is shown in Figure 5.5 on the next page.
The figure shows that the variable importance of the last 7 features (Features 2, 7,
8, 9, 15, 20, and 21) is much smaller than that of the others. Therefore, we select the
34 most important features to build patterns of intrusions.
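This pruning step can be sketched with a forest's variable importance ranking. The sketch below uses scikit-learn's `feature_importances_` rather than the Breiman and Cutler FORTRAN program used in the thesis; the function name and parameters are ours:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def select_top_features(X, y, keep=34):
    """Rank features by the forest's variable importance and keep the
    `keep` strongest, mirroring the pruning of the 7 weakest of the
    41 KDD'99 features. Returns sorted indices of retained features."""
    forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]  # strongest first
    return np.sort(order[:keep])
```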
Figure 5.5: Variable importance of the features in the hybrid approach experiment
We also optimize the parameters (Mtry and the number of trees) of the random
forests algorithm for misuse detection. By building patterns of intrusions with
different values of Mtry (5, 10, 15, 20, 25, 30, and 34) and different numbers of trees
(10, 15, 20, 25, 30, 35, 40, 45, and 50), we get the oob error rates for each pattern,
shown in Table 5.1 on the next page.
Table 5.1: The oob error rates for parameter optimization in the hybrid approach experiment

Trees  Mtry  Oob error (intrusion patterns)  Oob error (service patterns)
10     5     0.00227                         0.00212
10     10    0.00199                         0.00146
10     15    0.00201                         0.00093
10     20    0.00173                         0.00075
10     25    0.00203                         0.00091
10     30    0.00184                         0.00084
10     34    0.00175                         0.00069
15     5     0.00249                         0.00216
15     10    0.00211                         0.00142
15     15    0.00200                         0.00106
15     20    0.00191                         0.00075
15     25    0.00197                         0.00088
15     30    0.00180                         0.00081
15     34    0.00169                         0.00070
20     5     0.00246                         0.00216
20     10    0.00212                         0.00143
20     15    0.00209                         0.00109
20     20    0.00186                         0.00077
20     25    0.00186                         0.00086
20     30    0.00177                         0.00082
20     34    0.00169                         0.00071
25     5     0.00240                         0.00212
25     10    0.00226                         0.00141
25     15    0.00213                         0.00103
25     20    0.00200                         0.00076
25     25    0.00183                         0.00088
25     30    0.00172                         0.00079
25     34    0.00169                         0.00071
30     5     0.00260                         0.00211
30     10    0.00228                         0.00146
30     15    0.00216                         0.00101
30     20    0.00195                         0.00078
30     25    0.00176                         0.00083
30     30    0.00173                         0.00076
30     34    0.00171                         0.00069
35     5     0.00262                         0.00210
35     10    0.00224                         0.00146
35     15    0.00219                         0.00103
35     20    0.00201                         0.00079
35     25    0.00185                         0.00078
35     30    0.00180                         0.00074
35     34    0.00178                         0.00066
40     5     0.00274                         0.00211
40     10    0.00231                         0.00145
40     15    0.00217                         0.00101
40     20    0.00203                         0.00080
40     25    0.00188                         0.00077
40     30    0.00186                         0.00075
40     34    0.00180                         0.00068
45     5     0.00278                         0.00206
45     10    0.00227                         0.00144
45     15    0.00218                         0.00100
45     20    0.00203                         0.00079
45     25    0.00190                         0.00079
45     30    0.00189                         0.00075
45     34    0.00182                         0.00067
50     5     0.00283                         0.00206
50     10    0.00232                         0.00143
50     15    0.00224                         0.00099
50     20    0.00211                         0.00080
50     25    0.00195                         0.00079
50     30    0.00190                         0.00074
50     34    0.00187                         0.00066
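The grid search behind this table can be sketched with scikit-learn's out-of-bag estimate. This is an illustrative sketch with shortened grids and our own function name, not the thesis's optimization tool:

```python
from itertools import product

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def tune_by_oob(X, y, mtry_grid=(5, 10, 15), tree_grid=(10, 15, 20)):
    """Pick the (trees, Mtry) pair with the lowest out-of-bag error
    rate (1 - oob_score_), as in the table above."""
    best, best_err = None, np.inf
    for trees, mtry in product(tree_grid, mtry_grid):
        rf = RandomForestClassifier(n_estimators=trees,
                                    max_features=min(mtry, X.shape[1]),
                                    oob_score=True, bootstrap=True,
                                    random_state=0).fit(X, y)
        err = 1.0 - rf.oob_score_   # oob error rate for this setting
        if err < best_err:
            best, best_err = (trees, mtry), err
    return best, best_err
```

The oob estimate needs no held-out set: each tree is evaluated only on the samples left out of its bootstrap, which is why a single pass over the grid suffices.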
In misuse detection, with the optimized parameters (15 trees, Mtry = 34), we build
the patterns of intrusions. With the built patterns,
we use the misuse approach to detect intrusions over the test set. The detection
rate is 94.2%, and the false positive rate is 1.1%.
By excluding the detected intrusions from the test set, we generate the anomaly
test set, which is labeled by service. We optimize the parameters for anomaly
detection over the anomaly test set. By building patterns of the services with
different values of Mtry (5, 10, 15, 20, 25, 30, and 34) and different numbers of
trees (10, 15, 20, 25, 30, 35, 40, 45, and 50), we get the oob error rates shown in
Table 5.1 on page 70.
In anomaly detection, with the optimized parameters (35 trees, Mtry = 34), we build
the patterns of the services. With the built patterns, we use the outlier detection
approach to calculate the outlier-ness of the connections in the anomaly test set.
The connections are sorted by outlier-ness in descending order, and the first one
percent of the connections are determined to be intrusions. We choose one percent
as the threshold so that the false positive rate of the anomaly detection will be
below one percent. 30 connections are identified as new intrusions. The overall
detection rate of the hybrid system is 94.7%, and the overall false positive rate
is 2%. The result shows that the anomaly approach detects some intrusions missed
by the misuse approach; however, most of the intrusions missed by the misuse
approach are also missed by the anomaly approach.
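The one-percent cut-off can be sketched as follows (our helper, for illustration):

```python
import numpy as np

def flag_top_percent(outlier_ness, percent=1.0):
    """Sort connections by outlier-ness (descending) and flag the top
    `percent` as intrusions, which caps the anomaly stage's false
    positive rate at roughly that percentage."""
    scores = np.asarray(outlier_ness, dtype=float)
    k = max(1, int(len(scores) * percent / 100))
    flagged = np.zeros(len(scores), dtype=bool)
    flagged[np.argsort(scores)[::-1][:k]] = True
    return flagged
```

Because at most one percent of all connections can be flagged, normal connections can contribute at most one percent of false positives, whatever the score distribution looks like.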
We carried out further analysis of this result. Figure 5.6 on the next page plots the
outlier-ness. There are 411 attacks in the anomaly test set, injected at the beginning
of the dataset.
Figure 5.6: Outlier-ness of the anomaly test set
As shown in the figure, some intrusive connections have much higher outlier-ness
than the normal connections, but most of them have much lower outlier-ness than
the normal ones. The explanation is that these intrusions are very similar to each
other: they very likely fall into the same leaves of the trees built by the random
forests algorithm, so their mutual proximities are high and their outlier-ness is
very low.
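The same-leaf effect can be observed directly: two identical connections traverse every tree identically, so their proximity (the fraction of trees in which they share a leaf) is exactly 1, and neither raises the other's outlier-ness. A small demonstration (our code, illustrative only):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Two duplicated "intrusion" rows among distinct normal rows: the
# duplicates land in the same leaf of every tree, so their pairwise
# proximity is 1, which keeps their outlier-ness low.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 3)), [[9, 9, 9], [9, 9, 9]]])
y = np.array([0] * 40 + [1] * 2)
forest = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
leaves = forest.apply(X)                         # leaf id per (sample, tree)
prox_dupes = (leaves[40] == leaves[41]).mean()   # fraction of shared leaves
```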
5.4.3 Implementation
In this experiment, we use the FORTRAN 77 program [2] developed by Leo Breiman
and Adele Cutler to calculate the variable importance for feature selection. We build
a tool to optimize the parameters of the random forests algorithm. We develop a Java
program to implement our hybrid detection approach using WEKA (the Waikato
Environment for Knowledge Analysis) [5].
5.5 Summary
In this chapter, we propose a new framework for a hybrid system that combines our
misuse detection and anomaly detection approaches. In the framework, misuse
detection is applied first to detect known intrusions in real time. The connections
that are not determined to be intrusions by the misuse detection are then examined
by the off-line anomaly detection approach.
The misuse detection has a high detection rate for known intrusions and a low
false positive rate. The anomaly detection using outlier detection can detect novel
intrusions and performs well when the false positive rate is low. The proposed hybrid
system combines the advantages of these two detection approaches. Besides, the
misuse detection removes known intrusions from the datasets, so the performance of
the anomaly detection is improved by applying the misuse detection first.
The experimental results show that the proposed hybrid approach can achieve a high
detection rate with a low false positive rate, and can detect novel intrusions.
However, intrusions that are very similar to each other cannot be detected by the
anomaly detection. That is a limitation of the outlier detection provided by random
forests.
Chapter 6
Conclusion and future work
6.1 Conclusion
In this thesis, we present our data mining-based approaches for network intrusion
detection. We apply the random forests algorithm in misuse detection, anomaly
detection, and hybrid detection.
To address the problems of rule-based systems, we employ random forests to build
patterns of intrusions. By learning over training data, the random forests algorithm
can build the patterns automatically instead of relying on manually coded rules. In
our misuse detection framework, patterns of intrusions are built in the off-line phase
and can be deployed automatically. The system can then detect intrusions in real
time with the built patterns. Detection speed is critical for real-time NIDSs; our
experiment on speed measurement shows that the system is fast enough to be used
in real-time network environments. To improve the accuracy of the system, we use
the feature selection algorithm and optimize the parameters of the random forests
algorithm. We also use sampling techniques to increase the detection rate of
CHAPTER 6. CONCLUSION AND FUTURE WORK 76
minority intrusions in the framework.
We evaluate the misuse approach over the KDD’99 dataset. The experimental
result shows that the performance of our approach is better than the best KDD’99
result.
Misuse detection cannot detect novel intrusions, so we propose a new approach to
unsupervised anomaly detection. We apply the outlier detection of the random forests
algorithm to anomaly detection: the outliers detected by the algorithm are determined
to be intrusions. Since the random forests algorithm is a supervised data mining
algorithm, it uses labeled training data to build patterns. Therefore, our approach
builds patterns of network services instead of intrusions. With the built patterns of
services, the approach determines the outliers relative to those patterns to be
intrusions. The approach breaks the dependency on attack-free training data, which
is the major problem of supervised anomaly detection. Detecting outliers in large
datasets is time-consuming and requires a large amount of memory. To improve the
performance of detecting outliers, we modify the original outlier detection algorithm
to reduce its computational complexity.
We evaluate the anomaly approach over the different datasets generated from the
KDD’99 dataset. The results confirm that our approach achieves a higher detection
rate when the false positive rate is low, compared to other reported unsupervised
anomaly detection approaches. The results also show that the detection performance
tends to decrease as the number of attack connections in a dataset increases.
Misuse detection has a high detection rate with a low false positive rate. Anomaly
detection can detect novel intrusions. Therefore, combining misuse and anomaly
detection can improve the overall performance of intrusion detection systems. We
propose a new framework, built on the random forests algorithm, that combines
misuse and anomaly detection. In the framework, misuse detection is applied first
to detect known intrusions. By filtering out the intrusions detected by the misuse
approach, the number of intrusions in the dataset can be reduced significantly;
hence, the detection performance of the anomaly approach is improved. The
evaluation experiment on our hybrid approach indicates that the proposed hybrid
framework can achieve a high detection rate with a low false positive rate compared
with other hybrid systems. However, the results also show that the outlier detection
cannot detect intrusions that are very similar to each other.
6.2 Limitations and future work
We apply the outlier detection of the random forests algorithm to anomaly detection.
The technique has two limitations. First, the intrusions in a dataset must be far
fewer than the normal data; the outlier detection only works when the majority of
the data are normal. We could use the misuse detection to filter out known
intrusions. However, this cannot guarantee that the majority of activities are normal
after removing known intrusions. For example, a new type of intrusion may produce
a large number of connections, which cannot be filtered out by the misuse detection.
This could decrease the performance of the anomaly detection and, moreover, may
undermine the hybrid system. Second, intrusions with a high degree of similarity to
each other cannot be detected as outliers by the anomaly detection. To address these
problems, we suggest that other data mining algorithms, such as clustering
algorithms, be investigated in the future.
The random forests algorithm has been successfully applied in different fields,
especially for prediction. It can find patterns that are suitable for prediction in
large volumes of data. Basically, the techniques used in intrusion detection can be
used in intrusion prediction: the inputs for prediction are past data, and the
outputs are future data.
In intrusion prediction, we can predict a specific intrusion based on symptoms.
Some intrusions have symptoms (predictors); for example, IP scan activity is a
predictor of worm propagation. Therefore, we can analyze datasets and intrusions to
predict a certain intrusion using the random forests algorithm. The plans for this
kind of intrusion prediction are as follows:
• Find predictable intrusions by analyzing datasets and intrusions.
• Extract the features that can serve as predictors of intrusions.
• Apply the random forests algorithm effectively to build the prediction patterns.
We also can predict an intrusion in a more general way. For example, we can pre
dict whether intrusions will happen within a certain period. The currently available
datasets are not suitable for this kind of prediction. We need to find some other pre
dictors which are correlated with intrusions, such as the number of the vulnerabilities
on a network. The plans for this kind of intrusion prediction are listed as follows:
• Find predictors related to intrusions.
• Collect the da ta th a t contains the predictors of intrusions.
• A p p ly t h e r a n d o m fo re s ts a lg o r i th m e ffec tiv e ly to b u ild th e p r e d ic t io n p a t te r n s .
Since this kind of data is difficult to collect, the datasets will contain missing values. We therefore need to use the random forests algorithm to handle the missing value problem.
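One simple way to handle this, along the lines of the "rough fill" strategy Breiman suggests for random forests, is to replace each missing numeric value with its column median before training. The data below is synthetic and the 10% missingness rate is an assumption chosen only to exercise the imputation step:

```python
# Minimal sketch of median "rough fill" imputation before training a
# random forest. All data here is synthetic and illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = (X[:, 1] > 0.5).astype(int)

# Simulate hard-to-collect predictor data: knock out 10% of the entries.
mask = rng.random(X.shape) < 0.10
X_missing = X.copy()
X_missing[mask] = np.nan

# Rough fill: column medians computed over the observed values only.
medians = np.nanmedian(X_missing, axis=0)
X_filled = np.where(np.isnan(X_missing), medians, X_missing)

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_filled, y)
```

Breiman's full procedure refines these rough fills iteratively using the forest's proximity measure; the median fill shown here is just the starting point.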