DETECTING NETWORK INTRUSION BASED ON DATA MINING TECHNIQUES AND ITS APPLICATION
FOR MEDICAL SENSOR NETWORK
Thesis submitted in
Partial Fulfillment for the award of
Degree of Doctor of Philosophy in
Computer Science and Engineering
By
G. KARTHIK
FACULTY OF ENGINEERING AND TECHNOLOGY
VINAYAKA MISSIONS UNIVERSITY
(VINAYAKA MISSIONS RESEARCH FOUNDATION – DEEMED TO BE UNIVERSITY) SALEM, TAMILNADU, INDIA
NOVEMBER 2016
VINAYAKA MISSIONS UNIVERSITY
SALEM
DECLARATION
I, G. Karthik, declare that the thesis entitled DETECTING NETWORK
INTRUSION BASED ON DATA MINING TECHNIQUES AND ITS
APPLICATION FOR MEDICAL SENSOR NETWORK submitted by me
for the Degree of Doctor of Philosophy is the record of work carried out
by me during the period from 2009 to 2016 under the guidance of
Dr. A. Nagappan, and has not formed the basis for the award of any
degree, diploma, associate-ship, fellowship, titles in this or any other
University or other similar institutions of higher learning.
Place: Salem
Date: Signature of the Candidate
VINAYAKA MISSIONS UNIVERSITY
SALEM
CERTIFICATE BY THE GUIDE
I, Dr. A. Nagappan, certify that the thesis entitled DETECTING
NETWORK INTRUSION BASED ON DATA MINING TECHNIQUES
AND ITS APPLICATION FOR MEDICAL SENSOR NETWORK
submitted for the Degree of Doctor of Philosophy by Mr. G. Karthik, is
the record of research work carried out by him during the period from
2009 to 2016 under my guidance and supervision and that this work
has not formed the basis for the award of any degree, diploma,
associate-ship, fellowship or other titles in this University or any other
University or Institution of higher learning.
Place: Salem
Date: Signature of the Guide
ACKNOWLEDGEMENT
Let me thank God almighty who has been showering His blessings on
me all these days.
I express my gratitude to our Honorable Founder Chancellor, Vinayaka
Missions University, Dr. A. Shanamugasundaram, Madam Founder
Chancellor Mrs. Annapoorani Shanamugasundaram, Chancellor
Dr. A.S. Ganesan and Pro-Chancellor Dato Sri’ Dr. S. Sharavanan,
for permitting me to do this research at VMKV Engineering College.
First, I would like to thank my supervisor, Dr. A. Nagappan,
Principal, V.M.K.V. Engineering College, for his unstinted support and
guidance. I learned a lot from our discussions, and his positive attitude
and guidance motivated me to work hard. One of his suggestions that
I will always remember is "learn from comments and improve your
work." This simple suggestion is applicable not only to research but
also to other aspects of my life.
My special thanks to our Vice-Presidents Mr. J.S. Sathishkumar and
Mr. N.V. Chandrasekar; Mr. N. Ramsamy, Director; Mr. K. Jaganathan,
Director; Prof. Dr. V.R.R. Rajendran, Vice Chancellor; Dr. Y. Abraham,
Registrar; and Dr. K. Rajendran, Dean (Research), of Vinayaka
Missions University, Salem, and to my colleagues and friends who
have helped me in one way or another in doing this research. Last but
not least, I thank my parents, wife and relations who supported me
day in and day out during the course of my research.
(G. KARTHIK)
ABSTRACT
Intrusion detection is the task of monitoring and, where possible,
preventing attempts to intrude into or otherwise compromise system
and network resources. Intrusion Detection Systems (IDS) are one of
the principal means of identifying abnormal activities staged in a
computer system, and they form a major part of a system's defence
against attacks. The main objectives of this thesis are to study and
analyse different variants of intrusion detection techniques aimed at
improving performance, and to design and develop an efficient
approach to intrusion detection using clustering and hybrid
techniques. The proposed approach is applied to the KDD Cup 99
dataset and evaluated for accuracy. Several existing clustering
techniques, namely K-Means clustering, Fuzzy K-Means clustering,
Fuzzy C-Means and Kernel Fuzzy C-Means (KFCM), are discussed
and implemented. The KDD Cup 99 dataset is used for testing and
evaluation of the proposed technique. The analysis shows that the
proposed Fuzzy Bisector-Kernel Fuzzy C-Means clustering
(FB-KFCM) performs better than the other methods, attaining an
average accuracy of 93.91%. A hybrid intrusion detection system is
then developed by combining Linear Discriminant Analysis (LDA), a
commonly used dimensionality-reduction technique, with Cuckoo
Search (CS). In this system FB-KFCM serves as the clustering
technique, and a Bayesian Neural Network is used for better
classification. The existing techniques, KFCM + Bayesian network and
FB-KFCM + Bayesian network, are compared with the proposed
hybrid technique, LDA+CS + FB-KFCM + Bayesian network, again
using the KDD Cup 99 dataset, and the results are discussed. In the
comparative analysis the proposed hybrid technique attains a high
accuracy of 98.31%, demonstrating its efficiency. Finally, the proposed
algorithm is simulated on a medical sensor network consisting of 8668
records in total. Simulation results are obtained for 10 test records:
8 of the 10 are found not to be intruded and the remaining 2 intruded,
confirming the high accuracy and efficiency of the technique
introduced here.
TABLE OF CONTENTS
Chapter No. Title Page No.
ABSTRACT iii
LIST OF TABLES x
LIST OF FIGURES xi
LIST OF SYMBOLS AND ABBREVIATIONS xiii
1 INTRODUCTION 1
1.1 Motivation 1
1.2 Intrusion Detection System 2
1.2.1 Attack motivation and objectives 7
1.2.2 Types of Intrusion Attack 7
1.2.2.1 DOS Attack 8
1.2.2.2 Probe Attack 9
1.2.2.3 U2R 9
1.2.2.4 R2L 10
1.2.3 Details of some Common Attacks 11
1.3 Why we need IDS? 16
1.3.1 Efficiency of Intrusion Detection Systems 17
1.4 Data mining 19
1.4.1 Data mining Life Cycle 20
1.4.1.1 Define the problem 20
1.4.1.2 Data collection and selection 21
1.4.1.3 Data Preprocessing 22
1.5 Types of Databases 22
1.6 Data Mining Applications 25
1.7 Data Mining in Medical Data 28
1.7.1 Problems in Medical Data 29
1.8 Application to Medical Sensor Network 30
1.9 Objectives of the Thesis 32
1.10 Scope of the Thesis 32
1.11 Organization of the Thesis 33
1.12 Summary 36
2 LITERATURE REVIEW 37
2.1 Intrusion Detection System (IDS) 37
2.1.1 Confidentiality 38
2.1.2 Integrity 38
2.1.3 Availability 39
2.2 Classification of intrusion detection systems 40
2.2.1 Intrusion Detection Approach 41
2.2.1.1 Anomaly-Based Detection 42
2.2.1.2 Signature-Based Detection 43
2.2.2 Types of Protected Systems 43
2.2.2.1 Host Based Intrusion Detection 43
2.2.2.2 Network Based Intrusion Detection 48
2.2.2.3 Hybrid Based Intrusion Detection 61
2.3 Structure of IDS 62
2.3.1 Data Source 62
2.3.2 Behavior of an attacker 63
2.3.3 Analysis Timing 64
2.3.3.1 Audit Trail Processing 64
2.3.3.2 On-Fly Processing 66
2.4 IDS Data Processing Techniques 67
2.4.1 Expert systems 67
2.4.2 Signature analysis 67
2.4.3 Colored Petri Nets 68
2.4.4 State-Transition Analysis 68
2.4.5 Statistical Analysis Approach 69
2.4.6 Neural Networks 69
2.4.7 User Intention Identification 70
2.4.8 Computer Immunology 71
2.5 Data mining Theoretical background 71
2.5.1. Data mining and Knowledge discovery 75
2.5.2. History of data mining 78
2.5.3. Data mining functionality 81
2.6 Evaluation of Datasets 88
2.7 Feature Selection 94
2.8 Summary 101
3 METHODOLOGY & DATABASE 102
3.1 The DARPA Intrusion-Detection Evaluation Program 102
3.2 Attack Types in the 1999 DARPA Data Set 104
3.2.1 Different Attack Types 105
3.2.2 Attack Descriptions 107
3.3 Data-Set Description 110
3.3.1 Set of Features used in the Connection Records 111
3.4 Feature Extractions and Preprocessing 118
3.4.1 Normalization 119
3.5 Performance Evaluation Metrics 120
3.6 Summary 122
4 CLUSTERING BASED INTRUSION DETECTION 123
4.1 Introduction 123
4.2 Need for Clustering of data 123
4.3 Clustering Algorithms 124
4.3.1 K Means Clustering 125
4.3.2 Fuzzy K Means Clustering 127
4.3.3 Fuzzy C-Means 130
4.3.4 KFCM 131
4.3.5 Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM) 133
4.4 Classification Module 138
4.4.1 Neural Network 138
4.4.2 Bayesian Neural Network 140
4.5 Results and Discussions 142
4.6 Summary 150
5 HYBRID INTRUSION DETECTION SYSTEM 151
5.1 Introduction 151
5.2 Need for Hybrid Approach 152
5.3 Application of Hybrid Approach 154
5.4 Locality Preserving Cuckoo search Algorithm 155
5.4.1 Training Phase 157
5.4.1.1 Initialization 158
5.4.1.2 Fitness Calculation and Nest update 159
5.5 Clustering using FB-KFCM 164
5.6 Classification using Bayesian Neural Network 167
5.7 Summary 171
6 RESULTS AND IMPLEMENTATION 173
6.1 Comparative Analysis 180
6.2 Implementation in Medical Sensor Network 181
6.3 Summary 183
7 CONCLUSION 185
7.1 Contributions 186
7.2 Future Works 188
REFERENCES 189
LIST OF PUBLICATIONS 206
LIST OF TABLES
Table No. Title Page No.
Table 3.1 Class Labels that Appear in the Full KDDCUP99 Dataset 108
Table 3.2 Class Labels that Appear in the 10% KDDCUP99 Dataset 109
Table 3.3 KDDCUP99 Basic Features of Individual TCP Connections 110
Table 3.4 Content Features within a Connection Suggested by Domain Knowledge 110
Table 3.5 Traffic Features Computed Using a Two-second Time Window 111
Table 3.6 Traffic Features Computed Using a Hundred-second Connection Window 112
Table 4.1 Accuracy table for Case 8:2 137
Table 4.2 Accuracy table for Case 7:3 138
Table 4.3 Accuracy table for Case 9:1 138
Table 4.4 Average Accuracy Table 140
Table 4.5 Comparative Analysis 143
Table 6.1 Attack Distribution in KDD Full, KDD 10% and KDD Corrected Datasets 166
Table 6.2 Accuracy for 8:2 167
Table 6.3 Accuracy for 7:3 167
Table 6.4 Accuracy for 9:1 168
Table 6.5 Average Accuracy Table 170
LIST OF FIGURES
Figure No. Title Page No.
Figure 1.1 Simple Intrusion Detection System 3
Figure 1.2 Types of intrusion attack 8
Figure 1.3 Data mining life cycle 21
Figure 1.4 Medical Sensor Network architecture 31
Figure 2.1 Intrusion Detection System Classification and Processing 38
Figure 2.2 Behavior of the user in the system 59
Figure 2.3 KDD process model 72
Figure 2.4 Data Mining and Associated Fields 73
Figure 2.5 Data mining functionalities 77
Figure 2.6 Classification using decision tree 80
Figure 2.7 Clustering 81
Figure 2.8 Outlier Analysis 82
Figure 4.1 Input mono dimensional data 122
Figure 4.2 Clustered using k means 123
Figure 4.3 Clustered Using Fuzzy K Means 123
Figure 4.4 Illustration of FB-KFCM clustering technique 128
Figure 4.5 Block diagram of the Neural Network 133
Figure 4.6 Accuracy Plot for Case 8:2 139
Figure 4.7 Accuracy Plot for Case 7:3 139
Figure 4.8 Accuracy Plot for Case 9:1 140
Figure 4.9 Average Accuracy Plot 141
Figure 4.10 Accuracy plot for Comparative Analysis 143
Figure 5.1 Proposed Intrusion Detection System 147
Figure 5.2 Fixed Nest 149
Figure 5.3 Nest formation from original dataset 150
Figure 5.4 LDA-CS Flow Diagram 154
Figure 5.5 FB-KFCM 158
Figure 5.6 Bayesian Neural Network Classifier(BNNC) 160
Figure 6.1 Accuracy Plot for Case 8:2 168
Figure 6.2 Accuracy Plot for Case 7:3 169
Figure 6.3 Accuracy Plot for Case 9:1 169
Figure 6.4 Average Accuracy Plot 170
Figure 6.5 Simulation Result Obtained for time T1, T2, T3,T4 173
LIST OF SYMBOLS AND ABBREVIATIONS
ADC Approximate Distance Clustering
AFRL Air Force’s Research Laboratory
ARIS Attack Registry and Intelligence Service
BN Bayesian Network
BNNC Bayesian Neural Network Classifier
C.I.A Confidentiality Integrity and Availability
CID Consensus Intrusion Database
CS Cuckoo Search
DAG Directed Acyclic Graph
DARPA Defense Advanced Research Projects Agency
DB Distance Based
DCost Damage Cost
DIDS Distributed Intrusion Detection System
DL Description Length
DLCF Dynamic Learning Classifier Framework
DM Data Mining
DOS Denial of Service
DR Detection Rate
e-kNN Extension to k-Nearest Neighbour
FAR Failure Analysis Rate
FB-KFCM Fuzzy Bisector-Kernel Fuzzy C-means clustering
FNR False Negative Rate
FP False Positive
GrIDS Graph based IDS
HIDS Host based Intrusion Detection System
HMM Hidden Markov Model
HYBRID IDS Hybrid Intrusion Detection System
ID Intrusion Detection
ID3 Induction Decision version 3
IDES Intrusion Detection Expert System
IDIOT Intrusion Detection In Our Time
IDS Intrusion Detection System
IDT Induction Decision Tree
IES Information Exploration Shootout
ISC Internet Storm Centre
ISOA Information Security Officer’s Assistant
ISS Internet Security Systems
KDD Knowledge Discovery in Databases
KDDCUP'99 Knowledge Discovery in Databases Dataset 1999
KFCM Kernel Fuzzy C-means clustering
kNN k-Nearest Neighbour
kRD k-Relative Distance
LDA Linear Discriminant Analysis
LVQ Learning vector quantization
MFDO Multistage Framework to Detect Outliers
ML Machine Learning
MSE Mean Square Error
NFR Network Flight Recorder
NID Network Intrusion Detection
NIDES Network Intrusion Detection Expert System
NIDS Network based Intrusion Detection System
NRBC New Rule Based Classification
PC Probabilistic Cardinality
R2L Remote to Local
RCost Response Cost
ROC Receiver Operating Characteristic
RS Rule set
SCACC Storm Centre Analysis and Coordination Centre
SRSWR Simple Random Sample With Replacement
TCP Transmission Control Protocol
TN True Negative
TP True Positive
TPR True Positive Rate
U2R User to Root
XML Extensible Markup Language
CHAPTER 1
INTRODUCTION
1.1 Motivation
Due to the popularization of the Internet and local networks,
intrusions into computer systems are growing in number [150]. Because
of increased network connectivity, computer systems are becoming
increasingly vulnerable to attack. The general goal of such attacks is to
subvert the traditional security mechanisms on the systems and
execute operations in excess of the intruder's authorization. These
operations could include reading protected or private data or simply
doing malicious damage to the system or user files [110]. By building
complex tools, which continually monitor and report activities, a system
security operator can catch potentially malicious activities as they
occur. Intrusion detection systems are becoming increasingly important
in maintaining proper network security [5, 29 and 150].
A good intrusion detection system should be able to distinguish
between normal and abnormal user activities. This includes any
event, state, content, or behaviour that is considered abnormal by
a pre-defined standard [52]. It is very important for IDSs to generate
rules that distinguish normal behaviour from abnormal behaviour by
observing the dataset, that is, the record of activities generated by the
operating system and logged to a file in chronological order [46].
Intrusion detection has received a lot of interest among
researchers due to the rapid development and popularization of the
Internet and local networks. A good intrusion detection system should
be able to differentiate between normal and abnormal user activities,
and it is very important to generate rules that distinguish normal
behaviour from abnormal behaviour. Though a lot of techniques and
tools are available, more research is needed to develop good systems
for intrusion detection.
1.2 Intrusion Detection System
An intrusion detection system acquires information about an
information system in order to perform a diagnosis of the security
status of the latter. The goal is to discover breaches of security,
attempted breaches, or open vulnerabilities that could lead to potential
breaches. A typical intrusion detection system is shown in Figure 1.1.
An intrusion-detection system can be described at a very
macroscopic level as a detector that processes information coming
from the system to be protected. This detector can also launch probes
to trigger the audit process, such as requesting version numbers for
applications. It uses three kinds of information: long-term information
related to the technique used to detect intrusions (a knowledge base of
attacks, for example), configuration information about the current state
of the system, and audit information describing the events that are
happening on the system.
Figure 1.1 Simple Intrusion Detection System
The role of the detector is to eliminate unneeded information
from the audit trail. It then presents either a synthetic view of the
security-related actions taken during normal usage of the system, or a
synthetic view of the current security state of the system. A decision is
then taken to evaluate the probability that these actions or this state
can be considered symptoms of an intrusion or of vulnerabilities. A
countermeasure component can then take corrective action, either
preventing the actions from being executed or changing the state of
the system back to a secure state.
Intrusion Detection Systems (IDSs) are usually deployed along
with other preventive security mechanisms, such as access control and
authentication, as a second line of defense that protects information
systems. There are several reasons that make intrusion detection a
necessary part of the entire defense system. First, many traditional
systems and applications were developed without security in mind. In
other cases, systems and applications were developed to work in a
different environment and may become vulnerable when deployed in
the current one. Intrusion detection complements these protective
mechanisms to improve system security. Moreover, even if the
preventive security mechanisms can protect information systems
successfully, it is still desirable to know what intrusions have happened
or are happening, so that we can understand the security threats and
risks and thus be better prepared for future attacks.
An attack can be launched as a fast attack or a slow attack.
A fast attack can be defined as an attack that uses a large number of
packets or connections within a few seconds [43]. A slow attack, in
contrast, can be defined as an attack that takes a few minutes or even
a few hours to complete [43]. Both kinds of attack have a great impact
on the network environment because of the security breaches they
cause. Currently, IDS is used as one of the defensive tools for
strengthening network security, especially for detecting the first two
phases of an attack, whether slow or fast. An intrusion detection
system can follow one of two approaches: behaviour based (anomaly)
or knowledge based (misuse) [26], [19]. The behaviour based
approach is also known as an anomaly based system, while the
knowledge based approach is known as a misuse based system
[151], [45]. A misuse or signature based IDS contains a number of
attack descriptions, or signatures, that are matched against a stream
of audit data in search of evidence of a modelled attack [19]. The audit
data can be gathered from network traffic or an application log. This
method can be used to detect previously known attacks, and the
attack profiles have to be manually revised when new attack types are
discovered. Hence, unknown attacks with new intrusion patterns and
characteristics might not be captured using this technique [125].
Meanwhile, the anomaly based system identifies intrusions by
modelling the traffic or application activity that is presumed to be
normal on the network or host.
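The signature matching described above can be sketched in a few lines: each signature is a set of conditions on audit-record fields, and a record that satisfies every condition of any known signature raises an alert. The record fields and the two signatures below are illustrative assumptions, not taken from any real IDS.

```python
# Minimal sketch of signature-based (misuse) detection: a signature is a set
# of field conditions that must all hold for an audit record to match.
def matches(record, signature):
    """Return True if every field condition in the signature holds."""
    return all(record.get(field) == value for field, value in signature.items())

def detect_misuse(records, signatures):
    """Return (record, signature_name) pairs for records matching a known signature."""
    alerts = []
    for record in records:
        for name, signature in signatures.items():
            if matches(record, signature):
                alerts.append((record, name))
    return alerts

# Hypothetical signatures for two attacks discussed later in this chapter.
SIGNATURES = {
    "land":    {"src_ip_equals_dst_ip": True, "flag": "SYN"},
    "neptune": {"flag": "S0", "service": "private"},
}

records = [
    {"src_ip_equals_dst_ip": True, "flag": "SYN"},  # land-like record
    {"flag": "SF", "service": "http"},              # normal connection
]
alerts = detect_misuse(records, SIGNATURES)
```

As the text notes, this style detects only attacks whose signatures are already in the knowledge base; new attack types require manually adding new signatures.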
The anomaly based system builds a model of the normal
behaviour of the system and then looks for anomalous activity, that is,
activity that does not conform to the established model. Anything that
does not correspond to the system profile is flagged as intrusive. False
alarms generated by both systems are a major concern; they have
been identified as a key issue and a cause of delay in the wider
deployment of reactive intrusion detection systems [78].
Therefore, it is important to reduce the false alarms generated
by both systems. Although false alarms are a major concern in
developing intrusion detection systems, especially anomaly based
ones, such systems meet organizations' objectives more fully than
signature based systems [50]. The false positives generated by an
anomaly based system, where expected behaviour is identified as
anomalous, are still tolerable, whereas false negatives are intolerable
because they allow attacks to go undetected. Scanning attacks, DOS
attacks and worm attacks, which use a large number of packets or
connections within a few seconds, are examples of fast attacks. The
Code Red and NIMDA worms are a breed of DOS attacks on the
Internet infrastructure that followed the Morris Worm; the Code Red
worm has a fast rate of propagation and infection, using network
scanning to detect and automatically exploit vulnerable hosts.
1.2.1 Attack motivation and objectives
An intrusion attack [3] is the realization of a threat: a harmful
action aimed at exploiting a vulnerability of the target system.
Computer attacks may involve unauthorized access, destroying data,
threatening the security of the computer or degrading its performance.
Computer and network attacks have evolved greatly over the last few
decades; they are increasing in number and also improving in strength
and sophistication.
Attack motivation can be understood by identifying what
attackers do. The main motivation of an attacker is to gain access to a
system or its data; the main motivation of a criminal is financial
benefit. Other motivating factors are social and political gain, and
mischievous human tendency also motivates attacks. The potential
threat of cyber terrorism is becoming inevitable because critical
infrastructures are potentially vulnerable [77] [84], and the growth of
networks makes attacks easy to launch.
1.2.2 Types of Intrusion Attack
Intrusion attacks [72] [99] can be categorized into four major types:
DOS, Probe, U2R and R2L. Figure 1.2 shows the types of attacks.
1.2.2.1 DOS Attack
In a denial of service (DOS) [77] attack, an attacker makes a
resource on a network unavailable to legitimate users. DOS attacks
keep system processes busy and occupied with unwanted,
unidentified work, targeting resources such as network bandwidth,
computer memory or computing power. There are many different types
of DOS attack; for example, an attack can deny access to a machine
on a network. DOS attacks [146] [148] are meant to force the target to
stop the service(s) it provides by flooding it with illegitimate requests.
Figure 1.2 Types of intrusion attack
1.2.2.2 Probe Attack
Probe attacks [84] are often the first step of all other attacks.
They are used to collect information about the targeted computer
network or about a specific machine on it. Network probes are
important to an attacker because only through them can the attacker
find the vulnerabilities present on the target machine or network; that
is why it is critical to detect this type of attack. Most administrators use
probes to check machines on a network, so it is difficult to tell a
legitimate user from an attacker, and hence to distinguish attacks from
regular actions. Probe attacks are meant to obtain information about
the target network from a source that is usually external to it. Probing
is an attack in which the hacker scans a machine or a networking
device in order to determine weaknesses or vulnerabilities that may
later be exploited to compromise the system.
1.2.2.3 U2R
U2R [84] attacks are difficult to catch because they involve
semantic details that are very difficult to capture at an early stage.
Initially the attacker starts on the system with a normal user account
and then tries to gain super-user privileges by abusing vulnerabilities.
In a User to Root attack, an attacker starts a session on a computer as
a normal user with restricted rights and, by exploiting some
vulnerability in the software installed on the system, raises his
privileges. The purpose of this class of attack is obviously to obtain
administrator rights on the attacked computer in order to have full
control over it. There are several different types of U2R attack; buffer
overflow is undoubtedly the major vulnerability used by hackers trying
to obtain privileged rights on a computer.
1.2.2.4 R2L
The most challenging attacks are R2L attacks [77]; they are very
difficult to detect because they involve both network-level and host-
level features. A remote to user attack is one in which a user sends
packets over the Internet to a machine to which the attacker does not
have access, in order to expose the machine's vulnerabilities and
exploit the privileges a local user would have on that computer. In a
Remote to Local attack, the attacker starts from a session on a
computer outside the targeted network and exploits a vulnerability in
order to gain access to a computer on the local network. A
precondition that must be fulfilled is the ability of the attacker to send
network packets to the victim host. Usually, but not always, Remote to
Local attacks are combined with U2R attacks, permitting the attacker
to gain full access to a remote machine that is part of a network other
than the attacker's own.
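These four categories correspond to the attack labels of the KDD Cup 99 dataset used later in this thesis; a small lookup table makes the grouping explicit. The label set below follows the standard KDD Cup 99 task description.

```python
# Mapping of KDD Cup 99 attack labels to the four categories described above.
ATTACK_CATEGORY = {
    # Denial of Service
    "back": "dos", "land": "dos", "neptune": "dos",
    "pod": "dos", "smurf": "dos", "teardrop": "dos",
    # Probe
    "ipsweep": "probe", "nmap": "probe", "portsweep": "probe", "satan": "probe",
    # User to Root
    "buffer_overflow": "u2r", "loadmodule": "u2r", "perl": "u2r", "rootkit": "u2r",
    # Remote to Local
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l", "multihop": "r2l",
    "phf": "r2l", "spy": "r2l", "warezclient": "r2l", "warezmaster": "r2l",
}

def categorize(label):
    """Map a KDD connection label to dos/probe/u2r/r2l; anything else is normal."""
    return ATTACK_CATEGORY.get(label, "normal")
```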
1.2.3 Details of some Common Attacks
• Back - This attack is initiated against an Apache web server,
which is flooded with requests containing a large number of
front-slash (/) characters in the URL. As the server tries to
process all these requests, it becomes unable to process other
genuine requests and hence denies service to its customers.
• Smurf Attack - A 'smurf' attack is a type of DOS attack in which
many ICMP echo-reply packets bombard the attacked machine.
The attacker sends many ICMP echo-request packets to the
broadcast addresses of many subnets, and every machine that
belongs to any of these subnets responds by sending ICMP
echo-reply packets to the victim, since the request packets carry
the victim's address as the source IP address. Smurf attacks are
very hazardous, because they are strongly distributed attacks.
• Teardrop - A packet is often broken into smaller fragments
while travelling from the source machine to the destination
machine. A Teardrop attack creates a stream of IP fragments
with overlapping offset fields. The destination host that tries to
reassemble these malformed fragments eventually crashes or
reboots.
• Land - Land, a very common DOS (Denial of Service) attack,
works by sending a spoofed packet with the SYN flag – used in
the 'handshake' between a client and a host – set, to any port
that is open and listening. If the packet is programmed to have
the same destination and source IP address, then when it is
sent to a machine via IP spoofing, the transmission can fool the
machine into thinking that it is sending itself a message, which,
depending on the operating system, will crash the machine.
• Neptune (SYN Flood) - Neptune (SYN Flood) is an attack to
which every TCP/IP implementation is vulnerable. Each
half-open TCP connection made to a machine causes the 'tcpd'
server to add a record to the data structure that stores
information describing all pending connections. This data
structure is of finite size, and it can be made to overflow by
intentionally creating too many partially-open connections. The
half-open connection data structure on the victim server will
eventually fill, and the system will be unable to accept any new
incoming connections until the table is emptied out.
• Ping of Death (POD) - The Ping of Death is a DOS attack in
which the attacker creates a packet larger than the IP protocol
limit of 65,535 bytes. Such a packet can cause different kinds of
damage, such as rebooting or crashing, to the machine that
receives it.
• Port sweep - A port sweep attack scans multiple hosts for one
listening port; for example, port 80 may be scanned for all the
addresses in a 24-bit address space. A port sweep searches for
a specific service; an SQL-based computer worm, for instance,
may port sweep looking for hosts listening on a particular TCP
port.
• NMAP - Nmap is a type of port scanner. Nmap has a large list
of parameters and performs the following:
Host discovery – identifying hosts on a network, for
example listing the hosts that respond to pings or have a
particular port open.
Port scanning – enumerating the open ports on target
hosts.
Version detection – interrogating network services on
remote devices to determine application name and
version number.
OS detection – determining the operating system and
hardware characteristics of network devices.
Scriptable interaction with the target – using the Nmap
Scripting Engine (NSE) and the Lua programming language.
Nmap can provide further information on targets, including
reverse DNS names, device types, and MAC addresses.
• SATAN - SATAN (Security Administrator Tool for Analyzing
Networks) remotely probes systems through the network and
stores its findings in a database. SATAN is a publicly available
tool that probes a network for security vulnerabilities and
misconfigurations. It was created for use by administrators but is
often used by attackers to search for vulnerabilities on a network,
and the information it provides can be useful to an attacker in
performing an attack. The Internet community uses a shareware
version of SATAN extensively. SATAN collects data from the
named hosts that it discovers while probing a primary host; a
primary target can be a host name, a host address, or a network
number. SATAN can generate reports of hosts by type, service,
vulnerability and trust relationship, and it also gives details of
vulnerabilities and ways to handle and remove them.
• phf Attack - This attack abuses a script named 'phf'. The
legitimate use of the phf script, which is installed by default in
the cgi-bin directory, is to update the people directory, but it has
often been used to attack web servers. The script's behavior
changes if it is called with the '0a' character in the URL. To
perform an attack, the attacker appends '0a' to the URL along
with some other UNIX command.
• Buffer overflows - There were four buffer overflow attacks,
against the eject, fdformat, ffbconfig, and ps programmes. The
attacks on the first three programmes exploited a buffer overflow
condition to execute a shell with root privileges. The specification
used to monitor setuid-to-root programmes could easily detect
these attacks by detecting oversized arguments and the
execution of a shell. The ps attack was significantly more
complex than the other three buffer overflow attacks. For one
thing, it used a buffer overflow in the static area rather than the
more common stack buffer overflow, making it difficult to detect.
Second, instead of a shell program it used a chmod system call
to effect damage; the chmod operation is itself unusual and is
not permitted by the generic specification (except on certain
files).
• Ftp-write attack - The ftp-write attack is an R2L (remote to
local) attack that takes advantage of a common anonymous ftp
misconfiguration. The ftp directory and its subdirectories should
not be owned by the ftp account or be in the same group as the
ftp account. If any of these directories are owned by ftp, or are in
the same group as the ftp account and are not write protected,
an intruder will be able to add files and eventually gain local
access to the system. This attack is easy to detect under a
site-specific policy that no file may be written in the ftp directory.
• Warez attacks - There are two types of warez attacks: warez master and warez client. The warez master attack logs into an anonymous FTP site and creates a file or a hidden directory. In the warez client attack, the files previously uploaded by the warez master are downloaded. This attack could be easily captured by the specifications that encoded the site-specific policy of disallowing any writes to the FTP directory.
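The phf attack described above lends itself to a simple signature check. The sketch below is illustrative only: the URL pattern and function name are my own, not taken from the thesis or from any real IDS rule set. It flags CGI requests that invoke the phf script with an encoded newline:

```python
import re

# Hypothetical signature for the phf attack: a request to the phf CGI
# script containing an encoded newline ('%0a'), which the attack uses
# to smuggle a UNIX command into the URL.
PHF_PATTERN = re.compile(r"/cgi-bin/phf.*%0a", re.IGNORECASE)

def looks_like_phf_attack(url: str) -> bool:
    """Return True if the URL matches the phf '%0a' attack signature."""
    return bool(PHF_PATTERN.search(url))

if __name__ == "__main__":
    benign = "/cgi-bin/phf?Qalias=guest"
    hostile = "/cgi-bin/phf?Qalias=x%0a/bin/cat%20/etc/passwd"
    print(looks_like_phf_attack(benign))   # False
    print(looks_like_phf_attack(hostile))  # True
```

This is exactly the kind of string-level signature that a misuse-detection IDS would match against observed web requests.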
1.3 Why do we need an IDS?
To answer this question, we need to understand why intruders
can get into the system.
There are various reasons of which the prominent ones are:
• Software bugs – they can be buffer overflows, unexpected
combinations, unhandled inputs, race conditions etc. Software
has bugs because programmers cannot track down and
eliminate all possible holes.
• Password Cracking – hackers have over time developed numerous ways to break into systems, by guessing weak passwords or by mounting dictionary and brute-force attacks.
• Design flaws – many early systems were never designed to withstand the wide-scale intrusion attempts seen today. These include TCP/IP protocol flaws, operating system flaws, etc.
• Sniffing unsecured traffic – much traffic on the Internet is not encrypted. Hackers can use programs such as packet sniffers and port scanners to extract sensitive information from packets on the network.
A firewall cannot always handle attacks directed to exploit these
flaws. Hence, we require IDS which can logically complement the
firewall.
1.3.1 Efficiency of Intrusion Detection Systems
To evaluate the efficiency of an intrusion-detection system,
Porras and Valdes [116] have proposed the following parameters:
• Accuracy - Accuracy deals with the proper detection of attacks
and the absence of false alarms. Inaccuracy occurs when an
intrusion detection system flags a legitimate action in the
environment as anomalous or intrusive.
• Performance - The performance of an intrusion-detection
system is the rate at which audit events are processed. If the
performance of the intrusion-detection system is poor, then real-
time detection is not possible.
• Completeness - Completeness is the property of an intrusion-
detection system to detect all attacks. Incompleteness occurs
when the intrusion-detection system fails to detect an attack.
This measure is much more difficult to evaluate than the others
because it is impossible to have a global knowledge about
attacks or abuses of privileges.
• Fault Tolerance - An intrusion-detection system should itself be
resistant to attacks, especially denial-of-service type attacks, and
should be designed with this goal in mind. This is particularly
important because most intrusion-detection systems run above
commercially available operating systems or hardware, which
are known to be vulnerable to attacks.
• Timeliness - An intrusion-detection system has to perform and
propagate its analysis as quickly as possible to enable the
security officer to react before much damage has been done,
and also to prevent the attacker from subverting the audit source
or the intrusion-detection system itself. This implies more than
the measure of performance because it not only encompasses
the intrinsic processing speed of the intrusion-detection system,
but also the time required to propagate the information and react
to it.
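Several of the parameters above can be quantified from a confusion matrix of detection outcomes. The sketch below is my own illustration (the function and field names are not from Porras and Valdes); it derives accuracy, detection rate, and false-alarm rate from the four outcome counts:

```python
def ids_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common IDS evaluation measures from a confusion matrix.

    tp: attacks correctly flagged, fp: normal events flagged as attacks,
    tn: normal events correctly passed, fn: attacks missed.
    """
    total = tp + fp + tn + fn
    return {
        # Accuracy: proper detection of attacks and absence of false alarms.
        "accuracy": (tp + tn) / total,
        # Detection rate relates to completeness: the share of attacks caught.
        "detection_rate": tp / (tp + fn),
        # False alarm rate: legitimate actions flagged as intrusive.
        "false_alarm_rate": fp / (fp + tn),
    }

if __name__ == "__main__":
    m = ids_metrics(tp=90, fp=5, tn=95, fn=10)
    print(m)  # accuracy 0.925, detection_rate 0.9, false_alarm_rate 0.05
```

A missed attack (fn) degrades completeness, while a false alarm (fp) degrades accuracy; the two rates trade off against each other when a detection threshold is tuned.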
1.4 Data mining
Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions [62]. Mining the hidden and useful information in an available dataset is not a single step; it is a sequence of interlinked steps that together yield information useful for decision making. Data mining searches databases to find hidden patterns and to predict information that can improve an organization’s business.
Data mining is the non-trivial extraction of implicit, previously unknown, interesting and potentially useful information from data. Nowadays, hospitals and health care institutions are well equipped with monitoring and other data collection devices, and the collected data are shared with other hospital information systems. Previously separate hospital databases and information systems are now integrated into large-scale information systems. The increase in data volume causes difficulties in extracting useful information for decision support. During medical diagnosis, data mining can extract useful information from large collections of patient data, which serves as a valuable resource for the decision-making process.
Classification, clustering, prediction, association, rule extraction and sequence detection are the various types of problems we can solve through data mining. The techniques used in data mining are drawn from different fields such as statistics, machine learning and pattern recognition. They include statistical methods, case-based reasoning, neural networks, decision trees, rule induction, Bayesian networks, fuzzy sets, rough sets and genetic algorithms.
1.4.1 Data mining Life Cycle
Solving a data mining problem involves the following steps [9]: defining the problem, collecting and selecting the data, pre-processing the data, selecting an appropriate data mining method, training and testing the selected model, and finally integrating and evaluating the generated model. The cycle is represented as a diagram in Fig. 1.3.
1.4.1.1 Define the problem
To have a successful data mining application, the organization has to come up with a precise formulation of the problem it is trying to solve. A focused problem statement usually yields the best payoff.
Figure 1.3. Data mining life cycle [9]
1.4.1.2 Data collection and selection
The organization has to use the right data for mining. The data collection and selection step identifies the related data sources and acquires them; from the collected data, the selection process chooses the subset of data to mine.
[Figure 1.3 depicts the cycle: 1. Define the problem; 2. Collect/select data; 3. Data pre-processing; 4. Model selection; 5. Training/testing the model; 6. Final evaluation/integration of the model; with iteration back through the steps.]
1.4.1.3 Data Pre-processing
• Data cleaning - Fills in missing data, corrects invalid data, identifies outliers and removes inconsistencies in the data source.
• Data integration - Combines data from different data sources into a single mining database.
• Data transformation - Converts the source data into a common format for processing.
• Data reduction - Discards unwanted parameters from the data, so that the data volume is reduced without sacrificing the quality of the information.
• Data discretization - A part of the data reduction process; it replaces numerical attributes with nominal attributes.
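The cleaning, transformation and discretization steps above can be sketched on a toy record set. The ‘duration’ field, the fill strategy and the discretization threshold are all hypothetical, chosen only for illustration:

```python
from statistics import mean

def preprocess(records):
    """Sketch of three pre-processing steps on a list of feature dicts."""
    # Data cleaning: fill missing values with the column mean.
    known = [r["duration"] for r in records if r["duration"] is not None]
    fill = mean(known)
    for r in records:
        if r["duration"] is None:
            r["duration"] = fill

    # Data transformation: min-max scale into a common [0, 1] range.
    lo = min(r["duration"] for r in records)
    hi = max(r["duration"] for r in records)
    for r in records:
        r["scaled"] = (r["duration"] - lo) / (hi - lo)

    # Data discretization: replace the numeric attribute with a nominal one.
    for r in records:
        r["duration_class"] = "short" if r["scaled"] < 0.5 else "long"
    return records

if __name__ == "__main__":
    data = [{"duration": 2.0}, {"duration": None}, {"duration": 10.0}]
    print(preprocess(data))
```

The missing value is replaced by the mean (6.0), all values are scaled to [0, 1], and the numeric attribute is then collapsed to the nominal labels "short"/"long".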
1.5 TYPES OF DATABASES
Data mining is not specific to any kind of data. Zaiane [44] claims
that data mining should be applicable to any kind of information
repository. But the challenges of mining posed by different kinds of
data vary significantly.
• Flat Files - Flat files containing text or binary data are the most
obvious candidates for data mining. Mining of text data is
referred to as text mining. It generally entails analyzing a large
volume of textual data to ascertain correlations or other patterns.
In the domain of software engineering, the mining of source code
is generally performed using text mining techniques. Software
requirement specifications and test case documents containing
textual information are attractive candidates for text mining.
• Relational Databases - Relational databases containing
information structured as tables where each row is termed as a
tuple and each column as an attribute provide excellent support
for several data mining algorithms. Data mining algorithms that
target relational databases are more versatile than those for flat
files [44]. Structured Query Language (SQL) is the standard
language for accessing relational databases, and data mining
algorithms can also leverage the capabilities of SQL for data
transformation and consolidation.
• Data Warehouses - Data warehouses are structured
repositories of data from multiple, heterogeneous sources. Data
warehouses facilitate analysis of data from different dimensions.
A data cube facilitates analysis of data along multiple dimensions
and each cell usually contains the value of some aggregate
measure. As Zaiane [44] states, because of their structure, the
pre-computed summarized data they contain and the hierarchical
attribute values of their dimensions, data cubes are well-suited
for fast interactive querying and analysis of data at different
conceptual levels, known as On-Line Analytical Processing
(OLAP). OLAP operations allow the navigation of data at
different levels of abstraction, such as drill-down, roll-up, slice,
dice etc.
• Transaction Databases - A transaction database contains
information pertaining to day-to-day transactions including a time
stamp, identifier and the associated items. Transaction
information is generally stored in flat files or in two normalized
relational tables - one containing the transactions and the other
containing the transaction items. A typical example for the
scenario is the market-basket analysis that attempts to track
transactions that occur together or in a sequence.
• Multimedia Databases - Mining of multimedia data such as
audio, video and graphics stored on a flat file or object-oriented
or object-relational databases is even more challenging due to
the high dimensionality of the involved data. This may entail
application of techniques from computer vision and computer
graphics.
• Spatial Databases - A spatial database stores a large amount of
space-related data, such as maps, pre-processed remote
sensing or medical imaging data. They carry topological and
distance information, usually organized by sophisticated,
multidimensional spatial indexing structures.
• Time-Series Databases - Time-series databases containing
time-related information like market share prices have a
continuous flow of data feeds that presents novel challenges,
and the mining of these databases entails evolution analysis and
trend prediction.
• World Wide Web - The World Wide Web (WWW) is a huge repository of information, and its mining is commonly classified into Web Content Mining, which encompasses the documents; Web Structure Mining, which focuses on the hyperlinks and relationships between documents; and Web Usage Mining, which focuses on the usage patterns of web pages. Web mining can greatly enhance the usability of the WWW.
1.6 Data Mining Applications
• Medical Data Mining - Over the past decade, nudged by new federal regulations, hospitals and medical offices around the country have been converting scribbled doctors’ notes to electronic records, although the chief goal has been to improve efficiency and cut costs [106].
• Spatial Data Mining - Spatial data mining is the application of
data mining methods to spatial data. The end objective of spatial
data mining is to find patterns in data with respect to geography.
So far, data mining and Geographic Information Systems (GIS)
have existed as two separate technologies, each with its own
methods, traditions, and approaches to visualization and data
analysis. Particularly, most contemporary GIS have only very
basic spatial analysis functionality. The immense explosion in
geographically referenced data occasioned by developments in
IT, digital mapping, remote sensing, and the global diffusion of
GIS emphasizes the importance of developing data driven
inductive approaches to geographical analysis and modelling
[59,57].
• Sensor Data Mining - Wireless sensor networks can be used to facilitate the collection of data for spatial data mining in a variety of applications, such as air pollution monitoring. A characteristic of such networks is that nearby sensor nodes monitoring an environmental feature typically register similar values. This kind of data redundancy, due to the spatial correlation between sensor observations, inspires techniques for in-network data aggregation and mining. By measuring the spatial correlation between data sampled by different sensors, a wide class of specialized, more efficient spatial data mining algorithms can be developed [83].
• Visual Data Mining - In the transition from analogue to digital, large data sets have been generated, collected, and stored; discovering the statistical patterns, trends and information hidden in these data makes it possible to build predictive models. Studies suggest visual data mining is faster and much more intuitive than traditional data mining [114].
• Music Data Mining - Data mining techniques, and in particular co-occurrence analysis, have been used to discover relevant similarities among music corpora (radio lists, CD databases), for purposes including classifying music into genres in a more objective manner [54].
• Pattern Mining - "Pattern mining" is a data mining method that involves finding existing patterns in data. In this context, patterns often mean association rules. The original motivation for searching for association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behaviour in terms of the purchased products. For example, an association rule "beer ⇒ potato chips (80%)" states that four out of five customers that bought beer also bought potato chips. In the context of pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity; these patterns might be regarded as small signals in a large ocean of noise" [107][65]. Pattern mining includes new areas such as Music Information Retrieval (MIR), where patterns seen in both the temporal and non-temporal domains are imported into classical knowledge discovery search methods.
• Subject-based Data Mining - "Subject based data mining" is a
data mining method involving the search for associations
between individuals in data. In the context of combating
terrorism, the National Research Council provides the following
definition: "Subject-based data mining uses an initiating
individual or other datum that is considered, based on other
information, to be of high interest, and the goal is to determine
what other persons or financial transactions or movements, etc.,
are related to that initiating datum" [4].
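The support/confidence computation behind association rules like "beer ⇒ potato chips (80%)" can be sketched in a few lines. The basket data and the single-item rule form below are illustrative simplifications; a real miner such as Apriori would also enforce a minimum support and handle multi-item antecedents:

```python
from itertools import combinations

def association_rules(transactions, min_conf=0.5):
    """Minimal single-item association-rule sketch.

    For each ordered pair (lhs, rhs):
      confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs).
    """
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            both = sum(1 for t in transactions if lhs in t and rhs in t)
            lhs_count = sum(1 for t in transactions if lhs in t)
            if lhs_count and both / lhs_count >= min_conf:
                rules.append((lhs, rhs, both / lhs_count, both / n))
    return rules

if __name__ == "__main__":
    baskets = [{"beer", "chips"}, {"beer", "chips"}, {"beer", "chips"},
               {"beer", "chips"}, {"beer"}, {"chips", "soda"}]
    for lhs, rhs, conf, supp in association_rules(baskets, min_conf=0.8):
        print(f"{lhs} -> {rhs} (conf={conf:.2f}, supp={supp:.2f})")
```

With this toy data, four of the five beer transactions also contain chips, so the rule beer → chips is reported with confidence 0.8, matching the "four out of five customers" reading of the example above.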
1.7 Data Mining in Medical Data
Modern medicine generates large amounts of information that are stored in medical databases. Extracting useful knowledge from these databases and providing scientific support for decision making in the diagnosis and treatment of disease is increasingly necessary, and data mining in medicine can address this problem. It can also improve the quality of hospital information management and promote the development of telemedicine and community medicine. Because medical information is characterized by redundancy, multi-attribution, incompleteness and a close relation to time, medical data mining differs from mining in other domains. This thesis discusses the key techniques of medical data mining, involving pre-treatment of medical data, fusion of different patterns and resources, fast and robust mining algorithms, and the reliability of mining results. Methods and applications of medical data mining based on computational intelligence, such as artificial neural networks, fuzzy systems, evolutionary algorithms, rough sets, and association rules, have been introduced [153][106].
1.7.1 Problems in Medical Data
The extensive amounts of knowledge and data stored in medical databases require specialized tools for data access, data analysis, knowledge discovery and the effective use of stored knowledge and data, because the increase in data volume makes it difficult to extract useful information for decision support.
Traditional manual data analysis has become insufficient. Important issues arising from this rapidly growing body of data and information include the provision of standards in terminology, vocabularies and formats to support multilinguality and the sharing of data, as well as:
• Standards for the abstraction and visualization of data.
• Integration of heterogeneous types of data, including images, signals, etc.
• Standards for interfaces between different resources of data.
• Reusability of data, knowledge and tools.
Many environments still lack such standards, which impedes the use and analysis of data on a wide, global scale, limiting applications to data sets collected for specific diagnostic, screening, prognostic, monitoring, therapy support or other patient management purposes [110].
1.8 Application to Medical Sensor Network (MSN)
In an MSN, a mobile patient can communicate with a hospital data center and/or a physician through wireless networks (e.g., cellular and sensor networks). During the communication, a large amount of data must be delivered through intermediate sensor nodes. Medical sensor data is critically important because it concerns patients’ health. Any attack should be detected quickly and correctly; otherwise, the MSN may collapse.
Figure 1.4 Medical Sensor Network architecture
Figure 1.4 shows the general architecture of an MSN. Among the processes, data relay by sensor nodes can be influenced by various attacks. Because of the basic characteristics of an MSN, it is not easy to supervise whether the nodes operate properly; in particular, in a sensor network, nodes can be added or removed at random. In this environment it is appropriate to apply the proposed attack classification through unsupervised-learning data mining mechanisms. When attacks are detected, we can replace the nodes under attack or take proper measures on them, and thereby make the data communication reliable. It is believed that because the MSN allows patients to carry on their daily activities while being monitored continuously anytime, anywhere, the proposed unsupervised learning mechanism for attack detection is well suited to it.
[Figure 1.4 shows four stages: data sending by body sensors, aggregation by a personal device, data relay by sensor nodes, and data processing and reaction by the health care center.]
1.9 Objectives of the Thesis
The objectives of the work are defined as below
1. To study and analyze different variants of intrusion detection
techniques meant for improving performance in Medical Sensor
network.
2. To design and develop an efficient approach for Intrusion Detection
using Clustering and Hybrid techniques.
3. To analyze the proposed approach on KDD cup-99 dataset and to
evaluate the result to attain high accuracy.
1.10 Scope of the thesis
The main intention of this research is to develop a network intrusion detection system by utilizing data mining and artificial intelligence techniques. Recently, intrusion detection systems have been designed to classify attacks by incorporating enhanced rules learnt from network behaviour [19], based on fuzzy class-association-rule mining and genetic network programming (GNP) [46]. In this research a hybrid method is proposed for intrusion detection using Linear Discriminant Analysis + Cuckoo Search + Fuzzy Bisector-Kernel Fuzzy C-means clustering and a Bayesian neural network.
1.11 Organization of the Thesis
This thesis comprises seven chapters. Chapter 1 introduces the
concept of IDS and motivation. Principle of Data Mining, classification
of data and field of applications are also discussed. The objectives and
scope of the thesis are also presented.
In Chapter 2, Literature reviews based on previous works are
discussed. Classification of intrusion detection systems, Types of
Protected Systems, IDS Data Processing Techniques, Data mining and
Knowledge discovery, Evaluation of Datasets and Feature Selection
are also discussed, together with the advantages and limitations of the previous works.
Chapter 3 describes the database used in this thesis; the proposed feature extraction and pre-processing techniques and the performance evaluation metrics of the Intrusion Detection System are also presented.
In Chapter 4, some existing clustering techniques such as K-Means clustering, Fuzzy K-Means clustering, Fuzzy C-Means and KFCM are discussed and implemented. The proposed Fuzzy Bisector-Kernel Fuzzy C-means clustering technique (FB-KFCM) and its performance are also discussed.
[Diagram: organization of the proposed work — introduction on MSN (problem identification; objective and scope), related work on MSN (methodology and database; dataset description), clustering-based intrusion detection, hybrid intrusion detection system, and results and implementation.]
Chapter 5 discusses the Hybrid Intrusion Detection System using LDA+CS (Linear Discriminant Analysis + Cuckoo Search), developed by combining LDA and CS. Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM) is used as the clustering technique, and in the proposed system a Bayesian Neural Network is used for better classification.
Chapter 6 compares the existing techniques, KFCM + Bayesian network and FB-KFCM + Bayesian network, with the proposed hybrid technique LDA+CS + FB-KFCM + Bayesian network, and discusses their results. The hybrid combinations are proposed to achieve higher accuracy and reliability.
Chapter 7 concludes the design methodology for IDS to achieve
higher accuracy and reliability. The result obtained and suggestions for
future development to achieve higher accuracy in IDS are also
discussed.
1.12 Summary
Intrusion detection has received much interest among researchers due to the rapid development and popularization of the
Internet and local networks. This chapter introduces the concept of IDS
and motivation. Efficiency of Intrusion Detection Systems, Principle of
Data Mining, Types of Databases and field of applications are also
discussed. The objectives and scope of the thesis are also presented.
CHAPTER 2
LITERATURE REVIEW
2.1 Intrusion Detection System (IDS)
An Intrusion Detection System (IDS) is software and/or
hardware, which is designed for identifying the undesirable efforts for
enhancing the computer security systems [35]. Especially, the wireless
sensor devices has given rise to a wider range of amazing applications
in various walks of our life that involve environment and habitat
monitoring, healthcare applications and many more. But,
simultaneously, the sensor nodes have produced the same number of
threats caused by attackers, whose intention is to achieve access to
the network and the data transferred inside it. Till now, numerous
classical security methodologies exist for the purpose of avoiding these
intrusions [64].
The IDS, a concept originally introduced by Anderson [68] and later formalized by Denning [37], has received increasing attention over the past 20 years. IDSs are systems that aim at detecting intrusions, i.e., sets of actions that attempt to compromise the integrity, confidentiality or availability of a computer resource [119]. In short, computer security deals with the protection of data and computing resources and is commonly associated with the following three properties (commonly referred to as the C.I.A. triad) [86]:
2.1.1 Confidentiality
It is prevention of any intentional or unintentional unauthorized
disclosure of data. For example, an intruder learning about the
customer credit card database or getting access to the proprietary
source code is considered a breach of confidentiality. Note that
typically such a breach is irreversible and cannot be confined easily.
The term confidentiality can also be understood in a broader context in
which it also pertains to the non-delivery of services to unauthorized
users, even though this would not compromise confidentiality in itself.
2.1.2 Integrity
It is prevention of intentional or unintentional unauthorized
modification of data. For example, an intruder defacing the company’s
web server or modifying the bank’s database content for personal gain
is an attack against data integrity. Note that typically integrity can be
restored, e.g., from other sources such as backup copies, although this
process may be costly, time-consuming, and not always complete.
2.1.3 Availability
It is prevention of the unauthorized withholding of computing
resources. Examples of availability include the denial-of-service (DOS)
attack, in which the attacker blocks the computing resources so that
authorized users cannot use them, or physical equipment theft.
Based on this definition of the C.I.A. triad, an intrusion can be defined as follows:
An intrusion is any set of actions that attempt to compromise the confidentiality, integrity or availability of a computer resource.
An intrusion detection system monitors computer systems and networks to determine whether a malicious event (i.e., an intrusion) has occurred. Each time a malicious event is detected, the IDS raises an alarm.
Typically, the requirements for confidentiality, integrity and
availability are not absolute, but are defined by a security policy.
The security policy states which information is confidential, who
is authorized to modify given information and what kind of use of
computer systems is acceptable. Therefore, we can reformulate the initial definition: an intrusion is a violation of a security policy.
One may categorize intrusion detection systems in terms of behaviour: they may be passive (those that simply generate alerts and log network packets), or they may be active, meaning that they detect and respond to attacks, attempt to patch software holes before they are exploited, or act proactively by logging out potential intruders or blocking services.
2.2 Classification of intrusion detection systems
Primarily, an IDS is concerned with the detection of hostile
actions. This network security tool uses either of two main techniques.
One category is for analyzing the network traffic and the other is to
analyze the operating system audit trails. These systems use either the
rule-based misuse detection or anomaly detection naturally [115] and
their power relies on the ability of the security personnel developing
them to a larger extent. The first category is capable of identifying the
known attack types alone. On the contrary, the second category is
subjected to the generation of false positive alarms. Therefore, several
machine learning techniques have been applied for designing IDS.
These machine learning techniques include neural networks, linear
genetic programming, Support vector machines, Bayesian Networks,
Multivariate adaptive regression splines and Fuzzy inference systems.
[19].Likewise, several data mining techniques has been developed as
40
well to detect the key features or parameters that help in defining
intrusions [140].
Figure 2.1 Intrusion Detection System Classifications and Processing
2.2.1 Intrusion Detection Approach
This network security tool uses either of two main techniques (described in more detail below). The first, anomaly detection, explores issues in intrusion detection associated with deviations from normal system or user behaviour. The second employs signature detection, matching observed activity against known attack patterns (signatures). Both methods have their distinct advantages and disadvantages, as well as suitable application areas within intrusion detection.
2.2.1.1 Anomaly-Based Detection
Anomaly-based detection is the process of comparing definitions
of what activity is considered normal against observed events to
identify significant deviations. An IDS using anomaly-based detection
has profiles that represent the normal behaviour of such things as
users, hosts, network connections, or applications. The profiles are
developed by monitoring the characteristics of typical activity over a
period of time. For example, a profile for a network might show that
Web activity comprises an average of 13% of network bandwidth at the
Internet border during typical working day hours.
The IDS uses statistical methods to compare the characteristics
of current activity to thresholds related to the profile, such as detecting
when Web activity comprises significantly more bandwidth than
expected and alerting an administrator of the anomaly. Profiles can be
developed for many behavioural attributes, such as the number of e-
mails sent by a user, the number of failed login attempts for a host, and
the level of processor usage for a host in a given period of time.
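The bandwidth-profile example above amounts to a simple statistical threshold test. A minimal sketch, assuming a profile of mean plus k standard deviations (the sample values and the choice of k = 3 are hypothetical, for illustration only):

```python
from statistics import mean, stdev

def build_profile(samples):
    """Profile normal behaviour as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def is_anomalous(value, profile, k=3.0):
    """Flag an observation more than k standard deviations from the mean."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma

if __name__ == "__main__":
    # Hypothetical training data: fraction of bandwidth used by Web traffic
    # during typical working-day hours.
    normal_web_share = [0.12, 0.13, 0.14, 0.13, 0.12, 0.13]
    profile = build_profile(normal_web_share)
    print(is_anomalous(0.13, profile))  # typical day -> False
    print(is_anomalous(0.45, profile))  # sudden surge -> True
```

The same scheme extends to other behavioural attributes mentioned above, such as the number of e-mails sent by a user or the number of failed login attempts on a host.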
2.2.1.2 Signature-Based Detection
A signature is a pattern that corresponds to a known threat.
Signature-based detection is the process of comparing signatures
against observed events to identify possible incidents. Signature-based
detection is very effective at detecting known threats but largely
ineffective at detecting previously unknown threats, threats disguised
by the use of evasion techniques, and many variants of known threats.
Signature-based detection is the simplest detection method because it
just compares the current unit of activity, such as a packet or a log
entry, to a list of signatures using string comparison operations.
Signature-based detection technologies have little understanding of
many network or application protocols and cannot track and
understand the state of complex communications.
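Signature matching by plain string comparison, as described above, can be sketched in a few lines. The signature strings here are illustrative, not drawn from any real rule set:

```python
# Minimal signature-based detection sketch: compare each event (here, a
# log line) against a list of known-attack signatures using plain string
# comparison, exactly as the simplest signature-based IDSs do.
SIGNATURES = [
    "GET /cgi-bin/phf",
    "failed login for root",
]

def match_signatures(event: str):
    """Return the known-attack signatures found in the event, if any."""
    return [s for s in SIGNATURES if s in event]

if __name__ == "__main__":
    log = 'client 10.0.0.5 "GET /cgi-bin/phf?Qalias=x%0a/bin/id" 200'
    print(match_signatures(log))                 # ['GET /cgi-bin/phf']
    print(match_signatures("GET /index.html"))   # []
```

The sketch also makes the stated weakness concrete: any variant of the attack that does not contain one of the stored strings (for example, an URL-encoded version of the path) slips through unnoticed.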
2.2.2 Types of Protected Systems
There are many types of IDS technologies. They are divided into
the following three groups based on the type of events that they
monitor and the ways in which they are deployed:
2.2.2.1 Host Based Intrusion Detection
Host-Based System monitors the characteristics of a single host
and the events occurring within that host for suspicious activity.
Examples of the types of characteristics host-based IDS might monitor
are network traffic (only for that host), system logs, running processes,
application activity, file access and modification, and system and
application configuration changes. Host-based IDSs are most
commonly deployed on critical hosts such as publicly accessible
servers and servers containing sensitive information.
A host-based IDS places monitoring sensors, also known as agents, on network resource nodes to monitor the audit logs generated by the network operating system or application programs. Audit logs contain records of events and activities taking place at individual network resources. A host-based IDS can therefore detect attacks that cannot be seen by a network-based IDS, such as intrusion and misuse by a trusted insider. Host-based systems utilize a signature rule base derived from the site-specific security policy. A host-based IDS can overcome problems associated with network-based IDSs by immediately alerting the security personnel, who can locate the source as provided for by the site security policy. A host-based IDS can also verify whether an attack was unsuccessful, whether because of an immediate response to the alarm or for any other reason, something that is not possible at the packet level. A host-based IDS can also track user login and logoff actions and all activity that generates audit records.
A host-based intrusion detection system has only host-based sensors, whereas a network-based intrusion detection system has network-based sensors [2]. Host-based technology examines events such as what files were accessed and what applications were executed [56]. Network-based intrusion detection is the problem of detecting unauthorized use of computer systems over a network, such as the Internet [33].
A good intrusion detection system should be able to distinguish
between normal and abnormal user activities [8]. This would include
any event, state, content, or behaviour that is considered to be
abnormal by a pre-defined standard [47]. Data mining-based intrusion detection systems can be classified according to their detection strategy. There are two main strategies: misuse detection, which uses patterns of well-known attacks or weak spots of the system to identify intrusions [145], and anomaly detection, which tries to determine whether deviations from the established normal usage patterns can be flagged as intrusions [70,98]. One major challenge in intrusion detection is identifying camouflaged intrusions among a huge amount of normal communication activity [70].
To detect intrusive activity, many Machine Learning (ML)
algorithms, such as Neural Networks [21], Support Vector Machines
[32], Genetic Algorithms [154], Fuzzy Logic [96], and Data Mining
[88], have been widely applied to huge volumes of complex, dynamic
data to detect known and unknown intrusions. It is very important
for an IDS to generate rules that distinguish normal from abnormal
behaviour by observing the dataset, which is the record of activities
generated by the operating system and logged to a file in
chronologically sorted order [33].
Hence, an IDS should lower the quantity of data to be
processed, and this is even more vital for real-time detection. Data
filtering, data clustering and feature selection can achieve this
reduction. Clustering can reveal the hidden patterns in the data and
the essential features used for detection. Better classification is
possible with feature selection, which searches for the subset of
features that best classifies the training data [134]. Classical
cluster analysis assigns each datum to exactly one cluster, whereas
fuzzy cluster analysis relaxes this requirement by using gradual
memberships. This helps in dealing with data that simultaneously
belong to more than one cluster. Intrusion detection systems (IDS)
make extensive use of clustering methodologies,
and in particular, fuzzy approaches appear to be more efficient than
the other clustering algorithms in use. The Fuzzy C-Means (FCM)
clustering model was initially introduced by Dunn in 1974 and was
extended and generalized by Bezdek in 1983 [123]. Generally, the
techniques for dimensionality reduction concentrate either on
choosing a suitable subset from the original set of I attributes or
on mapping the initial I-dimensional data onto a K-dimensional space,
where K < I [136].
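The gradual memberships used by fuzzy cluster analysis can be illustrated with the standard Fuzzy C-Means update equations. The sketch below is a minimal illustration only, not the implementation of any cited work; the function name, fuzzifier m = 2 and iteration count are our own assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal Fuzzy C-Means sketch: each point receives a gradual
    membership in every cluster instead of a hard assignment."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(iters):
        w = U ** m
        # fuzzy centers: membership-weighted means of the data
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # standard FCM membership update: u ~ 1 / d^(2/(m-1)), normalised
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U
```

A point lying between two cluster centers ends up with comparable membership in both, which is exactly the behaviour hard k-means cannot express.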
Most recent feature extraction techniques involve linear
transformations of the original pattern vectors to new vectors of
lower dimensionality [120]. The best-known dimensionality reduction
technique is Principal Component Analysis (PCA). However, problems
arise with the selection of the number of directions, and PCA cannot
compute principal components in high-dimensional feature spaces that
are related to the input space by some nonlinear map [127]. The
Linear Discriminant Analysis (LDA) feature reduction technique is a
newer scheme employed in the field of cyber-attack detection. This
method reduces the number of input features while improving
classification accuracy. Moreover, by selecting the most
discriminating features, it decreases the training and testing time
of the classifiers [134].
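As an illustration of the discriminant-analysis idea, the following sketch computes the classic two-class Fisher discriminant direction in plain NumPy. The function name and the small regularisation term are our own assumptions, not taken from any of the cited works.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class Fisher LDA sketch: find the projection direction
    w = S_W^{-1} (m1 - m0) that best separates the two classes."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrix
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # small ridge term keeps Sw invertible (illustrative choice)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting each feature vector onto w reduces it to a single score; a threshold on that score then separates the two classes, which is the dimensionality-reduction effect described above.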
Selecting the optimal set of features is the major problem
encountered by most researchers, because not all features are
relevant to the learning algorithm. In some situations, irrelevant
and redundant features produce noisy data that distract the learning
algorithm and degrade detector accuracy, leading to time-consuming
training and testing processes. Feature selection has been shown to
have a considerable effect on classifier performance [105]. A
feed-forward neural network classically trained using
back-propagation can be regarded as an effective classifier of the
actions produced by the head of severely disabled people [138], [74],
[109]. Yet standard neural networks have the demerit of poor
generalisation when provided with limited training data. In recent
years, Bayesian techniques have been applied to neural networks to
enhance the accuracy and robustness of neural network classifiers
[133]. Previous research [79] has shown that a Bayesian neural
network can classify head-movement commands consistently even with
limited training data.
2.2.2.2 Network Based Intrusion Detection
It monitors network traffic for particular network segments or
devices and analyzes the network and application protocol activity to
identify suspicious activity. It can identify many different types of events
of interest. It is most commonly deployed at a boundary between
networks, such as in proximity to border firewalls or routers, virtual
private network (VPN) servers, remote access servers, and wireless
networks.
Network-based IDSs are best suited to generating alerts for
intrusions originating outside the perimeter of the enterprise. They
are inserted at various points on the LAN and observe packet traffic
on the network; information is assembled into packets and transmitted
on the LAN or the Internet. Network-based IDSs are valuable when
placed just outside the firewall, alerting personnel to incoming
packets that might circumvent the firewall. Some network-based IDSs
accept custom signatures derived from the user's security policy,
which permits limited detection of security-policy violations.
Packet-level analysis does not work well in today's switched and
encrypted environments, and it is weak at detecting attacks
originating from authorized network users. Network-based intrusion
detection systems use raw network packets as their data source,
typically using a network adapter in promiscuous mode that listens
to and analyses all traffic in real time as it travels across the
network.
To detect newly encountered attacks, various research efforts
have used data mining as the key component [53]. Data mining is the
analysis of data to establish relationships and identify hidden
patterns that would otherwise go unnoticed. Many researchers have
delved into database intrusion detection using data mining [129].
Several data mining techniques have been applied to intrusion
detection; for example, K-Means clustering [12] is an unsupervised
technique used for this purpose. K-Means is a popular partitional
clustering algorithm, valued for its simplicity of implementation
and commonly applied in diverse applications. Its main drawbacks are
the choice of the value of k, the sensitivity of the clustering
result to the selection of the initial centroids, and convergence to
local minima. To overcome these difficulties, several authors have
proposed modifications to K-Means. In [96], a modified K-Means
clustering algorithm, called Y-Means, was proposed for intrusion
detection and has been used extensively for detecting intrusive
behaviour.
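The drawbacks listed above can be seen directly in a minimal sketch of the algorithm. The implementation below is illustrative only; the fixed random seed stands in for the arbitrary initial-centroid choice on which the final clustering depends.

```python
import numpy as np

def k_means(X, k, iters=50, seed=0):
    """Plain k-means sketch. The result depends on the random choice
    of initial centroids, one of the drawbacks noted above."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels
```

Running the same function with different seeds can yield different partitions of the same data, which is precisely the initial-centroid sensitivity that the Y-Means modification tries to address.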
On the other hand, many researchers have argued that Artificial
Neural Networks (ANNs) can improve the performance of intrusion
detection systems compared with traditional methods. The ANN is one
of the most widely used techniques and has been successful in solving
many complex practical problems. However, for ANN-based IDSs,
detection precision, especially for low-frequency attacks, and
detection stability still need to be enhanced. Furthermore, some
researchers have utilized the Self-Organizing Map (SOM), also called
the Self-Organizing Feature Map (SOFM), a type of artificial neural
network trained using unsupervised learning to produce a
low-dimensional (typically two-dimensional), discretized
representation of the input space of the training samples, called a
map. Self-organizing maps differ from other artificial neural
networks in that they use a neighbourhood function to preserve the
topological properties of the input space. To provide better
detection accuracy, some researchers have combined ANNs with data
mining approaches, helping the IDS achieve a higher detection rate,
a lower false-positive rate and stronger stability.
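A minimal sketch of the SOM training loop described above, assuming a rectangular grid and a Gaussian neighbourhood function; all parameter names, defaults and decay schedules here are illustrative assumptions, not those of any cited system.

```python
import numpy as np

def train_som(X, grid=(5, 5), iters=200, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal Self-Organizing Map sketch: a 2-D grid of weight
    vectors is pulled toward the data, and a Gaussian neighbourhood
    function drags each winner's grid neighbours along with it."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1]))
    # fixed grid coordinates of each neuron, used by the neighbourhood
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(np.linalg.norm(W - x, axis=1))   # best-matching unit
        lr = lr0 * (1 - t / iters)                       # decaying learning rate
        sigma = sigma0 * (1 - t / iters) + 0.1           # shrinking neighbourhood
        h = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                   # neighbourhood update
    return W
```

Because neighbouring grid cells are updated together, inputs that are close in feature space end up mapped to nearby cells, which is the topology-preserving property noted above.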
In recent times, intrusion detection has received a great deal
of interest among researchers because it is widely applied for
preserving security within a network. Here, we present some of these
techniques. G. Gowrisona et al. [53] designed an intrusion detection
system that classifies network behaviour with low computational
complexity, O(n); the KDD Cup 99 benchmark data set was used to
achieve a promising classification rate. To achieve a high detection
rate, Shingo Mabu et al. [129] described a fuzzy
class-association-rule mining method based on Genetic Network
Programming (GNP). GNP enhances the representation ability with
compact programs derived from the reusability of nodes in a graph
structure. The combined method was evaluated on the KDD Cup 99 and
DARPA98 databases and showed competitively high detection rates.
To address network-based anomaly detection, Latifur Khan et al.
[87] proposed a combination of SVM and DGSOT, which starts with an
initial training set and expands it gradually using the clustering
structure produced by the DGSOT algorithm. They compared their
approach with the Rocchio Bundling technique and with random
selection in terms of accuracy loss and training-time gain on a
single real benchmark data set. Given the need for misuse and anomaly
detection in a single system, M. Bahrololum et al. [12] proposed a
hybrid of misuse and anomaly detection, trained on normal and attack
packets respectively. The attack-training method combined
unsupervised and supervised Neural Networks (NN). With the misuse
approach, known packets were identified quickly, and unknown attacks
could also be detected.
Recognizing the importance of an efficient intrusion detection
system, K. S. Anil Kumar and V. Nanda Mohan [10] proposed a
combination of three techniques spanning two machine-learning
paradigms: K-Means clustering, fuzzy logic and neural networks were
deployed to configure an effective intrusion detection system. This
approach showed the advantage of converging K-Means, fuzzy and
neural network techniques to eliminate avoidable intervention by a
human analyst. To improve both the accuracy and the efficiency of
intrusion detection, Shekhar R. Gaddam et al. [128] presented
"K-Means+ID3", a method that cascades K-Means clustering and ID3
decision tree learning for classifying anomalous and normal
activities in a computer network, an active electronic circuit, and
a mechanical mass-beam system. The detection accuracy of K-Means+ID3
was as high as 96.24 percent at a false-positive rate of 0.03
percent on NAD; the total accuracy reached 80.01 percent on MSD and
79.9 percent on DED.
Seeking a better method than SVM for network security, M.
Ektefa et al. [38] presented intrusion detection using data mining
techniques such as classification trees and support vector machines.
Their results indicated that the C4.5 algorithm outperforms SVM in
detecting network intrusions and in false-alarm rate on the KDD Cup
99 dataset. Rasha G. Mohammed Helali [118] presented a survey of
data mining-based network intrusion detection systems (NIDS),
covering the features of signature-based NIDS as well as the current
state of the art in data mining-based approaches. The intruder is
one of the most publicized threats to security, and NIDS have become
a standard component of network security infrastructures. The survey
provides general guidance on open research areas and future
directions, giving the reader a broad overview of the work done at
the intersection of intrusion detection and data mining.
Anomaly-based intrusion detection systems have various
drawbacks, such as complex computation and inefficiency in real-time
detection. To reduce computational complexity, Zhiyuan Tan et al.
[155] designed a method based on Linear Discriminant Analysis (LDA),
using a difference distance map to select the significant features.
High-dimensional feature vectors were first transformed into a
low-dimensional domain; then, using the Euclidean distance in this
simple, low-dimensional feature domain, the similarity between new
incoming packets and a normal profile was measured. A pre-calculated
threshold differentiates normal from abnormal network packets. The
DARPA 1999 IDS dataset was used to evaluate the method. However,
conventional LDA feature reduction has the drawback of being
unsuitable for non-linear datasets.
In general, the huge network traffic data used in intrusion
detection contain ineffective information that degrades system
accuracy. To overcome this drawback, Shailendra Singh and Sanjay
Silakari [126] designed an efficient feature reduction method called
Generalized Discriminant Analysis (GDA), which reduces the number of
input features. By selecting the most discriminating features,
classification accuracy was increased and the time required for
classifier training and testing was reduced. The performance of the
method was evaluated with Artificial Neural Network (ANN) and C4.5
classifiers, and the experimental results showed improved accuracy.
The k-means clustering algorithm previously used in intrusion
detection has drawbacks: its computational complexity, and the fact
that the selection of the initial central points affects the results.
Li Tian and Wang Jianwen [93] therefore designed an improved k-means
clustering algorithm that introduces an optimized dynamic
central-point cyclic method. Applied in an intrusion detection
system, the improved clustering method enhanced the detection rate
for abnormal behaviour and effectively reduced the false-drop rate.
The algorithm was evaluated on the KDD Cup 99 dataset, showing that
the accuracy of data classification and the detection efficiency
increased significantly; the experimental results also revealed that
the algorithm achieved its objectives with a higher detection rate
and higher efficiency.
Issues found in intrusion detection systems include the need
for regular updating, low detection capability for unknown attacks,
non-adaptive high false-alarm rates, and high resource consumption,
among others. Recognizing the importance of soft computing for
intrusion detection, Hafiz Muhammad Imran et al. [55] introduced an
efficient soft computing method to select the optimum subset of
features: a hybrid LDA + GA method for feature transformation and
selection. LDA was chosen as the feature reduction method because it
outperformed PCA, and the standard NSL-KDD dataset was used for
training and testing. To classify network traffic into normal or
intrusive activities, an RBF classifier was used. Their experimental
results showed that selecting an optimal subset of features reduced
time consumption and increased accuracy.
Existing intrusion detection systems make use of entire feature
sets, including irrelevant features. To produce an effective and
efficient classification process, a well-defined feature extraction
algorithm is essential. Rupali Datti and Bhupendra Verma [120]
suggested Linear Discriminant Analysis (LDA) as an efficient feature
extraction method for intrusion detection, with the back-propagation
algorithm employed for classification. The method aims to identify
the significant input features that are computationally efficient
and effective in constructing an IDS. Their experimental results
show that the proposed model offers an improved and robust
representation of the data, achieving 97% data reduction and about
94% reduction in training time, while the accuracy in identifying
new attacks remained more or less the same. The computer resources
consumed, both the memory and the CPU time spent on detecting an
attack, also decreased. The experimental results showed that the
method is reliable for detecting intrusions.
To deal with the multiclass problem in intrusion detection,
Snehal A. Mulay et al. [132] designed a decision-tree-based support
vector machine that combines support vector machines and decision
trees. Fast training and testing may be viewed as the benefits of
this method, which in turn increases system efficiency. The dataset
is split into two subsets from the root to the leaves until every
subset contains only one class, which has a large impact on the
classification performance of the system. Although final results
were not presented, multiclass pattern recognition problems can be
solved using tree-structured binary SVMs, and the resulting
intrusion detection system could be faster than other methods.
Shingo Mabu et al. [129] developed GNP-based fuzzy
class-association-rule mining with sub-attribute utilization, along
with classifiers that rely on the extracted rules. It can
consistently use and combine discrete and continuous attributes in a
rule and can efficiently extract many good rules for classification.
As an application, intrusion-detection classifiers for both misuse
detection and anomaly detection were developed, and their
effectiveness was demonstrated on KDD Cup 99 and DARPA98 data. The
misuse-detection experiments show that the method offers high DR and
low PFR, the two most important criteria for security systems.
Gang Wang et al. [49] proposed an intrusion detection method
called FC-ANN, based on ANN and fuzzy clustering. Fuzzy clustering
is employed to partition the heterogeneous training set into several
homogeneous subsets; in this way, the complexity of each
sub-training set is reduced and, as a result, detection performance
is increased. Experimental results on the KDD Cup 1999 dataset
demonstrate the effectiveness of the method, in particular for
low-frequency attacks such as R2L and U2R, in terms of detection
precision and detection stability.
Detecting network intrusion is not only important but also
difficult in network security research [20]. In a Medical Sensor
Network (MSN), network intrusion is critical because the data
delivered through the network relate directly to patients' lives.
Traditional supervised learning techniques are not appropriate for
detecting anomalous behaviour and new attacks, because intrusion
patterns and characteristics in an MSN change over time.
Unsupervised learning techniques such as the Self-Organizing Map
(SOM) are therefore more appropriate for anomaly detection. One such
work proposed a real-time intrusion detection system based on SOM
that groups similar data and visualizes their clusters, labelling
the map produced by SOM using correlations between features. The
system was evaluated with the KDD Cup 1999 dataset because MSN data
are not yet available; it yields reasonable misclassification rates
and takes 0.5 seconds to decide whether a behaviour is normal or an
attack.
The KDD Cup 99 dataset has been a point of attraction for many
researchers in intrusion detection over the last decade, and many
have contributed efforts to analyze it with different techniques.
Analysis can be used in any industry that produces and consumes
data, which of course includes security. One such study analyzed 10%
of the KDD Cup 99 training dataset for intrusion detection, focusing
on establishing a relationship between the attack types and the
protocols used by attackers, based on clustered data. The analysis
was performed using k-means clustering with the Oracle 10g data
miner, building 1000 clusters to segment the 494,020 records. The
investigation revealed many interesting results about the protocols
and attack types preferred by attackers for intruding into networks.
That work also established a different implementation-level
clustering technique that provides a new dimension for the
classification of datasets. The training set and the testing set are
classified according to the separate algorithms discussed there. The
performance analysis shows a clear edge over the other existing
techniques used for data classification, and the future research
issues that need to be resolved and investigated further are
presented along with new trends and ideas.
2.2.2.3 Hybrid Based Intrusion Detection
We have examined the different mechanisms that IDSs use to
signal or trigger alarms on a network, and the two locations in
which IDSs search for intrusive activity. Each of these approaches
has benefits and drawbacks. By combining multiple techniques into a
single hybrid system, however, it is possible to create an IDS that
possesses the benefits of multiple approaches while overcoming many
of their drawbacks.
2.3 Structure of IDS
With respect to where and how data is processed by the
intrusion detection system, the intrusion detection systems can be
classified into distributed and centralized. A distributed intrusion
detection system (DIDS) is one where data is collected and analyzed in
multiple hosts, as opposed to a centralized intrusion detection system
(CIDS), in which data may be collected in a distributed fashion, but is
processed centrally. Both distributed and centralized intrusion
detection systems may use host- or network-based data collection
methods, or a combination of them.
2.3.1 Data Source
Intrusion detection systems can run on either a continuous or
periodic feed of information (Real-time IDS and Interval-based IDS
respectively) [7] and hence they use two different intrusion detection
approaches.
Audit trail analysis is the prevalent method used by
periodically operated systems. In contrast, IDSs deployed in
real-time environments are designed for online monitoring and
analysis of system events and user actions.
2.3.2 Behaviour of an Attacker
Intrusion detection systems must be capable of distinguishing
between normal (not security-critical) and abnormal user activities
in order to discover malicious attempts in time. However,
translating user behaviours (or a complete user-system session) into
a consistent security-related decision is often not that simple:
many behaviour patterns are unpredictable and unclear (Fig. 2.2).
In order to classify actions, intrusion detection systems take
advantage of the anomaly detection approach, sometimes referred to
as behaviour-based [Deb99], or of attack signatures, i.e.
descriptive material on known abnormal behaviour (signature
detection), also called knowledge-based.
Figure 2.2 Behaviour of the user in the system
One may also categorize intrusion detection systems in terms of
behaviour: they may be passive (simply generating alerts and logging
network packets), or active, meaning that they detect and respond to
attacks, attempt to patch software holes before they are exploited,
or act proactively by logging out potential intruders or blocking
services.
2.3.3 Analysis Timing
As noted in Section 2.3.1, intrusion detection systems can run
on either a continuous or a periodic feed of information (real-time
IDS and interval-based IDS respectively), and hence they use two
different intrusion detection approaches. Audit trail analysis is
the prevalent method used by periodically operated systems, while
IDSs deployed in real-time environments are designed for online
monitoring and analysis of system events and user actions.
2.3.3.1 Audit Trail Processing
There are many issues related to audit trail (event log)
processing [11]. Storing audit trail reports in a single file must
be avoided, since intruders may exploit this to make unwanted
changes. It is far better to keep a number of copies of the event
log spread over the network, though this adds some overhead to both
the system and the network.
Further, from a functionality point of view, recording every
possible event means a noticeable consumption of system resources
(both on the local system and on the network involved), while log
compression would increase the system load. Specifying which events
are to be audited is difficult, because certain types of attack may
pass undetected.
It is also difficult to predict how large audit files can be;
through experience one can only make a rough estimate. Setting an
appropriate storage period for current audit files is also not a
straightforward task. In general, this depends on the specific IDS
solution and its correlation engine. Certainly, archive files should
be stored as copies for retrieval-analysis purposes.
2.3.3.2 On-the-Fly Processing
With on-the-fly processing [14], the IDS performs online
verification of system events. Generally, a stream of network
packets is constantly monitored. With this type of processing,
intrusion detection uses knowledge of current activity on the
network to sense possible attack attempts; it does not look for
successful attacks in the past.
Given the computational complexity involved, the algorithms
used here are limited to quick, efficient and often algorithmically
simple procedures. This is a compromise between the main requisite,
attack detection capability, and the complexity of the data
processing mechanisms used in the detection itself.
At the same time, the construction of an on-the-fly processing
IDS tool [32] requires a large amount of RAM (for buffers), since no
data storage is used. Therefore, the IDS may sometimes miss packets,
because realistic processing of too many packets is not possible.
The amount of data collected by the detector is small, since it
views only the buffer contents. Hence, only small portions of
information can be analyzed in search of certain values or sequences.
2.4 IDS Data Processing Techniques
Depending on the type of approach taken in intrusion detection,
various processing mechanisms (techniques) [36, 44] are employed on
the data that reaches the IDS. Several such systems are described
briefly below:
2.4.1 Expert Systems
These work on a previously defined set of rules describing an
attack. All security-related events incorporated in an audit trail
are translated into if-then-else rules. Examples are Wisdom & Sense
and Computer Watch (developed at AT&T).
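As a hypothetical illustration of this if-then-else rule style, the sketch below checks an audit event against two hand-written rules; the event fields, thresholds and messages are invented for the example and do not come from any of the systems named above.

```python
# Hypothetical expert-system rules over an audit event, expressed as
# plain if-then-else checks. All field names and thresholds are
# illustrative assumptions.
def check_event(event):
    if event["failed_logins"] > 3:
        return "ALERT: possible password-guessing attack"
    elif event["user"] != "root" and event["accessed"] == "/etc/shadow":
        return "ALERT: unauthorized access to password file"
    else:
        return "OK"
```

A real expert system would hold many such rules in a knowledge base and evaluate every audit record against them, but the translation from security policy to condition-action rules is the same in principle.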
2.4.2 Signature Analysis
Similarly to the expert-system approach, this method is based
on attack knowledge. The semantic description of an attack is
transformed into the appropriate audit trail format, so that attack
signatures can be found in logs or input data streams in a
straightforward way. An attack scenario can be described, for
example, as a sequence of audit events that a given attack
generates, or as patterns of searchable data captured in the audit
trail. This method uses abstract equivalents of audit trail data,
and detection is accomplished using common text-string matching
mechanisms. It is typically a very powerful technique and as such is
very often employed in commercial systems (for example Stalker, Real
Secure, Net Ranger, Emerald eXpert-BSM).
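The text-string matching at the heart of signature analysis can be sketched as follows. The signature strings are illustrative examples only, not an actual rule set from any of the systems named above.

```python
# Illustrative signature list: substrings whose presence in a log
# line is treated as evidence of a known attack pattern.
SIGNATURES = ["/etc/passwd", "cmd.exe", "' OR '1'='1"]

def match_signatures(log_line):
    """Return the signatures whose pattern occurs in the log line."""
    return [s for s in SIGNATURES if s in log_line]
```

Production systems use far richer pattern languages than plain substrings, but the principle, scanning each log record or input stream against a library of known attack patterns, is the same.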
2.4.3 Colored Petri Nets
The Colored Petri Nets [48] approach is often used to generalize
attacks from expert knowledge bases and to represent attacks
graphically. Purdue University’s IDIOT system uses Colored Petri Nets.
With this technique, it is easy for system administrators to add new
signatures to the system. However, matching a complex signature to
the audit trail data may be time-consuming. The technique is not used
in commercial systems.
2.4.4 State-Transition Analysis
An attack is described with a set of goals and transitions that
must be achieved by an intruder to compromise a system. Transitions
are represented on state-transition diagrams.
2.4.5 Statistical Analysis Approach
This is a frequently used method (for example SECURENET) [99].
User or system behaviour (a set of attributes) is measured by a
number of variables sampled over time. Examples of such variables
are: user login and logout times, the number of files accessed in a
period of time, and usage of disk space, memory, CPU, etc. The
update frequency can vary from a few minutes to, for example, one
month. The system stores mean values for each variable and raises an
alert when a measurement exceeds a predefined threshold. Yet this
simple approach was unable to match a typical user behaviour model,
and approaches that relied on matching individual user profiles with
aggregated group variables also proved inefficient. Therefore, a
more sophisticated model of user behaviour has been developed using
short- and long-term user profiles, which are regularly updated to
keep up with changes in user behaviour. Statistical methods are
often used in implementations of intrusion detection systems based
on normal user behaviour profiles.
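The basic threshold scheme described above, a stored mean per variable with an alert when a new measurement deviates too far, can be sketched as follows; the choice of k = 3 standard deviations is an illustrative assumption.

```python
import statistics

def is_anomalous(history, value, k=3.0):
    """Flag a measurement that deviates from the stored mean by more
    than k standard deviations over the variable's history."""
    mu = statistics.mean(history)
    sigma = statistics.pstdev(history) or 1e-9   # avoid divide-by-zero
    return abs(value - mu) > k * sigma
```

The short- and long-term profiles mentioned above amount to maintaining such statistics over two different time windows and updating them continuously as user behaviour drifts.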
2.4.6 Neural Networks
Neural networks use learning algorithms to learn the
relationship between input and output vectors and to generalize it
to extract new input/output relationships. In the neural-network
approach to intrusion detection, the main purpose is to learn the
behaviour of actors in the system (e.g., users, daemons). Statistical
methods are partially equivalent to neural networks; the advantage
of neural networks over statistics lies in a simple way to express
nonlinear relationships between variables, and in learning these
relationships automatically. Experiments have been carried out with
neural-network prediction of user behaviour. The results show that
the behaviour of UNIX super-users (root) is predictable, owing to
the very regular functioning of automatic system processes [73], and
with few exceptions, the behaviour of most other users is also
predictable. Neural networks are, however, still a computationally
intensive technique and are not widely used in the intrusion
detection community.
2.4.7 User Intention Identification
This technique models the normal behaviour of users by the set
of high-level tasks they have to perform on the system (in relation
to the users' functions). These tasks are treated as series of
actions, which in turn are matched to the appropriate audit data.
The analyzer keeps a set of tasks that are acceptable for each user;
whenever a mismatch is encountered, an alarm is raised.
2.4.8 Computer Immunology
Analogies with immunology have led to the development of a
technique that constructs a model of normal behaviour of UNIX
network services, rather than that of individual users. This model
consists of short sequences of system calls made by the processes.
Attacks that exploit flaws in the application code are very likely to take
unusual execution paths. First, a set of reference audit data is collected
which represents the appropriate behaviour of services; the knowledge
base is then populated with all the known “good” sequences of
system calls. These patterns are then used for continuous monitoring
of system calls to check whether the generated sequence is listed in
the knowledge base; if not, an alarm is generated. This technique has a
potentially very low false alarm rate provided that the knowledge base
is fairly complete. Its drawback is the inability to detect errors in the
configuration of network services. Whenever an attacker uses
legitimate actions on the system to gain unauthorized access, no alarm
is generated.
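The sequence-matching step described above can be sketched as follows; the system-call names and the window length k are assumptions chosen only for illustration:

```python
# Illustrative sketch of the computer-immunology approach: build a
# knowledge base of short system-call sequences observed during normal
# operation, then flag any trace containing an unseen sequence.

def sequences(trace, k=3):
    """All contiguous k-length system-call sequences in a trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

def build_knowledge_base(normal_traces, k=3):
    kb = set()
    for trace in normal_traces:
        kb |= sequences(trace, k)
    return kb

def is_intrusion(trace, kb, k=3):
    """Alarm if the trace produces any sequence not in the knowledge base."""
    return not sequences(trace, k) <= kb

normal = [["open", "read", "mmap", "read", "close"],
          ["open", "mmap", "read", "close"]]
kb = build_knowledge_base(normal)

print(is_intrusion(["open", "read", "mmap", "read", "close"], kb))  # -> False
print(is_intrusion(["open", "execve", "setuid", "read"], kb))       # -> True
```

Note that, exactly as the text observes, an attack composed entirely of legitimate sequences would pass this check silently.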
2.5 Data Mining: Theoretical Background
Data mining [71] is the process of automatically scanning huge
amounts of data and searching for the patterns available in it. Storing
large amounts of data is useful only when useful information can be
extracted from it. Data mining deals with large volumes of data to extract meaningful
information. Data mining refers to extracting or mining knowledge from
large amounts of data [82]. In data mining, algorithms seek out
patterns and rules within the data from which sets of rules are derived.
Algorithms can automatically classify the data based on similarities
(rules and patterns) obtained between the training and the testing data
set.
Data mining [27] is the process of discovering patterns in data,
either automatically or semi-automatically. The patterns discovered
must be meaningful in that they lead to some advantage, usually a
financial one. Data mining combines concepts, algorithms and tools;
it derives concepts from machine learning and statistics for the
analysis of very large datasets. Data mining yields insight into and
understanding of data, and provides actionable knowledge. It also
provides the capability to predict the outcome of a future observation.
Besides predicting future observations, data mining is useful for
summarizing the underlying relationships in data.
Data mining can mine data from many kinds of data storage: text,
databases, data warehouses, transactional data, multimedia data,
streams, spatiotemporal, time-series and sequence data, the web,
graphs, and social and information networks. The field of data mining
grew out of the limitations of earlier data analysis techniques in
handling the challenges posed by these new types of datasets.
Today, data mining has grown so vast that it can be used in
many areas, such as financial analysis, customer management, risk
management, predicting the costs of corporate expense claims,
healthcare, insurance, process control in manufacturing, and other
fields. This thesis illustrates how data mining is also applicable in
computer security management.
Data mining analyzes data from different perspective and
summarizes it into useful information. It also analyzes data from many
different dimensions, and then it categorizes and summarizes the
relationships identified. Technically, data mining is the process of
finding correlations or patterns among various fields in large datasets.
The current developments in data mining have contributed a wide
variety of algorithms, drawn from the fields of statistics, pattern
recognition, machine learning, and databases, which are useful for
technology adaptation and usage.
Data mining is able to predict important outcomes in advance. The
technique used to perform this is called modelling. Modelling is simply
the act of building a model. A model is a set of rules, examples or
mathematical relationships. A model is built on data from situations
where the outcome is known, and the model is then applied to other
situations where the outcome is not known. Modelling techniques have
been around for centuries, but huge data storage, data communication
capabilities and the ability to process complex data have been
developed only recently, so modelling has become applicable to new
areas.
As a simple example of building a data mining model [27],
consider the director of an educational institute who would like to
improve the results and educational quality of the institute. A large
amount of student data is usually available at every institute. The
director knows a lot about the students, but it is impossible to discern
their common characteristics manually. From the existing database of
students, which contains information such as age, sex, academic
history, continuous assessment details and family background, data
mining tools can discover useful patterns: the relation between
students’ previous academic performance and their entrance
examination scores, between continuous assessment data and final
examination results, predictions of failure cases or of the placement
package a student will receive, associations between two elective
subjects registered by a student in a semester, or the number of
international students admitted to the institute. Data mining is very
helpful for such analysis of large amounts of data, which in turn helps
with academic performance improvement, planning, promotional
activities and so on. Data mining [27] is primarily used today by
companies to acquire information about their customers. It also
enables these companies to determine relationships among "internal"
factors such as price, product positioning, or staff skills, and "external"
factors such as economic indicators, competition, and customer
demographics.
2.5.1. Data mining and Knowledge discovery
Data Mining is a step in KDD [102] process which uses specific
algorithms for extracting patterns (models) from data. The term KDD
refers to the overall process of discovering useful knowledge from
data. The KDD process has other steps like data preparation, data
selection, data cleaning etc. First, data is obtained from various data
sources; then data pre-processing, such as data cleaning and data
integration, is applied. This creates a data warehouse. From the data
warehouse, task-relevant data is selected, and data mining is applied
to it. Pattern evaluation is then applied to the mined patterns to extract
knowledge.
Therefore, Data mining plays an essential role in the knowledge
discovery process.
The KDD process refers to the whole process of changing low-
level data into high-level knowledge: the automated or semi-automated
discovery of patterns and relationships in huge databases, of which
data mining is one of the core steps. Knowledge discovery is the
process of automatically generating information formalized in a form
understandable to humans. DM and KDD have emerged in recent
years to bridge the gap between the large volumes of data being
collected and the extraction of valuable information and knowledge for
decision making using new computing technologies.
According to U. Fayyad [143], KDD continues to evolve from the
intersection of research in various fields: artificial intelligence,
databases, machine learning, pattern recognition, statistics, knowledge
acquisition for expert systems, data visualization, high-performance
computing, machine discovery, scientific discovery and information
retrieval. KDD software systems incorporate theories, algorithms, and
methods from all of these fields.
Although the two terms KDD and DM are closely related, they
refer to two slightly different concepts. Data mining is only the
application of a specific algorithm based on the overall goal of the KDD
process. The knowledge discovery stage then extracts the knowledge,
which must be post-processed to facilitate human understanding.
Post-processing usually takes the form of representing the discovered
knowledge in a user-friendly display.
Figure 2.3 KDD process model (stages: Data Selection, Data
pre-processing, Data Mining, Pattern evaluation, Knowledge)
2.5.2. History of data mining.
The term "data mining" was introduced in the 1990s, but data
mining is the evolution of a field with a long history [17]. Its roots
trace back along three family lines: statistics, artificial intelligence [85],
and machine learning [80], as shown in Figure 2.4.
Figure 2.4 Data Mining and Associated Fields
Statistics is the foundation of many of the technologies on which data
mining is built, e.g. regression analysis, standard distributions, standard
deviation, variance, discriminant analysis, cluster analysis, and
confidence intervals. All of these are used to study data and data
relationships.
Artificial intelligence (AI) is built upon heuristics, in contrast to
statistics; it tries to apply human-thought-like processing to statistical
problems. Certain AI concepts were adopted by some high-end
commercial products, such as query optimization modules for
relational database management systems.
Machine learning (ML) [141] is the combination of statistics and
AI. It could be considered an evolution of AI, because it blends AI
heuristics with advanced statistical analysis. Machine learning attempts
to let computer programmes learn about the data they study, so that
the programmes make different decisions based on the qualities of the
studied data, using statistics for fundamental concepts and adding
more advanced AI heuristics and algorithms to achieve their goals.
Data mining is the adaptation of machine learning techniques to
business applications. It is best described as the union of historical
and recent developments in statistics, AI, and ML, used together to
study data and find patterns, rules and hidden trends. In its early days,
data mining algorithms were developed mainly for numerical data, but
they were later extended to all types of data, such as text, web, image,
multimedia and spatial data. Similarly, data mining began with the
analysis of single databases, but its techniques have since evolved to
cover flat files, traditional and relational databases and data
warehouses. Later, with the confluence of statistics and machine
learning techniques, various algorithms evolved to mine structured
and unstructured data.
The field of data mining [147] has been greatly influenced by the
development of fourth-generation programming languages and various
related computing techniques. In the early days of data mining, most
algorithms employed only statistical techniques. Later, they evolved
with various computing techniques such as AI, ML and pattern
recognition. Various data mining techniques (induction, compression
and approximation) and algorithms were developed to mine the large
volumes of heterogeneous data stored in data warehouses. The field
has kept growing due to its enormous success in terms of scientific
progress, understanding, and broad-ranging application achievements.
Data mining applications have been successfully implemented in
domains such as financial analysis, customer management, health
care, retail, telecommunication, fraud detection and risk analysis. The
ever-increasing complexity of various fields and improvements in
technology have posed new challenges to data mining, including
different data formats, data from disparate locations, advances in
computation and networking resources, research and scientific fields,
and ever-growing business challenges.
2.5.3. Data mining functionality
Data mining is the extraction of interesting patterns or knowledge
from huge amounts of data, and various functionalities are available
for this extraction. Data mining searches for non-trivial and implicit
patterns in data; these patterns are mostly previously unknown but
potentially useful. Data mining offers various types of functionality, and
a specific functionality is selected depending on the application area
and the kind of knowledge to be mined. Using these functionalities,
different types of knowledge can be mined, such as association rules,
classification rules, discriminant rules and deviation analysis. Data
mining functionalities [104] are extensive and rich; they can serve
various fields and applications.
Figure 2.5 shows basic functionalities such as classification,
clustering, frequent pattern mining and outlier analysis. These
functionalities are explained below.
Figure 2.5 Data mining functionalities
• Characterization and Discrimination
Data characterization [147] is a summarization of the general
characteristics or features of a target class of data; the summarization
is done based on the user’s specific requirement, and the data is
usually collected by a query. In data discrimination, the target-class
data objects are compared with the objects from one or more
contrasting classes with respect to specified generalized
features [31][39].
• Mining frequent patterns
Frequent patterns [80] are patterns that occur frequently in
the data. Patterns can include itemsets, sequences and
subsequences. A frequent itemset refers to a set of items that often
appear together in a transactional data set.
Given a collection of items and a set of records, each of which
contains some number of items from the collection, an association
function is an operation against this set of records which returns
affinities or patterns that exist among the collection of items. These
patterns can be expressed by rules such as "80% of all the records that
contain items A, B and C also contain items D and E." The specific
percentage of occurrences (in this case 80) is called the confidence
factor of the rule. In this rule, A, B and C are said to be on the
opposite side of the rule to D and E. Associations can involve any
number of items on either side of the rule.
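The confidence factor above can be computed directly by counting records; the transaction records below are made up purely to reproduce the 80% example:

```python
# Sketch of computing the confidence of the association rule
# {A, B, C} -> {D, E} over a set of transaction records.

def confidence(records, antecedent, consequent):
    """Fraction of records containing the antecedent that also
    contain the consequent."""
    matching = [r for r in records if antecedent <= r]
    if not matching:
        return 0.0
    both = [r for r in matching if consequent <= r]
    return len(both) / len(matching)

records = [
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E", "F"},
    {"A", "B", "C"},          # contains the antecedent but not D, E
    {"D", "E"},               # ignored: antecedent absent
]

print(confidence(records, {"A", "B", "C"}, {"D", "E"}))  # -> 0.8
```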
• Classification and prediction
Classification [71] techniques in data mining are capable of processing
large amounts of data. Classification assigns the items in a data set to
target categories or classes; the goal is to correctly predict the target
class for each case in the data.
Classification consists of assigning a class label to a set of
unclassified cases. Because the class label of each training tuple is
provided, this step is also known as supervised learning.
Classification techniques infer a model from the database. The
database contains attributes that denote the class of a tuple,
known as predicted attributes, whereas the remaining
attributes are called predicting attributes. A combination of values for
the predicted attributes defines a class.
When learning classification rules, the system has to find the
rules that predict the class from the predicting attributes. First the
user defines the conditions for each class; the data mining system then
constructs descriptions for the classes. Essentially, given a case or
tuple with certain known attribute values, the system should be able to
predict which class the case belongs to.
Once classes are defined, the system should infer the rules that
govern the classification; that is, it should be able to find the
description of each class. The descriptions should refer only to the
predicting attributes of the training set, so that the positive examples
satisfy the description and none of the negative examples do. A rule is
said to be correct if its description covers all the positive examples and
none of the negative examples of a class.
There are various data mining classification techniques usable for
classification and prediction, such as decision tree based methods,
rule-based methods, Naïve Bayes and Bayesian belief networks,
nearest-neighbour methods, neural networks, Support Vector
Machines [61], and ensemble methods. Figure 2.6 shows classification
using a decision tree.
Figure 2.6 Classification using decision tree
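A decision tree in the spirit of Figure 2.6 can be sketched as a chain of attribute tests; the features, thresholds and class labels below are hypothetical, chosen only to illustrate how a tree classifies a tuple:

```python
# A minimal hand-built decision tree for labelling connection records.
# Each internal node tests one predicting attribute; each leaf is a class.

def classify(conn):
    if conn["failed_logins"] > 3:
        return "attack"            # repeated authentication failures
    if conn["duration"] > 300:
        if conn["bytes_sent"] > 1_000_000:
            return "attack"        # long connection moving a lot of data
        return "normal"
    return "normal"

print(classify({"failed_logins": 5, "duration": 10, "bytes_sent": 100}))
# -> attack
print(classify({"failed_logins": 0, "duration": 20, "bytes_sent": 500}))
# -> normal
```

In practice, of course, the tests and thresholds are not hand-written but induced from labelled training tuples.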
• Clustering
Clustering [117] and segmentation are the processes of creating
a partition so that all the members of each set of the partition are
similar according to some metric. Clustering is an unsupervised
technique: classes or categories are not predefined. Instead, a set of
objects is grouped together because of their similarity or proximity.
When learning is unsupervised, the system has to discover its own
classes, i.e. the system clusters the data in the database. It has to
discover subsets of related objects in the training set and then find
descriptions that describe each of these subsets.
Objects are often decomposed into an exhaustive and/or
mutually exclusive set of clusters. Clustering [71] according to similarity
is a very powerful technique; the key is to translate some intuitive
measure of similarity into a quantitative measure. There are a number
of approaches to forming clusters. One approach is to form rules which
dictate membership in the same group based on the level of similarity
between members. Another is to build set functions that measure
some property of partitions as functions of some parameter of the
partition. Figure 2.7 shows the clustering data mining functionality.
Figure 2.7 Clustering
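The partitioning idea can be sketched with a bare-bones k-means procedure; the one-dimensional data and the deterministic initialization from the first k points are simplifying assumptions (real implementations handle more dimensions and better initialization):

```python
# A minimal k-means clustering sketch: repeatedly assign each point to
# its nearest centroid, then move each centroid to its cluster's mean.

def kmeans(points, k, iterations=10):
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        for i in range(k):
            if clusters[i]:
                centroids[i] = sum(clusters[i]) / len(clusters[i])
    return centroids, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = kmeans(points, k=2)
print(sorted(round(c, 1) for c in centroids))  # -> [1.0, 9.5]
```

Here the "intuitive measure of similarity" mentioned above is made quantitative as the absolute distance to a cluster centroid.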
• Outlier analysis
Outliers [71] are data objects that do not comply with the general
behaviour or model of the data. Outliers (if present in a dataset) are
usually discarded before processing through other data mining
functionalities, as they usually represent exceptions or noise. Figure
2.8 shows outlier analysis; R represents a data object that is an outlier
from the rest of the data.
Figure 2.8 Outlier Analysis
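A simple statistical outlier check in this spirit flags points far from the mean; the data values and the two-standard-deviation threshold are illustrative assumptions:

```python
# Flag points whose distance from the mean exceeds a few standard
# deviations (the threshold is chosen per dataset in practice).
import statistics

def outliers(data, threshold=2.0):
    mean = statistics.mean(data)
    stdev = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) > threshold * stdev]

data = [10, 12, 11, 13, 12, 11, 90]   # 90 plays the role of R in Figure 2.8
print(outliers(data))  # -> [90]
```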
Data mining functionalities cover a wide range of applications;
however, there is a need for new functionalities. Data mining research
can provide new functionalities that serve many application areas
efficiently. Research in data mining has multiple aspects which, if
handled properly, work effectively.
2.6 Evaluation of Datasets
Most intrusion detection techniques beyond basic pattern
matching require sets of data to train on. When work on advanced
network intrusion detection systems began in earnest in the late
1990s, researchers quickly recognized the need for standardized
datasets to perform this training.
Such datasets allow different systems to be quantitatively
compared. Further, they provide a welcome alternative to the prior
method of dataset creation, which involved every researcher collecting
data from a live network and using human analysts to thoroughly
analyze and label the data. The first such widely cited dataset was for
the Information Exploration Shootout (IES), which, unfortunately, is no
longer available. It was used to test the anomaly-detection
performance of participating systems. It consisted of four collections of
tcpdump data: one that contained purely normal data, and three
consisting of normal data with injected attacks.
The data was apparently captured from a real network, and
consists of only the packet headers in order to protect the privacy of
the users. In one of the early papers from Lee and Stolfo [92], they
noted the anticipated arrival of a new dataset from the Air Force’s
Research Laboratory (AFRL) in Rome, NY.
The AFRL, along with MIT’s Lincoln Lab, collected network traffic
from their network and used it as the basis for a simulated network.
Using a simulated network allowed them to carefully control if and
when attacks were injected into the dataset. Furthermore, it allowed
them to collect the entire packet without needing to protect user
privacy. Details on the simulated networks and injected attacks are
available in Kendall [84]. They used the simulated network to create a
couple of weeks of intrusion-free data, followed by a few weeks of data
labelled with intrusions. This data was made available to researchers
in 1998 as the DARPA Off-line Intrusion Detection Evaluation.
Participants were then given two weeks of unlabelled data, including
previously unseen attacks, and asked to label the attacks. Lippmann
presented the results in [95].
Numerous researchers have used this data to test their systems,
both as part of the DARPA evaluation and independently. In
response to the 1998 challenge, McHugh wrote a rather scathing
critique of the evaluation. While he presents many good points on how
an evaluation of IDSs should be performed, he also criticizes
numerous shortcomings in the challenge without acknowledging how
difficult addressing some of the issues is.
For example, he notes that the generated data was not validated
against real traffic to ensure that it had the same rates of malicious
traffic versus non-malicious anomalous traffic that caused false
positives. Doing so would, however, require more insight into real
traffic than we can possibly obtain (in particular, intent); further,
modelling of traffic at that scale is still an area with much research left
to be done.
Some of his more directly applicable feedback was used for the
IDS challenge the following year. In particular, Das [30] outlines the
improvements that were made in the test bed and injected attacks, and
provides details on the addition of Windows NT hosts and attacks in
the 1999 evaluation.
While McHugh’s critique was based primarily on the procedures
used to generate the DARPA data, Mahoney [25] provides a critique
based on an analysis of the data compared to real world data captured
on their network. They note that many of the attributes that are well
behaved in the DARPA dataset are not in real world data. They found
that by mixing their real-world data with the DARPA data, they were
able to increase the number of legitimate detections (detections that
were not an artefact of the data generation process), using five simple
statistically based anomaly detectors.
While this approach is an excellent stop-gap measure to achieve
a more realistic performance measure using the DARPA data, it is not
suitable for all research for two reasons:
i. It requires the addition of attack-free (or at least accurately
labeled) real-world data, which no one is willing to share for use
as a standard;
ii. It requires that the method not differentiate between the DARPA
data and the real-world data, which might be controllable for
some methods (particularly those that produce human readable
rules), but not for others (such as artificial neural networks and
hidden Markov models).
To address the first point, Mahoney [30] analyzed their real-world
data with Snort; however, they do not address the possibility of the
data containing new or stealthy attacks that Snort is incapable of
detecting (and which drive the development of more advanced
intrusion detection techniques).
Lee did a great deal of analysis using the DARPA data, and
identified 41 features of interest to a data mining based network IDS.
He provided a copy of the DARPA data that was already pre-processed,
by extracting these 41 features, for the 1999 KDD Cup
contest, held at the Fifth ACM International Conference on Knowledge
Discovery and Data Mining. Since this version of the dataset already
has the tedious and time-consuming pre-processing step done, it has
been used as the basis for most of the recent research on data mining
IDSs.
There are a couple of other datasets that are used occasionally.
The first is the Internet Traffic Archive from Lawrence Berkeley
National Laboratory [67].
It consists of a collection of tcpdump data captures from a live
network on the Internet. It has been used by [92], primarily to show that
data mining methods are sensitive to traffic patterns, such as the
difference in traffic between working hours and overnight. Another
dataset is Security Suite 16, which was created by InfoWorld to test
commercial network intrusion detection systems [101].
As an alternative to using datasets, such as those described
above, Eskin [42] presents an intrusion detection approach that does
not require training data. Rather, it separates the normal data and the
noisy data (anomalies) into two separate sets using a mixture model.
This model can then be applied for anomaly detection. The technique
can also be applied to a dataset that has been manually labeled, in
order to detect marking errors [42].
Of all the datasets presented here, the DARPA/KDD dataset
appears to be the most useful as a dataset that can be used without
any further processing. Unfortunately, given the criticisms against this
data, we recommend that any further research in this area use both the
DARPA datasets and one of the DARPA datasets mixed with real-world
data. Doing so, and being able to compare and contrast the
results, should help alleviate most of the criticism against work based
solely on the DARPA data, while still allowing work to be directly
compared. Ideally, someone using a mixed dataset will make their
real-world data available for everyone to use. This approach will
necessitate the regeneration of connection records, as the KDD Cup
data only processed the 1998 DARPA data and obviously does not
include any new data that may be mixed in. Finally, a couple of
observations on dataset utilization:
First, the typical approach to using datasets is to have some
normal (intrusion-free) data and/or data with labeled intrusions, which
is used to train the data mining methods being applied. None of the
literature, however, explicitly discusses the use of separate training
sets for meta-classifiers and the base classifiers they incorporate. It
would probably be useful to train the meta-classifier using attacks the
base classifiers have not already seen, so that the meta-classifier can
give proper weight to base classifiers that do a good job of detecting
previously unseen techniques.
Second, we have noticed a disturbing trend in some published
research of modifying a standard dataset because the researchers do
not believe it accurately models real Internet traffic; for instance, they
believe that it has too many or too few attacks. This is unfortunate, as it
precludes a quantitative comparison of their research with other work.
Further, as a community we lack any solid statistics on traffic
characteristics in different environments, so the use of modified
data implies that the given technique is not robust enough to perform
well on different or dynamic networks.
2.7 Feature Selection
The most popular data format to analyze is the connection
log. Compared with other log formats (such as packet logs), the
connection record format affords more power in the data analysis step,
as it provides multiple fields on which correlation can be done (unlike a
format such as command histories).
Additionally, not examining data stream contents saves
significant amounts of processing time and storage, and avoids privacy
issues. While some have argued that not looking at the data stream will
prevent the detection of user-to-root (U2R) attacks, some of these
attacks will evade detection in any case, as attackers will modify the
network stream precisely to avoid IDS detection, as described in [117].
Lee and Neri [92] found that converting the network data to connection
logs aided the performance of their data mining techniques.
When connection logs are built from packet information, certain
features, such as the state of the connection establishment and
teardown, overlapping fragments, and the resend rate, will need to be
calculated [92].
Connection records provide numerous features that are intrinsic
to each connection. Lee noted [60] that the timestamp, source address
and port, destination address and port, and protocol uniquely identify a
connection, making them essential attributes.
They go on to note that “association rules should describe
patterns related to the essential attributes.” Specifically, at least one of
those attributes must be present in the antecedent of a rule in order for
that rule to be useful. They call this the axis attribute for the rule. For
example, a rule that is based solely on the number of bytes transferred
really does not convey any useful information.
Likewise, if the value of some feature must be kept constant
through the processing of a set of records (for instance, the destination
host), that feature is called a reference attribute [92]. Other researchers
have also had success with this approach. Dickerson [60] found that
their best results were achieved when they limited their rules to a key
consisting of the source IP, destination IP, and destination port.
Hofmeyr and Forrest [13] used the same approach, although
they assign all connections with unassigned privileged ports to one
service group, and all connections with unassigned non-privileged
ports to another group.
Essential attributes provide vital information about connections;
most research also uses some of the secondary attributes, such as
connection duration, TCP flags and the volume of data passed in each
direction. Some researchers, such as Dickerson and Dickerson, also
treat the essential attributes that they do not key off, such as the
timestamp and source port, as secondary attributes. Perhaps the most
interesting data point is the work of Singh and Kandula [95], who did
not report the use of any essential attributes, despite their work being
based heavily on that of Lee [144]. This may account for their poor
performance, which greater care in choosing connection features
might have avoided.
Unfortunately, the intrinsic attributes of a connection are
insufficient to provide adequate detector performance against most
attacks. Including temporal information with each data point
significantly increases accuracy. Temporal information is captured in
the form of calculated attributes. A calculated attribute provides the
average value of an attribute, or the count or percentage of
connections fulfilling some criteria, over the last w seconds or n
connections.
For example, Lee [92] included a count of how many packets in
the last w seconds had the same value for an attribute as the current
connection to the same service or destination host. This formalized the
notion of defining calculated features as functions of the other features,
using a predefined set of operators such as count, percentage, or
average, as well as a set of data constraints such as same host, same
service, different host, or time window.
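One such calculated attribute, a same-destination-host count over a time window, can be sketched as follows; the record layout and window size are assumptions for illustration:

```python
# For each connection, count how many connections in the preceding
# w seconds went to the same destination host.

def same_host_count(connections, w=2.0):
    """connections: list of (timestamp, dst_host), sorted by timestamp."""
    counts = []
    for i, (t, host) in enumerate(connections):
        n = sum(1 for (t2, h2) in connections[:i]
                if t - t2 <= w and h2 == host)
        counts.append(n)
    return counts

conns = [(0.0, "A"), (0.5, "A"), (1.0, "B"), (2.2, "A"), (2.4, "A")]
print(same_host_count(conns))  # -> [0, 1, 0, 1, 2]
```

A production system would use an incremental sliding window rather than rescanning the history for each connection.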
Honig [34] extended this approach by allowing the analyst to
dynamically create new features using these functions. A new column
is automatically added to the table to store the new feature in the
database.
Lee [92] explained that the decision to count the occurrences of
a given attribute’s value is made when many frequent episode rules
are generated that include the given feature with a constant value.
Likewise, they generate an average value for an attribute if that
attribute appears in many frequent episode rules with different
values.
There are numerous techniques to identify which of the
secondary or calculated attributes provide the best feature set for a
given method. Frank [108] used backwards sequential search, beam
search, and random generation plus sequential selection.
Lee and Xiang [92] do an excellent job of applying information-
theoretic measurement techniques to feature sets in order to evaluate
the relative utility of different sets (based on some earlier work by Lee).
The measures they use are entropy, conditional entropy, relative
conditional entropy, information gain, and information cost. The
concept of time is particularly problematic for IDSs to handle, both in
terms of correlating events over time and of behaviour that changes
over time.
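Two of the information-theoretic measures mentioned above, entropy and information gain, can be sketched as follows; the toy connection records are made-up illustrations:

```python
# Entropy of a label distribution, and the information gain of a feature
# (entropy of the labels minus the conditional entropy given the feature).
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, feature, label):
    labels = [r[label] for r in rows]
    cond = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[label] for r in rows if r[feature] == value]
        cond += (len(subset) / len(rows)) * entropy(subset)
    return entropy(labels) - cond

rows = [
    {"service": "http",   "class": "normal"},
    {"service": "http",   "class": "normal"},
    {"service": "telnet", "class": "attack"},
    {"service": "telnet", "class": "attack"},
]
print(entropy([r["class"] for r in rows]))          # -> 1.0
print(information_gain(rows, "service", "class"))   # -> 1.0 (fully informative)
```

A feature with high information gain is a strong candidate for inclusion in the feature set, which is precisely how these measures rank competing sets.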
The ability to correlate events over time is useful, particularly for
identifying regular activity, such as an automated process that transfers
files at a specific time every night. Such behaviour may be expected
and safely ignored, or it may indicate activity worth
investigation, as it may be from a Trojan horse or other form of malicious
code.
To address this problem, Li [22] developed the notion of a
calendar schema. These calendar schemas build temporal profiles,
which allow the mined induction rules to use multiple time granularities.
The other problem that time presents is that the behaviour of monitored
networks will change over time. Because of this, the profiles that
characterize the network will need to incorporate new behaviour and
age out old behaviour.
The manner in which this is accomplished is necessarily tied to
the underlying data model. For instance, IDES and NIDES
accomplished this by periodically updating their statistical models,
multiplying in an exponential decay factor when adding in the currently
observed values for an attribute [69].
For the inductive rules used by Lee [22], a new rule set was
created for each day’s data, and then merged with the existing rule set.
By keeping track of how often a rule appeared in a daily rule set, and
when a rule last appeared, they could ascertain the relevance of a rule
and age out old rules. Some of the techniques described below,
particularly those that rely on a mapping between the network
connection records and a geometric space (hyper-plane), only produce
optimal, or even usable, results if the features in the records are first
normalized. This is typically accomplished by scaling continuous
values to a given range, possibly scaling the values with a logarithmic
scale to avoid having large values (typically seen in distributions of
attributes of long-tailed network data) dominate smaller values [152].
Discrete values are typically mapped to their own features, to
coordinates that are equidistant from one another, or represented
based on their frequency [23]. A similar problem is presented by zeros
in the dataset, as features with an observed value of zero may either
actually be zero, or they may be zero due to a lack of observations. To
address that problem Barbará [13] applied pseudo Bayes estimators to
refine the zero values in their training data.
Chan [23] addressed the same problem in association rules by
using a probability of novel events based on the frequency of rules
supporting the antecedent in the training set. They also looked at
Laplace smoothing, but found it inappropriate, as it required
the alphabet sizes and distributions to be known at training time.
Another technique that can be applied to the dataset to improve
accuracy is compression. Neri [13] found that compressing features, by
representing many discrete values with a single value is, “a valuable
way of increasing classification performances without introducing
complex features that may involve additional processing overhead.”
Barbará [13] applied feature compression by grouping together
connections that come from the same domain (subnet) in order to
detect activity coming from a highly coordinated group of hosts.
The information-theoretic work by Lee and Xiang [92] explains
that substituting a single record to represent a group of records (such
as all those in the past w seconds for a given service), significantly
increases the information gain (which should subsequently improve the
accuracy of detection methods on that data). Singh and Kandula [131]
note that the features they chose were based purely on heuristics and
that, “It would be really useful if the choice of these features could be
automated.”
Helmer [60] did exactly that with system call data using the “bag
of words” technique, where every call was represented by a bit in a
vector labelled as normal or intrusive. They then fed these vectors to a
genetic algorithm and found that the set of necessary features was
about half of the full set of available features. Using the pruned set
resulted in comparable detection accuracy and reduced the false
positive rate to zero.
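A minimal sketch of the "bag of words" encoding described above (the system-call vocabulary and traces below are invented for illustration; the genetic-algorithm pruning step is omitted):

```python
def bag_of_words(trace, vocabulary):
    """Encode a system-call trace as a bit vector: bit i is 1 when
    vocabulary[i] occurs anywhere in the trace, else 0."""
    present = set(trace)
    return [1 if call in present else 0 for call in vocabulary]

# Hypothetical system-call vocabulary and example traces.
VOCAB = ['open', 'read', 'write', 'close', 'execve', 'chmod']
normal_vec = bag_of_words(['open', 'read', 'close'], VOCAB)
intrusive_vec = bag_of_words(['open', 'execve', 'chmod'], VOCAB)
```

Each labelled vector then becomes one training example for the feature-selection search.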
2.8 Summary
In this chapter, literature reviews of previous works were
discussed: classification of intrusion detection systems, types of
protected systems, IDS data processing techniques, data mining and
knowledge discovery, evaluation of datasets, and feature selection.
The advantages and limitations of the previous works were also
discussed.
CHAPTER 3
METHODOLOGY & DATABASE
3.1 The DARPA Intrusion-Detection Evaluation Program
The intrusions to be found in computer and network audit data
are plentiful as well as ever-changing. They are also thoroughly
scattered, and attempts to structure or catalogue audit data are
extremely effort-intensive. In order to create effective detection
models, model-building algorithms typically require a large amount of
labelled data. One major difficulty in deploying IDS is the need to label
system audit data for the algorithms. Misuse-detection systems need
the data to be accurately labelled as either ‘normal’ or ‘attack’, whereas
for anomaly-detection systems, the data must be verified to ensure that
it is exclusively ‘normal’, namely attack-free. This requires the same
effort [40, 90], and preparation of the data in this manner is both time-
consuming and costly.
A generous sponsor for the production of intrusion-detection
audit data was found in the US government agency DARPA (Defense
Advanced Research Projects Agency, US). An innovator and promoter of
technology, this organization has funded many projects in the last few
decades. In 1969, one such research and development project was
subsidized ‘to create an experimental packet-switched network’. This
one venture saw the modest beginnings of what grew into the
omnipresent Internet, known today. As a matter of fact, DARPA
supports the evaluation of developing technologies: focusing on an
effort, documenting existing capabilities and guiding research.
The 1998 DARPA Off-line Intrusion-Detection Evaluation
Program [94, 103, 75] was one such project. Aware of the lack of
suitable audit data sets for intrusion detection, DARPA set out (1) to
generate an intrusion-detection evaluation corpus which could be
shared by many researchers, (2) to evaluate many intrusion-detection
systems, (3) to include a wide variety of attacks and (4) to measure
both attack-detection rates and false-alarm rates for realistic normal
traffic. To avoid publicizing confidential information concerning any real
network in connection with the data and in order not to cause
disruption in the operation of an on-line network, an extensive test bed
was set up at MIT’s Lincoln Laboratory to synthesize the data.
This test bed simulated the operation of a typical US Air Force LAN for
over two months, allowing a considerable amount of audit data to be
collected from it.
3.2 Attack Types in the 1999 DARPA Data Set
Each attack type falls into one of the four following main categories:
• Denial-of-service (DOS)
DOS attacks have the goal of limiting or denying service(s)
provided to a user, computer or network. A common tactic is to
severely overload the targeted system, as in a SYN flood.
• Probing or surveillance
Probing or surveillance attacks have the goal of gaining
knowledge of the existence or configuration of a computer system or
network. Port scans or sweeping of a given IP address range are
typically used in this category, as in IPsweep.
• Remote-to-Local (R2L)
R2L attacks have the goal of gaining local access to a computer
or network to which the attacker previously had only remote access.
Examples of this are attempts to gain control of a user account, such
as the Dictionary attack.
• User-to-Root (U2R)
U2R attacks have the goal of gaining root or super-user access
on a particular computer or system on which the attacker previously
had user-level access. These are attempts by a non-privileged user
to gain administrative privileges (e.g. Eject). A total of 24 attack types
was included in the training data and a further 14 novel attacks were
added to the test data, to compare the performance of IDS on ‘known’
and on ‘yet-unseen’ attacks. A further aim of the evaluation was to
determine whether systems could detect stealthy attacks. These are
variations of an attack. They have been modified from the standard
form available on the Internet, in an attempt to evade detection.
Methods of being stealthy vary, depending on the attack type [84]. The
attacks are grouped according to a category and type. The number of
occurrences is detailed, distinguishing between attacks launched in the
clear and those performed stealthily, and specifying whether each
appeared in the training or the test data. For example, there were 46 Eject
attacks in the simulation. Of these, 10 were stealthy and 36 were
performed in the clear. Of those in the clear category, 29 figured in the
training data and 7 in the test data. In the DARPA programmes,
detection rates for each attack category were estimated for
comparative purposes, when evaluating the performance of IDS.
3.2.1 Different Attack Types
The category of an attack is determined by its ultimate goal, so
that within a given category, attacks may closely resemble each other.
The DOS attacks are designed to disrupt a host or network service.
Some DOS attacks (e.g. smurf) excessively load a legitimate network
service; others (e.g. teardrop, Ping of Death) create malformed packets,
which are incorrectly handled by the victim machine. Others still (e.g.
apache2, back, syslogd) take advantage of software bugs in network
daemon programmes. Probe attacks are launched by programmes,
which can automatically scan a network of computers to gather
information or find known vulnerabilities. Such probes are often
precursors to more dangerous attacks because they provide mapping
to machines and services and pinpoint weak links in a network. Some
of these scanning tools (satan, saint and mscan) enable even an
unskilled attacker to check hundreds of machines on a network for
known vulnerabilities.
In the R2L attacks, an attacker who does not have an account
on a victim machine sends packets to that machine and gains local
access. Some R2L attacks exploit buffer overflows in network server
software (e.g. imap, named, sendmail); others exploit weak or
misconfigured security policies (e.g. dictionary, ftp-write, and guest)
and one (xsnoop) is a Trojan password capture programme. The
snmp-get R2L attack against the router is a password-guessing attack
where the community password of the router is guessed and an
attacker then uses SNMP to monitor the router. During U2R attacks, a
local user on a machine tries to obtain privileges normally reserved for
the UNIX root or super-user. Some U2R attacks exploit poorly-written
system programmes which run at root level and are susceptible to
buffer overflows (e.g. eject, ffbconfig, fdformat). Others may exploit
weaknesses in path-name verification (e.g. loadmodule), bugs in some
versions of perl (e.g. suidperl) or other software weaknesses.
3.2.2 Attack Descriptions
back - Denial-of-service attack against apache webserver,
where a client requests a URL containing many
backslashes.
dict - Guess passwords for a valid user, using simple
variants of the account name over a telnet connection.
eject - Buffer overflow using eject program on Solaris. Leads
to a user-to-root transition if successful.
ffb - Buffer overflow using the ffbconfig UNIX system
command leads to root shell.
format - Buffer overflow using the fdformat UNIX system
command leads to root shell.
ftp-write - Remote FTP user creates .rhost file in world-
writable anonymous FTP directory and obtains local
login.
guest - Try to guess password via telnet for guest account.
ipsweep - Surveillance sweep performing either a port sweep or
ping on multiple host addresses.
land - Denial of service where a remote host is sent a spoofed
packet with the same source and destination address and port.
loadmodule - Non-stealthy load module attack which resets IFS for a
normal user and creates a root shell.
multihop - Multi-day scenario in which a user first breaks into one
machine.
neptune - Syn-flood denial-of-service on one or more ports.
nmap - Network mapping using the nmap tool. Modes of
exploring the network vary; options include SYN scanning.
perlmagic - Perl attack which sets the user id to root in a perl script
and creates a root shell.
phf - Exploitable CGI script which allows a client to execute
arbitrary commands on a machine with a misconfigured
web server.
pod - Denial-of-service ping-of-death.
portsweep - Surveillance sweep through many ports to determine
which services are supported on a single host.
rootkit - Multi-day scenario where a user installs one or more
components of a rootkit.
satan - Network probing tool which looks for well-known
weaknesses. It operates at three different levels;
level 0 is light.
smurf - Denial-of-service icmp-echo reply flood.
spy - Multi-day scenario in which a user breaks into a
machine with the purpose of finding important
information where the user tries to avoid detection.
Uses several different exploit methods to gain access.
syslog - Denial of service for the syslog service; connects to
port 514 with an unresolvable source IP.
teardrop - Denial of service where mis-fragmented UDP packets
cause some systems to reboot.
warez - User logs into anonymous FTP site and creates a
hidden directory.
warezclient - Users downloading illegal software which was
previously posted via anonymous FTP by the warez
master.
warezmaster - Anonymous FTP uploads of Warez (usually illegal
copies of copyrighted software) onto FTP server.
3.3 Data-Set Description
This is the data set used for The Third International Knowledge
Discovery and Data Mining Tools Competition, which was held in
conjunction with KDD-99 the Fifth International Conference on
Knowledge Discovery and Data Mining. The competition task was to
build a network intrusion detector, a predictive model capable of
distinguishing between ``bad'' connections, called intrusions or attacks,
and ``good'' normal connections. This database contains a standard
set of data to be audited, which includes a wide variety of intrusions
simulated in a military network environment.
The ‘KDDCUP99 Data’ [66] are the data sets, which were issued
for use in the KDDCUP ’99 Classifier-Learning Competition. These
sets of training and test data were made available [137, 91] and
consisted of a pre-processed version of the 1998 DARPA Evaluation
Data, prepared by a team whose IDS had performed particularly well in
the Intrusion-Detection Evaluation Program of that year, using data
mining as a ‘pre-processing’ stage to extract characteristic intrusion
features from raw TCP/IP audit data. The original raw training data were about four
gigabytes of compressed binary tcpdump data obtained from the first
seven weeks of network traffic at MIT. This was pre-processed with the
feature-construction framework MADAM ID (Mining Audit data for
automated models for Intrusion Detection) to produce about five-million
connection records. A connection is defined to be a sequence of TCP
packets starting and ending at some well-defined times, between which
data flow back and forth between a source IP address and a destination IP
address, under some well-defined protocol. Each connection is labelled
as either ‘normal’ or with the name of its specific attack type. A
connection record consists of about 100 bytes. Ten percent of the
complementary two weeks of test data were, likewise, pre-
processed to yield a further slightly under half a million connection records.
For the information of contestants, it was stressed that these test data
were not from the same probability distribution as the training data, and
that they included specific attack types which are not found in the
training data. The full amount of labelled test data with some two
million records was not included in this data set.
3.3.1 Set of Features Used in the Connection Records
In the KDDCUP99 Data, the initial features extracted for a
connection record [41, 89] include the basic features of an individual
TCP connection, such as: its duration, protocol type, number of bytes
transferred and the flag indicating the normal or error status of the
connection. These ‘intrinsic’ features provide information for general
network-traffic analysis purposes. Since most DOS and Probe attacks
involve sending a lot of connections to the same host(s) at the same
time, they can have frequent sequential patterns, which are different to
the normal traffic. For these patterns, a “same host” feature examines
all other connections in the previous 2 seconds, which had the same
destination as the current connection. Similarly, a “same service”
feature examines all other connections in the previous 2 seconds,
which had the same service as the current connection. These temporal
and statistical characteristics are referred to as the “time based” traffic
features. There are several Probe attacks which use a much longer
interval than 2 seconds (for example, one minute) when scanning the
hosts or ports. For these, a mirror set of “host-based” traffic features
were constructed based on a ‘connection window’ of 100 connections.
The R2L and U2R attacks are embedded in the data portions of the
TCP packets and may involve only a single connection. To detect
these, ‘content’ features of individual connections were constructed
using domain knowledge. These features suggest whether the data
contains suspicious behaviour, such as the number of failed logins,
whether the login succeeded, whether the user logged in as root, whether a root
shell is obtained, etc. In total, there are 42 features (including the
attack type) in each connection record, with most of them taking on
continuous values. The individual features are listed and briefly
described in Tables 3.3 to 3.6. Tables 3.1 and 3.2 show the different types of
attacks and their categories:
Table 3.1 Class Labels that Appear in the Full KDDCUP99 Dataset
Category KDD Cup 99 Full Dataset After Removing Duplicate Samples % Rate of Reduction Dataset Class
Normal 972781 812814 16.44 NORMAL
Back 2203 968 56.06 DOS
Pod 264 206 21.97 DOS
Land 21 19 9.52 DOS
Smurf 2807886 3007 99.89 DOS
Teardrop 979 918 6.23 DOS
Neptune 1072017 242149 77.41 DOS
Nmap 2316 1554 32.90 PROBE
Satan 15892 5019 68.42 PROBE
Ipsweep 12481 3723 70.17 PROBE
Portsweep 10413 3564 65.77 PROBE
Phf 4 4 0.00 R2L
Guess_pwd 53 53 0.00 R2L
Ftp_write 8 8 0.00 R2L
Imap 12 12 0.00 R2L
Spy 2 2 0.00 R2L
Multihop 7 7 0.00 R2L
Warezclient 1020 893 12.45 R2L
Warezmaster 20 20 0.00 R2L
Buffer_Overflow 30 30 0.00 U2R
Loadmodule 9 9 0.00 U2R
Perl 3 3 0.00 U2R
Rootkit 10 10 0.00 U2R
Total 4898431 1074992 78.05%
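The duplicate-removal figures in these tables follow directly from set-based de-duplication; a small sketch (the record representation as hashable tuples is an assumption):

```python
def deduplicate(records):
    """Remove exact duplicates, keeping the first occurrence of each record.
    Records must be hashable, e.g. tuples of feature values."""
    seen, unique = set(), []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique

def pct_reduction(before, after):
    """Percentage rate of reduction after duplicate removal."""
    return 100.0 * (before - after) / before
```

For instance, the Back row of Table 3.1 (2203 records reduced to 968) corresponds to a 56.06% reduction.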
Table 3.2 Class Labels that Appear in 10% KDDCUP99 Dataset
Category KDD Cup 99 10% Dataset After Removing Duplicate Samples % Rate of Reduction Dataset Class
Normal 97278 87832 9.71 NORMAL
Back 2203 968 56.06 DOS
Pod 264 206 21.97 DOS
Land 21 19 9.52 DOS
Smurf 280790 641 99.77 DOS
Teardrop 979 918 6.23 DOS
Neptune 107201 51820 51.66 DOS
Nmap 231 158 31.60 PROBE
Satan 1589 906 42.86 PROBE
Ipsweep 1247 651 47.79 PROBE
Portsweep 1040 416 60.00 PROBE
Phf 4 4 0.00 R2L
Guess_pwd 53 53 0.00 R2L
Ftp_write 8 8 0.00 R2L
Imap 12 12 0.00 R2L
Spy 2 2 0.00 R2L
Multihop 7 7 0.00 R2L
Warezclient 1020 893 12.45 R2L
Warezmaster 20 20 0.00 R2L
Buffer_Overflow 30 30 0.00 U2R
Loadmodule 9 9 0.00 U2R
Perl 3 3 0.00 U2R
Rootkit 10 10 0.00 U2R
Total 494021 145586 70.53%
Table 3.3 KDDCUP99 Basic Features of Individual TCP Connections
Feature name Description Type
Duration Length(number of seconds) of the connection continuous
Protocol_type Type of the protocol, e.g. tcp, udp, etc. discrete
Service Network service on the destination, e.g. http, telnet, etc. discrete
Src_bytes Number of data bytes from source to destination continuous
Dst_bytes Number of data bytes from destination to source continuous
Flag Normal or error status of the connection discrete
Land 1 if connection is from/to the same host/port; 0 otherwise discrete
Wrong_fragment Number of ‘wrong’ fragments continuous
Urgent Number of urgent packets continuous
Table 3.4 Content Features within a Connection Suggested by Domain Knowledge
Feature name Description Type
Hot Number of ‘hot’ indicators continuous
Num_failed_logins Number of failed login attempts continuous
Logged_in 1 if successfully logged in; 0 otherwise discrete
Num_compromised Number of ‘compromised’ conditions continuous
Root_shell 1 if root shell is obtained; 0 otherwise discrete
Su_attempted 1 if ‘su root’ command attempted; 0 otherwise discrete
Num_root Number of ‘root’ accesses continuous
Num_file_creations Number of file creation operations continuous
Num_shells Number of shell prompts continuous
Num_access_files Number of operations on access control files continuous
Num_outbound_cmds Number of outbound commands in an ftp session continuous
Is_hot_login 1 if the login belongs to a ‘hot’ list; 0 otherwise discrete
Is_guest_login 1 if the login is a ‘guest’ login; 0 otherwise discrete
Table 3.5 Traffic Features Computed Using a Two-second Time Window
Feature name Description Type
Count Number of connections to the same host as the current connection in the past two seconds continuous
Note: the following features refer to these same-host connections.
Serror_rate % of connections that have ‘SYN’ errors continuous
Rerror_rate % of connections that have ‘REJ’ errors continuous
Same_srv_rate % of connections to the same service continuous
Diff_srv_rate % of connections to different services continuous
Srv_count Number of connections to the same service as the current connection in the past two seconds continuous
Note: the following features refer to these same-service connections.
Srv_serror_rate % of connections that have ‘SYN’ errors continuous
Srv_rerror_rate % of connections that have ‘REJ’ errors continuous
Srv_diff_host_rate % of connections to different hosts continuous
Table 3.6 Traffic Features Computed Using a Hundred-connection Window
Feature name Description Type
dst_host_count* No. of connections to the same host as the current connection among the past 100 connections continuous
dst_host_serror_rate* % of connections that have ‘SYN’ errors continuous
dst_host_rerror_rate* % of connections that have ‘REJ’ errors continuous
dst_host_same_srv_rate* % of connections to the same service continuous
dst_host_diff_srv_rate* % of connections to different services continuous
dst_host_srv_count** No. of connections to the same service as the current connection among the past 100 connections continuous
dst_host_srv_serror_rate** % of the connections that have ‘SYN’ errors continuous
dst_host_srv_rerror_rate** % of the connections that have ‘REJ’ errors continuous
dst_host_srv_diff_host_rate** % of connections to different hosts continuous
3.4 Feature Extractions and Pre-processing
The input data to the neural network must be in the range (0, 1)
or (−1, 1). Hence, pre-processing and normalization [112] of the data are
required. The KDDCUP99 format data are pre-processed. Each record
in KDDCUP99 format has 41 features, each of which is in one of the
continuous, discrete and symbolic forms, with significantly varying
ranges. Based on the type of neural nets, the input data may have
different forms and so it needs different pre-processing. Some neural
nets only accept binary input and some can also accept continuous-
valued data. In Pre-processor [6, 24], after extracting KDDCUP99
features from each record, each feature is converted from text or
symbolic form into numerical form. For converting symbols into
numerical form, an integer code is assigned to each symbol. For
instance, in the case of the protocol_type feature, 0 is assigned to tcp, 1 to
udp, and 2 to the icmp symbol. Attack names were first mapped to one
of the five classes, 0 for Normal, 1 for Probe, 2 for DOS, 3 for U2R and
4 for R2L.
Two features span a very large integer range,
namely src_bytes [0, 1.3 billion] and dst_bytes [0, 1.3
billion]. Logarithmic scaling (with base 10) was applied to these features
to reduce the range to [0.0, 9.14]. All other features were Boolean, in
the range [0.0, 1.0]. Hence scaling was not necessary for these
attributes.
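A sketch of this pre-processing, assuming the integer codes and base-10 log scaling described above (note that log10(1 + 1.3 billion) ≈ 9.11, consistent with the stated range):

```python
from math import log10

# Integer codes for symbolic features, per the mapping described above.
PROTOCOL_CODES = {'tcp': 0, 'udp': 1, 'icmp': 2}
CLASS_CODES = {'normal': 0, 'probe': 1, 'dos': 2, 'u2r': 3, 'r2l': 4}

def log_scale(byte_count):
    """Base-10 logarithmic scaling for the wide-range byte-count
    features; the +1 keeps a zero count mapped to 0.0."""
    return log10(1 + byte_count)
```

Remaining symbolic features (service, flag) would be coded the same way, each with its own lookup table.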
3.4.1 Normalization
Pre-processing converts all the symbolic or text forms into
numerical values. The ranges of values of the different features are not
uniform, as discussed above. Features having a large range of values
will influence the performance more than features having a smaller
range of values. Hence normalization is applied to the features to
convert the range of values to fall between 0 and 1. Different methods
are available to normalize the data as given below.
1. In one of the algorithms [51, 113, 149] each numerical value in
the data set is normalized between 0.0 and 1.0 according to the
following equation:
x = (x − min) / (max − min)
Where,
x is the numerical value,
min is the minimum value for the attribute that x belongs to,
max is the maximum value for the attribute that x belongs to.
2. In another algorithm data normalization is done by applying the
formula
1/(1 + xt)
where xt is the input data at time t.
3. For normalizing feature values, a statistical analysis [6] is
performed on the values of each feature based on the existing
data from the KDDCUP99 data set, and an acceptable maximum
value for each feature is determined. According to these maximum
values, feature values are normalized into the range (0, 1) using
the following simple formula [6]:
If f > MaxF, Nf = 1; otherwise Nf = f / MaxF
where F is the feature, f is the feature value, MaxF is the
maximum acceptable value for F, and Nf is the normalized or
scaled value of F.
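The three normalization methods above can be sketched as follows (the function names are mine, chosen for illustration):

```python
def min_max(x, lo, hi):
    """Method 1: scale x into [0, 1] using the attribute's min and max."""
    return (x - lo) / (hi - lo)

def reciprocal(x_t):
    """Method 2: squash a non-negative input into (0, 1] via 1 / (1 + x)."""
    return 1.0 / (1.0 + x_t)

def cap_scale(f, max_f):
    """Method 3: divide by an acceptable maximum, capping the result at 1."""
    return 1.0 if f > max_f else f / max_f
```

Min-max scaling preserves the shape of the distribution, while the other two compress large outliers more aggressively.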
3.5 Performance Evaluation Metrics
The evaluation metrics used in our proposed method are true
positive (TP), true negative (TN), false positive (FP) and false negative
(FN). Here, true positive indicates the number of correctly classified
attacks. A true positive is a sign of properly detecting the occurrences of
attacks in an intrusion detection system. True negative indicates the
number of valid records that are correctly classified. A true negative
specifies that the IDS has not made a mistake in detecting a normal
condition. False positive indicates the records that were incorrectly
classified as attacks, whereas in fact they are valid activities. A false
positive specifies the wrong detection of a particular attack by the IDS. A
false positive is often produced by faulty recognition conditions and
degrades the accuracy of the detection system. False negative
indicates the records that were incorrectly classified as valid activities,
whereas in fact they are attacks. A false negative stipulates that the
IDS is unable to detect the intrusion after a particular attack has
occurred. Based on TP, TN, FP and FN, the performance of our
intrusion detection system is evaluated by: a) Accuracy, b) Detection
Rate (DR) and c) Failure Analysis Rate (FAR). The accuracy of our system
is obtained by the following expression.
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (3.1)
Then, the Detection Rate (DR) is determined based on the expression
given below.
Detection Rate (DR) = TP / (TP + FP)    (3.2)
The detection rate shows the probability of detecting abnormal data in
the test samples. A higher detection rate indicates that the algorithm
more accurately reflects anomalies in the input data.
Failure Analysis Rate (FAR) = FP / (FP + TN)    (3.3)
The failure analysis rate reflects the accuracy of intrusion detection: a
lower FAR indicates higher detection accuracy.
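Equations 3.1 to 3.3 translate directly into code; a minimal sketch, with illustrative counts rather than results from any experiment:

```python
def accuracy(tp, tn, fp, fn):
    """Equation 3.1: fraction of all records classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def detection_rate(tp, fp):
    """Equation 3.2: fraction of attack alarms that are true attacks."""
    return tp / (tp + fp)

def failure_analysis_rate(fp, tn):
    """Equation 3.3: fraction of normal records flagged as attacks."""
    return fp / (fp + tn)
```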
3.6 Summary
This chapter explained the database used in this thesis, the proposed
feature extraction and pre-processing techniques used for IDS, and the
performance evaluation metrics, namely Accuracy, Detection Rate and
Failure Analysis Rate, used for the intrusion detection system.
CHAPTER 4
CLUSTERING BASED INTRUSION DETECTION
4.1 Introduction
Clustering can be considered the most important unsupervised
learning problem. The goal of clustering is to determine the intrinsic
grouping in a set of unlabelled data. Clustering is the process of
segmenting the data, which are similar in some ways to one another. A
good clustering method is one that secures high intra-class similarity
and low inter-class similarity. The clustering quality fully depends upon
the similarity measure used by the method, its ability to find hidden
patterns, and its implementation. Some of the applications of clustering are
pattern recognition, the World Wide Web, image processing and spatial
data analysis. Beyond these, clustering is also used in energy
conservation applications. Various clustering algorithms, such as K-Means
Clustering and Fuzzy C-Means Clustering are available.
4.2 Need for Clustering of data
Cluster analysis or clustering is the task of assigning a set of objects
into groups (called Clusters) so that the objects in the same cluster
are more similar (in some sense or another) to each other than
to those in other clusters. Classification and clustering techniques
in data mining are useful for a wide variety of real-time
applications dealing with large amounts of data. Some of the
applications of data mining are text classification, selective
marketing, medical diagnosis and intrusion detection systems. In
information security, intrusion detection is the act of detecting
actions that attempt to compromise the confidentiality, integrity or
availability of a resource. Intrusion detection systems are software
systems for identifying the deviations from the normal behavior
and usage of the system. They detect attacks using the data
mining techniques- classification and clustering algorithms.
Most current techniques focus on anomaly detection systems,
which are more generalized and have a wider scope than misuse
detection systems. Data mining approaches can be
applied for both anomaly and misuse detection. Clustering techniques
can be used to form clusters of data samples corresponding to the
normal use of the system. Clustering based techniques can detect
new attacks as compared to the classification based techniques.
4.3 Clustering Algorithms
The input dataset given to an intrusion detection system
normally comprises a huge quantity of data, which makes processing
very complex and time-consuming. Processing this large amount
of data can also lead to poor results through an increase in errors.
Hence, it has a marked effect on the efficiency of the system,
ultimately leading to a reduced-quality intrusion detection system. To
combat this problem, a clustering technique is employed prior to
classification. In this work some existing clustering techniques such as
K-Means clustering, Fuzzy K-Means clustering, Fuzzy C-Means and
KFCM are discussed, and the proposed Fuzzy Bisector-Kernel Fuzzy
C-Means clustering (FB-KFCM) and its results are also discussed.
4.3.1 K Means Clustering
K-means clustering is one of the simplest unsupervised
clustering algorithms. The algorithm takes an input parameter k and
partitions the n data points into k clusters so that the intra-cluster similarity
is high and the inter-cluster similarity is low. K is a positive integer
given in advance. K-means clustering takes less time than
hierarchical clustering and yields better results.
With the help of clustering, the training dataset is partitioned into five
subsets, wherein four subsets contain types of intrusion (the attack
datasets) and one contains the normal data type (the normal dataset).
The steps of the clustering algorithm are:
1) Define the number of clusters K.
2) Initialize the K cluster centroids. This can be done by arbitrarily
dividing all objects into K clusters, computing their centroids, and
verifying that all centroids are different from each other.
Alternatively, the centroids can be initialized to K arbitrarily
chosen, different objects
3) Iterate over all objects and compute the distances to centroids of
all clusters. Assign each object to the cluster with the nearest
centroid.
4) Recalculate the centroids of both modified clusters.
5) Repeat step 3 until the centroids do not change any more.
A distance function is required in order to compute the distance
(i.e. similarity) between two objects. The most commonly used distance
function is the Euclidean one which is defined as:
d(x, y) = √( ∑_{i=1}^{m} (x_i − y_i)² )  (4.1)
Where x = (x1 . . . xm) and y = (y1…ym) are two input vectors
with m quantitative features. In the Euclidean distance function, all
features contribute equally to the function value. However, since
different features are usually measured with different metrics or at
different scales, they must be normalized before applying the distance
function.
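The steps above can be sketched in Java (the thesis's implementation language). The two-dimensional sample points, the initial centroids and the iteration count below are illustrative values only, not data from the experiments.

```java
import java.util.Arrays;

public class KMeansSketch {
    // Euclidean distance of eq. 4.1
    static double dist(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += (x[i] - y[i]) * (x[i] - y[i]);
        return Math.sqrt(s);
    }

    // Steps 3-5: assign points to the nearest centroid, then recompute centroids
    static int[] kmeans(double[][] data, double[][] centroids, int iters) {
        int[] label = new int[data.length];
        for (int t = 0; t < iters; t++) {
            for (int i = 0; i < data.length; i++) {          // step 3: assignment
                int best = 0;
                for (int j = 1; j < centroids.length; j++)
                    if (dist(data[i], centroids[j]) < dist(data[i], centroids[best])) best = j;
                label[i] = best;
            }
            for (int j = 0; j < centroids.length; j++) {     // step 4: update centroids
                double[] sum = new double[data[0].length];
                int n = 0;
                for (int i = 0; i < data.length; i++)
                    if (label[i] == j) {
                        n++;
                        for (int d = 0; d < sum.length; d++) sum[d] += data[i][d];
                    }
                if (n > 0)
                    for (int d = 0; d < sum.length; d++) centroids[j][d] = sum[d] / n;
            }
        }
        return label;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 1}, {1.5, 2}, {8, 8}, {9, 9.5}, {1, 0.5}, {8.5, 9}};
        double[][] centroids = {{1, 1}, {8, 8}};             // step 2: K = 2 chosen objects
        System.out.println(Arrays.toString(kmeans(data, centroids, 10))); // [0, 0, 1, 1, 0, 1]
    }
}
```

Note that the features would be normalized before this step on real data, as discussed above; the toy points here are already on a common scale.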
4.3.2 Fuzzy K Means Clustering
The traditional k-means clustering algorithm suffers from serious
drawbacks, such as the difficulty of finding a correct method for cluster
initialization and of making a correct choice of the number of clusters (k).
Moreover, k-means is not efficient for overlapped datasets. Many
methods and techniques have been proposed to address these
drawbacks of k-means. Fuzzy k-means is one of the algorithms that
provide better results than k-means for overlapped datasets.
Fuzzy k-means was introduced by Bezdek [5]. The fuzzy k-
means algorithm is also called fuzzy c-means. Unlike naive k-means,
which assigns each data point completely to one cluster, in
fuzzy c-means each data point has a degree of membership in every
cluster. This allows a data point of dataset X to be associated with all
centres of set C.
For example, points on the edges of the clusters might belong to
a cluster with lesser degree than those data points belonging to the
same cluster at its centre. This algorithm is mainly used for datasets in
which the data points are between the centres. The algorithm works on
the objective to minimize the following function,
F(X, C) = ∑_{i=1}^{n} ∑_{j=1}^{k} (u_ij)^m ||x_i − c_j||²  (4.2)
Here m is any real number greater than 1, and u_ij is the degree of
membership of data point x_i in the cluster with centre c_j, subject to
the constraints u_ij ≥ 0 and ∑_{j=1}^{k} u_ij = 1 for all i. Iteratively
optimizing the objective function F(X, C) by updating the degree of
membership u_ij of data point x_i and the cluster centres c_j results in
the clustering of the data.
u_ij = 1 / ∑_{l=1}^{k} ( ||x_i − c_j|| / ||x_i − c_l|| )^{2/(m−1)}  (4.3)

c_j = ∑_{i=1}^{n} (u_ij)^m x_i / ∑_{i=1}^{n} (u_ij)^m  (4.4)
As the value of m increases, the clustering becomes fuzzier. As m
approaches 1, the sharing of data points among centres decreases and
the algorithm behaves like standard k-means [16]. For example, consider
a one-dimensional dataset as depicted in Figure 4.1.
Figure 4.1. Input mono dimensional data
We could find two clusters, A and B, based on the data point
associations. On applying k-means to the above dataset, each data
point is associated with the centroid closest to it, as depicted in Figure 4.2.
Figure 4.2 Clustered using k means
Figure 4.3. Clustered Using Fuzzy K Means
If the fuzzy k-means clustering approach is used on the dataset, a
data point does not exclusively belong to one cluster; instead it lies
midway between clusters. The smoother membership boundary indicates
that every data point may belong to more than one cluster, as in
Figure 4.3. More information on this example can be found in [100].
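As a sketch, the membership update of eq. 4.3 can be written as below. The sample point, the centres and the fuzzifier m = 2 are illustrative values, and the code assumes no data point coincides exactly with a centre (the update is undefined there).

```java
public class FuzzyMembership {
    static double dist(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += (x[i] - y[i]) * (x[i] - y[i]);
        return Math.sqrt(s);
    }

    // eq. 4.3: u_ij = 1 / sum_l ( ||x_i - c_j|| / ||x_i - c_l|| )^(2/(m-1))
    // Assumes no point x_i coincides with a centre c_l (the distance would be zero).
    static double[][] memberships(double[][] x, double[][] c, double m) {
        double[][] u = new double[x.length][c.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < c.length; j++) {
                double dij = dist(x[i], c[j]), s = 0;
                for (int l = 0; l < c.length; l++)
                    s += Math.pow(dij / dist(x[i], c[l]), 2.0 / (m - 1));
                u[i][j] = 1.0 / s;
            }
        return u;
    }

    public static void main(String[] args) {
        // one point at 2 on a line, centres at 0 and 10, m = 2
        double[][] u = memberships(new double[][]{{2}}, new double[][]{{0}, {10}}, 2.0);
        System.out.println(u[0][0] + " " + u[0][1]); // ~0.9412 and ~0.0588, summing to 1
    }
}
```

The point lies much closer to the first centre, so its membership there dominates, yet it still carries a small membership in the second cluster, which is exactly the behaviour of Figure 4.3.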
4.3.3 Fuzzy C-Means
The proposed FB-KFCM is an extension of KFCM, which is itself
an extension of the commonly used FCM. Let the input data be
represented by z_i, the number of input data by η, and let ϖ be a real
number greater than 1 representing the weighting co-efficient. The
centre of the j-th cluster is represented by x_j and the number of
clusters by Nc. Let µ_ij represent the degree of membership of z_i in
the j-th cluster. Fuzzy C-Means (FCM) clustering minimizes the
objective function defined in eq.4.5,
F_ϖ = ∑_{j=1}^{Nc} ∑_{i=1}^{η} (µ_ij)^ϖ ||z_i − x_j||²  (4.5)
In the process, initially arbitrary data points are assigned as
centroids and subsequently, membership values of the data points with
respect to the centroids are found out. The generalized formula for
finding membership function value is given in eq.4.6,
µ_ij = 1 / ∑_{m=1}^{Nc} ( ||z_i − x_j|| / ||z_i − x_m|| )^{2/(ϖ−1)}  (4.6)
Afterwards, the updated centroid values are computed with the
use of found out membership values. The centroid updating equation is
given in eq.4.7,
x_j = ∑_{i=1}^{η} (µ_ij)^ϖ z_i / ∑_{i=1}^{η} (µ_ij)^ϖ  (4.7)
Based on the updated centroid values, the membership values are
found out again. This process is repeated in a loop to obtain the final
clusters; each iteration updates the membership values and the cluster
centres. The loop termination condition is defined in eq.4.8,
max_{i,j} | µ_ij^{(t+1)} − µ_ij^{(t)} | < λ  (4.8)
Here, λ has a value between 0 and 1. Hence, FCM converges to a
local minimum or a saddle point of F_ϖ.
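The alternating updates of eqs. 4.6-4.8 can be sketched as one loop. The one-dimensional data, the initial centres, the weighting exponent ϖ = 2 and the threshold λ below are illustrative values, and the sketch assumes no data point lands exactly on a centre.

```java
public class FcmSketch {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Alternate eq. 4.6 (memberships) and eq. 4.7 (centres) until the largest
    // membership change falls below lambda (eq. 4.8). w is the exponent, w > 1.
    static double[][] fcm(double[][] z, double[][] x, double w, double lambda, int maxIter) {
        double[][] u = new double[z.length][x.length];
        for (int t = 0; t < maxIter; t++) {
            double maxChange = 0;
            for (int i = 0; i < z.length; i++)                  // eq. 4.6
                for (int j = 0; j < x.length; j++) {
                    double s = 0, dij = dist(z[i], x[j]);
                    for (int m = 0; m < x.length; m++)
                        s += Math.pow(dij / dist(z[i], x[m]), 2.0 / (w - 1));
                    double nu = 1.0 / s;
                    maxChange = Math.max(maxChange, Math.abs(nu - u[i][j]));
                    u[i][j] = nu;
                }
            for (int j = 0; j < x.length; j++) {                // eq. 4.7
                double den = 0;
                double[] num = new double[z[0].length];
                for (int i = 0; i < z.length; i++) {
                    double uw = Math.pow(u[i][j], w);
                    den += uw;
                    for (int d = 0; d < num.length; d++) num[d] += uw * z[i][d];
                }
                for (int d = 0; d < num.length; d++) x[j][d] = num[d] / den;
            }
            if (maxChange < lambda) break;                      // eq. 4.8
        }
        return u;
    }

    public static void main(String[] args) {
        double[][] centres = {{0.2}, {10.2}};
        double[][] u = fcm(new double[][]{{0}, {1}, {10}, {11}}, centres, 2.0, 1e-4, 100);
        System.out.println(centres[0][0] + " " + centres[1][0]); // near 0.5 and 10.5
    }
}
```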
4.3.4 KFCM
The negative aspect of FCM is that it does not always produce
highly accurate results. This is overcome with the use of FB-KFCM,
which employs KFCM with additional steps. KFCM differs from
normal FCM in its use of kernel functions, which yield better
results. Hence, although the process in KFCM is the same as that of
FCM, it differs in the objective function and the updating equations.
In KFCM, the input data z is mapped into a higher dimensional
space S by a non-linear feature map φ : z → φ(z), z ∈ Z. The
objective function of KFCM is given by eq.4.9,
F_ϖ = ∑_{j=1}^{Nc} ∑_{i=1}^{η} (µ_ij)^ϖ ||φ(z_i) − φ(x_j)||²  (4.9)
Where,
||φ(z_i) − φ(x_j)||² = G(z_i, z_i) + G(x_j, x_j) − 2G(z_i, x_j)  (4.10)
Here, G(a, b) = φ(a)ᵀφ(b), which is the inner product kernel function,
and in our case we consider the Gaussian kernel function. Hence,
we have:
G(a, b) = exp( −||a − b||² / σ² ),  hence  G(a, a) = exp(0) = 1  (4.11)
G(z_i, z_i) = G(x_j, x_j) = 1,  ∴ ||φ(z_i) − φ(x_j)||² = 2 − 2G(z_i, x_j)  (4.12)
Hence, the objective function can be rewritten as in eq.4.13,
F_ϖ = 2 ∑_{j=1}^{Nc} ∑_{i=1}^{η} (µ_ij)^ϖ [1 − G(z_i, x_j)]  (4.13)
Minimizing the objective function with respect to µ_ij gives the
updating equations for the membership values µ_ij and the centroids
x_j in eq.4.14,

µ_ij = [1/(1 − G(z_i, x_j))]^{1/(ϖ−1)} / ∑_{m=1}^{Nc} [1/(1 − G(z_i, x_m))]^{1/(ϖ−1)} ,
x_j = ∑_{i=1}^{η} (µ_ij)^ϖ G(z_i, x_j) z_i / ∑_{i=1}^{η} (µ_ij)^ϖ G(z_i, x_j)  (4.14)
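Eqs. 4.10-4.12 reduce the feature-space distance to a single kernel evaluation, which can be sketched as follows; the points and the kernel width σ = 2 are illustrative values.

```java
public class KernelDistance {
    // Gaussian kernel of eq. 4.11: G(a, b) = exp(-||a - b||^2 / sigma^2), so G(a, a) = 1
    static double gauss(double[] a, double[] b, double sigma) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.exp(-s / (sigma * sigma));
    }

    // eq. 4.12: ||phi(z) - phi(x)||^2 = 2 - 2 G(z, x), without computing phi explicitly
    static double featureDist2(double[] z, double[] x, double sigma) {
        return 2.0 - 2.0 * gauss(z, x, sigma);
    }

    public static void main(String[] args) {
        double[] p = {1, 2}, r = {5, 7};
        System.out.println(featureDist2(p, p, 2.0)); // 0.0 for identical points
        System.out.println(featureDist2(p, r, 2.0)); // approaches 2 for distant points
    }
}
```

This is the "kernel trick": the map φ is never evaluated, yet distances in the higher dimensional space S are available in closed form.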
4.3.5 Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM)
In FB-KFCM, fuzzy bisector is incorporated into the KFCM to
obtain better and more accurate results. Fuzzy bisector proceeds with
predefined rules and splits a selected cluster into two. Selection
of the cluster is based on two parameters: the Mean Squared Error
(MSE) and the number of data points in the cluster. The cluster
formation is carried out in stages; in each stage, one existing cluster
is further divided into two clusters. Let the input dataset be
represented by z = {z_1, z_2, ..., z_Nd}, where Nd is the number of input
data. After clustering, the data are grouped into clusters represented
by FC = {FC_1, FC_2, ..., FC_N}, where N is the number of clusters. Each
cluster FC_i (0 < i ≤ N) contains certain data z ∈ FC_i from the input
dataset. Let the data inside the i-th cluster be represented by
FC_i = {ri_1, ri_2, ..., ri_nci}, where nc_i is the number of data in the i-th
cluster. An illustration of the proposed FB-KFCM clustering is given in
Figure 4.4.
Figure 4.4: Illustration of FB-KFCM clustering technique
The process of forming the final clusters is carried out in
stages. If N + 1 clusters are to be formed, then FB-KFCM consists of
N stages. In stage 1, the input data is split into two clusters with the
use of KFCM. Let the input data be represented as Z, and the formed
clusters as A1 and A2. In the next stage, a particular cluster is taken
and further divided to form two more clusters, making 3 clusters in
total. Selection of the cluster to be divided using KFCM is based on
certain rules. For rule formation, two parameters are found out: the
MSE and the number of data points in the respective cluster.
The Mean Squared Error (MSE) for a cluster is found out from
the Euclidean distances between the data points and the cluster
centroid. Let the data points in the i-th cluster be represented by di_k,
the number of data points in the cluster by N_i, and the centroid of the
i-th cluster by c_i; then the MSE is given in eq.4.15,

MSE_i = (1/N_i) ∑_{k=1}^{N_i} ||di_k − c_i||²  (4.15)
By computing the MSE and the number of data points for each of the
clusters A1 and A2, the selection of the cluster to be split is made. The
selection condition is that the cluster should have the maximum number
of points and the minimum MSE. Let the number of data points in A1
and A2 be represented by NA1 and NA2, and let the MSE values of A1
and A2 be represented by MA1 and MA2. Hence, the conditions can be
written as in eq.4.16 and eq.4.17,
If (NA1 > NA2) AND (MA1 < MA2), select A1  (4.16)
If (NA2 > NA1) AND (MA2 < MA1), select A2  (4.17)
In other cases, an arbitrary selection is carried out between A1 and
A2. In our illustration, A1 is chosen and is split to form B1 and
B2 by the use of KFCM. Hence, the clusters in consideration are A2,
B1 and B2. Subsequently, in stage 2, one among the three clusters is
selected and further divided with the use of KFCM. The selection of
the cluster to be divided is based on the MSE and the number of data
points. Let the number of data points in B1 and B2 be represented by
NB1 and NB2, and let the MSE values of B1 and B2 be represented by
MB1 and MB2. The selection is based on the following conditions:
Select A2, if NA2 = Maximum(NA2, NB1, NB2) AND MA2 = Minimum(MA2, MB1, MB2)
Select B1, if NB1 = Maximum(NA2, NB1, NB2) AND MB1 = Minimum(MA2, MB1, MB2)
Select B2, if NB2 = Maximum(NA2, NB1, NB2) AND MB2 = Minimum(MA2, MB1, MB2)
For other cases, any of the three clusters is selected. In our
illustration, A2 is selected and divided into clusters C1 and C2.
Hence, the clusters in consideration are B1, B2, C1 and C2. In the
third stage, the respective cluster to be divided by KFCM is found out
as in the earlier stages. Generalizing, suppose the clusters in the i-th
stage are represented as C_1, C_2, ..., C_K, the numbers of data points
in the clusters as N_1, N_2, ..., N_K, and the MSEs of the clusters as
M_1, M_2, ..., M_K; then the selection of the cluster to be divided is
defined by the rule:

Select C_i, if N_i = Maximum(N_1, N_2, ..., N_K) AND M_i = Minimum(M_1, M_2, ..., M_K)
In the illustration example, C2 is selected in stage 3 and divided
to form D1 and D2. In stage 4, B2 is selected and subsequently, the
process is repeated to have the required clusters. The process of
dividing the selected cluster by the use of KFCM is carried out for all
the N stages to form N + 1 clusters, represented in eq.4.18,

FC_i ; 0 < i ≤ (N + 1)  (4.18)
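The selection rules above can be sketched as follows; the point counts and MSE values are illustrative. A return value of -1 is a convention introduced here to signal that no cluster wins both criteria, standing in for the arbitrary selection described above.

```java
public class SplitSelection {
    // MSE of eq. 4.15 for one cluster with centroid c
    static double mse(double[][] pts, double[] c) {
        double s = 0;
        for (double[] p : pts)
            for (int i = 0; i < p.length; i++) s += (p[i] - c[i]) * (p[i] - c[i]);
        return s / pts.length;
    }

    // Rule: split the cluster with the maximum number of points AND the minimum MSE;
    // -1 signals that no cluster satisfies both, so an arbitrary choice is made instead.
    static int selectClusterToSplit(int[] counts, double[] mses) {
        int maxN = 0, minM = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[maxN]) maxN = i;
            if (mses[i] < mses[minM]) minM = i;
        }
        return (maxN == minM) ? maxN : -1;
    }

    public static void main(String[] args) {
        System.out.println(mse(new double[][]{{0, 0}, {2, 0}}, new double[]{1, 0})); // 1.0
        System.out.println(selectClusterToSplit(new int[]{10, 3, 5},
                                                new double[]{0.2, 1.0, 0.9}));       // 0
        System.out.println(selectClusterToSplit(new int[]{10, 3},
                                                new double[]{5.0, 1.0}));            // -1
    }
}
```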
After obtaining the required number of clusters, the centroid of
each cluster is calculated and taken for further processing. That is,
instead of all the data inside a cluster, only the centroid is taken and
given to the learning process. As all data points inside a cluster are
more or less the same, taking the centroid serves the purpose of
representing all the data inside the cluster. This lessens the
computation time in further processes and also reduces the complexity
and risks. Let the data inside the i-th cluster be represented by
FC_i = {ri_1, ri_2, ..., ri_nci}.
Hence the centroid Cen_i of the i-th cluster is found out as in eq.4.19,

Cen_i = ( ∑_{j=1}^{nc_i} ri_j ) / nc_i  (4.19)

where ri_j is the j-th data point in the i-th cluster.
Hence, we have converted the large, bulky dataset into a small number
of data points for better handling, learning and easier computation.
4.4 Classification Module

The centroids obtained after the clustering process are used for the
learning or training process of the Bayesian Neural Network. The input
to the Bayesian Neural Network is the set of cluster centroids given
in eq.4.20,

Cen_i ; 0 < i ≤ (N + 1)  (4.20)
4.4.1 Neural Network
Artificial Neural Networks provide a powerful tool for
classification and have been used in a broad range of areas. The
extensive recent research activity in neural classification has
recognized that neural networks are a promising alternative to a
variety of traditional classification methods. The benefit of neural
networks lies in the following theoretical aspects. First, neural
networks are data-driven, self-adaptive methods that can fine-tune
themselves to the data without any explicit specification of the
functional or distributional form of the underlying model. Second, they
are universal function approximators, in that neural networks can
approximate any function with arbitrary accuracy. Neural networks are
nonlinear models, which makes them flexible in modelling complex
real-world relationships. Neural networks are also able to estimate
posterior probabilities, which offer the basis for setting up
classification rules and performing statistical analysis.
Figure 4.5: Block diagram of the Neural Network
In general, the neural network consists of three layers, named
the input layer, the hidden layer and the output layer. The neural
network operates in two phases: the training phase and the testing
phase. In the training phase, the network is trained on a large
database; in our case, the centroids found after clustering are fed in
as the training data. Initially, the nodes are given random weights. As
the desired output is already known in the training phase, the output
obtained from the neural network is compared with the original, and
the weights are varied by a learning algorithm so as to reduce the
error. Normally, back-propagation algorithms are employed in neural
networks. In the testing phase, the input test data is fed to the trained
neural network with the learned weights in its nodes, and the output is
calculated to determine whether the data is intruded or not. Figure 4.5
shows the general block diagram of the neural network.
4.4.2 Bayesian Neural Network
Inclusion of the Bayesian concept has the advantage of better
learning for Neural Networks. Bayesian learning is based on two
properties. The first is that background knowledge is utilised in
selecting the prior probability distribution for the model parameters.
The second is that predictions are made with respect to the posterior
parameter distribution obtained by updating the prior. These two
properties are built into the Neural Network to obtain the BNN.
Considering a Neural Network with a single hidden layer, the output
can be written mathematically as in eq.4.21,
y_i(x) = b_i + ∑_k ω_ki h_k(x)  (4.21)

where, h_k(x) = tanh( a_k + ∑_j ϖ_jk x_j )  (4.22)
Here x represents the input vector, y_i(x) denotes the output
value function, ω_ki gives the weight from hidden unit k to output i,
and ϖ_jk gives the weight from input j to hidden unit k. The network
can be used to define a probabilistic model for classification. This is
carried out by using the network outputs to define the distribution of
the target z, given the input vector x. For classification, where the
target is a single discrete value among the possible class outputs, the
probability can be defined as in eq.4.23,
P(z = i | x) = e^{y_i(x)} / ∑_j e^{y_j(x)}  (4.23)
The biases and the weights present in the Neural Network are
learned from the training inputs, which contain the input values and the
corresponding output values. These can be represented by
(x^(i), z^(i)); 0 < i ≤ n, where n is the total number of training cases.
The weights and the biases are updated based on the error in the
network. This error is computed as the squared sum of the differences
between the network outputs and the target outputs. The updating is
carried out in such a way as to minimize the error in the system. This
minimization is equivalent to maximum likelihood estimation under a
Gaussian noise model, where the minus log of the likelihood is
proportional to the sum of squared errors.
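A forward pass through eqs. 4.21-4.23 can be sketched as follows. The layer sizes and the parameters in the check are illustrative (all-zero weights and biases, for which the class probabilities must come out uniform).

```java
public class NetworkForward {
    // eq. 4.22: h_k(x) = tanh(a_k + sum_j wIn[j][k] * x_j)
    // eq. 4.21: y_i(x) = b_i + sum_k wOut[k][i] * h_k(x)
    // eq. 4.23: P(z = i | x) = exp(y_i) / sum_j exp(y_j)
    static double[] classProbabilities(double[] x, double[][] wIn, double[] a,
                                       double[][] wOut, double[] b) {
        double[] h = new double[a.length];
        for (int k = 0; k < h.length; k++) {
            double s = a[k];
            for (int j = 0; j < x.length; j++) s += wIn[j][k] * x[j];
            h[k] = Math.tanh(s);
        }
        double[] p = new double[b.length];
        double total = 0;
        for (int i = 0; i < p.length; i++) {
            double y = b[i];
            for (int k = 0; k < h.length; k++) y += wOut[k][i] * h[k];
            p[i] = Math.exp(y);
            total += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= total;
        return p;
    }

    public static void main(String[] args) {
        // 2 inputs, 3 hidden units, 2 classes (intruded or not), all parameters zero
        double[] p = classProbabilities(new double[]{1, 2}, new double[2][3],
                                        new double[3], new double[3][2], new double[2]);
        System.out.println(p[0] + " " + p[1]); // 0.5 0.5
    }
}
```

During training, the weights and biases would be adjusted to reduce the squared error described above; this sketch shows only the probabilistic output of eq. 4.23 for fixed parameters.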
In the Bayesian approach to Neural Networks, the objective is to find
the predictive distribution for the target value in a new test case, given
the input for that case and the inputs and targets of the training cases.
Then, the predictive distribution can be written as in eq.4.24,

P(z^(n+1) | x^(n+1), (x^(1), z^(1)), ..., (x^(n), z^(n)))
  = ∫ P(z^(n+1) | x^(n+1), θ) P(θ | (x^(1), z^(1)), ..., (x^(n), z^(n))) dθ  (4.24)
Where, θ gives the network parameters such as the weights and
biases. The posterior density for the parameters is proportional to the
product of the prior and the likelihood function, which can be
represented as in eq.4.25,

L(θ | (x^(1), z^(1)), ..., (x^(n), z^(n))) = ∏_{j=1}^{n} P(z^(j) | x^(j), θ)  (4.25)
Hence, the learning is carried out for all input data Cen_i ; 0 < i ≤ (N + 1).
Once the learning process is complete, the test data is given as input
to the trained network, which outputs whether the data is intruded or not.
4.5 Results and Discussions
The proposed technique is implemented in JAVA on a system
having 8 GB RAM and a 3.2 GHz processor. To evaluate the
performance of the proposed technique, we used the KDD CUP 99
dataset for testing and evaluation. The refined version of the DARPA
dataset, which contains only network data, is known as the KDD
dataset [137, 138]. The KDD training dataset consists of
approximately 4,900,000 single connection vectors, where
each connection vector consists of 41 features and is labelled as
either normal or an attack, with exactly one specific attack type [139].
The features fall into four categories: a) the intrinsic features of a
connection, which encompass the basic features of each individual
TCP connection; b) the content features, recommended by domain
knowledge, which are used to examine the payload of the original
TCP packets; c) the same-host features, which monitor the
established connections that have the same destination host as the
present connection within the past two seconds and estimate statistics
related to protocol behaviour, service, etc.; and d) the same-service
features, which analyse the connections that have the same service
as the current connection within the past two seconds.
Table 4.1: Accuracy table for Case 8:2

Case 8:2 | K Means + Bayesian Network | FKM + Bayesian Network | FCM + Bayesian Network | KFCM + Bayesian Network | FB-KFCM + Bayesian Network
Cluster size=200 | 83.9201 | 85.3210 | 86.7189 | 93.2321 | 96.5506
Cluster size=180 | 83.9732 | 85.6684 | 86.9022 | 90.3874 | 93.4678
Cluster size=160 | 82.9934 | 85.3444 | 86.9355 | 90.3210 | 92.4013
Cluster size=140 | 83.7643 | 85.7021 | 86.9355 | 92.3542 | 93.4678
Table 4.2: Accuracy table for Case 7:3

Case 7:3 | K Means + Bayesian Network | FKM + Bayesian Network | FCM + Bayesian Network | KFCM + Bayesian Network | FB-KFCM + Bayesian Network
Cluster size=200 | 82.8711 | 84.2111 | 86.7141 | 92.2021 | 94.4124
Cluster size=180 | 82.7021 | 84.0014 | 86.7141 | 94.0824 | 96.5563
Cluster size=160 | 82.6430 | 84.2311 | 86.7141 | 94.0210 | 96.7341
Cluster size=140 | 83.3403 | 84.4001 | 86.7141 | 90.4201 | 92.4017
Table 4.3: Accuracy table for Case 9:1

Case 9:1 | K Means + Bayesian Network | FKM + Bayesian Network | FCM + Bayesian Network | KFCM + Bayesian Network | FB-KFCM + Bayesian Network
Cluster size=200 | 84.7602 | 85.8013 | 86.7711 | 92.8732 | 93.0023
Cluster size=180 | 84.8231 | 85.8724 | 86.7378 | 92.1532 | 93.4022
Cluster size=160 | 83.5210 | 85.4921 | 86.7452 | 90.9710 | 91.936
Cluster size=140 | 84.2318 | 85.7611 | 86.7378 | 91.7342 | 92.6015
Figure 4.6 Accuracy Plot for Case 8:2
Figure 4.7 Accuracy Plot for Case 7:3
Figure 4.8 Accuracy Plot for Case 9:1
Table 4.4: Average Accuracy Table

Case | K Means + Bayesian Network | FKM + Bayesian Network | FCM + Bayesian Network | KFCM + Bayesian Network | FB-KFCM + Bayesian Network
Case 8:2 | 83.6628 | 85.5009 | 86.8730 | 91.5737 | 93.9719
Case 7:3 | 82.8891 | 84.2109 | 86.7141 | 92.6814 | 95.0261
Case 9:1 | 84.3340 | 85.7317 | 86.7480 | 91.9329 | 92.7355
Figure 4.9: Average Accuracy Plot
The existing techniques, K Means Clustering, Fuzzy K Means
Clustering, Fuzzy C-Means and KFCM, are compared with the
proposed Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM),
and their results are discussed. Table 4.1 and Figure 4.6 give the
accuracy values and plot for Case 8:2 for various cluster sizes, Table
4.2 and Figure 4.7 give the accuracy values and plot for Case 7:3, and
Table 4.3 and Figure 4.8 give the accuracy values and plot for Case
9:1. Accuracy values are taken for cluster sizes of 140, 160, 180
and 200. In all cases, the proposed technique achieves better
accuracy than the existing techniques.
The average accuracy values in Case 8:2 for the existing
techniques K Means Clustering, Fuzzy K Means Clustering, Fuzzy
C-Means and KFCM are 83.66%, 85.50%, 86.87% and 91.57%
respectively, while the proposed Fuzzy Bisector-Kernel Fuzzy C-means
clustering (FB-KFCM) achieves 93.97%.

The average accuracy values in Case 7:3 for the existing
techniques are 82.89%, 84.21%, 86.71% and 92.68% respectively,
while the proposed FB-KFCM achieves 95.03%.

The average accuracy values in Case 9:1 for the existing
techniques are 84.33%, 85.73%, 86.75% and 91.93% respectively,
while the proposed FB-KFCM achieves 92.74%.
According to the results in Table 4.4 and Figure 4.9, the proposed
Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM) attains
a high average accuracy of 93.91%. These values demonstrate the
efficiency of the proposed technique in achieving better accuracy.
Table 4.5: Comparative Analysis
Technique | Accuracy
KDD 99 winner | 90.2
PNrule | 85.6
Multi-class SVM | 85.9
Layered Conditional Random Fields | 90.1
Columbia Model | 89.7
Decision Tree | 72.4
BSPNN | 92.3
FB-KFCM + Bayesian Network | 93.1
Figure 4.10 Accuracy plot for Comparative Analysis
The proposed technique is compared with other techniques in
the area. The comparison values are given in Table 4.5 and Figure
4.10. The comparison is made with respect to the KDD 99 winner,
PNrule, multi-class SVM, Layered Conditional Random Fields, the
Columbia Model, Decision Tree and BSPNN. It is inferred that the
proposed technique performs well, obtaining a high accuracy value.
4.6 Summary
In this chapter, some existing clustering techniques, namely K
Means Clustering, Fuzzy K Means Clustering, Fuzzy C-Means and
KFCM, were discussed and implemented. To evaluate the performance
of the proposed technique, we used the KDD CUP 99 dataset for
testing and evaluation. Based on the analysis, it is observed that the
proposed Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM)
performs better than the other methods in terms of accuracy, attaining
a high average accuracy of 93.91% when compared with the other
techniques.
CHAPTER 5
HYBRID INTRUSION DETECTION SYSTEM
5.1 Introduction
Intrusion detection is the task of monitoring and, where possible,
preventing attempts to intrude into or otherwise compromise system
and network resources. One of the recent methods for identifying
abnormal activities occurring in a computer system is the Intrusion
Detection System (IDS), which forms a major portion of system
defence against attacks. In the literature, various
techniques for Intrusion Detection have been proposed in recent years.
One of the methods proposed is an Intrusion Detection System (IDS)
based on the Fuzzy Bisector-Kernel Fuzzy C-means clustering
technique and a Bayesian Neural Network. In the previous chapter, the
dimensionality of the data played a major role in obtaining a better
detection rate. In order to overcome the dimensionality issue, feature
selection is the right choice to improve the detection rate without
compromising the computation time. In this chapter, LDA+CS (Linear
Discriminant Analysis + Cuckoo Search) is developed by combining
LDA and CS. LDA is a commonly used technique for dimensionality
reduction. Here, CS is incorporated with the intention of mitigating
the ill-conditioning issue by selecting an "optimal" subset of features
that results in an intermediate lower-dimensional subspace. Then, the
feature-reduced dataset is grouped into clusters with the use of Fuzzy
Bisector-Kernel Fuzzy C-means clustering (FB-KFCM). In the
classification step, the centroids of the clusters are taken for training
the Bayesian Neural Network. For the online identification of intrusion,
test data is given to the trained network, which outputs whether the
data is intruded or not. The entire system is applied to a medical
sensor network to find intrusion behaviour by simulating the networks
in JAVA. Finally, the performance of the system is analysed using the
KDD CUP 99 dataset in terms of accuracy.
5.2 Need for Hybrid Approach
In the past, data mining techniques such as association rules
were suggested for building IDSs. They have distinguished the
differences between single-connection and multi-connection attacks.
Both signature-based and anomaly-based IDSs are sensitive to the
attack characteristics, system training history, services provided, and
underlying network conditions. Data mining techniques are also used
to build classification models from labelled attacks. Intrusion detection
must be designed to monitor the connection features at the network,
transport, and application layers.
In this work, we propose a Hybrid Intrusion Detection System
architecture. For the signature-based system, we define features from
the observations as well as the previous labels and perform sequence
labeling over the observations. This setting is sufficient for modeling
the correlation between different features of an observation.
For the anomaly-based system, we investigate user patterns, such
as profiling the programs executed daily or the privileged processes
executed with access to resources that are inaccessible to ordinary
users, by collecting the volatile data from the system. Then we train our
system using conditional random fields, which reduces the false
alarm rate. Hybrid intrusion detection is a novel kind of model
combining the advantages of anomaly-based intrusion detection and
signature-based intrusion detection. Intrusions and anomalies are two
different kinds of abnormal traffic events in an open network
environment. An intrusion takes place when unauthorized access to
a host computer system is attempted. An anomaly is observed at the
network connection level. Both attack types may compromise valuable
hosts, disclose sensitive data, deny services to legitimate users, and
pull down network-based computing resources. The intrusion detection
system (IDS) offers intelligent protection of networked computers or
distributed resources, much better than fixed-rule firewalls.
Existing IDSs are built as either signature-based or
anomaly-based systems. Signature matching is based on a misuse
model, whereas anomaly detection is based on a normal-use model.
The design philosophies of these two models are quite different, and
they are rarely combined in existing IDS products from the security
industry. Signatures are derived manually by security experts
analyzing previous attacks. The collected signatures are matched
against incoming traffic to detect intrusions. These conventional
systems detect known attacks with low false alarms. However, a
signature-based IDS cannot detect unknown attacks for which no
signatures have been collected or no attack classifiers exist.
5.3 Application of Hybrid Approach
A hybrid intelligent system uses the approach of integrating
different learning or decision-making models. Each learning model
works in a different manner and exploits a different set of features.
Integrating different learning models gives better performance than
the individual learning or decision-making models by reducing their
individual limitations and exploiting their different mechanisms.
In a hierarchical hybrid intelligent system, each layer provides
some new information to the higher level. The overall functioning of the
system depends on the correct functionality of all the layers. The
system is used to filter out a large number of packet records using
the anomaly detection module, and a second detection pass can be
performed by the misuse detection module if a packet is determined
to be an intrusion. Hence, it efficiently detects intrusions by merging
the outputs of the misuse detection and anomaly detection modules
with a decision-making module. The hybrid approach finds intrusions
and reports the type of attack. The output of the decision-making
module is then sent to an administrator for follow-up; this not only
reduces the threat of attack on the system, but also helps the user to
handle and correct the system further with hybrid detection. In the
HIDS, the performance of the misuse detection module is evaluated.
5.4 Locality Preserving Cuckoo search Algorithm
An intrusion detection system is a mechanism used to identify
whether the input data is intruded or not. The process is done by
grouping the huge amount of input data into different classes by
clustering. In our proposed hybrid intrusion detection system, the input
dataset consists of a large number of records with various attacks.
Classifying this huge dataset is difficult and time consuming, and there
is also a possibility of an increased error rate. The different attacks
found in our datasets are DOS (Denial of Service attack), R2L
(Remote to Local (User) attack), U2R (User to Root attack) and
Probing (surveillance). To overcome the drawbacks of the previously
described approaches, we have introduced a new method called
LDA-CS, which improves the detection rate of our intrusion detection
system.
Figure 5.1: Proposed Intrusion Detection System
The proposed method consists of two phases, namely, the
training phase and the testing phase. For training and testing, we have
used the KDD cup 99 dataset in our method. The general architecture
of our proposed method is shown in Fig.5.1.
5.4.1 Training Phase
The training phase consists of various processing stages in
which the input dataset is reduced, clustered and classified using
techniques such as LDA-CS, FB-KFCM and the Bayesian Neural
Network. Here, we have used the KDD cup 99 dataset, which is huge
in size. It consists of approximately 4,900,000 single connection
vectors, each of which contains 41 features. In general, the classifier
delivers accurate results only when using the complete linear feature
space. However, the direct application of this dataset to the classifier
has various drawbacks: the classifier becomes biased due to
architectural complexity, and training as well as testing efficiency
decreases. It also increases the memory consumption rate and the
computational cost. In order to overcome these problems, it is best to
adopt an approach for selecting an optimal subset of features from the
linear feature space. Hence, the Cuckoo Search algorithm combined
with LDA, referred to as LDA+CS, is applied in this work to select the
optimal subset of the linear feature space.
The LDA+CS process consists of the following steps:
(i) Initialization
(ii) Fitness calculation and nest update
Figure 5.2 Fixed Nests
5.4.1.1 Initialization
In the cuckoo search algorithm, a fixed host nest matrix is built
with a size of n × M. Here, n is the number of nests and M is the
number of attributes. The fixed host nest is an index used to select the
relevant features from the original dataset. The class for each nest is
not defined in the fixed host nest, so in order to determine the class
for each host-nest-based feature subset, we have used a classifier,
LDA. It is used to identify whether the host-nest-based data is intruded
or not. The fixed nest built is shown in Fig. 5.2. The further
initialization process is carried out based on the fixed host nest.
Figure 5.3: Nest formation from original dataset
5.4.1.2 Fitness Calculation and Nest update
In this stage, each entry of the fixed host nest is randomly
assigned a 1 or a 0. Using all features regardless of this
assignment would increase the computational complexity, so,
following the cuckoo search algorithm, the features marked 1 in a
nest are selected and those marked 0 are neglected. This yields a
dimension-reduced feature subset for each nest. The selected subset
contains the features relevant to the host nest and has a size of
1 × m, where m is the dimension of the reduced subset. Finally, N
dimension-reduced training feature subsets are obtained from the
original set, where m < M. The general data set and the host nest
obtained are shown in Fig. 5.3.
Here, each dimension-reduced subset contains only the valuable
information, together with some information about the remaining
features. The subset of relevant features is then given to LDA for
classification. LDA does not change the location of the data; it
only tries to increase the class separability and draws a decision
region between the given classes. The input to LDA is the
N-dimensional training subset belonging to different classes v, with
Ni samples in the ith class. The first stage of LDA is to group the
subset of data into two classes: attack or normal. For each subset,
the within-class and between-class distances are computed for the
two classes. For the N training samples, the mean vector and the
covariance matrix are calculated for each class of the complete data
set, as given in the equation below.
$$N = \sum_{i=1}^{n} N_i \qquad (5.1)$$
where N represents the total number of training samples and Ni the
number of training samples in class i; n is the number of classes.
The scatter matrices are calculated by eigen-decomposition, which is
applicable to high-dimensional data. The within-class and
between-class scatter matrices are denoted WC and BC and are given
by the equations below.
The between-class scatter matrix BC is:

$$B_C = \frac{1}{N}\sum_{i=1}^{n} N_i\,(v_i - v)(v_i - v)^T \qquad (5.2)$$
The within-class scatter matrix WC is:

$$W_C = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{N_i} \left(z_j^{(i)} - v_i\right)\left(z_j^{(i)} - v_i\right)^T \qquad (5.3)$$
Here, the mean of the ith class, vi, is:

$$v_i = \frac{1}{N_i}\sum_{j=1}^{N_i} z_j^{(i)} \qquad (5.4)$$
Similarly, the total mean over the whole dataset is:

$$v = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{N_i} z_j^{(i)} \qquad (5.5)$$
Finally, a discriminant function is determined based on the following
equation.
$$Y_{LDA} = \mathrm{tr}\!\left[(W_C)^{-1} B_C\right] \qquad (5.6)$$
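Equations (5.1)-(5.6) can be checked with a short numerical sketch. This is illustrative Python, not the thesis code; the small ridge term added to WC to keep it invertible on degenerate subsets is an assumption.

```python
import numpy as np

def lda_criterion(X, y):
    """Discriminant score Y_LDA = tr(W_C^-1 B_C) built from Eqs. (5.1)-(5.6)."""
    classes = np.unique(y)
    N, dims = X.shape
    v = X.mean(axis=0)                          # total mean, Eq. (5.5)
    W_C = np.zeros((dims, dims))
    B_C = np.zeros((dims, dims))
    for c in classes:
        Xc = X[y == c]
        Ni = len(Xc)
        vi = Xc.mean(axis=0)                    # class mean, Eq. (5.4)
        diff = Xc - vi
        W_C += diff.T @ diff / N                # within-class scatter, Eq. (5.3)
        dm = (vi - v).reshape(-1, 1)
        B_C += Ni * (dm @ dm.T) / N             # between-class scatter, Eq. (5.2)
    W_C += 1e-6 * np.eye(dims)                  # ridge term (assumed) for invertibility
    return float(np.trace(np.linalg.inv(W_C) @ B_C))   # Eq. (5.6)
```

Well-separated classes yield a larger score than overlapping ones, which is exactly what the fitness of Eq. (5.7) rewards.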
The fitness of each training subset is obtained from the LDA
classifier: a dimension-reduced feature subset is formed and applied
to LDA in each iteration, and the process is repeated until the
global best solution is obtained. Here, the N training subsets are
given as input to the LDA classifier and a fitness value is obtained
for each subset. The fitness values determined for the fixed host
nests are f = f1, f2, f3, ..., fN. Among these, the best fitness is
found and stored as Xbest. Finally, the accuracy of our system is
the ratio of the total number of correct predictions to the actual
data set size, and the fitness function f is calculated as:

$$\mathrm{fitness} = 1 - \mathrm{Accuracy} \qquad (5.7)$$
In order to generate a new solution, a Levy flight is performed,
which provides a random walk. The new solution y^(t+1) is determined
by the equation below, while the current best is retained:

$$y^{(t+1)} = y^{(t)} + \alpha \oplus \mathrm{Levy}(\lambda) \qquad (5.8)$$
Figure 5.4: LDA+CS Flow Diagram
where α > 0 is the step size; in most cases we use α = 1. The Levy
distribution has infinite variance and infinite mean. Here, the
consecutive steps of a cuckoo essentially form a random-walk process
that obeys a power-law step-length distribution with a heavy tail.
In addition, a fraction of the worst nests can be abandoned, so that
new nests can be built at new locations by random walks and mixing.
The mixing of the solutions can be performed by random permutation
according to the similarity or difference to the host eggs. The flow
diagram of the designed LDA+CS is shown in Fig. 5.4. The optimal
dimensionality-reduced features are further clustered using a
technique called FB-KFCM (Fuzzy Bisector Kernel Fuzzy C-Means).
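The Levy-flight update of Eq. (5.8) can be sketched as below. Mantegna's algorithm is one common way to draw Levy-distributed steps; it is used here as an assumption, since the text does not specify the sampling method, and the exponent beta = 1.5 is a typical default rather than a value from this work.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=7)

def levy_step(dim, beta=1.5):
    """Draw a Levy-distributed step via Mantegna's algorithm (a common choice)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)   # heavy-tailed step lengths

def new_solution(y_t, alpha=1.0):
    """Eq. (5.8): y^(t+1) = y^(t) + alpha (+) Levy(lambda)."""
    return y_t + alpha * levy_step(len(y_t))
```

The heavy tail means most steps are small local moves while an occasional long jump lets the search escape local optima.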
5.5 Clustering using FB-KFCM
Clustering is a common way to group the optimal features obtained
from LDA+CS. Some of the clustering methods used previously are not
suitable for large datasets, so we propose a new method for
effective clustering by incorporating a Fuzzy Bisector into Kernel
Fuzzy C-Means clustering, called FB-KFCM here. The operation of the
newly incorporated fuzzy bisector is based on the optimal features
and a Mean Squared Error (MSE) parameter.
In the initial stage, the fuzzy bisector selects a cluster based on
the above parameters and divides it into two using the fuzzy c-means
technique. The process has several stages, each containing a single
bisection, which increases the number of clusters by one. The input
dataset to the FB-KFCM algorithm is represented as
X = {x1, x2, ..., xd}, where d is the size of the dataset. The input
dataset is then clustered and grouped into N clusters as represented
below.
$$Q = \{C_1, C_2, \ldots, C_N\} \qquad (5.9)$$
Here, each grouped cluster contains the data xi belonging to Qi, and
the data inside the ith cluster Ci is represented as
Ci = {D1, D2, ..., Dk}, where k is the number of data points in the
ith cluster. The proposed FB-KFCM is shown in Fig. 5.5.
The FB-KFCM clustering includes N + 1 stages and, in each stage, the
input data is divided into two clusters by the KFCM algorithm. For
the input data X, two clusters A and B are first formed. In the next
stage, one of the two clusters is taken and divided into two based
on KFCM, giving three clusters in total. Likewise, the clustering
stages continue until the data is grouped into N clusters, as
denoted by the following equation:

$$Q = \{C_1, C_2, \ldots, C_N\} \qquad (5.10)$$
Then, for each grouped cluster, the Mean Squared Error is computed
based on the Euclidean distance between the data points and the
centroid. The MSE of the ith cluster is:

$$MSE_i = \frac{1}{N_i}\sum_{k=1}^{N_i} \lVert C_k - c_i \rVert^2 \qquad (5.11)$$
Finally, for the N clustering stages, the data points in the
clusters are represented as D1, D2, ..., DK and the MSEs of the
clusters as E1, E2, ..., EK. Each stage of the process is carried
out by KFCM, over N + 1 stages in total. Thereafter, the centroid of
each cluster is calculated for further processing. Centroid-based
classification has several advantages, such as lower time
consumption and reduced complexity. The centroid of the ith cluster
is calculated by the equation below:

$$W_i = \frac{\sum_j D_j}{K} \qquad (5.12)$$
Based on the above equation, the centroid for each cluster is
calculated and given to the classification process.
Figure 5.5: FB-KFCM
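The bisecting procedure of this section can be sketched as below. In this illustrative sketch, plain fuzzy c-means stands in for the kernel variant (KFCM), and the cluster with the largest MSE of Eq. (5.11) is the one chosen for bisection, which is one reading of the text; the fallback for a degenerate split is an added assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def fcm_two_way(X, m=2.0, iters=50):
    """Split X into two fuzzy clusters; plain FCM stands in for the kernel variant."""
    c = X[rng.choice(len(X), size=2, replace=False)]       # two initial centroids
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2) + 1e-12
        u = 1.0 / d ** (2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)                  # fuzzy membership matrix
        w = u ** m
        c = (w.T @ X) / w.sum(axis=0)[:, None]             # centroid update
    return u.argmax(axis=1)                                # hard assignment per point

def cluster_mse(C):
    """Eq. (5.11): mean squared distance of the points in C to their centroid."""
    return float(np.mean(np.sum((C - C.mean(axis=0)) ** 2, axis=1)))

def bisecting_cluster(X, n_clusters):
    """Repeatedly bisect the cluster with the largest MSE until n_clusters remain."""
    clusters = [X]
    while len(clusters) < n_clusters:
        worst = max(range(len(clusters)), key=lambda i: cluster_mse(clusters[i]))
        target = clusters.pop(worst)
        labels = fcm_two_way(target)
        left, right = target[labels == 0], target[labels == 1]
        if len(left) == 0 or len(right) == 0:              # degenerate split: plain halving
            half = len(target) // 2
            left, right = target[:half], target[half:]
        clusters += [left, right]
    return clusters

# The centroid of each final cluster (Eq. 5.12) is what feeds the classifier.
```

Each pass adds exactly one cluster, matching the "single bisection per stage" description above.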
5.6 Classification using Bayesian Neural Network
Classification in intrusion detection trains on the centroid-based
grouped data obtained from FB-KFCM. The centroid of each cluster is
given to a classifier to identify whether the input data is intruded
or not. In this proposed system, the Bayesian Neural Network is used
for better classification. A Bayesian neural network is an improved
version of the artificial neural network that yields a more robust
classification result. In the Bayesian Neural Network Classifier
(BNNC), the weight-decay parameter is adjusted automatically during
training to obtain the optimal solution, and the whole data can be
used for training without any need for a separate validation set.
The centroid value obtained from each cluster of the input data is
given to the BNNC for training. Let the centroid input to the
Bayesian classifier be:

$$W_i; \quad 0 < i \le (N+1) \qquad (5.13)$$
The general neural network contains three layers, namely the input
layer, the hidden layer and the output layer. Initially, the
centroid obtained from each cluster is given as input to the
Bayesian neural network to select the prior probability distribution
for the model parameters. Second, predictions are made with respect
to the posterior parameter distribution obtained by updating the
prior. The Bayesian neural network is formed based on these two
properties. Let the input be the vector of real centroid values Wi.
The output for each input centroid is trained by varying the weights
at each node to obtain the best classification result. The
architecture of the Bayesian neural network is shown in Fig. 5.6.
Figure 5.6 Bayesian Neural Network Classifier (BNNC)
The output of the single-hidden-layer Bayesian neural network is
computed as:

$$y_k(x) = V_0\!\left(b_k + \sum_{i=1}^{M} W_{ki}\, P_i(x)\right) \qquad (5.14)$$

where $P_i(x) = \tanh\!\left(b_j + \sum_{j=1}^{d} W_{ij}\, x_j\right)$.
Here, Wij is the weight on the connection from input unit j to
hidden unit i, and Wki is the weight on the connection from hidden
unit i to output unit k. The biases of the hidden and output units
are bj and bk, and the activation function of the output layer is
V0. Further, to avoid large weights, a weight-decay term is added to
the data error function eD. In particular, for the classification
problem we have:

$$e_T = e_D + \sum_{h=1}^{H} J_h\, e_{W_h} \qquad (5.15)$$

where eT is the total error function and Jh is a non-negative
parameter governing the distribution of the other parameters, such
as weights and biases. Here, eWh is the weight error for the hth
group of weights and biases, and H is the number of groups of
weights and biases in the neural network. The weights and biases are
then grouped into a single W-dimensional weight vector w. Given the
weight vector w, the posterior distribution given the data D is:

$$P(w \mid D, \mu) = \frac{P(D \mid w, \mu)\, P(w \mid \mu)}{P(D \mid \mu)} \qquad (5.16)$$

where $\mu = \{J_1, J_2, \ldots, J_H\}$.
Also, the prior distribution of the weights is:

$$P(w \mid \mu) = \frac{1}{Z_W(\mu)} \exp\!\left(-\sum_{h=1}^{H} J_h\, e_{W_h}\right) \qquad (5.17)$$

where $Z_W(\mu) = \prod_{h=1}^{H} \left(\frac{2\pi}{J_h}\right)^{W_h/2}$.
The posterior density of the parameters is proportional to this
product of likelihood and prior, and the training process is carried
out for all clustering centroids Wi, 0 < i ≤ (N+1). After training,
the test data is given to the Bayesian-trained neural network to
determine whether the data is attacked or not.
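The forward pass of Eq. (5.14) and the regularised error of Eq. (5.15) can be sketched as follows. This is illustrative Python: the logistic output activation standing in for V0, the layer sizes and the decay constant J are assumptions, and the full Bayesian treatment of the posterior in Eqs. (5.16)-(5.17) is omitted.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# Tiny single-hidden-layer network in the shape of Eq. (5.14); sizes are assumed.
d, M, K = 4, 6, 1                                  # inputs, hidden units, outputs
W_in = rng.normal(scale=0.5, size=(M, d))          # Wij: input -> hidden weights
b_h = np.zeros(M)                                  # hidden-layer biases
W_out = rng.normal(scale=0.5, size=(K, M))         # Wki: hidden -> output weights
b_o = np.zeros(K)                                  # output-layer biases

def forward(x):
    """Eq. (5.14): yk(x) = V0(bk + sum_i Wki * Pi(x)), with Pi = tanh(...)."""
    P = np.tanh(b_h + W_in @ x)                    # hidden activations Pi(x)
    return 1.0 / (1.0 + np.exp(-(b_o + W_out @ P)))   # logistic V0 (assumed)

def total_error(X, t, J=0.01):
    """Eq. (5.15): eT = eD + sum_h Jh * eWh, with one decay group per layer."""
    y = np.array([forward(x)[0] for x in X])
    e_D = float(np.mean((y - t) ** 2))             # data error eD
    e_W = J * float(np.sum(W_in ** 2) + np.sum(W_out ** 2))
    return e_D + e_W
```

Minimising the penalised error keeps the weights small, which is the practical effect of the Gaussian prior of Eq. (5.17).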
5.7 Summary
In this chapter, a Hybrid Intrusion Detection System using LDA+CS
(Linear Discriminant Analysis + Cuckoo Search) is developed by
combining LDA and CS; LDA is a commonly used technique for
dimensionality reduction. Fuzzy Bisector Kernel Fuzzy C-Means
clustering (FB-KFCM) is used as the clustering technique, and the
Bayesian Neural Network is used for better classification. The
entire system will be applied to a medical sensor network to find
intrusion behaviour by simulating the networks in JAVA. Finally, the
performance of the system will be analysed using the KDD CUP 99
dataset in terms of accuracy.
CHAPTER 6
RESULTS AND IMPLEMENTATION
The proposed technique, Linear Discriminant Analysis + Cuckoo Search
with Fuzzy Bisector Kernel Fuzzy C-Means clustering (LDA+CS +
FB-KFCM + Bayesian Network), is implemented in JAVA on a system with
8 GB RAM and a 3.2 GHz processor. To evaluate the performance of the
proposed technique, we used the KDD CUP 99 dataset for testing and
evaluation. The KDD CUP 99 dataset is a version of the data from the
original 1998 DARPA intrusion detection evaluation program. It is
also one of the few publicly available data sets containing actual
attacks [142], so we used it to design and evaluate our intrusion
detection system.
The KDD CUP 1999 dataset was obtained from raw TCP dump data
collected over nine weeks. It comprises a large number of network
traffic activities, including both normal and malicious connections:
about five million connection records as training data and two
million as test data. Each instance has 41 features and is marked as
normal or as an attack. In total, 38 different attacks are found in
the training and testing data, falling into four main categories:
Probe, denial of service (DoS), remote to local (R2L) and user to
root (U2R) [139, 122].
The KDD Cup 99 dataset is available in three different files: the
KDD Full dataset with 4,898,431 instances, the KDD Cup 10% dataset
with 494,021 instances and the KDD Corrected dataset with 311,029
instances. Table 6.1 gives, for each of these datasets, the number
of samples in each attack category before and after the removal of
duplicate samples. Each sample of the dataset represents a
connection between two network hosts according to the network
protocols. It is described by 41 attributes, of which 38 are
continuous or discrete numerical attributes and 3 are categorical
attributes. Each sample is labelled as either normal or as one
specific attack; the dataset contains 23 class labels, of which 1 is
normal and the remaining 22 are different attacks. Because the full
KDD Cup 99 dataset is very large and difficult to work with, we used
the 10% subset of the KDD Cup 99 dataset for this research.
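The duplicate-sample removal underlying Table 6.1 can be sketched as a first-occurrence filter over the CSV records. This is an illustrative sketch, not the exact algorithm used in the thesis, and the file paths are hypothetical; the KDD Cup 99 files are comma-separated records of 41 features plus a label.

```python
import csv

def remove_duplicates(in_path, out_path):
    """Keep only the first occurrence of each exact connection record."""
    seen = set()
    kept = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader, writer = csv.reader(src), csv.writer(dst)
        for row in reader:
            key = tuple(row)          # the whole record, features plus label
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
                kept += 1
    return kept
```

Applied to KDD Full, a filter of this kind reduces the 4,898,431 records to the 1,074,992 unique records reported in Table 6.1.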
Table 6.1: Attack distribution in the KDD Full, KDD 10% and KDD Corrected datasets

Dataset                               DoS       U2R   R2L     Probe   Normal   Total
KDD Full                              3883370   52    1126    41102   972781   4898431
KDD Full (duplicates removed)         247267    52    999     13860   812814   1074992
KDD 10%                               391458    52    1126    4107    97278    494021
KDD 10% (duplicates removed)          54598     52    999     2133    87832    145586
KDD Corrected                         229269    70    16172   4925    60593    311029
KDD Corrected (duplicates removed)    22984     70    2898    3426    47913    77291
Table 6.2: Accuracy for Case 8:2

Cluster size   KFCM + Bayesian Network   FB-KFCM + Bayesian Network   LDA+CS + FB-KFCM + Bayesian Network
200            93.2321                   96.5506                      97.4163
180            90.3874                   93.4678                      97.4003
160            90.3210                   92.4013                      97.2720
140            92.3542                   93.4678                      97.4303
Table 6.3: Accuracy for Case 7:3

Cluster size   KFCM + Bayesian Network   FB-KFCM + Bayesian Network   LDA+CS + FB-KFCM + Bayesian Network
200            92.2021                   94.4124                      98.4653
180            94.0824                   96.5563                      98.4135
160            94.0210                   96.7341                      98.3765
140            90.4201                   92.4017                      97.9872
Table 6.4: Accuracy for Case 9:1

Cluster size   KFCM + Bayesian Network   FB-KFCM + Bayesian Network   LDA+CS + FB-KFCM + Bayesian Network
200            92.8732                   93.0023                      99.3074
180            92.1532                   93.4022                      99.3155
160            90.9710                   91.9360                      99.3015
140            91.7342                   92.6015                      99.0612
Figure 6.1: Accuracy Plot for Case 8:2
Figure 6.2: Accuracy Plot for Case 7:3
Figure 6.3: Accuracy Plot for Case 9:1
Table 6.5: Average accuracy

Case   KFCM + Bayesian Network   FB-KFCM + Bayesian Network   LDA+CS + FB-KFCM + Bayesian Network
8:2    91.5737                   93.9719                      97.3797
7:3    92.6814                   95.0261                      98.3106
9:1    91.9329                   92.7355                      99.2464
Figure 6.4: Average Accuracy Plot
6.1 Comparative Analysis
The existing techniques, KFCM + Bayesian Network and Fuzzy Bisector
Kernel Fuzzy C-Means clustering (FB-KFCM) + Bayesian Network, are
compared with the proposed hybrid technique LDA+CS + FB-KFCM +
Bayesian Network, and their results are discussed. Table 6.2 and
Figure 6.1 give the accuracy values and plot for Case 8:2 for
various cluster sizes, Table 6.3 and Figure 6.2 for Case 7:3, and
Table 6.4 and Figure 6.3 for Case 9:1. Accuracy values are taken for
cluster sizes of 140, 160, 180 and 200. In all cases, the proposed
technique achieved better accuracy than the existing techniques.
For Case 8:2, the average accuracies of the existing techniques,
KFCM + Bayesian Network and FB-KFCM + Bayesian Network, are 91.57%
and 93.97% respectively, while the proposed hybrid technique LDA+CS
+ FB-KFCM + Bayesian Network reaches 97.38%.
For Case 7:3, the average accuracies of KFCM + Bayesian Network and
FB-KFCM + Bayesian Network are 92.68% and 95.03% respectively, while
the proposed hybrid technique reaches 98.31%.
For Case 9:1, the average accuracies of KFCM + Bayesian Network and
FB-KFCM + Bayesian Network are 91.93% and 92.74% respectively, while
the proposed hybrid technique reaches 99.25%.
According to the results in Table 6.5 and Figure 6.4, the hybrid
LDA+CS + FB-KFCM + Bayesian Network technique attains a high
accuracy of 98.31% in Case 7:3 and up to 99.25% in Case 9:1. These
values demonstrate the efficiency of the proposed technique.
6.2 Implementation in Medical Sensor Network
The proposed intrusion detection system is applied to a medical
sensor network in order to detect which data are intruded and which
are not. The proposed algorithm is simulated using a medical sensor
network consisting of 8,668 data records in total. The whole medical
sensor network data is trained using the Bayesian neural network in
our algorithm. After the training process, we used 10 data records
for testing at each time step; in each test, the algorithm detects
which of the 10 records are intruded and which are not. At time T1,
10 nodes were used for testing, and the simulation result obtained
using our method is shown in Fig. 6.5.
Figure 6.5: Simulation results obtained at times T1, T2, T3 and T4
In the simulated result, two colours, red and green, indicate the
data type: red indicates intruded data and green indicates
non-intruded data. Among the 10 records tested at time T1, 6 are not
intruded (green) and the remaining 4 are intruded (red). Similarly,
at time T2, another 10 records are tested; 7 of the 10 are not
intruded and the remaining 3 are intruded. At time T3, the result
for 10 test records shows 8 not intruded and 2 intruded, and at time
T4, again 8 of the 10 records are not intruded and the remaining 2
are intruded.
6.3 Summary
In this chapter, the existing techniques KFCM + Bayesian Network and
Fuzzy Bisector Kernel Fuzzy C-Means clustering (FB-KFCM) + Bayesian
Network were compared with the proposed hybrid technique LDA+CS +
FB-KFCM + Bayesian Network and their results discussed. To evaluate
the performance of the proposed technique, we used the KDD CUP 99
dataset for testing and evaluation. Based on the comparative
analysis, the proposed hybrid technique attained a high accuracy of
98.31%, which shows its efficiency. Finally, the proposed algorithm
was simulated using a medical sensor network consisting of 8,668
data records. In the simulation with 10 test records, 8 of the 10
were found not intruded and the remaining 2 intruded, confirming the
high accuracy rate.
CHAPTER 7
CONCLUSION
In this intrusion detection system, LDA+CS (Linear Discriminant
Analysis + Cuckoo Search) is developed by combining LDA and CS, and
is used for dimensionality reduction and optimal feature selection.
Since some previously used clustering methods are not suitable for
large datasets, we proposed a new method for effective clustering
that incorporates a Fuzzy Bisector into Kernel Fuzzy C-Means
clustering, called FB-KFCM. The feature-reduced dataset is grouped
into clusters using this FB-KFCM method. Then, in the classification
step, the centroids of the clusters are taken and trained using the
Bayesian Neural Network, an improved version of the artificial
neural network that yields robust classification results. For online
intrusion detection, the test data is given to the trained network
to determine whether the given data is intruded or not. The entire
system is applied to a medical sensor network to find intrusion
behaviour by simulating the networks in JAVA using the KDD CUP 99
dataset. The evaluation metric is accuracy, and a comparative
analysis is made against the other techniques. The average accuracy
was found to be 98.31%, better than the other compared techniques;
this high accuracy shows the efficiency of the proposed technique.
7.1 Contributions
The contributions in this work are summarized as follows:
1. In this work, different variants of intrusion detection
techniques, namely anomaly-based, signature-based, host-based,
network-based and hybrid intrusion detection for improving
performance in medical sensor networks, are studied and analyzed.
2. Existing clustering techniques such as K-Means, Fuzzy K-Means,
Fuzzy C-Means and KFCM are discussed, and the proposed Fuzzy
Bisector Kernel Fuzzy C-Means clustering (FB-KFCM) is designed and
developed.
3. Based on the analysis, it is observed that the proposed FB-KFCM
performs better than the other methods in terms of accuracy,
attaining an average accuracy of 93.91% when compared with
techniques such as KFCM and KFCM with Bayesian Network.
4. A Hybrid Intrusion Detection System using LDA+CS (Linear
Discriminant Analysis + Cuckoo Search) is developed by combining LDA
and CS; LDA is a commonly used technique for dimensionality
reduction. Fuzzy Bisector Kernel Fuzzy C-Means clustering (FB-KFCM)
is used as the clustering technique, and the Bayesian Neural Network
is used for better classification.
5. To evaluate the performance of the proposed technique, the KDD
CUP 99 dataset was used for testing and evaluation. Based on the
comparative analysis, the proposed hybrid LDA+CS + FB-KFCM +
Bayesian Network technique attained a high accuracy of 98.31%,
showing its efficiency.
6. Finally, the proposed algorithm is simulated using a medical
sensor network consisting of 8,668 data records. In the simulation
with 10 test records, 8 of the 10 were found not intruded and the
remaining 2 intruded, attaining a high accuracy rate.
7.2 Future Works
The following future works are proposed as a continuation of the
research presented in this thesis:
• In future, the proposed clustering and classification algorithms
can be extended or modified using intelligent agents to further
increase performance. Apart from the experimented combination of
data mining techniques, further combinations involving artificial
intelligence, soft computing and other clustering algorithms can be
used to improve the detection accuracy and to reduce the
false-negative and false-positive alarm rates. Finally, the
intrusion detection system can be extended into an intrusion
prevention system to enhance the performance of the system.
• Research in intrusion detection and the application of data mining
and machine learning plays an important role in the security of
current and future computer networks. This thesis has explored the
feasibility of using supervised and unsupervised learning in the
classification of intrusion-detection attacks, and it opens multiple
possibilities for future exploration and research, which may lead to
the design and development of more efficient, reliable and effective
detection and prevention IDS systems.
REFERENCES
1. Adebayo O. Adetunmbi, Samuel O. Falaki, Olumide S. Adewale and
Boniface K. Alese, "Network Intrusion Detection Based on Rough Set
and K-Nearest Neighbour", International Journal of Computing and ICT
Research, Vol. 2, No. 1, pp. 60-66, 2008.
2. Abhijit Sarmah, "Intrusion Detection Systems: Definition, Need
and Challenges", White Paper, SANS Institute, 2001.
3. Adeyinka, O. (2008), "Internet Attack Methods and Internet
Security Technology Modeling & Simulation", AICMS 08, Second Asia
International Conference on, pp. 77-82.
4. Agrawal R and R. Srikant,(1994)“Fast algorithms for mining association
rules”.
5. Indraneel Mukhopadhyay, Mohuya Chakraborty and Satyajit
Chakrabarti “A Comparative Study of Related Technologies of
IntrusionDetection & Prevention Systems” Journal of Information
Security, 2011, 2, 28-38.
6. Amini M. et.al. (2004), ‘Network-Based Intrusion Detection Using
Unsupervised Adaptive Resonance Theory (ART)’, Proceedings of the
4th Conference on Engineering of Intelligent Systems (EIS 2004),
Madeira, Portugal.
7. Amoroso E, Wykrywanieintruzów, Wydawnictwo RM, Warszawa 1999.
8. Anazida Zainal, Mohd Aizaini Maarof and Siti Maryam Shamsudin,
"Research Issues in Adaptive Intrusion Detection", in Proceedings of
the 2nd Postgraduate Annual Research Seminar (PARS'06), Faculty of
Computer Science & Information Systems, Universiti Teknologi
Malaysia, 24-25 May 2006.
9. Andonie, R. and Kovalerchuk, B., "Neural Networks for Data
Mining: Constraints and Open Problems".
10. Anil Kumar K S and Dr. V. NandaMohan, " Novel Anomaly Intrusion
Detection Using Neuro-Fuzzy Inference System ", IJCSNS International
Journal 6 of Computer Science and Network Security, vol.8, no.8, pp.6-
11 , August 2008.
11. Axelsson S.: Intrusion Detection Systems: A Taxomomy and Survey.
Technical Report No 99-15, Dept. of Computer Engineering, Chalmers
University of Technology, Sweden, March 2000,
12. Bahrololum M, E. Salahi and M. Khaleghi “Anomaly intrusion detection
design using hybrid of unsupervised and supervised neural networks”,
International Journal of Computer Networks & Communications, Vol.1,
No.2, 2009.
13. Barbara, D., N. Wu, and S. Jajodia, Detecting novel network intrusions
using Bayes estimators, In Proc. of the First SIAM Int. Conf. on Data
Mining (SDM 2001), Chicago, Society for Industrial and Applied
Mathematics (SIAM), 2001
14. Bass T.: Intrusion Detection Systems Multisensor Data Fusion: Creating
Cyberspace Situational Awareness. Communication of the ACM, Vol.
43,Number 1, January 2000, pp. 99-105,
15. Mohammad Khubeb Siddiqui and Shams Naahid,” Analysis of KDD
CUP 99 Dataset using Clustering based Data Mining” International
Journal of Database Theory and Application Vol.6, No.5 (2013), pp.23-
34
16. Bezdek J C, Pattern Recognition with fuzzy objective function
algorithms, Newyork: Plenum, 1981.
17. Bhavya Daya (2010), "Network Security: History, Importance, and
Future", University of Florida, Department of Electrical and
Computer Engineering.
18. BOLEY, D.L. 1998. Principal direction divisive partitioning. Data Mining
and Knowledge Discovery, 2, 4, 325-344.
19. Cabrera, J.B.D., Ravichandran, B &Mehra R.K. (2000). Statistical Traffic
Modelling for Network Intrusion Detection. In Proceeding of the IEEE
Conference
20. Hayoung Oh, Inshil Doh and Kijoon Chae, "Attack Classification
Based on Data Mining Technique and Its Application for Reliable
Medical Sensor Communication", International Journal of Computer
Science and Applications, Technomathematics Research Foundation,
pp. 20-32, 2009.
21. Cannady J, “Artificial Neural Networks for Misuse Detection”, In
Proceedings of the ’98 National Information System Security
Conference (NISSC’98), pp. 443-456, 1998.
22. Carbone, P. L., Data mining or knowledge discovery in databases: An
overview. In Data Management Handbook. New York: Auerbach
Publications, 1997.
23. Chan, P. K., M. V. Mahoney, and M. H. Arshad, Managing Cyber
Threats: Issues, Approaches and Challenges, Chapter Learning Rules
and Clusters for Anomaly Detection in Network Traffic, Kluwer, 2003.
24. Chimphlee W. et al. (2006), ‘Anomaly-Based Intrusion Detection using
Fuzzy Rough Clustering’, proceedings of the International Conference
on Hybrid Information Technology , Vol. 1, pp. 329-334.
25. Crosbie, M. and E. H. Spafford, Active defense of a computer system
using autonomous agents. Technical Report CSD-TR-95-008, Purdue
Univ., West Lafayette, IN, February, 1995.
26. Cuppen, F. &Miege, A. (2002). Alert Correlation in a Cooperative
Intrusion Detection Framewok. In Proceeding of the 2002 IEEE
Symposium on Security and Privacy. IEEE, 2002]
27. Daniel Barbara, Julia C., (2001),“ADAM: Detecting Intrusions by Data
Mining” , Proceedings of the 2001 IEEE Workshop on Information
Assurance and Security United States Military Academy, West Point,
NY, 5.
28. DARPA Intrusion Detection Evaluation Data Set” from
http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/19
98data.html
29. Dasarathy B V, “Intrusion Detection”, Information Fusion, Vol.4, No.4,
pp.243-245, 2003.
30. Dasgupta, D. and F. A. Gonz´alez, An intelligent decision support
system for intrusion detection and response. In Proc. of International
Workshop on Mathematical Methods, Models and Architectures for
Computer Networks Security (MMM-ACNS), St.Petersburg. Springer-
Verlag, 21-23 May, 2001.
31. Dash M. and Liu H.,(1997), “Feature selection for classification”,
Intelligent Data Analysis: An International Journal, PP. 131–156.
32. Debar H., Dacier M., Wespi A.: Towards a taxonomy of intrusion-
detection systems. Computer Networks, 31, 1999, pp. 805-822.
33. Dewan Md. Farid and Mohammad Zahidur Rahman, “Anomaly Network
Intrusion Detection Based on Improved Self Adaptive Bayesian
Algorithm”, Journal of Computers, Vol.5, No.1, January, 2010.
34. Didaci, L., G. Giacinto, and F. Roli, Ensemble learning for intrusion
detection in computer networks. http://citeseer.nj.nec.com/533620.html,
2002.
35. Disha Sharma, Fuzzy Clustering as an Intrusion Detection Technique,
International Journal of Computer Science & Communication Networks,
Vol. 1, No. 1, 2011.
36. Dorosz P., Kazienko P., Systemy wykrywania intruzów [Intrusion
detection systems]. VI Krajowa Konferencja Zastosowań Kryptografii
ENIGMA 2002, Warsaw, 14-17 May 2002, pp. TIV 47-78 (in Polish only).
37. Dorothy E. Denning. An intrusion detection model. IEEE Transactions
on Software Engineering, SE-13(2):222–232, 1987.
38. Ektefa M., S. Memar, F. Sidi and L. S. Affendey, "Intrusion detection
using data mining techniques", In Proceedings of the International
Conference on Information Retrieval & Knowledge Management
(CAMP), pp. 200-203, 2010.
39. Ellen Pitt and Richi Nayak, (2007), "The Use of Various Data Mining and
Feature Selection Methods in the Analysis of a Population Survey
Dataset", Conferences in Research and Practice in Information
Technology.
40. Eskin E. et al. (2000), 'Adaptive Model Generation for Intrusion
Detection Systems', Proceedings of the 7th ACM Conference on
Computer Security, Athens, Greece.
41. Eskin E. et al. (2002), 'A Geometric Framework for Unsupervised
Anomaly Detection: Detecting Intrusions in Unlabeled Data', Data
Mining for Security Applications, Kluwer Academic Publishers, 2002.
42. Eskin, E., Anomaly detection over noisy data using learned probability
distributions. In Proc. 17th International Conf. on Machine Learning,
San Francisco, pp. 255–262, Morgan Kaufmann, 2000.
43. Faizal, M. A., Mohd Zaki M., Shahrin Sahib, Robiah Y., Siti Rahayu S.,
and Asrul Hadi Y., "Time Based Intrusion Detection on Fast Attack for
Network Intrusion Detection System", Second International Conference
on Network Applications, Protocols and Services, IEEE, 2010.
44. Fan W., Miller M., Stolfo S., Lee W., Chan P.: Using Artificial Anomalies
to Detect Unknown and Known Network Intrusions. In Proceedings of
the First IEEE International Conference on Data Mining, San Jose, CA,
November 2001.
45. Farah J., Mantaceur Z. & Mohamed B. A. (2007). A Framework for an
Adaptive Intrusion Detection System using Bayesian Network.
Proceedings of the Intelligence and Security Informatics, IEEE, 2007.
46. Farid Dewan Md. and Rahman Mohammad Zahidur, "Anomaly Network
Intrusion Detection Based on Improved Self Adaptive Bayesian
Algorithm", Journal of Computers, Vol. 5, No. 1, January 2010.
47. Fengmin Gong, “Deciphering Detection Techniques: Part II Anomaly-
Based Intrusion Detection”, White Paper from McAfee Network Security
Technologies Group, 2003.
48. Frederick K. K.: Network Intrusion Detection Signatures. December 19,
2001, http://online.securityfocus.com/infocus/1524.
49. Gang Wang, Jinxing Hao, Jian Ma and Lihua Huang, "A new approach
to intrusion detection using Artificial Neural Networks and fuzzy
clustering", Expert Systems with Applications, Vol. 37, No. 9, pp. 6225-
6232, 2010.
50. Garuba, M., Liu, C. & Fraites, D. (2008). Intrusion Techniques:
Comparative Study of Network Intrusion Detection Systems. In
Proceedings of the Fifth International Conference on Information
Technology: New Generations, IEEE, 2008.
51. Gomez et al. (2002), 'Evolving Fuzzy Classifiers for Intrusion Detection',
Proceedings of the 2002 IEEE Workshop on Information Assurance,
United States Military Academy, West Point, NY, June 2002.
52. Gong F, “Deciphering Detection Techniques: Part II Anomaly-Based
Intrusion Detection”, White Paper from McAfee Network Security
Technologies Group, 2003.
53. Gowrison G., K. Ramar, K. Muneeswaran, T. Revathi, "Minimal
complexity attack classification intrusion detection system", Applied Soft
Computing, Vol. 13, pp. 921-927, 2013.
54. Nancy, Jasdeep Kaur, Rameet Kaur and Nishu, "Data Mining - A Review
and Description", International Journal on Recent and Innovation Trends
in Computing and Communication, Vol. 1, Issue 7, pp. 582-586, 2013.
55. Hafiz Muhammad Imran, Azween Bin Abdullah, Muhammad Hussain,
Sellappan Palaniappan and Iftikhar Ahmad, Intrusions Detection based
on Optimum Features Subset and Efficient Dataset Selection,
International Journal of Engineering and Innovative Technology (IJEIT),
Vol.2, No. 6, 2012.
56. Harley Kozushko, “Intrusion Detection: Host-Based and Network-Based
Intrusion Detection Systems”, White Paper from Independent Study,
September 11, 2003.
57. Hazem M. El-Bakry, "Automatic Human Face Recognition Using
Modular Neural Networks," Machine Graphics & Vision Journal (MG&V),
vol. 10, no. 1, 2001, pp. 47-73.
58. Hazem M. El-Bakry, and Nikos Mastorakis “Fast Detection of Specific
Information in Voice Signal over Internet Protocol,” Proc. of 7th WSEAS
Int. Conf. on COMPUTATIONAL INTELLIGENCE, MAN-MACHINE
SYSTEMS and CYBERNETICS (CIMMACS '08), Cairo, EGYPT, Dec.
29-31, 2008, pp. 125-136.
59. Hazem M. El-Bakry, Nikos E. Mastorakis, Michael E. Fafalios, “Fast
Information Retrieval from Big Data by using Cross Correlation in the
Frequency Domain,” Proc. of IEEE IJCNN 2013, Dallas Tx, USA,
August 4-9, 2013, pp. 366-272.
60. Helmer, G., J. Wong, V. Honavar, and L. Miller, Automated discovery of
concise predictive rules for intrusion detection. Technical Report 99-01,
Iowa State Univ., Ames, IA, January, 1999.
61. Hershkop S., Apap F., Eli G., Tania D., Eskin E., Stolfo S., (2007), "A
data mining approach to host based intrusion detection", Technical
reports, CUCS Technical Report.
62. Introduction to Data Mining and Knowledge Discovery, Two Crows
Corporation, 2005.
63. Intrusion Detection Systems (IDS). Group Test (Edition 3), NSS Group,
July 2002, http://www.nss.co.uk/ids/edition3/index.htm.
64. Ioannis Krontiris, Zinaida Benenson, Thanassis Giannetsos, Felix C.
Freiling and Tassos Dimitriou, "Cooperative Intrusion Detection in
Wireless Sensor Networks", Lecture Notes in Computer Science, Vol.
5432, pp. 263-278, 2009.
65. Ion Iancu and Mihai Gabroveanu (2010), "Fuzzy Logic Controller
Based on Association Rules".
66. Irvine (1999), 'KDD Cup 1999 Data', 5th International Conference on
Knowledge Discovery and Data Mining,
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
67. ITA, The internet traffic archive, 2000, http://ita.ee.lbl.gov/.
68. James P. Anderson. Computer security threat monitoring and
surveillance. Technical report, James P. Anderson Co., 1980.
69. Javitz, H. S. and A. Valdes, The NIDES statistical component:
Description and justification, Technical report, SRI International,
March 1993.
70. Jian Pei, Jiawei Han, Laks V. S. Lakshmanan, "Pushing Convertible
Constraints in Frequent Itemset Mining", Data Mining and Knowledge
Discovery, Vol. 8, No. 3, pp. 227-252, May 2004.
71. Jiawei Han and Micheline Kamber, (2008), "Data Mining: Concepts and
Techniques", Morgan Kaufmann Publishers, an imprint of Elsevier.
ISBN 978-1-55860-901-3; Indian reprint ISBN 978-81-312-0535-8.
72. John Wack, Ken Cutler, Jamie Pole, (2002), "Guidelines on Firewalls
and Firewall Policy", Recommendations of the National Institute of
Standards and Technology.
73. Jones A. K., Sielken R. S.: Computer System Intrusion Detection:
A Survey. 09.02.2000, IDSresearch/Documents/jones-sielken-
survey-v11.pdf.
74. Joseph T and H. T. Nguyen, "Neural network control of wheelchairs
using telemetric head movement," Proceedings of the 20th Annual
International Conference of the IEEE, Engineering in Medicine and
Biology Society, vol. 5, pp. 2731 - 2733, 1998.
75. Joshua W. Haines et al. (2001), 'Extending the DARPA Off-Line
Intrusion Detection Evaluations', Proceedings of IEEE DARPA
Information Survivability Conference and Exposition II, Vol. 1, pp. 77-88.
76. Joyce Jackson, (2002), "Data Mining: A Conceptual Overview",
Communications of the Association for Information Systems, Vol. 8.
77. Karen S. and Peter M., (2007), "Guide to Intrusion Detection and
Prevention Systems", National Institute of Standards and Technology,
Department of Commerce, USA.
78. Karl Levitt. (2002). Intrusion Detection: Current Capabilities and Future
Direction. Proceedings of the 18th Annual Computer Security
Applications Conference, IEEE, 2002.
79. Karthik G and Nagappan A, "Intrusion Detection System Using Kernel
FCM Clustering and Bayesian Neural Network", International Journal of
Computer Science and Information Technology & Security (IJCSITS),
Vol. 3, No. 6, 2013.
80. Kayacik, G. H., Zincir-Heywood, A. N., (2005), "Analysis of Three
Intrusion Detection System Benchmark Datasets Using Machine
Learning Algorithms".
81. "KDD Cup 1999 Data", from
http://www.sigkdd.org/kddcup/index.php?section=1999&method=data
82. KdNuggets, (2007), "Data Mining Methodology",
http://www.kdnuggets.com/polls/2007/datamining_methodology.htm.
83. Keim, Daniel A. (2002), "Information Visualization and Visual Data
Mining".
84. Kendall, K., (1999), "A Database of Computer Attacks for the Evaluation
of Intrusion Detection Systems", Master's thesis, Massachusetts
Institute of Technology.
85. Kumar G., K. Kumar and M. Sachdeva, (2010), "The Use of Artificial
Intelligence based Techniques for Intrusion Detection - A Review",
Artificial Intelligence Review, Vol. 34, No. 4, pp. 369-387, Springer,
Netherlands, DOI: 10.1007/s10462-010-9179-5, ISSN: 0269-2821.
86. L. O and N. M, “Ordered estimation of missing values,” in PAKDD,
1999, pp. 499–503.
87. Latifur Khan, Mamoun Awad, Bhavani Thuraisingham, "A new intrusion
detection system using support vector machines and hierarchical
clustering", The International Journal on Very Large Data Bases, Vol.
16, No. 4, October 2007.
88. Lee W., S. Stolfo, and K. Mok, "A Data Mining Framework for Building
Intrusion Detection Models", In Proceedings of the IEEE Symposium on
Security and Privacy, Oakland, CA: IEEE Computer Society Press, pp.
120-132, 1999.
89. Lee W. (1999), 'A Data Mining Framework for Constructing Features
and Models for Intrusion Detection Systems', Ph.D. thesis, Columbia
University, New York, NY.
90. Lee W. et al. (2001), 'Real Time Data Mining-based Intrusion Detection',
Proceedings of the Second (DARPA) Information Survivability
Conference and Exposition, pp. 85-100.
91. Lee W., Stolfo S. J. (1998), 'Data mining approaches for intrusion
detection', Proceedings of the 7th USENIX Security Symposium, pp. 79-
94, Texas.
92. Lee, W. and S. J. Stolfo, A framework for constructing features and
models for intrusion detection systems. Information and System
Security 3 (4), 227–261, 2000.
93. Li Tian and Wang Jianwen, Research on Network Intrusion Detection
System Based on Improved K-means Clustering Algorithm, International
Forum on Computer Science-Technology and Applications, pp.76 – 79,
2009.
94. Lippmann R. P. et al. (2000), 'Evaluating Intrusion Detection Systems:
The 1998 DARPA Off-line Intrusion Detection Evaluation', Proceedings
of the 2000 DARPA Information Survivability Conference and Exposition
(DISCEX), pp. 12-26, Los Alamitos.
95. Lippmann, R. P., D. J. Fried, I. Graf, J. W. Haines, K. Kendall,
D. McClung, D. Weber, S. Webster, D. Wyschogrod, R. K. Cunningham,
and M. Zissman, Evaluating intrusion detection systems: The 1998
DARPA off-line intrusion detection evaluation. In Proc. of the DARPA
Information Survivability Conference and Exposition, Los Alamitos, CA.
IEEE Computer Society Press, January 2000.
96. Luo J. and S. M. Bridges, "Mining fuzzy association rules and fuzzy
frequency episodes for intrusion detection", International Journal of
Intelligent Systems, Vol. 15, No. 8, pp. 687-704, 2000.
97. Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu and Ali A. Ghorbani, "A
detailed analysis of the KDD CUP 99 data set", in Proceedings of the
Second IEEE international conference on Computational intelligence for
security and defense applications, pp. 53-58, Ottawa, Ontario, Canada,
2009.
98. Marcos M. Campos, Boriana L. Milenova, “Creation and Deployment of
Data Mining-Based Intrusion Detection Systems in Oracle Database
10g”, In Proceedings of the Fourth International Conference on Machine
Learning and Applications, 2005.
99. Marin, G. A., (2005), "Network Security Basics", IEEE Security &
Privacy, Vol. 3, No. 6, pp. 68-72.
100. Matteucci M, "A tutorial on clustering algorithms,"
http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html
101. McClure, S., InfoWorld Security Suite 16 debuts.
http://www.idg.net/crd_detection_16738.html, 1998.
102. Michael J. Pazzani , (2000), “Knowledge discovery from DATA?”, IEEE
Intelligent Systems.
103. MIT Lincoln Laboratory (1999), ‘DARPA Intrusion Detection Evaluation’,
http://www.ll.mit.edu/IST/ideval/data/data_index.html.
104. Mohammadreza Ektefa, Sara Memar, Fatimah Sidi, Lilly Suriani
Affendey, (2010), "Intrusion Detection Using Data Mining
Techniques", 978-1-4244-5651-2/10, IEEE, 2010.
105. Mohanabharathi R., T. Kalaikumaran and S. Karthi, "Feature Selection
for Wireless Intrusion Detection System Using Filter and Wrapper
Model", International Journal of Modern Engineering Research (IJMER),
Vol. 2, No. 4, pp. 1552-1556, 2012.
106. Nada Lavrac, Blaž Zupan (2005), "Data Mining in Medicine", in Data
Mining and Knowledge Discovery Handbook.
107. National Research Council (2008), "Protecting Individual Privacy in the
Struggle Against Terrorists".
108. Neri, F., Comparing local search with respect to genetic evolution to
detect intrusion in computer networks. In Proc. of the 2000 Congress on
Evolutionary Computation CEC00, La Jolla, CA, pp. 238-243. IEEE
Press, 16-19 July, 2000.
109. Nguyen H. T., L. M. King and G. Knight, "Real-time head-movement
system and embedded Linux implementation for the control of power
wheelchair", Proceedings of the 26th Annual International Conference of
the IEEE Engineering in Medicine and Biology Society, pp. 4892-4895,
2004.
110. Nivedita Naidu and R. V. Dharaskar, "An Effective Approach to
Network Intrusion Detection System using Genetic Algorithm",
International Journal of Computer Applications, Vol. 1, No. 3, pp. 26-32,
February 2010.
111. Noel, S., Wijesekera, D., and Youman, C., “Modern Intrusion Detection,
Data Mining, and Degrees of Attack Guilt”, Applications of Data Mining
in Computer Security, Kluwer Academic Publishers, pp. 2-25, 2002.
112. Novikov D. et al. (2006), 'Anomaly Detection Based Intrusion Detection',
Proceedings of the Third IEEE International Conference on Information
Technology: New Generations (ITNG'06), pp. 420-425.
113. Oh et al. (2009), 'Attack classification based on data mining technique
and its application for reliable medical sensor communication',
International Journal of Computer Science and Applications,
Technomathematics Research Foundation, Vol. 6, No. 3, pp. 20-32.
114. Pachet, François; Westermann, Gert; and Laigre, Damien (2001),
"Musical Data Mining for Electronic Music Distribution".
115. Peter Lichodzijewski, A. Nur Zincir-Heywood and Malcolm I. Heywood,
"Host-Based Intrusion Detection Using Self-Organizing Maps", Faculty
of Computer Science.
116. Phillip A. Porras and Alfonso Valdes. Live traffic analysis of TCP/IP
gateways. In Proceedings of the 1998 ISOC Symposium on Network
and Distributed System Security (NDSS'98), San Diego, CA, March
1998. Internet Society.
117. Ptacek, T. H. and T. N. Newsham, Insertion, evasion and denial of
service: Eluding network intrusion detection, Technical report, Secure
Networks, Inc., January, 1998.
118. Rasha G. Mohammed Helali, "Data Mining Based Network Intrusion
Detection System: A Survey", Novel Algorithms and Techniques in
Telecommunications and Networking, pp. 501-505, 2010.
119. Richard Heady, George Luger, Arthur Maccabe, and Mark Servilla. The
architecture of a network level intrusion detection system. Technical
report, University of New Mexico, 1990.
120. Rupali Datti and Bhupendra Verma, Feature Reduction for Intrusion
Detection Using Linear Discriminant Analysis, International Journal on
Computer Science and Engineering, Vol. 02, No. 04, pp. 1072-1078,
2010.
121. Sandra Liewis, Liangxiu Han and John A. Keane (2013),
"Understanding Low Back Pain using Fuzzy Association Rule Mining".
122. Santosh Kumar Sahu, Sauravranjan Sarangi and Sanjaya Kumar Jena,
"A Detail Analysis on Intrusion Detection Datasets", International
Advance Computing Conference, pp. 1348-1353, 2014.
123. Sarab M. Hameed, Sumaya Saad, and Mayyadah F. AlAni, "An
Extended Modified Fuzzy Possibilistic C-Means Clustering Algorithm for
Intrusion Detection", Lecture Notes on Software Engineering, Vol. 1, No.
3, 2013.
124. Savaresi, S. and Boley, D., 2001. On performance of bisecting k-means
and PDDP. In Proceedings of the 1st SIAM ICDM, Chicago, IL.
125. Sekar, R., Gupta, A., Frullo, J., Shanbhag, T., Tiwari, A., Yang, H. &
Zhou, S. (2002). Specification-based Anomaly Detection: A New
Approach for Detecting Network Intrusions. In Proceedings of the ACM
Conference on Computer and Communications Security (CCS).
126. Shailendra Singh and Sanjay Silakari, "Generalized Discriminant
Analysis algorithm for feature reduction in Cyber Attack Detection
System", (IJCSIS) International Journal of Computer Science and
Information Security, Vol. 6, No. 1, 2009.
127. Shailendra Singh, Sanjay Silakari and Ravindra Patel, "An efficient
feature reduction technique for intrusion detection system", International
Conference on Machine Learning and Computing, Vol. 3, 2011.
128. Shekhar R. Gaddam, Vir V. Phoha, Kiran S. Balagani, “K-Means+ID3: A
Novel Method for Supervised Anomaly Detection by Cascading K-
Means Clustering and ID3 Decision Tree Learning Methods”, IEEE
Transactions on Knowledge and Data Engineering, Vol. 19, No. 3, pp.
345-354, 2007.
129. Shingo Mabu, Nannan Lu, Kaoru Shimada, Kotaro Hirasawa, "An
Intrusion-Detection Model Based on Fuzzy Class-Association-Rule
Mining Using Genetic Network Programming", IEEE Transactions on
Systems, Man, and Cybernetics, Part C: Applications and Reviews,
Vol. 41, No. 1, pp. 130-139, 2011.
130. Shon T, Seo J, and Moon J, “SVM Approach with A Genetic Algorithm
for Network Intrusion Detection”, Lecture Notes in Computer Science,
Springer Berlin / Heidelberg, Vol. 3733, pp. 224-233, 2005, ISBN 978-3-
540-29414-6.
131. Singh, S. and S. Kandula, Argus - a distributed network-intrusion
detection system. Undergraduate Thesis, Indian Institute of Technology,
May, 2001.
132. Snehal A. Mulay, P. R. Devale and G. V. Garje, Intrusion Detection
System using Support Vector Machine and Decision Tree, International
Journal of Computer Applications (ISSN 0975-8887), Vol. 3, No. 3, 2010.
133. Son T. Nguyen, Hung T. Nguyen and Philip B. Taylor, "Bayesian
Neural Network Classification of Head Movement Direction using
Various Advanced Optimisation Training Algorithms", International
Conference on Biomedical Robotics and Biomechatronics, pp. 1014-
1019, 2006.
134. Srilatha Chebrolu, Ajith Abraham and Johnson P. Thomas, "Hybrid
Feature Selection for Modelling Intrusion Detection Systems", Lecture
Notes in Computer Science, Vol. 3316, pp. 1020-1025, 2004.
135. Steinbach, M., Karypis, G., and Kumar, V., 2000. A comparison of
document clustering techniques. 6th ACM SIGKDD, World Text Mining
Conference, Boston, MA.
136. Sumathi M. and Umarani R., "Advanced Network Intrusion Detection
System Based on Effective Feature Selection", International Journal of
Computer Science and Information Technologies, Vol. 4, No. 1,
pp. 107-112, 2013.
137. Satyanarayan Misra, Sanjay Singh and Pradeep Kumar Tiwari,
"Classification of Dataset Using Clustering Technique", International
Journal of Computer Science and Telecommunications, Vol. 3, Issue 4,
April 2012.
138. Taylor P B and H. T. Nguyen, "Performance of a head-movement
interface for wheelchair control," Proceedings of the 25th Annual
International Conference of the IEEE Engineering in Medicine and
Biology Society, vol. 2, pp. 1590 - 1593, 2003.
139. Thomas G. Dietterich and Ghulum Bakiri, "Solving Multiclass Learning
Problems via Error-Correcting Output Codes", Journal of Artificial
Intelligence Research, Vol. 2, pp. 263-286, 1995.
140. Thomas G. Dietterich and Ghulum Bakiri, "Solving Multiclass Learning
Problems via Error-Correcting Output Codes", Journal of Artificial
Intelligence Research, Vol. 2, pp. 263-286, 1995.
141. Tsai C. F., Y. F. Hsu, C. Y. Lin and W. Y. Lin, (2009), "Intrusion
detection by machine learning: A review", Expert Systems with
Applications, Vol. 36, Issue 10, pp. 11994-12000.
142. U. Aickelin, J. Twycross and T. Hesketh-Roberts, "Rule Generalization
in Intrusion Detection Systems Using SNORT", International Journal of
Electronic Security and Digital Forensics, Vol. 1, No. 1, pp. 101-116,
2007.
143. U. Fayyad, D. Haussler, and P. Stolorz (1996), "From Data Mining to
Knowledge Discovery in Databases", 0738-4602-1996.
144. Warrender, C., S. Forrest, and B. A. Pearlmutter, Detecting intrusions
using system calls: Alternative data models. In Proc. of the 1999 IEEE
Symp. on Security and Privacy, Oakland, CA, pp. 133–145. IEEE
Computer Society Press, 1999.
145. Wenke Lee and Salvatore J. Stolfo, “Data Mining Approaches for
Intrusion Detection”, Proceedings of the 7th USENIX Security
Symposium, San Antonio, Texas, January 26-29, 1998.
146. Whitman M. E. & Mattord H. J., (2007), "Principles of Information
Security" (2nd ed.), New Delhi: Thomson Learning/Course Technology.
147. Witten I. H., Frank E., (2005), "Data Mining: Practical Machine Learning
Tools and Techniques", Second edition, Morgan Kaufmann.
148. Wu Junqi and Hu Zhengbing, (2008), "Study of Intrusion Detection
Systems (IDSs) in Network Security", 978-1-4244-2108-4/08,
IEEE, 2008.
149. Yan et al. (2009), 'A Hybrid Intrusion Detection System of Cluster-based
Wireless Sensor Networks', Proceedings of the International
MultiConference of Engineers and Computer Scientists, Vol. 1, March
18-20, 2009, Hong Kong.
150. Yao, J. T., S. L. Zhao, and L. V. Saxton, "A Study on Fuzzy Intrusion
Detection", In Proceedings of the Data Mining, Intrusion Detection,
Information Assurance, and Data Networks Security, SPIE, Vol. 5812,
pp. 23-30, 28 March - 1 April, Orlando, Florida, USA, 2005.
151. Yeophantong, T., Pakdeepinit, P., Moemeng, P. & Daengdej, J.
(2005). Network Traffic Classification Using Dynamic State Classifier.
In Proceedings of the IEEE Conference.
152. Yeung, D.Y., and C. Chow, Parzen-window network intrusion detectors.
In Proc. of the Sixteenth International Conference on Pattern
Recognition, Volume 4, Quebec City, Canada, pp. 385–388. IEEE
Computer Society, 11-22 August, 2002.
153. Yi Mao, Yixin Chen, Gregory Hackmann, Minmin Chen, Chenyang Lu,
Marin Kollef, Thomas C. Bailey (2011), "Medical Data Mining for Early
Deterioration Warning in General Hospital Wards".
154. Yu Y, and Huang Hao, “An Ensemble Approach to Intrusion Detection
Based on Improved Multi-Objective Genetic Algorithm”, Journal of
Software, Vol.18, No.6, pp.1369-1378, June 2007.
155. Zhiyuan Tan, Aruna Jamdagni, Xiangjian He and Priyadarsi Nanda,
"Network Intrusion Detection Based on LDA for Payload Feature
Selection", IEEE GLOBECOM Workshop on Web and Pervasive
Security, pp. 1545-1549, 2010.
LIST OF PUBLICATIONS
International Journals
• Karthik G, Nagappan A, "Intrusion Detection System Using
Kernel FCM Clustering and Bayesian Neural Network",
International Journal of Computer Science and Information
Technology & Security (IJCSITS), Vol. 3, Issue 6, Pages 391-399,
December 2013.
• Karthik G, Geetha T, Nagappan A, "Development of Hybrid
Intrusion Detection System and Its Application to Medical
Sensor Network", International Journal of Innovative Research in
Computer and Communication Engineering (IJIRCCE), Vol. 3,
Issue 9, Pages 8182-8198, September 2015.