DETECTING NETWORK INTRUSION BASED ON DATA MINING TECHNIQUES AND ITS APPLICATION
FOR MEDICAL SENSOR NETWORK
Thesis submitted in
Partial Fulfillment for the award of
Degree of Doctor of Philosophy in
Computer Science and Engineering
By
G. KARTHIK
FACULTY OF ENGINEERING AND TECHNOLOGY
VINAYAKA MISSIONS UNIVERSITY
(VINAYAKA MISSIONS RESEARCH FOUNDATION – DEEMED TO BE UNIVERSITY) SALEM, TAMILNADU, INDIA
NOVEMBER 2016
VINAYAKA MISSIONS UNIVERSITY
SALEM
DECLARATION
I, G. Karthik, declare that the thesis entitled DETECTING NETWORK
INTRUSION BASED ON DATA MINING TECHNIQUES AND ITS
APPLICATION FOR MEDICAL SENSOR NETWORK submitted by me
for the Degree of Doctor of Philosophy is the record of work carried out
by me during the period from 2009 to 2016 under the guidance of
Dr. A. Nagappan, and has not formed the basis for the award of any
degree, diploma, associate-ship, fellowship, titles in this or any other
University or other similar institutions of higher learning.
Place: Salem
Date: Signature of the Candidate
VINAYAKA MISSIONS UNIVERSITY
SALEM
CERTIFICATE BY THE GUIDE
I, Dr. A. Nagappan, certify that the thesis entitled DETECTING
NETWORK INTRUSION BASED ON DATA MINING TECHNIQUES
AND ITS APPLICATION FOR MEDICAL SENSOR NETWORK
submitted for the Degree of Doctor of Philosophy by Mr. G. Karthik, is
the record of research work carried out by him during the period from
2009 to 2016 under my guidance and supervision and that this work
has not formed the basis for the award of any degree, diploma,
associate-ship, fellowship or other titles in this University or any other
University or Institution of higher learning.
Place: Salem
Date: Signature of the Guide
ACKNOWLEDGEMENT
Let me thank God almighty who has been showering His blessings on
me all these days.
I express my gratitude to our Honorable Founder Chancellor, Vinayaka
Missions University, Dr. A. Shanamugasundaram, Madam Founder
Chancellor Mrs. Annapoorani Shanamugasundaram, Chancellor
Dr. A.S. Ganesan and Pro-Chancellor Dato Sri’ Dr. S. Sharavanan,
for permitting me to do this research at VMKV Engineering College.
First, I would like to thank my supervisor, Dr. A. Nagappan,
Principal, V.M.K.V. Engineering College, for his unstinted support and
guidance. I learned a lot from our discussions, and his positive attitude
and guidance motivated me to work hard. One of his suggestions that
I will always remember is "learn from comments and improve your
work." This simple suggestion is applicable not only to research but
also to other aspects of my life.
My special thanks to our Vice-Presidents Mr. J.S. Sathishkumar and
Mr. N.V. Chandrasekar; Mr. N. Ramsamy, Director; Mr. K. Jaganathan,
Director; Prof. Dr. V.R.R. Rajendran, Vice Chancellor; Dr. Y. Abraham,
Registrar; and Dr. K. Rajendran, Dean (Research), of Vinayaka
Missions University, Salem, and to my colleagues and friends who
have helped me in one way or another in doing this research. Last but
not least, I thank my parents, wife and relations who supported me
day in and day out during the course of my research.
(G. KARTHIK)
ABSTRACT
Intrusion detection is the task of monitoring and, where possible,
preventing attempts to intrude into or otherwise compromise system
and network resources. Intrusion Detection Systems (IDS) are one of
the principal means of identifying abnormal activities staged in a
computer system, and they form a major part of a system's defence
against attacks. The main objectives of this thesis are to study and
analyse different variants of intrusion detection techniques aimed at
improving performance, and to design and develop an efficient
approach to intrusion detection using clustering and hybrid
techniques. The proposed approach is applied to the KDD Cup 99
dataset and evaluated for accuracy. Several existing clustering
techniques, namely K-Means clustering, Fuzzy K-Means clustering,
Fuzzy C-Means and Kernel Fuzzy C-Means (KFCM), are discussed
and implemented. The KDD Cup 99 dataset is used for testing and
evaluation of the proposed technique. The analysis shows that the
proposed Fuzzy Bisector-Kernel Fuzzy C-Means clustering
(FB-KFCM) performs better than the other methods, attaining an
average accuracy of 93.91%. A hybrid intrusion detection system is
then developed by combining Linear Discriminant Analysis (LDA), a
commonly used dimensionality-reduction technique, with Cuckoo
Search (CS). In this system FB-KFCM serves as the clustering
technique, and a Bayesian Neural Network is used for better
classification. The existing techniques, KFCM + Bayesian network and
FB-KFCM + Bayesian network, are compared with the proposed
hybrid technique, LDA+CS + FB-KFCM + Bayesian network, again
using the KDD Cup 99 dataset, and the results are discussed. In the
comparative analysis the proposed hybrid technique attains a high
accuracy of 98.31%, demonstrating its efficiency. Finally, the proposed
algorithm is simulated on a medical sensor network consisting of 8668
records in total. Simulation results are obtained for 10 test records:
8 of the 10 are found not to be intruded and the remaining 2 intruded,
confirming the high accuracy and efficiency of the technique
introduced here.
TABLE OF CONTENTS
Chapter No. Title Page No.
ABSTRACT iii
LIST OF TABLES x
LIST OF FIGURES xi
LIST OF SYMBOLS AND ABBREVIATIONS xiii
1 INTRODUCTION 1
1.1 Motivation 1
1.2 Intrusion Detection System 2
1.2.1 Attack motivation and objectives 7
1.2.2 Types of Intrusion Attack 7
1.2.2.1 DOS Attack 8
1.2.2.2 Probe Attack 9
1.2.2.3 U2R 9
1.2.2.4 R2L 10
1.2.3 Details of some Common Attacks 11
1.3 Why we need IDS? 16
1.3.1 Efficiency of Intrusion Detection Systems 17
1.4 Data mining 19
1.4.1 Data mining Life Cycle 20
1.4.1.1 Define the problem 20
1.4.1.2 Data collection and selection 21
1.4.1.3 Data Preprocessing 22
1.5 Types of Databases 22
1.6 Data Mining Applications 25
1.7 Data Mining in Medical Data 28
1.7.1 Problems in Medical Data 29
1.8 Application to Medical Sensor Network 30
1.9 Objectives of the Thesis 32
1.10 Scope of the Thesis 32
1.11 Organization of the Thesis 33
1.12 Summary 36
2 LITERATURE REVIEW 37
2.1 Intrusion Detection System (IDS) 37
2.1.1 Confidentiality 38
2.1.2 Integrity 38
2.1.3 Availability 39
2.2 Classification of intrusion detection systems 40
2.2.1 Intrusion Detection Approach 41
2.2.1.1 Anomaly-Based Detection 42
2.2.1.2 Signature-Based Detection 43
2.2.2 Types of Protected Systems 43
2.2.2.1 Host Based Intrusion Detection 43
2.2.2.2 Network Based Intrusion Detection 48
2.2.2.3 Hybrid Based Intrusion Detection 61
2.3 Structure of IDS 62
2.3.1 Data Source 62
2.3.2 Behavior of an attacker 63
2.3.3 Analysis Timing 64
2.3.3.1 Audit Trail Processing 64
2.3.3.2 On-Fly Processing 66
2.4 IDS Data Processing Techniques 67
2.4.1 Expert systems 67
2.4.2 Signature analysis 67
2.4.3 Colored Petri Nets 68
2.4.4 State-Transition Analysis 68
2.4.5 Statistical Analysis Approach 69
2.4.6 Neural Networks 69
2.4.7 User Intention Identification 70
2.4.8 Computer Immunology 71
2.5 Data mining Theoretical background 71
2.5.1. Data mining and Knowledge discovery 75
2.5.2. History of data mining 78
2.5.3. Data mining functionality 81
2.6 Evaluation of Datasets 88
2.7 Feature Selection 94
2.8 Summary 101
3 METHODOLOGY & DATABASE 102
3.1 The DARPA Intrusion-Detection Evaluation Program 102
3.2 Attack Types in the 1999 DARPA Data Set 104
3.2.1 Different Attack Types 105
3.2.2 Attack Descriptions 107
3.3 Data-Set Description 110
3.3.1 Set of Features used in the Connection Records 111
3.4 Feature Extractions and Preprocessing 118
3.4.1 Normalization 119
3.5 Performance Evaluation Metrics 120
3.6 Summary 122
4 CLUSTERING BASED INTRUSION DETECTION 123
4.1 Introduction 123
4.2 Need for Clustering of data 123
4.3 Clustering Algorithms 124
4.3.1 K Means Clustering 125
4.3.2 Fuzzy K Means Clustering 127
4.3.3 Fuzzy C-Means 130
4.3.4 KFCM 131
4.3.5 Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM) 133
4.4 Classification Module 138
4.4.1 Neural Network 138
4.4.2 Bayesian Neural Network 140
4.5 Results and Discussions 142
4.6 Summary 150
5 HYBRID INTRUSION DETECTION SYSTEM 151
5.1 Introduction 151
5.2 Need for Hybrid Approach 152
5.3 Application of Hybrid Approach 154
5.4 Locality Preserving Cuckoo search Algorithm 155
5.4.1 Training Phase 157
5.4.1.1 Initialization 158
5.4.1.2 Fitness Calculation and Nest update 159
5.5 Clustering using FB-KFCM 164
5.6 Classification using Bayesian Neural Network 167
5.7 Summary 171
6 RESULTS AND IMPLEMENTATION 173
6.1 Comparative Analysis 180
6.2 Implementation in Medical Sensor Network 181
6.3 Summary 183
7 CONCLUSION 185
7.1 Contributions 186
7.2 Future Works 188
REFERENCES 189
LIST OF PUBLICATIONS 206
LIST OF TABLES
Table No. Title Page No.
Table 3.1 Class Labels that Appear in the Full KDDCUP99 Dataset 108
Table 3.2 Class Labels that Appear in the 10% KDDCUP99 Dataset 109
Table 3.3 KDDCUP99 Basic Features of Individual TCP Connections 110
Table 3.4 Content Features within a Connection Suggested by Domain Knowledge 110
Table 3.5 Traffic Features Computed Using a Two-second Time Window 111
Table 3.6 Traffic Features Computed Using a Hundred-second Connection Window 112
Table 4.1 Accuracy table for Case 8:2 137
Table 4.2 Accuracy table for Case 7:3 138
Table 4.3 Accuracy table for Case 9:1 138
Table 4.4 Average Accuracy Table 140
Table 4.5 Comparative Analysis 143
Table 6.1 Attack Distribution in KDD Full, KDD 10% and KDD Corrected Datasets 166
Table 6.2 Accuracy for 8:2 167
Table 6.3 Accuracy for 7:3 167
Table 6.4 Accuracy for 9:1 168
Table 6.5 Average Accuracy Table 170
LIST OF FIGURES
Figure No. Title Page No.
Figure 1.1 Simple Intrusion Detection System 3
Figure 1.2 Types of intrusion attack 8
Figure 1.3 Data mining life cycle 21
Figure 1.4 Medical Sensor Network architecture 31
Figure 2.1 Intrusion Detection System Classification and Processing 38
Figure 2.2 Behavior of the user in the system 59
Figure 2.3 KDD process model 72
Figure 2.4 Data Mining and Associated Fields 73
Figure 2.5 Data mining functionalities 77
Figure 2.6 Classification using decision tree 80
Figure 2.7 Clustering 81
Figure 2.8 Outlier Analysis 82
Figure 4.1 Input mono dimensional data 122
Figure 4.2 Clustered using k means 123
Figure 4.3 Clustered Using Fuzzy K Means 123
Figure 4.4 Illustration of FB-KFCM clustering technique 128
Figure 4.5 Block diagram of the Neural Network 133
Figure 4.6 Accuracy Plot for Case 8:2 139
Figure 4.7 Accuracy Plot for Case 7:3 139
Figure 4.8 Accuracy Plot for Case 9:1 140
Figure 4.9 Average Accuracy Plot 141
Figure 4.10 Accuracy plot for Comparative Analysis 143
Figure 5.1 Proposed Intrusion Detection System 147
Figure 5.2 Fixed Nest 149
Figure 5.3 Nest formation from original dataset 150
Figure 5.4 LDA-CS Flow Diagram 154
Figure 5.5 FB-KFCM 158
Figure 5.6 Bayesian Neural Network Classifier(BNNC) 160
Figure 6.1 Accuracy Plot for Case 8:2 168
Figure 6.2 Accuracy Plot for Case 7:3 169
Figure 6.3 Accuracy Plot for Case 9:1 169
Figure 6.4 Average Accuracy Plot 170
Figure 6.5 Simulation Result Obtained for time T1, T2, T3,T4 173
LIST OF SYMBOLS AND ABBREVIATIONS
ADC Approximate Distance Clustering
AFRL Air Force’s Research Laboratory
ARIS Attack Registry and Intelligence Service
BN Bayesian Network
BNNC Bayesian Neural Network Classifier
C.I.A Confidentiality Integrity and Availability
CID Consensus Intrusion Database
CS Cuckoo Search
DAG Directed Acyclic Graph
DARPA Defense Advanced Research Projects Agency
DB Distance Based
DCost Damage Cost
DIDS Distributed Intrusion Detection System
DL Description Length
DLCF Dynamic Learning Classifier Framework
DM Data Mining
DOS Denial of Service
DR Detection Rate
e-kNN Extension to k-Nearest Neighbour
FAR Failure Analysis Rate
FB-KFCM Fuzzy Bisector-Kernel Fuzzy C-means clustering
FNR False Negative Rate
FP False Positive
GrIDS Graph based IDS
HIDS Host based Intrusion Detection System
HMM Hidden Markov Model
HYBRID IDS Hybrid Intrusion Detection System
ID Intrusion Detection
ID3 Induction Decision version 3
IDES Intrusion Detection Expert System
IDIOT Intrusion Detection In Our Time
IDS Intrusion Detection System
IDT Induction Decision Tree
IES Information Exploration Shootout
ISC Internet Storm Centre
ISOA Information Security Officer’s Assistant
ISS Internet Security Systems
KDD Knowledge Discovery in Databases
KDDCUP'99 Knowledge Discovery in Databases Dataset 1999
KFCM Kernel Fuzzy C-means clustering
kNN k-Nearest Neighbour
kRD k-Relative Distance
LDA Linear Discriminant Analysis
LVQ Learning vector quantization
MFDO Multistage Framework to Detect Outliers
ML Machine Learning
MSE Mean Square Error
NFR Network Flight Recorder
NID Network Intrusion Detection
NIDES Network Intrusion Detection Expert System
NIDS Network based Intrusion Detection System
NRBC New Rule Based Classification
PC Probabilistic Cardinality
R2L Remote to Local
RCost Response Cost
ROC Receiver Operating Characteristic
RS Rule set
SCACC Storm Centre Analysis and Coordination Centre
SRSWR Simple Random Sample With Replacement
TCP Transmission Control Protocol
TN True Negative
TP True Positive
TPR True Positive Rate
U2R User to Root
XML Extensible Markup Language
CHAPTER 1
INTRODUCTION
1.1 Motivation
Due to the popularization of the Internet and local networks,
intrusions into computer systems are growing in number [150]. Because
of increased network connectivity, computer systems are becoming
increasingly vulnerable to attack. The general goal of such attacks is to
subvert the traditional security mechanisms on the systems and
execute operations in excess of the intruder's authorization. These
operations could include reading protected or private data or simply
doing malicious damage to the system or user files [110]. By building
complex tools, which continually monitor and report activities, a system
security operator can catch potentially malicious activities as they
occur. Intrusion detection systems are becoming increasingly important
in maintaining proper network security [5, 29 and 150].
A good intrusion detection system should be able to distinguish
between normal and abnormal user activities. This includes any
event, state, content, or behaviour that is considered abnormal by
a pre-defined standard [52]. It is very important for IDSs to generate
rules that distinguish normal behaviour from abnormal behaviour by
observing the dataset, that is, the record of activities generated by the
operating system and logged to a file in chronological order [46].
Intrusion detection has received a lot of interest among
researchers due to the rapid development and popularization of the
Internet and local networks. A good intrusion detection system should
be able to differentiate between normal and abnormal user activities,
and it is very important to generate rules that distinguish normal
behaviour from abnormal behaviour. Though a lot of techniques and
tools are available, more research is needed to develop good systems
for intrusion detection.
1.2 Intrusion Detection System
An intrusion detection system acquires information about an
information system in order to perform a diagnosis of the security
status of the latter. The goal is to discover breaches of security,
attempted breaches, or open vulnerabilities that could lead to potential
breaches. A typical intrusion detection system is shown in Figure 1.1.
An intrusion-detection system can be described at a very
macroscopic level as a detector that processes information coming
from the system to be protected. This detector can also launch probes
to trigger the audit process, such as requesting version numbers for
applications. It uses three kinds of information: long-term information
related to the technique used to detect intrusions (a knowledge base of
attacks, for example), configuration information about the current state
of the system, and audit information describing the events that are
happening on the system.
Figure 1.1 Simple Intrusion Detection System
The role of the detector is to eliminate unneeded information
from the audit trail. It then presents either a synthetic view of the
security-related actions taken during normal usage of the system, or a
synthetic view of the current security state of the system. A decision is
then taken to evaluate the probability that these actions or this state
can be considered symptoms of an intrusion or of vulnerabilities. A
countermeasure component can then take corrective action, either
preventing the actions from being executed or changing the state of
the system back to a secure state.
Intrusion Detection Systems (IDSs) are usually deployed along
with other preventive security mechanisms, such as access control and
authentication, as a second line of defense that protects information
systems. There are several reasons that make intrusion detection a
necessary part of the entire defense system. First, many traditional
systems and applications were developed without security in mind. In
other cases, systems and applications were developed to work in a
different environment and may become vulnerable when deployed in
the current one. Intrusion detection complements these protective
mechanisms to improve system security. Moreover, even if the
preventive security mechanisms can protect information systems
successfully, it is still desirable to know what intrusions have happened
or are happening, so that we can understand the security threats and
risks and thus be better prepared for future attacks.
An attack can be launched as a fast attack or a slow attack.
A fast attack can be defined as an attack that uses a large number of
packets or connections within a few seconds [43]. A slow attack, in
contrast, can be defined as an attack that takes a few minutes or even
a few hours to complete [43]. Both kinds of attack have a great impact
on the network environment because of the security breaches they
cause. Currently, IDS is used as one of the defensive tools for
strengthening network security, especially for detecting the first two
phases of an attack, whether slow or fast. An intrusion detection
system can follow one of two approaches: behaviour based (anomaly)
or knowledge based (misuse) [26], [19]. The behaviour based
approach is also known as an anomaly based system, while the
knowledge based approach is known as a misuse based system
[151], [45]. A misuse or signature based IDS contains a number of
attack descriptions, or signatures, that are matched against a stream
of audit data in search of evidence of a modelled attack [19]. The audit
data can be gathered from network traffic or an application log. This
method can be used to detect previously known attacks, and the
attack profiles have to be manually revised when new attack types are
discovered. Hence, unknown attacks with new intrusion patterns and
characteristics might not be captured using this technique [125].
Meanwhile, the anomaly based system identifies intrusions by
modelling the traffic or application activity that is presumed to be
normal on the network or host.
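The signature matching described above can be sketched in a few lines: each signature is a set of conditions on audit-record fields, and a record that satisfies every condition of any known signature raises an alert. The record fields and the two signatures below are illustrative assumptions, not taken from any real IDS.

```python
# Minimal sketch of signature-based (misuse) detection: a signature is a set
# of field conditions that must all hold for an audit record to match.
def matches(record, signature):
    """Return True if every field condition in the signature holds."""
    return all(record.get(field) == value for field, value in signature.items())

def detect_misuse(records, signatures):
    """Return (record, signature_name) pairs for records matching a known signature."""
    alerts = []
    for record in records:
        for name, signature in signatures.items():
            if matches(record, signature):
                alerts.append((record, name))
    return alerts

# Hypothetical signatures for two attacks discussed later in this chapter.
SIGNATURES = {
    "land":    {"src_ip_equals_dst_ip": True, "flag": "SYN"},
    "neptune": {"flag": "S0", "service": "private"},
}

records = [
    {"src_ip_equals_dst_ip": True, "flag": "SYN"},  # land-like record
    {"flag": "SF", "service": "http"},              # normal connection
]
alerts = detect_misuse(records, SIGNATURES)
```

As the text notes, this style detects only attacks whose signatures are already in the knowledge base; new attack types require manually adding new signatures.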
The anomaly based system builds a model of the normal
behaviour of the system and then looks for anomalous activity, that is,
activity that does not conform to the established model. Anything that
does not correspond to the system profile is flagged as intrusive. False
alarms generated by both systems are a major concern; they have
been identified as a key issue and a cause of delay in the wider
deployment of reactive intrusion detection systems [78].
Therefore, it is important to reduce the false alarms generated
by both systems. Although false alarms are a major concern in
developing intrusion detection systems, especially anomaly based
ones, such systems meet organizations' objectives more fully than
signature based systems [50]. The false positives generated by an
anomaly based system, where expected behaviour is identified as
anomalous, are still tolerable, whereas false negatives are intolerable
because they allow attacks to go undetected. Scanning attacks, DOS
attacks and worm attacks, which use a large number of packets or
connections within a few seconds, are examples of fast attacks. The
Code Red and NIMDA worms are a breed of DOS attacks on the
Internet infrastructure that followed the Morris Worm; the Code Red
worm has a fast rate of propagation and infection, using network
scanning to detect and automatically exploit vulnerable hosts.
1.2.1 Attack motivation and objectives
An intrusion attack [3] is the realization of a threat: a harmful
action aimed at exploiting a vulnerability of the target system.
Computer attacks may involve unauthorized access, destroying data,
threatening the security of the computer or degrading its performance.
Computer and network attacks have evolved greatly over the last few
decades; they are increasing in number and also improving in strength
and sophistication.
Attack motivation can be understood by identifying what
attackers do. The main motivation of an attacker is to gain access to a
system or its data; the main motivation of a criminal is financial
benefit. Other motivating factors are social and political gain, and
mischievous human tendency also motivates attacks. The potential
threat of cyber terrorism is becoming inevitable because critical
infrastructures are potentially vulnerable [77] [84], and the growth of
networks makes attacks easy to launch.
1.2.2 Types of Intrusion Attack
Intrusion attacks [72] [99] can be categorized into four major types:
DOS, Probe, U2R and R2L. Figure 1.2 shows the types of attacks.
1.2.2.1 DOS Attack
In a denial of service (DOS) [77] attack, an attacker makes a
resource on a network unavailable to legitimate users. DOS attacks
keep system processes busy and occupied with unwanted,
unidentified work, targeting resources such as network bandwidth,
computer memory or computing power. There are many different types
of DOS attack; for example, an attack can deny access to a machine
on a network. DOS attacks [146] [148] are meant to force the target to
stop the service(s) it provides by flooding it with illegitimate requests.
Figure 1.2 Types of intrusion attack
1.2.2.2 Probe Attack
Probe attacks [84] are often the first step of all other attacks.
They are used to collect information about the targeted computer
network or about a specific machine on it. Network probes are
important to an attacker because only through them can the attacker
find the vulnerabilities present on the target machine or network; that
is why it is critical to detect this type of attack. Most administrators use
probes to check machines on a network, so it is difficult to tell a
legitimate user from an attacker, and hence to distinguish attacks from
regular actions. Probe attacks are meant to obtain information about
the target network from a source that is usually external to it. Probing
is an attack in which the hacker scans a machine or a networking
device in order to determine weaknesses or vulnerabilities that may
later be exploited to compromise the system.
1.2.2.3 U2R
U2R [84] attacks are difficult to catch because they involve
semantic details that are very difficult to capture at an early stage.
Initially the attacker starts on the system with a normal user account
and then tries to gain super-user privileges by abusing vulnerabilities.
In a User to Root attack, an attacker starts a session on a computer as
a normal user with restricted rights and, by exploiting some
vulnerability in the software installed on the system, raises his
privileges. The purpose of this class of attack is obviously to obtain
administrator rights on the attacked computer in order to have full
control over it. There are several different types of U2R attack; buffer
overflow is undoubtedly the major vulnerability used by hackers trying
to obtain privileged rights on a computer.
1.2.2.4 R2L
The most challenging attacks are R2L attacks [77]; they are very
difficult to detect because they involve both network-level and host-
level features. A remote to user attack is one in which a user sends
packets over the Internet to a machine to which the attacker does not
have access, in order to expose the machine's vulnerabilities and
exploit the privileges a local user would have on that computer. In a
Remote to Local attack, the attacker starts from a session on a
computer outside the targeted network and exploits a vulnerability in
order to gain access to a computer on the local network. A
precondition that must be fulfilled is the ability of the attacker to send
network packets to the victim host. Usually, but not always, Remote to
Local attacks are combined with U2R attacks, permitting the attacker
to gain full access to a remote machine that is part of a network other
than the attacker's own.
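These four categories correspond to the attack labels of the KDD Cup 99 dataset used later in this thesis; a small lookup table makes the grouping explicit. The label set below follows the standard KDD Cup 99 task description.

```python
# Mapping of KDD Cup 99 attack labels to the four categories described above.
ATTACK_CATEGORY = {
    # Denial of Service
    "back": "dos", "land": "dos", "neptune": "dos",
    "pod": "dos", "smurf": "dos", "teardrop": "dos",
    # Probe
    "ipsweep": "probe", "nmap": "probe", "portsweep": "probe", "satan": "probe",
    # User to Root
    "buffer_overflow": "u2r", "loadmodule": "u2r", "perl": "u2r", "rootkit": "u2r",
    # Remote to Local
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l", "multihop": "r2l",
    "phf": "r2l", "spy": "r2l", "warezclient": "r2l", "warezmaster": "r2l",
}

def categorize(label):
    """Map a KDD connection label to dos/probe/u2r/r2l; anything else is normal."""
    return ATTACK_CATEGORY.get(label, "normal")
```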
1.2.3 Details of some Common Attacks
• Back - This attack is initiated against an Apache web server,
which is flooded with requests containing a large number of
front-slash (/) characters in the URL. As the server tries to
process all these requests, it becomes unable to process other
genuine requests and hence denies service to its customers.
• Smurf Attack - A 'smurf' attack is a type of DOS attack in which
many ICMP echo-reply packets bombard the attacked machine.
The attacker sends many ICMP echo-request packets to the
broadcast addresses of many subnets, and every machine that
belongs to any of these subnets responds by sending ICMP
echo-reply packets to the victim, since the request packets carry
the victim's address as the source IP address. Smurf attacks are
very hazardous, because they are strongly distributed attacks.
• Teardrop - A packet is often broken into smaller fragments
while travelling from the source machine to the destination
machine. A Teardrop attack creates a stream of IP fragments
with overlapping offset fields. The destination host that tries to
reassemble these malformed fragments eventually crashes or
reboots.
• Land - Land, a very common DOS (Denial of Service) attack,
works by sending a spoofed packet with the SYN flag – used in
the 'handshake' between a client and a host – set, to any port
that is open and listening. If the packet is programmed to have
the same destination and source IP address, then when it is
sent to a machine via IP spoofing, the transmission can fool the
machine into thinking that it is sending itself a message, which,
depending on the operating system, will crash the machine.
• Neptune (SYN Flood) - Neptune (SYN Flood) is an attack to
which every TCP/IP implementation is vulnerable. Each
half-open TCP connection made to a machine causes the 'tcpd'
server to add a record to the data structure that stores
information describing all pending connections. This data
structure is of finite size, and it can be made to overflow by
intentionally creating too many partially-open connections. The
half-open connection data structure on the victim server will
eventually fill, and the system will be unable to accept any new
incoming connections until the table is emptied out.
• Ping of Death (POD) - The Ping of Death is a DOS attack in
which the attacker creates a packet larger than the IP protocol
limit of 65,535 bytes. Such a packet can cause different kinds of
damage, such as rebooting or crashing, to the machine that
receives it.
• Port sweep - A port sweep attack scans multiple hosts for one
listening port; for example, port 80 may be scanned for all the
addresses in a 24-bit address space. A port sweep searches for
a specific service; an SQL-based computer worm, for instance,
may port sweep looking for hosts listening on a particular TCP
port.
• NMAP - Nmap is a type of port scanner. Nmap has a large list
of parameters and performs the following:
Host discovery – identifying hosts on a network, for
example listing the hosts that respond to pings or have a
particular port open.
Port scanning – enumerating the open ports on target
hosts.
Version detection – interrogating network services on
remote devices to determine application name and
version number.
OS detection – determining the operating system and
hardware characteristics of network devices.
Scriptable interaction with the target – using the Nmap
Scripting Engine (NSE) and the Lua programming language.
Nmap can provide further information on targets, including
reverse DNS names, device types, and MAC addresses.
• SATAN - SATAN (Security Administrator Tool for Analyzing
Networks) remotely probes systems through the network and
stores its findings in a database. SATAN is a publicly available
tool that probes a network for security vulnerabilities and
misconfigurations. It was created for use by administrators but is
often used by attackers to search for vulnerabilities on a network,
and the information it provides can be useful to an attacker in
performing an attack. The Internet community uses a shareware
version of SATAN extensively. SATAN collects data from the
named hosts that it discovers while probing a primary host; a
primary target can be a host name, a host address, or a network
number. SATAN can generate reports of hosts by type, service,
vulnerability and trust relationship, and it also gives details of
vulnerabilities and ways to handle and remove them.
• phf Attack - This attack abuses a script named 'phf'. The
legitimate use of the phf script, which is installed by default in
the cgi-bin directory, is to update the people directory, but it has
often been used to attack web servers. The script's behavior
changes if it is called with the '0a' character in the URL. To
perform an attack, the attacker appends '0a' to the URL along
with some other UNIX command.
• Buffer overflows - There were four buffer overflow attacks,
against the eject, fdformat, ffbconfig, and ps programmes. The
attacks on the first three programmes exploited a buffer overflow
condition to execute a shell with root privileges. The specification
used to monitor setuid-to-root programmes could easily detect
these attacks by detecting oversized arguments and the
execution of a shell. The ps attack was significantly more
complex than the other three buffer overflow attacks. For one
thing, it used a buffer overflow in the static area rather than the
more common stack buffer overflow, making it difficult to detect.
Second, instead of a shell program it used a chmod system call
to effect damage; the chmod operation is itself unusual and is
not permitted by the generic specification (except on certain
files).
• Ftp-write attack - The ftp-write attack is an R2L (remote to
local) attack that takes advantage of a common anonymous ftp
misconfiguration. The ftp directory and its subdirectories should
not be owned by the ftp account or be in the same group as the
ftp account. If any of these directories are owned by ftp, or are in
the same group as the ftp account and are not write protected,
an intruder will be able to add files and eventually gain local
access to the system. This attack is easy to detect under a
site-specific policy that no file may be written in the ftp directory.
• Warez attacks - There are two types of warez attacks: warez master and warez client. The warez master attack logs into an anonymous FTP site and creates a file or a hidden directory. In the warez client attack, the files previously uploaded by the warez master are downloaded. This attack could be easily captured by the specifications that encoded the site-specific policy of disallowing any writes to the FTP directory.
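The phf attack described above lends itself to a simple signature check. The sketch below is illustrative only: the URL pattern and function name are my own, not taken from the thesis or from any real IDS rule set. It flags CGI requests that invoke the phf script with an encoded newline:

```python
import re

# Hypothetical signature for the phf attack: a request to the phf CGI
# script containing an encoded newline ('%0a'), which the attack uses
# to smuggle a UNIX command into the URL.
PHF_PATTERN = re.compile(r"/cgi-bin/phf.*%0a", re.IGNORECASE)

def looks_like_phf_attack(url: str) -> bool:
    """Return True if the URL matches the phf '%0a' attack signature."""
    return bool(PHF_PATTERN.search(url))

if __name__ == "__main__":
    benign = "/cgi-bin/phf?Qalias=guest"
    hostile = "/cgi-bin/phf?Qalias=x%0a/bin/cat%20/etc/passwd"
    print(looks_like_phf_attack(benign))   # False
    print(looks_like_phf_attack(hostile))  # True
```

This is exactly the kind of string-level signature that a misuse-detection IDS would match against observed web requests.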
1.3 Why do we need an IDS?
To answer this question, we need to understand why intruders
can get into the system.
There are various reasons of which the prominent ones are:
• Software bugs – they can be buffer overflows, unexpected
combinations, unhandled inputs, race conditions etc. Software
has bugs because programmers cannot track down and
eliminate all possible holes.
• Password Cracking – hackers have over time developed numerous ways to break into systems, by guessing weak passwords or by mounting dictionary and brute-force attacks.
• Design flaws – many early systems were never designed to withstand the wide-scale intrusion attempts seen today. These include TCP/IP protocol flaws, operating system flaws, etc.
• Sniffing unsecured traffic – much traffic on the Internet is not encrypted. Hackers can use programs such as packet sniffers and port scanners to extract sensitive information from packets on the network.
A firewall cannot always handle attacks directed to exploit these
flaws. Hence, we require IDS which can logically complement the
firewall.
1.3.1 Efficiency of Intrusion Detection Systems
To evaluate the efficiency of an intrusion-detection system,
Porras and Valdes [116] have proposed the following parameters:
• Accuracy - Accuracy deals with the proper detection of attacks
and the absence of false alarms. Inaccuracy occurs when an
intrusion detection system flags a legitimate action in the
environment as anomalous or intrusive.
• Performance - The performance of an intrusion-detection
system is the rate at which audit events are processed. If the
performance of the intrusion-detection system is poor, then real-
time detection is not possible.
• Completeness - Completeness is the property of an intrusion-
detection system to detect all attacks. Incompleteness occurs
when the intrusion-detection system fails to detect an attack.
This measure is much more difficult to evaluate than the others
because it is impossible to have a global knowledge about
attacks or abuses of privileges.
• Fault Tolerance - An intrusion-detection system should itself be
resistant to attacks, especially denial-of-service type attacks, and
should be designed with this goal in mind. This is particularly
important because most intrusion-detection systems run above
commercially available operating systems or hardware, which
are known to be vulnerable to attacks.
• Timeliness - An intrusion-detection system has to perform and
propagate its analysis as quickly as possible to enable the
security officer to react before much damage has been done,
and also to prevent the attacker from subverting the audit source
or the intrusion-detection system itself. This implies more than
the measure of performance because it not only encompasses
the intrinsic processing speed of the intrusion-detection system,
but also the time required to propagate the information and react
to it.
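Several of the parameters above can be quantified from a confusion matrix of detection outcomes. The sketch below is my own illustration (the function and field names are not from Porras and Valdes); it derives accuracy, detection rate, and false-alarm rate from the four outcome counts:

```python
def ids_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common IDS evaluation measures from a confusion matrix.

    tp: attacks correctly flagged, fp: normal events flagged as attacks,
    tn: normal events correctly passed, fn: attacks missed.
    """
    total = tp + fp + tn + fn
    return {
        # Accuracy: proper detection of attacks and absence of false alarms.
        "accuracy": (tp + tn) / total,
        # Detection rate relates to completeness: the share of attacks caught.
        "detection_rate": tp / (tp + fn),
        # False alarm rate: legitimate actions flagged as intrusive.
        "false_alarm_rate": fp / (fp + tn),
    }

if __name__ == "__main__":
    m = ids_metrics(tp=90, fp=5, tn=95, fn=10)
    print(m)  # accuracy 0.925, detection_rate 0.9, false_alarm_rate 0.05
```

A missed attack (fn) degrades completeness, while a false alarm (fp) degrades accuracy; the two rates trade off against each other when a detection threshold is tuned.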
1.4 Data mining
Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions [62]. Mining the hidden and useful information in an available dataset is not a single step; it is a sequence of interlinked steps that together yield information useful for decision making. Data mining searches databases to find hidden patterns and to predict information that can improve an organization’s business.
Data mining is the non-trivial extraction of implicit, previously unknown, interesting and potentially useful information from data. Nowadays, hospitals and health care institutions are well equipped with monitoring and other data collection devices, and the collected data are shared with other hospital information systems. Previously separate hospital databases and information systems are now integrated into large-scale information systems. The increase in data volume causes difficulties in extracting useful information for decision support. During medical diagnosis, data mining can extract useful information from large collections of patient data, which serves as a valuable resource for the decision-making process.
Classification, clustering, prediction, association, rule extraction and sequence detection are the various types of problems we can solve through data mining. The techniques used in data mining are drawn from different fields such as statistics, machine learning and pattern recognition. They include statistical methods, case-based reasoning, neural networks, decision trees, rule induction, Bayesian networks, fuzzy sets, rough sets and genetic algorithms.
1.4.1 Data mining Life Cycle
Solving a data mining problem involves the following steps [9]: defining the problem, collecting and selecting the data, pre-processing the data, selecting an appropriate data mining method, training and testing the selected model, and finally integrating and evaluating the generated model. The cycle is represented as a diagram in Fig. 1.3.
1.4.1.1 Define the problem
To have a successful data mining application, the organization has to come up with a precise formulation of the problem it is trying to solve. A focused problem statement usually yields the best payoff.
Figure 1.3. Data mining life cycle [9]
1.4.1.2 Data collection and selection
The organization has to use the right data for mining. The data collection and selection step identifies the related data sources and acquires them; from the collected data, the selection process chooses the subset of data to mine.
[Figure 1.3 depicts the cycle: 1. Define the problem; 2. Collect/select data; 3. Data pre-processing; 4. Model selection; 5. Training/testing the model; 6. Final evaluation/integration of the model; with iteration back through the steps.]
1.4.1.3 Data Pre-processing
• Data cleaning - Fills in missing data, corrects invalid data, identifies outliers and removes inconsistencies in the data source.
• Data integration - Combines data from different data sources into a single mining database.
• Data transformation - Converts the source data into a common format for processing.
• Data reduction - Discards unwanted parameters from the data, so that the data volume is reduced without sacrificing the quality of the information.
• Data discretization - A part of the data reduction process; it replaces numerical attributes with nominal attributes.
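The cleaning, transformation and discretization steps above can be sketched on a toy record set. The ‘duration’ field, the fill strategy and the discretization threshold are all hypothetical, chosen only for illustration:

```python
from statistics import mean

def preprocess(records):
    """Sketch of three pre-processing steps on a list of feature dicts."""
    # Data cleaning: fill missing values with the column mean.
    known = [r["duration"] for r in records if r["duration"] is not None]
    fill = mean(known)
    for r in records:
        if r["duration"] is None:
            r["duration"] = fill

    # Data transformation: min-max scale into a common [0, 1] range.
    lo = min(r["duration"] for r in records)
    hi = max(r["duration"] for r in records)
    for r in records:
        r["scaled"] = (r["duration"] - lo) / (hi - lo)

    # Data discretization: replace the numeric attribute with a nominal one.
    for r in records:
        r["duration_class"] = "short" if r["scaled"] < 0.5 else "long"
    return records

if __name__ == "__main__":
    data = [{"duration": 2.0}, {"duration": None}, {"duration": 10.0}]
    print(preprocess(data))
```

The missing value is replaced by the mean (6.0), all values are scaled to [0, 1], and the numeric attribute is then collapsed to the nominal labels "short"/"long".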
1.5 TYPES OF DATABASES
Data mining is not specific to any kind of data. Zaiane [44] claims
that data mining should be applicable to any kind of information
repository. But the challenges of mining posed by different kinds of
data vary significantly.
• Flat Files - Flat files containing text or binary data are the most
obvious candidates for data mining. Mining of text data is
referred to as text mining. It generally entails analyzing a large
volume of textual data to ascertain correlations or other patterns.
In the domain of software engineering, the mining of source code
is generally performed using text mining techniques. Software
requirement specifications and test case documents containing
textual information are attractive candidates for text mining.
• Relational Databases - Relational databases containing
information structured as tables where each row is termed as a
tuple and each column as an attribute provide excellent support
for several data mining algorithms. Data mining algorithms that
target relational databases are more versatile than those for flat
files [44]. Structured Query Language (SQL) is the standard
language for accessing relational databases, and data mining
algorithms can also leverage the capabilities of SQL for data
transformation and consolidation.
• Data Warehouses - Data warehouses are structured
repositories of data from multiple, heterogeneous sources. Data
warehouses facilitate analysis of data from different dimensions.
A data cube facilitates analysis of data along multiple dimensions
and each cell usually contains the value of some aggregate
measure. As Zaiane [44] states, because of their structure, the
pre-computed summarized data they contain and the hierarchical
attribute values of their dimensions, data cubes are well-suited
for fast interactive querying and analysis of data at different
conceptual levels, known as On-Line Analytical Processing
(OLAP). OLAP operations allow the navigation of data at
different levels of abstraction, such as drill-down, roll-up, slice,
dice etc.
• Transaction Databases - A transaction database contains
information pertaining to day-to-day transactions including a time
stamp, identifier and the associated items. Transaction
information is generally stored in flat files or in two normalized
relational tables - one containing the transactions and the other
containing the transaction items. A typical example for the
scenario is the market-basket analysis that attempts to track
transactions that occur together or in a sequence.
• Multimedia Databases - Mining of multimedia data such as
audio, video and graphics stored on a flat file or object-oriented
or object-relational databases is even more challenging due to
the high dimensionality of the involved data. This may entail
application of techniques from computer vision and computer
graphics.
• Spatial Databases - A spatial database stores a large amount of
space-related data, such as maps, pre-processed remote
sensing or medical imaging data. They carry topological and
distance information, usually organized by sophisticated,
multidimensional spatial indexing structures.
• Time-Series Databases - Time-series databases containing
time-related information like market share prices have a
continuous flow of data feeds that presents novel challenges,
and the mining of these databases entails evolution analysis and
trend prediction.
• World Wide Web - The World Wide Web (WWW) is a huge repository of information, and its mining is commonly classified into Web Content Mining, which encompasses the documents; Web Structure Mining, which focuses on the hyperlinks and relationships between documents; and Web Usage Mining, which focuses on the usage patterns of web pages. Web mining can greatly enhance the usability of the WWW.
1.6 Data Mining Applications
• Medical Data Mining - Over the past decade, nudged by new federal regulations, hospitals and medical offices around the country have been converting scribbled doctors’ notes to electronic records, although the chief goal has been to improve efficiency and cut costs [106].
• Spatial Data Mining - Spatial data mining is the application of
data mining methods to spatial data. The end objective of spatial
data mining is to find patterns in data with respect to geography.
So far, data mining and Geographic Information Systems (GIS)
have existed as two separate technologies, each with its own
methods, traditions, and approaches to visualization and data
analysis. Particularly, most contemporary GIS have only very
basic spatial analysis functionality. The immense explosion in
geographically referenced data occasioned by developments in
IT, digital mapping, remote sensing, and the global diffusion of
GIS emphasizes the importance of developing data driven
inductive approaches to geographical analysis and modelling
[59,57].
• Sensor Data Mining - Wireless sensor networks can be used to facilitate the collection of data for spatial data mining in a variety of applications, such as air pollution monitoring. A characteristic of such networks is that nearby sensor nodes monitoring an environmental feature typically register similar values. This kind of data redundancy, due to the spatial correlation between sensor observations, inspires techniques for in-network data aggregation and mining. By measuring the spatial correlation between data sampled by different sensors, a wide class of specialized, more efficient spatial data mining algorithms can be developed [83].
• Visual Data Mining - In the transition from analogue to digital, large data sets have been generated, collected, and stored; discovering the statistical patterns, trends and information hidden in these data makes it possible to build predictive models. Studies suggest visual data mining is faster and much more intuitive than traditional data mining [114].
• Music Data Mining - Data mining techniques, and in particular co-occurrence analysis, have been used to discover relevant similarities among music corpora (radio lists, CD databases), for purposes including classifying music into genres in a more objective manner [54].
• Pattern Mining - "Pattern mining" is a data mining method that involves finding existing patterns in data. In this context, patterns often mean association rules. The original motivation for searching for association rules came from the desire to analyze supermarket transaction data, that is, to examine customer behaviour in terms of the purchased products. For example, an association rule "beer ⇒ potato chips (80%)" states that four out of five customers that bought beer also bought potato chips. In the context of pattern mining as a tool to identify terrorist activity, the National Research Council provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity; these patterns might be regarded as small signals in a large ocean of noise" [107][65]. Pattern mining includes new areas such as Music Information Retrieval (MIR), where patterns seen in both the temporal and non-temporal domains are imported into classical knowledge discovery search methods.
• Subject-based Data Mining - "Subject based data mining" is a
data mining method involving the search for associations
between individuals in data. In the context of combating
terrorism, the National Research Council provides the following
definition: "Subject-based data mining uses an initiating
individual or other datum that is considered, based on other
information, to be of high interest, and the goal is to determine
what other persons or financial transactions or movements, etc.,
are related to that initiating datum" [4].
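The support/confidence computation behind association rules like "beer ⇒ potato chips (80%)" can be sketched in a few lines. The basket data and the single-item rule form below are illustrative simplifications; a real miner such as Apriori would also enforce a minimum support and handle multi-item antecedents:

```python
from itertools import combinations

def association_rules(transactions, min_conf=0.5):
    """Minimal single-item association-rule sketch.

    For each ordered pair (lhs, rhs):
      confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs).
    """
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            both = sum(1 for t in transactions if lhs in t and rhs in t)
            lhs_count = sum(1 for t in transactions if lhs in t)
            if lhs_count and both / lhs_count >= min_conf:
                rules.append((lhs, rhs, both / lhs_count, both / n))
    return rules

if __name__ == "__main__":
    baskets = [{"beer", "chips"}, {"beer", "chips"}, {"beer", "chips"},
               {"beer", "chips"}, {"beer"}, {"chips", "soda"}]
    for lhs, rhs, conf, supp in association_rules(baskets, min_conf=0.8):
        print(f"{lhs} -> {rhs} (conf={conf:.2f}, supp={supp:.2f})")
```

With this toy data, four of the five beer transactions also contain chips, so the rule beer → chips is reported with confidence 0.8, matching the "four out of five customers" reading of the example above.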
1.7 Data Mining in Medical Data
Modern medicine generates large amounts of information that are stored in medical databases. Extracting useful knowledge from these databases and providing scientific support for decision making in the diagnosis and treatment of disease is increasingly necessary, and data mining in medicine can address this problem. It can also improve the quality of hospital information management and promote the development of telemedicine and community medicine. Because medical information is characterized by redundancy, multi-attribution, incompleteness and a close relation to time, medical data mining differs from mining in other domains. This thesis discusses the key techniques of medical data mining, involving pre-treatment of medical data, fusion of different patterns and resources, fast and robust mining algorithms, and the reliability of mining results. Methods and applications of medical data mining based on computational intelligence, such as artificial neural networks, fuzzy systems, evolutionary algorithms, rough sets, and association rules, have been introduced [153][106].
1.7.1 Problems in Medical Data
The extensive amounts of knowledge and data stored in medical databases require specialized tools for data access, data analysis, knowledge discovery and the effective use of stored knowledge and data, because the increase in data volume makes it difficult to extract useful information for decision support.
Traditional manual data analysis has become insufficient. Important issues arising from this rapidly growing body of data and information include the provision of standards in terminology, vocabularies and formats to support multilinguality and the sharing of data, as well as:
• Standards for the abstraction and visualization of data.
• Integration of heterogeneous types of data, including images, signals, etc.
• Standards for interfaces between different resources of data.
• Reusability of data, knowledge and tools.
Many environments still lack such standards, which impedes the use and analysis of data on a wide, global scale, limiting applications to data sets collected for specific diagnostic, screening, prognostic, monitoring, therapy support or other patient management purposes [110].
1.8 Application to Medical Sensor Network (MSN)
In an MSN, a mobile patient can communicate with a hospital data center and/or a physician through wireless networks (e.g., cellular and sensor networks). During the communication, a large amount of data must be delivered through intermediate sensor nodes. Medical sensor data is critically important because it concerns patients’ health. Any attack should be detected quickly and correctly; otherwise, the MSN may collapse.
Figure 1.4 Medical Sensor Network architecture
Figure 1.4 shows the general architecture of an MSN. Among the processes, data relay by sensor nodes can be influenced by various attacks. Because of the basic characteristics of an MSN, it is not easy to supervise whether the nodes operate properly; in particular, in a sensor network, nodes can be added or removed at random. In this environment it is appropriate to apply the proposed attack classification through unsupervised-learning data mining mechanisms. When attacks are detected, we can replace the nodes under attack or take proper measures on them, and thereby make the data communication reliable. It is believed that because the MSN allows patients to carry on their daily activities while being monitored continuously anytime, anywhere, the proposed unsupervised learning mechanism for attack detection is well suited to it.
[Figure 1.4 shows four stages: data sending by body sensors, aggregation by a personal device, data relay by sensor nodes, and data processing and reaction by the health care center.]
1.9 Objectives of the Thesis
The objectives of the work are defined as below
1. To study and analyze different variants of intrusion detection
techniques meant for improving performance in Medical Sensor
network.
2. To design and develop an efficient approach for Intrusion Detection
using Clustering and Hybrid techniques.
3. To analyze the proposed approach on KDD cup-99 dataset and to
evaluate the result to attain high accuracy.
1.10 Scope of the thesis
The main intention of this research is to develop a network intrusion detection system by utilizing data mining and artificial intelligence techniques. Recently, intrusion detection systems have been designed to classify attacks by incorporating enhanced rules learnt from network behaviour [19], based on fuzzy class-association-rule mining and genetic network programming (GNP) [46]. In this research a hybrid method is proposed for intrusion detection using Linear Discriminant Analysis + Cuckoo Search + Fuzzy Bisector-Kernel Fuzzy C-means clustering and a Bayesian neural network.
1.11 Organization of the Thesis
This thesis comprises seven chapters. Chapter 1 introduces the
concept of IDS and motivation. Principle of Data Mining, classification
of data and field of applications are also discussed. The objectives and
scope of the thesis are also presented.
In Chapter 2, Literature reviews based on previous works are
discussed. Classification of intrusion detection systems, Types of
Protected Systems, IDS Data Processing Techniques, Data mining and
Knowledge discovery, Evaluation of Datasets and Feature Selection
are also discussed, together with the advantages and limitations of the previous works.
Chapter 3 describes the database used in this thesis; the proposed feature extraction and pre-processing techniques and the performance evaluation metrics of the Intrusion Detection System are also presented.
In Chapter 4, some existing clustering techniques such as K-Means clustering, Fuzzy K-Means clustering, Fuzzy C-Means and KFCM are discussed and implemented. The proposed Fuzzy Bisector-Kernel Fuzzy C-means clustering technique (FB-KFCM) and its performance are also discussed.
[Diagram: organization of the proposed work — introduction on MSN (problem identification; objective and scope), related work on MSN (methodology and database; dataset description), clustering-based intrusion detection, hybrid intrusion detection system, and results and implementation.]
Chapter 5 discusses the Hybrid Intrusion Detection System using LDA+CS (Linear Discriminant Analysis + Cuckoo Search), developed by combining LDA and CS. Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM) is used as the clustering technique, and in the proposed system a Bayesian Neural Network is used for better classification.
Chapter 6 compares the existing techniques, KFCM + Bayesian network and FB-KFCM + Bayesian network, with the proposed hybrid technique LDA+CS + FB-KFCM + Bayesian network, and discusses their results. The hybrid combinations are proposed to achieve higher accuracy and reliability.
Chapter 7 concludes the design methodology for IDS to achieve
higher accuracy and reliability. The result obtained and suggestions for
future development to achieve higher accuracy in IDS are also
discussed.
1.12 Summary
Intrusion detection has received much interest among researchers due to the rapid development and popularization of the
Internet and local networks. This chapter introduces the concept of IDS
and motivation. Efficiency of Intrusion Detection Systems, Principle of
Data Mining, Types of Databases and field of applications are also
discussed. The objectives and scope of the thesis are also presented.
CHAPTER 2
LITERATURE REVIEW
2.1 Intrusion Detection System (IDS)
An Intrusion Detection System (IDS) is software and/or
hardware, which is designed for identifying the undesirable efforts for
enhancing the computer security systems [35]. Especially, the wireless
sensor devices has given rise to a wider range of amazing applications
in various walks of our life that involve environment and habitat
monitoring, healthcare applications and many more. But,
simultaneously, the sensor nodes have produced the same number of
threats caused by attackers, whose intention is to achieve access to
the network and the data transferred inside it. Till now, numerous
classical security methodologies exist for the purpose of avoiding these
intrusions [64].
The IDS, a concept originally introduced by Anderson [68] and later formalized by Denning [37], has received increasing attention over the past 20 years. IDSs are systems that aim at detecting intrusions, i.e., sets of actions that attempt to compromise the integrity, confidentiality or availability of a computer resource [119]. In short, computer security deals with the protection of data and computing resources and is commonly associated with the following three properties (commonly referred to as the C.I.A. triad) [86]:
2.1.1 Confidentiality
It is prevention of any intentional or unintentional unauthorized
disclosure of data. For example, an intruder learning about the
customer credit card database or getting access to the proprietary
source code is considered a breach of confidentiality. Note that
typically such a breach is irreversible and cannot be confined easily.
The term confidentiality can also be understood in a broader context in
which it also pertains to the non-delivery of services to unauthorized
users, even though this would not compromise confidentiality in itself.
2.1.2 Integrity
It is prevention of intentional or unintentional unauthorized
modification of data. For example, an intruder defacing the company’s
web server or modifying the bank’s database content for personal gain
is an attack against data integrity. Note that typically integrity can be
restored, e.g., from other sources such as backup copies, although this
process may be costly, time-consuming, and not always complete.
2.1.3 Availability
It is prevention of the unauthorized withholding of computing
resources. Examples of availability include the denial-of-service (DOS)
attack, in which the attacker blocks the computing resources so that
authorized users cannot use them, or physical equipment theft.
Based on this definition of the C.I.A. triad, an intrusion can be defined as follows:
An intrusion is any set of actions that attempt to compromise the confidentiality, integrity or availability of a computer resource.
An intrusion detection system monitors computer systems and networks to determine whether a malicious event (i.e., an intrusion) has occurred. Each time a malicious event is detected, the IDS raises an alarm.
Typically, the requirements for confidentiality, integrity and
availability are not absolute, but are defined by a security policy.
The security policy states which information is confidential, who
is authorized to modify given information and what kind of use of
computer systems is acceptable. Therefore, we can reformulate the initial definition: an intrusion is a violation of a security policy.
One may categorize intrusion detection systems in terms of behaviour: they may be passive (those that simply generate alerts and log network packets), or they may be active, meaning that they detect and respond to attacks, attempt to patch software holes before they are exploited, or act proactively by logging out potential intruders or blocking services.
2.2 Classification of intrusion detection systems
Primarily, an IDS is concerned with the detection of hostile
actions. This network security tool uses either of two main techniques.
One category is for analyzing the network traffic and the other is to
analyze the operating system audit trails. These systems use either the
rule-based misuse detection or anomaly detection naturally [115] and
their power relies on the ability of the security personnel developing
them to a larger extent. The first category is capable of identifying the
known attack types alone. On the contrary, the second category is
subjected to the generation of false positive alarms. Therefore, several
machine learning techniques have been applied for designing IDS.
These machine learning techniques include neural networks, linear
genetic programming, Support vector machines, Bayesian Networks,
Multivariate adaptive regression splines and Fuzzy inference systems.
[19].Likewise, several data mining techniques has been developed as
40
well to detect the key features or parameters that help in defining
intrusions [140].
Figure 2.1 Intrusion Detection System Classifications and Processing
2.2.1 Intrusion Detection Approach
This network security tool uses either of two main techniques (described in more detail below). The first, anomaly detection, explores issues in intrusion detection associated with deviations from normal system or user behaviour. The second employs signature detection, matching observed activity against known attack patterns (signatures). Both methods have their distinct advantages and disadvantages, as well as suitable application areas within intrusion detection.
2.2.1.1 Anomaly-Based Detection
Anomaly-based detection is the process of comparing definitions
of what activity is considered normal against observed events to
identify significant deviations. An IDS using anomaly-based detection
has profiles that represent the normal behaviour of such things as
users, hosts, network connections, or applications. The profiles are
developed by monitoring the characteristics of typical activity over a
period of time. For example, a profile for a network might show that
Web activity comprises an average of 13% of network bandwidth at the
Internet border during typical working day hours.
The IDS uses statistical methods to compare the characteristics
of current activity to thresholds related to the profile, such as detecting
when Web activity comprises significantly more bandwidth than
expected and alerting an administrator of the anomaly. Profiles can be
developed for many behavioural attributes, such as the number of e-
mails sent by a user, the number of failed login attempts for a host, and
the level of processor usage for a host in a given period of time.
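The bandwidth-profile example above amounts to a simple statistical threshold test. A minimal sketch, assuming a profile of mean plus k standard deviations (the sample values and the choice of k = 3 are hypothetical, for illustration only):

```python
from statistics import mean, stdev

def build_profile(samples):
    """Profile normal behaviour as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def is_anomalous(value, profile, k=3.0):
    """Flag an observation more than k standard deviations from the mean."""
    mu, sigma = profile
    return abs(value - mu) > k * sigma

if __name__ == "__main__":
    # Hypothetical training data: fraction of bandwidth used by Web traffic
    # during typical working-day hours.
    normal_web_share = [0.12, 0.13, 0.14, 0.13, 0.12, 0.13]
    profile = build_profile(normal_web_share)
    print(is_anomalous(0.13, profile))  # typical day -> False
    print(is_anomalous(0.45, profile))  # sudden surge -> True
```

The same scheme extends to other behavioural attributes mentioned above, such as the number of e-mails sent by a user or the number of failed login attempts on a host.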
2.2.1.2 Signature-Based Detection
A signature is a pattern that corresponds to a known threat.
Signature-based detection is the process of comparing signatures
against observed events to identify possible incidents. Signature-based
detection is very effective at detecting known threats but largely
ineffective at detecting previously unknown threats, threats disguised
by the use of evasion techniques, and many variants of known threats.
Signature-based detection is the simplest detection method because it
just compares the current unit of activity, such as a packet or a log
entry, to a list of signatures using string comparison operations.
Signature-based detection technologies have little understanding of
many network or application protocols and cannot track and
understand the state of complex communications.
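Signature matching by plain string comparison, as described above, can be sketched in a few lines. The signature strings here are illustrative, not drawn from any real rule set:

```python
# Minimal signature-based detection sketch: compare each event (here, a
# log line) against a list of known-attack signatures using plain string
# comparison, exactly as the simplest signature-based IDSs do.
SIGNATURES = [
    "GET /cgi-bin/phf",
    "failed login for root",
]

def match_signatures(event: str):
    """Return the known-attack signatures found in the event, if any."""
    return [s for s in SIGNATURES if s in event]

if __name__ == "__main__":
    log = 'client 10.0.0.5 "GET /cgi-bin/phf?Qalias=x%0a/bin/id" 200'
    print(match_signatures(log))                 # ['GET /cgi-bin/phf']
    print(match_signatures("GET /index.html"))   # []
```

The sketch also makes the stated weakness concrete: any variant of the attack that does not contain one of the stored strings (for example, an URL-encoded version of the path) slips through unnoticed.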
2.2.2 Types of Protected Systems
There are many types of IDS technologies. They are divided into
the following three groups based on the type of events that they
monitor and the ways in which they are deployed:
2.2.2.1 Host Based Intrusion Detection
Host-Based System monitors the characteristics of a single host
and the events occurring within that host for suspicious activity.
Examples of the types of characteristics host-based IDS might monitor
are network traffic (only for that host), system logs, running processes,
application activity, file access and modification, and system and
application configuration changes. Host-based IDSs are most
commonly deployed on critical hosts such as publicly accessible
servers and servers containing sensitive information.
A host-based IDS places monitoring sensors, also known as agents, on network resource nodes to monitor the audit logs generated by the network operating system or application programs. Audit logs contain records of events and activities taking place at individual network resources. A host-based IDS can therefore detect attacks that cannot be seen by a network-based IDS, such as intrusion and misuse by a trusted insider. Host-based systems utilize a signature rule base derived from the site-specific security policy. A host-based IDS can overcome problems associated with network-based IDSs by immediately alerting the security personnel, who can locate the source as provided for by the site security policy. A host-based IDS can also verify whether an attack was unsuccessful, whether because of an immediate response to the alarm or for any other reason, something that is not possible at the packet level. A host-based IDS can also track user login and logoff actions and all activity that generates audit records.
A host-based intrusion detection system has only host-based sensors, whereas a network-based intrusion detection system has network-based sensors [2]. Host-based technology examines events such as what files were accessed and what applications were executed [56]. Network-based intrusion detection is the problem of detecting unauthorized use of computer systems over a network, such as the Internet [33].
A good intrusion detection system should be able to distinguish
between normal and abnormal user activities [8]. This would include
any event, state, content, or behaviour that is considered to be
abnormal by a pre-defined standard [47]. Data mining-based intrusion detection systems can be classified according to their detection strategy. There are two main strategies: misuse detection, which uses patterns of well-known attacks or weak spots of the system to identify intrusions [145], and anomaly detection, which tries to determine whether deviations from the established normal usage patterns can be flagged as intrusions [70,98]. One major challenge in intrusion detection is identifying camouflaged intrusions among a huge amount of normal communication activity [70].
To detect intrusive activity, many Machine Learning (ML)
algorithms, such as Neural Networks [21], Support Vector Machines
[32], Genetic Algorithms [154], Fuzzy Logic [96], and Data Mining
[88], have been widely applied to huge volumes of complex, dynamic
data to detect known and unknown intrusions. It is very important
for an IDS to generate rules that distinguish normal from abnormal
behaviour by observing the dataset, which is the record of activities
generated by the operating system and logged to a file in
chronologically sorted order [33].
Hence, an IDS should lower the quantity of data to be
processed, and this is even more vital for real-time detection. Data
filtering, data clustering and feature selection can achieve this
reduction. Clustering can reveal the hidden patterns in the data and
the essential features used for detection. Better classification is
possible with feature selection, which searches for the subset of
features that best classifies the training data [134]. Classical
cluster analysis assigns each datum to exactly one cluster, whereas
fuzzy cluster analysis relaxes this requirement by using gradual
memberships. This helps in dealing with data that simultaneously
belong to more than one cluster. Intrusion detection systems (IDS)
make extensive use of clustering methodologies,
and in particular, fuzzy approaches appear to be more efficient than
the other clustering algorithms in use. The Fuzzy C-Means (FCM)
clustering model was initially introduced by Dunn in 1974 and was
extended and generalized by Bezdek in 1983 [123]. Generally, the
techniques for dimensionality reduction concentrate either on
choosing a suitable subset from the original set of I attributes or
on mapping the initial I-dimensional data onto a K-dimensional space,
where K < I [136].
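The gradual memberships used by fuzzy cluster analysis can be illustrated with the standard Fuzzy C-Means update equations. The sketch below is a minimal illustration only, not the implementation of any cited work; the function name, fuzzifier m = 2 and iteration count are our own assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal Fuzzy C-Means sketch: each point receives a gradual
    membership in every cluster instead of a hard assignment."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per point
    for _ in range(iters):
        w = U ** m
        # fuzzy centers: membership-weighted means of the data
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # standard FCM membership update: u ~ 1 / d^(2/(m-1)), normalised
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centers, U
```

A point lying between two cluster centers ends up with comparable membership in both, which is exactly the behaviour hard k-means cannot express.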
Most recent feature extraction techniques involve linear
transformations of the original pattern vectors to new vectors of
lower dimensionality [120]. The best-known dimensionality reduction
technique is Principal Component Analysis (PCA). However, problems
arise with the selection of the number of directions, and PCA cannot
compute principal components in high-dimensional feature spaces that
are related to the input space by some nonlinear map [127]. The
Linear Discriminant Analysis (LDA) feature reduction technique is a
newer scheme employed in the field of cyber-attack detection. This
method reduces the number of input features while improving
classification accuracy. Moreover, by selecting the most
discriminating features, it decreases the training and testing time
of the classifiers [134].
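As an illustration of the discriminant-analysis idea, the following sketch computes the classic two-class Fisher discriminant direction in plain NumPy. The function name and the small regularisation term are our own assumptions, not taken from any of the cited works.

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class Fisher LDA sketch: find the projection direction
    w = S_W^{-1} (m1 - m0) that best separates the two classes."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter matrix
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # small ridge term keeps Sw invertible (illustrative choice)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), m1 - m0)
    return w / np.linalg.norm(w)
```

Projecting each feature vector onto w reduces it to a single score; a threshold on that score then separates the two classes, which is the dimensionality-reduction effect described above.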
Selecting the optimal set of features is the major problem
encountered by most researchers, because not all features are
relevant to the learning algorithm. In some situations, irrelevant
and redundant features produce noisy data that distract the learning
algorithm and degrade detector accuracy, leading to time-consuming
training and testing processes. Feature selection has been shown to
have a considerable effect on classifier performance [105]. A
feed-forward neural network classically trained using
back-propagation can be regarded as an effective classifier of the
actions produced by the head of severely disabled people [138], [74],
[109]. Yet standard neural networks have the demerit of poor
generalisation when provided with limited training data. In recent
years, Bayesian techniques have been applied to neural networks to
enhance the accuracy and robustness of neural network classifiers
[133]. Previous research [79] has shown that a Bayesian neural
network can classify head-movement commands consistently even with
limited training data.
2.2.2.2 Network Based Intrusion Detection
It monitors network traffic for particular network segments or
devices and analyzes the network and application protocol activity to
identify suspicious activity. It can identify many different types of events
of interest. It is most commonly deployed at a boundary between
networks, such as in proximity to border firewalls or routers, virtual
private network (VPN) servers, remote access servers, and wireless
networks.
Network-based IDSs are best suited to generating alerts for
intrusions originating outside the perimeter of the enterprise. They
are inserted at various points on the LAN and observe packet traffic
on the network; information is assembled into packets and transmitted
on the LAN or the Internet. Network-based IDSs are valuable when
placed just outside the firewall, alerting personnel to incoming
packets that might circumvent the firewall. Some network-based IDSs
accept custom signatures derived from the user's security policy,
which permits limited detection of security-policy violations.
Packet-level analysis does not work well in today's switched and
encrypted environments, and it is weak at detecting attacks
originating from authorized network users. Network-based intrusion
detection systems use raw network packets as their data source,
typically using a network adapter in promiscuous mode that listens
to and analyses all traffic in real time as it travels across the
network.
To detect newly encountered attacks, various research efforts
have used data mining as the key component [53]. Data mining is the
analysis of data to establish relationships and identify hidden
patterns that would otherwise go unnoticed. Many researchers have
delved into database intrusion detection using data mining [129].
Several data mining techniques have been applied to intrusion
detection; for example, K-Means clustering [12] is an unsupervised
technique used for this purpose. K-Means is a popular partitional
clustering algorithm, valued for its simplicity of implementation
and commonly applied in diverse applications. Its main drawbacks are
the choice of the value of k, the sensitivity of the clustering
result to the selection of the initial centroids, and convergence to
local minima. To overcome these difficulties, several authors have
proposed modifications to K-Means. In [96], a modified K-Means
clustering algorithm, called Y-Means, was proposed for intrusion
detection and has been used extensively for detecting intrusive
behaviour.
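The drawbacks listed above can be seen directly in a minimal sketch of the algorithm. The implementation below is illustrative only; the fixed random seed stands in for the arbitrary initial-centroid choice on which the final clustering depends.

```python
import numpy as np

def k_means(X, k, iters=50, seed=0):
    """Plain k-means sketch. The result depends on the random choice
    of initial centroids, one of the drawbacks noted above."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels
```

Running the same function with different seeds can yield different partitions of the same data, which is precisely the initial-centroid sensitivity that the Y-Means modification tries to address.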
On the other hand, many researchers have argued that Artificial
Neural Networks (ANNs) can improve the performance of intrusion
detection systems compared with traditional methods. The ANN is one
of the most widely used techniques and has been successful in solving
many complex practical problems. However, for ANN-based IDSs,
detection precision, especially for low-frequency attacks, and
detection stability still need to be enhanced. Furthermore, some
researchers have utilized the Self-Organizing Map (SOM), also called
the Self-Organizing Feature Map (SOFM), a type of artificial neural
network trained using unsupervised learning to produce a
low-dimensional (typically two-dimensional), discretized
representation of the input space of the training samples, called a
map. Self-organizing maps differ from other artificial neural
networks in that they use a neighbourhood function to preserve the
topological properties of the input space. To provide better
detection accuracy, some researchers have combined ANNs with data
mining approaches, helping the IDS achieve a higher detection rate,
a lower false-positive rate and stronger stability.
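A minimal sketch of the SOM training loop described above, assuming a rectangular grid and a Gaussian neighbourhood function; all parameter names, defaults and decay schedules here are illustrative assumptions, not those of any cited system.

```python
import numpy as np

def train_som(X, grid=(5, 5), iters=200, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal Self-Organizing Map sketch: a 2-D grid of weight
    vectors is pulled toward the data, and a Gaussian neighbourhood
    function drags each winner's grid neighbours along with it."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = rng.random((rows * cols, X.shape[1]))
    # fixed grid coordinates of each neuron, used by the neighbourhood
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)
    for t in range(iters):
        x = X[rng.integers(len(X))]
        bmu = np.argmin(np.linalg.norm(W - x, axis=1))   # best-matching unit
        lr = lr0 * (1 - t / iters)                       # decaying learning rate
        sigma = sigma0 * (1 - t / iters) + 0.1           # shrinking neighbourhood
        h = np.exp(-np.sum((coords - coords[bmu]) ** 2, axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)                   # neighbourhood update
    return W
```

Because neighbouring grid cells are updated together, inputs that are close in feature space end up mapped to nearby cells, which is the topology-preserving property noted above.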
In recent times, intrusion detection has received a great deal
of interest among researchers because it is widely applied for
preserving security within a network. Here, we present some of these
techniques. G. Gowrisona et al. [53] designed an intrusion detection
system that classifies network behaviour with low computational
complexity, O(n); the KDD Cup 99 benchmark data set was used to
achieve a promising classification rate. To achieve a high detection
rate, Shingo Mabu et al. [129] described a fuzzy
class-association-rule mining method based on Genetic Network
Programming (GNP). GNP enhances the representation ability with
compact programs derived from the reusability of nodes in a graph
structure. The combined method was evaluated on the KDD Cup 99 and
DARPA98 databases and showed competitively high detection rates.
To address network-based anomaly detection, Latifur Khan et al.
[87] proposed a combination of SVM and DGSOT, which starts with an
initial training set and expands it gradually using the clustering
structure produced by the DGSOT algorithm. They compared their
approach with the Rocchio Bundling technique and with random
selection in terms of accuracy loss and training-time gain on a
single real benchmark data set. Given the need for misuse and anomaly
detection in a single system, M. Bahrololum et al. [12] proposed a
hybrid of misuse and anomaly detection, trained on normal and attack
packets respectively. The attack-training method combined
unsupervised and supervised Neural Networks (NN). With the misuse
approach, known packets were identified quickly, and unknown attacks
could also be detected.
Recognizing the importance of an efficient intrusion detection
system, K. S. Anil Kumar and V. Nanda Mohan [10] proposed a
combination of three techniques spanning two machine-learning
paradigms: K-Means clustering, fuzzy logic and neural networks were
deployed to configure an effective intrusion detection system. This
approach showed the advantage of converging K-Means, fuzzy and
neural network techniques to eliminate avoidable intervention by a
human analyst. To improve both the accuracy and the efficiency of
intrusion detection, Shekhar R. Gaddam et al. [128] presented
"K-Means+ID3", a method that cascades K-Means clustering and ID3
decision tree learning for classifying anomalous and normal
activities in a computer network, an active electronic circuit, and
a mechanical mass-beam system. The detection accuracy of K-Means+ID3
was as high as 96.24 percent at a false-positive rate of 0.03
percent on NAD; the total accuracy reached 80.01 percent on MSD and
79.9 percent on DED.
Seeking a better method than SVM for network security, M.
Ektefa et al. [38] presented intrusion detection using data mining
techniques such as classification trees and support vector machines.
Their results indicated that the C4.5 algorithm outperforms SVM in
detecting network intrusions and in false-alarm rate on the KDD Cup
99 dataset. Rasha G. Mohammed Helali [118] presented a survey of
data mining-based network intrusion detection systems (NIDS),
covering the features of signature-based NIDS as well as the current
state of the art in data mining-based approaches. The intruder is
one of the most publicized threats to security, and NIDS have become
a standard component of network security infrastructures. The survey
provides general guidance on open research areas and future
directions, giving the reader a broad overview of the work done at
the intersection of intrusion detection and data mining.
Anomaly-based intrusion detection systems have various
drawbacks, such as complex computation and inefficiency in real-time
detection. To reduce computational complexity, Zhiyuan Tan et al.
[155] designed a method based on Linear Discriminant Analysis (LDA),
using a difference distance map to select the significant features.
High-dimensional feature vectors were first transformed into a
low-dimensional domain; then, using the Euclidean distance in this
simple, low-dimensional feature domain, the similarity between new
incoming packets and a normal profile was measured. A pre-calculated
threshold differentiates normal from abnormal network packets. The
DARPA 1999 IDS dataset was used to evaluate the method. However,
conventional LDA feature reduction has the drawback of being
unsuitable for non-linear datasets.
In general, the huge network traffic data used in intrusion
detection contain ineffective information that degrades system
accuracy. To overcome this drawback, Shailendra Singh and Sanjay
Silakari [126] designed an efficient feature reduction method called
Generalized Discriminant Analysis (GDA), which reduces the number of
input features. By selecting the most discriminating features,
classification accuracy was increased and the time required for
classifier training and testing was reduced. The performance of the
method was evaluated with Artificial Neural Network (ANN) and C4.5
classifiers, and the experimental results showed improved accuracy.
The k-means clustering algorithm previously used in intrusion
detection has drawbacks: its computational complexity, and the fact
that the selection of the initial central points affects the results.
Li Tian and Wang Jianwen [93] therefore designed an improved k-means
clustering algorithm that introduces an optimized dynamic
central-point cyclic method. Applied in an intrusion detection
system, the improved clustering method enhanced the detection rate
for abnormal behaviour and effectively reduced the false-drop rate.
The algorithm was evaluated on the KDD Cup 99 dataset, showing that
the accuracy of data classification and the detection efficiency
increased significantly; the experimental results also revealed that
the algorithm achieved its objectives with a higher detection rate
and higher efficiency.
Issues found in intrusion detection systems include the need
for regular updating, low detection capability for unknown attacks,
non-adaptive high false-alarm rates, and high resource consumption,
among others. Recognizing the importance of soft computing for
intrusion detection, Hafiz Muhammad Imran et al. [55] introduced an
efficient soft computing method to select the optimum subset of
features: a hybrid LDA + GA method for feature transformation and
selection. LDA was chosen as the feature reduction method because it
outperformed PCA, and the standard NSL-KDD dataset was used for
training and testing. To classify network traffic into normal or
intrusive activities, an RBF classifier was used. Their experimental
results showed that selecting an optimal subset of features reduced
time consumption and increased accuracy.
Existing intrusion detection systems make use of entire feature
sets, including irrelevant features. To produce an effective and
efficient classification process, a well-defined feature extraction
algorithm is essential. Rupali Datti and Bhupendra Verma [120]
suggested Linear Discriminant Analysis (LDA) as an efficient feature
extraction method for intrusion detection, with the back-propagation
algorithm employed for classification. The method aims to identify
the significant input features that are computationally efficient
and effective in constructing an IDS. Their experimental results
show that the proposed model offers an improved and robust
representation of the data, achieving 97% data reduction and about
94% reduction in training time, while the accuracy in identifying
new attacks remained more or less the same. The computer resources
consumed, both the memory and the CPU time spent on detecting an
attack, also decreased. The experimental results showed that the
method is reliable for detecting intrusions.
To deal with the multiclass problem in intrusion detection,
Snehal A. Mulay et al. [132] designed a decision-tree-based support
vector machine that combines support vector machines and decision
trees. Fast training and testing may be viewed as the benefits of
this method, which in turn increases system efficiency. The dataset
is split into two subsets from the root to the leaves until every
subset contains only one class, which has a large impact on the
classification performance of the system. Although final results
were not presented, multiclass pattern recognition problems can be
solved using tree-structured binary SVMs, and the resulting
intrusion detection system could be faster than other methods.
Shingo Mabu et al. [129] developed GNP-based fuzzy
class-association-rule mining with sub-attribute utilization, along
with classifiers that rely on the extracted rules. It can
consistently use and combine discrete and continuous attributes in a
rule and can efficiently extract many good rules for classification.
As an application, intrusion-detection classifiers for both misuse
detection and anomaly detection were developed, and their
effectiveness was demonstrated on KDD Cup 99 and DARPA98 data. The
misuse-detection experiments show that the method offers high DR and
low PFR, the two most important criteria for security systems.
Gang Wang et al. [49] proposed an intrusion detection method
called FC-ANN, based on ANN and fuzzy clustering. Fuzzy clustering
is employed to partition the heterogeneous training set into several
homogeneous subsets; in this way, the complexity of each
sub-training set is reduced and, as a result, detection performance
is increased. Experimental results on the KDD Cup 1999 dataset
demonstrate the effectiveness of the method, in particular for
low-frequency attacks such as R2L and U2R, in terms of detection
precision and detection stability.
Detecting network intrusion is not only important but also
difficult in network security research [20]. In a Medical Sensor
Network (MSN), network intrusion is critical because the data
delivered through the network relate directly to patients' lives.
Traditional supervised learning techniques are not appropriate for
detecting anomalous behaviour and new attacks, because intrusion
patterns and characteristics in an MSN change over time.
Unsupervised learning techniques such as the Self-Organizing Map
(SOM) are therefore more appropriate for anomaly detection. One such
work proposed a real-time intrusion detection system based on SOM
that groups similar data and visualizes their clusters, labelling
the map produced by SOM using correlations between features. The
system was evaluated with the KDD Cup 1999 dataset because MSN data
are not yet available; it yields reasonable misclassification rates
and takes 0.5 seconds to decide whether a behaviour is normal or an
attack.
The KDD Cup 99 dataset has been a point of attraction for many
researchers in intrusion detection over the last decade, and many
have contributed efforts to analyze it with different techniques.
Analysis can be used in any industry that produces and consumes
data, which of course includes security. One such study analyzed 10%
of the KDD Cup 99 training dataset for intrusion detection, focusing
on establishing a relationship between the attack types and the
protocols used by attackers, based on clustered data. The analysis
was performed using k-means clustering with the Oracle 10g data
miner, building 1000 clusters to segment the 494,020 records. The
investigation revealed many interesting results about the protocols
and attack types preferred by attackers for intruding into networks.
That work also established a different implementation-level
clustering technique that provides a new dimension for the
classification of datasets. The training set and the testing set are
classified according to the separate algorithms discussed there. The
performance analysis shows a clear edge over the other existing
techniques used for data classification, and the future research
issues that need to be resolved and investigated further are
presented along with new trends and ideas.
2.2.2.3 Hybrid Based Intrusion Detection
We have examined the different mechanisms that IDSs use to
signal or trigger alarms on a network, and the two locations in
which IDSs search for intrusive activity. Each of these approaches
has benefits and drawbacks. By combining multiple techniques into a
single hybrid system, however, it is possible to create an IDS that
possesses the benefits of multiple approaches while overcoming many
of their drawbacks.
2.3 Structure of IDS
With respect to where and how data is processed by the
intrusion detection system, the intrusion detection systems can be
classified into distributed and centralized. A distributed intrusion
detection system (DIDS) is one where data is collected and analyzed in
multiple hosts, as opposed to a centralized intrusion detection system
(CIDS), in which data may be collected in a distributed fashion, but is
processed centrally. Both distributed and centralized intrusion
detection systems may use host- or network-based data collection
methods, or a combination of them.
2.3.1 Data Source
Intrusion detection systems can run on either a continuous or
periodic feed of information (Real-time IDS and Interval-based IDS
respectively) [7] and hence they use two different intrusion detection
approaches.
Audit trail analysis is the prevalent method used by
periodically operated systems. In contrast, IDSs deployed in
real-time environments are designed for online monitoring and
analysis of system events and user actions.
2.3.2 Behaviour of an Attacker
Intrusion detection systems must be capable of distinguishing
between normal (not security-critical) and abnormal user activities
in order to discover malicious attempts in time. However,
translating user behaviours (or a complete user-system session) into
a consistent security-related decision is often not that simple:
many behaviour patterns are unpredictable and unclear (Fig. 2.2).
In order to classify actions, intrusion detection systems take
advantage of the anomaly detection approach, sometimes referred to
as behaviour-based [Deb99], or of attack signatures, i.e.
descriptive material on known abnormal behaviour (signature
detection), also called knowledge-based.
Figure 2.2 Behaviour of the user in the system
One may also categorize intrusion detection systems in terms of
behaviour: they may be passive (simply generating alerts and logging
network packets), or active, meaning that they detect and respond to
attacks, attempt to patch software holes before they are exploited,
or act proactively by logging out potential intruders or blocking
services.
2.3.3 Analysis Timing
As noted in Section 2.3.1, intrusion detection systems can run
on either a continuous or a periodic feed of information (real-time
IDS and interval-based IDS respectively), and hence they use two
different intrusion detection approaches. Audit trail analysis is
the prevalent method used by periodically operated systems, while
IDSs deployed in real-time environments are designed for online
monitoring and analysis of system events and user actions.
2.3.3.1 Audit Trail Processing
There are many issues related to audit trail (event log)
processing [11]. Storing audit trail reports in a single file must
be avoided, since intruders may exploit this to make unwanted
changes. It is far better to keep a number of copies of the event
log spread over the network, though this adds some overhead to both
the system and the network.
Further, from a functionality point of view, recording every
possible event means a noticeable consumption of system resources
(both on the local system and on the network involved), while log
compression would increase the system load. Specifying which events
are to be audited is difficult, because certain types of attack may
pass undetected.
It is also difficult to predict how large audit files can be;
through experience one can only make a rough estimate. Setting an
appropriate storage period for current audit files is also not a
straightforward task. In general, this depends on the specific IDS
solution and its correlation engine. Certainly, archive files should
be stored as copies for retrieval-analysis purposes.
2.3.3.2 On-the-Fly Processing
With on-the-fly processing [14], the IDS performs online
verification of system events. Generally, a stream of network
packets is constantly monitored. With this type of processing,
intrusion detection uses knowledge of current activity on the
network to sense possible attack attempts; it does not look for
successful attacks in the past.
Given the computational complexity involved, the algorithms
used here are limited to quick, efficient and often algorithmically
simple procedures. This is a compromise between the main requisite,
attack detection capability, and the complexity of the data
processing mechanisms used in the detection itself.
At the same time, the construction of an on-the-fly processing
IDS tool [32] requires a large amount of RAM (for buffers), since no
data storage is used. Therefore, the IDS may sometimes miss packets,
because realistic processing of too many packets is not possible.
The amount of data collected by the detector is small, since it
views only the buffer contents. Hence, only small portions of
information can be analyzed in search of certain values or sequences.
2.4 IDS Data Processing Techniques
Depending on the type of approach taken in intrusion detection,
various processing mechanisms (techniques) [36, 44] are employed on
the data that reaches the IDS. Several such systems are described
briefly below:
2.4.1 Expert Systems
These work on a previously defined set of rules describing an
attack. All security-related events incorporated in an audit trail
are translated into if-then-else rules. Examples are Wisdom & Sense
and Computer Watch (developed at AT&T).
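As a hypothetical illustration of this if-then-else rule style, the sketch below checks an audit event against two hand-written rules; the event fields, thresholds and messages are invented for the example and do not come from any of the systems named above.

```python
# Hypothetical expert-system rules over an audit event, expressed as
# plain if-then-else checks. All field names and thresholds are
# illustrative assumptions.
def check_event(event):
    if event["failed_logins"] > 3:
        return "ALERT: possible password-guessing attack"
    elif event["user"] != "root" and event["accessed"] == "/etc/shadow":
        return "ALERT: unauthorized access to password file"
    else:
        return "OK"
```

A real expert system would hold many such rules in a knowledge base and evaluate every audit record against them, but the translation from security policy to condition-action rules is the same in principle.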
2.4.2 Signature Analysis
Similarly to the expert-system approach, this method is based
on attack knowledge. The semantic description of an attack is
transformed into the appropriate audit trail format, so that attack
signatures can be found in logs or input data streams in a
straightforward way. An attack scenario can be described, for
example, as a sequence of audit events that a given attack
generates, or as patterns of searchable data captured in the audit
trail. This method uses abstract equivalents of audit trail data,
and detection is accomplished using common text-string matching
mechanisms. It is typically a very powerful technique and as such is
very often employed in commercial systems (for example Stalker, Real
Secure, Net Ranger, Emerald eXpert-BSM).
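The text-string matching at the heart of signature analysis can be sketched as follows. The signature strings are illustrative examples only, not an actual rule set from any of the systems named above.

```python
# Illustrative signature list: substrings whose presence in a log
# line is treated as evidence of a known attack pattern.
SIGNATURES = ["/etc/passwd", "cmd.exe", "' OR '1'='1"]

def match_signatures(log_line):
    """Return the signatures whose pattern occurs in the log line."""
    return [s for s in SIGNATURES if s in log_line]
```

Production systems use far richer pattern languages than plain substrings, but the principle, scanning each log record or input stream against a library of known attack patterns, is the same.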
2.4.3 Colored Petri Nets
The Colored Petri Nets [48] approach is often used to generalize
attacks from expert knowledge bases and to represent attacks
graphically. Purdue University’s IDIOT system uses Colored Petri Nets.
With this technique, it is easy for system administrators to add new
signatures to the system. However, matching a complex signature to
the audit trail data may be time-consuming. The technique is not used
in commercial systems.
2.4.4 State-Transition Analysis
An attack is described with a set of goals and transitions that
must be achieved by an intruder to compromise a system. Transitions
are represented on state-transition diagrams.
2.4.5 Statistical Analysis Approach
This is a frequently used method (for example SECURENET) [99].
User or system behaviour (a set of attributes) is measured by a
number of variables sampled over time. Examples of such variables
are: user login and logout times, the number of files accessed in a
period of time, and usage of disk space, memory, CPU, etc. The
update frequency can vary from a few minutes to, for example, one
month. The system stores mean values for each variable and raises an
alert when a measurement exceeds a predefined threshold. Yet this
simple approach was unable to match a typical user behaviour model,
and approaches that relied on matching individual user profiles with
aggregated group variables also proved inefficient. Therefore, a
more sophisticated model of user behaviour has been developed using
short- and long-term user profiles, which are regularly updated to
keep up with changes in user behaviour. Statistical methods are
often used in implementations of intrusion detection systems based
on normal user behaviour profiles.
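The basic threshold scheme described above, a stored mean per variable with an alert when a new measurement deviates too far, can be sketched as follows; the choice of k = 3 standard deviations is an illustrative assumption.

```python
import statistics

def is_anomalous(history, value, k=3.0):
    """Flag a measurement that deviates from the stored mean by more
    than k standard deviations over the variable's history."""
    mu = statistics.mean(history)
    sigma = statistics.pstdev(history) or 1e-9   # avoid divide-by-zero
    return abs(value - mu) > k * sigma
```

The short- and long-term profiles mentioned above amount to maintaining such statistics over two different time windows and updating them continuously as user behaviour drifts.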
2.4.6 Neural Networks
Neural networks use learning algorithms to learn the
relationship between input and output vectors and to generalize it
to extract new input/output relationships. In the neural-network
approach to intrusion detection, the main purpose is to learn the
behaviour of actors in the system (e.g., users, daemons). Statistical
methods are partially equivalent to neural networks; the advantage
of neural networks over statistics lies in a simple way to express
nonlinear relationships between variables, and in learning these
relationships automatically. Experiments have been carried out with
neural-network prediction of user behaviour. The results show that
the behaviour of UNIX super-users (root) is predictable, owing to
the very regular functioning of automatic system processes [73], and
with few exceptions, the behaviour of most other users is also
predictable. Neural networks are, however, still a computationally
intensive technique and are not widely used in the intrusion
detection community.
2.4.7 User Intention Identification
This technique models the normal behaviour of users by the set
of high-level tasks they have to perform on the system (in relation
to the users' functions). These tasks are treated as series of
actions, which in turn are matched to the appropriate audit data.
The analyzer keeps a set of tasks that are acceptable for each user;
whenever a mismatch is encountered, an alarm is raised.
2.4.8 Computer Immunology
Analogies with immunology have led to the development of a
technique that constructs a model of normal behaviour of UNIX
network services, rather than that of individual users. This model
consists of short sequences of system calls made by the processes.
Attacks that exploit flaws in the application code are very likely to take
unusual execution paths. First, a set of reference audit data is collected
which represents the appropriate behaviour of services; the knowledge
base is then populated with all the known “good” sequences of
system calls. These patterns are then used for continuous monitoring
of system calls to check whether the generated sequence is listed in
the knowledge base; if not, an alarm is generated. This technique has a
potentially very low false alarm rate provided that the knowledge base
is fairly complete. Its drawback is the inability to detect errors in the
configuration of network services. Whenever an attacker uses
legitimate actions on the system to gain unauthorized access, no alarm
is generated.
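The sequence-matching step described above can be sketched as follows; the system-call names and the window length k are assumptions chosen only for illustration:

```python
# Illustrative sketch of the computer-immunology approach: build a
# knowledge base of short system-call sequences observed during normal
# operation, then flag any trace containing an unseen sequence.

def sequences(trace, k=3):
    """All contiguous k-length system-call sequences in a trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}

def build_knowledge_base(normal_traces, k=3):
    kb = set()
    for trace in normal_traces:
        kb |= sequences(trace, k)
    return kb

def is_intrusion(trace, kb, k=3):
    """Alarm if the trace produces any sequence not in the knowledge base."""
    return not sequences(trace, k) <= kb

normal = [["open", "read", "mmap", "read", "close"],
          ["open", "mmap", "read", "close"]]
kb = build_knowledge_base(normal)

print(is_intrusion(["open", "read", "mmap", "read", "close"], kb))  # -> False
print(is_intrusion(["open", "execve", "setuid", "read"], kb))       # -> True
```

Note that, exactly as the text observes, an attack composed entirely of legitimate sequences would pass this check silently.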
2.5 Data Mining: Theoretical Background
Data mining [71] is the process of automatically scanning huge
amounts of data and searching for the patterns available in it. Storing
large amounts of data is useful only when useful information can be
extracted from it. Data mining deals with large volumes of data to extract meaningful
information. Data mining refers to extracting or mining knowledge from
large amounts of data [82]. In data mining, algorithms seek out
patterns and rules within the data from which sets of rules are derived.
Algorithms can automatically classify the data based on similarities
(rules and patterns) obtained between the training and the testing data
set.
Data mining [27] is the process of discovering patterns in data,
either automatically or semi-automatically. The patterns discovered
must be meaningful in that they lead to some advantage, usually a
financial one. Data mining combines concepts, algorithms and tools;
it derives concepts from machine learning and statistics for the
analysis of very large datasets. Data mining yields insight into and
understanding of data, and provides actionable knowledge. It also
provides the capability to predict the outcome of a future observation.
Besides predicting future observations, data mining is useful for
summarizing the underlying relationships in data.
Data mining can mine data from many kinds of data storage: text,
databases, data warehouses, transactional data, multimedia data,
streams, spatiotemporal, time-series and sequence data, the web,
graphs, and social and information networks. The field of data mining
grew out of the limitations of earlier data analysis techniques in
handling the challenges posed by these new types of datasets.
Today, data mining has grown so vast that it can be used in
many areas, such as financial analysis, customer management, risk
management, predicting the costs of corporate expense claims,
healthcare, insurance, process control in manufacturing, and other
fields. This thesis illustrates how data mining is also applicable in
computer security management.
Data mining analyzes data from different perspective and
summarizes it into useful information. It also analyzes data from many
different dimensions, and then it categorizes and summarizes the
relationships identified. Technically, data mining is the process of
finding correlations or patterns among various fields in large datasets.
The current developments in data mining have contributed a wide
variety of algorithms, drawn from the fields of statistics, pattern
recognition, machine learning, and databases, which are useful for
technology adaptation and usage.
Data mining is able to predict important outcomes in advance. The
technique used to perform this is called modelling. Modelling is simply
the act of building a model. A model is a set of rules, examples or
mathematical relationships. A model is built on data from situations
where the outcome is known, and the model is then applied to other
situations where the outcome is not known. Modelling techniques have
been around for centuries, but huge data storage, data communication
capabilities and the ability to process complex data have been
developed only recently, so modelling has become applicable to new
areas.
As a simple example of building a data mining model [27],
consider the director of an educational institute who would like to
improve the results and educational quality of the institute. A large
amount of student data is usually available at every institute. The
director knows a lot about the students, but it is impossible to discern
their common characteristics manually. From the existing database of
students, which contains information such as age, sex, academic
history, continuous assessment details and family background, data
mining tools can discover useful patterns: the relation between
students’ previous academic performance and their entrance
examination scores, between continuous assessment data and final
examination results, predictions of failure cases or of the placement
package a student will receive, associations between two elective
subjects registered by a student in a semester, or the number of
international students admitted to the institute. Data mining is very
helpful for such analysis of large amounts of data, which in turn helps
with academic performance improvement, planning, promotional
activities and so on. Data mining [27] is primarily used today by
companies to acquire information about their customers. It also
enables these companies to determine relationships among "internal"
factors such as price, product positioning, or staff skills, and "external"
factors such as economic indicators, competition, and customer
demographics.
2.5.1. Data mining and Knowledge discovery
Data Mining is a step in KDD [102] process which uses specific
algorithms for extracting patterns (models) from data. The term KDD
refers to the overall process of discovering useful knowledge from
data. The KDD process has other steps like data preparation, data
selection, data cleaning etc. First, data is obtained from various data
sources; then data pre-processing, such as data cleaning and data
integration, is applied. This creates a data warehouse. From the data
warehouse, task-relevant data is selected, and data mining is applied
to it. Pattern evaluation is then applied to the mined patterns to extract
knowledge.
Therefore, Data mining plays an essential role in the knowledge
discovery process.
The KDD process refers to the whole process of changing low-
level data into high-level knowledge: the automated or semi-automated
discovery of patterns and relationships in huge databases, of which
data mining is one of the core steps. Knowledge discovery is the
process of automatically generating information formalized in a form
understandable to humans. DM and KDD have emerged in recent
years to bridge the gap between the large volumes of data being
collected and the extraction of valuable information and knowledge for
decision making using new computing technologies.
According to U. Fayyad [143], KDD continues to evolve from the
intersection of research in various fields: artificial intelligence,
databases, machine learning, pattern recognition, statistics, knowledge
acquisition for expert systems, data visualization, high-performance
computing, machine discovery, scientific discovery and information
retrieval. KDD software systems incorporate theories, algorithms, and
methods from all of these fields.
Although the two terms KDD and DM are closely related, they
refer to two slightly different concepts. Data mining is only the
application of a specific algorithm based on the overall goal of the KDD
process. The knowledge discovery stage then extracts the knowledge,
which must be post-processed to facilitate human understanding.
Post-processing usually takes the form of representing the discovered
knowledge in a user-friendly display.
Figure 2.3 KDD process model (stages: Data Selection, Data
pre-processing, Data Mining, Pattern evaluation, Knowledge)
2.5.2. History of data mining.
The term "data mining" was introduced in the 1990s, but data
mining is the evolution of a field with a long history [17]. Its roots
trace back along three family lines: statistics, artificial intelligence [85],
and machine learning [80], as shown in Figure 2.4.
Figure 2.4 Data Mining and Associated Fields
Statistics is the foundation of many of the technologies on which data
mining is built, e.g. regression analysis, standard distributions, standard
deviation, variance, discriminant analysis, cluster analysis, and
confidence intervals. All of these are used to study data and data
relationships.
Artificial intelligence (AI) is built upon heuristics, in contrast to
statistics; it tries to apply human-thought-like processing to statistical
problems. Certain AI concepts were adopted by some high-end
commercial products, such as query optimization modules for
relational database management systems.
Machine learning (ML) [141] is the combination of statistics and
AI. It could be considered an evolution of AI, because it blends AI
heuristics with advanced statistical analysis. Machine learning attempts
to let computer programmes learn about the data they study, so that
the programmes make different decisions based on the qualities of the
studied data, using statistics for fundamental concepts and adding
more advanced AI heuristics and algorithms to achieve their goals.
Data mining is the adaptation of machine learning techniques to
business applications. It is best described as the union of historical
and recent developments in statistics, AI, and ML, used together to
study data and find patterns, rules and hidden trends. In its early days,
data mining algorithms were developed mainly for numerical data, but
they were later extended to all types of data, such as text, web, image,
multimedia and spatial data. Similarly, data mining began with the
analysis of single databases, but its techniques have since evolved to
cover flat files, traditional and relational databases and data
warehouses. Later, with the confluence of statistics and machine
learning techniques, various algorithms evolved to mine structured
and unstructured data.
The field of data mining [147] has been greatly influenced by the
development of fourth-generation programming languages and various
related computing techniques. In the early days of data mining, most
algorithms employed only statistical techniques. Later, they evolved
with various computing techniques such as AI, ML and pattern
recognition. Various data mining techniques (induction, compression
and approximation) and algorithms were developed to mine the large
volumes of heterogeneous data stored in data warehouses. The field
has kept growing due to its enormous success in terms of scientific
progress, understanding, and broad-ranging application achievements.
Data mining applications have been successfully implemented in
domains such as financial analysis, customer management, health
care, retail, telecommunication, fraud detection and risk analysis. The
ever-increasing complexity of various fields and improvements in
technology have posed new challenges to data mining, including
different data formats, data from disparate locations, advances in
computation and networking resources, research and scientific fields,
and ever-growing business challenges.
2.5.3. Data mining functionality
Data mining is the extraction of interesting patterns or knowledge
from huge amounts of data, and various functionalities are available
for this extraction. Data mining searches for non-trivial and implicit
patterns in data; these patterns are mostly previously unknown but
potentially useful. Data mining offers various types of functionality, and
a specific functionality is selected depending on the application area
and the kind of knowledge to be mined. Using these functionalities,
different types of knowledge can be mined, such as association rules,
classification rules, discriminant rules and deviation analysis. Data
mining functionalities [104] are extensive and rich; they can serve
various fields and applications.
Figure 2.5 shows basic functionalities such as classification,
clustering, frequent pattern mining and outlier analysis. These
functionalities are explained below.
Figure 2.5 Data mining functionalities
• Characterization and Discrimination
Data characterization [147] is a summarization of the general
characteristics or features of a target class of data; the summarization
is done based on the user’s specific requirement, and the data is
usually collected by a query. In data discrimination, the target-class
data objects are compared with the objects from one or more
contrasting classes with respect to specified generalized
features [31][39].
• Mining frequent patterns
Frequent patterns [80] are patterns that occur frequently in
the data. Patterns can include itemsets, sequences and
subsequences. A frequent itemset refers to a set of items that often
appear together in a transactional data set.
Given a collection of items and a set of records, each of which
contains some number of items from the collection, an association
function is an operation against this set of records which returns
affinities or patterns that exist among the collection of items. These
patterns can be expressed by rules such as "80% of all the records that
contain items A, B and C also contain items D and E." The specific
percentage of occurrences (in this case 80) is called the confidence
factor of the rule. In this rule, A, B and C are said to be on the
opposite side of the rule to D and E. Associations can involve any
number of items on either side of the rule.
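The confidence factor above can be computed directly by counting records; the transaction records below are made up purely to reproduce the 80% example:

```python
# Sketch of computing the confidence of the association rule
# {A, B, C} -> {D, E} over a set of transaction records.

def confidence(records, antecedent, consequent):
    """Fraction of records containing the antecedent that also
    contain the consequent."""
    matching = [r for r in records if antecedent <= r]
    if not matching:
        return 0.0
    both = [r for r in matching if consequent <= r]
    return len(both) / len(matching)

records = [
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D", "E", "F"},
    {"A", "B", "C"},          # contains the antecedent but not D, E
    {"D", "E"},               # ignored: antecedent absent
]

print(confidence(records, {"A", "B", "C"}, {"D", "E"}))  # -> 0.8
```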
• Classification and prediction
Classification [71] techniques in data mining are capable of processing
large amounts of data. Classification assigns the items in a data set to
target categories or classes; the goal is to correctly predict the target
class for each case in the data.
Classification consists of assigning a class label to a set of
unclassified cases. Because the class label of each training tuple is
provided, this step is also known as supervised learning.
Classification techniques infer a model from the database. The
database contains attributes that denote the class of a tuple,
known as predicted attributes, whereas the remaining
attributes are called predicting attributes. A combination of values for
the predicted attributes defines a class.
When learning classification rules, the system has to find the
rules that predict the class from the predicting attributes. First the
user defines the conditions for each class; the data mining system then
constructs descriptions for the classes. Essentially, given a case or
tuple with certain known attribute values, the system should be able to
predict which class the case belongs to.
Once classes are defined, the system should infer the rules that
govern the classification; that is, it should be able to find the
description of each class. The descriptions should refer only to the
predicting attributes of the training set, so that the positive examples
satisfy the description and none of the negative examples do. A rule is
said to be correct if its description covers all the positive examples and
none of the negative examples of a class.
There are various data mining classification techniques usable for
classification and prediction, such as decision tree based methods,
rule-based methods, Naïve Bayes and Bayesian belief networks,
nearest-neighbour methods, neural networks, Support Vector
Machines [61], and ensemble methods. Figure 2.6 shows classification
using a decision tree.
Figure 2.6 Classification using decision tree
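A decision tree in the spirit of Figure 2.6 can be sketched as a chain of attribute tests; the features, thresholds and class labels below are hypothetical, chosen only to illustrate how a tree classifies a tuple:

```python
# A minimal hand-built decision tree for labelling connection records.
# Each internal node tests one predicting attribute; each leaf is a class.

def classify(conn):
    if conn["failed_logins"] > 3:
        return "attack"            # repeated authentication failures
    if conn["duration"] > 300:
        if conn["bytes_sent"] > 1_000_000:
            return "attack"        # long connection moving a lot of data
        return "normal"
    return "normal"

print(classify({"failed_logins": 5, "duration": 10, "bytes_sent": 100}))
# -> attack
print(classify({"failed_logins": 0, "duration": 20, "bytes_sent": 500}))
# -> normal
```

In practice, of course, the tests and thresholds are not hand-written but induced from labelled training tuples.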
• Clustering
Clustering [117] and segmentation are the processes of creating
a partition so that all the members of each set of the partition are
similar according to some metric. Clustering is an unsupervised
technique: classes or categories are not predefined. Instead, a set of
objects is grouped together because of their similarity or proximity.
When learning is unsupervised, the system has to discover its own
classes, i.e. the system clusters the data in the database. It has to
discover subsets of related objects in the training set and then find
descriptions that describe each of these subsets.
Objects are often decomposed into an exhaustive and/or
mutually exclusive set of clusters. Clustering [71] according to similarity
is a very powerful technique; the key is to translate some intuitive
measure of similarity into a quantitative measure. There are a number
of approaches to forming clusters. One approach is to form rules which
dictate membership in the same group based on the level of similarity
between members. Another is to build set functions that measure
some property of partitions as functions of some parameter of the
partition. Figure 2.7 shows the clustering data mining functionality.
Figure 2.7 Clustering
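The partitioning idea can be sketched with a bare-bones k-means procedure; the one-dimensional data and the deterministic initialization from the first k points are simplifying assumptions (real implementations handle more dimensions and better initialization):

```python
# A minimal k-means clustering sketch: repeatedly assign each point to
# its nearest centroid, then move each centroid to its cluster's mean.

def kmeans(points, k, iterations=10):
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        for i in range(k):
            if clusters[i]:
                centroids[i] = sum(clusters[i]) / len(clusters[i])
    return centroids, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, clusters = kmeans(points, k=2)
print(sorted(round(c, 1) for c in centroids))  # -> [1.0, 9.5]
```

Here the "intuitive measure of similarity" mentioned above is made quantitative as the absolute distance to a cluster centroid.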
• Outlier analysis
Outliers [71] are data objects that do not comply with the general
behaviour or model of the data. Outliers (if present in a dataset) are
usually discarded before processing through other data mining
functionalities, as they usually represent exceptions or noise. Figure
2.8 shows outlier analysis; R represents a data object that is an outlier
from the rest of the data.
Figure 2.8 Outlier Analysis
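A simple statistical outlier check in this spirit flags points far from the mean; the data values and the two-standard-deviation threshold are illustrative assumptions:

```python
# Flag points whose distance from the mean exceeds a few standard
# deviations (the threshold is chosen per dataset in practice).
import statistics

def outliers(data, threshold=2.0):
    mean = statistics.mean(data)
    stdev = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) > threshold * stdev]

data = [10, 12, 11, 13, 12, 11, 90]   # 90 plays the role of R in Figure 2.8
print(outliers(data))  # -> [90]
```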
Data mining functionalities cover a wide range of applications;
however, there is a need for new functionalities. Data mining research
can provide new functionalities that serve many application areas
efficiently. Research in data mining has multiple aspects which, if
handled properly, work effectively.
2.6 Evaluation of Datasets
Most intrusion detection techniques beyond basic pattern
matching require sets of data to train on. When work on advanced
network intrusion detection systems began in earnest in the late
1990s, researchers quickly recognized the need for standardized
datasets to perform this training.
Such datasets allow different systems to be quantitatively
compared. Further, they provide a welcome alternative to the prior
method of dataset creation, which involved every researcher collecting
data from a live network and using human analysts to thoroughly
analyze and label the data. The first such widely cited dataset was for
the Information Exploration Shootout (IES), which, unfortunately, is no
longer available. It was used to test the anomaly-detection
performance of participating systems. It consisted of four collections of
tcpdump data: one that contained purely normal data, and three
consisting of normal data with injected attacks.
The data was apparently captured from a real network, and
consists of only the packet headers in order to protect the privacy of
the users. In one of the early papers from Lee and Stolfo [92], they
noted the anticipated arrival of a new dataset from the Air Force’s
Research Laboratory (AFRL) in Rome, NY.
The AFRL, along with MIT’s Lincoln Lab, collected network traffic
from their network and used it as the basis for a simulated network.
Using a simulated network allowed them to carefully control if and
when attacks were injected into the dataset. Furthermore, it allowed
them to collect the entire packet without needing to protect user
privacy. Details on the simulated networks and injected attacks are
available in Kendall [84]. They used the simulated network to create a
couple of weeks of intrusion-free data, followed by a few weeks of data
labelled with intrusions. This data was made available to researchers
in 1998 as the DARPA Off-line Intrusion Detection Evaluation.
Participants were then given two weeks of unlabelled data, including
previously unseen attacks, and asked to label the attacks. Lippmann
presented the results in [95].
Numerous researchers have used this data to test their systems,
both as part of the DARPA evaluation and independently. In
response to the 1998 challenge, McHugh wrote a rather scathing
critique of the evaluation. While he presents many good points on how
an evaluation of IDSs should be performed, he also criticizes
numerous shortcomings in the challenge without acknowledging how
difficult addressing some of the issues is.
For example, he notes that the generated data was not validated
against real traffic to ensure that it had the same rates of malicious
traffic versus non-malicious anomalous traffic that caused false
positives. Doing so would, however, require more insight into real
traffic than we can possibly obtain (in particular, intent); further,
modelling of traffic at that scale is still an area with much research left
to be done.
Some of his more directly applicable feedback was used for the
IDS challenge the following year. In particular, Das [30] outlines the
improvements that were made in the test bed and injected attacks, and
provides details on the addition of Windows NT hosts and attacks in
the 1999 evaluation.
While McHugh’s critique was based primarily on the procedures
used to generate the DARPA data, Mahoney [25] provides a critique
based on an analysis of the data compared to real world data captured
on their network. They note that many of the attributes that are well
behaved in the DARPA dataset are not in real world data. They found
that by mixing their real-world data with the DARPA data, they were
able to increase the number of legitimate detections (detections that
were not an artefact of the data generation process), using five simple
statistically based anomaly detectors.
While this approach is an excellent stop-gap measure to achieve
a more realistic performance measure using the DARPA data, it is not
suitable for all research for two reasons:
i. It requires the addition of attack-free (or at least accurately
labeled) real-world data, which no one is willing to share for use
as a standard;
ii. It requires that the method not differentiate between the DARPA
data and the real-world data, which might be controllable for
some methods (particularly those that produce human readable
rules), but not for others (such as artificial neural networks and
hidden Markov models).
To address the first point, Mahoney [30] analyzed their real-world
data with Snort; however, they do not address the possibility of the
data containing new or stealthy attacks that Snort is incapable of
detecting (and which drive the development of more advanced
intrusion detection techniques).
Lee did a great deal of analysis using the DARPA data, and
identified 41 features of interest to a data mining based network IDS.
He provided a copy of the DARPA data that was already pre-processed,
by extracting these 41 features, for the 1999 KDD Cup
contest, held at the Fifth ACM International Conference on Knowledge
Discovery and Data Mining. Since this version of the dataset already
has the tedious and time-consuming pre-processing step done, it has
been used as the basis for most of the recent research on data mining
IDSs.
There are a couple of other datasets that are used occasionally.
The first is the Internet Traffic Archive from Lawrence Berkeley
National Laboratory [67].
It consists of a collection of tcpdump data captures from a live
network on the Internet. It has been used by [92], primarily to show that
data mining methods are sensitive to traffic patterns, such as the
difference in traffic between working hours and overnight. Another
dataset is Security Suite 16, which was created by InfoWorld to test
commercial network intrusion detection systems [101].
As an alternative to using datasets, such as those described
above, Eskin [42] presents an intrusion detection approach that does
not require training data. Rather, it separates the normal data and the
noisy data (anomalies) into two separate sets using a mixture model.
This model can then be applied for anomaly detection. The technique
can also be applied to a dataset that has been manually labeled, in
order to detect marking errors [42].
Of all the datasets presented here, the DARPA/KDD dataset
appears to be the most useful as a dataset that can be used without
any further processing. Unfortunately, given the criticisms against this
data, we recommend that any further research in this area use both the
DARPA datasets and one of the DARPA datasets mixed with real-world
data. Doing so, and being able to compare and contrast the
results, should help alleviate most of the criticism against work based
solely on the DARPA data, while still allowing work to be directly
compared. Ideally, someone using a mixed dataset will make their
real-world data available for everyone to use. This approach will
necessitate the regeneration of connection records, as the KDD Cup
data only processed the 1998 DARPA data and obviously does not
include any new data that may be mixed in. Finally, a couple of
observations on dataset utilization:
First, the typical approach to using datasets is to have some
normal (intrusion-free) data and/or data with labeled intrusions, which
is used to train the data mining methods being applied. None of the
literature, however, explicitly discusses the use of separate training
sets for meta-classifiers and the base classifiers they incorporate. It
would probably be useful to train the meta-classifier using attacks the
base classifiers have not already seen, so that the meta-classifier can
give proper weight to base classifiers that do a good job of detecting
previously unseen techniques.
Second, we have noticed a disturbing trend in some published
research of modifying a standard dataset because the researchers do
not believe it accurately models real Internet traffic; for instance, they
believe that it has too many or too few attacks. This is unfortunate, as it
precludes a quantitative comparison of their research with other work.
Further, as a community we lack any solid statistics on traffic
characteristics in different environments, so the use of modified
data implies that the given technique is not robust enough to perform
well on different or dynamic networks.
2.7 Feature Selection
The most popular data format to analyze is the connection
log. Compared with other log formats (such as packet logs), the
connection record format affords more power in the data analysis step,
as it provides multiple fields on which correlation can be done (unlike a
format such as command histories).
Additionally, not examining data stream contents saves
significant amounts of processing time and storage, and avoids privacy
issues. While some have argued that not looking at the data stream will
prevent the detection of user-to-root (U2R) attacks, some of these
attacks will evade detection in any case, as attackers will modify the
network stream precisely to avoid IDS detection, as described in [117].
Lee and Neri [92] found that converting the network data to connection
logs aided the performance of their data mining techniques.
When connection logs are built from packet information, certain
features, such as the state of the connection establishment and
teardown, overlapping fragments, and the resend rate, will need to be
calculated [92].
Connection records provide numerous features that are intrinsic
to each connection. Lee noted [60] that the timestamp, source address
and port, destination address and port, and protocol uniquely identify a
connection, making them essential attributes.
They go on to note that “association rules should describe
patterns related to the essential attributes.” Specifically, at least one of
those attributes must be present in the antecedent of a rule in order for
that rule to be useful. They call this the axis attribute for the rule. For
example, a rule that is based solely on the number of bytes transferred
really does not convey any useful information.
Likewise, if the value of some feature must be kept constant
through the processing of a set of records (for instance, the destination
host), that feature is called a reference attribute [92]. Other researchers
have also had success with this approach. Dickerson [60] found that
their best results were achieved when they limited their rules to a key
consisting of the source IP, destination IP, and destination port.
Hofmeyr and Forrest [13] used the same approach, although
they assign all connections with unassigned privileged ports to one
service group, and all connections with unassigned non-privileged
ports to another group.
Essential attributes provide vital information about connections;
most research also uses some of the secondary attributes, such as
connection duration, TCP flags and the volume of data passed in each
direction. Some researchers, such as Dickerson and Dickerson, also
treat the essential attributes that they do not key off, such as the
timestamp and source port, as secondary attributes. Perhaps the most
interesting data point is the work of Singh and Kandula [95], who did
not report the use of any essential attributes, despite their work being
based heavily on that of Lee [144]. This may account for their poor
performance, which greater care in choosing connection features
might have avoided.
Unfortunately, the intrinsic attributes of a connection are
insufficient to provide adequate detector performance against most
attacks. Including temporal information with each data point
significantly increases accuracy. Temporal information is captured in
the form of calculated attributes. A calculated attribute provides the
average value of an attribute, or the count or percentage of
connections fulfilling some criteria, over the last w seconds or n
connections.
For example, Lee [92] included a count of how many packets in
the last w seconds had the same value for an attribute as the current
connection to the same service or destination host. This formalized the
notion of defining calculated features as functions of the other features,
using a predefined set of operators such as count, percentage, or
average, as well as a set of data constraints such as same host, same
service, different host, or time window.
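One such calculated attribute, a same-destination-host count over a time window, can be sketched as follows; the record layout and window size are assumptions for illustration:

```python
# For each connection, count how many connections in the preceding
# w seconds went to the same destination host.

def same_host_count(connections, w=2.0):
    """connections: list of (timestamp, dst_host), sorted by timestamp."""
    counts = []
    for i, (t, host) in enumerate(connections):
        n = sum(1 for (t2, h2) in connections[:i]
                if t - t2 <= w and h2 == host)
        counts.append(n)
    return counts

conns = [(0.0, "A"), (0.5, "A"), (1.0, "B"), (2.2, "A"), (2.4, "A")]
print(same_host_count(conns))  # -> [0, 1, 0, 1, 2]
```

A production system would use an incremental sliding window rather than rescanning the history for each connection.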
Honig [34] extended this approach by allowing the analyst to
dynamically create new features using these functions. A new column
is automatically added to the table to store the new feature in the
database.
Lee [92] explained that the decision to count the occurrences of
a given attribute’s value is made when many frequent episode rules
are generated that include the given feature with a constant value.
Likewise, they generate an average value for an attribute if that
attribute appears in many frequent episode rules with different
values.
There are numerous techniques to identify which of the
secondary or calculated attributes provide the best feature set for a
given method. Frank [108] used backwards sequential search, beam
search, and random generation plus sequential selection.
Lee and Xiang [92] do an excellent job of applying information-
theoretic measurement techniques to feature sets in order to evaluate
the relative utility of different sets (based on some earlier work by Lee).
The measures they use are entropy, conditional entropy, relative
conditional entropy, information gain, and information cost. The
concept of time is particularly problematic for IDSs to handle, both in
terms of correlating events over time and of behaviour that changes
over time.
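Two of the information-theoretic measures mentioned above, entropy and information gain, can be sketched as follows; the toy connection records are made-up illustrations:

```python
# Entropy of a label distribution, and the information gain of a feature
# (entropy of the labels minus the conditional entropy given the feature).
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, feature, label):
    labels = [r[label] for r in rows]
    cond = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[label] for r in rows if r[feature] == value]
        cond += (len(subset) / len(rows)) * entropy(subset)
    return entropy(labels) - cond

rows = [
    {"service": "http",   "class": "normal"},
    {"service": "http",   "class": "normal"},
    {"service": "telnet", "class": "attack"},
    {"service": "telnet", "class": "attack"},
]
print(entropy([r["class"] for r in rows]))          # -> 1.0
print(information_gain(rows, "service", "class"))   # -> 1.0 (fully informative)
```

A feature with high information gain is a strong candidate for inclusion in the feature set, which is precisely how these measures rank competing sets.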
The ability to correlate events over time is useful, particularly for
identifying regular activity, such as an automated process that transfers
files at a specific time every night. Such behaviour may be expected
and safely ignored, or it may indicate activity worth
investigation, as it may be from a Trojan horse or other form of malicious
code.
To address this problem, Li [22] developed the notion of a
calendar schema. These calendar schemas build temporal profiles,
which allow the mined induction rules to use multiple time granularities.
The other problem that time presents is that the behaviour of monitored
networks will change over time. Because of this, the profiles that
characterize the network will need to incorporate new behaviour and
age out old behaviour.
The manner in which this is accomplished is necessarily tied to
the underlying data model. For instance, IDES and NIDES
accomplished this by periodically updating their statistical models,
multiplying in an exponential decay factor when adding in the currently
observed values for an attribute [69].
For the inductive rules used by Lee [22], a new rule set was
created for each day’s data, and then merged with the existing rule set.
By keeping track of how often a rule appeared in a daily rule set, and
when a rule last appeared, they could ascertain the relevance of a rule
and age out old rules. Some of the techniques described below,
particularly those that rely on a mapping between the network
connection records and a geometric space (hyper-plane), only produce
optimal, or even usable, results if the features in the records are first
normalized. This is typically accomplished by scaling continuous
values to a given range, possibly scaling the values with a logarithmic
scale to avoid having large values (typically seen in distributions of
attributes of long-tailed network data) dominate smaller values [152].
Discrete values are typically mapped to their own features, to
coordinates that are equidistant from one another, or represented
based on their frequency [23]. A similar problem is presented by zeros
in the dataset, as features with an observed value of zero may either
actually be zero, or they may be zero due to a lack of observations. To
address that problem Barbará [13] applied pseudo Bayes estimators to
refine the zero values in their training data.
Chan [23] addressed the same problem in association rules by
using a probability of novel events based on the frequency of rules
supporting the antecedent in the training set. They also looked at
Laplace smoothing, but found it inappropriate, as it required
the alphabet sizes and distributions to be known at training time.
Another technique that can be applied to the dataset to improve
accuracy is compression. Neri [13] found that compressing features, by
representing many discrete values with a single value is, “a valuable
way of increasing classification performances without introducing
complex features that may involve additional processing overhead.”
Barbará [13] applied feature compression by grouping together
connections that come from the same domain (subnet) in order to
detect activity coming from a highly coordinated group of hosts.
The information-theoretic work by Lee and Xiang [92] explains
that substituting a single record to represent a group of records (such
as all those in the past w seconds for a given service), significantly
increases the information gain (which should subsequently improve the
accuracy of detection methods on that data). Singh and Kandula [131]
note that the features they chose were based purely on heuristics and
that, “It would be really useful if the choice of these features could be
automated.”
Helmer [60] did exactly that with system call data using the “bag
of words” technique, where every call was represented by a bit in a
vector labelled as normal or intrusive. They then fed these vectors to a
genetic algorithm and found that the set of necessary features was
about half of the full set of available features. Using the pruned set
resulted in comparable detection accuracy and reduced the false
positive rate to zero.
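A minimal sketch of the "bag of words" encoding described above (the system-call vocabulary and traces below are invented for illustration; the genetic-algorithm pruning step is omitted):

```python
def bag_of_words(trace, vocabulary):
    """Encode a system-call trace as a bit vector: bit i is 1 when
    vocabulary[i] occurs anywhere in the trace, else 0."""
    present = set(trace)
    return [1 if call in present else 0 for call in vocabulary]

# Hypothetical system-call vocabulary and example traces.
VOCAB = ['open', 'read', 'write', 'close', 'execve', 'chmod']
normal_vec = bag_of_words(['open', 'read', 'close'], VOCAB)
intrusive_vec = bag_of_words(['open', 'execve', 'chmod'], VOCAB)
```

Each labelled vector then becomes one training example for the feature-selection search.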
2.8 Summary
In this chapter, literature reviews of previous works were
discussed: classification of intrusion detection systems, types of
protected systems, IDS data processing techniques, data mining and
knowledge discovery, evaluation of datasets, and feature selection.
The advantages and limitations of the previous works were also
discussed.
CHAPTER 3
METHODOLOGY & DATABASE
3.1 The DARPA Intrusion-Detection Evaluation Program
The intrusions to be found in computer and network audit data
are plentiful as well as ever-changing. They are also thoroughly
scattered, and attempts to structure or catalogue audit data are
extremely effort-intensive. In order to create effective detection
models, model-building algorithms typically require a large amount of
labelled data. One major difficulty in deploying IDS is the need to label
system audit data for the algorithms. Misuse-detection systems need
the data to be accurately labelled as either ‘normal’ or ‘attack’, whereas
for anomaly-detection systems, the data must be verified to ensure that
it is exclusively ‘normal’, namely attack-free. This requires the same
effort [40, 90], and preparation of the data in this manner is both time-
consuming and costly.
A generous sponsor for the production of intrusion-detection
audit data was found in the US government agency DARPA (Defense
Advanced Research Projects Agency, US). An innovator and promoter of
technology, this organization has funded many projects in the last few
decades. In 1969, one such research and development project was
subsidized ‘to create an experimental packet-switched network’. This
one venture saw the modest beginnings of what grew into the
omnipresent Internet, known today. As a matter of fact, DARPA
supports the evaluation of developing technologies: focusing on an
effort, documenting existing capabilities and guiding research.
The 1998 DARPA Off-line Intrusion-Detection Evaluation
Program [94, 103, 75] was one such project. Aware of the lack of
suitable audit data sets for intrusion detection, DARPA set out (1) to
generate an intrusion-detection evaluation corpus which could be
shared by many researchers, (2) to evaluate many intrusion-detection
systems, (3) to include a wide variety of attacks and (4) to measure
both attack-detection rates and false-alarm rates for realistic normal
traffic. To avoid publicizing confidential information concerning any real
network in connection with the data and in order not to cause
disruption in the operation of an on-line network, an extensive test bed
was set up at MIT’s Lincoln Laboratory to synthesize the data.
This test bed simulated the operation of a typical US Air Force LAN for
over two months, allowing a considerable amount of audit data to be
collected from it.
3.2 Attack Types in the 1999 DARPA Data Set
Each attack type falls into one of the four following main categories:
• Denial-of-service (DOS)
DOS attacks have the goal of limiting or denying service(s)
provided to a user, computer or network. A common tactic is to
severely overload the targeted system, as in a SYN flood.
• Probing or surveillance
Probing or surveillance attacks have the goal of gaining
knowledge of the existence or configuration of a computer system or
network. Port scans or sweeping of a given IP address range are
typically used in this category, as in IPsweep.
• Remote-to-Local (R2L)
R2L attacks have the goal of gaining local access to a computer
or network to which the attacker previously had only remote access.
Examples of this are attempts to gain control of a user account, such
as the Dictionary attack.
• User-to-Root (U2R)
U2R attacks have the goal of gaining root or super-user access
on a particular computer or system on which the attacker previously
had user-level access. These are attempts by a non-privileged user
to gain administrative privileges (e.g. Eject). A total of 24 attack types
was included in the training data and a further 14 novel attacks were
added to the test data, to compare the performance of IDS on ‘known’
and on ‘yet-unseen’ attacks. A further aim of the evaluation was to
determine whether systems could detect stealthy attacks. These are
variations of an attack. They have been modified from the standard
form available on the Internet, in an attempt to evade detection.
Methods of being stealthy vary, depending on the attack type [84]. The
attacks are grouped according to a category and type. The number of
occurrences is detailed, distinguishing between attacks launched in the
clear and those performed stealthily, and specifying whether each
appeared in the training or the test data. For example, there were 46 Eject
attacks in the simulation. Of these, 10 were stealthy and 36 were
performed in the clear. Of those in the clear category, 29 figured in the
training data and 7 in the test data. In the DARPA programmes,
detection rates for each attack category were estimated for
comparative purposes, when evaluating the performance of IDS.
3.2.1 Different Attack Types
The category of an attack is determined by its ultimate goal, so
that within a given category, attacks may closely resemble each other.
The DOS attacks are designed to disrupt a host or network service.
Some DOS attacks (e.g. smurf) excessively load a legitimate network
service; others (e.g. teardrop, Ping of Death) create malformed packets,
which are incorrectly handled by the victim machine. Others still (e.g.
apache2, back, syslogd) take advantage of software bugs in network
daemon programmes. Probe attacks are launched by programmes,
which can automatically scan a network of computers to gather
information or find known vulnerabilities. Such probes are often
precursors to more dangerous attacks because they provide mapping
to machines and services and pinpoint weak links in a network. Some
of these scanning tools (satan, saint and mscan) enable even an
unskilled attacker to check hundreds of machines on a network for
known vulnerabilities.
In the R2L attacks, an attacker who does not have an account
on a victim machine sends packets to that machine and gains local
access. Some R2L attacks exploit buffer overflows in network server
software (e.g. imap, named, sendmail); others exploit weak or
misconfigured security policies (e.g. dictionary, ftp-write, and guest)
and one (xsnoop) is a Trojan password capture programme. The
snmp-get R2L attack against the router is a password-guessing attack
where the community password of the router is guessed and an
attacker then uses SNMP to monitor the router. During U2R attacks, a
local user on a machine tries to obtain privileges normally reserved for
the UNIX root or super-user. Some U2R attacks exploit poorly-written
system programmes which run at root level and are susceptible to
buffer overflows (e.g. eject, ffbconfig, fdformat). Others may exploit
weaknesses in path-name verification (e.g. loadmodule), bugs in some
versions of perl (e.g. suidperl) or other software weaknesses.
3.2.2 Attack Descriptions
back - Denial-of-service attack against apache webserver,
where a client requests a URL containing many
backslashes.
dict - Guess passwords for a valid user, using simple
variants of the account name over a telnet connection.
eject - Buffer overflow using eject program on Solaris. Leads
to a user-to-root transition if successful.
ffb - Buffer overflow using the ffbconfig UNIX system
command leads to root shell.
format - Buffer overflow using the fdformat UNIX system
command leads to root shell.
ftp-write - Remote FTP user creates .rhost file in world-
writable anonymous FTP directory and obtains local
login.
guest - Try to guess password via telnet for guest account.
ipsweep - Surveillance sweep performing either a port sweep or
ping on multiple host addresses.
land - Denial of service where a remote host is sent a spoofed
packet with the same source and destination address and port.
loadmodule - Non-stealthy load module attack which resets IFS for a
normal user and creates a root shell.
multihop - Multi-day scenario in which a user first breaks into one
machine.
neptune - Syn-flood denial-of-service on one or more ports.
nmap - Network mapping using the nmap tool. Modes of
exploring the network vary; options include SYN scanning.
perlmagic - Perl attack which sets the user id to root in a perl script
and creates a root shell.
phf - Exploitable CGI script which allows a client to execute
arbitrary commands on a machine with a misconfigured
web server.
pod - Denial-of-service ping-of-death.
portsweep - Surveillance sweep through many ports to determine
which services are supported on a single host.
rootkit - Multi-day scenario where a user installs one or more
components of a rootkit.
satan - Network probing tool which looks for well-known
weaknesses. It operates at three different levels;
level 0 is light.
smurf - Denial-of-service icmp-echo reply flood.
spy - Multi-day scenario in which a user breaks into a
machine with the purpose of finding important
information where the user tries to avoid detection.
Uses several different exploit methods to gain access.
syslog - Denial of service for the syslog service; connects to
port 514 with an unresolvable source IP.
teardrop - Denial of service where mis-fragmented UDP packets
cause some systems to reboot.
warez - User logs into anonymous FTP site and creates a
hidden directory.
warezclient - Users downloading illegal software which was
previously posted via anonymous FTP by the warez
master.
warezmaster - Anonymous FTP uploads of Warez (usually illegal
copies of copyrighted software) onto FTP server.
3.3 Data-Set Description
This is the data set used for The Third International Knowledge
Discovery and Data Mining Tools Competition, which was held in
conjunction with KDD-99 the Fifth International Conference on
Knowledge Discovery and Data Mining. The competition task was to
build a network intrusion detector, a predictive model capable of
distinguishing between ``bad'' connections, called intrusions or attacks,
and ``good'' normal connections. This database contains a standard
set of data to be audited, which includes a wide variety of intrusions
simulated in a military network environment.
The ‘KDDCUP99 Data’ [66] are the data sets, which were issued
for use in the KDDCUP ’99 Classifier-Learning Competition. These
sets of training and test data were made available [137, 91] and
consisted of a pre-processed version of the 1998 DARPA Evaluation
Data, prepared by a team whose IDS had performed particularly well in
the Intrusion-Detection Evaluation Program of that year, using data
mining as a ‘pre-processing’ stage to extract characteristic intrusion
features from raw TCP/IP audit data. The original raw training data were about four
gigabytes of compressed binary tcpdump data obtained from the first
seven weeks of network traffic at MIT. This was pre-processed with the
feature-construction framework MADAM ID (Mining Audit data for
automated models for Intrusion Detection) to produce about five-million
connection records. A connection is defined to be a sequence of TCP
packets starting and ending at some well-defined times, between which
data flow back and forth between a source IP address and a destination IP
address, under some well-defined protocol. Each connection is labelled
as either ‘normal’ or with the name of its specific attack type. A
connection record consists of about 100 bytes. Ten percent of the
complementary two weeks of test data were, likewise, pre-
processed to yield a further slightly under half a million connection records.
For the information of contestants, it was stressed that these test data
were not from the same probability distribution as the training data, and
that they included specific attack types which are not found in the
training data. The full amount of labelled test data with some two
million records was not included in this data set.
3.3.1 Set of Features Used in the Connection Records
In the KDDCUP99 Data, the initial features extracted for a
connection record [41, 89] include the basic features of an individual
TCP connection, such as: its duration, protocol type, number of bytes
transferred and the flag indicating the normal or error status of the
connection. These ‘intrinsic’ features provide information for general
network-traffic analysis purposes. Since most DOS and Probe attacks
involve sending a lot of connections to the same host(s) at the same
time, they can have frequent sequential patterns, which are different to
the normal traffic. For these patterns, a “same host” feature examines
all other connections in the previous 2 seconds, which had the same
destination as the current connection. Similarly, a “same service”
feature examines all other connections in the previous 2 seconds,
which had the same service as the current connection. These temporal
and statistical characteristics are referred to as the “time based” traffic
features. There are several Probe attacks which use a much longer
interval than 2 seconds (for example, one minute) when scanning the
hosts or ports. For these, a mirror set of “host-based” traffic features
were constructed based on a ‘connection window’ of 100 connections.
The R2L and U2R attacks are embedded in the data portions of the
TCP packets and may involve only a single connection. To detect
these, ‘content’ features of individual connections were constructed
using domain knowledge. These features suggest whether the data
contains suspicious behaviour, such as the number of failed logins,
whether the login succeeded, whether the user logged in as root, whether a root
shell is obtained, etc. In total, there are 42 features (including the
attack type) in each connection record, with most of them taking on
continuous values. The individual features are listed and briefly
described in Tables 3.3 to 3.6. Tables 3.1 and 3.2 show the different types of
attacks and their categories:
Table 3.1 Class Labels that Appear in the Full KDDCUP99 Dataset
Category KDD Cup 99 Full Dataset After Removing Duplicate Samples % Rate of Reduction Dataset Class
Normal 972781 812814 16.44 NORMAL
Back 2203 968 56.06 DOS
Pod 264 206 21.97 DOS
Land 21 19 9.52 DOS
Smurf 2807886 3007 99.89 DOS
Teardrop 979 918 6.23 DOS
Neptune 1072017 242149 77.41 DOS
Nmap 2316 1554 32.90 PROBE
Satan 15892 5019 68.42 PROBE
Ipsweep 12481 3723 70.17 PROBE
Portsweep 10413 3564 65.77 PROBE
Phf 4 4 0.00 R2L
Guess_pwd 53 53 0.00 R2L
Ftp_write 8 8 0.00 R2L
Imap 12 12 0.00 R2L
Spy 2 2 0.00 R2L
Multihop 7 7 0.00 R2L
Warezclient 1020 893 12.45 R2L
Warezmaster 20 20 0.00 R2L
Buffer_Overflow 30 30 0.00 U2R
Loadmodule 9 9 0.00 U2R
Perl 3 3 0.00 U2R
Rootkit 10 10 0.00 U2R
Total 4898431 1074992 78.05%
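The duplicate-removal figures in these tables follow directly from set-based de-duplication; a small sketch (the record representation as hashable tuples is an assumption):

```python
def deduplicate(records):
    """Remove exact duplicates, keeping the first occurrence of each record.
    Records must be hashable, e.g. tuples of feature values."""
    seen, unique = set(), []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique

def pct_reduction(before, after):
    """Percentage rate of reduction after duplicate removal."""
    return 100.0 * (before - after) / before
```

For instance, the Back row of Table 3.1 (2203 records reduced to 968) corresponds to a 56.06% reduction.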
Table 3.2 Class Labels that Appear in 10% KDDCUP99 Dataset
Category KDD Cup 99 10% Dataset After Removing Duplicate Samples % Rate of Reduction Dataset Class
Normal 97278 87832 9.71 NORMAL
Back 2203 968 56.06 DOS
Pod 264 206 21.97 DOS
Land 21 19 9.52 DOS
Smurf 280790 641 99.77 DOS
Teardrop 979 918 6.23 DOS
Neptune 107201 51820 51.66 DOS
Nmap 231 158 31.60 PROBE
Satan 1589 906 42.86 PROBE
Ipsweep 1247 651 47.79 PROBE
Portsweep 1040 416 60.00 PROBE
Phf 4 4 0.00 R2L
Guess_pwd 53 53 0.00 R2L
Ftp_write 8 8 0.00 R2L
Imap 12 12 0.00 R2L
Spy 2 2 0.00 R2L
Multihop 7 7 0.00 R2L
Warezclient 1020 893 12.45 R2L
Warezmaster 20 20 0.00 R2L
Buffer_Overflow 30 30 0.00 U2R
Loadmodule 9 9 0.00 U2R
Perl 3 3 0.00 U2R
Rootkit 10 10 0.00 U2R
Total 494021 145586 70.53%
Table 3.3 KDDCUP99 Basic Features of Individual TCP Connections
Feature name Description Type
Duration Length(number of seconds) of the connection continuous
Protocol_type Type of the protocol, e.g. tcp, udp, etc. discrete
Service Network service on the destination, e.g. http, telnet, etc. discrete
Src_bytes Number of data bytes from source to destination continuous
Dst_bytes Number of data bytes from destination to source continuous
Flag Normal or error status of the connection discrete
Land 1 if connection is from/to the same host/port; 0 otherwise discrete
Wrong_fragment Number of ‘wrong’ fragments continuous
Urgent Number of urgent packets continuous
Table 3.4 Content Features within a Connection Suggested by Domain Knowledge
Feature name Description Type
Hot Number of ‘hot’ indicators continuous
Num_failed_logins Number of failed login attempts continuous
Logged_in 1 if successfully logged in; 0 otherwise discrete
Num_compromised Number of ‘compromised’ conditions continuous
Root_shell 1 if root shell is obtained; 0 otherwise discrete
Su_attempted 1 if ‘su root’ command attempted; 0 otherwise discrete
Num_root Number of ‘root’ accesses continuous
Num_file_creations Number of file creation operations continuous
Num_shells Number of shell prompts continuous
Num_access_files Number of operations on access control files continuous
Num_outbound_cmds Number of outbound commands in an ftp session continuous
Is_hot_login 1 if the login belongs to a ‘hot’ list; 0 otherwise discrete
Is_guest_login 1 if the login is a ‘guest’ login; 0 otherwise discrete
Table 3.5 Traffic Features Computed Using a Two-second Time Window
Feature name Description Type
Count Number of connections to the same host as the current connection in the past two seconds continuous
Note: the following features refer to these same-host connections.
Serror_rate % of connections that have ‘SYN’ errors continuous
Rerror_rate % of connections that have ‘REJ’ errors continuous
Same_srv_rate % of connections to the same service continuous
Diff_srv_rate % of connections to different services continuous
Srv_count Number of connections to the same service as the current connection in the past two seconds continuous
Note: the following features refer to these same-service connections.
Srv_serror_rate % of connections that have ‘SYN’ errors continuous
Srv_rerror_rate % of connections that have ‘REJ’ errors continuous
Srv_diff_host_rate % of connections to different hosts continuous
Table 3.6 Traffic Features Computed Using a Hundred-connection Window
Feature name Description Type
dst_host_count* No. of connections to the same host as the current connection among the past 100 connections continuous
dst_host_serror_rate* % of connections that have ‘SYN’ errors continuous
dst_host_rerror_rate* % of connections that have ‘REJ’ errors continuous
dst_host_same_srv_rate* % of connections to the same service continuous
dst_host_diff_srv_rate* % of connections to different services continuous
dst_host_srv_count** No. of connections to the same service as the current connection among the past 100 connections continuous
dst_host_srv_serror_rate** % of the connections that have ‘SYN’ errors continuous
dst_host_srv_rerror_rate** % of the connections that have ‘REJ’ errors continuous
dst_host_srv_diff_host_rate** % of connections to different hosts continuous
3.4 Feature Extractions and Pre-processing
The input data to the neural network must be in the range (0, 1)
or (−1, 1). Hence, pre-processing and normalization [112] of the data are
required. The KDDCUP99 format data are pre-processed. Each record
in KDDCUP99 format has 41 features, each of which is in one of the
continuous, discrete and symbolic forms, with significantly varying
ranges. Based on the type of neural nets, the input data may have
different forms and so it needs different pre-processing. Some neural
nets only accept binary input and some can also accept continuous-
valued data. In Pre-processor [6, 24], after extracting KDDCUP99
features from each record, each feature is converted from text or
symbolic form into numerical form. For converting symbols into
numerical form, an integer code is assigned to each symbol. For
instance, in the case of the protocol_type feature, 0 is assigned to tcp, 1 to
udp, and 2 to the icmp symbol. Attack names were first mapped to one
of the five classes, 0 for Normal, 1 for Probe, 2 for DOS, 3 for U2R and
4 for R2L.
Two features span a very large integer range,
namely src_bytes [0, 1.3 billion] and dst_bytes [0, 1.3
billion]. Logarithmic scaling (with base 10) was applied to these features
to reduce the range to [0.0, 9.14]. All other features were Boolean, in
the range [0.0, 1.0]. Hence scaling was not necessary for these
attributes.
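A sketch of this pre-processing, assuming the integer codes and base-10 log scaling described above (note that log10(1 + 1.3 billion) ≈ 9.11, consistent with the stated range):

```python
from math import log10

# Integer codes for symbolic features, per the mapping described above.
PROTOCOL_CODES = {'tcp': 0, 'udp': 1, 'icmp': 2}
CLASS_CODES = {'normal': 0, 'probe': 1, 'dos': 2, 'u2r': 3, 'r2l': 4}

def log_scale(byte_count):
    """Base-10 logarithmic scaling for the wide-range byte-count
    features; the +1 keeps a zero count mapped to 0.0."""
    return log10(1 + byte_count)
```

Remaining symbolic features (service, flag) would be coded the same way, each with its own lookup table.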
3.4.1 Normalization
Pre-processing converts all the symbolic or text forms into
numerical values. The ranges of values of the different features are not
uniform, as discussed above. Features having a large range of values
will influence the performance more than features having a smaller
range of values. Hence normalization is applied to the features to
convert the range of values to fall between 0 and 1. Different methods
are available to normalize the data as given below.
1. In one of the algorithms [51, 113, 149] each numerical value in
the data set is normalized between 0.0 and 1.0 according to the
following equation:
x = (x − min) / (max − min)
Where,
x is the numerical value,
min is the minimum value for the attribute that x belongs to,
max is the maximum value for the attribute that x belongs to.
2. In another algorithm data normalization is done by applying the
formula
1/(1 + xt)
where xt is the input data at time t.
3. For normalizing feature values, a statistical analysis [6] is
performed on the values of each feature based on the existing
data from the KDDCUP99 data set, and an acceptable maximum
value for each feature is determined. According to these maximum
values, feature values are normalized into the range (0, 1) using
the following simple formula [6]:
If f > MaxF, Nf = 1; otherwise Nf = f / MaxF
where F is the feature, f is the feature value, MaxF is the
maximum acceptable value for F, and Nf is the normalized or
scaled value of F.
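The three normalization methods above can be sketched as follows (the function names are mine, chosen for illustration):

```python
def min_max(x, lo, hi):
    """Method 1: scale x into [0, 1] using the attribute's min and max."""
    return (x - lo) / (hi - lo)

def reciprocal(x_t):
    """Method 2: squash a non-negative input into (0, 1] via 1 / (1 + x)."""
    return 1.0 / (1.0 + x_t)

def cap_scale(f, max_f):
    """Method 3: divide by an acceptable maximum, capping the result at 1."""
    return 1.0 if f > max_f else f / max_f
```

Min-max scaling preserves the shape of the distribution, while the other two compress large outliers more aggressively.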
3.5 Performance Evaluation Metrics
The evaluation metrics used in our proposed method are true
positive (TP), true negative (TN), false positive (FP) and false negative
(FN). Here, true positive indicates the number of correctly classified
attacks. A true positive is a sign of properly detecting the occurrences of
attacks in an intrusion detection system. True negative indicates the
number of valid records that are correctly classified. A true negative
specifies that the IDS has not made a mistake in detecting a normal
condition. False positive indicates the records that were incorrectly
classified as attacks, whereas in fact they are valid activities. A false
positive specifies the wrong detection of a particular attack by the IDS. A
false positive is often produced by faulty recognition conditions and
degrades the accuracy of the detection system. False negative
indicates the records that were incorrectly classified as valid activities,
whereas in fact they are attacks. A false negative stipulates that the
IDS is unable to detect the intrusion after a particular attack has
occurred. Based on TP, TN, FP and FN, the performance of our
intrusion detection system is evaluated by: a) Accuracy, b) Detection
Rate (DR) and c) Failure Analysis Rate (FAR). The accuracy of our system
is obtained by the following expression.
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (3.1)
Then, the Detection Rate (DR) is determined based on the expression
given below.
Detection Rate (DR) = TP / (TP + FP)    (3.2)
The detection rate shows the probability of detecting abnormal data in
the test samples. A higher detection rate indicates that the algorithm
more accurately reflects anomalies in the input data.
Failure Analysis Rate (FAR) = FP / (FP + TN)    (3.3)
The failure analysis rate reflects the accuracy of intrusion detection: a
lower FAR indicates higher detection accuracy.
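Equations 3.1 to 3.3 translate directly into code; a minimal sketch, with illustrative counts rather than results from any experiment:

```python
def accuracy(tp, tn, fp, fn):
    """Equation 3.1: fraction of all records classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def detection_rate(tp, fp):
    """Equation 3.2: fraction of attack alarms that are true attacks."""
    return tp / (tp + fp)

def failure_analysis_rate(fp, tn):
    """Equation 3.3: fraction of normal records flagged as attacks."""
    return fp / (fp + tn)
```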
3.6 Summary
This chapter explained the database used in this thesis, the proposed
feature extraction and pre-processing techniques used for IDS, and the
performance evaluation metrics, namely Accuracy, Detection Rate and
Failure Analysis Rate, used for the intrusion detection system.
CHAPTER 4
CLUSTERING BASED INTRUSION DETECTION
4.1 Introduction
Clustering can be considered the most important unsupervised
learning problem. The goal of clustering is to determine the intrinsic
grouping in a set of unlabelled data. Clustering is the process of
segmenting the data, which are similar in some ways to one another. A
good clustering method is one that secures high intra-class similarity
and low inter-class similarity. The clustering quality fully depends upon
the similarity measure used by the method, its ability to find hidden
patterns, and its implementation. Some of the applications of clustering are
pattern recognition, the World Wide Web, image processing and spatial
data analysis. Beyond these, clustering is also used in energy
conservation applications. Various clustering algorithms, such as K-Means
Clustering and Fuzzy C-Means Clustering are available.
4.2 Need for Clustering of data
Cluster analysis or clustering is the task of assigning a set of objects
into groups (called Clusters) so that the objects in the same cluster
are more similar (in some sense or another) to each other than
to those in other clusters. Classification and clustering techniques
in data mining are useful for a wide variety of real-time
applications dealing with large amounts of data. Some of the
applications of data mining are text classification, selective
marketing, medical diagnosis and intrusion detection systems. In
information security, intrusion detection is the act of detecting
actions that attempt to compromise the confidentiality, integrity or
availability of a resource. Intrusion detection systems are software
systems for identifying the deviations from the normal behavior
and usage of the system. They detect attacks using the data
mining techniques- classification and clustering algorithms.
Most current techniques focus on anomaly detection systems,
which are more generalized and have a wider scope than misuse
detection systems. Data mining approaches can be
applied for both anomaly and misuse detection. Clustering techniques
can be used to form clusters of data samples corresponding to the
normal use of the system. Clustering based techniques can detect
new attacks as compared to the classification based techniques.
4.3 Clustering Algorithms
The input dataset given to an intrusion detection system
normally comprises a huge quantity of data, which makes processing
very complex and time-consuming. Processing this large amount
of data can also lead to poor results through an increase in errors.
Hence, it has a marked effect on the efficiency of the system,
ultimately leading to a reduced-quality intrusion detection system. To
combat this problem, a clustering technique is employed prior to
classification. In this work some existing clustering techniques such as
K-Means clustering, Fuzzy K-Means clustering, Fuzzy C-Means and
KFCM are discussed, and the proposed Fuzzy Bisector-Kernel Fuzzy
C-Means clustering (FB-KFCM) and its results are also discussed.
4.3.1 K Means Clustering
K-means clustering is one of the simplest unsupervised
clustering algorithms. The algorithm takes an input parameter k and
partitions the n data points into k clusters so that the intra-cluster similarity
is high and the inter-cluster similarity is low. K is a positive integer
given in advance. K-means clustering takes less time than
hierarchical clustering and yields better results.
With the help of clustering, the training dataset is partitioned into five
subsets, wherein four subsets contain types of intrusion (the attack
datasets) and one contains the normal data type (the normal dataset).
The steps of the clustering algorithm are:
1) Define the number of clusters K.
2) Initialize the K cluster centroids. This can be done by arbitrarily
dividing all objects into K clusters, computing their centroids, and
verifying that all centroids are different from each other.
Alternatively, the centroids can be initialized to K arbitrarily
chosen, different objects
3) Iterate over all objects and compute the distances to centroids of
all clusters. Assign each object to the cluster with the nearest
centroid.
4) Recalculate the centroids of both modified clusters.
5) Repeat step 3 until the centroids do not change any more.
A distance function is required in order to compute the distance
(i.e. similarity) between two objects. The most commonly used distance
function is the Euclidean one which is defined as:
d(x, y) = √( ∑_{i=1}^{m} (x_i − y_i)² )  (4.1)
Where x = (x1 . . . xm) and y = (y1…ym) are two input vectors
with m quantitative features. In the Euclidean distance function, all
features contribute equally to the function value. However, since
different features are usually measured with different metrics or at
different scales, they must be normalized before applying the distance
function.
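The steps above can be sketched in Java (the thesis's implementation language). The two-dimensional sample points, the initial centroids and the iteration count below are illustrative values only, not data from the experiments.

```java
import java.util.Arrays;

public class KMeansSketch {
    // Euclidean distance of eq. 4.1
    static double dist(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += (x[i] - y[i]) * (x[i] - y[i]);
        return Math.sqrt(s);
    }

    // Steps 3-5: assign points to the nearest centroid, then recompute centroids
    static int[] kmeans(double[][] data, double[][] centroids, int iters) {
        int[] label = new int[data.length];
        for (int t = 0; t < iters; t++) {
            for (int i = 0; i < data.length; i++) {          // step 3: assignment
                int best = 0;
                for (int j = 1; j < centroids.length; j++)
                    if (dist(data[i], centroids[j]) < dist(data[i], centroids[best])) best = j;
                label[i] = best;
            }
            for (int j = 0; j < centroids.length; j++) {     // step 4: update centroids
                double[] sum = new double[data[0].length];
                int n = 0;
                for (int i = 0; i < data.length; i++)
                    if (label[i] == j) {
                        n++;
                        for (int d = 0; d < sum.length; d++) sum[d] += data[i][d];
                    }
                if (n > 0)
                    for (int d = 0; d < sum.length; d++) centroids[j][d] = sum[d] / n;
            }
        }
        return label;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 1}, {1.5, 2}, {8, 8}, {9, 9.5}, {1, 0.5}, {8.5, 9}};
        double[][] centroids = {{1, 1}, {8, 8}};             // step 2: K = 2 chosen objects
        System.out.println(Arrays.toString(kmeans(data, centroids, 10))); // [0, 0, 1, 1, 0, 1]
    }
}
```

Note that the features would be normalized before this step on real data, as discussed above; the toy points here are already on a common scale.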
4.3.2 Fuzzy K Means Clustering
The traditional k-means clustering algorithm suffers from serious
drawbacks, such as the difficulty of finding a correct method for cluster
initialization and of making a correct choice of the number of clusters (k).
Moreover, k-means is not efficient for overlapped datasets. Many
methods and techniques have been proposed to address these
drawbacks of k-means. Fuzzy k-means is one of the algorithms that
provide better results than k-means for overlapped datasets.
Fuzzy k-means was introduced by Bezdek [5]. The fuzzy k-
means algorithm is also called fuzzy c-means. Unlike naive k-means,
which assigns each data point completely to one cluster, in
fuzzy c-means each data point has a degree of membership in every
cluster. This allows a data point of dataset X to be associated with all
centres of set C.
For example, points on the edges of the clusters might belong to
a cluster with lesser degree than those data points belonging to the
same cluster at its centre. This algorithm is mainly used for datasets in
which the data points are between the centres. The algorithm works on
the objective to minimize the following function,
F(X, C) = ∑_{i=1}^{n} ∑_{j=1}^{k} (u_ij)^m ||x_i − c_j||²  (4.2)
Here m is any real number greater than 1, and u_ij is the degree of
membership of data point x_i in the cluster with centre c_j, subject to
the constraints u_ij ≥ 0 and ∑_{j=1}^{k} u_ij = 1 for all i. Iteratively
optimizing the objective function F(X, C) by updating the degree of
membership u_ij of data point x_i and the cluster centres c_j results in
the clustering of the data.
u_ij = 1 / ∑_{l=1}^{k} ( ||x_i − c_j|| / ||x_i − c_l|| )^{2/(m−1)}  (4.3)

c_j = ∑_{i=1}^{n} (u_ij)^m x_i / ∑_{i=1}^{n} (u_ij)^m  (4.4)
As the value of m increases, the clustering becomes fuzzier. As m
approaches 1, the sharing of data points among centres decreases and
the algorithm behaves like standard k-means [16]. For example, consider
a one-dimensional dataset as depicted in Figure 4.1.
Figure 4.1. Input mono dimensional data
We could find two clusters, A and B, based on the data point
associations. On applying k-means to the above dataset, each data
point is associated with the centroid closest to it, as depicted in Figure 4.2.
Figure 4.2 Clustered using k means
Figure 4.3. Clustered Using Fuzzy K Means
If the fuzzy k-means clustering approach is used on the dataset, a
data point does not exclusively belong to one cluster; instead it lies
midway between clusters. The smoother membership boundary indicates
that every data point may belong to more than one cluster, as in
Figure 4.3. More information on this example can be found in [100].
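As a sketch, the membership update of eq. 4.3 can be written as below. The sample point, the centres and the fuzzifier m = 2 are illustrative values, and the code assumes no data point coincides exactly with a centre (the update is undefined there).

```java
public class FuzzyMembership {
    static double dist(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += (x[i] - y[i]) * (x[i] - y[i]);
        return Math.sqrt(s);
    }

    // eq. 4.3: u_ij = 1 / sum_l ( ||x_i - c_j|| / ||x_i - c_l|| )^(2/(m-1))
    // Assumes no point x_i coincides with a centre c_l (the distance would be zero).
    static double[][] memberships(double[][] x, double[][] c, double m) {
        double[][] u = new double[x.length][c.length];
        for (int i = 0; i < x.length; i++)
            for (int j = 0; j < c.length; j++) {
                double dij = dist(x[i], c[j]), s = 0;
                for (int l = 0; l < c.length; l++)
                    s += Math.pow(dij / dist(x[i], c[l]), 2.0 / (m - 1));
                u[i][j] = 1.0 / s;
            }
        return u;
    }

    public static void main(String[] args) {
        // one point at 2 on a line, centres at 0 and 10, m = 2
        double[][] u = memberships(new double[][]{{2}}, new double[][]{{0}, {10}}, 2.0);
        System.out.println(u[0][0] + " " + u[0][1]); // ~0.9412 and ~0.0588, summing to 1
    }
}
```

The point lies much closer to the first centre, so its membership there dominates, yet it still carries a small membership in the second cluster, which is exactly the behaviour of Figure 4.3.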
4.3.3 Fuzzy C-Means
The proposed FB-KFCM is an extension of KFCM, which is itself
an extension of the commonly used FCM. Let the input data be
represented by z_i, the number of input data by η, and let ϖ be a real
number greater than 1 representing the weighting co-efficient. The
centre of the j-th cluster is represented by x_j and the number of
clusters by Nc. Let µ_ij represent the degree of membership of z_i in
the j-th cluster. Fuzzy C-Means (FCM) clustering minimizes the
objective function defined in eq.4.5,
F_ϖ = ∑_{j=1}^{Nc} ∑_{i=1}^{η} (µ_ij)^ϖ ||z_i − x_j||²  (4.5)
In the process, initially arbitrary data points are assigned as
centroids and subsequently, membership values of the data points with
respect to the centroids are found out. The generalized formula for
finding membership function value is given in eq.4.6,
µ_ij = 1 / ∑_{m=1}^{Nc} ( ||z_i − x_j|| / ||z_i − x_m|| )^{2/(ϖ−1)}  (4.6)
Afterwards, the updated centroid values are computed with the
use of found out membership values. The centroid updating equation is
given in eq.4.7,
x_j = ∑_{i=1}^{η} (µ_ij)^ϖ z_i / ∑_{i=1}^{η} (µ_ij)^ϖ  (4.7)
Based on the updated centroid values, the membership values are
found out again. This process is repeated in a loop to obtain the final
clusters; each iteration updates the membership values and the cluster
centres. The loop termination condition is defined in eq.4.8,
max_{i,j} | µ_ij^{(t+1)} − µ_ij^{(t)} | < λ  (4.8)
Here, λ has a value between 0 and 1. Hence, FCM converges to a
local minimum or a saddle point of F_ϖ.
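The alternating updates of eqs. 4.6-4.8 can be sketched as one loop. The one-dimensional data, the initial centres, the weighting exponent ϖ = 2 and the threshold λ below are illustrative values, and the sketch assumes no data point lands exactly on a centre.

```java
public class FcmSketch {
    static double dist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Alternate eq. 4.6 (memberships) and eq. 4.7 (centres) until the largest
    // membership change falls below lambda (eq. 4.8). w is the exponent, w > 1.
    static double[][] fcm(double[][] z, double[][] x, double w, double lambda, int maxIter) {
        double[][] u = new double[z.length][x.length];
        for (int t = 0; t < maxIter; t++) {
            double maxChange = 0;
            for (int i = 0; i < z.length; i++)                  // eq. 4.6
                for (int j = 0; j < x.length; j++) {
                    double s = 0, dij = dist(z[i], x[j]);
                    for (int m = 0; m < x.length; m++)
                        s += Math.pow(dij / dist(z[i], x[m]), 2.0 / (w - 1));
                    double nu = 1.0 / s;
                    maxChange = Math.max(maxChange, Math.abs(nu - u[i][j]));
                    u[i][j] = nu;
                }
            for (int j = 0; j < x.length; j++) {                // eq. 4.7
                double den = 0;
                double[] num = new double[z[0].length];
                for (int i = 0; i < z.length; i++) {
                    double uw = Math.pow(u[i][j], w);
                    den += uw;
                    for (int d = 0; d < num.length; d++) num[d] += uw * z[i][d];
                }
                for (int d = 0; d < num.length; d++) x[j][d] = num[d] / den;
            }
            if (maxChange < lambda) break;                      // eq. 4.8
        }
        return u;
    }

    public static void main(String[] args) {
        double[][] centres = {{0.2}, {10.2}};
        double[][] u = fcm(new double[][]{{0}, {1}, {10}, {11}}, centres, 2.0, 1e-4, 100);
        System.out.println(centres[0][0] + " " + centres[1][0]); // near 0.5 and 10.5
    }
}
```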
4.3.4 KFCM
The negative aspect of FCM is that it does not always produce
highly accurate results. This is overcome with the use of FB-KFCM,
which employs KFCM with additional steps. KFCM differs from
normal FCM in its use of kernel functions, which yield better
results. Hence, although the process in KFCM is the same as that of
FCM, it differs in the objective function and the updating equations.
In KFCM, the input data z is mapped into a higher dimensional
space S by a non-linear feature map φ : z → φ(z), z ∈ Z. The
objective function of KFCM is given by eq.4.9,
F_ϖ = ∑_{j=1}^{Nc} ∑_{i=1}^{η} (µ_ij)^ϖ ||φ(z_i) − φ(x_j)||²  (4.9)
Where,
||φ(z_i) − φ(x_j)||² = G(z_i, z_i) + G(x_j, x_j) − 2G(z_i, x_j)  (4.10)
Here, G(a, b) = φ(a)ᵀφ(b), which is the inner product kernel function,
and in our case we consider the Gaussian kernel function. Hence,
we have:
G(a, b) = exp( −||a − b||² / σ² ),  hence  G(a, a) = exp(0) = 1  (4.11)
G(z_i, z_i) = G(x_j, x_j) = 1,  ∴ ||φ(z_i) − φ(x_j)||² = 2 − 2G(z_i, x_j)  (4.12)
Hence, the objective function can be rewritten as in eq.4.13,
F_ϖ = 2 ∑_{j=1}^{Nc} ∑_{i=1}^{η} (µ_ij)^ϖ [1 − G(z_i, x_j)]  (4.13)
Minimizing the objective function with respect to µ_ij gives the
updating equations for the membership values µ_ij and the centroids
x_j in eq.4.14,

µ_ij = [1/(1 − G(z_i, x_j))]^{1/(ϖ−1)} / ∑_{m=1}^{Nc} [1/(1 − G(z_i, x_m))]^{1/(ϖ−1)} ,
x_j = ∑_{i=1}^{η} (µ_ij)^ϖ G(z_i, x_j) z_i / ∑_{i=1}^{η} (µ_ij)^ϖ G(z_i, x_j)  (4.14)
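Eqs. 4.10-4.12 reduce the feature-space distance to a single kernel evaluation, which can be sketched as follows; the points and the kernel width σ = 2 are illustrative values.

```java
public class KernelDistance {
    // Gaussian kernel of eq. 4.11: G(a, b) = exp(-||a - b||^2 / sigma^2), so G(a, a) = 1
    static double gauss(double[] a, double[] b, double sigma) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.exp(-s / (sigma * sigma));
    }

    // eq. 4.12: ||phi(z) - phi(x)||^2 = 2 - 2 G(z, x), without computing phi explicitly
    static double featureDist2(double[] z, double[] x, double sigma) {
        return 2.0 - 2.0 * gauss(z, x, sigma);
    }

    public static void main(String[] args) {
        double[] p = {1, 2}, r = {5, 7};
        System.out.println(featureDist2(p, p, 2.0)); // 0.0 for identical points
        System.out.println(featureDist2(p, r, 2.0)); // approaches 2 for distant points
    }
}
```

This is the "kernel trick": the map φ is never evaluated, yet distances in the higher dimensional space S are available in closed form.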
4.3.5 Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM)
In FB-KFCM, fuzzy bisector is incorporated into the KFCM to
obtain better and more accurate results. Fuzzy bisector proceeds with
predefined rules and splits a selected cluster into two. Selection
of the cluster is based on two parameters: the Mean Squared Error
(MSE) and the number of data points in the cluster. The cluster
formation is carried out in stages; in each stage, one existing cluster
is further divided into two clusters. Let the input dataset be
represented by z = {z_1, z_2, ..., z_Nd}, where Nd is the number of input
data. After clustering, the data are grouped into clusters represented
by FC = {FC_1, FC_2, ..., FC_N}, where N is the number of clusters. Each
cluster FC_i (0 < i ≤ N) contains certain data z ∈ FC_i from the input
dataset. Let the data inside the i-th cluster be represented by
FC_i = {ri_1, ri_2, ..., ri_nci}, where nc_i is the number of data in the i-th
cluster. An illustration of the proposed FB-KFCM clustering is given in
Figure 4.4.
Figure 4.4: Illustration of FB-KFCM clustering technique
The process of forming the final clusters is carried out in
stages. If N + 1 clusters are to be formed, then FB-KFCM consists of
N stages. In stage 1, the input data is split into two clusters with the
use of KFCM. Let the input data be represented as Z, and the formed
clusters as A1 and A2. In the next stage, a particular cluster is taken
and further divided to form two more clusters, making 3 clusters in
total. Selection of the cluster to be divided using KFCM is based on
certain rules. For rule formation, two parameters are found out: the
MSE and the number of data points in the respective cluster.
The Mean Squared Error (MSE) for a cluster is found out from
the Euclidean distances between the data points and the cluster
centroid. Let the data points in the i-th cluster be represented by di_k,
the number of data points in the cluster by N_i, and the centroid of the
i-th cluster by c_i; then the MSE is given in eq.4.15,

MSE_i = (1/N_i) ∑_{k=1}^{N_i} ||di_k − c_i||²  (4.15)
By computing the MSE and the number of data points for each of the
clusters A1 and A2, the selection of the cluster to be split is made. The
selection condition is that the cluster should have the maximum number
of points and the minimum MSE. Let the number of data points in A1
and A2 be represented by NA1 and NA2, and let the MSE values of A1
and A2 be represented by MA1 and MA2. Hence, the conditions can be
written as in eq.4.16 and eq.4.17,
If (NA1 > NA2) AND (MA1 < MA2), select A1  (4.16)
If (NA2 > NA1) AND (MA2 < MA1), select A2  (4.17)
In other cases, an arbitrary selection is carried out between A1 and
A2. In our illustration, A1 is chosen and is split to form B1 and
B2 by the use of KFCM. Hence, the clusters in consideration are A2,
B1 and B2. Subsequently, in stage 2, one among the three clusters is
selected and further divided with the use of KFCM. The selection of
the cluster to be divided is based on the MSE and the number of data
points. Let the number of data points in B1 and B2 be represented by
NB1 and NB2, and let the MSE values of B1 and B2 be represented by
MB1 and MB2. The selection is based on the following conditions:
Select A2, if NA2 = Maximum(NA2, NB1, NB2) AND MA2 = Minimum(MA2, MB1, MB2)
Select B1, if NB1 = Maximum(NA2, NB1, NB2) AND MB1 = Minimum(MA2, MB1, MB2)
Select B2, if NB2 = Maximum(NA2, NB1, NB2) AND MB2 = Minimum(MA2, MB1, MB2)
For other cases, any of the three clusters is selected. In our
illustration, A2 is selected and divided into clusters C1 and C2.
Hence, the clusters in consideration are B1, B2, C1 and C2. In the
third stage, the respective cluster to be divided by KFCM is found out
as in the earlier stages. Generalizing, suppose the clusters in the i-th
stage are represented as C_1, C_2, ..., C_K, the numbers of data points
in the clusters as N_1, N_2, ..., N_K, and the MSEs of the clusters as
M_1, M_2, ..., M_K; then the selection of the cluster to be divided is
defined by the rule:

Select C_i, if N_i = Maximum(N_1, N_2, ..., N_K) AND M_i = Minimum(M_1, M_2, ..., M_K)
In the illustration example, C2 is selected in stage 3 and divided
to form D1 and D2. In stage 4, B2 is selected and subsequently, the
process is repeated to have the required clusters. The process of
dividing the selected cluster by the use of KFCM is carried out for all
the N stages to form N + 1 clusters, represented in eq.4.18,

FC_i ; 0 < i ≤ (N + 1)  (4.18)
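The selection rules above can be sketched as follows; the point counts and MSE values are illustrative. A return value of -1 is a convention introduced here to signal that no cluster wins both criteria, standing in for the arbitrary selection described above.

```java
public class SplitSelection {
    // MSE of eq. 4.15 for one cluster with centroid c
    static double mse(double[][] pts, double[] c) {
        double s = 0;
        for (double[] p : pts)
            for (int i = 0; i < p.length; i++) s += (p[i] - c[i]) * (p[i] - c[i]);
        return s / pts.length;
    }

    // Rule: split the cluster with the maximum number of points AND the minimum MSE;
    // -1 signals that no cluster satisfies both, so an arbitrary choice is made instead.
    static int selectClusterToSplit(int[] counts, double[] mses) {
        int maxN = 0, minM = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[maxN]) maxN = i;
            if (mses[i] < mses[minM]) minM = i;
        }
        return (maxN == minM) ? maxN : -1;
    }

    public static void main(String[] args) {
        System.out.println(mse(new double[][]{{0, 0}, {2, 0}}, new double[]{1, 0})); // 1.0
        System.out.println(selectClusterToSplit(new int[]{10, 3, 5},
                                                new double[]{0.2, 1.0, 0.9}));       // 0
        System.out.println(selectClusterToSplit(new int[]{10, 3},
                                                new double[]{5.0, 1.0}));            // -1
    }
}
```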
After obtaining the required number of clusters, the centroid of
each cluster is calculated and taken for further processing. That is,
instead of all the data inside a cluster, only the centroid is taken and
given to the learning process. As all data points inside a cluster are
more or less the same, taking the centroid serves the purpose of
representing all the data inside the cluster. This lessens the
computation time in further processes and also reduces the complexity
and risks. Let the data inside the i-th cluster be represented by
FC_i = {ri_1, ri_2, ..., ri_nci}.
Hence the centroid Cen_i of the i-th cluster is found out as in eq.4.19,

Cen_i = ( ∑_{j=1}^{nc_i} ri_j ) / nc_i  (4.19)

where ri_j is the j-th data point in the i-th cluster.
Hence, we have converted the large, bulky dataset into a small number
of data points for better handling, learning and easier computation.
4.4 Classification Module

The centroids obtained after the clustering process are used for the
learning or training process of the Bayesian Neural Network. The input
to the Bayesian Neural Network is the set of cluster centroids given
in eq.4.20,

Cen_i ; 0 < i ≤ (N + 1)  (4.20)
4.4.1 Neural Network
Artificial Neural Networks provide a powerful tool for
classification and have been used in a broad range of areas. The
extensive recent research activity in neural classification has
recognized that neural networks are a promising alternative to a
variety of traditional classification methods. The benefit of neural
networks lies in the following theoretical aspects. First, neural
networks are data-driven, self-adaptive methods that can fine-tune
themselves to the data without any explicit specification of the
functional or distributional form of the underlying model. Second, they
are universal function approximators, in that neural networks can
approximate any function with arbitrary accuracy. Neural networks are
nonlinear models, which makes them flexible in modelling complex
real-world relationships. Neural networks are also able to estimate
posterior probabilities, which offer the basis for setting up
classification rules and performing statistical analysis.
Figure 4.5: Block diagram of the Neural Network
In general, the neural network consists of three layers, named
the input layer, the hidden layer and the output layer. The neural
network operates in two phases: the training phase and the testing
phase. In the training phase, the network is trained on a large
database; in our case, the centroids found after clustering are fed in
as the training data. Initially, the nodes are given random weights. As
the desired output is already known in the training phase, the output
obtained from the neural network is compared with the original, and
the weights are varied by a learning algorithm so as to reduce the
error. Normally, back-propagation algorithms are employed in neural
networks. In the testing phase, the input test data is fed to the trained
neural network with the learned weights in its nodes, and the output is
calculated to determine whether the data is intruded or not. Figure 4.5
shows the general block diagram of the neural network.
4.4.2 Bayesian Neural Network
Inclusion of the Bayesian concept has the advantage of better
learning for Neural Networks. Bayesian learning is based on two
properties. The first is that background knowledge is utilised in
selecting the prior probability distribution for the model parameters.
The second is that predictions are made with respect to the posterior
parameter distribution obtained by updating the prior. These two
properties are built into the Neural Network to obtain the BNN.
Considering a Neural Network with a single hidden layer, the output
can be written mathematically as in eq.4.21,
y_i(x) = b_i + ∑_k ω_ki h_k(x)  (4.21)

where, h_k(x) = tanh( a_k + ∑_j ϖ_jk x_j )  (4.22)
Here x represents the input vector, y_i(x) denotes the output
value function, ω_ki gives the weight from hidden unit k to output i,
and ϖ_jk gives the weight from input j to hidden unit k. The network
can be used to define a probabilistic model for classification. This is
carried out by using the network outputs to define the distribution of
the target z, given the input vector x. For classification, where the
target is a single discrete value among the possible class outputs, the
probability can be defined as in eq.4.23,
P(z = i | x) = e^{y_i(x)} / ∑_j e^{y_j(x)}  (4.23)
The biases and the weights present in the Neural Network are
learned from the training inputs, which contain the input values and the
corresponding output values. These can be represented by
(x^(i), z^(i)); 0 < i ≤ n, where n is the total number of training cases.
The weights and the biases are updated based on the error in the
network. This error is computed as the squared sum of the differences
between the network outputs and the target outputs. The updating is
carried out in such a way as to minimize the error in the system. This
minimization is equivalent to maximum likelihood estimation under a
Gaussian noise model, where the minus log of the likelihood is
proportional to the sum of squared errors.
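A forward pass through eqs. 4.21-4.23 can be sketched as follows. The layer sizes and the parameters in the check are illustrative (all-zero weights and biases, for which the class probabilities must come out uniform).

```java
public class NetworkForward {
    // eq. 4.22: h_k(x) = tanh(a_k + sum_j wIn[j][k] * x_j)
    // eq. 4.21: y_i(x) = b_i + sum_k wOut[k][i] * h_k(x)
    // eq. 4.23: P(z = i | x) = exp(y_i) / sum_j exp(y_j)
    static double[] classProbabilities(double[] x, double[][] wIn, double[] a,
                                       double[][] wOut, double[] b) {
        double[] h = new double[a.length];
        for (int k = 0; k < h.length; k++) {
            double s = a[k];
            for (int j = 0; j < x.length; j++) s += wIn[j][k] * x[j];
            h[k] = Math.tanh(s);
        }
        double[] p = new double[b.length];
        double total = 0;
        for (int i = 0; i < p.length; i++) {
            double y = b[i];
            for (int k = 0; k < h.length; k++) y += wOut[k][i] * h[k];
            p[i] = Math.exp(y);
            total += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= total;
        return p;
    }

    public static void main(String[] args) {
        // 2 inputs, 3 hidden units, 2 classes (intruded or not), all parameters zero
        double[] p = classProbabilities(new double[]{1, 2}, new double[2][3],
                                        new double[3], new double[3][2], new double[2]);
        System.out.println(p[0] + " " + p[1]); // 0.5 0.5
    }
}
```

During training, the weights and biases would be adjusted to reduce the squared error described above; this sketch shows only the probabilistic output of eq. 4.23 for fixed parameters.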
In the Bayesian approach to Neural Networks, the objective is to find
the predictive distribution for the target value in a new test case, given
the input for that case and the inputs and targets of the training cases.
Then, the predictive distribution can be written as in eq.4.24,

P(z^(n+1) | x^(n+1), (x^(1), z^(1)), ..., (x^(n), z^(n)))
  = ∫ P(z^(n+1) | x^(n+1), θ) P(θ | (x^(1), z^(1)), ..., (x^(n), z^(n))) dθ  (4.24)
Where, θ gives the network parameters such as the weights and
biases. The posterior density for the parameters is proportional to the
product of the prior and the likelihood function, which can be
represented as in eq.4.25,

L(θ | (x^(1), z^(1)), ..., (x^(n), z^(n))) = ∏_{j=1}^{n} P(z^(j) | x^(j), θ)  (4.25)
Hence, the learning is carried out for all input data Cen_i ; 0 < i ≤ (N + 1).
Once the learning process is complete, the test data is given as input
to the trained network, which outputs whether the data is intruded or not.
4.5 Results and Discussions
The proposed technique is implemented in JAVA on a system
having 8 GB RAM and a 3.2 GHz processor. To evaluate the
performance of the proposed technique, we used the KDD CUP 99
dataset for testing and evaluation. The refined version of the DARPA
dataset, which contains only network data, is known as the KDD
dataset [137, 138]. The KDD training dataset consists of
approximately 4,900,000 single connection vectors, where
each connection vector consists of 41 features and is labelled as
either normal or an attack, with exactly one specific attack type [139].
The features fall into four categories: a) the intrinsic features of a
connection, which encompass the basic features of each individual
TCP connection; b) the content features, recommended by domain
knowledge, which are used to examine the payload of the original
TCP packets; c) the same-host features, which monitor the
established connections that have the same destination host as the
present connection within the past two seconds and estimate statistics
related to protocol behaviour, service, etc.; and d) the same-service
features, which analyse the connections that have the same service
as the current connection within the past two seconds.
Table 4.1: Accuracy table for Case 8:2

Case 8:2 | K Means + Bayesian Network | FKM + Bayesian Network | FCM + Bayesian Network | KFCM + Bayesian Network | FB-KFCM + Bayesian Network
Cluster size=200 | 83.9201 | 85.3210 | 86.7189 | 93.2321 | 96.5506
Cluster size=180 | 83.9732 | 85.6684 | 86.9022 | 90.3874 | 93.4678
Cluster size=160 | 82.9934 | 85.3444 | 86.9355 | 90.3210 | 92.4013
Cluster size=140 | 83.7643 | 85.7021 | 86.9355 | 92.3542 | 93.4678
Table 4.2: Accuracy table for Case 7:3

Case 7:3 | K Means + Bayesian Network | FKM + Bayesian Network | FCM + Bayesian Network | KFCM + Bayesian Network | FB-KFCM + Bayesian Network
Cluster size=200 | 82.8711 | 84.2111 | 86.7141 | 92.2021 | 94.4124
Cluster size=180 | 82.7021 | 84.0014 | 86.7141 | 94.0824 | 96.5563
Cluster size=160 | 82.6430 | 84.2311 | 86.7141 | 94.0210 | 96.7341
Cluster size=140 | 83.3403 | 84.4001 | 86.7141 | 90.4201 | 92.4017
Table 4.3: Accuracy table for Case 9:1

Case 9:1 | K Means + Bayesian Network | FKM + Bayesian Network | FCM + Bayesian Network | KFCM + Bayesian Network | FB-KFCM + Bayesian Network
Cluster size=200 | 84.7602 | 85.8013 | 86.7711 | 92.8732 | 93.0023
Cluster size=180 | 84.8231 | 85.8724 | 86.7378 | 92.1532 | 93.4022
Cluster size=160 | 83.5210 | 85.4921 | 86.7452 | 90.9710 | 91.936
Cluster size=140 | 84.2318 | 85.7611 | 86.7378 | 91.7342 | 92.6015
Figure 4.6 Accuracy Plot for Case 8:2
Figure 4.7 Accuracy Plot for Case 7:3
Figure 4.8 Accuracy Plot for Case 9:1
Table 4.4: Average Accuracy Table

Case | K Means + Bayesian Network | FKM + Bayesian Network | FCM + Bayesian Network | KFCM + Bayesian Network | FB-KFCM + Bayesian Network
Case 8:2 | 83.6628 | 85.5009 | 86.8730 | 91.5737 | 93.9719
Case 7:3 | 82.8891 | 84.2109 | 86.7141 | 92.6814 | 95.0261
Case 9:1 | 84.3340 | 85.7317 | 86.7480 | 91.9329 | 92.7355
Figure 4.9: Average Accuracy Plot
The existing techniques, K Means Clustering, Fuzzy K Means
Clustering, Fuzzy C-Means and KFCM, are compared with the
proposed Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM),
and their results are discussed. Table 4.1 and Figure 4.6 give the
accuracy values and plot for Case 8:2 for various cluster sizes, Table
4.2 and Figure 4.7 give the accuracy values and plot for Case 7:3, and
Table 4.3 and Figure 4.8 give the accuracy values and plot for Case
9:1. Accuracy values are taken for cluster sizes of 140, 160, 180
and 200. In all cases, the proposed technique achieves better
accuracy than the existing techniques.
The average accuracy values in Case 8:2 for the existing
techniques K Means Clustering, Fuzzy K Means Clustering, Fuzzy
C-Means and KFCM are 83.66%, 85.50%, 86.87% and 91.57%
respectively, while the proposed Fuzzy Bisector-Kernel Fuzzy C-means
clustering (FB-KFCM) achieves 93.97%.

The average accuracy values in Case 7:3 for the existing
techniques are 82.89%, 84.21%, 86.71% and 92.68% respectively,
while the proposed FB-KFCM achieves 95.03%.

The average accuracy values in Case 9:1 for the existing
techniques are 84.33%, 85.73%, 86.75% and 91.93% respectively,
while the proposed FB-KFCM achieves 92.74%.
According to the results in Table 4.4 and Figure 4.9, the proposed
Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM) attains
a high average accuracy of 93.91%. These values demonstrate the
efficiency of the proposed technique in achieving better accuracy.
Table 4.5: Comparative Analysis
Technique | Accuracy
KDD 99 winner | 90.2
PNrule | 85.6
Multi-class SVM | 85.9
Layered Conditional Random Fields | 90.1
Columbia Model | 89.7
Decision Tree | 72.4
BSPNN | 92.3
FB-KFCM + Bayesian Network | 93.1
Figure 4.10 Accuracy plot for Comparative Analysis
The proposed technique is compared with other techniques in
the area. The comparison values are given in Table 4.5 and Figure
4.10. The comparison is made with respect to the KDD 99 winner,
PNrule, multi-class SVM, Layered Conditional Random Fields, the
Columbia Model, Decision Tree and BSPNN. It is inferred that the
proposed technique performs well, obtaining a high accuracy value.
4.6 Summary
In this chapter, some existing clustering techniques, namely K
Means Clustering, Fuzzy K Means Clustering, Fuzzy C-Means and
KFCM, were discussed and implemented. To evaluate the performance
of the proposed technique, we used the KDD CUP 99 dataset for
testing and evaluation. Based on the analysis, it is observed that the
proposed Fuzzy Bisector-Kernel Fuzzy C-means clustering (FB-KFCM)
performs better than the other methods in terms of accuracy, attaining
a high average accuracy of 93.91% when compared with the other
techniques.
CHAPTER 5
HYBRID INTRUSION DETECTION SYSTEM
5.1 Introduction
Intrusion detection is the task of monitoring and, where possible,
preventing attempts to intrude into or otherwise compromise system
and network resources. One of the recent methods for identifying
abnormal activities occurring in a computer system is the Intrusion
Detection System (IDS), which forms a major portion of system
defence against attacks. In the literature, various
techniques for Intrusion Detection have been proposed in recent years.
One of the methods proposed is an Intrusion Detection System (IDS)
based on the Fuzzy Bisector-Kernel Fuzzy C-means clustering
technique and a Bayesian Neural Network. In the previous chapter, the
dimensionality of the data played a major role in obtaining a better
detection rate. In order to overcome the dimensionality issue, feature
selection is the right choice to improve the detection rate without
compromising the computation time. In this chapter, LDA+CS (Linear
Discriminant Analysis + Cuckoo Search) is developed by combining
LDA and CS. LDA is a commonly used technique for dimensionality
reduction. Here, CS is incorporated with the intention of mitigating
the ill-conditioning issue by selecting an "optimal" subset of features
that results in an intermediate lower-dimensional subspace. Then, the
feature-reduced dataset is grouped into clusters with the use of Fuzzy
Bisector-Kernel Fuzzy C-means clustering (FB-KFCM). In the
classification step, the centroids of the clusters are taken for training
the Bayesian Neural Network. For the online identification of intrusion,
test data is given to the trained network, which outputs whether the
data is intruded or not. The entire system is applied to a medical
sensor network to find intrusion behaviour by simulating the networks
in JAVA. Finally, the performance of the system is analysed using the
KDD CUP 99 dataset in terms of accuracy.
5.2 Need for Hybrid Approach
In the past, data mining techniques such as association rules
were suggested for building IDSs. They have distinguished the
differences between single-connection and multi-connection attacks.
Both signature-based and anomaly-based IDSs are sensitive to the
attack characteristics, system training history, services provided, and
underlying network conditions. Data mining techniques are also used
to build classification models from labelled attacks. Intrusion detection
must be designed to monitor the connection features at the network,
transport, and application layers.
In this work, we propose a Hybrid Intrusion Detection System
architecture. For the signature-based system, we define features from
the observations as well as the previous labels and perform sequence
labeling over the observations. This setting is sufficient for modeling
the correlation between different features of an observation.
For the anomaly-based system, we investigate user patterns, such
as profiling the programs executed daily or the privileged processes
executed with access to resources that are inaccessible to ordinary
users, by collecting the volatile data from the system. Then we train our
system using conditional random fields, which reduces the false
alarm rate. Hybrid intrusion detection is a novel kind of model
combining the advantages of anomaly-based intrusion detection and
signature-based intrusion detection. Intrusions and anomalies are two
different kinds of abnormal traffic events in an open network
environment. An intrusion takes place when unauthorized access to
a host computer system is attempted. An anomaly is observed at the
network connection level. Both attack types may compromise valuable
hosts, disclose sensitive data, deny services to legitimate users, and
pull down network-based computing resources. The intrusion detection
system (IDS) offers intelligent protection of networked computers or
distributed resources, much better than fixed-rule firewalls.
Existing IDSs are built as either signature-based or
anomaly-based systems. Signature matching is based on a misuse
model, whereas anomaly detection is based on a normal-use model.
The design philosophies of these two models are quite different, and
they are rarely combined in existing IDS products from the security
industry. Signatures are derived manually by security experts
analyzing previous attacks. The collected signatures are matched
against incoming traffic to detect intrusions. These conventional
systems detect known attacks with low false alarms. However, a
signature-based IDS cannot detect unknown attacks for which no
signatures have been collected or no attack classifiers exist.
5.3 Application of Hybrid Approach
A hybrid intelligent system uses the approach of integrating
different learning or decision-making models. Each learning model
works in a different manner and exploits a different set of features.
Integrating different learning models gives better performance than
the individual learning or decision-making models by reducing their
individual limitations and exploiting their different mechanisms.
In a hierarchical hybrid intelligent system, each layer provides
some new information to the higher level. The overall functioning of the
system depends on the correct functionality of all the layers. The
system is used to filter out a large number of packet records using
the anomaly detection module, and a second detection pass can be
performed by the misuse detection module if a packet is determined
to be an intrusion. Hence, it efficiently detects intrusions by merging
the outputs of the misuse detection and anomaly detection modules
with a decision-making module. The hybrid approach finds intrusions
and reports the type of attack. The output of the decision-making
module is then sent to an administrator for follow-up; this not only
reduces the threat of attack on the system, but also helps the user to
handle and correct the system further with hybrid detection. In the
HIDS, the performance of the misuse detection module is evaluated.
5.4 Locality Preserving Cuckoo search Algorithm
An intrusion detection system is a mechanism used to identify
whether the input data is intruded or not. The process is done by
grouping the huge amount of input data into different classes by
clustering. In our proposed hybrid intrusion detection system, the input
dataset consists of a large number of records with various attacks.
Classifying this huge dataset is difficult and time consuming, and there
is also a possibility of an increased error rate. The different attacks
found in our datasets are DOS (Denial of Service attack), R2L
(Remote to Local (User) attack), U2R (User to Root attack) and
Probing (surveillance). To overcome the drawbacks of the previously
described approaches, we have introduced a new method called
LDA-CS, which improves the detection rate of our intrusion detection
system.
Figure 5.1: Proposed Intrusion Detection System
The proposed method consists of two phases, namely, the
training phase and the testing phase. For training and testing, we have
used the KDD cup 99 dataset in our method. The general architecture
of our proposed method is shown in Fig.5.1.
5.4.1 Training Phase
The training phase consists of various processing stages in
which the input dataset is reduced, clustered and classified using
techniques such as LDA-CS, FB-KFCM and the Bayesian Neural
Network. Here, we have used the KDD cup 99 dataset, which is huge
in size. It consists of approximately 4,900,000 single connection
vectors, each of which contains 41 features. In general, the classifier
delivers accurate results only when using the complete linear feature
space. However, the direct application of this dataset to the classifier
has various drawbacks: the classifier becomes biased due to
architectural complexity, and training as well as testing efficiency
decreases. It also increases the memory consumption rate and the
computational cost. In order to overcome these problems, it is best to
adopt an approach for selecting an optimal subset of features from the
linear feature space. Hence, the Cuckoo Search algorithm combined
with LDA, referred to as LDA+CS, is applied in this work to select the
optimal subset of the linear feature space.
The LDA+CS process consists of the following steps:
(i) Initialization
(ii) Fitness calculation and nest update
Figure 5.2 Fixed Nests
5.4.1.1 Initialization
In the cuckoo search algorithm, a fixed host nest matrix is built
with a size of n × M. Here, n is the number of nests and M is the
number of attributes. The fixed host nest is an index used to select the
relevant features from the original dataset. The class for each nest is
not defined in the fixed host nest, so in order to determine the class
for each host-nest-based feature subset, we have used a classifier,
LDA. It is used to identify whether the host-nest-based data is intruded
or not. The fixed nest built is shown in Fig. 5.2. The further
initialization process is carried out based on the fixed host nest.
Figure 5.3: Nest formation from original dataset
5.4.1.2 Fitness Calculation and Nest update
In this stage, each entry of the fixed host nest is randomly
assigned a 1 or a 0. Using all features regardless of this
assignment would increase the computational complexity, so,
following the cuckoo search algorithm, the features marked 1 in a
nest are selected and those marked 0 are neglected. This yields a
dimension-reduced feature subset for each nest. The selected subset
contains the features relevant to the host nest and has a size of
1 × m, where m is the dimension of the reduced subset. Finally, N
dimension-reduced training feature subsets are obtained from the
original set, where m < M. The general data set and the host nest
obtained are shown in Fig. 5.3.
Here, each dimension-reduced subset contains only the valuable
information, together with some information about the remaining
features. The subset of relevant features is then given to LDA for
classification. LDA does not change the location of the data; it
only tries to increase the class separability and draws a decision
region between the given classes. The input to LDA is the
N-dimensional training subset belonging to different classes v, with
Ni samples in the ith class. The first stage of LDA is to group the
subset of data into two classes: attack or normal. For each subset,
the within-class and between-class distances are computed for the
two classes. For the N training samples, the mean vector and the
covariance matrix are calculated for each class of the complete data
set, as given in the equation below.
$$N = \sum_{i=1}^{n} N_i \qquad (5.1)$$
where N represents the total number of training samples and Ni the
number of training samples in class i; n is the number of classes.
The scatter matrices are calculated by eigen-decomposition, which is
applicable to high-dimensional data. The within-class and
between-class scatter matrices are denoted WC and BC and are given
by the equations below.
The between-class scatter matrix BC is:

$$B_C = \frac{1}{N}\sum_{i=1}^{n} N_i\,(v_i - v)(v_i - v)^T \qquad (5.2)$$
The within-class scatter matrix WC is:

$$W_C = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{N_i} \left(z_j^{(i)} - v_i\right)\left(z_j^{(i)} - v_i\right)^T \qquad (5.3)$$
Here, the mean of the ith class, vi, is:

$$v_i = \frac{1}{N_i}\sum_{j=1}^{N_i} z_j^{(i)} \qquad (5.4)$$
Similarly, the total mean over the whole dataset is:

$$v = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{N_i} z_j^{(i)} \qquad (5.5)$$
Finally, a discriminant function is determined based on the following
equation.
$$Y_{LDA} = \mathrm{tr}\!\left[(W_C)^{-1} B_C\right] \qquad (5.6)$$
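Equations (5.1)-(5.6) can be checked with a short numerical sketch. This is illustrative Python, not the thesis code; the small ridge term added to WC to keep it invertible on degenerate subsets is an assumption.

```python
import numpy as np

def lda_criterion(X, y):
    """Discriminant score Y_LDA = tr(W_C^-1 B_C) built from Eqs. (5.1)-(5.6)."""
    classes = np.unique(y)
    N, dims = X.shape
    v = X.mean(axis=0)                          # total mean, Eq. (5.5)
    W_C = np.zeros((dims, dims))
    B_C = np.zeros((dims, dims))
    for c in classes:
        Xc = X[y == c]
        Ni = len(Xc)
        vi = Xc.mean(axis=0)                    # class mean, Eq. (5.4)
        diff = Xc - vi
        W_C += diff.T @ diff / N                # within-class scatter, Eq. (5.3)
        dm = (vi - v).reshape(-1, 1)
        B_C += Ni * (dm @ dm.T) / N             # between-class scatter, Eq. (5.2)
    W_C += 1e-6 * np.eye(dims)                  # ridge term (assumed) for invertibility
    return float(np.trace(np.linalg.inv(W_C) @ B_C))   # Eq. (5.6)
```

Well-separated classes yield a larger score than overlapping ones, which is exactly what the fitness of Eq. (5.7) rewards.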
The fitness of each training subset is obtained from the LDA
classifier: a dimension-reduced feature subset is formed and applied
to LDA in each iteration, and the process is repeated until the
global best solution is obtained. Here, the N training subsets are
given as input to the LDA classifier and a fitness value is obtained
for each subset. The fitness values determined for the fixed host
nests are f = f1, f2, f3, ..., fN. Among these, the best fitness is
found and stored as Xbest. Finally, the accuracy of our system is
the ratio of the total number of correct predictions to the actual
data set size, and the fitness function f is calculated as:

$$\mathrm{fitness} = 1 - \mathrm{Accuracy} \qquad (5.7)$$
In order to generate a new solution, a Levy flight is performed,
which provides a random walk. The new solution y^(t+1) is determined
by the equation below, while the current best is retained:

$$y^{(t+1)} = y^{(t)} + \alpha \oplus \mathrm{Levy}(\lambda) \qquad (5.8)$$
Figure 5.4: LDA+CS Flow Diagram
where α > 0 is the step size; in most cases we use α = 1. The Levy
distribution has infinite variance and infinite mean. Here, the
consecutive steps of a cuckoo essentially form a random-walk process
that obeys a power-law step-length distribution with a heavy tail.
In addition, a fraction of the worst nests can be abandoned, so that
new nests can be built at new locations by random walks and mixing.
The mixing of the solutions can be performed by random permutation
according to the similarity or difference to the host eggs. The flow
diagram of the designed LDA+CS is shown in Fig. 5.4. The optimal
dimensionality-reduced features are further clustered using a
technique called FB-KFCM (Fuzzy Bisector Kernel Fuzzy C-Means).
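The Levy-flight update of Eq. (5.8) can be sketched as below. Mantegna's algorithm is one common way to draw Levy-distributed steps; it is used here as an assumption, since the text does not specify the sampling method, and the exponent beta = 1.5 is a typical default rather than a value from this work.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=7)

def levy_step(dim, beta=1.5):
    """Draw a Levy-distributed step via Mantegna's algorithm (a common choice)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma = (num / den) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / beta)   # heavy-tailed step lengths

def new_solution(y_t, alpha=1.0):
    """Eq. (5.8): y^(t+1) = y^(t) + alpha (+) Levy(lambda)."""
    return y_t + alpha * levy_step(len(y_t))
```

The heavy tail means most steps are small local moves while an occasional long jump lets the search escape local optima.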
5.5 Clustering using FB-KFCM
Clustering is a common way to group the optimal features obtained
from LDA+CS. Some of the clustering methods used previously are not
suitable for large datasets, so we propose a new method for
effective clustering by incorporating a Fuzzy Bisector into Kernel
Fuzzy C-Means clustering, called FB-KFCM here. The operation of the
newly incorporated fuzzy bisector is based on the optimal features
and a Mean Squared Error (MSE) parameter.
In the initial stage, the fuzzy bisector selects a cluster based on
the above parameters and divides it into two using the fuzzy c-means
technique. The process has several stages, each containing a single
bisection, which increases the number of clusters by one. The input
dataset to the FB-KFCM algorithm is represented as
X = {x1, x2, ..., xd}, where d is the size of the dataset. The input
dataset is then clustered and grouped into N clusters as represented
below.
$$Q = \{C_1, C_2, \ldots, C_N\} \qquad (5.9)$$
Here, each grouped cluster contains the data xi belonging to Qi, and
the data inside the ith cluster Ci is represented as
Ci = {D1, D2, ..., Dk}, where k is the number of data points in the
ith cluster. The proposed FB-KFCM is shown in Fig. 5.5.
The FB-KFCM clustering includes N + 1 stages and, in each stage, the
input data is divided into two clusters by the KFCM algorithm. For
the input data X, two clusters A and B are first formed. In the next
stage, one of the two clusters is taken and divided into two based
on KFCM, giving three clusters in total. Likewise, the clustering
stages continue until the data is grouped into N clusters, as
denoted by the following equation:

$$Q = \{C_1, C_2, \ldots, C_N\} \qquad (5.10)$$
Then, for each grouped cluster, the Mean Squared Error is computed
based on the Euclidean distance between the data points and the
centroid. The MSE of the ith cluster is:

$$MSE_i = \frac{1}{N_i}\sum_{k=1}^{N_i} \lVert C_k - c_i \rVert^2 \qquad (5.11)$$
Finally, for the N clustering stages, the data points in the
clusters are represented as D1, D2, ..., DK and the MSEs of the
clusters as E1, E2, ..., EK. Each stage of the process is carried
out by KFCM, over N + 1 stages in total. Thereafter, the centroid of
each cluster is calculated for further processing. Centroid-based
classification has several advantages, such as lower time
consumption and reduced complexity. The centroid of the ith cluster
is calculated by the equation below:

$$W_i = \frac{\sum_j D_j}{K} \qquad (5.12)$$
Based on the above equation, the centroid for each cluster is
calculated and given to the classification process.
Figure 5.5: FB-KFCM
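The bisecting procedure of this section can be sketched as below. In this illustrative sketch, plain fuzzy c-means stands in for the kernel variant (KFCM), and the cluster with the largest MSE of Eq. (5.11) is the one chosen for bisection, which is one reading of the text; the fallback for a degenerate split is an added assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def fcm_two_way(X, m=2.0, iters=50):
    """Split X into two fuzzy clusters; plain FCM stands in for the kernel variant."""
    c = X[rng.choice(len(X), size=2, replace=False)]       # two initial centroids
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2) + 1e-12
        u = 1.0 / d ** (2.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)                  # fuzzy membership matrix
        w = u ** m
        c = (w.T @ X) / w.sum(axis=0)[:, None]             # centroid update
    return u.argmax(axis=1)                                # hard assignment per point

def cluster_mse(C):
    """Eq. (5.11): mean squared distance of the points in C to their centroid."""
    return float(np.mean(np.sum((C - C.mean(axis=0)) ** 2, axis=1)))

def bisecting_cluster(X, n_clusters):
    """Repeatedly bisect the cluster with the largest MSE until n_clusters remain."""
    clusters = [X]
    while len(clusters) < n_clusters:
        worst = max(range(len(clusters)), key=lambda i: cluster_mse(clusters[i]))
        target = clusters.pop(worst)
        labels = fcm_two_way(target)
        left, right = target[labels == 0], target[labels == 1]
        if len(left) == 0 or len(right) == 0:              # degenerate split: plain halving
            half = len(target) // 2
            left, right = target[:half], target[half:]
        clusters += [left, right]
    return clusters

# The centroid of each final cluster (Eq. 5.12) is what feeds the classifier.
```

Each pass adds exactly one cluster, matching the "single bisection per stage" description above.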
5.6 Classification using Bayesian Neural Network
Classification in intrusion detection trains on the centroid-based
grouped data obtained from FB-KFCM. The centroid of each cluster is
given to a classifier to identify whether the input data is intruded
or not. In this proposed system, the Bayesian Neural Network is used
for better classification. A Bayesian neural network is an improved
version of the artificial neural network that yields a more robust
classification result. In the Bayesian Neural Network Classifier
(BNNC), the weight-decay parameter is adjusted automatically during
training to obtain the optimal solution, and the whole data can be
used for training without any need for a separate validation set.
The centroid value obtained from each cluster of the input data is
given to the BNNC for training. Let the centroid input to the
Bayesian classifier be:

$$W_i; \quad 0 < i \le (N+1) \qquad (5.13)$$
The general neural network contains three layers, namely the input
layer, the hidden layer and the output layer. Initially, the
centroid obtained from each cluster is given as input to the
Bayesian neural network to select the prior probability distribution
for the model parameters. Second, predictions are made with respect
to the posterior parameter distribution obtained by updating the
prior. The Bayesian neural network is formed based on these two
properties. Let the input be the vector of real centroid values Wi.
The output for each input centroid is trained by varying the weights
at each node to obtain the best classification result. The
architecture of the Bayesian neural network is shown in Fig. 5.6.
Figure 5.6 Bayesian Neural Network Classifier (BNNC)
The output of the single-hidden-layer Bayesian neural network is
computed as:

$$y_k(x) = V_0\!\left(b_k + \sum_{i=1}^{M} W_{ki}\, P_i(x)\right) \qquad (5.14)$$

where $P_i(x) = \tanh\!\left(b_j + \sum_{j=1}^{d} W_{ij}\, x_j\right)$.
Here, Wij is the weight on the connection from input unit j to
hidden unit i, and Wki is the weight on the connection from hidden
unit i to output unit k. The biases of the hidden and output units
are bj and bk, and the activation function of the output layer is
V0. Further, to avoid large weights, a weight-decay term is added to
the data error function eD. In particular, for the classification
problem we have:

$$e_T = e_D + \sum_{h=1}^{H} J_h\, e_{W_h} \qquad (5.15)$$

where eT is the total error function and Jh is a non-negative
parameter governing the distribution of the other parameters, such
as weights and biases. Here, eWh is the weight error for the hth
group of weights and biases, and H is the number of groups of
weights and biases in the neural network. The weights and biases are
then grouped into a single W-dimensional weight vector w. Given the
weight vector w, the posterior distribution given the data D is:

$$P(w \mid D, \mu) = \frac{P(D \mid w, \mu)\, P(w \mid \mu)}{P(D \mid \mu)} \qquad (5.16)$$

where $\mu = \{J_1, J_2, \ldots, J_H\}$.
Also, the prior distribution of the weights is:

$$P(w \mid \mu) = \frac{1}{Z_W(\mu)} \exp\!\left(-\sum_{h=1}^{H} J_h\, e_{W_h}\right) \qquad (5.17)$$

where $Z_W(\mu) = \prod_{h=1}^{H} \left(\frac{2\pi}{J_h}\right)^{W_h/2}$.
The posterior density of the parameters is proportional to this
product of likelihood and prior, and the training process is carried
out for all clustering centroids Wi, 0 < i ≤ (N+1). After training,
the test data is given to the Bayesian-trained neural network to
determine whether the data is attacked or not.
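The forward pass of Eq. (5.14) and the regularised error of Eq. (5.15) can be sketched as follows. This is illustrative Python: the logistic output activation standing in for V0, the layer sizes and the decay constant J are assumptions, and the full Bayesian treatment of the posterior in Eqs. (5.16)-(5.17) is omitted.

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# Tiny single-hidden-layer network in the shape of Eq. (5.14); sizes are assumed.
d, M, K = 4, 6, 1                                  # inputs, hidden units, outputs
W_in = rng.normal(scale=0.5, size=(M, d))          # Wij: input -> hidden weights
b_h = np.zeros(M)                                  # hidden-layer biases
W_out = rng.normal(scale=0.5, size=(K, M))         # Wki: hidden -> output weights
b_o = np.zeros(K)                                  # output-layer biases

def forward(x):
    """Eq. (5.14): yk(x) = V0(bk + sum_i Wki * Pi(x)), with Pi = tanh(...)."""
    P = np.tanh(b_h + W_in @ x)                    # hidden activations Pi(x)
    return 1.0 / (1.0 + np.exp(-(b_o + W_out @ P)))   # logistic V0 (assumed)

def total_error(X, t, J=0.01):
    """Eq. (5.15): eT = eD + sum_h Jh * eWh, with one decay group per layer."""
    y = np.array([forward(x)[0] for x in X])
    e_D = float(np.mean((y - t) ** 2))             # data error eD
    e_W = J * float(np.sum(W_in ** 2) + np.sum(W_out ** 2))
    return e_D + e_W
```

Minimising the penalised error keeps the weights small, which is the practical effect of the Gaussian prior of Eq. (5.17).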
5.7 Summary
In this chapter, a Hybrid Intrusion Detection System using LDA+CS
(Linear Discriminant Analysis + Cuckoo Search) is developed by
combining LDA and CS; LDA is a commonly used technique for
dimensionality reduction. Fuzzy Bisector Kernel Fuzzy C-Means
clustering (FB-KFCM) is used as the clustering technique, and the
Bayesian Neural Network is used for better classification. The
entire system will be applied to a medical sensor network to find
intrusion behaviour by simulating the networks in JAVA. Finally, the
performance of the system will be analysed using the KDD CUP 99
dataset in terms of accuracy.
CHAPTER 6
RESULTS AND IMPLEMENTATION
The proposed technique, Linear Discriminant Analysis + Cuckoo Search
with Fuzzy Bisector Kernel Fuzzy C-Means clustering (LDA+CS +
FB-KFCM + Bayesian Network), is implemented in JAVA on a system with
8 GB RAM and a 3.2 GHz processor. To evaluate the performance of the
proposed technique, we used the KDD CUP 99 dataset for testing and
evaluation. The KDD CUP 99 dataset is a version of the data from the
original 1998 DARPA intrusion detection evaluation program. It is
also one of the few publicly available data sets containing actual
attacks [142], so we used it to design and evaluate our intrusion
detection system.
The KDD CUP 1999 dataset was obtained from raw TCP dump data
collected over nine weeks. It comprises a large number of network
traffic activities, including both normal and malicious connections:
about five million connection records as training data and two
million as test data. Each instance has 41 features and is marked as
normal or as an attack. In total, 38 different attacks are found in
the training and testing data, falling into four main categories:
Probe, denial of service (DoS), remote to local (R2L) and user to
root (U2R) [139, 122].
The KDD Cup 99 dataset is available in three different files: the
KDD Full dataset with 4,898,431 instances, the KDD Cup 10% dataset
with 494,021 instances and the KDD Corrected dataset with 311,029
instances. Table 6.1 gives, for each of these datasets, the number
of samples in each attack category before and after the removal of
duplicate samples. Each sample of the dataset represents a
connection between two network hosts according to the network
protocols. It is described by 41 attributes, of which 38 are
continuous or discrete numerical attributes and 3 are categorical
attributes. Each sample is labelled as either normal or as one
specific attack; the dataset contains 23 class labels, of which 1 is
normal and the remaining 22 are different attacks. Because the full
KDD Cup 99 dataset is very large and difficult to work with, we used
the 10% subset of the KDD Cup 99 dataset for this research.
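The duplicate-sample removal underlying Table 6.1 can be sketched as a first-occurrence filter over the CSV records. This is an illustrative sketch, not the exact algorithm used in the thesis, and the file paths are hypothetical; the KDD Cup 99 files are comma-separated records of 41 features plus a label.

```python
import csv

def remove_duplicates(in_path, out_path):
    """Keep only the first occurrence of each exact connection record."""
    seen = set()
    kept = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader, writer = csv.reader(src), csv.writer(dst)
        for row in reader:
            key = tuple(row)          # the whole record, features plus label
            if key not in seen:
                seen.add(key)
                writer.writerow(row)
                kept += 1
    return kept
```

Applied to KDD Full, a filter of this kind reduces the 4,898,431 records to the 1,074,992 unique records reported in Table 6.1.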
Table 6.1: Attack distribution in the KDD Full, KDD 10% and KDD Corrected datasets

Dataset                               DoS       U2R   R2L     Probe   Normal   Total
KDD Full                              3883370   52    1126    41102   972781   4898431
KDD Full (duplicates removed)         247267    52    999     13860   812814   1074992
KDD 10%                               391458    52    1126    4107    97278    494021
KDD 10% (duplicates removed)          54598     52    999     2133    87832    145586
KDD Corrected                         229269    70    16172   4925    60593    311029
KDD Corrected (duplicates removed)    22984     70    2898    3426    47913    77291
Table 6.2: Accuracy for Case 8:2

Cluster size   KFCM + Bayesian Network   FB-KFCM + Bayesian Network   LDA+CS + FB-KFCM + Bayesian Network
200            93.2321                   96.5506                      97.4163
180            90.3874                   93.4678                      97.4003
160            90.3210                   92.4013                      97.2720
140            92.3542                   93.4678                      97.4303
Table 6.3: Accuracy for Case 7:3

Cluster size   KFCM + Bayesian Network   FB-KFCM + Bayesian Network   LDA+CS + FB-KFCM + Bayesian Network
200            92.2021                   94.4124                      98.4653
180            94.0824                   96.5563                      98.4135
160            94.0210                   96.7341                      98.3765
140            90.4201                   92.4017                      97.9872
Table 6.4: Accuracy for Case 9:1

Cluster size   KFCM + Bayesian Network   FB-KFCM + Bayesian Network   LDA+CS + FB-KFCM + Bayesian Network
200            92.8732                   93.0023                      99.3074
180            92.1532                   93.4022                      99.3155
160            90.9710                   91.9360                      99.3015
140            91.7342                   92.6015                      99.0612
Figure 6.1: Accuracy Plot for Case 8:2
Figure 6.2: Accuracy Plot for Case 7:3
Figure 6.3: Accuracy Plot for Case 9:1
Table 6.5: Average accuracy

Case   KFCM + Bayesian Network   FB-KFCM + Bayesian Network   LDA+CS + FB-KFCM + Bayesian Network
8:2    91.5737                   93.9719                      97.3797
7:3    92.6814                   95.0261                      98.3106
9:1    91.9329                   92.7355                      99.2464
Figure 6.4: Average Accuracy Plot
6.1 Comparative Analysis
The existing techniques, KFCM + Bayesian Network and Fuzzy Bisector
Kernel Fuzzy C-Means clustering (FB-KFCM) + Bayesian Network, are
compared with the proposed hybrid technique LDA+CS + FB-KFCM +
Bayesian Network, and their results are discussed. Table 6.2 and
Figure 6.1 give the accuracy values and plot for Case 8:2 for
various cluster sizes, Table 6.3 and Figure 6.2 for Case 7:3, and
Table 6.4 and Figure 6.3 for Case 9:1. Accuracy values are taken for
cluster sizes of 140, 160, 180 and 200. In all cases, the proposed
technique achieved better accuracy than the existing techniques.
For Case 8:2, the average accuracies of the existing techniques,
KFCM + Bayesian Network and FB-KFCM + Bayesian Network, are 91.57%
and 93.97% respectively, while the proposed hybrid technique LDA+CS
+ FB-KFCM + Bayesian Network reaches 97.38%.
For Case 7:3, the average accuracies of KFCM + Bayesian Network and
FB-KFCM + Bayesian Network are 92.68% and 95.03% respectively, while
the proposed hybrid technique reaches 98.31%.
For Case 9:1, the average accuracies of KFCM + Bayesian Network and
FB-KFCM + Bayesian Network are 91.93% and 92.74% respectively, while
the proposed hybrid technique reaches 99.25%.
According to the results in Table 6.5 and Figure 6.4, the hybrid
LDA+CS + FB-KFCM + Bayesian Network technique attains a high
accuracy of 98.31% in Case 7:3 and up to 99.25% in Case 9:1. These
values demonstrate the efficiency of the proposed technique.
6.2 Implementation in Medical Sensor Network
The proposed intrusion detection system is applied to a medical
sensor network in order to detect which data are intruded and which
are not. The proposed algorithm is simulated using a medical sensor
network consisting of 8,668 data records in total. The whole medical
sensor network data is trained using the Bayesian neural network in
our algorithm. After the training process, we used 10 data records
for testing at each time step; in each test, the algorithm detects
which of the 10 records are intruded and which are not. At time T1,
10 nodes were used for testing, and the simulation result obtained
using our method is shown in Fig. 6.5.
Figure 6.5: Simulation results obtained at times T1, T2, T3 and T4
In the simulated result, two colours, red and green, indicate the
data type: red indicates intruded data and green indicates
non-intruded data. Among the 10 records tested at time T1, 6 are not
intruded (green) and the remaining 4 are intruded (red). Similarly,
at time T2, another 10 records are tested; 7 of the 10 are not
intruded and the remaining 3 are intruded. At time T3, the result
for 10 test records shows 8 not intruded and 2 intruded, and at time
T4, again 8 of the 10 records are not intruded and the remaining 2
are intruded.
6.3 Summary
In this chapter, the existing techniques KFCM + Bayesian Network and
Fuzzy Bisector Kernel Fuzzy C-Means clustering (FB-KFCM) + Bayesian
Network were compared with the proposed hybrid technique LDA+CS +
FB-KFCM + Bayesian Network and their results discussed. To evaluate
the performance of the proposed technique, we used the KDD CUP 99
dataset for testing and evaluation. Based on the comparative
analysis, the proposed hybrid technique attained a high accuracy of
98.31%, which shows its efficiency. Finally, the proposed algorithm
was simulated using a medical sensor network consisting of 8,668
data records. In the simulation with 10 test records, 8 of the 10
were found not intruded and the remaining 2 intruded, confirming the
high accuracy rate.
CHAPTER 7
CONCLUSION
In this intrusion detection system, LDA+CS (Linear Discriminant
Analysis + Cuckoo Search) is developed by combining LDA and CS, and
is used for dimensionality reduction and optimal feature selection.
Since some previously used clustering methods are not suitable for
large datasets, we proposed a new method for effective clustering
that incorporates a Fuzzy Bisector into Kernel Fuzzy C-Means
clustering, called FB-KFCM. The feature-reduced dataset is grouped
into clusters using this FB-KFCM method. Then, in the classification
step, the centroids of the clusters are taken and trained using the
Bayesian Neural Network, an improved version of the artificial
neural network that yields robust classification results. For online
intrusion detection, the test data is given to the trained network
to determine whether the given data is intruded or not. The entire
system is applied to a medical sensor network to find intrusion
behaviour by simulating the networks in JAVA using the KDD CUP 99
dataset. The evaluation metric is accuracy, and a comparative
analysis is made against the other techniques. The average accuracy
was found to be 98.31%, better than the other compared techniques;
this high accuracy shows the efficiency of the proposed technique.
7.1 Contributions
The contributions in this work are summarized as follows:
1. In this work, different variants of intrusion detection
techniques, namely anomaly-based, signature-based, host-based,
network-based and hybrid intrusion detection for improving
performance in medical sensor networks, are studied and analyzed.
2. Existing clustering techniques such as K-Means, Fuzzy K-Means,
Fuzzy C-Means and KFCM are discussed, and the proposed Fuzzy
Bisector Kernel Fuzzy C-Means clustering (FB-KFCM) is designed and
developed.
3. Based on the analysis, it is observed that the proposed FB-KFCM
performs better than the other methods in terms of accuracy,
attaining an average accuracy of 93.91% when compared with
techniques such as KFCM and KFCM with Bayesian Network.
4. A Hybrid Intrusion Detection System using LDA+CS (Linear
Discriminant Analysis + Cuckoo Search) is developed by combining LDA
and CS; LDA is a commonly used technique for dimensionality
reduction. Fuzzy Bisector Kernel Fuzzy C-Means clustering (FB-KFCM)
is used as the clustering technique, and the Bayesian Neural Network
is used for better classification.
5. To evaluate the performance of the proposed technique, the KDD
CUP 99 dataset was used for testing and evaluation. Based on the
comparative analysis, the proposed hybrid LDA+CS + FB-KFCM +
Bayesian Network technique attained a high accuracy of 98.31%,
showing its efficiency.
6. Finally, the proposed algorithm is simulated using a medical
sensor network consisting of 8,668 data records. In the simulation
with 10 test records, 8 of the 10 were found not intruded and the
remaining 2 intruded, attaining a high accuracy rate.
7.2 Future Works
The following future works are proposed as a continuation of the
research presented in this thesis:
• In future, the proposed clustering and classification algorithms
can be extended or modified using intelligent agents to further
increase performance. Apart from the experimented combination of
data mining techniques, further combinations involving artificial
intelligence, soft computing and other clustering algorithms can be
used to improve the detection accuracy and to reduce the
false-negative and false-positive alarm rates. Finally, the
intrusion detection system can be extended into an intrusion
prevention system to enhance the performance of the system.
• Research in intrusion detection and the application of data mining
and machine learning plays an important role in the security of
current and future computer networks. This thesis has explored the
feasibility of using supervised and unsupervised learning in the
classification of intrusion-detection attacks, and it opens multiple
possibilities for future exploration and research, which may lead to
the design and development of more efficient, reliable and effective
detection and prevention IDS systems.
REFERENCES
1. Adebayo O. Adetunmbi, Samuel O. Falaki, Olumide S. Adewale and
Boniface K. Alese, "Network Intrusion Detection Based on Rough Set
and K-Nearest Neighbour", International Journal of Computing and ICT
Research, Vol. 2, No. 1, pp. 60-66, 2008.
2. Abhijit Sarmah, "Intrusion Detection Systems: Definition, Need
and Challenges", White Paper, SANS Institute, 2001.
3. Adeyinka, O. (2008), "Internet Attack Methods and Internet
Security Technology Modeling & Simulation", AICMS 08, Second Asia
International Conference on, pp. 77-82.
4. Agrawal R and R. Srikant,(1994)“Fast algorithms for mining association
rules”.
5. Indraneel Mukhopadhyay, Mohuya Chakraborty and Satyajit
Chakrabarti “A Comparative Study of Related Technologies of
IntrusionDetection & Prevention Systems” Journal of Information
Security, 2011, 2, 28-38.
6. Amini M. et.al. (2004), ‘Network-Based Intrusion Detection Using
Unsupervised Adaptive Resonance Theory (ART)’, Proceedings of the
4th Conference on Engineering of Intelligent Systems (EIS 2004),
Madeira, Portugal.
7. Amoroso E, Wykrywanieintruzów, Wydawnictwo RM, Warszawa 1999.
8. Anazida Zainal, Mohd Aizaini Maarof and Siti Maryam Shamsudin,
"Research Issues in Adaptive Intrusion Detection", in Proceedings of
the 2nd Postgraduate Annual Research Seminar (PARS'06), Faculty of
Computer Science & Information Systems, Universiti Teknologi
Malaysia, 24-25 May 2006.
9. Andonie, R. and Kovalerchuk, B., "Neural Networks for Data
Mining: Constraints and Open Problems".
10. Anil Kumar K S and Dr. V. NandaMohan, " Novel Anomaly Intrusion
Detection Using Neuro-Fuzzy Inference System ", IJCSNS International
Journal 6 of Computer Science and Network Security, vol.8, no.8, pp.6-
11 , August 2008.
11. Axelsson S.: Intrusion Detection Systems: A Taxomomy and Survey.
Technical Report No 99-15, Dept. of Computer Engineering, Chalmers
University of Technology, Sweden, March 2000,
12. Bahrololum M, E. Salahi and M. Khaleghi “Anomaly intrusion detection
design using hybrid of unsupervised and supervised neural networks”,
International Journal of Computer Networks & Communications, Vol.1,
No.2, 2009.
13. Barbara, D., N. Wu, and S. Jajodia, Detecting novel network intrusions
using Bayes estimators, In Proc. of the First SIAM Int. Conf. on Data
Mining (SDM 2001), Chicago, Society for Industrial and Applied
Mathematics (SIAM), 2001
14. Bass T.: Intrusion Detection Systems Multisensor Data Fusion: Creating
Cyberspace Situational Awareness. Communication of the ACM, Vol.
43,Number 1, January 2000, pp. 99-105,
15. Mohammad Khubeb Siddiqui and Shams Naahid,” Analysis of KDD
CUP 99 Dataset using Clustering based Data Mining” International
Journal of Database Theory and Application Vol.6, No.5 (2013), pp.23-
34
16. Bezdek J C, Pattern Recognition with fuzzy objective function
algorithms, Newyork: Plenum, 1981.
17. Bhavya Daya (2010), "Network Security: History, Importance, and
Future", University of Florida, Department of Electrical and
Computer Engineering.
18. BOLEY, D.L. 1998. Principal direction divisive partitioning. Data Mining
and Knowledge Discovery, 2, 4, 325-344.
19. Cabrera, J.B.D., Ravichandran, B &Mehra R.K. (2000). Statistical Traffic
Modelling for Network Intrusion Detection. In Proceeding of the IEEE
Conference
20. Hayoung Oh, Inshil Doh and Kijoon Chae, "Attack Classification
Based on Data Mining Technique and Its Application for Reliable
Medical Sensor Communication", International Journal of Computer
Science and Applications, Technomathematics Research Foundation,
pp. 20-32, 2009.
21. Cannady J, “Artificial Neural Networks for Misuse Detection”, In
Proceedings of the ’98 National Information System Security
Conference (NISSC’98), pp. 443-456, 1998.
22. Carbone, P. L., Data mining or knowledge discovery in databases: An
overview. In Data Management Handbook. New York: Auerbach
Publications, 1997.
23. Chan, P. K., M. V. Mahoney, and M. H. Arshad, Managing Cyber
Threats: Issues, Approaches and Challenges, Chapter Learning Rules
and Clusters for Anomaly Detection in Network Traffic, Kluwer, 2003.
24. Chimphlee W. et al. (2006), ‘Anomaly-Based Intrusion Detection using
Fuzzy Rough Clustering’, proceedings of the International Conference
on Hybrid Information Technology , Vol. 1, pp. 329-334.
25. Crosbie, M. and E. H. Spafford, Active defense of a computer system
using autonomous agents. Technical Report CSD-TR-95-008, Purdue
Univ., West Lafayette, IN, February, 1995.
26. Cuppen, F. &Miege, A. (2002). Alert Correlation in a Cooperative
Intrusion Detection Framewok. In Proceeding of the 2002 IEEE
Symposium on Security and Privacy. IEEE, 2002]
27. Daniel Barbara, Julia C., (2001),“ADAM: Detecting Intrusions by Data
Mining” , Proceedings of the 2001 IEEE Workshop on Information
Assurance and Security United States Military Academy, West Point,
NY, 5.
28. DARPA Intrusion Detection Evaluation Data Set” from
http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/19
98data.html
29. Dasarathy B V, “Intrusion Detection”, Information Fusion, Vol.4, No.4,
pp.243-245, 2003.
30. Dasgupta, D. and F. A. Gonz´alez, An intelligent decision support
system for intrusion detection and response. In Proc. of International
Workshop on Mathematical Methods, Models and Architectures for
Computer Networks Security (MMM-ACNS), St.Petersburg. Springer-
Verlag, 21-23 May, 2001.
31. Dash M. and Liu H.,(1997), “Feature selection for classification”,
Intelligent Data Analysis: An International Journal, PP. 131–156.
32. Debar H., Dacier M., Wespi A.: Towards a taxonomy of intrusion-
detection systems. Computer Networks, 31, 1999, pp. 805-822.
33. Dewan Md. Farid and Mohammad Zahidur Rahman, “Anomaly Network
Intrusion Detection Based on Improved Self Adaptive Bayesian
Algorithm”, Journal of Computers, Vol.5, No.1, January, 2010.
34. Didaci, L., G. Giacinto, and F. Roli, Ensemble learning for intrusion
detection in computer networks. http://citeseer.nj.nec.com/533620.html,
2002.
35. Disha Sharma, Fuzzy Clustering as an Intrusion Detection Technique,
International Journal of Computer Science & Communication Networks,
Vol. 1, No. 1, 2011.
36. Dorosz P., Kazienko P., Systemy wykrywania intruzów [Intrusion
detection systems]. VI Krajowa Konferencja Zastosowań Kryptografii
ENIGMA 2002, Warsaw, 14-17 May 2002, pp. TIV 47-78 (in Polish only).
37. Dorothy E. Denning. An intrusion detection model. IEEE Transactions
on Software Engineering, SE-13(2):222–232, 1987.
38. Ektefa M., S. Memar, F. Sidi and L. S. Affendey, "Intrusion detection
using data mining techniques", In Proceedings of the International
Conference on Information Retrieval & Knowledge Management
(CAMP), pp. 200-203, 2010.
39. Ellen Pitt and Richi Nayak, (2007), "The Use of Various Data Mining and
Feature Selection Methods in the Analysis of a Population Survey
Dataset", Conferences in Research and Practice in Information
Technology.
40. Eskin E. et al. (2000), 'Adaptive Model Generation for Intrusion
Detection Systems', Proceedings of the 7th ACM Conference on
Computer Security, Athens, Greece.
41. Eskin E. et al. (2002), 'A Geometric Framework for Unsupervised
Anomaly Detection: Detecting Intrusions in Unlabeled Data', Data
Mining for Security Applications, Kluwer Academic Publishers, 2002.
42. Eskin, E., Anomaly detection over noisy data using learned probability
distributions. In Proc. 17th International Conf. on Machine Learning,
San Francisco, pp. 255–262, Morgan Kaufmann, 2000.
43. Faizal, M. A., Mohd Zaki M., Shahrin Sahib, Robiah Y., Siti Rahayu S.,
and Asrul Hadi Y., "Time Based Intrusion Detection on Fast Attack for
Network Intrusion Detection System", Second International Conference
on Network Applications, Protocols and Services, IEEE, 2010.
44. Fan W., Miller M., Stolfo S., Lee W., Chan P.: Using Artificial Anomalies
to Detect Unknown and Known Network Intrusions. In Proceedings of
the First IEEE International Conference on Data Mining, San Jose, CA,
November 2001.
45. Farah J., Mantaceur Z. & Mohamed B. A. (2007). A Framework for an
Adaptive Intrusion Detection System using Bayesian Network.
Proceedings of the Intelligence and Security Informatics, IEEE, 2007.
46. Farid Dewan Md. and Rahman Mohammad Zahidur, "Anomaly Network
Intrusion Detection Based on Improved Self Adaptive Bayesian
Algorithm", Journal of Computers, Vol. 5, No. 1, January 2010.
47. Fengmin Gong, “Deciphering Detection Techniques: Part II Anomaly-
Based Intrusion Detection”, White Paper from McAfee Network Security
Technologies Group, 2003.
48. Frederick K. K.: Network Intrusion Detection Signatures. December 19,
2001, http://online.securityfocus.com/infocus/1524.
49. Gang Wang, Jinxing Hao, Jian Ma and Lihua Huang, "A new approach
to intrusion detection using Artificial Neural Networks and fuzzy
clustering", Expert Systems with Applications, Vol. 37, No. 9, pp. 6225-
6232, 2010.
50. Garuba, M., Liu, C. & Fraites, D. (2008). Intrusion Techniques:
Comparative Study of Network Intrusion Detection Systems. In
Proceedings of the Fifth International Conference on Information
Technology: New Generations, IEEE, 2008.
51. Gomez et al. (2002), 'Evolving Fuzzy Classifiers for Intrusion Detection',
Proceedings of the 2002 IEEE Workshop on Information Assurance,
United States Military Academy, West Point, NY, June 2002.
52. Gong F, “Deciphering Detection Techniques: Part II Anomaly-Based
Intrusion Detection”, White Paper from McAfee Network Security
Technologies Group, 2003.
53. Gowrison G., K. Ramar, K. Muneeswaran, T. Revathi, "Minimal
complexity attack classification intrusion detection system", Applied Soft
Computing, Vol. 13, pp. 921-927, 2013.
54. Nancy, Jasdeep Kaur, Rameet Kaur and Nishu, "Data Mining - A Review
and Description", International Journal on Recent and Innovation Trends
in Computing and Communication, Vol. 1, Issue 7, pp. 582-586, 2013.
55. Hafiz Muhammad Imran, Azween Bin Abdullah, Muhammad Hussain,
Sellappan Palaniappan and Iftikhar Ahmad, Intrusions Detection based
on Optimum Features Subset and Efficient Dataset Selection,
International Journal of Engineering and Innovative Technology (IJEIT),
Vol.2, No. 6, 2012.
56. Harley Kozushko, “Intrusion Detection: Host-Based and Network-Based
Intrusion Detection Systems”, White Paper from Independent Study,
September 11, 2003.
57. Hazem M. El-Bakry, "Automatic Human Face Recognition Using
Modular Neural Networks," Machine Graphics & Vision Journal (MG&V),
vol. 10, no. 1, 2001, pp. 47-73.
58. Hazem M. El-Bakry, and Nikos Mastorakis “Fast Detection of Specific
Information in Voice Signal over Internet Protocol,” Proc. of 7th WSEAS
Int. Conf. on COMPUTATIONAL INTELLIGENCE, MAN-MACHINE
SYSTEMS and CYBERNETICS (CIMMACS '08), Cairo, EGYPT, Dec.
29-31, 2008, pp. 125-136.
59. Hazem M. El-Bakry, Nikos E. Mastorakis, Michael E. Fafalios, “Fast
Information Retrieval from Big Data by using Cross Correlation in the
Frequency Domain,” Proc. of IEEE IJCNN 2013, Dallas Tx, USA,
August 4-9, 2013, pp. 366-272.
60. Helmer, G., J. Wong, V. Honavar, and L. Miller, Automated discovery of
concise predictive rules for intrusion detection. Technical Report 99-01,
Iowa State Univ., Ames, IA, January, 1999.
61. Hershkop S., Apap F., Eli G., Tania D., Eskin E., Stolfo S., (2007), "A
data mining approach to host based intrusion detection", Technical
reports, CUCS Technical Report.
62. Introduction to Data Mining and Knowledge Discovery, Two Crows
Corporation, 2005.
63. Intrusion Detection Systems (IDS). Group Test (Edition 3), NSS Group,
July 2002, http://www.nss.co.uk/ids/edition3/index.htm.
64. Ioannis Krontiris, Zinaida Benenson, Thanassis Giannetsos, Felix C.
Freiling and Tassos Dimitriou, "Cooperative Intrusion Detection in
Wireless Sensor Networks", Lecture Notes in Computer Science, Vol.
5432, pp. 263-278, 2009.
65. Ion Iancu and Mihai Gabroveanu (2010), "Fuzzy Logic Controller
Based on Association Rules".
66. Irvine (1999), 'KDD Cup 1999 Data', 5th International Conference on
Knowledge Discovery and Data Mining,
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
67. ITA, The internet traffic archive, 2000, http://ita.ee.lbl.gov/.
68. James P. Anderson. Computer security threat monitoring and
surveillance. Technical report, James P. Anderson Co., 1980.
69. Javitz, H. S. and A. Valdes, The NIDES statistical component:
Description and justification, Technical report, SRI International,
March 1993.
70. Jian Pei, Jiawei Han, Laks V. S. Lakshmanan, "Pushing Convertible
Constraints in Frequent Itemset Mining", Data Mining and Knowledge
Discovery, Vol. 8, No. 3, pp. 227-252, May 2004.
71. Jiawei Han and Micheline Kamber, (2008), "Data Mining: Concepts and
Techniques", Morgan Kaufmann Publishers, an imprint of Elsevier.
ISBN 978-1-55860-901-3; Indian reprint ISBN 978-81-312-0535-8.
72. John Wack, Ken Cutler, Jamie Pole, (2002), "Guidelines on Firewalls
and Firewall Policy", Recommendations of the National Institute of
Standards and Technology.
73. Jones A. K., Sielken R. S.: Computer System Intrusion Detection:
A Survey. 09.02.2000, IDSresearch/Documents/jones-sielken-
survey-v11.pdf.
74. Joseph T and H. T. Nguyen, "Neural network control of wheelchairs
using telemetric head movement," Proceedings of the 20th Annual
International Conference of the IEEE, Engineering in Medicine and
Biology Society, vol. 5, pp. 2731 - 2733, 1998.
75. Joshua W. Haines et al. (2001), 'Extending the DARPA Off-Line
Intrusion Detection Evaluations', Proceedings of IEEE DARPA
Information Survivability Conference and Exposition II, Vol. 1, pp. 77-88.
76. Joyce Jackson, (2002), "Data Mining: A Conceptual Overview",
Communications of the Association for Information Systems, Vol. 8.
77. Karen S. and Peter M., (2007), "Guide to Intrusion Detection and
Prevention Systems", National Institute of Standards and Technology,
Department of Commerce, USA.
78. Karl Levitt. (2002). Intrusion Detection: Current Capabilities and Future
Direction. Proceedings of the 18th Annual Computer Security
Applications Conference, IEEE, 2002.
79. Karthik G and Nagappan A, "Intrusion Detection System Using Kernel
FCM Clustering and Bayesian Neural Network", International Journal of
Computer Science and Information Technology & Security (IJCSITS),
Vol. 3, No. 6, 2013.
80. Kayacik, G. H., Zincir-Heywood, A. N., (2005), "Analysis of Three
Intrusion Detection System Benchmark Datasets Using Machine
Learning Algorithms".
81. "KDD Cup 1999 Data", from
http://www.sigkdd.org/kddcup/index.php?section=1999&method=data
82. KdNuggets, (2007), "Data Mining Methodology",
http://www.kdnuggets.com/polls/2007/datamining_methodology.htm.
83. Keim, Daniel A. (2002), "Information Visualization and Visual Data
Mining".
84. Kendall, K., (1999), "A Database of Computer Attacks for the Evaluation
of Intrusion Detection Systems", Master's thesis, Massachusetts
Institute of Technology.
85. Kumar G., K. Kumar and M. Sachdeva, (2010), "The Use of Artificial
Intelligence based Techniques for Intrusion Detection - A Review",
Artificial Intelligence Review, Vol. 34, No. 4, pp. 369-387, Springer,
Netherlands, DOI: 10.1007/s10462-010-9179-5, ISSN: 0269-2821.
86. L. O and N. M, “Ordered estimation of missing values,” in PAKDD,
1999, pp. 499–503.
87. Latifur Khan, Mamoun Awad, Bhavani Thuraisingham, "A new intrusion
detection system using support vector machines and hierarchical
clustering", The International Journal on Very Large Data Bases, Vol.
16, No. 4, October 2007.
88. Lee W., S. Stolfo, and K. Mok, "A Data Mining Framework for Building
Intrusion Detection Models", In Proceedings of the IEEE Symposium on
Security and Privacy, Oakland, CA: IEEE Computer Society Press, pp.
120-132, 1999.
89. Lee W. (1999), 'A Data Mining Framework for Constructing Features
and Models for Intrusion Detection Systems', Ph.D. thesis, Columbia
University, New York, NY.
90. Lee W. et al. (2001), 'Real Time Data Mining-based Intrusion Detection',
Proceedings of the Second (DARPA) Information Survivability
Conference and Exposition, pp. 85-100.
91. Lee W., Stolfo S. J. (1998), 'Data mining approaches for intrusion
detection', Proceedings of the 7th USENIX Security Symposium, pp. 79-
94, Texas.
92. Lee, W. and S. J. Stolfo, A framework for constructing features and
models for intrusion detection systems. Information and System
Security 3 (4), 227–261, 2000.
93. Li Tian and Wang Jianwen, Research on Network Intrusion Detection
System Based on Improved K-means Clustering Algorithm, International
Forum on Computer Science-Technology and Applications, pp.76 – 79,
2009.
94. Lippmann R. P. et al. (2000), 'Evaluating Intrusion Detection Systems:
The 1998 DARPA Off-line Intrusion Detection Evaluation', Proceedings
of the 2000 DARPA Information Survivability Conference and Exposition
(DISCEX), pp. 12-26, Los Alamitos.
95. Lippmann, R. P., D. J. Fried, I. Graf, J. W. Haines, K. Kendall,
D. McClung, D. Weber, S. Webster, D. Wyschogrod, R. K. Cunningham,
and M. Zissman, Evaluating intrusion detection systems: The 1998
DARPA off-line intrusion detection evaluation. In Proc. of the DARPA
Information Survivability Conference and Exposition, Los Alamitos, CA.
IEEE Computer Society Press, January 2000.
96. Luo J. and S. M. Bridges, "Mining fuzzy association rules and fuzzy
frequency episodes for intrusion detection", International Journal of
Intelligent Systems, Vol. 15, No. 8, pp. 687-704, 2000.
97. Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu and Ali A. Ghorbani, "A
detailed analysis of the KDD CUP 99 data set", in Proceedings of the
Second IEEE international conference on Computational intelligence for
security and defense applications, pp. 53-58, Ottawa, Ontario, Canada,
2009.
98. Marcos M. Campos, Boriana L. Milenova, “Creation and Deployment of
Data Mining-Based Intrusion Detection Systems in Oracle Database
10g”, In Proceedings of the Fourth International Conference on Machine
Learning and Applications, 2005.
99. Marin, G. A., (2005), "Network Security Basics", IEEE Security &
Privacy, Vol. 3, No. 6, pp. 68-72.
100. Matteucci M, "A tutorial on clustering algorithms,"
http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/cmeans.html
101. McClure, S., InfoWorld Security Suite 16 debuts.
http://www.idg.net/crd_detection_16738.html, 1998.
102. Michael J. Pazzani , (2000), “Knowledge discovery from DATA?”, IEEE
Intelligent Systems.
103. MIT Lincoln Laboratory (1999), ‘DARPA Intrusion Detection Evaluation’,
http://www.ll.mit.edu/IST/ideval/data/data_index.html.
104. Mohammadreza Ektefa, Sara Memar, Fatimah Sidi, Lilly Suriani
Affendey, (2010), "Intrusion Detection Using Data Mining
Techniques", 978-1-4244-5651-2/10, IEEE, 2010.
105. Mohanabharathi R., T. Kalaikumaran and S. Karthi, "Feature Selection
for Wireless Intrusion Detection System Using Filter and Wrapper
Model", International Journal of Modern Engineering Research (IJMER),
Vol. 2, No. 4, pp. 1552-1556, 2012.
106. Nada Lavrac, Blaž Zupan (2005), "Data Mining in Medicine", in Data
Mining and Knowledge Discovery Handbook.
107. National Research Council (2008), "Protecting Individual Privacy in the
Struggle Against Terrorists".
108. Neri, F., Comparing local search with respect to genetic evolution to
detect intrusion in computer networks. In Proc. of the 2000 Congress on
Evolutionary Computation CEC00, La Jolla, CA, pp. 238-243. IEEE
Press, 16-19 July, 2000.
109. Nguyen H. T., L. M. King and G. Knight, "Real-time head-movement
system and embedded Linux implementation for the control of power
wheelchair", Proceedings of the 26th Annual International Conference of
the IEEE Engineering in Medicine and Biology Society, pp. 4892-4895,
2004.
110. Nivedita Naidu and R. V. Dharaskar, "An Effective Approach to
Network Intrusion Detection System using Genetic Algorithm",
International Journal of Computer Applications, Vol. 1, No. 3, pp. 26-32,
February 2010.
111. Noel, S., Wijesekera, D., and Youman, C., “Modern Intrusion Detection,
Data Mining, and Degrees of Attack Guilt”, Applications of Data Mining
in Computer Security, Kluwer Academic Publishers, pp. 2-25, 2002.
112. Novikov D. et al. (2006), 'Anomaly Detection Based Intrusion Detection',
Proceedings of the Third IEEE International Conference on Information
Technology: New Generations (ITNG'06), pp. 420-425.
113. Oh et al. (2009), 'Attack classification based on data mining technique
and its application for reliable medical sensor communication',
International Journal of Computer Science and Applications,
Technomathematics Research Foundation, Vol. 6, No. 3, pp. 20-32.
114. Pachet, François; Westermann, Gert; and Laigre, Damien (2001),
"Musical Data Mining for Electronic Music Distribution".
115. Peter Lichodzijewski, A. Nur Zincir-Heywood and Malcolm I. Heywood,
"Host-Based Intrusion Detection Using Self-Organizing Maps", Faculty
of Computer Science.
116. Phillip A. Porras and Alfonso Valdes. Live traffic analysis of TCP/IP
gateways. In Proceedings of the 1998 ISOC Symposium on Network
and Distributed System Security (NDSS'98), San Diego, CA, March
1998. Internet Society.
117. Ptacek, T. H. and T. N. Newsham, Insertion, evasion and denial of
service: Eluding network intrusion detection, Technical report, Secure
Networks, Inc., January, 1998.
118. Rasha G. Mohammed Helali, "Data Mining Based Network Intrusion
Detection System: A Survey", Novel Algorithms and Techniques in
Telecommunications and Networking, pp. 501-505, 2010.
119. Richard Heady, George Luger, Arthur Maccabe, and Mark Servilla. The
architecture of a network level intrusion detection system. Technical
report, University of New Mexico, 1990.
120. Rupali Datti and Bhupendra Verma, Feature Reduction for Intrusion
Detection Using Linear Discriminant Analysis, International Journal on
Computer Science and Engineering, Vol. 02, No. 04, pp. 1072-1078,
2010.
121. Sandra Liewis, Liangxiu Han and John A. Keane (2013),
"Understanding Low Back Pain using Fuzzy Association Rule Mining".
122. Santosh Kumar Sahu, Sauravranjan Sarangi and Sanjaya Kumar Jena,
"A Detail Analysis on Intrusion Detection Datasets", International
Advance Computing Conference, pp. 1348-1353, 2014.
123. Sarab M. Hameed, Sumaya Saad, and Mayyadah F. AlAni, "An
Extended Modified Fuzzy Possibilistic C-Means Clustering Algorithm for
Intrusion Detection", Lecture Notes on Software Engineering, Vol. 1, No.
3, 2013.
124. Savaresi, S. and Boley, D., 2001. On performance of bisecting k-means
and PDDP. In Proceedings of the 1st SIAM ICDM, Chicago, IL.
125. Sekar, R., Gupta, A., Frullo, J., Shanbhag, T., Tiwari, A., Yang, H. &
Zhou, S. (2002). Specification-based Anomaly Detection: A New
Approach for Detecting Network Intrusions. In Proceedings of the ACM
Conference on Computer and Communications Security (CCS).
126. Shailendra Singh and Sanjay Silakari, "Generalized Discriminant
Analysis algorithm for feature reduction in Cyber Attack Detection
System", (IJCSIS) International Journal of Computer Science and
Information Security, Vol. 6, No. 1, 2009.
127. Shailendra Singh, Sanjay Silakari and Ravindra Patel, "An efficient
feature reduction technique for intrusion detection system", International
Conference on Machine Learning and Computing, Vol. 3, 2011.
128. Shekhar R. Gaddam, Vir V. Phoha, Kiran S. Balagani, “K-Means+ID3: A
Novel Method for Supervised Anomaly Detection by Cascading K-
Means Clustering and ID3 Decision Tree Learning Methods”, IEEE
Transactions on Knowledge and Data Engineering, Vol. 19, No. 3, pp.
345-354, 2007.
129. Shingo Mabu, Nannan Lu, Kaoru Shimada, Kotaro Hirasawa, "An
Intrusion-Detection Model Based on Fuzzy Class-Association-Rule
Mining Using Genetic Network Programming", IEEE Transactions on
Systems, Man, and Cybernetics, Part C: Applications and Reviews,
Vol. 41, No. 1, pp. 130-139, 2011.
130. Shon T, Seo J, and Moon J, “SVM Approach with A Genetic Algorithm
for Network Intrusion Detection”, Lecture Notes in Computer Science,
Springer Berlin / Heidelberg, Vol. 3733, pp. 224-233, 2005, ISBN 978-3-
540-29414-6.
131. Singh, S. and S. Kandula, Argus - a distributed network-intrusion
detection system. Undergraduate Thesis, Indian Institute of Technology,
May, 2001.
132. Snehal A. Mulay, P. R. Devale and G. V. Garje, Intrusion Detection
System using Support Vector Machine and Decision Tree, International
Journal of Computer Applications (ISSN 0975-8887), Vol. 3, No. 3, 2010.
133. Son T. Nguyen, Hung T. Nguyen and Philip B. Taylor, "Bayesian
Neural Network Classification of Head Movement Direction using
Various Advanced Optimisation Training Algorithms", International
Conference on Biomedical Robotics and Biomechatronics, pp. 1014-
1019, 2006.
134. Srilatha Chebrolu, Ajith Abraham and Johnson P. Thomas, "Hybrid
Feature Selection for Modelling Intrusion Detection Systems", Lecture
Notes in Computer Science, Vol. 3316, pp. 1020-1025, 2004.
135. Steinbach, M., Karypis, G., and Kumar, V., 2000. A comparison of
document clustering techniques. 6th ACM SIGKDD, World Text Mining
Conference, Boston, MA.
136. Sumathi M. and Umarani R., "Advanced Network Intrusion Detection
System Based on Effective Feature Selection", International Journal of
Computer Science and Information Technologies, Vol. 4, No. 1,
pp. 107-112, 2013.
137. Satyanarayan Misra, Sanjay Singh and Pradeep Kumar Tiwari,
"Classification of Dataset Using Clustering Technique", International
Journal of Computer Science and Telecommunications, Vol. 3, Issue 4,
April 2012.
138. Taylor P B and H. T. Nguyen, "Performance of a head-movement
interface for wheelchair control," Proceedings of the 25th Annual
International Conference of the IEEE Engineering in Medicine and
Biology Society, vol. 2, pp. 1590 - 1593, 2003.
139. Thomas G. Dietterich and Ghulum Bakiri, "Solving Multiclass Learning
Problems via Error-Correcting Output Codes", Journal of Artificial
Intelligence Research, Vol. 2, pp. 263-286, 1995.
140. Thomas G. Dietterich and Ghulum Bakiri, "Solving Multiclass Learning
Problems via Error-Correcting Output Codes", Journal of Artificial
Intelligence Research, Vol. 2, pp. 263-286, 1995.
141. Tsai C. F., Y. F. Hsu, C. Y. Lin and W. Y. Lin, (2009), "Intrusion
detection by machine learning: A review", Expert Systems with
Applications, Vol. 36, Issue 10, pp. 11994-12000.
142. U. Aickelin, J. Twycross and T. Hesketh-Roberts, "Rule Generalization
in Intrusion Detection Systems Using SNORT", International Journal of
Electronic Security and Digital Forensics, Vol. 1, No. 1, pp. 101-116,
2007.
143. U. Fayyad, D. Haussler, and P. Stolorz (1996), "From Data Mining to
Knowledge Discovery in Databases", 0738-4602-1996.
144. Warrender, C., S. Forrest, and B. A. Pearlmutter, Detecting intrusions
using system calls: Alternative data models. In Proc. of the 1999 IEEE
Symp. on Security and Privacy, Oakland, CA, pp. 133–145. IEEE
Computer Society Press, 1999.
145. Wenke Lee and Salvatore J. Stolfo, “Data Mining Approaches for
Intrusion Detection”, Proceedings of the 7th USENIX Security
Symposium, San Antonio, Texas, January 26-29, 1998.
146. Whitman M. E. & Mattord H. J., (2007), "Principles of Information
Security" (2nd ed.), New Delhi: Thomson Learning/Course Technology.
147. Witten I. H., Frank E., (2005), "Data Mining: Practical Machine Learning
Tools and Techniques", Second edition, Morgan Kaufmann.
148. Wu Junqi and Hu Zhengbing, (2008), "Study of Intrusion Detection
Systems (IDSs) in Network Security", 978-1-4244-2108-4/08,
IEEE, 2008.
149. Yan et al. (2009), 'A Hybrid Intrusion Detection System of Cluster-based
Wireless Sensor Networks', Proceedings of the International
MultiConference of Engineers and Computer Scientists, Vol. 1, March
18-20, 2009, Hong Kong.
150. Yao, J. T., S. L. Zhao, and L. V. Saxton, "A Study on Fuzzy Intrusion
Detection", In Proceedings of the Data Mining, Intrusion Detection,
Information Assurance, and Data Networks Security, SPIE, Vol. 5812,
pp. 23-30, 28 March - 1 April, Orlando, Florida, USA, 2005.
151. Yeophantong, T., Pakdeepinit, P., Moemeng, P. & Daengdej, J.
(2005). Network Traffic Classification Using Dynamic State Classifier.
In Proceedings of the IEEE Conference.
152. Yeung, D.Y., and C. Chow, Parzen-window network intrusion detectors.
In Proc. of the Sixteenth International Conference on Pattern
Recognition, Volume 4, Quebec City, Canada, pp. 385–388. IEEE
Computer Society, 11-22 August, 2002.
153. Yi Mao, Yixin Chen, Gregory Hackmann, Minmin Chen, Chenyang Lu,
Marin Kollef, Thomas C. Bailey (2011), "Medical Data Mining for Early
Deterioration Warning in General Hospital Wards".
154. Yu Y, and Huang Hao, “An Ensemble Approach to Intrusion Detection
Based on Improved Multi-Objective Genetic Algorithm”, Journal of
Software, Vol.18, No.6, pp.1369-1378, June 2007.
155. Zhiyuan Tan, Aruna Jamdagni, Xiangjian He and Priyadarsi Nanda,
"Network Intrusion Detection Based on LDA for Payload Feature
Selection", IEEE GLOBECOM Workshop on Web and Pervasive
Security, pp. 1545-1549, 2010.
LIST OF PUBLICATIONS
International Journals
• Karthik G, Nagappan A, "Intrusion Detection System Using
Kernel FCM Clustering and Bayesian Neural Network",
International Journal of Computer Science and Information
Technology & Security (IJCSITS), Vol. 3, Issue 6, Pages 391-399,
December 2013.
• Karthik G, Geetha T, Nagappan A, "Development of Hybrid
Intrusion Detection System and Its Application to Medical
Sensor Network", International Journal of Innovative Research in
Computer and Communication Engineering (IJIRCCE), Vol. 3,
Issue 9, Pages 8182-8198, September 2015.