Journal of Babylon University/Pure and Applied Sciences/ No.(4)/ Vol.(23): 2015
Using DBSCAN Clustering Algorithm in Detecting DDoS Attack
Safaa O. Al-Mamory
Assistant Professor, College of Information Technology, University of Babylon
Zahraa Mohammed Ali
Department of Computer Science, University of Kufa
Abstract
Distributed Denial of Service (DDoS) attacks have become one of the major threats to the
Internet. A DDoS attack prevents a victim from providing normal services on the Internet: a
large number of agents generate a huge volume of useless packets, which can easily exhaust the
victim's computing and communication resources. In this paper we develop a method to detect
DDoS attacks accurately and proactively. The method uses the concept of entropy to measure
abnormal changes in traffic according to the phases of the attack, and the traffic is then
clustered using the DBSCAN algorithm. Patterns for DDoS traffic are created from the centroid
points extracted from each cluster, and these patterns are used in the testing phase for
distance-based classification. The system is characterized by processing and analyzing
high-speed network traffic (based on the entropy approach), discovering and accurately
identifying new types of DDoS attack so as to reduce false alarms (FA), detecting the attack in
real time, and making use of the patterns from the training stage to increase the detection
ratio.
Keywords: DDoS, Proactive detection, Clustering, DBSCAN
1. Introduction
Distributed denial of service (DDoS) attacks were first seen in early 1998 (CERT, 1998). In
February 2000, a number of the world's largest e-commerce sites, including Yahoo.com,
Amazon.com, Excite, E*Trade, eBay, CNN.com, Buy.com, and ZDNet, were brought offline for days
by this kind of attack, even though they were designed to offer high availability. The outages
caused huge economic losses to both the victim sites and their users (Wan, 2001).
The overarching aim of this paper is to develop a method to detect DDoS attacks accurately and
proactively. This is achieved by using the concept of entropy to measure abnormal changes in
traffic according to the phases of the attack; the traffic is then clustered using the DBSCAN
algorithm, and the pattern for DDoS traffic is created from the resulting set of clusters. The
system is characterized by processing and analyzing high-speed network traffic (based on the
entropy approach), discovering and accurately identifying new types of DDoS attack so as to
reduce false alarms (FA), detecting the intrusion in real time, and making use of the patterns
from the training stage to increase the detection ratio.
Section 2 describes the related works. Section 3 explains the proposed system. The
experimental results are discussed in Section 4, and conclusions are given in Section 5.
2. Related Works
A great deal of research has been done on DDoS attacks. To detect this attack proactively,
Feinstein et al. (2003) presented statistical approaches that identify DDoS attacks by
computing entropy and frequency-sorted distributions of selected packet attributes. DDoS
attacks show anomalies in the characteristics of the selected packet attributes. The detection
accuracy and performance were analyzed using live traffic traces from a variety of network
environments, ranging from points in the core of the Internet to those inside an edge network.
The results indicate that these methods can be effective against current attacks and suggest
directions for improving the detection of more stealthy attacks.
Jin et al. (2004) proposed a covariance analysis model for detecting SYN flooding attacks. The
correlations among the features may provide additional essential information: in terms of
correlation, normal patterns differ from abnormal patterns. In this sense, detecting changes in
the correlation among different features can reveal the occurrence of anomalies. They present a
two-variable covariance model as a possible approach to detecting DDoS attacks.
Gavrilis et al. (2005) proposed a Radial-Basis-Function neural network (RBF-NN) to distinguish
DDoS attacks from normal traffic. The RBF-NN detector is a two-layer neural network. It uses
nine packet parameters, and the frequencies of these parameters are estimated. Based on the
frequencies, the RBF-NN classifies traffic into the attack class or the normal class.
Lee et al. (2008) proposed a method for proactive detection of DDoS attacks that exploits the
attack architecture, which consists of the selection of handlers and agents, the communication
and compromise, and the attack itself. The features are selected based on the procedures of a
DDoS attack. Cluster analysis is then performed for proactive detection of the attack. The
experiment was performed with the 2000 DARPA Intrusion Detection Scenario Specific Data Set in
order to evaluate their method. The results show that each phase of the attack scenario is
partitioned well and that the method can detect precursors of a DDoS attack as well as the
attack itself.
Rahmani et al. (2009) presented entropy-based anomaly detection using joint entropy analysis of
multiple traffic distributions. They observed that the time series of IP-flow number and
aggregate traffic size are strongly statistically dependent. The occurrence of an attack
affects this dependence and causes a rupture in the time series of joint entropy values.
Xia et al. (2010) presented a method that can identify the occurrence of a DDoS flood attack
and determine its intensity using fuzzy logic. The process consists of two stages: (i)
statistical analysis of the network traffic time series using the discrete wavelet transform
(DWT) and the Schwarz information criterion (SIC) to find the change point of the Hurst
parameter resulting from a DDoS flood attack, and then (ii) adaptively deciding the intensity
of the DDoS flood attack by using fuzzy logic to analyze the Hurst parameter and its rate of
change.
Zhong et al. (2010) presented a DDoS attack detection model based on data mining algorithms.
The FCM clustering algorithm and the Apriori association algorithm are used to extract a
network traffic model and a network packet protocol status model. The Apriori association
algorithm is used to mine the packet protocol status: packet protocol statuses that appear
frequently in the network can be combined into one association record. The data collected
continuously over a period is used to calculate the packet protocol status threshold through
the FCM clustering algorithm.
Liu et al. (2013) proposed an anomaly detection method for DDoS attacks based on the Gini
coefficient. First, the Gini coefficient is introduced to measure the inequality of the
distributions of packet attributes (IP addresses and ports) during attacks. Then, an improved
TCM-KNN (Transductive Confidence Machines for K-Nearest Neighbors) algorithm is applied to
identify attacks by classifying the Gini coefficient samples extracted from real-time network
traffic. The experiment was performed on the DDoS attack dataset (LLDoS 2.0.2) from MIT Lincoln
Laboratory.
Chen et al. (2013) proposed a detection model based on conditional random fields (CRF). The
CRF-based model combines signature-based and anomaly-based detection methods into a hybrid
system. The selected features include source IP entropy, destination IP entropy, source port
entropy, destination port entropy, protocol number, etc. The CRF-based model combines these IP
flow entropies and other fingerprints into a normalized entropy, which serves as the feature
vector describing the state of the monitored traffic. The detection model is trained using the
L-BFGS algorithm.
3. Problem Formulation and Methodology
In order to achieve early detection of DDoS attacks, we employ the concept of entropy together
with cluster analysis. The idea of this research is to separate each phase of a DDoS attack:
the DBSCAN clustering algorithm is used in the training phase, and the corresponding cluster
centroids (the average of each cluster) are then used as patterns for efficient distance-based
detection in the testing phase. Figure 1 illustrates the flow chart of the proposed system.
Figure (1): The general architecture of the proposed system.
3.1 Extraction of the detection features
According to the DDoS architecture, a DDoS attack is performed in the following steps
(Douligeris C. et al., 2004):
1. Selection of handlers and agents
2. Compromise
3. Communication
4. Attack
From this procedure we can identify the traffic parameters that change abnormally in each step.
Lee et al. (2008) presented nine features based on an analysis of the characteristics of DDoS
attacks; we use these features in our method.
"In the first step, real attacker sends ICMP Echo Request packets to find handlers and agents
that help attack, which is called IPsweep" (Lee et al., 2008). A lot of ICMP traffic is
generated, so the occurrence rate of ICMP packets may be abnormally high compared to normal
traffic. In this period, the destination IP addresses in the network flow are also distributed
randomly.
[Figure 1 flow chart boxes: Start; data set; features extraction for each sample of consecutive
packets; generate a database from the extracted features; Training: clustering by DBSCAN,
extract set of centroid points (mean of each cluster) as pattern; Testing: distance-based
classification; system validation; end.]
In the second and third steps, a specific traffic type such as ICMP, UDP, or TCP SYN packets
can be used for message exchange. Hence, the occurrence rates of these packet types can
indicate preparation for launching a DDoS attack (Zi et al., 2010).
During a DDoS attack, the agents randomly generate the source IP addresses of the attack
packets to hide their real addresses. They also randomize the destination and source port
numbers depending on the attack type, so this randomization can provide useful information for
detecting a DDoS attack. To measure the degree of divergence, Lee et al. (2008) suggest using
the concept of entropy.
Let an information source have n independent symbols, each with probability of choice P_i.
Then the entropy H is defined as follows (Shannon, 1948):
H = -\sum_{i=1}^{n} P_i \log_2 P_i    …(1)
Entropy is computed on a sample of consecutive packets. Comparing the entropy of one sample
with that of another provides a mechanism for detecting changes in randomness (Lee et
al., 2008).
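The entropy of equation (1) can be estimated for one sample of packets by using the observed
relative frequencies as the probabilities P_i. The following is a minimal sketch, not the
authors' implementation; the function name `shannon_entropy`, the base-2 logarithm, and the
example address lists are assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty sample carries no information
    entropy = 0.0
    for count in counts.values():
        p = count / total          # relative frequency used as P_i
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical source IP addresses seen in two one-second samples.
sweep_sources = ["10.0.0.5"] * 8                                     # a single sender
attack_sources = ["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"] * 2    # four equally frequent senders

print(shannon_entropy(sweep_sources))   # one symbol only, so the entropy is zero
print(shannon_entropy(attack_sources))  # four equiprobable symbols give 2 bits
```

A sample dominated by one value yields an entropy near zero, while a sample spread evenly over
many values yields a high entropy, which is the contrast the detection features rely on.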
In the IPsweep phase, the entropy of the source IP address becomes small and that of the
destination IP address increases. In the attack phase, attack packets have diverse source IP
addresses and a single target destination IP address, so the entropy of the source IP address
increases and that of the destination IP address converges to a very small value. Similarly,
the entropy values of the source and destination port numbers can be used for DDoS detection,
since some types of DDoS attack use random port numbers. In addition, because a DDoS attack may
use one specific type of packet, the entropy of the packet type may be useful: if it is very
small, it is possible that some kind of DDoS attack is being launched.
In our experiments, we use the same nine features presented in (Lee et al., 2008). The
features are:
Entropy of source IP address and of source port number (two features).
Entropy of destination IP address and of destination port number (two features).
Entropy of packet type.
Number of packets.
Occurrence rate of each packet type: ICMP, UDP, TCP SYN (three features).
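The nine features listed above can be assembled into one vector per sample of consecutive
packets. The sketch below is illustrative only: the `Packet` record, its field names, and the
packet-type strings are hypothetical, and the order of the features in the vector is an
arbitrary choice.

```python
import math
from collections import Counter
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical minimal packet record; the field names are illustrative.
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    ptype: str  # e.g. "ICMP", "UDP", "TCP_SYN", "TCP"


def entropy(values):
    """Empirical Shannon entropy (bits) of a list of values."""
    total = len(values)
    counts = Counter(values)
    return float(sum(-(c / total) * math.log2(c / total) for c in counts.values()))


def extract_features(window):
    """Return the nine-feature vector for one sample (list) of packets."""
    n = len(window)
    types = [p.ptype for p in window]

    def rate(packet_type):
        return types.count(packet_type) / n if n else 0.0

    return [
        entropy([p.src_ip for p in window]),    # 1. entropy of source IP
        entropy([p.src_port for p in window]),  # 2. entropy of source port
        entropy([p.dst_ip for p in window]),    # 3. entropy of destination IP
        entropy([p.dst_port for p in window]),  # 4. entropy of destination port
        entropy(types),                         # 5. entropy of packet type
        float(n),                               # 6. number of packets
        rate("ICMP"),                           # 7. occurrence rate of ICMP
        rate("UDP"),                            # 8. occurrence rate of UDP
        rate("TCP_SYN"),                        # 9. occurrence rate of TCP SYN
    ]


# An IPsweep-like sample: one source probing four destinations with ICMP.
window = [Packet("10.0.0.5", f"172.16.0.{i}", 0, 0, "ICMP") for i in range(1, 5)]
features = extract_features(window)
print(features)
```

In this sample the source IP entropy is zero, the destination IP entropy is high, and the ICMP
rate is 1, which matches the IPsweep behaviour described in the text.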
3.2 Clustering analysis by DBSCAN (Training phase)
Clustering is a method by which large sets of data are grouped into clusters of similar data.
Using cluster analysis, we can separate normal traffic and each phase of the DDoS attack into
distinct groups, provided that the variables used to form the clusters differ among them.
Hence, in this paper, we apply cluster analysis to separate each phase of the DDoS attack. We
first employ a clustering algorithm to partition a training data set into clusters that
represent normal traffic and each phase of the DDoS attack, and then extract patterns from
these clusters for use in online detection.
We adopt the DBSCAN algorithm for clustering. Density-Based Spatial Clustering of Applications
with Noise (DBSCAN) is a density-based clustering algorithm. It forms clusters by growing
high-density areas, and it can find clusters of arbitrary shape. The basic idea of using DBSCAN
in DDoS attack detection is that most of the data is normal traffic, while the attack data is
scarce and differs from the normal data.
In training mode, we modify the DBSCAN algorithm by adding a new step that computes the
centroid µk of each cluster k as the mean of the points assigned to it:
\mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x
where C_k denotes the set of points assigned to cluster k. These centroids represent the
patterns used to detect the DDoS attack phases in online mode. The steps of the modified DBSCAN
algorithm are shown below:
Algorithm 1 DBSCAN (D, ε, MinPts)
Input: training data set D, neighbourhood radius ε, density threshold MinPts
Output: labels for the data (cluster id or NOISE), set of centroid points µk
Begin
  label all data x ∈ D as UNCLASSIFIED
  initialize cluster counter cid = 0
  foreach x ∈ D
    if x is labelled as UNCLASSIFIED
      if expand(D, x, cid, ε, MinPts)
        increment cluster counter cid = cid + 1
      end
    end
  end
  foreach cluster k
    µk = average of the points assigned to cluster k
  end
  return set of µk
End
Algorithm expand (D, x, cid, ε, MinPts) : bool
Input: data set D, x ∈ D, currently unused cluster id cid, neighbourhood radius ε,
density threshold MinPts
Output: returns true iff a new cluster has been found
Begin
  let S = { y ∈ D | ||x - y|| ≤ ε } (range query)
  if there is not enough data in the neighbourhood of x (|S| < MinPts)
    re-label x as NOISE and return false
  end
  foreach x' ∈ S
    re-label x' with the current cluster id cid
  end
  remove x from S
  while S is not empty
    take any x' ∈ S
    T = { y ∈ D | ||x' - y|| ≤ ε } (range query)
    if there is enough data in the neighbourhood of x' (|T| ≥ MinPts)
      foreach y ∈ T
        if y does not belong to a cluster (labelled NOISE or UNCLASSIFIED)
          if y is labelled UNCLASSIFIED: insert y into S
          re-label y with the current cluster id cid
        end
      end
    end
    remove x' from S
  end
  return true
End
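The two procedures above can be sketched in a few lines of Python. This is an illustrative
sketch rather than the authors' C# implementation: the function names, the use of list indices
instead of point objects, the brute-force range query, and the toy data are all assumptions.

```python
import math

UNCLASSIFIED, NOISE = -2, -1


def region_query(data, i, eps):
    """Indices of all points within distance eps of point i (brute force)."""
    return [j for j, y in enumerate(data) if math.dist(data[i], y) <= eps]


def expand(data, labels, i, cid, eps, min_pts):
    """Try to grow cluster cid from point i; return True iff a cluster was found."""
    seeds = region_query(data, i, eps)
    if len(seeds) < min_pts:
        labels[i] = NOISE
        return False
    for j in seeds:
        labels[j] = cid
    seeds.remove(i)
    while seeds:
        current = seeds.pop(0)
        neighbours = region_query(data, current, eps)
        if len(neighbours) >= min_pts:          # current is a core point
            for j in neighbours:
                if labels[j] in (UNCLASSIFIED, NOISE):
                    if labels[j] == UNCLASSIFIED:
                        seeds.append(j)         # still has to be expanded
                    labels[j] = cid
    return True


def dbscan_with_centroids(data, eps, min_pts):
    """Run DBSCAN, then return (labels, centroids) with one mean vector per cluster."""
    labels = [UNCLASSIFIED] * len(data)
    cid = 0
    for i in range(len(data)):
        if labels[i] == UNCLASSIFIED and expand(data, labels, i, cid, eps, min_pts):
            cid += 1
    centroids = {}
    for k in range(cid):
        members = [x for x, lab in zip(data, labels) if lab == k]
        centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy two-dimensional data: two dense groups and one isolated point.
data = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
        (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),
        (10.0, 10.0)]
labels, centroids = dbscan_with_centroids(data, eps=0.2, min_pts=3)
print(labels)
print(centroids)
```

On this toy data the two dense groups become clusters 0 and 1, and the isolated point is
labelled as noise, so it contributes to no centroid.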
3.3 Distances-based detection (Classification phase)
By clustering, we obtain the set of centroid points µk, and from the descriptions of the
training data set we can determine which cluster corresponds to which phase of the DDoS attack.
The distances from an object to the cluster centroids of each traffic class are calculated
using the Euclidean distance function. An object is classified as normal if it is closest to
the normal cluster centroid; otherwise it is assigned to the DDoS attack phase whose centroid
is closest. This is illustrated in Figure 2 with a two-dimensional feature space: object P is
closer to the normal cluster, therefore P is normal. This distance-based classification allows
the detection of known kinds of anomalies, i.e. anomalous traffic with characteristics similar
to those in the training datasets.
Figure (2): Distance-based classification for two centroids.
The distance-based classification algorithm is shown below:
Algorithm 2 Distance-based classification (D, µk)
Input: testing data set D, set of centroid points µk with their classes
Output: a class label for each data point
Begin
  foreach x ∈ D
    foreach c ∈ µk
      compute the Euclidean distance between x and c
    end
    label x with the class of the centroid at minimum distance from x
  end
End
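Algorithm 2 amounts to a nearest-centroid rule. A minimal sketch follows, assuming the
centroids are stored as (class label, vector) pairs; the two-dimensional centroid values and
the sample points are made up purely for illustration and are not taken from the experiments.

```python
import math


def classify(sample, centroids):
    """Return the class label of the centroid closest to `sample` (Euclidean distance)."""
    label, _ = min(centroids, key=lambda item: math.dist(sample, item[1]))
    return label


# Hypothetical labelled centroids in a two-dimensional feature space.
centroids = [
    ("normal", (0.2, 0.8)),
    ("phase 1", (0.9, 0.1)),
]

test_set = [(0.25, 0.7), (0.8, 0.2)]
predictions = [classify(x, centroids) for x in test_set]
print(predictions)
```

Because only one distance per centroid is computed for each sample, the cost of classification
grows with the number of clusters rather than with the size of the training set.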
3.4. Performance Metrics
"The performance measures for intrusion detection can be calculated by a confusion matrix"
(Tsai C. F. et al., 2010). A confusion matrix summarizes the predictive performance of a
classifier on test data. It is commonly given in a two-class format, but can be generated for
any number of classes (Mukherjee S. et al., 2012). A confusion matrix for two classes is shown
in Table 1.
Table 1: Confusion matrix (Tsai C. F. et al., 2010)
Actual    Predicted
          Normal   Attack
Normal    TN       FP
Attack    FN       TP
To evaluate the results, we use standard metrics: detection rate (DR), false alarm rate (FA),
and accuracy.
A successful anomaly detection algorithm should achieve a high DR, high accuracy, and a low
FA.
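The formulas for these metrics did not survive in this copy of the text, so the sketch below
uses the usual confusion-matrix definitions as an assumption: accuracy = (TP + TN) / total and
false alarm rate = FP / (FP + TN). For the detection rate it prints both TP / (TP + FN), the
common definition, and TP / (TP + FP). The counts are those of Table 3; with them, the accuracy
and false alarm rate agree with the figures reported in Section 4.4, and the reported detection
rate of 51.16279% coincides with TP / (TP + FP).

```python
def binary_metrics(tn, fp, fn, tp):
    """Standard two-class metrics from confusion-matrix counts (all denominators assumed non-zero)."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "false_alarm": fp / (fp + tn),
        "tp_over_tp_fn": tp / (tp + fn),   # detection rate in the usual (recall) sense
        "tp_over_tp_fp": tp / (tp + fp),   # ratio that matches the reported detection rate
    }


# Counts taken from Table 3 (DBSCAN result, normal vs. attack).
metrics = binary_metrics(tn=5770, fp=21, fn=28, tp=22)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.5f}%")
```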
4. Experimental Results
This work was implemented in C# and executed on an Intel(R) Celeron(R) 2.00 GHz processor with
2 GB of main memory under the Windows 7 Ultimate operating system.
4.2 DARPA 2000 Datasets Description
DARPA (Defense Advanced Research Projects Agency) 2000 is an intrusion detection evaluation
data set. It is mainly designed to evaluate the detection probability and false detection
probability of a network security system under test, especially in the field of intrusion
detection research (MITLLab, 2000).
The DARPA 2000 datasets contain multiple specific network attack scenarios. In this research,
the LLDDoS 1.0 scenario is used to evaluate the proposed model. Figure 3 shows the network
structure of this data set.
Figure (3): Network structure used in the DARPA 2000 dataset.
The LLDDoS 1.0 scenario from the 2000 DARPA data sets includes a DDoS attack run by a novice
attacker. This attack scenario is carried out over multiple network and audit sessions. These
sessions have been grouped into five attack phases, as follows:
1. IPsweep of the DMZ hosts from a remote site.
2. Probe of live IPs to look for the sadmind daemon running on Solaris hosts.
3. Break-ins via the sadmind vulnerability, both successful and unsuccessful, on those hosts.
4. Installation of the Trojan mstream DDoS software on three hosts in the DMZ.
5. Launching the DDoS.
This data set has two types of Tcpdump file. One is the DMZ Tcpdump, which is collected at the
sniffer on the DMZ network; the other is the inside Tcpdump, which is collected at the sniffer
on the inside network. In this attack scenario, the attacker only communicates with agent hosts
in the DMZ network and cannot communicate with the victim host in the inside network. For this
reason, we use the DMZ Tcpdump file to detect the DDoS attack in its early phases. "In phase 5
of the attack, packets collected to DMZ Tcpdump are not the attack packet but the response
packets to the spoofed IP of the attack packets" (Lee K. et al., 2008).
4.3 Features Extraction
In the proposed DDoS detection method, the features are extracted based on an analysis of the
DDoS procedure. During this attack, the source IP addresses are generated randomly, and the
destination and source port numbers are also randomized depending on the attack type. To
measure this randomness, entropy is computed on a sample of consecutive packets. In our
experiment, each input variable is calculated over a fixed time interval of 1 second.
[Figure 3 labels: Attacker, Firewall, DMZ network, Firewall, Inside network]
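Figure (4): Extracted features: a) entropy of destination IP, b) entropy of source IP,
c) entropy of destination port, d) entropy of source port, e) number of packets, f) entropy of
packet type, g) occurrence of TCP SYN, h) occurrence of ICMP, i) occurrence of UDP.
[Plot annotations: ICMP Flooding, DDoS Flooding]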
4.4 Clustering by DBSCAN
The basic idea of DDoS detection based on DBSCAN is that most of the data is normal and will
be gathered into a high-density cluster, while the intrusion data is scarce and differs from
the normal data. The DBSCAN algorithm is based on the concepts of density-reachability and
density-connectivity. These concepts depend on two input parameters: epsilon (Eps) and the
minimum number of points (MinPts). Epsilon is the distance around an object that defines its
Eps-neighborhood.
It is difficult to determine accurate values of Eps and MinPts. However, a probable range for
their values can be chosen from experience. We tried 40 values of Eps between 0.1 and 0.5 with
a step of 0.01, and the MinPts values 2, 3, 4, 5, 6, 7, 8, and 9.
After running the algorithm 320 times with different values of Eps and MinPts, we found that
the best accuracy and the maximum number of detected DDoS phases are obtained with Eps = 0.12
and MinPts = 3. The results of the experiment are shown graphically in Figure 5.
Figure (5): Experiments to determine the values of Eps and MinPts.
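The parameter search described above is a plain grid search. The sketch below only illustrates
the enumeration: the grid takes the 40 Eps values to be 0.10, 0.11, ..., 0.49 (an assumption
made so that the count matches the stated 40 values and 320 runs), and `toy_score` is a
hypothetical stand-in for running DBSCAN and measuring accuracy on the labelled training data.

```python
from itertools import product


def parameter_grid():
    """All (Eps, MinPts) combinations: 40 Eps values times 8 MinPts values."""
    eps_values = [i / 100 for i in range(10, 50)]   # 0.10, 0.11, ..., 0.49 (assumed range)
    min_pts_values = list(range(2, 10))             # 2, 3, ..., 9
    return list(product(eps_values, min_pts_values))


def grid_search(score, grid):
    """Return the (eps, min_pts) pair with the highest score."""
    return max(grid, key=lambda params: score(*params))


def toy_score(eps, min_pts):
    # Hypothetical placeholder for "cluster with DBSCAN, then compute accuracy";
    # it is constructed to peak at the values reported in the text.
    return -abs(eps - 0.12) - abs(min_pts - 3)


grid = parameter_grid()
best = grid_search(toy_score, grid)
print(len(grid), best)
```

In a real run, the scoring function would cluster the training features with the given
parameters and compare the resulting clusters against the phase labels of the data set.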
After determining the best values of Eps and MinPts, we found 10 sets; the information
provided with the DARPA 2000 dataset about each phase of the DDoS attack was used to evaluate
the result of the algorithm. Tables 2 and 3 show the confusion matrices for the six classes and
for the binary classes (normal and attack), respectively. The accuracy of the training phase is
99.1611%, and the detection rate and false alarm rate are 51.16279% and 0.362632%,
respectively.
Table 2: Confusion matrix of the DBSCAN algorithm result for the six classes
Actual    Predicted
          Normal  Phase 1  Phase 2  Phase 3  Phase 4  Phase 5  Total
Normal    5770    0        19       2        0        0        5791
Phase 1   0       4        0        0        0        0        4
Phase 2   0       0        13       0        0        0        13
Phase 3   8       0        0        2        0        0        10
Phase 4   8       0        0        0        0        0        8
Phase 5   12      0        0        0        0        3        15
Total     5798    4        32       4        0        3        5841
Table 3: Confusion matrix of the DBSCAN algorithm result for the normal and attack classes
Actual    Predicted
          Normal   Attack
Normal    5770     21
Attack    28       22
4.5 Classification
By clustering, we obtained the set of centroid points, which serves as the pattern for
detecting normal traffic and the phases of the DDoS attack. To evaluate the proposed
classification method, we use one third of the DARPA 2000 data set.
Tables 4 and 5 show the confusion matrices for the six classes and for the binary classes
(normal and attack), respectively. The accuracy of the testing phase is 99.24633%, and the
detection rate and false alarm rate are 61.11111% and 0.280224%, respectively.
Table 4: Confusion matrix of the classification algorithm result for the six classes
Actual    Predicted
          Normal  Phase 1  Phase 2  Phase 3  Phase 4  Phase 5  Total
Normal    2491    0        7        0        0        0        2498
Phase 1   0       7        0        0        0        0        7
Phase 2   0       0        2        0        0        0        2
Phase 3   4       0        0        0        0        0        4
Phase 4   0       0        0        0        0        0        0
Phase 5   8       0        0        0        0        2        10
Total     2503    7        9        0        0        2        2521
Table 5: Confusion matrix of the classification algorithm result for the normal and attack
classes
Actual    Predicted
          Normal   Attack
Normal    2491     7
Attack    12       11
5. Conclusions
In this paper we present a method for the early detection of DDoS attacks using cluster
analysis. Many studies on DDoS attack detection have been carried out; however, they focus only
on changes in network traffic. Methods based on data mining are well suited to this detection
task. Our method first selects nine packet and traffic features that are widely found in the
various phases of the attack. The current network status is then classified to determine the
class to which it belongs. In this way, our method can classify the network status and detect
DDoS attacks early.
6. References
CERT. (1998, January 5). CERT Advisory CA-1998-01 Smurf IP Denial-of-Service
Attacks. Retrieved January 5, 2014, from
http://www.cert.org/advisories/CA-1998-01.html.
Chen, S. W., Wu, J. X., Ye, X. L., & Guo, T. (2013). Distributed Denial of
Service Attacks Detection Method Based on Conditional Random Fields. Journal of
Networks, 8(4).
Douligeris, C., & Mitrokotsa, A. (2004). DDoS attacks and defense mechanisms:
classification and state-of-the-art. Computer Networks, 44(5), 643-666.
Feinstein, L., Schnackenberg, D., Balupari, R., & Kindred, D. (2003, April). Statistical
approaches to DDoS attack detection and response. In DARPA Information
Survivability Conference and Exposition, 2003. Proceedings (Vol. 1, pp. 303-314).
Gavrilis, D., & Dermatas, E. (2005). Real-time detection of distributed denial-of-service
attacks using RBF networks and statistical features. Computer Networks, 48(2),
235-245.
Jin, S., & Yeung, D. S. (2004, June). A covariance analysis model for DDoS attack
detection. In Communications, 2004 IEEE International Conference on (Vol. 4, pp.
1882-1886). IEEE.
Lee, K., Kim, J., Kwon, K. H., Han, Y., & Kim, S. (2008). DDoS attack detection method
using cluster analysis. Expert Systems with Applications, 34(3), 1659-1665.
Liu, Y., Jiang, S., & Huang, J. (2013, August). Anomaly Detection for DDoS Attacks
Based on Gini Coefficient. In 2013 International Conference on Advanced ICT and
Education (ICAICTE-13). Atlantis Press.
MITLLab (2000). 2000 DARPA intrusion detection scenario specific datasets. Retrieved
July 30, 2012, from
http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/2000data.html.
Mukherjee, S., & Sharma, N. (2012). Intrusion detection using naive Bayes classifier
with feature reduction. Procedia Technology, 4, 119-128.
Rahmani, H., Sahli, N., & Kammoun, F. (2009, August). Joint entropy analysis model for
DDoS attack detection. In Information Assurance and Security, 2009. IAS'09. Fifth
International Conference on (Vol. 2, pp. 267-271). IEEE.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical
Journal, vol. 27, pp. 379-423 & 623-656.
Tsai, C. F., & Lin, C. Y. (2010). A triangle area based nearest neighbors approach to
intrusion detection. Pattern Recognition, 43(1), 222-229.
Wan, K. (2001). An infrastructure to defend against distributed denial-of-service attack
(M. Sc. Thesis, The Hong Kong Polytechnic University).
Xia, Z., Lu, S., Li, J., & Tang, J. (2010). Enhancing DDoS flood attack detection via
intelligent fuzzy logic. Informatica (Slovenia), 34(4), 497-507.
Zhong, R., & Yue, G. (2010, April). DDoS detection system based on data mining. In
Proceedings of the Second International Symposium on Networking and Network
Security, Jinggangshan, China (pp. 62-65).
Zi, L., Yearwood, J., & Wu, X. W. (2010, September). Adaptive clustering with feature
ranking for ddos attacks detection. In Network and System Security (NSS), 2010 4th
International Conference on (pp. 281-286). IEEE.