TRANSCRIPT
Adaptive Hybrid Model for Network Intrusion Detection and Comparison Among Machine Learning Algorithms
Md. Enamul Haque
Department of Computer Engineering
King Fahd University of Petroleum and Minerals
Saudi Arabia
[email protected]
Supervised by Dr. Talal Alkharobi
May 21, 2014
Md. Enamul Haque (KFUPM) COE 551 May 21, 2014 1 / 28
Coming Up
Today's agenda.
Network Intrusion Detection.
Objective.
Proposed Model.
Algorithms Used.
Classifier Overview.
Dataset Description.
Results.
Conclusion.
Network Intrusion Detection
Let's review

Host-based intrusion detection: monitors and analyzes the internal interfaces.
Network-based intrusion detection:
Misuse based: searches for known intrusive patterns.
Anomaly based: supervised, unsupervised, and hybrid anomaly detection.
Attack Types
Broad category
DOS: Denial of service.
R2L: Unauthorized access to the local system from a remote host.
U2R: Unauthorized access to the root of a local system.
Probe: Sensing network from outside to detect vulnerabilities.
Anomaly Types
Broad classification

Table: Anomaly Types

Attack Type | Exploits
DOS         | back, land, neptune, pod, smurf, teardrop
U2R         | buffer overflow, load module, perl, rootkit
R2L         | ftp write, guess pass, imap, multi hop, phf, spy, warezclient, warezmaster
Probe       | ip sweep, saint, satan, nmap
Exploits Category
Sample Information

Feature name  | Description                                                | Type
duration      | length (number of seconds) of the connection               | Continuous
protocol type | type of the protocol, e.g. tcp, udp, icmp                  | Discrete
land          | 1 if connection is from/to the same host/port; 0 otherwise | Discrete
urgent        | number of urgent packets                                   | Continuous
hot           | number of "hot" indicators                                 | Continuous
Objective
Let's be clear about what we wanted to do.

We have intrusion-classified data and incoming traffic.
Classify the incoming traffic to detect any abnormality.
If an abnormality is present, classify it into a specific category.
Motivation
Reinventing the wheel, or what?

Build a more accurate prediction model.
Adaptive learning for the model.
Detect novel intrusions.
Performance comparison among existing learning models.
Artificial Neural Networks and Support Vector Machines have already been used.
Proposed Model
Overview

Figure: Network Intrusion Detection Model
Algorithm
In brief

Figure: Network Intrusion Detection Model
Classifiers Used
Three major classifiers were used.

Figure: Classifiers
Naive Bayes Classifier
How does it work, in simple terms?

The value of a particular feature is unrelated to the presence or absence of any other feature, given the class variable.
Example: a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter.
Each of these features is considered to contribute independently to the probability that this fruit is an apple,
regardless of the presence or absence of the other features.
It can be trained very efficiently in a supervised learning setting.
It requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification.
Naive Bayes Classifier
How does it work, in mathematical terms?

Bayes' theorem:

$$p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}$$

$$\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}$$

In our problem:
C = Anomaly / Normal
F_1, ..., F_n = the features
n = number of features

Figure: Prediction based on recent events
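The independence assumption above can be sketched in a few lines of Python. This is a minimal Gaussian Naive Bayes illustration on hypothetical toy data, not the thesis model or the actual KDD features: each class gets a prior plus one independent Gaussian per feature, and prediction picks the class with the largest log posterior.

```python
import math

# Minimal Gaussian Naive Bayes sketch (toy data, not the real dataset).
def fit(X, y):
    params = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)          # p(C)
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9  # avoid zero variance
            stats.append((mean, var))
        params[c] = (prior, stats)
    return params

def predict(params, x):
    def log_posterior(c):
        prior, stats = params[c]
        lp = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            # log of the Gaussian likelihood for this feature, taken independently
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return lp
    return max(params, key=log_posterior)

X = [[0.1, 1.0], [0.2, 1.1], [5.0, 9.0], [5.2, 8.8]]
y = ["normal", "normal", "anomaly", "anomaly"]
model = fit(X, y)
print(predict(model, [0.15, 1.05]))  # → normal
```

Working in log space avoids the numerical underflow that multiplying many small likelihoods would cause; the denominator p(F_1, ..., F_n) is the same for every class, so it can be dropped when comparing posteriors.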
Random Forests
How does it work?

Training set X = x_1, ..., x_n with class labels / responses Y = y_1, ..., y_n.
For b = 1, ..., B: sample with replacement from the n training examples (X, Y); call these (X_b, Y_b).
Train a decision or regression tree f_b on (X_b, Y_b).
Predictions for an unseen sample x' are made by averaging the predictions from all the individual trees on x':

$$\hat{f}(x') = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x')$$

Figure: Random forests
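The bagging-and-averaging step above can be sketched as follows. This is an illustrative toy, not the thesis model: it uses one-feature depth-1 "stumps" as the base trees, and it omits the per-split random feature subsampling that full random forests add on top of bagging.

```python
import random

# Sketch of bagging: B bootstrap samples, one regression stump per
# sample, predictions averaged (f_hat(x') = (1/B) * sum_b f_b(x')).
def train_stump(X, y):
    # pick the threshold on the single feature that minimizes squared error
    best = None
    for t in sorted(set(X)):
        left = [yi for xi, yi in zip(X, y) if xi <= t] or [0.0]
        right = [yi for xi, yi in zip(X, y) if xi > t] or [0.0]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((yi - (lmean if xi <= t else rmean)) ** 2 for xi, yi in zip(X, y))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def random_forest_predict(X, y, x_new, B=25, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(B):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        preds.append(train_stump(Xb, yb)(x_new))
    return sum(preds) / B  # average over the B trees

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(random_forest_predict(X, y, 5.5))  # typically close to 1.0
```

Averaging many trees trained on different bootstrap samples reduces the variance of a single deep tree, which is the main reason random forests generalize well.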
k-Nearest Neighbor
Instance-based k-NN (IBk)

Classify an unknown example with the most common class among its k closest examples.
"Tell me who your neighbors are, and I will tell you who you are!"
Example: k = 3, 2 sea bass, 1 salmon.
Classified as sea bass.

Figure: Simple example of the idea.
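The majority-vote idea above fits in a few lines. A minimal sketch on hypothetical toy points (not the real traffic data), reusing the sea bass / salmon example:

```python
import math
from collections import Counter

# Label an unknown point by majority vote among its k nearest neighbors.
def knn_classify(train, query, k=3):
    # sort the labeled points by Euclidean distance to the query
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "sea bass"), ((1.2, 0.9), "sea bass"),
         ((1.1, 1.2), "sea bass"), ((3.0, 3.0), "salmon"),
         ((3.2, 2.8), "salmon")]
print(knn_classify(train, (1.05, 1.0), k=3))  # → sea bass
```

Note that k-NN does no training at all; every query pays the cost of scanning the stored instances, which is why WEKA files it under "lazy" classifiers.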
k-NN Distance Selection
Worst-case scenario

Feature 1 gives the correct class: 1 or 2.
Feature 2 gives an irrelevant number from 100 to 200.
Training dataset: [1 150], [2 110].
Classify [1 100]:

$$D([1\ 100], [1\ 150]) = \sqrt{(1-1)^2 + (100-150)^2} = 50 \tag{1}$$

$$D([1\ 100], [2\ 110]) = \sqrt{(1-2)^2 + (100-110)^2} \approx 10.05 \tag{2}$$

[1 100] is misclassified!
The denser the samples, the less severe this problem.
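The two distances in Eqs. (1) and (2) can be checked directly:

```python
import math

# The irrelevant second feature dominates the Euclidean distance.
d_same_class = math.dist([1, 100], [1, 150])   # Eq. (1): same class
d_other_class = math.dist([1, 100], [2, 110])  # Eq. (2): different class
print(d_same_class)            # 50.0
print(round(d_other_class, 2)) # 10.05
# d_other_class < d_same_class, so the nearest neighbor of [1 100]
# is the class-2 example: misclassified.
```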
k-NN: Feature Normalization
Equalizing the scale of the features

Notice that the two features are on different scales:
The first feature takes values between 1 and 2.
The second feature takes values between 100 and 200.
Idea: normalize the features to be on the same scale.
There are different normalization approaches.
Linearly scale the range of each feature to be, say, in the range [0, 1]:

$$f_{\text{new}} = \frac{f_{\text{old}} - f_{\text{old}}^{\min}}{f_{\text{old}}^{\max} - f_{\text{old}}^{\min}} \tag{3}$$
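Eq. (3) applied to a column of values, using the slide's second feature as sample input:

```python
# Min-max scaling per Eq. (3): map each feature linearly onto [0, 1].
def min_max_scale(column):
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

feature2 = [150, 110, 100, 200]
print(min_max_scale(feature2))  # [0.5, 0.1, 0.0, 1.0]
```

After scaling, both features contribute comparably to the Euclidean distance, which removes the misclassification shown in the worst-case example.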
k-NN: How to Choose k?
Is there any standard?

Figure: Sometimes, due to noise, 1-NN gives an erroneous outcome.
Figure: 3-NN provides better classification accuracy than 1-NN in this case.

Rule of thumb: k < √n, where n is the number of examples.
In practice, k = 1 is often used for efficiency, but it can be sensitive to noise.
A larger k may improve performance.
Dataset Distributions
Anomaly and normal quantities

Table: Dataset Used in the Experiment

Category | No. of Instances
Normal   | 67343
Anomaly  | 58630
Total    | 125973

Table: Distribution of Reduced Dataset for Anomaly Class

Category | No. of Instances
DOS      | 9234
U2R      | 11
R2L      | 209
Probe    | 2289
Feature Reduction
Too much for computation

Table: Feature Reduction

Attribute Evaluator | Search Method       | No. of Selected Attributes | Selected Attributes
CFS                 | Genetic Search      | 15 | 4,5,6,8,10,12,17,23,26,29,30,32,37,38,39
CFS                 | PSO Search          | 9  | 4,5,6,12,26,29,30,37,39
CFS                 | Best First          | 6  | 4,5,6,12,26,30
CFS                 | Evolutionary Search | 18 | 3,4,5,6,8,17,19,23,25,26,29,30,33,34,37,38,39,41
Consistency Subset  | Greedy Stepwise     | 10 | 1,3,4,5,14,23,32,34,35,37

Reduce the features without affecting accuracy, to lower the computational cost.
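Applying one of the selected subsets is just column selection. A sketch using the CFS + Best First row, on a hypothetical dummy record (the attribute numbers on the slide are 1-based, as in WEKA):

```python
# Keep only the selected attributes (given as 1-based indices) of each row.
def select_attributes(rows, selected_1based):
    idx = [i - 1 for i in selected_1based]          # convert to 0-based
    return [[row[i] for i in idx] for row in rows]

row = list(range(1, 42))  # one dummy record with 41 features
reduced = select_attributes([row], [4, 5, 6, 12, 26, 30])
print(reduced)  # [[4, 5, 6, 12, 26, 30]]
```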
Detailed Accuracy by Class
10-fold Cross-Validation for Random Forest

Table: Detailed Accuracy by Class: 10-fold Cross-Validation for Random Forest

TP Rate | FP Rate | Precision | Recall | F-Measure | MCC   | ROC Area | PRC Area | Class
0.999   | 0.002   | 0.998     | 0.999  | 0.999     | 0.998 | 1.000    | 1.000    | normal
0.998   | 0.001   | 0.999     | 0.998  | 0.999     | 0.998 | 1.000    | 1.000    | anomaly

Table: Confusion Matrix for Random Forest

a     | b     | Classified As
67308 | 35    | a = normal
117   | 58513 | b = anomaly
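The headline rates can be recomputed from the confusion matrix, taking anomaly as the positive class:

```python
# Counts from the confusion matrix above (anomaly = positive class).
tn, fp = 67308, 35      # actual normal:  predicted normal / anomaly
fn, tp = 117, 58513     # actual anomaly: predicted normal / anomaly

accuracy = (tp + tn) / (tp + tn + fp + fn)
tp_rate_anomaly = tp / (tp + fn)    # recall for the anomaly class
precision_anomaly = tp / (tp + fp)
print(round(accuracy, 4))           # 0.9988
print(round(tp_rate_anomaly, 3))    # 0.998
print(round(precision_anomaly, 3))  # 0.999
```

The recomputed TP rate (0.998) and precision (0.999) match the anomaly row of the per-class table.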
Classification Accuracy
Based on the confusion matrix

Figure: Classification accuracy for different learning/classification algorithms (Naive Bayes, PART, Random Forest, Grading, AdaBoost, IBk); y-axis: accuracy (%), 0 to 100. The major parameters were tuned for each execution.
Tools and Equipment
Those that came in handy

KDD Cup 1999 dataset.
MySQL: data preprocessing.
MATLAB: algorithm testing and graph generation.
WEKA 3.7.9: where the actual classification was performed.
References
Herrero, Álvaro, et al. RT-MOVICAB-IDS: Addressing real-time intrusion detection. Future Generation Computer Systems 29.1 (2013): 250-261.

McHugh, John. Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Transactions on Information and System Security 3.4 (2000): 262-294.

Tavallaee, Mahbod, et al. A detailed analysis of the KDD CUP 99 data set. Proceedings of the Second IEEE Symposium on Computational Intelligence for Security and Defence Applications. 2009.

Kim, Gisung, Seungmin Lee, and Sehun Kim. A novel hybrid intrusion detection method integrating anomaly detection with misuse detection. Expert Systems with Applications 41.4 (2014): 1690-1700.

Luo, Bin, and Jingbo Xia. A novel intrusion detection system based on feature generation with visualization strategy. Expert Systems with Applications (2014).

Fung, Carol J., and Raouf Boutaba. Design and management of collaborative intrusion detection networks. Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on. IEEE, 2013.
Future Directions
Let's think about the next level!

Classify the anomaly class into further specific divisions.
Use unsupervised learning methods.
Develop a knowledge base.
Questions? Suggestions?