Data Mining for Network Intrusion Detection
Paul Dokas, Levent Ertoz, Vipin Kumar, Aleksandar Lazarevic, Jaideep Srivastava, Pang-Ning Tan
Computer Science Department, University of Minnesota
Presented By: [email protected]
CS685 Presentation
Outline
• Motivation
• Related Work
• Detection Models and Approaches
• Experimental Evaluation
• Conclusion
Motivation
• Organizations are becoming increasingly vulnerable to potential cyber threats, e.g., network intrusions.
[Figure: Incidents reported to the Computer Emergency Response Team/Coordination Center (CERT/CC) per year, 1990 to 2001; vertical axis from 0 to 60,000 incidents.]
Motivation (cont.)
• Intrusion Detection System (IDS)
  • collect signatures of known attacks
  • input attack signatures into IDS signature databases
  • extract features from various audit streams
  • compare these features with attack signatures
  • raise an alarm when a possible intrusion occurs
• Limitations of traditional signature-based methods
  • manual update of the signature database
  • inability to detect emerging cyber threats
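The signature-based workflow above can be sketched roughly as follows. This is a minimal illustration, not how any particular IDS works: the `Signature` fields, the two database entries, and the record layout are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """One hand-written attack signature (hypothetical format)."""
    name: str
    dst_port: int
    payload_substring: bytes


# Hypothetical signature database; in practice it is updated manually.
SIGNATURE_DB = [
    Signature("example-web-exploit", 80, b"/cgi-bin/phf"),
    Signature("example-ftp-probe", 21, b"SITE EXEC"),
]


def extract_features(record):
    """Pull the fields the signatures look at out of an audit record."""
    return record["dst_port"], record["payload"]


def match_signatures(record, db=SIGNATURE_DB):
    """Return the names of all signatures that the record matches."""
    dst_port, payload = extract_features(record)
    return [
        s.name
        for s in db
        if s.dst_port == dst_port and s.payload_substring in payload
    ]


known = {"dst_port": 80, "payload": b"GET /cgi-bin/phf?Qalias=x HTTP/1.0"}
novel = {"dst_port": 80, "payload": b"GET /new-unseen-exploit HTTP/1.0"}

alerts_known = match_signatures(known)  # matches a stored signature
alerts_novel = match_signatures(novel)  # no signature yet, so no alarm
```

The second record shows the limitation listed on the slide: an attack with no stored signature raises no alarm until someone adds one by hand.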
Motivation (cont.)
Why data mining?
• large volumes of network data
• different data mining techniques: clustering, classification
Related Work
Data mining based intrusion detection techniques
• anomaly detection
  • Build models of normal data
  • Detect any deviation from normal data
  • Flag deviation as suspect
  • Identify new types of intrusions as deviations from normal behavior
• misuse detection
  • Label all instances in the data set (“normal” or “intrusion”)
  • Run learning algorithms over the labeled data to generate classification rules
  • Automatically retrain intrusion detection models on different input data
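As a rough illustration of the misuse detection steps (label the data, learn a classification rule, reuse the learner on new data), the sketch below fits a single threshold rule. The one-rule learner, the two features, and the toy values are assumptions chosen for brevity; the slides do not prescribe a specific learner here.

```python
def learn_threshold_rule(records, labels):
    """Return (feature_index, threshold) for the rule
    'record[feature_index] > threshold => intrusion' with the fewest
    training errors."""
    best = None  # (errors, feature_index, threshold)
    n_features = len(records[0])
    for f in range(n_features):
        for t in sorted({r[f] for r in records}):
            errors = sum(
                (r[f] > t) != (y == "intrusion")
                for r, y in zip(records, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def classify(rule, record):
    """Apply a learned threshold rule to one record."""
    f, t = rule
    return "intrusion" if record[f] > t else "normal"


# Toy labeled records: (duration_seconds, failed_logins); values are made up.
train = [(2, 0), (5, 0), (3, 1), (4, 6), (1, 8), (6, 7)]
labels = ["normal", "normal", "normal",
          "intrusion", "intrusion", "intrusion"]

rule = learn_threshold_rule(train, labels)  # retrain by calling this again
prediction = classify(rule, (3, 9))
```

Retraining on a different labeled data set is just another call to `learn_threshold_rule`, which is the point of the last bullet.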
Related Work --- misuse detection
• Classification Model
  • Bayesian classifier
  • Decision tree
  • Association rule
  • Support vector machine
  • Learning from rare class
Related Work --- anomaly detection
• Anomaly Detection Model
  • Association rule
  • Neural network
  • Unsupervised SVM
  • Outlier detection
Detection Models
• misuse detection: rare class prediction model
  • known intrusions and their variations
• anomaly detection: outlier detection model
  • novel attacks whose nature is unknown
Learning from Rare Class
• Problem: how to build a classification model for a data set with a skewed class distribution?
  • intrusion class << normal class
  • Mining a needle in a haystack
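A small worked example of why a skewed class distribution is a problem. The 1% intrusion rate below is an assumed figure for illustration only, not a number from the presentation.

```python
def evaluate(y_true, y_pred, positive="intrusion"):
    """Return (accuracy, precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Assumed toy data: 10 intrusions hidden among 990 normal connections.
y_true = ["intrusion"] * 10 + ["normal"] * 990

# A useless classifier that never raises an alarm.
always_normal = ["normal"] * 1000

accuracy, precision, recall = evaluate(y_true, always_normal)
```

The classifier that never flags anything is right on 990 of 1000 records, yet it finds none of the intrusions, so accuracy alone says little about rare-class performance.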
Learning from Rare Class (cont.)
• Novel classification algorithms
  • PN-rule
    • P-rule: covers most of the intrusive examples
    • N-rule: eliminates false alarms
  • SMOTEBoost
    • SMOTE (Synthetic Minority Over-sampling TEchnique)
    • Boosting
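The over-sampling half of SMOTEBoost can be sketched as below. This is a simplified take on SMOTE only: the base point is picked at random rather than by iterating over every minority example, the boosting rounds are not shown, and the parameter names `n_synthetic`, `k`, and `seed` are assumptions of this sketch.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority examples by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority (intrusion) examples with two numeric features.
minority = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
new_points = smote(minority, n_synthetic=6)
```

Every synthetic point lies on a line segment between two real minority examples, so the minority region is filled in rather than merely duplicated.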
Anomaly Detection
• Novel attacks/intrusions: deviation from normal behavior
• Outlier detection algorithms
  • Nearest neighbor approach
  • Distance based approach
  • Density based approach
  • Unsupervised support vector machines
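One common way to turn the nearest neighbor idea into an outlier score is the distance to the k-th nearest neighbor. The sketch below assumes Euclidean distance and `k=2`; both choices, and the toy points, are illustrative rather than taken from the slides.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbour;
    larger scores mean the point is further from everything else."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, q) for j, q in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores


# Four points in a tight cluster and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8)]
scores = knn_outlier_scores(points, k=2)
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

A threshold on the score, or a ranking of the top scores, then decides which points get flagged.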
Anomaly Detection
• Density based approach (LOF)
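A compact sketch of the local outlier factor (LOF) computation, for illustration. It assumes Euclidean distance, no duplicate points, and a neighbourhood cut to exactly `k` points (the original definition keeps every point tied at the k-distance).

```python
import math


def lof_scores(points, k=2):
    """Return the local outlier factor of every point.
    Values near 1 mean 'as dense as its neighbours'; values well
    above 1 mean the point sits in a sparser region than they do."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    # Distance from each point to its k-th nearest neighbour.
    k_distance = [dist[i][neighbours[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        """Reachability distance of point i with respect to point j."""
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [
        k / sum(reach_dist(i, j) for j in neighbours[i])
        for i in range(n)
    ]
    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]


points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

Because the score compares a point's density with that of its own neighbours, it adapts to clusters of different densities, which a single global distance threshold does not.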
Anomaly Detection
• Identify normal behavior
• Construct a useful set of features
• Define a similarity function
• Flag deviation as suspect
Experimental Evaluation
• Public data sets
  • DARPA 1998 Intrusion Detection Evaluation Data Set
    • prepared and managed by MIT Lincoln Lab
    • training data and test data
  • KDD Cup 1999 Data
    • an extension of DARPA’98
    • training data and test data
• Real network data
  • Network data from University of Minnesota
Experimental Evaluation --- feature construction
Purpose: derive a more informative data set from the public data set
Method:
• connection records
• label connection records ‘normal’ or ‘intrusion’
• features for each connection record
  • # of {packets, bytes}, {ACK, Re-Tx} packets, SYN/FIN, …
  • time-based features (DoS attacks)
  • connection-based features (PROBING attacks)
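The two derived feature families might be computed along the following lines. The exact feature definitions are not given on the slide, so this sketch assumes one plausible pair: connections to the same destination within the last `window_seconds`, and connections to the same destination among the last `last_n` connections. The record layout and both parameter values are likewise assumptions.

```python
from collections import deque


def time_based_counts(connections, window_seconds=2.0):
    """For each connection (timestamp, src, dst), sorted by timestamp,
    count earlier connections to the same dst within the time window."""
    recent = deque()
    counts = []
    for ts, src, dst in connections:
        # Drop connections that have fallen out of the time window.
        while recent and ts - recent[0][0] > window_seconds:
            recent.popleft()
        counts.append(sum(1 for _, _, d in recent if d == dst))
        recent.append((ts, src, dst))
    return counts


def connection_based_counts(connections, last_n=100):
    """Count earlier connections to the same dst among the last_n
    connections, regardless of how much time has passed."""
    recent = deque(maxlen=last_n)
    counts = []
    for _, _, dst in connections:
        counts.append(sum(1 for d in recent if d == dst))
        recent.append(dst)
    return counts


# Toy connection records; hosts and timestamps are made up.
conns = [
    (0.0, "a", "h1"),
    (0.5, "b", "h1"),
    (1.0, "c", "h1"),
    (1.2, "d", "h2"),
    (5.0, "e", "h1"),
]
time_counts = time_based_counts(conns, window_seconds=2.0)
conn_counts = connection_based_counts(conns, last_n=3)
```

The last record shows the difference: a connection arriving long after the burst has a time-based count of zero but still a non-zero connection-based count, which is why a window over connections suits slow probing better than a window over time.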
Experimental Evaluation --- single connection attacks
[Figure: ROC curves for different outlier detection techniques on single connection attacks. Horizontal axis: False Alarm Rate (0 to 0.1); vertical axis: Detection Rate (up to 1). Curves: LOF approach, NN approach, Mahalanobis approach, Unsupervised SVM.]
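For reference, the points of an ROC curve like the ones in these experiments can be derived from outlier scores and ground-truth labels as sketched below. The scores and labels are invented, and tied scores are not treated specially in this simplified version.

```python
def roc_points(scores, is_attack):
    """Return (false_alarm_rate, detection_rate) pairs obtained by
    lowering the alarm threshold one example at a time."""
    ranked = sorted(zip(scores, is_attack), key=lambda pair: -pair[0])
    n_attack = sum(is_attack)
    n_normal = len(is_attack) - n_attack
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, attack in ranked:
        if attack:
            tp += 1  # an attack is now above the threshold: detected
        else:
            fp += 1  # a normal record is now above it: false alarm
        points.append((fp / n_normal, tp / n_attack))
    return points


# Made-up outlier scores; higher means more anomalous.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
is_attack = [True, True, False, True, False]
points = roc_points(scores, is_attack)
```

A technique whose curve rises toward a detection rate of 1 while the false alarm rate is still small ranks attacks above normal traffic more reliably.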
Experimental Evaluation --- bursty attacks
[Figure: ROC curves for different outlier detection techniques on bursty attacks. Horizontal axis: False Alarm Rate (0 to 0.12); vertical axis: Detection Rate (up to 1). Curves: Unsupervised SVM, LOF approach, Mahalanobis approach, NN approach.]
Experimental Evaluation --- real network data
• Why? Limitations of the DARPA’98 data set
• How? Detect network intrusions in live network traffic
• Result? Successfully identified some novel intrusions (top ranked outliers)
Conclusion
• promising intrusion detection models
• performance of algorithm (on-line detection)
• new classification and anomaly detection algorithms
Thanks!
Questions?