![Page 1: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/1.jpg)
Data Mining Based Intrusion Detection System
Krishna C Surendra Babu
![Page 2: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/2.jpg)
Papers:
• A Data Mining Framework for Building Intrusion Detection Models (Wenke Lee, Salvatore J. Stolfo). Research supported in part by grants from DARPA.
• Creation and Deployment of Data Mining-Based Intrusion Detection Systems in Oracle Database 10g
![Page 3: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/3.jpg)
Intrusion Detection System
Intrusion detection techniques:
• Anomaly detection
• Misuse detection
Attack categories:
• DOS
• Probing
• Unauthorized access to local super user (U2R)
• Unauthorized access from a remote machine (R2L)
![Page 4: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/4.jpg)
Requirements:
• Reliable
• Extensible
• Easy to manage
• Low maintenance cost
![Page 5: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/5.jpg)
Data Mining
Data mining refers to extracting or mining knowledge from large amounts of data.
Data Warehouse
A data warehouse is a repository of information collected from multiple sources.
A Data Mining Framework for Building Intrusion Detection Models
![Page 6: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/6.jpg)
![Page 7: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/7.jpg)
Why Data Mining?
• The dataset is large.
• Constructing an IDS manually is expensive and slow.
• Updates are frequent, since new intrusions occur frequently.
![Page 8: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/8.jpg)
Challenges for Data Mining in Building IDS
• Develop techniques to automate the processing of knowledge-intensive feature selection.
• Customize the general algorithms to incorporate domain knowledge so that only relevant patterns are reported.
• Compute detection models that are accurate and efficient at run time.
![Page 9: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/9.jpg)
Mining the Data
Dataset types:
• Network-based dataset
• Host-based dataset
Build the IDS by mining the records. When an attack is detected, raise an alarm to the administration system.
![Page 10: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/10.jpg)
Framework of Building IDS
• Preprocessing: summarize the raw data.
• Association rule mining.
• Find sequence patterns (frequent episodes) based on the association rules.
• Construct new features based on the sequence patterns.
• Construct classifiers on different sets of features.
![Page 11: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/11.jpg)
Preprocessing
Summarize raw data into high-level events, e.g., network connections described by time, duration, service, host, and destination.
Bro and NFR packet-filtering techniques can be used.
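The summarization step can be sketched in a few lines of Python. This is only an illustration, not how Bro or NFR work: the packet tuples and the field names (`src_host`, `dst_host`, `packets`) are assumptions, and a real tool would track protocol state rather than simply grouping by endpoints and service.

```python
from collections import defaultdict

# Hypothetical packet summaries: (timestamp, src_host, dst_host, service)
packets = [
    (0.0, "10.0.0.5", "10.0.0.9", "http"),
    (0.4, "10.0.0.5", "10.0.0.9", "http"),
    (1.5, "10.0.0.5", "10.0.0.9", "http"),
    (0.2, "10.0.0.7", "10.0.0.9", "telnet"),
]

def summarize_connections(packets):
    """Group packets by (src, dst, service) and emit one record per group."""
    groups = defaultdict(list)
    for ts, src, dst, service in packets:
        groups[(src, dst, service)].append(ts)
    records = []
    for (src, dst, service), times in groups.items():
        start = min(times)
        records.append({
            "time": start,                    # first packet seen
            "duration": max(times) - start,   # last packet minus first packet
            "service": service,
            "src_host": src,
            "dst_host": dst,
            "packets": len(times),
        })
    return sorted(records, key=lambda r: r["time"])

connections = summarize_connections(packets)
```

Each resulting record is one high-level event that the later mining steps can treat as a row in a table.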
![Page 12: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/12.jpg)
![Page 13: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/13.jpg)
Classification
Classify each audit record into one of a discrete set of possible categories: normal or a particular kind of intrusion.
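One simple way to picture such a classifier is an ordered rule list. The sketch below is an assumption made for illustration: the rules, thresholds, and field names (`flag`, `same_host_count`, `failed_logins`) are hypothetical and are not taken from the papers.

```python
# Hypothetical ordered rules: (condition, label). Thresholds are illustrative only.
RULES = [
    (lambda r: r["flag"] == "S0" and r["same_host_count"] > 50, "dos"),
    (lambda r: r["failed_logins"] >= 3, "r2l"),
]

def classify(record, rules=RULES, default="normal"):
    """Return the label of the first matching rule, or `default` if none match."""
    for condition, label in rules:
        if condition(record):
            return label
    return default

records = [
    {"flag": "S0", "same_host_count": 120, "failed_logins": 0},
    {"flag": "SF", "same_host_count": 1, "failed_logins": 4},
    {"flag": "SF", "same_host_count": 2, "failed_logins": 0},
]
labels = [classify(r) for r in records]
```

In a data mining setting the rules would be learned from labeled audit records rather than written by hand.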
![Page 14: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/14.jpg)
Association Rule Mining
Searches for interesting relationships among attributes in a given data set, i.e., derives multi-feature (attribute) correlations from a database table.
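As a minimal sketch of what such a correlation looks like, the following computes the standard support and confidence measures for a rule over a small table. The records and attribute names are made up for illustration.

```python
def support(records, itemset):
    """Fraction of records that contain every (attribute, value) pair in itemset."""
    hits = sum(1 for r in records if all(r.get(k) == v for k, v in itemset.items()))
    return hits / len(records)

def confidence(records, lhs, rhs):
    """support(lhs and rhs) / support(lhs); 0.0 if lhs never occurs."""
    s_lhs = support(records, lhs)
    if s_lhs == 0:
        return 0.0
    return support(records, {**lhs, **rhs}) / s_lhs

# Hypothetical connection records
records = [
    {"service": "http", "flag": "SF", "dst_host": "A"},
    {"service": "http", "flag": "SF", "dst_host": "A"},
    {"service": "http", "flag": "REJ", "dst_host": "B"},
    {"service": "telnet", "flag": "SF", "dst_host": "A"},
]

# Rule: service=http -> flag=SF
rule_support = support(records, {"service": "http", "flag": "SF"})
rule_confidence = confidence(records, {"service": "http"}, {"flag": "SF"})
```

Here the rule holds in 2 of 4 records (support 0.5) and in 2 of the 3 http records (confidence about 0.67).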
![Page 15: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/15.jpg)
Sequence Pattern Mining
Frequent episodes: X, Y -> Z, [c, s, w]
Given the occurrence of itemsets X and Y, Z will occur within time window w, with confidence c and support s.
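A simplified sketch of how c and s could be estimated for one episode rule is shown below. It assumes one window of width w starting at each event and an in-order match inside the window; the published frequent episode algorithms define windows and occurrences more carefully, so treat this as an approximation with made-up events.

```python
def contains_in_order(window_items, episode):
    """True if the episode's items appear in window_items in the given order."""
    it = iter(window_items)
    return all(item in it for item in episode)

def episode_rule_stats(events, lhs, rhs, w):
    """events: time-sorted list of (timestamp, item). Returns (confidence, support)."""
    # One window per event: everything from that event up to w time units later.
    windows = [
        [item for ts, item in events[i:] if ts - start <= w]
        for i, (start, _) in enumerate(events)
    ]
    n_lhs = sum(contains_in_order(win, lhs) for win in windows)
    n_all = sum(contains_in_order(win, lhs + rhs) for win in windows)
    support = n_all / len(windows)
    confidence = n_all / n_lhs if n_lhs else 0.0
    return confidence, support

# Hypothetical event stream
events = [
    (0.0, "X"), (0.5, "Y"), (1.0, "Z"),
    (5.0, "X"), (5.5, "Y"),
    (9.0, "X"), (9.4, "Y"), (9.9, "Z"),
]
c, s = episode_rule_stats(events, ["X", "Y"], ["Z"], w=2.0)
```

In this stream X, Y is followed by Z in two of the three windows where X, Y occurs, so the confidence is about 0.67, and the full episode appears in 2 of 8 windows.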
![Page 16: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/16.jpg)
Feature Construction
Feature extraction is the process of determining which evidence taken from raw audit data is most useful for analysis.
Construct new features according to the frequent episodes:
• Some features show a close relationship to each other; combine those features.
• Some frequent episodes may indicate interesting new features.
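As an illustration, suppose a frequent episode shows many connections to the same host with the same connection flag in a short time. A sketch of turning that pattern into per-connection features follows; the 2-second window, the `S0` flag value, and the feature names are assumptions chosen for the example.

```python
from collections import deque

def add_window_features(connections, window=2.0):
    """Annotate each time-sorted connection with statistics over the earlier
    connections to the same destination host in the preceding `window` seconds."""
    recent = deque()  # earlier connections still inside the time window
    out = []
    for conn in connections:
        while recent and conn["time"] - recent[0]["time"] > window:
            recent.popleft()
        same_host = [c for c in recent if c["dst_host"] == conn["dst_host"]]
        s0 = sum(1 for c in same_host if c["flag"] == "S0")
        out.append({
            **conn,
            "same_host_count": len(same_host),
            "same_host_s0_rate": s0 / len(same_host) if same_host else 0.0,
        })
        recent.append(conn)
    return out

# Hypothetical connection records
connections = [
    {"time": 0.0, "dst_host": "A", "flag": "S0"},
    {"time": 0.5, "dst_host": "A", "flag": "SF"},
    {"time": 1.0, "dst_host": "B", "flag": "SF"},
    {"time": 1.5, "dst_host": "A", "flag": "SF"},
    {"time": 4.0, "dst_host": "A", "flag": "SF"},
]
featured = add_window_features(connections)
```

The two new columns can then be fed to the classifier alongside the original connection attributes.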
![Page 17: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/17.jpg)
Build Model (classifier)
Build different classifiers for different attacks.
![Page 18: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/18.jpg)
Experiments
The DARPA data: 4 GB of compressed tcpdump data covering 7 weeks of network traffic.
Contains 4 main categories of attacks:
• DOS: denial of service, e.g., ping-of-death, SYN flood
• R2L: unauthorized access from a remote machine, e.g., guessing password
• U2R: unauthorized access to local super user privileges by a local unprivileged user, e.g., buffer overflow
• PROBING: e.g., port scan, ping sweep
![Page 19: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/19.jpg)
Results
• Training on the 7 weeks of labeled data and testing on the 2 weeks of unlabeled data.
• The test data contains 14 attack types that do not exist in the training data.
Comparing 4 methods:
• Columbia: the IDS developed according to the framework introduced above
• Group 1-3: three systems developed by knowledge engineering approaches
![Page 20: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/20.jpg)
![Page 21: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/21.jpg)
Results
Detection rate on new and old attacks:
• Old attacks: attack types that occur in both training and testing data.
• New attacks: attack types that occur in testing data only.
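The old/new split can be computed as sketched below. The attack names and outcomes are invented for the example and are not results from the paper.

```python
def detection_rates(results, training_attack_types):
    """results: list of (attack_type, detected) for attack instances in the test data.
    Returns the detection rate for old and new attack types separately."""
    tallies = {"old": [0, 0], "new": [0, 0]}  # [detected, total]
    for attack_type, detected in results:
        group = "old" if attack_type in training_attack_types else "new"
        tallies[group][0] += int(detected)
        tallies[group][1] += 1
    return {g: (d / t if t else None) for g, (d, t) in tallies.items()}

# Hypothetical data: types seen in training, and per-instance test outcomes
training_attack_types = {"smurf", "portsweep"}
results = [
    ("smurf", True), ("smurf", True), ("portsweep", True), ("portsweep", False),
    ("mailbomb", True), ("mailbomb", False), ("snmpguess", False), ("snmpguess", False),
]
rates = detection_rates(results, training_attack_types)
```

Reporting the two rates separately shows how well a model generalizes beyond the attack types it was trained on.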
![Page 22: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/22.jpg)
Creation and Deployment of Data Mining-Based Intrusion Detection Systems in Oracle Database 10g
DAID: a database-centric architecture that leverages data mining within the Oracle RDBMS to address the challenges.
• Scheduling capabilities
• Alert infrastructure
• Data analysis tools
• Security
• Scalability
• Reliability
![Page 23: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/23.jpg)
Requirements for a production-quality IDS
• Centralized view of the data
• Data transformation capabilities
• Analytic and data mining methods
• Flexible detector deployment, including scheduling that enables periodic model creation and distribution
• Real-time detection and alert infrastructure
• Reporting capabilities
• Distributed processing
• High system availability
• Scalability with system load
![Page 24: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/24.jpg)
• Sensors
• Extraction, transformation and load (ETL)
• Centralized data warehousing
• Automated model generation
• Automated model distribution
• Real-time and offline detection
• Report and analysis
• Automated alerts
![Page 25: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/25.jpg)
Sensors
Collect audit information:
• Network traffic data
• System logs on individual hosts
• System calls made by processes
![Page 26: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/26.jpg)
ETL
Used for preprocessing audit streams and feature extraction.
Use SQL and user-defined functions to extract key pieces of information.
Example: use a windowing analytic function to compute the number of HTTP connections to a given host.
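The windowing idea can be tried out without an Oracle instance. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the database, so the SQL dialect is SQLite's (version 3.28 or later is assumed for `RANGE ... PRECEDING`), and the `audit` table, its columns, and the 2-second window are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (ts REAL, dst_host TEXT, service TEXT)")
conn.executemany(
    "INSERT INTO audit VALUES (?, ?, ?)",
    [(0.0, "A", "http"), (0.5, "A", "http"), (1.0, "B", "http"),
     (1.5, "A", "http"), (4.0, "A", "http"), (4.2, "A", "ftp")],
)

# For each http connection, count the http connections to the same host
# in the preceding 2 seconds (including the current row).
rows = conn.execute(
    """
    SELECT ts, dst_host,
           COUNT(*) OVER (
               PARTITION BY dst_host
               ORDER BY ts
               RANGE BETWEEN 2.0 PRECEDING AND CURRENT ROW
           ) AS http_count
    FROM audit
    WHERE service = 'http'
    ORDER BY ts
    """
).fetchall()
```

Because the count is computed inside the query, the feature is available as an ordinary column for model building and scoring.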
![Page 27: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/27.jpg)
Model Generation
Popular techniques for misuse and anomaly detection:
• Association rules
• Clustering
• Support vector machines
• Supervised learning methods for classification
• Decision trees
![Page 28: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/28.jpg)
Model build functionality:
• Dbms_data_mining PL/SQL package, used to train linear SVM anomaly and misuse detection models.
Test dataset:
• Probing
• Denial of service
• Unauthorized access to a local superuser (U2R)
• Unauthorized access from a remote machine (R2L)
(37 subclasses of attacks under the 4 generic categories)
![Page 29: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/29.jpg)
Misuse Detection Problem
Anomaly Detection Problem
Accuracy of the system 92.1%
![Page 30: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/30.jpg)
Periodic Model Updates as new data is accumulated
Model rebuild when the performance falls below a predefined level
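A rebuild trigger of this kind can be sketched as a simple accuracy check on recent labeled cases. The 0.9 threshold and the function name are assumptions; the slides do not state which performance measure or level is used.

```python
def needs_rebuild(labels, predictions, min_accuracy=0.9):
    """Return (rebuild?, accuracy) for a batch of recently labeled cases.
    `min_accuracy` is a hypothetical predefined level."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    accuracy = correct / len(labels)
    return accuracy < min_accuracy, accuracy

# Hypothetical recent cases
labels = ["attack", "normal", "normal", "attack", "normal"]
predictions = ["attack", "normal", "attack", "normal", "normal"]
rebuild, accuracy = needs_rebuild(labels, predictions)
```

A scheduler would run such a check periodically and start a model build job whenever it returns true.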
![Page 31: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/31.jpg)
Model Distribution
Real Application Clusters (RAC)
![Page 32: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/32.jpg)
Detection
Real time / offline
Audit data are classified as attack or not by the misuse detection SVM model.
![Page 33: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/33.jpg)
Functional index on the probability of a case being an attack or not
returns all cases in audit_data with probability greater than 0.5 of being an attack
![Page 34: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/34.jpg)
Combination of multiple models
The query returns all cases where either model1 or model2 indicates an attack with probability higher than 0.4:
In this case, when the anomaly_model classifies a case as an attack with probability greater than 0.5, the misuse_model will attempt to identify the type of attack:
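The logic of the two combinations described above can be sketched in Python. This is not the SQL used in the paper: the models here are stand-in functions that read precomputed scores from each case, and all names and values are hypothetical.

```python
def either_model_flags(cases, model1, model2, threshold=0.4):
    """Cases where at least one model gives an attack probability above threshold."""
    return [c for c in cases if model1(c) > threshold or model2(c) > threshold]

def detect_then_identify(cases, anomaly_model, misuse_model, threshold=0.5):
    """For cases the anomaly model flags, ask the misuse model for the attack type."""
    return [(c["id"], misuse_model(c)) for c in cases if anomaly_model(c) > threshold]

# Stand-in models: each simply reads a precomputed field from the case.
def model1(case):
    return case["p1"]

def model2(case):
    return case["p2"]

def anomaly_model(case):
    return case["p_anomaly"]

def misuse_model(case):
    return case["kind"]

cases = [
    {"id": 1, "p1": 0.10, "p2": 0.20, "p_anomaly": 0.20, "kind": "none"},
    {"id": 2, "p1": 0.45, "p2": 0.05, "p_anomaly": 0.45, "kind": "probe"},
    {"id": 3, "p1": 0.30, "p2": 0.90, "p_anomaly": 0.90, "kind": "dos"},
]
flagged_ids = [c["id"] for c in either_model_flags(cases, model1, model2)]
identified = detect_then_identify(cases, anomaly_model, misuse_model)
```

The first combination widens coverage by taking the union of two detectors; the second saves work by running the misuse model only on cases the anomaly model already considers suspicious.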
![Page 35: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/35.jpg)
Reports and Analysis
![Page 36: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/36.jpg)
![Page 37: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/37.jpg)
![Page 38: slides](https://reader033.vdocuments.us/reader033/viewer/2022051608/5457cb75af795946138b73ae/html5/thumbnails/38.jpg)
Conclusion
• Data mining techniques are very useful in intrusion detection.
• Manual interpretation/advice is still needed in some processing steps.
• More efficient on known attacks than on unknown attacks only if the training data contains all normal behavior