A Data Mining Approach for Building Cost-Sensitive and Light Intrusion
Detection Models
PI Meeting - July, 2000
North Carolina State University / Columbia University
Florida Institute of Technology
Overview
• Project description
• Progress report:
  – correlation
  – cost-sensitive modeling
  – anomaly detection
  – collaboration with industry
• Plan of work for 2000-2001
New Ideas/Hypotheses
• High-volume automated attacks can overwhelm an IDS and its staff.
• Use cost-sensitive data mining algorithms to construct ID models that consider cost factors:
  – damage cost, response cost, operational cost, etc.
• Multiple specialized and light ID models can be dynamically activated/configured at run time.
• Cost-effectiveness as the guiding principle and multi-model correlation as the architectural approach.
Impact
• A better understanding of the cost factors, cost models, and cost metrics related to intrusion detection.
• Modeling techniques and deployment strategies for cost-effective IDSs.
• “Clustering” techniques for grouping intrusions and building specialized and light models.
• An architecture for dynamically activating, configuring, and correlating ID models.
Correlation: Model and Issues
• “Good” base models: data sources and modeling techniques.
• The combined model: the correlation algorithms and network topology.
[Diagram: combining base models across sources and across time/sources]
Correlation: Approaches
• Extend previous work in JAM.
• A sequence of time-stamped records:
  – each composed of signals from multiple sensors (network topology information embedded).
• Apply data mining techniques to learn how to correlate the signals to generate a combined sensor:
  – link analysis, sequence analysis, machine learning (classification), etc.
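As a rough sketch, merging time-stamped sensor streams into unified records might look like the following. The sensor names and the simple threshold rule are illustrative stand-ins for the learned correlation model described above:

```python
# Sketch (not the project's actual format): merge time-stamped signals from
# multiple sensors into unified records, then apply a rule that acts as the
# "combined sensor". In practice this rule would be learned by data mining.
from collections import defaultdict

def merge_streams(streams):
    """streams: {sensor_name: [(timestamp, signal), ...]} ->
    sorted list of (timestamp, {sensor_name: signal}) records."""
    by_time = defaultdict(dict)
    for sensor, events in streams.items():
        for ts, signal in events:
            by_time[ts][sensor] = signal
    return [(ts, by_time[ts]) for ts in sorted(by_time)]

def combined_sensor(signals, threshold=2):
    # Toy correlation rule: raise a combined alarm when enough
    # sensors signal in the same time window.
    return sum(1 for s in signals.values() if s == "alert") >= threshold

streams = {
    "nids_A": [(1, "alert"), (2, "ok")],
    "nids_B": [(1, "alert"), (3, "alert")],
    "host_C": [(2, "ok"), (3, "alert")],
}
records = merge_streams(streams)
alarms = [ts for ts, signals in records if combined_sensor(signals)]
print(alarms)  # -> [1, 3]
```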
Correlation: Integrating NM and ID Signals
• A stream of measures (anomaly reports) on MIB variables of network elements plus a stream of ID signals:
  – better coverage;
  – early sensing of attacks.
• Normal measures of network traffic and parameter values of ID signatures:
  – S = f(N, A); when A is invariant, S = g(N);
  – automatic parameter adjustment: S1 = g(N1).
Cost Factors of IDSs
• Attack taxonomy: result/target/technique
• Development cost
• Damage cost (DCost)
  – The amount of damage when ID is not available or ineffective.
• Response cost (RCost)
  – The cost of acting upon an alarm of potential intrusion.
• Operational cost (OpCost)
  – The cost of processing and analyzing audit data;
  – Mainly the computational costs of the features.
Cost Models of IDSs
• The total cost of an IDS over a set of events E:
  CumulativeCost(E) = Σ_{e ∈ E} (CCost(e) + OpCost(e))
• CCost(e), the consequential cost, depends on the prediction on event e.
Consequential Cost (CCost)
• For event e (e’ denotes the predicted event type; 0 ≤ ε ≤ 1):

  Outcome             CCost(e)                  Conditions
  Miss (FN)           DCost(e)
  False Alarm (FP)    RCost(e’) + PCost(e)      if DCost(e’) ≥ RCost(e’)
                      0                         otherwise
  Hit (TP)            RCost(e) + ε·DCost(e)     if DCost(e) ≥ RCost(e)
                      DCost(e)                  otherwise
  Normal (TN)         0
  Misclassified Hit   RCost(e’) + ε·DCost(e)    if DCost(e’) ≥ RCost(e’)
                      DCost(e)                  otherwise
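A minimal sketch of this cost model in Python. The cost tables, the PCost value, and the ε = 0 simplification (a timely response is assumed to prevent all damage) are illustrative assumptions, not the project's actual numbers:

```python
# Illustrative per-attack-type damage and response costs (made-up values).
DCOST = {"normal": 0, "dos": 30, "u2r": 100}   # damage cost per event type
RCOST = {"normal": 0, "dos": 10, "u2r": 40}    # response cost per event type
PCOST = 5  # penalty for responding to a false alarm

def ccost(actual, predicted):
    """Consequential cost of predicting `predicted` for an event of type `actual`."""
    if predicted == "normal":                  # no response issued
        return 0 if actual == "normal" else DCOST[actual]   # TN or Miss
    if actual == "normal":                     # False Alarm
        return RCOST[predicted] + PCOST if DCOST[predicted] >= RCOST[predicted] else 0
    # Hit or Misclassified Hit: respond only if the predicted damage justifies it
    if DCOST[predicted] >= RCOST[predicted]:
        return RCOST[predicted]                # epsilon = 0: damage prevented
    return DCOST[actual]                       # response withheld, damage incurred

def cumulative_cost(events, opcost_per_event=1):
    # CumulativeCost(E) = sum over e of (CCost(e) + OpCost(e))
    return sum(ccost(a, p) + opcost_per_event for a, p in events)

events = [("dos", "dos"), ("u2r", "normal"), ("normal", "dos"), ("normal", "normal")]
print(cumulative_cost(events))  # CCosts 10 + 100 + 15 + 0, plus 4 * OpCost = 129
```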
Cost-sensitive Modeling: Objectives
• Reducing operational costs:
  – Use cheap features in ID models.
• Reducing consequential costs:
  – Do not respond to an intrusion if RCost > DCost.
Cost-sensitive Modeling: Approaches
• Reducing operational costs:
  – A multiple-model approach:
    • Build multiple rule-sets, each with features of different cost levels;
    • Use cheaper rule-sets first, costlier ones later only when required for accuracy.
  – Feature-cost-sensitive rule induction:
    • The search heuristic considers information gain AND feature cost.
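The cascade idea can be sketched as follows. The two rule-sets, their features, and the confidence values are invented for illustration; the real models are learned rule-sets:

```python
# Sketch of the multiple-model cascade: each rule-set uses costlier features
# than the last, and evaluation stops as soon as a cheaper model is confident.
def cheap_model(conn):
    # Level 1: per-packet features only (low OpCost)
    if conn["flag"] == "S0":
        return "dos", 0.95
    return "normal", 0.60          # low confidence -> escalate

def expensive_model(conn):
    # Level 2: features needing multi-connection statistics (high OpCost)
    if conn["same_host_rate"] > 0.8:
        return "probe", 0.90
    return "normal", 0.90

CASCADE = [cheap_model, expensive_model]

def classify(conn, required_confidence=0.8):
    for model in CASCADE:
        label, confidence = model(conn)
        if confidence >= required_confidence:
            return label           # confident: skip costlier feature computation
    return label                   # fall through to the last model's answer

print(classify({"flag": "S0", "same_host_rate": 0.1}))  # -> dos (cheap model only)
print(classify({"flag": "SF", "same_host_rate": 0.9}))  # -> probe (escalated)
```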
Cost-sensitive Modeling: Approaches (continued)
• Reducing consequential costs:
  – MetaCost:
    • Purposely re-label intrusions with RCost > DCost as normal.
  – Post-detection decision:
    • Action depends on a comparison of RCost and DCost.
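The post-detection decision rule reduces to a single comparison; a minimal sketch, with illustrative cost values:

```python
# Respond to a predicted intrusion only when the damage it would cause
# outweighs the cost of responding. Cost values are made up for illustration.
DCOST = {"dos": 30, "spy": 2}    # damage cost per intrusion type
RCOST = {"dos": 10, "spy": 15}   # response cost per intrusion type

def decide(predicted):
    if predicted not in DCOST:   # predicted normal: nothing to do
        return "no_action"
    return "respond" if DCOST[predicted] >= RCOST[predicted] else "no_action"

print(decide("dos"))  # -> respond (DCost 30 >= RCost 10)
print(decide("spy"))  # -> no_action (responding costs more than the damage)
```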
Latest Results
• OpCost
  – Compare the multiple-model approach with the single-model approach;
  – rdc% = (single – multiple) / single;
  – range: 57% to 79%.
[Chart: average OpCost per connection (0–250 scale), single-model vs. multiple-model]
Latest Results (continued)
• CCost using a post-detection cost-sensitive decision module:
  – rdc% range: 75% to 95%;
  – Compared with the single model: slightly better rdc%;
  – Compared with cost-insensitive models: 25% higher rdc%.
[Chart: total CCost (~23,000–27,500 scale) for CS-single, CS-multiple, CI-single, and CI-multiple models]
Anomaly Detection
• Unsupervised training methods
  – Build models over noisy (not clean) data.
• Artificial anomalies
  – Improve the performance of anomaly detection methods.
• Combining misuse and anomaly detection.
AD over Noisy Data
• Builds normal models over data containing some anomalies.
• Motivating assumptions:
  – Intrusions are extremely rare compared to normal behavior.
  – Intrusions are quantitatively different.
Approach Overview
• Mixture model:
  – Normal component
  – Anomalous component
• Build a probabilistic model of the data.
• Maximum-likelihood test for detection.
Mixture Model of Anomalies
• Assume a generative model: the data is generated with a probability distribution D.
• Each element originates from one of two components:
  – M, the Majority Distribution (x ∈ M);
  – A, the Anomalous Distribution (x ∈ A).
• Thus: D = (1-λ)M + λA, where λ is the (small) proportion of anomalous elements.
Modeling Probability Distributions
• Train Probability Distributions over current sets of M and A.
• PM(X) = probability distribution for Majority
• PA(X) = probability distribution for Anomaly
• Any probability modeling method can be used: Naïve Bayes, Max Entropy, etc.
• Likelihood of a partition of the set of all elements D into M and A:
  L(D) = Π_{x∈D} PD(x) = ((1-λ)^|M| · Π_{x∈M} PM(x)) · (λ^|A| · Π_{x∈A} PA(x))
• Log-likelihood (for computational reasons): LL(D) = log(L(D))
Detecting Anomalies
[Diagram: each element of D is assigned to either the majority set M or the anomaly set A]
Algorithm for Detection
• Assume all elements are normal (M0 = D, A0 = ∅).
• Compute PD(X).
• Using PD(X), compute LL(D).
• For each element, compute the difference in LL(D) if it is removed from M and inserted into A.
• If the difference is large enough, declare the element an anomaly.
Evaluating xt
• Mt+1 = Mt – {xt}
• At+1 = At ∪ {xt}
• Recompute PMt+1 and PAt+1 (efficiently).
• If (LLt+1 – LLt) > threshold, xt is an anomaly;
• Otherwise, xt is normal.
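A toy implementation of this detection loop for a single categorical feature. Laplace-smoothed frequency estimates stand in for PM and PA, and the values of λ and the threshold are illustrative, not from the project:

```python
# Sketch of the mixture-model anomaly detection loop over categorical data.
import math
from collections import Counter

LAMBDA = 0.05  # assumed (small) proportion of anomalies

def log_likelihood(M, A, alphabet):
    """LL(D) under the mixture: (1-lambda)^|M| * prod PM + lambda^|A| * prod PA."""
    def smoothed(counts, n):
        return lambda x: (counts[x] + 1) / (n + len(alphabet))  # Laplace smoothing
    pm = smoothed(Counter(M), len(M))
    pa = smoothed(Counter(A), len(A))
    ll = len(M) * math.log(1 - LAMBDA) + sum(math.log(pm(x)) for x in M)
    if A:
        ll += len(A) * math.log(LAMBDA) + sum(math.log(pa(x)) for x in A)
    return ll

def detect_anomalies(D, threshold=0.5):
    alphabet = set(D)
    M, A = list(D), []
    anomalies = []
    for x in list(M):                 # evaluate each element x_t in turn
        M.remove(x)                   # tentatively move x_t from M to A
        A.append(x)
        gain = (log_likelihood(M, A, alphabet)
                - log_likelihood(M + [x], A[:-1], alphabet))
        if gain > threshold:
            anomalies.append(x)       # likelihood improved enough: keep x_t in A
        else:
            A.pop()                   # otherwise move x_t back to M
            M.append(x)
    return anomalies

data = ["http"] * 50 + ["weird_proto"]
print(detect_anomalies(data))  # -> ['weird_proto']
```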
Experiments
• Two sets of experiments:
  – Measured performance against comparison methods over noisy data.
  – Measured performance when trained over noisy data against comparison methods trained over clean data.
AD Using Artificial Anomalies
• Generate abnormal behavior artificially:
  – assume the given normal data are representative;
  – “near misses” of normal behavior are considered abnormal;
  – change the value of only one feature in an instance of normal behavior;
  – sparsely represented values are sampled more frequently;
  – “near misses” help define a tight boundary enclosing the normal behavior.
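A sketch of this generation procedure, following the bullets above: copy a normal instance, change exactly one feature, and prefer values that are rare in the normal data. The feature names and records are illustrative:

```python
# Generate "near miss" artificial anomalies from normal instances.
import random
from collections import Counter

def artificial_anomalies(normal, n, seed=0):
    rng = random.Random(seed)
    features = list(normal[0].keys())
    # Per-feature value frequencies over the normal data
    freq = {f: Counter(inst[f] for inst in normal) for f in features}
    anomalies = []
    while len(anomalies) < n:
        inst = rng.choice(normal)                 # start from a normal instance
        f = rng.choice(features)                  # change exactly one feature
        values = list(freq[f])
        # Sample sparsely represented values more often (inverse-frequency weights)
        weights = [1 / freq[f][v] for v in values]
        v = rng.choices(values, weights=weights)[0]
        candidate = dict(inst)
        candidate[f] = v
        if v != inst[f] and candidate not in normal:
            anomalies.append(candidate)           # a "near miss" of normal behavior
    return anomalies

normal = [{"service": "http", "flag": "SF"},
          {"service": "http", "flag": "SF"},
          {"service": "smtp", "flag": "SF"},
          {"service": "ftp", "flag": "S0"}]
fake = artificial_anomalies(normal, 5)
print(len(fake))  # -> 5
```

Each generated instance differs from some normal instance in exactly one feature, so the learner sees a tight boundary around the normal region.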
Experimental Results
• Learning algorithm: RIPPER rule learner.
• Data: 1998/99 DARPA evaluation
  – U2R, R2L, DOS, PRB: 22 “clusters”
• Training data: normal and artificial anomalies
• Results:
  – Overall hit rate: 94.26% (correctly classified as normal or intrusion)
  – Overall false alarm rate: 2.02%
  – 100% detection: buffer_overflow, guess_passwd, phf, back
  – 0% detection: perl, spy, teardrop, ipsweep, nmap
  – 50+% detection: 13 out of 22 intrusion subclasses
Combining Anomaly And Misuse Detection
• Training data: normal, artificially generated anomalies, known intrusions
• The learned model can predict normal, anomaly, or known intrusion subclass
• Experiments were performed on increasing subsets of known intrusion subclasses in the training data (simulates identified intrusions over time).
Combining Anomaly And Misuse Detection (continued)
• Consider phf, pod, teardrop, spy, and smurf as unknown (absent from the training data).
• Anomaly detection rate: phf = 25%, pod = 100%, teardrop = 93.91%, spy = 50%, smurf = 100%.
• Overall false alarm rate: 0.20%.
• The false alarm rate dropped from 2.02% to 0.20% when some known attacks were included for training.
Collaboration with Industry
• RST Inc.
– Anomaly detection on NT systems
• NFR Inc.– real-time IDS
• SAS Institute– off-line ID (funded by SAS)
• Aprisma (Cabletron)– Integrating ID with NM (funded by Aprisma)
• HRL Labs– ID in wireless networks (funded by HRL)
Plan for 2000-2001
• Dynamic cost-sensitive modeling and deployment
  – work with industry for realistic cost analysis and real-time testing
• Anomaly detection
  – improve existing algorithms using feedback from evaluation
• Correlation
  – develop/evaluate algorithms for integrating data and evidence from multiple sources