Decision Tree Classifier for Signature Recognition and State Classification in Intrusion Detection
IEE591C Presentation
Xiangyang Li, Qiang Chen and Yebin Zhang
Information Integration and Assurance Laboratory Arizona State University
Box 875906, Tempe, AZ 85287-5906, USA
September 2000 2
Problem Definition (1)
• Intrusion Detection
– Normality profile method
– Signature recognition method
  – A decision tree technique can be used to build the signatures of normal activities and attacks automatically. Each path of the tree corresponds to a signature.
  – Each leaf represents an IW value. Each leaf corresponds to a specific state of the system.
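To make the path-as-signature idea concrete, here is a minimal Python sketch. The toy tree, its feature names (`E7_present`, `E3_present`) and its IW values are hypothetical and chosen only for illustration; they are not taken from the trained trees in this presentation.

```python
# Hypothetical toy tree: internal nodes test one boolean feature,
# leaves hold an IW (intrusion warning) value.
def leaf(iw):
    return {"iw": iw}

def node(feature, yes, no):
    return {"feature": feature, "yes": yes, "no": no}

def signatures(tree, path=()):
    """Yield (conditions, iw) for every root-to-leaf path of the tree."""
    if "iw" in tree:                      # reached a leaf: one complete signature
        yield list(path), tree["iw"]
        return
    f = tree["feature"]
    yield from signatures(tree["yes"], path + ((f, True),))
    yield from signatures(tree["no"], path + ((f, False),))

toy_tree = node("E7_present",
                leaf(0.9),
                node("E3_present", leaf(0.4), leaf(0.05)))

sigs = list(signatures(toy_tree))
for conditions, iw in sigs:
    print(conditions, "-> IW", iw)
```

Each printed line is one signature (a conjunction of tests) together with the IW value of the leaf it ends in.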
Problem Definition (2)
• BSM audit event from Solaris
event 217
auid -2
euid 0
egid 0
ruid 0
rgid 0
pid 96
sid 0
RemoteIP 0.0.0.0
time 897047263
error_message 91
process_error 0
retval 0
attack 0
• Target variable
– Label: 0 - normal activity, 1 - attack
– IW (Intrusion Warning): a value between 0 and 1
• Predictor variables
– Only the event type information is used (284 event types - Solaris 2.7).
• Data sets
– Training data set
– Testing data set
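As a rough sketch of how one audit record could be reduced to the predictor and target described above, the Python below parses the "name value" pairs shown on this slide. It assumes that the `event` field holds the event type and that the `attack` field holds the Label; the function name `parse_record` is hypothetical.

```python
# The sample record from this slide, one "name value" pair per line.
RECORD = """event 217
auid -2
euid 0
egid 0
ruid 0
rgid 0
pid 96
sid 0
RemoteIP 0.0.0.0
time 897047263
error_message 91
process_error 0
retval 0
attack 0"""

def parse_record(text):
    """Split each line into a field name and its raw string value."""
    fields = {}
    for line in text.splitlines():
        key, value = line.split(None, 1)
        fields[key] = value.strip()
    return fields

record = parse_record(RECORD)
event_type = int(record["event"])   # assumed to be the only predictor
label = int(record["attack"])       # assumed to be the Label (0 = normal, 1 = attack)
print(event_type, label)
```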
Problem Definition (3)
• Decision tree algorithms
– GINI and CHAID (Answer Tree - SPSS Inc.)
– Information Gain Ratio (ITI - UMASS)
• Analysis of testing results
– Comparison of the mean, max and min of IW values between normal and attack events.
– ROC (Receiver Operating Characteristic) curve with hit rates and false alarm rates based on the predicted IW values and the true Label values.
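The ROC analysis can be sketched in a few lines of Python. This is a generic construction, not necessarily the exact procedure used for the plots in this presentation: every distinct predicted IW value is tried as a signal threshold, and an event is flagged when its IW is at or above the threshold. The function name and the sample data are hypothetical, and the sketch assumes both classes occur in the test data.

```python
def roc_points(iw_values, labels):
    """Return (false alarm rate, hit rate) pairs, one per IW threshold."""
    n_attack = sum(labels)
    n_normal = len(labels) - n_attack
    points = [(0.0, 0.0)]                      # threshold above every IW value
    for th in sorted(set(iw_values), reverse=True):
        hits = sum(1 for iw, y in zip(iw_values, labels) if iw >= th and y == 1)
        false_alarms = sum(1 for iw, y in zip(iw_values, labels) if iw >= th and y == 0)
        points.append((false_alarms / n_normal, hits / n_attack))
    return points

iw = [0.1, 0.4, 0.35, 0.8]      # hypothetical predicted IW values
true_label = [0, 0, 1, 1]       # 0 = normal, 1 = attack
points = roc_points(iw, true_label)
print(points)
```

A curve that rises toward a hit rate of 1 while the false alarm rate stays near 0 indicates better separation of attack and normal events.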
Single-event Decision Tree Classifier
• Single-event classifier
– Label -> target variable
– Event type -> the only predictor variable
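With event type as the only predictor, the finest tree possible has one leaf per event type, and a natural IW for that leaf is the fraction of training events of that type labeled as attack. The Python sketch below illustrates this limiting case only; it is not what Answer Tree or ITI do internally (CHAID, for instance, may merge event types into one branch), and the function names and training data are hypothetical.

```python
from collections import defaultdict

def fit_event_type_iw(events, labels):
    """Map each event type to the fraction of its training events that are attacks."""
    counts = defaultdict(lambda: [0, 0])       # event type -> [total, attacks]
    for e, y in zip(events, labels):
        counts[e][0] += 1
        counts[e][1] += y
    return {e: attacks / total for e, (total, attacks) in counts.items()}

def predict_iw(model, event, default=0.0):
    """IW for one event; unseen event types get an assumed default of 0."""
    return model.get(event, default)

train_events = [217, 217, 23, 23, 23, 112]     # hypothetical event types
train_labels = [0, 0, 1, 0, 1, 1]              # 0 = normal, 1 = attack
model = fit_event_type_iw(train_events, train_labels)
print(predict_iw(model, 23))
```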
Result Analysis (1)
IW value    Min     Max     Mean     Standard deviation
Normal      0.00    1.00    0.209    0.135
Attack      0.00    1.00    0.368    0.255
Statistics for single event classifier (CHAID)
IW value    Min     Max     Mean     Standard deviation
Normal      0.00    1.00    0.217    0.1579
Attack      0.00    1.00    0.396    0.2921
Statistics for single event classifier (ITI)
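Summary tables of this kind can be produced from the predicted IW values and the true labels of the testing data. The Python sketch below uses the population standard deviation; whether the tables here use the sample or the population formula is not stated, so that choice is an assumption, as are the function name and the sample data.

```python
from statistics import mean, pstdev

def iw_summary(iw_values, labels):
    """Min, max, mean and standard deviation of IW for normal and attack events."""
    summary = {}
    for cls, name in ((0, "Normal"), (1, "Attack")):
        vals = [iw for iw, y in zip(iw_values, labels) if y == cls]
        summary[name] = {
            "min": min(vals),
            "max": max(vals),
            "mean": mean(vals),
            "std": pstdev(vals),   # population formula, an assumption
        }
    return summary

iw = [0.0, 0.2, 0.4, 1.0, 0.6]     # hypothetical predicted IW values
true_label = [0, 0, 0, 1, 1]
summary = iw_summary(iw, true_label)
for name, stats in summary.items():
    print(name, stats)
```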
Result Analysis (2)
[Figure: ROC for single event classifier (ITI). x-axis: False alarm rate (0 to 1); y-axis: Hit rate (0 to 1).]
Result Analysis (3)
[Figure: ROC analysis for single event classifier (CHAID). x-axis: False alarm rate (0 to 1); y-axis: Hit rate (0 to 1).]
EWMA Vectors
We use one variable to represent each event type, which gives 284 variables for the 284 event types. In our sample data set there are 50 variables. These variables serve as the predictor variables. For each audit event, each variable is updated as:
$$X_i(t) = \lambda \cdot 1 + (1 - \lambda) \cdot X_i(t-1)$$ if the audit event at time t belongs to the ith event type
$$X_i(t) = \lambda \cdot 0 + (1 - \lambda) \cdot X_i(t-1)$$ if the audit event at time t is different from the ith event type
$$X_i(0) = 0, \quad \lambda = 0.3$$
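The EWMA update can be sketched in Python as follows, assuming a smoothing constant of 0.3 and an initial value of 0 for every variable, as given on this slide. The function name and the short event stream are hypothetical.

```python
SMOOTHING = 0.3   # smoothing constant from the slide

def ewma_vectors(event_stream, event_types, lam=SMOOTHING):
    """Return one EWMA vector (as a dict) per audit event in the stream."""
    x = {e: 0.0 for e in event_types}          # X_i(0) = 0 for every event type
    rows = []
    for event in event_stream:
        for e in event_types:
            indicator = 1.0 if e == event else 0.0
            # X_i(t) = lam * indicator + (1 - lam) * X_i(t - 1)
            x[e] = lam * indicator + (1.0 - lam) * x[e]
        rows.append(dict(x))
    return rows

rows = ewma_vectors(["E2", "E3", "E2"], ["E2", "E3"])
for t, row in enumerate(rows, start=1):
    print(t, row)
```

Each row is the predictor vector attached to one audit event, so recent events weigh more than older ones and the decision tree sees a smoothed history instead of a single event type.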
Result Analysis (4)
IW value    Min     Max     Mean     Standard deviation
Normal      0.00    1.00    0.209    0.135
Attack      0.00    1.00    0.368    0.255
Statistics for single event classifier (CHAID)
IW value    Min     Max     Mean     Standard deviation
Normal      0.00    1.00    0.046    0.210
Attack      0.00    1.00    0.881    0.324
Statistics for EWMA vector classifier (CHAID)
Result Analysis (5)
[Figure: ROC analysis for EWMA vectors (GINI-CHAID). x-axis: False alarm rate (0 to 1); y-axis: Hit rate (0 to 1). Curves: GINI, CHAID.]
Moving Window
[Diagram: an observation window of size 4 units slides along the audit event stream in the moving direction.]
Event stream: E2 E3 E7 E6 E3 E4 E16 E2
New data
variables {E1 … E2 E3 E4 E5 E6 E7 … E284}
values {… 0 1 1 0 1 1 …}
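The window transformation can be sketched in Python using the event stream shown on this slide. The sketch assumes the window advances one event at a time and that each variable is 1 when its event type occurs in the window; the function names are hypothetical.

```python
def sliding_windows(stream, size):
    """All consecutive windows of the given size, advancing one event at a time."""
    return [stream[i:i + size] for i in range(len(stream) - size + 1)]

def existence_vector(window, event_types):
    """1 if the event type occurs anywhere in the window, else 0."""
    present = set(window)
    return [1 if e in present else 0 for e in event_types]

stream = ["E2", "E3", "E7", "E6", "E3", "E4", "E16", "E2"]
event_types = [f"E{i}" for i in range(1, 285)]   # E1 .. E284

windows = sliding_windows(stream, 4)
first = existence_vector(windows[0], event_types)
print(windows[0], first[:7])   # variables E1..E7 for the first window
```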
“Existence” and “Count” Classifiers
• “Existence”
In the transformed data set, variable i records whether event type i occurs in the current moving window. We use this representation in the moving window classifiers on event types.
• “Count”
In the transformed data set, variable i records how many times event type i appears in the current moving window.
• Truncation
Remove the part of the transformed data that includes both normal and attack events.
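A "count" data set with truncation might be built as in the Python sketch below. It reads truncation as dropping every window that mixes normal and attack events, which is one possible interpretation of the slide; the function names and the sample data are hypothetical.

```python
from collections import Counter

def count_vector(window_events, event_types):
    """Number of occurrences of each event type inside one window."""
    counts = Counter(window_events)
    return [counts[e] for e in event_types]

def build_count_dataset(events, labels, event_types, size):
    """Return (count vector, label) rows, skipping mixed-label windows."""
    rows = []
    for i in range(len(events) - size + 1):
        window_labels = set(labels[i:i + size])
        if len(window_labels) > 1:
            continue                      # truncation: window holds both classes
        rows.append((count_vector(events[i:i + size], event_types),
                     window_labels.pop()))
    return rows

events = ["E2", "E3", "E3", "E6", "E7", "E7"]   # hypothetical stream
labels = [0, 0, 0, 1, 1, 1]                     # 0 = normal, 1 = attack
rows = build_count_dataset(events, labels, ["E2", "E3", "E6", "E7"], size=3)
print(rows)
```

Replacing each count with `min(count, 1)` would give the corresponding "existence" representation.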
Result Analysis (6)
IW value    Min     Max     Mean     Standard deviation
Normal      0.00    1.00    0.065    0.246
Attack      0.00    1.00    0.917    0.277
Statistics for moving window classifier (CHAID-GINI)
IW value    Min     Max     Mean     Standard deviation
Normal      0.00    1.00    0.046    0.210
Attack      0.00    1.00    0.881    0.324
Statistics for EWMA vector classifier (CHAID)
Result Analysis (7)
[Figure: ROC for moving window classifier (ITI-CHAID-GINI). x-axis: False alarm rate (0 to 1); y-axis: Hit rate (0 to 1).]
Tree Structure for Moving Window Classifier (CHAID-GINI-ITI)
Layered Classifiers
[Diagram with an upper level and a lower level. Labels: audit data, single event classifier, IW, classified states, state-ID classifier, IW.]
State-ID Classifiers
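One way to read the layered design is that the single event classifier assigns each audit event to a state (the leaf it falls into), and the state-ID classifier is then trained on moving windows over those state IDs instead of raw event types. The Python sketch below illustrates only that feature construction, under this assumed reading; the mapping `STATE_OF_EVENT`, the state names and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical lower-level output: the state (leaf) ID assigned to each event type.
STATE_OF_EVENT = {"E2": "S0", "E3": "S0", "E6": "S1", "E7": "S2"}

def to_states(events, unknown="S?"):
    """Replace each event type by its classified state ID."""
    return [STATE_OF_EVENT.get(e, unknown) for e in events]

def state_count_windows(states, size):
    """'Count' features over state IDs for each moving window."""
    return [Counter(states[i:i + size]) for i in range(len(states) - size + 1)]

states = to_states(["E2", "E3", "E6", "E7", "E9"])
windows = state_count_windows(states, size=3)
print(states)
print(windows)
```

The resulting windows would serve as predictor vectors for the upper-level classifier, in the same way as the "existence" and "count" vectors over event types.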
Result Analysis (8)
IW value    Min     Max     Mean     Standard deviation
Normal      0.00    1.00    0.033    0.0826
Attack      0.00    1.00    0.901    0.2706
Statistics for “existence” state-ID classifier (ITI)
IW value    Min     Max     Mean     Standard deviation
Normal      0.00    1.00    0.018    0.0812
Attack      0.00    1.00    0.924    0.2548
Statistics for “count” state-ID classifier (ITI)
Result Analysis (9)
[Figure: ROC analysis for state-ID classifiers (CHAID). x-axis: False alarm rate (0 to 1); y-axis: Hit rate (0 to 1). Curves: Count, Existence.]
Result Analysis (10)
[Figure: ROC analysis for "count" state-ID classifiers. x-axis: False alarm rate (0 to 1); y-axis: Hit rate (0.8 to 1). Curves: CHAID, GINI, ITI.]
Result Analysis (11)
[Figure: Comparison of ROC curves (ITI). x-axis: False alarm rate (0 to 1); y-axis: Hit rate (0.8 to 1). Curves: moving window, "existence" state-ID classifier, "count" state-ID classifier.]
Conclusions and Problem
Conclusions
• Decision tree classifiers (DTCs) show promising performance in intrusion detection applications.
• The performance of a DTC depends on its design, i.e. the choice of predictor variables and target variable.
• Different decision tree algorithms impact the results.
Problem
• Computational Feasibility
– Incremental training ability (ITI)
– Scalable/Parallel/Database (ScalParC)
– Bagging and Boosting?
END
• This presentation - http://iia.eas.asu.edu/myweb/courses/dtc.ppt
• Other works - http://iia.eas.asu.edu/