Anomaly Detection in a Mobile Communication Network
Alec Pawling, Nitesh V. Chawla, and Greg Madey
University of Notre Dame
Anomaly Detection in a Mobile Communication Network. – p.1
Overview
We present a technique that uses hybrid clustering in conjunction
with statistical process control to handle concept drift in a data stream.
Outline
• Motivation
• Background
  ◦ Data streams
  ◦ Concept drift
  ◦ Statistical process control
• Related work
• Hybrid clustering for streams
• Setup
• Results
• Conclusion
Motivation
Application
• Detection and Alert System component of WIPER Emergency Response System [Schoenharl et al., 2006], [Madey et al., 2006]
  ◦ Detect and report anomalies in network usage
  ◦ Notify Simulation and Prediction System
Difficulties
• Massive volume of data
• Dynamic system
Data Streams
• Data can only be read once (due to volume)
• Order of data cannot be manipulated
• Often, if the underlying process is stationary, anomaly detection is straightforward
• If the underlying process is dynamic, the problem is difficult
Concept Drift
• Change in the process that generates the data stream over time
• May or may not be periodic
Concept Drift
Figure 1: GPRS usage over 12 days (x-axis: Time (min), 0 to 18720; y-axis: Number of instances, 0 to 700)
Statistical Process Control
Distinguish between random and assignable variation: threshold is µ ± lω.
• Random variation
  ◦ High probability, little effect on process output
• Assignable variation
  ◦ Low probability, significant effect on process output
  ◦ Change in underlying process
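As a rough illustration of the threshold rule above, the following Python sketch flags values that fall outside µ ± lω. It assumes that ω is a measure of spread estimated as the sample standard deviation of past in-control observations, which the slides do not state. The function names `control_limits` and `is_assignable` and the call-volume numbers are hypothetical.

```python
import statistics

def control_limits(history, l=3.0):
    """Return (lower, upper) limits mu - l*omega and mu + l*omega.

    Assumption: omega is taken to be the sample standard deviation of
    `history`; the slides do not say how it is estimated.
    """
    mu = statistics.mean(history)
    omega = statistics.stdev(history)
    return mu - l * omega, mu + l * omega

def is_assignable(value, limits):
    """True when `value` falls outside the limits (assignable variation)."""
    lower, upper = limits
    return value < lower or value > upper

# Made-up call volumes for one hour of the day on previous days.
history = [100, 102, 98, 101, 99, 100, 103, 97]
limits = control_limits(history, l=3.0)
print(is_assignable(101, limits))  # inside the band: random variation
print(is_assignable(150, limits))  # far outside the band
```

A larger l widens the band and reports fewer points as assignable variation; a smaller l does the opposite.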
Statistical Process Control
Figure 2: Range of random variance (x-axis: Hour of the day, 0 to 24; y-axis: Telephone call volume, 0 to 1.6e+06)
Related Work
Intrusion detection (Portnoy, 2001)
• Identify intrusions in an unlabeled data set using leader clustering.
• The leader algorithm (Hartigan, 1975)
  ◦ Let d be a distance threshold.
  ◦ Let the first instance assigned to cluster Ci be the defining instance, ci.
  ◦ For each instance x
    • Find the closest cluster, Cj
    • If dist(x, cj) < d, add x to Cj
    • Otherwise, create a new cluster with the defining instance x.
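The leader algorithm steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance, and the function name `leader_cluster` and the sample points are hypothetical.

```python
import math

def leader_cluster(instances, d):
    """One-pass leader clustering.

    Each cluster is a (defining_instance, members) pair. An instance joins
    the closest cluster if its distance to that cluster's defining instance
    is below d; otherwise it starts a new cluster.
    """
    clusters = []
    for x in instances:
        closest = None
        closest_dist = math.inf
        for cluster in clusters:
            dist = math.dist(x, cluster[0])  # Euclidean distance (assumed)
            if dist < closest_dist:
                closest, closest_dist = cluster, dist
        if closest is not None and closest_dist < d:
            closest[1].append(x)
        else:
            clusters.append((x, [x]))
    return clusters

# Made-up two-dimensional instances.
points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (0.2, 0.1), (10.5, 9.5)]
clusters = leader_cluster(points, d=2.0)
for leader, members in clusters:
    print(leader, len(members))
```

Each instance is read exactly once and in arrival order, which is why the algorithm suits a data stream. The result depends on that order, since the first instance in a region becomes its defining instance.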
Related Work
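Problem:
• Uses z-score normalization to allow for arbitrary data distribution:
  $$v'_i = \frac{v_i - \bar{v}_i}{\sigma_i}$$
• This is not possible in one pass: the mean $\bar{v}_i$ and standard deviation $\sigma_i$ are not known until the whole data set has been read.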
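To make the one-pass limitation concrete, here is a small batch z-score sketch in Python. It needs the complete feature column before it can normalize any single value. The function name `z_score_normalize` and the sample values are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def z_score_normalize(column):
    """Batch z-score: v' = (v - mean) / std for one feature column.

    Assumes the column is not constant (std > 0).
    """
    # First pass: statistics over the whole column.
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population std; an assumption
    # Second pass: normalize every value with those statistics.
    return [(v - mean) / std for v in column]

normalized = z_score_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(normalized)
```

The two passes over `column` are exactly what a stream that can be read only once does not allow.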
Related Work
Hybrid clustering algorithms (Cheu et al., 2004)
1. Cluster to reduce the data set
2. Produce final clusters
Hybrid Algorithm for Streams
1. Establish clusters with some minimum number of instances using a partitional or hierarchical algorithm
2. Incrementally update cluster center and standard deviations using a variation on the leader algorithm.
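The slides do not give the update rule used in step 2. One standard way to maintain a mean and standard deviation in a single pass is Welford's algorithm, sketched below for one cluster. This is a swapped-in technique for illustration, not necessarily the authors' method, and the class name `RunningCluster` is hypothetical.

```python
import math

class RunningCluster:
    """Running per-feature mean and standard deviation for one cluster."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # sum of squared deviations from the mean

    def add(self, x):
        """Fold instance x into the statistics (Welford's update)."""
        self.n += 1
        for i, v in enumerate(x):
            delta = v - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (v - self.mean[i])

    def std(self):
        """Sample standard deviation per feature (0.0 until n >= 2)."""
        if self.n < 2:
            return [0.0] * len(self.mean)
        return [math.sqrt(m / (self.n - 1)) for m in self.m2]

# Made-up instances assigned to the same cluster, one at a time.
cluster = RunningCluster(dim=2)
for x in [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]:
    cluster.add(x)
print(cluster.mean, cluster.std())
```

Because only `n`, `mean`, and `m2` are stored, the cluster statistics can be refreshed as each instance arrives without revisiting earlier data.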
Setup
Data set
• Feature vector consists of timestamp and number of instances of 5 services
• One example for each minute of a 12 day period (18721 examples)
Clustering Algorithms
• Expectation Maximization: Weka, cross-validation to determine number of clusters
• Leader
• Hybrid for streams: (1) k-means, (2) modified leader
Results
Hybrid algorithm
• Small clusters compared to EM
• Little consistency in detected outliers among different thresholds or values of k
Leader algorithm
• More consistency in anomaly detection
Conclusion
• Algorithms using random values may be a bad idea
• Algorithms requiring only a threshold parameter seem promising
Future work
• Hierarchical clustering to establish clusters
• Examine further how the number of clusters grows over time
Acknowledgments
This work is supported, in part, by the National Science Foundation,
the DDDAS Program, under Grant No. CNS_050348.