U Kang
Introduction to Data Mining
Anomaly Detection
U Kang, Seoul National University
In This Lecture
Motivation of anomaly detection
Graph structure based method
Random walk based method
Outline
Overview
Graph Structure Based Method
Random Walk Based Method
Data Mining
Data mining: find patterns and anomalies
To spot anomalies, we have to discover patterns
Large datasets reveal patterns/anomalies that may be invisible otherwise…
Anomaly Detection
Anomaly detection
Find suspicious data points which deviate significantly from normal data
Anomaly detection in graphs
Find “strange” nodes in a graph
Anomaly Detection
Applications
Network intrusion detection: find suspicious attackers (e.g. DDoS attackers, spammers, etc.)
Call network: find heavy telemarketers
Social network: spot people adding friends indiscriminately in a “popularity contest”
Credit card fraud
(the list continues...)
Anomaly Detection
More Applications
Campaign donation irregularity
Extremely cross-disciplinary authors in an author-paper graph
Electronic auction fraud
Plan
We will look at two methods for anomaly detection in graphs
Graph Structure Based Method
Random Walk Based Method
Outline
Overview
Graph Structure Based Method
Random Walk Based Method
L. Akoglu, M. McGlohon, and C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010
Problem Definition
Given: a weighted and unlabeled graph,
Q1: how can we spot strange, abnormal, extreme nodes?
Q2: how can we explain why the spotted nodes are anomalous?
OddBall: approach
For each node
Extract the “egonet” (= 1-step neighborhood)
Extract features (#edges, total weight, etc.)
Features that could yield “laws”
Features fast to compute and interpret
Detect patterns
Regularities
Detect anomalies
Deviate significantly from patterns
What is Odd?
Main Idea
For each egonet, extract features
Find “rules” in features
Anomalies deviate significantly from the rules
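The three steps above can be sketched for a single pair of features. This is a minimal illustration, assuming the "rule" is a power law y ≈ C·x^θ fitted by least squares in log-log space, and assuming a deviation score of the form max/min ratio times log-distance, modeled on the score described in the OddBall paper. The function names `fit_power_law` and `outlier_scores` and the toy data are hypothetical.

```python
import numpy as np

def fit_power_law(x, y):
    """Least-squares fit of log y = theta * log x + log C; returns (C, theta)."""
    theta, log_c = np.polyfit(np.log(x), np.log(y), deg=1)
    return float(np.exp(log_c)), float(theta)

def outlier_scores(x, y, C, theta):
    """Score how far each observed y lies from the fitted value C * x**theta."""
    y_fit = C * np.power(x, theta)
    ratio = np.maximum(y, y_fit) / np.minimum(y, y_fit)   # multiplicative deviation
    return ratio * np.log(np.abs(y - y_fit) + 1.0)        # weighted by log distance

# Toy data (assumption): y follows x**1.5 except for one inflated point.
x = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 12.0])
y = x ** 1.5
y[4] *= 20.0                      # plant one node that breaks the rule

C, theta = fit_power_law(x, y)
scores = outlier_scores(x, y, C, theta)
most_anomalous = int(np.argmax(scores))
print(most_anomalous, scores.round(2))
```

Note that the fit here includes the planted point, so the fitted line is pulled slightly toward it; a robust fit would reduce that effect.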
Which Features?
N_i: number of neighbors (degree) of ego i
E_i: number of edges in egonet i
W_i: total weight of egonet i
λ_{w,i}: principal eigenvalue of the weighted adjacency matrix of egonet i
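A minimal sketch of how the four egonet features could be computed with NumPy. It assumes an undirected graph stored as a dense, symmetric, non-negative weighted adjacency matrix without self-loops; the function name `egonet_features` and the toy matrix are hypothetical.

```python
import numpy as np

def egonet_features(A, i):
    """Return (N_i, E_i, W_i, lambda_w_i) for the egonet of node i.

    A: symmetric, non-negative weighted adjacency matrix, no self-loops (assumption).
    """
    neighbors = np.flatnonzero(A[i])
    nodes = np.concatenate(([i], neighbors))   # ego plus its 1-step neighbors
    sub = A[np.ix_(nodes, nodes)]              # induced weighted subgraph
    upper = np.triu(sub, k=1)                  # count each undirected edge once
    n_i = len(neighbors)
    e_i = int(np.count_nonzero(upper))
    w_i = float(upper.sum())
    lam = float(np.linalg.eigvalsh(sub)[-1])   # largest eigenvalue of a symmetric matrix
    return n_i, e_i, w_i, lam

# Toy graph (assumption): node 0 is a hub, nodes 1 and 2 are also linked.
A = np.array([
    [0, 1, 2, 3],
    [1, 0, 1, 0],
    [2, 1, 0, 0],
    [3, 0, 0, 0],
], dtype=float)

features_0 = egonet_features(A, 0)
features_3 = egonet_features(A, 3)
print(features_0, features_3)
```

For a symmetric non-negative matrix the largest eigenvalue equals the spectral radius, which is why taking the last value returned by `eigvalsh` is enough here.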
Why Principal Eigenvalue?
OddBall: pattern #1
OddBall: pattern #2
OddBall: pattern #3
OddBall: anomaly detection
(e.g. LOF)
OddBall: datasets
OddBall at work (Posts)
OddBall at work (FEC)
OddBall at work (DBLP)
Outline
Overview
Graph Structure Based Method
Random Walk Based Method
J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood Formation and Anomaly Detection in Bipartite Graphs. ICDM, 2005
Anomalies in Bipartite Graphs
Examples of Bipartite Graphs
Publication network
Author-paper
P2P network
User-file
Recommendation
User-product
Stock market
Stock-trader
1) Neighborhood Formation
Main idea
Compute the Random Walk with Restart score from query node q
Steady state probability = relevance
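A minimal sketch of Random Walk with Restart by power iteration, assuming a dense symmetric adjacency matrix with no isolated nodes. The restart probability `c = 0.15`, the function name `rwr`, and the toy author-paper graph are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def rwr(A, q, c=0.15, tol=1e-10, max_iter=1000):
    """Random Walk with Restart scores from query node q.

    A: symmetric adjacency matrix, every node has at least one edge (assumption).
    c: restart probability (hypothetical default).
    """
    P = A / A.sum(axis=0, keepdims=True)    # column-stochastic transition matrix
    e = np.zeros(A.shape[0])
    e[q] = 1.0                               # restart always returns to q
    r = e.copy()
    for _ in range(max_iter):
        r_next = (1.0 - c) * (P @ r) + c * e
        if np.abs(r_next - r).sum() < tol:   # L1 change below tolerance
            return r_next
        r = r_next
    return r

# Toy bipartite graph (assumption): authors 0-2, papers 3-4.
edges = [(0, 3), (1, 3), (1, 4), (2, 4)]
A = np.zeros((5, 5))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

scores = rwr(A, q=0)
print(scores.round(4))
```

In this toy graph author 1 shares a paper with the query author 0, while author 2 is reachable only through author 1, so author 1 receives the higher steady-state score.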
1) Neighborhood Formation
Exact Neighborhood Formation (NF)
Exact RWR score
Approximate NF
Partition the original graph into pieces by METIS
Compute similarities only on the partition containing the query node
2) Anomaly Detection
Main idea: compute the anomaly score of node t
Compute pairwise “relevance” scores between the neighbors of t
Compute the mean of these relevance scores; a low mean marks t as anomalous
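The two steps above can be sketched as follows. This is an illustration under stated assumptions: relevance is taken to be the exact RWR score obtained from a matrix inverse, the restart probability `c = 0.15` is arbitrary, and the names `relevance_matrix`, `normality_score`, and the toy graph are hypothetical.

```python
import numpy as np

def relevance_matrix(A, c=0.15):
    """Exact RWR scores for every query node; column q holds the scores for query q."""
    n = A.shape[0]
    P = A / A.sum(axis=0, keepdims=True)             # column-stochastic
    return c * np.linalg.inv(np.eye(n) - (1.0 - c) * P)

def normality_score(A, t, c=0.15):
    """Mean pairwise relevance among the neighbors of t (lower = more anomalous)."""
    R = relevance_matrix(A, c)
    nbrs = np.flatnonzero(A[t])
    pairs = [(a, b) for a in nbrs for b in nbrs if a != b]
    if not pairs:
        return float("nan")      # fewer than two neighbors: left undefined here
    return float(np.mean([R[a, b] for a, b in pairs]))

# Toy bipartite graph (assumption): authors 0-5, papers 6-10.
# Authors 0-2 write papers 6-7, authors 3-5 write papers 8-9,
# and paper 10 bridges the two groups through authors 0 and 3.
edges = [(a, p) for a in (0, 1, 2) for p in (6, 7)]
edges += [(a, p) for a in (3, 4, 5) for p in (8, 9)]
edges += [(0, 10), (3, 10)]
A = np.zeros((11, 11))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

score_bridge = normality_score(A, 10)   # neighbors come from different groups
score_inside = normality_score(A, 6)    # neighbors all belong to one group
print(score_bridge, score_inside)
```

The bridging paper gets the lower mean because its two authors are otherwise unrelated. Inverting the full matrix is only practical for small graphs, which is the motivation for the approximate, partition-based variant.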
Experiment
Datasets:
DBLP Conf-Auth
DBLP Author-Paper
IMDB movie-actor
Questions:
Q1) What are the discoveries?
Q2) Anomaly detection quality?
1) NF discovery
2) Anomaly Detection Quality
Setting: inject 100 random nodes that connect to high-degree nodes
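One way such an injection could be set up, as a sketch only. The slide does not say how many edges each injected node receives or how "high degree" is defined, so `k`, `top`, the seed, and the function name `inject_random_nodes` are all hypothetical choices.

```python
import numpy as np

def inject_random_nodes(B, n_inject=100, k=5, top=20, seed=0):
    """Append n_inject rows to biadjacency matrix B.

    Each new row node links to k of the `top` highest-degree column nodes,
    chosen uniformly at random (k and top are assumptions).
    """
    rng = np.random.default_rng(seed)
    col_degree = B.sum(axis=0)
    high = np.argsort(col_degree)[-top:]             # high-degree column nodes
    new_rows = np.zeros((n_inject, B.shape[1]), dtype=B.dtype)
    for row in new_rows:                             # each row is a writable view
        row[rng.choice(high, size=k, replace=False)] = 1
    return np.vstack([B, new_rows]), high

# Toy biadjacency matrix (assumption): 50 row nodes, 30 column nodes.
rng = np.random.default_rng(1)
B = (rng.random((50, 30)) < 0.2).astype(float)

B_injected, high_cols = inject_random_nodes(B)
injected = B_injected[50:]
print(B_injected.shape, injected.sum(axis=1)[:5])
```

The injected rows are known anomalies, so a detector can be evaluated by checking how many of them rank among the lowest normality scores.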
What You Need to Know
Anomaly detection
Find suspicious data points which deviate significantly from normal data
Anomaly detection in graphs
Graph Structure Based Method
Random Walk Based Method
Neighborhood Formation (NF)
Anomaly detection using NF
Questions?