Causal Modeling for Anomaly Detection
Andrew Arnold
Machine Learning Department, Carnegie Mellon University
Summer Project with Naoki Abe
Predictive Modeling Group, IBM
Rick Lawrence, Manager
June 23, 2006
2
Contributions
• Consistent causal structure can be learned from passive observational data
• Anomalous examples have a causal structure that is quantitatively distinguishable from that of normal ones
• Causal structure is a significant addition to the standard analysis tools of independence and likelihood
3
Outline
• Motivation & Problem
• Causation Definition
• Causal Discovery
• Causal Comparison
• Conclusions & Ongoing Work
4
Motivation
• Processors:
– Detection: Is this wafer good or bad?
– Causation: Why is this wafer bad?
– Intervention: How can we fix the problem?
• Business:
– Detection: Is this business functioning well or not?
– Causation: Why is this business not functioning well?
– Intervention: What can IBM do to improve performance?
5
Problem
• Interventions are expensive and flawed
• What can passively observed data tell us about the causal structure of a process?
6
Direct Causation
X is a direct cause of Y relative to S (written X → Y) iff
$\exists\, z,\ x_1 \neq x_2$ such that $P(Y \mid X \text{ set}= x_1,\ Z \text{ set}= z) \neq P(Y \mid X \text{ set}= x_2,\ Z \text{ set}= z)$,
where $Z = S - \{X, Y\}$.
• Asymmetric
• Intervene to set Z = z, not just observe Z = z
[Scheines (2005)]
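The difference between setting and observing can be made concrete with a small exact calculation. The sketch below uses a hypothetical three-variable model (Z → X and Z → Y, with no edge X → Y); the probability tables and function names are invented for illustration and do not come from the slides. Observing X changes the probability of Y, yet setting X while Z is set does not, so X is not a direct cause of Y relative to {X, Y, Z} under the definition above.

```python
from fractions import Fraction as F

# Hypothetical model (an assumption for illustration): Z -> X and Z -> Y,
# with no edge X -> Y. All variables are binary.
P_Z = {0: F(1, 2), 1: F(1, 2)}
P_X1_GIVEN_Z = {0: F(1, 5), 1: F(4, 5)}    # P(X=1 | Z=z)
P_Y1_GIVEN_Z = {0: F(1, 10), 1: F(9, 10)}  # P(Y=1 | Z=z); Y's mechanism ignores X


def p_y1_observe_x(x):
    """P(Y=1 | X=x) under passive observation, where Z co-varies with X."""
    num = den = F(0)
    for z, pz in P_Z.items():
        px = P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]
        num += pz * px * P_Y1_GIVEN_Z[z]
        den += pz * px
    return num / den


def p_y1_set_x_z(x, z):
    """P(Y=1 | X set= x, Z set= z): only z enters Y's mechanism in this model."""
    return P_Y1_GIVEN_Z[z]


# Observation: X and Y are associated through the common cause Z.
observed_gap = p_y1_observe_x(1) - p_y1_observe_x(0)

# Intervention: look for some z and x1 != x2 where setting X changes P(Y).
is_direct_cause = any(p_y1_set_x_z(0, z) != p_y1_set_x_z(1, z) for z in P_Z)

print(observed_gap, is_direct_cause)
```

Here the observational gap is nonzero while the interventional test finds no difference, which is the situation passive data alone cannot distinguish without further assumptions.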
7
Causal Graphs
Causal Directed Acyclic Graph G = {V, E}
Each edge X → Y represents a direct causal claim:
X is a direct cause of Y relative to V
Example: Exposure → Infection → Symptoms
[Scheines (2005)]
8
Probabilistic Independence
X and Y are independent, written X _||_ Y, iff
$\forall\, x_1 \neq x_2:\ P(Y \mid X = x_1) = P(Y \mid X = x_2)$
X and Y are associated, written ¬(X _||_ Y), iff
X and Y are not independent
[Scheines (2005)]
9
Causal Structure
Probabilistic Independence
The Causal Markov Axiom
Markov Condition
In a Causal Graph: each variable V is independent of its non-effects, conditional on its direct causes.
[Scheines (2005)]
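A worked check of the Markov condition on the chain Exposure → Infection → Symptoms may help. The sketch below builds a joint distribution from hypothetical probability tables (the numbers are assumptions, not data from the slides) and verifies by exact enumeration that Symptoms is independent of its non-effect Exposure once its direct cause Infection is conditioned on, even though Symptoms and Exposure are associated marginally.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical probabilities for the chain Exposure -> Infection -> Symptoms.
P_E1 = F(3, 10)                            # P(Exposure = 1)
P_I1_GIVEN_E = {0: F(1, 10), 1: F(4, 5)}   # P(Infection = 1 | Exposure = e)
P_S1_GIVEN_I = {0: F(1, 20), 1: F(9, 10)}  # P(Symptoms = 1 | Infection = i)


def bern(p_one, value):
    """Probability of a binary value, given P(value = 1)."""
    return p_one if value == 1 else 1 - p_one


# Joint distribution over (exposure, infection, symptoms), built from the
# factorization implied by the causal graph.
joint = {
    (e, i, s): bern(P_E1, e) * bern(P_I1_GIVEN_E[e], i) * bern(P_S1_GIVEN_I[i], s)
    for e, i, s in product((0, 1), repeat=3)
}


def p_symptoms(e=None, i=None):
    """P(Symptoms = 1 | the given Exposure and/or Infection values)."""
    match = [(k, p) for k, p in joint.items()
             if (e is None or k[0] == e) and (i is None or k[1] == i)]
    total = sum(p for _, p in match)
    return sum(p for k, p in match if k[2] == 1) / total


# Markov condition for Symptoms: independent of Exposure given Infection.
markov_holds = all(
    p_symptoms(e=e, i=i) == p_symptoms(i=i) for e in (0, 1) for i in (0, 1)
)
# Without conditioning, Exposure and Symptoms remain associated.
marginally_associated = p_symptoms(e=0) != p_symptoms(e=1)

print(markov_holds, marginally_associated)
```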
10
Causal Structure → Statistical Data
[Scheines (2005)]
11
Causal Structure → Statistical Data
[Scheines (2005)]
12
Causal Structure → Statistical Data
Causal Graph: [Figure: causal graph over X1, X2, X3]
→ Causal Markov Axiom (D-separation) →
Independence Relations: X3 _||_ X1 | X2
[Scheines (2005)]
13
Causal Discovery
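Statistical Data → Causal Structure
• Data → (Statistical Inference) → Independence Relations: X3 _||_ X1 | X2
• Background Knowledge:
– Faithfulness
– X2 before X3
– no unmeasured common causes
• Independence Relations + Background Knowledge → (Discovery Algorithm) → Equivalence Class Representation
• Equivalence Class of Causal Graphs and Independence Relations are linked by the Causal Markov Axiom (D-separation)
[Figure: the causal graphs over X1, X2, X3 in the equivalence class, and the equivalence class representation]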
[Scheines (2005)]
14
Causal Discovery Algorithm
• PC algorithm [Spirtes et al., 2000]
– Constraint-based search
– Only need to know how to test conditional independence
– Do not need to measure all causes
– Asymptotically correct
15
PC algorithm
• Begin with the fully connected undirected graph
• For each pair of nodes, test their independence conditional on all subsets of their neighbors
– i.e., (X _||_ Y | Z)?
• If independent for any conditioning set
– remove the edge, record the subset conditioned upon
• If dependent for all conditioning sets
– leave the edge
• Orient edges, where possible
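The edge-removal steps above can be sketched as follows. This is a minimal illustration of a PC-style skeleton search, not the implementation used in the project: the function name `pc_skeleton`, the `indep` callable, and the toy oracle are all hypothetical. The oracle stands in for a statistical test and encodes a chain X1 → X2 → X3, where the only independence is X1 _||_ X3 | X2.

```python
from itertools import combinations


def pc_skeleton(nodes, indep):
    """Skeleton phase of a PC-style search (illustrative sketch).

    nodes: list of variable names.
    indep: caller-supplied callable indep(x, y, cond) -> bool acting as a
           conditional independence test or oracle (hypothetical interface).
    Returns (adjacency dict, dict of recorded separating sets).
    """
    adj = {v: set(nodes) - {v} for v in nodes}  # fully connected undirected graph
    sepset = {}
    size = 0
    # Grow the conditioning-set size until no node has enough neighbours left.
    while any(len(adj[x]) - 1 >= size for x in nodes):
        for x in nodes:
            for y in sorted(adj[x]):  # sorted() copies, so adj[x] may shrink safely
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if indep(x, y, set(cond)):
                        # Independent for this conditioning set: remove the
                        # edge and record the subset conditioned upon.
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        size += 1
    return adj, sepset


def chain_oracle(x, y, cond):
    """Toy oracle for X1 -> X2 -> X3: only X1 and X3 separate, given X2."""
    return {x, y} == {"X1", "X3"} and "X2" in cond


adj, sepset = pc_skeleton(["X1", "X2", "X3"], chain_oracle)
print(adj)
print(sepset)
```

Recording the separating sets matters because the orientation rules that follow consult them when deciding which triples are colliders.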
16
Independence Tests
[Scheines (2005)]
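The slides do not say which independence test was used. For continuous, roughly Gaussian data, one common choice is Fisher's z-test on (partial) correlations, sketched below using only the standard library. The helper names and the synthetic chain data (x1 → x2 → x3) are assumptions made for illustration.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """Correlation of x and y after controlling for a single variable z."""
    r_xy, r_xz, r_yz = corr(x, y), corr(x, z), corr(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: the (partial) correlation is zero.

    n is the sample size and k the number of conditioning variables.
    Relies on the usual Gaussian approximation for Fisher's z-transform.
    """
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2))


# Synthetic chain x1 -> x2 -> x3 (hypothetical data, fixed seed).
rng = random.Random(0)
n = 2000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 1) for a in x1]
x3 = [b + rng.gauss(0, 1) for b in x2]

r_marginal = corr(x1, x3)
r_partial = partial_corr(x1, x3, x2)
p_marginal = fisher_z_pvalue(r_marginal, n, 0)
p_partial = fisher_z_pvalue(r_partial, n, 1)
print(r_marginal, p_marginal)
print(r_partial, p_partial)
```

A small p-value rejects independence, so a search procedure would keep the edge; a large one is treated as evidence of (conditional) independence. For discrete data a chi-square or G-test would play the same role.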
17
Edge Orientation
Rule 1: Colliders
[Scheines (2005)]
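The collider slide survives only as a title, so the following is a sketch of the standard PC collider rule rather than a transcription of the slide: for every unshielded triple X - Z - Y (X and Y not adjacent), orient X → Z ← Y when Z is not in the separating set recorded for X and Y. The function name and input format are hypothetical.

```python
from itertools import combinations


def orient_colliders(adj, sepset):
    """Return the arrowheads implied by unshielded colliders.

    adj: dict mapping each node to the set of its neighbours (skeleton).
    sepset: dict mapping frozenset({x, y}) to the set that separated x and y.
    Output: set of (tail, head) pairs.
    """
    arrows = set()
    for z in adj:
        for x, y in combinations(sorted(adj[z]), 2):
            if y in adj[x]:
                continue  # shielded triple: x and y are adjacent, rule does not apply
            if z not in sepset.get(frozenset((x, y)), set()):
                # z was not needed to separate x and y, so z is a collider.
                arrows.add((x, z))
                arrows.add((y, z))
    return arrows


skeleton = {"X": {"Z"}, "Y": {"Z"}, "Z": {"X", "Y"}}

# X and Y were separated by the empty set: Z must be a collider.
collider = orient_colliders(skeleton, {frozenset(("X", "Y")): set()})

# X and Y were separated only given Z: no collider, edges stay undirected.
no_collider = orient_colliders(skeleton, {frozenset(("X", "Y")): {"Z"}})

print(collider, no_collider)
```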
18
More Orientation Rules:
Rule 2: Avoid forming new colliders
[Scheines (2005)]
19
More Orientation Rules:
Rule 3: Avoid forming cycles
If there is an undirected edge between X and Y, and there is a directed path from X to Y
– Then direct X-Y as X → Y
[Figure: three panels over X, Y, Z labeled Given, OK, and BAD (cycle)]
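Rule 3 can be sketched as a reachability check. The code below is an illustrative sketch under assumed data structures (directed edges as (tail, head) pairs, undirected edges as unordered pairs); the function names are hypothetical. The example mirrors the slide's setup: X → Z → Y is already directed and X - Y is undirected, so X - Y must become X → Y to avoid a cycle.

```python
def has_directed_path(directed, src, dst):
    """Depth-first search for a directed path src -> ... -> dst."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(head for tail, head in directed if tail == node)
    return False


def orient_to_avoid_cycles(directed, undirected):
    """Orient x - y as x -> y whenever a directed path x -> ... -> y exists."""
    directed = set(directed)
    undirected = {frozenset(edge) for edge in undirected}
    changed = True
    while changed:  # repeat, since each new arrow can create new directed paths
        changed = False
        for edge in list(undirected):
            x, y = tuple(edge)
            for a, b in ((x, y), (y, x)):
                if has_directed_path(directed, a, b):
                    directed.add((a, b))
                    undirected.discard(edge)
                    changed = True
                    break
    return directed, undirected


directed, undirected = orient_to_avoid_cycles(
    directed={("X", "Z"), ("Z", "Y")},
    undirected=[("X", "Y")],
)
print(sorted(directed), undirected)
```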
20
Our Example
Rule 1: Colliders
Rule 2: No new colliders (V-structures)
Truth fully recovered
[Scheines (2005)]
23
Results: Key Performance Indicators
26
Results: Chip Fabrication
27
Temporal ordering is preserved
28
Using causal structure to explain anomalies
• Why is one wafer good, and another bad?
– Separate data into classes
– Form causal graphs on each class
– Compare causal structures
30
Form causal graphs
Good Train
Good Test
Bad
31
How to compare?
• Similarity Score for graphs A and B over common nodes V:
– Consider undirected edges as bi-directed
– Of all the ordered pairs of variables (x, y) in V with an arc x → y in either A or B, in what percentage is there also x → y in the other graph?
– i.e., among pairs with (AdjA(x,y) || AdjB(x,y)), the fraction that also satisfy (AdjA(x,y) == AdjB(x,y))
• Difference Graph:
– If there is an arc x → y in either A or B, but not in both, place the arc x → y in the difference graph
– i.e., if (AdjA(x,y) != AdjB(x,y)) then AdjDiff(x,y) = True
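Read literally, the similarity score is the number of arcs present in both graphs divided by the number present in either, and the difference graph is the symmetric difference of the two arc sets. The sketch below follows that reading; the function names, the variable names in the example, and the choice to score two empty graphs as 1.0 are assumptions, not details from the slides.

```python
def as_arcs(directed=(), undirected=()):
    """Arc set of ordered pairs; each undirected edge counts as bi-directed."""
    arcs = set(directed)
    for x, y in undirected:
        arcs.update({(x, y), (y, x)})
    return arcs


def compare(arcs_a, nodes_a, arcs_b, nodes_b):
    """Return (similarity score, difference graph) over the common nodes."""
    common = set(nodes_a) & set(nodes_b)
    a = {(x, y) for x, y in arcs_a if x in common and y in common}
    b = {(x, y) for x, y in arcs_b if x in common and y in common}
    either = a | b
    # Assumption: two graphs with no arcs at all are treated as identical.
    similarity = len(a & b) / len(either) if either else 1.0
    return similarity, a ^ b  # a ^ b: arcs in exactly one of the two graphs


# Hypothetical example graphs over made-up process variables.
nodes = {"temp", "pressure", "yield"}
graph_a = as_arcs(directed=[("temp", "pressure"), ("pressure", "yield")])
graph_b = as_arcs(directed=[("temp", "pressure")],
                  undirected=[("pressure", "yield")])

similarity, difference = compare(graph_a, nodes, graph_b, nodes)
print(similarity, difference)
```

In this example the two graphs share two of three arcs, and the difference graph contains only the reverse arc contributed by the undirected edge in the second graph.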
32
Comparison: Good Test vs. Good Train
59% similar
Difference Graph
33
Comparison: Bad vs. Good Train
37% similar
Difference Graph
34
Comparison: Bad vs. Good Test
35% similar
Difference Graph
35
Conclusions
• Consistent causal structure can be learned from passive observational data
• Anomalous examples have a causal structure that is quantitatively distinguishable from that of normal ones
• Causal structure is a significant addition to the standard analysis tools of independence and likelihood
36
Ongoing work
• Comparing to maximum likelihood and minimum description length techniques
• Looking at time-ordering
– How do variables influence each other over time?
• Using one-class SVM to do clustering
– Avoids need for labeled data
• Relaxing assumptions
– Allow latent variables
• Evaluation is difficult without domain expert
• Using causal structure to help in clustering
37
References
• J. Pearl (2000). Causality: Models, Reasoning, and Inference. Cambridge Univ. Press.
• R. Scheines (2005). Causality Slides. http://www.gatsby.ucl.ac.uk/~zoubin/SALD/scheines.pdf
• P. Spirtes, C. Glymour, and R. Scheines (2000). Causation, Prediction, and Search, 2nd Edition. MIT Press.
Thank You
Questions?