Causal Modeling for Anomaly Detection. Andrew Arnold, Machine Learning Department, Carnegie Mellon University. Summer Project with Naoki Abe, Predictive Modeling Group, IBM; Rick Lawrence, Manager. June 23, 2006.

Page 1: Causal Modeling for  Anomaly Detection

Causal Modeling for Anomaly Detection

Andrew Arnold
Machine Learning Department, Carnegie Mellon University

Summer Project with Naoki Abe
Predictive Modeling Group, IBM

Rick Lawrence, Manager
June 23, 2006

Page 2: Causal Modeling for  Anomaly Detection

Contributions

• Consistent causal structure can be learned from passive observational data

• The causal structure of anomalous examples is quantitatively distinguishable from that of normal ones

• Causal structure adds significantly to the standard analysis tools of independence and likelihood

Page 3: Causal Modeling for  Anomaly Detection


Outline

• Motivation & Problem

• Causation Definition

• Causal Discovery

• Causal Comparison

• Conclusions & Ongoing Work

Page 4: Causal Modeling for  Anomaly Detection

Motivation

• Processors:

– Detection: Is this wafer good or bad?

– Causation: Why is this wafer bad?

– Intervention: How can we fix the problem?

• Business:

– Detection: Is this business functioning well or not?

– Causation: Why is this business not functioning well?

– Intervention: What can IBM do to improve performance?

Page 5: Causal Modeling for  Anomaly Detection


Problem

• Interventions are expensive and flawed

• What can passively observed data tell us about the causal structure of a process?

Page 6: Causal Modeling for  Anomaly Detection

Direct Causation

X is a direct cause of Y relative to S, iff

$$\exists\, z,\ x_1 \neq x_2:\quad P(Y \mid X \text{ set}= x_1,\ Z \text{ set}= z) \;\neq\; P(Y \mid X \text{ set}= x_2,\ Z \text{ set}= z)$$

where $Z = S - \{X, Y\}$

X → Y

• Asymmetric

• Intervene to set Z = z, not just observe Z = z

[Scheines (2005)]

Page 7: Causal Modeling for  Anomaly Detection

Causal Graphs

Causal Directed Acyclic Graph G = {V, E}

Each edge X → Y represents a direct causal claim:

X is a direct cause of Y relative to V

Exposure → Infection → Symptoms

[Scheines (2005)]

Page 8: Causal Modeling for  Anomaly Detection

Probabilistic Independence

X and Y are independent iff

$$\forall\, x_1 \neq x_2:\quad P(Y \mid X = x_1) = P(Y \mid X = x_2)$$

written X _||_ Y

X and Y are associated iff X and Y are not independent, written not (X _||_ Y)

[Scheines (2005)]
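
The definition above can be checked directly on a small discrete joint distribution. The sketch below is illustrative only: the function names and the two toy tables are assumptions, not taken from the slides, and it assumes every value of X has non-zero probability.

```python
def conditional(joint, x):
    """P(Y | X = x) from a joint table {(x, y): probability}."""
    px = sum(p for (xv, _), p in joint.items() if xv == x)
    return {yv: p / px for (xv, yv), p in joint.items() if xv == x}


def independent(joint, tol=1e-9):
    """True when P(Y | X = x) is the same for every value x of X."""
    xs = sorted({xv for xv, _ in joint})
    ys = sorted({yv for _, yv in joint})
    base = conditional(joint, xs[0])
    return all(
        abs(conditional(joint, x).get(y, 0.0) - base.get(y, 0.0)) <= tol
        for x in xs
        for y in ys
    )


# Hypothetical toy tables: the first factorizes, the second does not.
product_table = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.35, (1, 1): 0.15}
associated_table = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

print(independent(product_table))     # X _||_ Y holds
print(independent(associated_table))  # X and Y are associated
```

With real data the probabilities are estimated, so an exact comparison like this is replaced by a statistical test.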

Page 9: Causal Modeling for  Anomaly Detection

The Causal Markov Axiom

Causal Structure → Probabilistic Independence

Markov Condition: in a causal graph, each variable V is independent of its non-effects, conditional on its direct causes.

[Scheines (2005)]

Page 10: Causal Modeling for  Anomaly Detection

Causal Structure → Statistical Data

[Scheines (2005)]


Page 12: Causal Modeling for  Anomaly Detection

Causal Structure → Statistical Data

Causal Graph → Causal Markov Axiom (D-separation) → Independence Relations

[Figure: causal graph over X1, X2, X3]

X3 _||_ X1 | X2

[Scheines (2005)]

Page 13: Causal Modeling for  Anomaly Detection

Causal Discovery

Statistical Data → Causal Structure

Data → Statistical Inference → Independence Relations: X3 _||_ X1 | X2

Background Knowledge:

- Faithfulness

- X2 before X3

- no unmeasured common causes

Independence Relations + Background Knowledge → Discovery Algorithm, via the Causal Markov Axiom (D-separation) → Equivalence Class of Causal Graphs over X1, X2, X3 → Equivalence Class Representation

[Scheines (2005)]

Page 14: Causal Modeling for  Anomaly Detection

Causal Discovery Algorithm

• PC algorithm [Spirtes et al., 2000]

– Constraint-based search

– Only need to know how to test conditional independence

– Do not need to measure all causes

– Asymptotically correct

Page 15: Causal Modeling for  Anomaly Detection

PC algorithm

• Begin with the fully connected undirected graph

• For each pair of nodes, test their independence conditional on all subsets of their neighbors:

– i.e., (X _||_ Y | Z)?

• If independent for any conditioning

– remove edge, record subset conditioned upon

• If dependent for all conditionings

– leave edge

• Orient edges, where possible
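
The edge-removal steps above can be sketched as follows. This is a minimal illustration rather than the implementation used in the project: `pc_skeleton`, the `indep` callable, and `chain_oracle` are hypothetical names, and the oracle simply answers independence queries for a three-variable chain whose only independence is X3 _||_ X1 | X2. Edge orientation is left out.

```python
from itertools import combinations


def pc_skeleton(nodes, indep):
    """Edge-removal phase of a PC-style search.

    nodes: list of variable names.
    indep(x, y, cond): hypothetical callable, True when x and y are judged
        independent given the set `cond`.
    Returns (adjacency dict, separating sets keyed by frozenset({x, y})).
    """
    adj = {v: set(nodes) - {v} for v in nodes}  # fully connected undirected graph
    sepset = {}
    size = 0
    # Increase the conditioning-set size until no pair has enough neighbors.
    while any(len(adj[x] - {y}) >= size for x in nodes for y in adj[x]):
        for x in nodes:
            for y in list(adj[x]):
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if indep(x, y, set(cond)):
                        # Independent for some conditioning: remove the edge
                        # and record the subset conditioned upon.
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        size += 1
    return adj, sepset


def chain_oracle(x, y, cond):
    """Assumed ground truth: the only independence is X3 _||_ X1 | X2."""
    return {x, y} == {"X1", "X3"} and "X2" in cond


adj, sepset = pc_skeleton(["X1", "X2", "X3"], chain_oracle)
print(adj)     # X1 - X2 - X3 remains; the X1 - X3 edge is removed
print(sepset)  # the pair (X1, X3) was separated by {X2}
```

Passing the independence test in as a callable keeps the search separate from the choice of statistical test, which matches the point that the algorithm only needs to know how to test conditional independence.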

Page 16: Causal Modeling for  Anomaly Detection


Independence Tests

[Scheines (2005)]
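
The slides do not say which independence test was used. One common choice for roughly Gaussian, linearly related variables is a partial-correlation test with Fisher's z-transform; the sketch below assumes that choice, and all names and the simulated chain data are illustrative.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(data, x, y, cond=()):
    """Correlation of x and y given the variables in cond (recursive formula)."""
    if not cond:
        return pearson(data[x], data[y])
    z, rest = cond[0], tuple(cond[1:])
    r_xy = partial_corr(data, x, y, rest)
    r_xz = partial_corr(data, x, z, rest)
    r_yz = partial_corr(data, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for a zero partial correlation with k conditioning variables."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.sqrt(n - k - 3) * math.atanh(r)
    return math.erfc(abs(z) / math.sqrt(2))


# Simulated chain X1 -> X2 -> X3 (an assumption made for the example).
rng = random.Random(0)
n = 5000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 1) for a in x1]
x3 = [b + rng.gauss(0, 1) for b in x2]
data = {"X1": x1, "X2": x2, "X3": x3}

r_marginal = partial_corr(data, "X1", "X3")
r_given_x2 = partial_corr(data, "X1", "X3", ("X2",))
print(r_marginal, fisher_z_pvalue(r_marginal, n, 0))  # strongly associated
print(r_given_x2, fisher_z_pvalue(r_given_x2, n, 1))  # near zero given X2
```

An edge would be removed when the p-value exceeds a chosen significance level; that threshold is a tuning choice the slides do not specify.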

Page 17: Causal Modeling for  Anomaly Detection

Edge Orientation

Rule 1: Colliders

[Scheines (2005)]
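
The slide gives only the rule's name. In the standard PC formulation, an unshielded triple X - Z - Y (X and Y not adjacent) is oriented as X → Z ← Y when Z is not in the set that separated X and Y. The sketch below assumes that formulation; `orient_colliders` and its input layout are hypothetical.

```python
from itertools import combinations


def orient_colliders(adj, sepset):
    """Return directed edges (tail, head) implied by unshielded colliders.

    adj: undirected adjacency dict {node: set of neighbors}.
    sepset: separating sets keyed by frozenset({x, y}) for removed edges.
    """
    directed = set()
    for z in adj:
        for x, y in combinations(sorted(adj[z]), 2):
            if y in adj[x]:
                continue  # shielded triple: x and y are adjacent
            if z not in sepset.get(frozenset((x, y)), set()):
                directed.add((x, z))
                directed.add((y, z))
    return directed


skeleton = {"X": {"Z"}, "Y": {"Z"}, "Z": {"X", "Y"}}

# X and Y were separated by the empty set, so Z must be a collider.
print(orient_colliders(skeleton, {frozenset(("X", "Y")): set()}))

# X and Y were separated by {Z}, so no collider is oriented.
print(orient_colliders(skeleton, {frozenset(("X", "Y")): {"Z"}}))
```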

Page 18: Causal Modeling for  Anomaly Detection

More Orientation Rules:

Rule 2: Avoid forming new colliders

[Scheines (2005)]

Page 19: Causal Modeling for  Anomaly Detection

More Orientation Rules:

Rule 3: Avoid forming cycles

If there is an undirected edge between X and Y, and there is a directed path from X to Y

– Then direct X-Y as X → Y

[Figure: three panels over X, Y, Z labeled "Given", "OK", and "BAD (cycle)"]
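
Rule 3 can be sketched as a reachability check over the edges oriented so far. The function names and the edge representation below are assumptions, and the example graph (X → Z → Y with X - Y undirected) is one reading of the "Given" panel, not a confirmed copy of it.

```python
def has_directed_path(directed, src, dst):
    """True if dst is reachable from src along directed edges (tail, head)."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(head for tail, head in directed if tail == node)
    return False


def orient_to_avoid_cycles(directed, undirected):
    """Orient x - y as x -> y whenever a directed path x -> ... -> y exists."""
    directed = set(directed)
    undirected = {frozenset(edge) for edge in undirected}
    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            x, y = tuple(edge)
            for a, b in ((x, y), (y, x)):
                if has_directed_path(directed, a, b):
                    directed.add((a, b))  # the opposite direction would close a cycle
                    undirected.discard(edge)
                    changed = True
                    break
    return directed, undirected


directed, undirected = orient_to_avoid_cycles(
    {("X", "Z"), ("Z", "Y")}, [("X", "Y")]
)
print(directed)    # now also contains ("X", "Y")
print(undirected)  # empty: the X - Y edge has been oriented
```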

Page 20: Causal Modeling for  Anomaly Detection

Our Example

Rule 1: Colliders

Rule 2: No new V-structures

Truth fully recovered

[Scheines (2005)]

Page 21: Causal Modeling for  Anomaly Detection


Results: Key Performance Indicators

Page 22: Causal Modeling for  Anomaly Detection


Results: Chip Fabrication

Page 23: Causal Modeling for  Anomaly Detection


Temporal ordering is preserved

Page 24: Causal Modeling for  Anomaly Detection

Using causal structure to explain anomalies

• Why is one wafer good, and another bad?

– Separate data into classes

– Form causal graphs on each class

– Compare causal structures

Page 25: Causal Modeling for  Anomaly Detection


Form causal graphs

Good Train

Good Test

Bad

Page 26: Causal Modeling for  Anomaly Detection

How to compare?

• Similarity Score for graphs A and B over common nodes V:

– Consider undirected edges as bi-directed

– Of all the ordered pairs of variables (x, y) in V, with an arc x → y in either A or B

• In what percentage is there also x → y in the other graph

• i.e., (AdjA(x,y) || AdjB(x,y)) && (AdjA(x,y) == AdjB(x,y))

• Difference Graph:

– If there is an arc x → y in either A or B, but not in both, place the arc x → y in the difference graph

– i.e., if (AdjA(x,y) != AdjB(x,y)) then AdjDiff(x,y) = True
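
Read literally, the similarity score is the number of arcs shared by both graphs divided by the number of arcs present in either, and the difference graph is the symmetric difference of the two arc sets. The sketch below follows that reading; the function names, the set-of-arcs representation, and the convention of returning 1.0 for two empty graphs are assumptions.

```python
def as_arcs(directed=(), undirected=()):
    """Arc set of a graph, treating each undirected edge as bi-directed."""
    arcs = set(directed)
    for x, y in undirected:
        arcs.add((x, y))
        arcs.add((y, x))
    return arcs


def restrict(arcs, nodes):
    """Keep only arcs whose endpoints are both in `nodes`."""
    return {(x, y) for x, y in arcs if x in nodes and y in nodes}


def similarity(arcs_a, arcs_b, common_nodes):
    """Fraction of arcs in either graph that appear in both graphs."""
    a, b = restrict(arcs_a, common_nodes), restrict(arcs_b, common_nodes)
    union = a | b
    return len(a & b) / len(union) if union else 1.0  # assumed convention


def difference_graph(arcs_a, arcs_b, common_nodes):
    """Arcs present in exactly one of the two graphs."""
    return restrict(arcs_a, common_nodes) ^ restrict(arcs_b, common_nodes)


# Hypothetical graphs: A has x -> y -> z; B has x -> y and an undirected y - z.
graph_a = as_arcs(directed=[("x", "y"), ("y", "z")])
graph_b = as_arcs(directed=[("x", "y")], undirected=[("y", "z")])
nodes = {"x", "y", "z"}

print(similarity(graph_a, graph_b, nodes))        # 2 shared arcs out of 3
print(difference_graph(graph_a, graph_b, nodes))  # only the arc z -> y differs
```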

Page 27: Causal Modeling for  Anomaly Detection

Comparison: Good Test vs. Good Train

59% similar

Difference Graph

Page 28: Causal Modeling for  Anomaly Detection

Comparison: Bad vs. Good Train

37% similar

Difference Graph

Page 29: Causal Modeling for  Anomaly Detection

Comparison: Bad vs. Good Test

35% similar

Difference Graph

Page 30: Causal Modeling for  Anomaly Detection

Conclusions

• Consistent causal structure can be learned from passive observational data

• The causal structure of anomalous examples is quantitatively distinguishable from that of normal ones

• Causal structure adds significantly to the standard analysis tools of independence and likelihood

Page 31: Causal Modeling for  Anomaly Detection

Ongoing work

• Comparing to maximum likelihood and minimum description length techniques

• Looking at time-ordering

– How do variables influence each other over time?

• Using one-class SVM to do clustering

– Avoids need for labeled data

• Relaxing assumptions

– Allow latent variables

• Evaluation is difficult without domain expert

• Using causal structure to help in clustering

Page 32: Causal Modeling for  Anomaly Detection

References

• J. Pearl (2000). Causality: Models, Reasoning, and Inference, Cambridge Univ. Press

• R. Scheines, Causality Slides, http://www.gatsby.ucl.ac.uk/~zoubin/SALD/scheines.pdf

• P. Spirtes, C. Glymour, and R. Scheines (2000). Causation, Prediction, and Search, 2nd Edition (MIT Press)

Thank You

Questions?