Course files

http://www.andrew.cmu.edu/~ddanks/NASSLLI/


DESCRIPTION

Course files. http://www.andrew.cmu.edu/~ddanks/NASSLLI/. Principles Underlying Causal Search Algorithms. Fundamental problem: as we have all heard many times, "Correlation is not causation!" Why is this slogan correct?

TRANSCRIPT

Page 1: Course files

Course files

http://www.andrew.cmu.edu/~ddanks/NASSLLI/

Page 2: Course files

Principles Underlying Causal Search Algorithms

Page 3: Course files

Fundamental problem

As we have all heard many times…

“Correlation is not causation!”

Page 4: Course files

Fundamental problem

Why is this slogan correct?
- Causal hypotheses make implicit claims about the effects of intervening on (manipulating) one or more variables
- Hypotheses about association or correlation make no such claims
- Correlation or probabilistic dependence can be produced in many ways

Page 5: Course files

Fundamental problem

Some of the possible reasons why X and Y might be associated are:
- Sheer chance
- X causes Y
- Y causes X
- Some third variable Z influences X and Y
- The value of X (or a cause of X) and the value of Y (or a cause of Y) can be causes/reasons for whether an individual is in the sample (sample selection bias)

Page 6: Course files

Fundamental problem

Fundamental problem of causal search: for any particular set of data, there are often many different causal structures that could have produced that data

The Causation → Association map is many → one

Page 7: Course files

Fundamental problem

Okay, so what can we do about this?
- Use the data to figure out as much as possible (though it usually won't be everything)
  - Requires developing search procedures
- And then try to narrow the possibilities
  - Use other knowledge (e.g., time order, interventions)
  - Get better / different data (e.g., run an experiment)

Page 8: Course files

Always remember…

Even if we cannot discover the whole truth,

we might be able to find some of the truth!

Page 9: Course files

Markov equivalence

Formally, we say that:
- Two causal graphs are members of the same Markov Equivalence Class iff they imply the exact same (un)conditional independence relations among the observed variables
  - By the Markov and Faithfulness assumptions
- Remember that d-separation gives a purely graphical criterion for determining all of the (un)conditional independencies

Page 10: Course files

Markov equivalence

The "Fundamental Problem of Causal Inference" can be restated as: for some sets of independence relations, the Markov equivalence class is not a singleton

Markov equivalence classes give a precise characterization of what can be inferred from independencies alone

Page 11: Course files

Markov equivalence

Examples:

- X ⊥ {Y, Z} ⇒ X shares no edges with Y or Z
- X ⊥ Y | Z ⇒ the chain/fork class: X → Z → Y, X ← Z → Y, X ← Z ← Y
- X ⊥ Y ⇒ the collider: X → Z ← Y

Page 12: Course files

Markov equivalence

Two more examples:
- Are these graphs Markov equivalent?
- Are these two graphs?

[Diagrams: two pairs of graphs over X, Y, Z]

Page 13: Course files

Shared structure

What is shared by all of the graphs in a Markov equivalence class?
- Same "skeleton"
  - I.e., they all have the same adjacency relations
- Same "unshielded colliders"
  - I.e., X → Y ← Z with no edge between X and Z
- Sometimes, other edges have the same direction

In these last two cases, we can infer that the true graph contains the shared directed edges.
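
This graphical criterion is easy to operationalize. Below is a minimal sketch, assuming graphs are represented simply as sets of (tail, head) edge pairs (a representation of our choosing, not the slides'):

```python
# Sketch of the criterion above: two DAGs over the same variables are
# Markov equivalent iff they have the same skeleton and the same
# unshielded colliders.

def skeleton(edges):
    """Adjacencies, ignoring edge direction."""
    return {frozenset(e) for e in edges}

def unshielded_colliders(edges):
    """Pairs ({X, Y}, C) such that X -> C <- Y with X, Y non-adjacent."""
    adj = skeleton(edges)
    parents = {}
    for tail, head in edges:
        parents.setdefault(head, set()).add(tail)
    return {(frozenset((x, y)), c)
            for c, ps in parents.items()
            for x in ps for y in ps
            if x < y and frozenset((x, y)) not in adj}

def markov_equivalent(g1, g2):
    return (skeleton(g1) == skeleton(g2) and
            unshielded_colliders(g1) == unshielded_colliders(g2))

# The chain X -> Z -> Y and the fork X <- Z -> Y are equivalent; the
# collider X -> Z <- Y shares their skeleton but is not equivalent.
chain    = {("X", "Z"), ("Z", "Y")}
fork     = {("Z", "X"), ("Z", "Y")}
collider = {("X", "Z"), ("Y", "Z")}
print(markov_equivalent(chain, fork))      # True
print(markov_equivalent(chain, collider))  # False
```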

Page 14: Course files

Shared structure as patterns

- Since every graph in a Markov equivalence class has the same adjacencies, we can represent the whole class using a pattern
- A pattern is itself a graph, but its edges represent edges in other graphs

Page 15: Course files

Shared structure as patterns

- A pattern can have directed and undirected edges
- It represents all graphs that can be created by adding arrowheads to the undirected edges without creating either: (i) a cycle; or (ii) an unshielded collider

Let's try some examples…

Page 16: Course files

Shared structure as patterns

Pattern: Nitrogen — PlantGrowth — Bees

Represented graphs:

Nitrogen → PlantGrowth → Bees

Nitrogen ← PlantGrowth → Bees

Nitrogen ← PlantGrowth ← Bees

Page 17: Course files

Shared structure as patterns

Pattern: Nitrogen → PlantGrowth ← Bees

Represented graphs:

Nitrogen → PlantGrowth ← Bees (the collider pattern represents only itself)
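
These enumerations can be done mechanically. Here is a brute-force sketch, reusing unshielded_colliders from the equivalence sketch above; it tries every way of adding arrowheads to the undirected edges and keeps the acyclic orientations that create no unshielded collider beyond those already present among the pattern's directed edges:

```python
from itertools import product

def is_acyclic(nodes, edges):
    """Cycle check: repeatedly delete nodes with no incoming edge."""
    remaining, es = set(nodes), set(edges)
    while remaining:
        sources = {n for n in remaining if all(h != n for _, h in es)}
        if not sources:
            return False            # every remaining node lies on a cycle
        remaining -= sources
        es = {(t, h) for t, h in es if t not in sources}
    return True

def extensions(nodes, directed, undirected):
    """All DAGs represented by a pattern (directed + undirected edges)."""
    base = unshielded_colliders(set(directed))
    for bits in product((0, 1), repeat=len(undirected)):
        edges = set(directed) | {(a, b) if bit else (b, a)
                                 for (a, b), bit in zip(undirected, bits)}
        if is_acyclic(nodes, edges) and unshielded_colliders(edges) == base:
            yield edges

# The Page 16 pattern prints exactly the three graphs listed above; the
# fourth orientation is rejected because it creates a new collider.
for dag in extensions(["Nitrogen", "PlantGrowth", "Bees"], [],
                      [("Nitrogen", "PlantGrowth"), ("PlantGrowth", "Bees")]):
    print(sorted(dag))
```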

Page 18: Course files

Formal problem of search

Given some dataset D, find: the Markov equivalence class, represented as a pattern P, that predicts exactly the independence relations found in the data

More colloquially, find the causal graphs that could have produced data like this

Page 19: Course files

Hard to find a pattern

“Gee, how hard could this be? Just test all of the associations, find the Markov equivalence class, then write down the pattern for it. Voila! We’re doing causal learning!”

Big problem: the number of independencies to test grows exponentially with the number of variables:

2 variables ⇒ 1 test
3 variables ⇒ 6 tests
4 variables ⇒ 24 tests
5 variables ⇒ 80 tests
6 variables ⇒ 240 tests
and so on…
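
These counts match a simple formula: choose one of C(n, 2) pairs of variables, then condition on any subset of the remaining n − 2 variables, giving C(n, 2) · 2^(n−2) tests. A quick check:

```python
from math import comb

def num_ci_tests(n):
    """Number of conditional-independence tests over n variables."""
    return comb(n, 2) * 2 ** (n - 2)

for n in range(2, 7):
    print(n, num_ci_tests(n))   # 1, 6, 24, 80, 240
```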

Page 20: Course files

General features of causal search

- Huge model and parameter spaces
  - Even when we (necessarily) use prior information about the family of probability distributions
- Relevant statistics must be rapidly computed
- But substantive knowledge about the domain may restrict the space of alternative models
  - Time order of variables
  - Required cause/effect relationships
  - Existence or non-existence of latent variables

Page 21: Course files

Three schemata for search

- Bayesian / score-based: find the graph(s) with highest P(graph | data)
- Constraint-based: find the graph(s) that predict exactly the observed associations and independencies
- Combined: get "close" with constraint-based, and then find the best graph using score-based

Page 22: Course files

Bayesian / score-based

Informally:
- Give each model an initial score using "prior beliefs"
- Update each score based on the likelihood of the data if the model were true
- Output the highest-scoring model

Formally:
- Specify P(M, v) for all models M and possible parameter values v of M
- For any data D, P(D | M, v) can easily be calculated
- P(M | D) ∝ ∫ P(D | M, v) P(M, v) dv

Page 23: Course files

Bayesian / score-based

- In practice, this strategy is completely computationally intractable
  - There are too many graphs to check them all
- So, we use a greedy search strategy
  - Start with an initial graph
  - Iteratively compare the current graph's score (∝ posterior probability) with that of each 1- or 2-step modification of that graph
    - By edge addition, deletion, or reversal
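
A minimal sketch of that greedy loop follows (single-step moves only, reusing is_acyclic from the pattern sketch above). The score function stands in for something proportional to the posterior, e.g., a BIC or Bayesian score; the toy score in the demo is purely illustrative:

```python
from itertools import permutations

def moves(nodes, edges):
    """All one-step modifications: edge addition, deletion, reversal."""
    for a, b in permutations(nodes, 2):
        if (a, b) in edges:
            yield edges - {(a, b)}                  # deletion
            yield (edges - {(a, b)}) | {(b, a)}     # reversal
        elif (b, a) not in edges:
            yield edges | {(a, b)}                  # addition

def greedy_search(nodes, score, graph=frozenset()):
    """Hill-climb until no acyclic one-step move improves the score."""
    while True:
        options = [frozenset(g) for g in moves(nodes, graph)
                   if is_acyclic(nodes, g)]
        best = max(options, key=score, default=graph)
        if score(best) <= score(graph):
            return graph
        graph = best

# Toy demo: the score simply rewards agreement with a stipulated truth.
truth = {("X", "Z"), ("Y", "Z")}
score = lambda g: len(g & truth) - len(g - truth)
print(sorted(greedy_search(["X", "Y", "Z"], score)))  # [('X','Z'), ('Y','Z')]
```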

Page 24: Course files

Bayesian / score-based

Problem #1: Local maxima
- Often, greedy searches get stuck

Solution:
- Greedy search over Markov equivalence classes, rather than graphs (Meek)
- Has a proof of correctness and convergence (Chickering)
- But it gets to the right answer slowly

Page 25: Course files

Bayesian / score-based

Problem #2: Unobserved variables
- Huge number of graphs
- Huge number of different parameterizations
- No fast, general way to compute likelihoods from latent variable models

Partial solution:
- Focus on a small, "plausible" set of models for which we can compute scores

Page 26: Course files

Constraint-based

- Implementation of the earlier idea: "build" the Markov equivalence class that predicts the pattern of association actually found in the data
- Compatible with a variety of statistical techniques
  - Note that we might have to introduce a latent variable to explain the pattern of statistics
- Important constraints on search:
  - Minimize the number of statistical tests
  - Minimize the size of the conditioning sets
  - (Why? Tests with large conditioning sets have little data per cell, and so have low statistical power.)

Page 27: Course files

Constraint-based

Algorithm step #1: Discover the adjacencies
- Create the complete graph with undirected edges
- Test all pairs X, Y for unconditional independence
  - Remove the X — Y edge if they are independent
- Test all adjacent X, Y for independence given a single neighbor N
  - Remove the X — Y edge if they are independent
- Test adjacent pairs given two neighbors
- …
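
A sketch of this adjacency phase (the first stage of a PC-style search), assuming a hypothetical oracle indep(x, y, S) that answers conditional-independence queries; with real data this would be a statistical test. Drawing conditioning sets only from current neighbors keeps the tests few and the conditioning sets small:

```python
from itertools import combinations

def find_adjacencies(nodes, indep):
    adj = {n: set(nodes) - {n} for n in nodes}   # complete graph
    sepset = {}                                  # separating sets found
    size = 0                                     # conditioning-set size
    while any(len(adj[n]) - 1 >= size for n in nodes):
        for x, y in combinations(nodes, 2):
            if y not in adj[x]:
                continue
            # Size-`size` subsets of either endpoint's other neighbors
            conds = {frozenset(S)
                     for side in (adj[x] - {y}, adj[y] - {x})
                     for S in combinations(sorted(side), size)}
            for S in conds:
                if indep(x, y, S):
                    adj[x].discard(y)
                    adj[y].discard(x)
                    sepset[frozenset((x, y))] = S
                    break
        size += 1
    return adj, sepset
```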

Page 28: Course files

Constraint-based

Algorithm step #2: (Try to) Orient edges
- "Unshielded triple": X — C — Y, but X, Y not adjacent
- If X & Y are independent given an S containing C, then C must be a non-collider
  - Since we have to condition on it to achieve d-separation
- If X & Y are independent given an S not containing C, then C must be a collider
  - Since the path is not active when not conditioning on C
- And then do further orientations to ensure acyclicity and that identified non-colliders remain non-colliders
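
And a sketch of the collider rule, reusing the separating sets recorded by the adjacency sketch above (the further orientation rules are omitted):

```python
from itertools import combinations

def orient_colliders(adj, sepset):
    """Orient X -> C <- Y when C is absent from the separating set of (X, Y)."""
    arrows = set()                                # (tail, head) pairs
    for c in adj:
        for x, y in combinations(sorted(adj[c]), 2):
            if y not in adj[x] and c not in sepset[frozenset((x, y))]:
                arrows.add((x, c))
                arrows.add((y, c))
    return arrows
```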

Page 29: Course files

Constraint-based example

Variables are {X, Y, Z, W}. The only independencies are:

X ⊥ Y
X ⊥ W | Z
Y ⊥ W | Z

Page 30: Course files

Constraint-based example

Step 1: Form the complete graph using undirected edges

[Diagram: the complete undirected graph over X, Y, Z, W]

Page 31: Course files

Constraint-based example

Step 2: For each pair of variables, remove the edge between them if they’re unconditionally independent

X ⊥ Y ⇒

[Diagram: the complete graph with the X — Y edge removed]

Page 32: Course files

Constraint-based example

Step 3: For each adjacent pair, remove the edge if they’re independent conditional on some variable adjacent to one of them

{X, Y} ⊥ W | Z ⇒

[Diagram: remaining edges X — Z, Y — Z, Z — W]

Page 33: Course files

Constraint-based example

Step 4: Continue removing edges, checking independence conditional on 2 (or 3, or 4, or…) variables

[Diagram: no further edges removed; the skeleton is X — Z, Y — Z, Z — W]

Page 34: Course files

Constraint-based example

Step 5: Orientation
- For X — Z — Y, since X ⊥ Y without conditioning on Z, make Z a collider
- Since Z is a non-collider between X and W, though, we must orient Z — W away from Z

[Diagram: X → Z ← Y, Z → W]
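
Running the two sketches from Pages 27-28 on this example, with an oracle that encodes exactly the independencies listed on Page 29, recovers the skeleton and the collider:

```python
# Oracle: exactly the independencies from Page 29 (in either order).
facts = {("X", "Y", frozenset()),
         ("X", "W", frozenset({"Z"})),
         ("Y", "W", frozenset({"Z"}))}

def indep(x, y, S):
    return (x, y, frozenset(S)) in facts or (y, x, frozenset(S)) in facts

adj, sepset = find_adjacencies(["W", "X", "Y", "Z"], indep)
print({n: sorted(adj[n]) for n in adj})
# {'W': ['Z'], 'X': ['Z'], 'Y': ['Z'], 'Z': ['W', 'X', 'Y']}

print(sorted(orient_colliders(adj, sepset)))
# [('X', 'Z'), ('Y', 'Z')]  i.e. X -> Z <- Y; orienting Z -> W needs the
# further non-collider rule from Step 5, which this sketch omits.
```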

Page 35: Course files

Constraint-based output

Searches that allow for latent variables can also have edges of the form X o→ Y

This indicates one of three possibilities:
- X → Y
- At least one unobserved common cause of X and Y
- Both of these

Page 36: Course files

Interventions to the rescue?

- Interventions helped us solve an earlier equivalence class problem
  - Randomization meant that: Treatment-Effect association ⇒ T → E
- Interventions alter equivalence classes, but don't make them all into singletons
  - The fundamental problem of search remains

Page 37: Course files

Before X-intervention

[Diagram: the possible causal graphs over X, Y, Z, grouped into Markov equivalence classes]

Page 38: Course files

After X-intervention

[Diagram: the same graphs regrouped into the Markov equivalence classes that result once the intervention on X removes edges into X]

Page 39: Course files

Search with interventions

- Search with interventions is the same as search with observations, except we adjust the graphs in the search space to account for the intervention
- For multiple experiments, we search for graphs in every output equivalence class
- More complicated than this in the real world due to sampling variation

Page 40: Course files

Example

Observation: Y ⊥ Z | X ⇒ the Markov equivalence class {Y → X → Z, Y ← X → Z, Y ← X ← Z}

Intervention on X: Y ⊥ {X, Z} ⇒ rules out every graph in which X causes Y

Only possible graph:

Y → X → Z

Page 41: Course files

Looking ahead…

Have:
- Basic formal representation for causation
- Fundamental causal asymmetry (of intervention)
- Inference & reasoning methods
- Search & causal discovery principles

Need:
- Search & causal discovery methods that work in the real world