course outline - duke universitypublic.econ.duke.edu/~kdh9/courses/graphical causal...6 2....

1 Causal Search Using Graphical Causal Models Kevin D. Hoover Duke University


Post on 15-Mar-2021



Causal Search Using Graphical Causal Models

Kevin D. Hoover Duke University


Course Outline

LECTURE 1. Motivations and Basic Notions

LECTURE 2. Causal Search Algorithms

LECTURE 3. Applications


Course Website: http://econ.duke.edu/~kdh9 (click on Courses, then An Introduction to Graphical Causal Models)

Lectures: Lecture 1. Lecture 2. Lecture 3.

Annotated Bibliography

Tetrad: Tetrad Project Homepage; Tetrad IV Download Site

Bootgraph Software Download

Data Sets: Text and Excel versions available; documentation in Excel versions. Text versions are a pure data matrix only, without headers or date array; dimensions of the data matrix in the text version are given in parentheses (observations x variables).

• Swanson-Granger Data: Text (216 x 4); Excel
• U.S. M2 Data (10 variable): Text (181 x 10); Excel
• U.S. M2 Data (11 variable): Text (182 x 11); Excel
• Hoover-Jorda Data (U.S. Macro): Text (494 x 6); Excel


Causal Search Using Graphical Causal Models

Kevin D. Hoover Duke University

Lecture 1 Motivations and Basic Notions


I. Motivations: Why do we care about causal order?

1. Regressions have an implicit causal direction.

Example: Regression:

$\mathrm{LIQDEP} = 15.13 - 0.12\,\mathrm{FF}$, $R^2 = 0.62$

Algebraic transformation of regression: $\mathrm{FF} = 126.08 - 8.33\,\mathrm{LIQDEP}$

Reverse Regression: $\mathrm{FF} = 81.95 - 5.30\,\mathrm{LIQDEP}$, $R^2 = 0.62$


2. Regression on effects is misleading.

Example: Data-generating Process

$X_1 = N(0,1)$
$X_2 = 5X_1 + N(0,1)$
$X_3 = X_2 + 0.1\,N(0,1)$

Causally-oriented regression

$X_2 = \underset{(0.08)}{0.038} + \underset{(67.1)}{5.033}\,X_1$

Regression on Effect

$X_2 = \underset{(0.01)}{0.008} + \underset{(1.56)}{0.057}\,X_1 + \underset{(138.2)}{0.99}\,X_3$

Potential problem for PcGets or Autometrics.


3. The causal order of structural vector autoregressions (SVAR) is critical to conclusions.

[Figure: Impulse Responses of Money to a Consumption Shock for Two Contemporaneous Causal Orderings of U.S. Money, GDP, Consumption, and Investment. Horizontal axis: Quarters (1 to 10). Vertical axis: Percent (-0.6 to 0.3). Series: Choleski (M, Y, C, I) and Choleski (M, C, I, Y). Note on figure: Response of Y to a one-standard deviation shock to C (0.33).]


II. The Language of Graphs.

1. SOURCES OF THE GRAPH-THEORETIC APPROACH TO CAUSAL MODELING:

• Spirtes, Glymour, and Scheines (2000), Causation, Prediction, and Search, 2nd edition.

• Pearl (2000), Causality: Models, Reasoning, and Inference.

Issues:

• Representation of data probabilistically.

• Representation of asymmetries causally and graphically.

• Relationship between two representations.


2. BASIC ELEMENTS OF GRAPHS:

• Nodes (or vertices) represent variables;

• Edges represent causal relationships;

• Arrowheads represent causal asymmetry.

• A path (or trek) is a sequence of edges starting at one node and ending at another.

• A directed path is a path in which the arrowheads align in a continuous direction.


A Graph

[Figure: a graph on the nodes A, B, C, D, and E, with callouts identifying the nodes, a directed edge, an undirected edge, and a bidirected edge.]

BAC is a path. DAE is a directed path.
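The distinction between a path and a directed path can be made concrete with a small sketch. The edge set below is hypothetical: it is chosen so that BAC is a path and DAE is a directed path, but it is not a reconstruction of the slide's figure, and it contains only directed edges.

```python
# Hypothetical directed graph: each pair is (cause, effect).
edges = {("D", "A"), ("A", "E"), ("B", "A"), ("C", "A")}

def adjacent(u, v):
    """True if an edge joins u and v in either direction."""
    return (u, v) in edges or (v, u) in edges

def is_path(nodes):
    """A path is a sequence of edges, whatever way the arrowheads point."""
    return all(adjacent(u, v) for u, v in zip(nodes, nodes[1:]))

def is_directed_path(nodes):
    """A directed path follows the arrowheads in one continuous direction."""
    return all((u, v) in edges for u, v in zip(nodes, nodes[1:]))

print(is_path("BAC"), is_directed_path("BAC"))  # True False
print(is_path("DAE"), is_directed_path("DAE"))  # True True
```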


• Directed edge: B → A read as “B causes A”;

• Bidirected edge: C ↔ D read as “C causes D and D causes C” or “C and D are mutual causes (or simultaneous causes)”. NB. Bidirected edges are sometimes given other functions in the literature, so that mutual causation is sometimes represented as: C ⇄ D

• Undirected edge: C — A read as “causal direction between C and A undetermined.”

• In the literature, edges may have other endpoints (e.g., o→ or o–o) representing relations not used in this course.


Skeleton = pattern of edges irrespective of orientation.

These graphs have the same skeleton:

[Figure: three graphs on the nodes A, B, C, and D that share the same edges but differ in the orientation of the edges.]


Patterns

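Acyclical Graph

[Figure: acyclical graph on the nodes A, B, and C.]

Cyclical Graph

[Figure: cyclical graph on the nodes A, B, and C.]

Simultaneous Graph

[Figure: simultaneous graph on the nodes A, B, and C.]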


3. RELATIONSHIP OF GRAPHS TO EQUATIONS.

Example

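$V = \varepsilon_V$
$W = a_{WX}X + \varepsilon_W$
$X = a_{XV}V + \varepsilon_X$
$Z = a_{ZX}X + \varepsilon_Z$

where each $\varepsilon_j \sim$ independent $N(0, \sigma_j^2)$.

Graph: V → X; X → W; X → Z.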


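Variables are regarded as stochastic, each with a random error term. The error terms could be included in the graph; for example:

[Figure: the graph V → X, X → W, X → Z, with the added edges εV → V, εX → X, εW → W, and εZ → Z.]

Independence implies only one edge from any error term; therefore, error terms are typically omitted.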


Same system in matrix notation

Let

\[
Y = \begin{bmatrix} V \\ W \\ X \\ Z \end{bmatrix}, \qquad
E = \begin{bmatrix} \varepsilon_V \\ \varepsilon_W \\ \varepsilon_X \\ \varepsilon_Z \end{bmatrix},
\]

such that the covariance matrix $\Omega = \operatorname{E}(EE')$ is diagonal.

Then

\[
AY =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & -a_{WX} & 0 \\
-a_{XV} & 0 & 1 & 0 \\
0 & 0 & -a_{ZX} & 1
\end{bmatrix}
\begin{bmatrix} V \\ W \\ X \\ Z \end{bmatrix}
=
\begin{bmatrix} \varepsilon_V \\ \varepsilon_W \\ \varepsilon_X \\ \varepsilon_Z \end{bmatrix}
= E.
\]

Causal ordering among the variables in Y is defined by the zero restrictions on A.

Graph: V → X; X → W; X → Z.
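The link between the zero restrictions on A and the graph can be checked numerically. The sketch below assumes hypothetical values for a_XV, a_WX and a_ZX (the slides leave them symbolic) and assumes the sign convention in which the off-diagonal entries of A are the negatives of the structural coefficients, so that AY = E reproduces the four equations.

```python
import random

# Hypothetical coefficient values; the slides leave these symbolic.
a_XV, a_WX, a_ZX = 0.8, -0.5, 1.2
order = ["V", "W", "X", "Z"]
A = [
    [1.0,   0.0,  0.0,  0.0],  # V = e_V
    [0.0,   1.0, -a_WX, 0.0],  # W - a_WX * X = e_W
    [-a_XV, 0.0,  1.0,  0.0],  # X - a_XV * V = e_X
    [0.0,   0.0, -a_ZX, 1.0],  # Z - a_ZX * X = e_Z
]

# Draw one set of shocks and generate the variables in causal order.
rng = random.Random(2)
e = {name: rng.gauss(0, 1) for name in order}
y = {"V": e["V"]}
y["X"] = a_XV * y["V"] + e["X"]
y["W"] = a_WX * y["X"] + e["W"]
y["Z"] = a_ZX * y["X"] + e["Z"]

# Multiplying A by Y should return the shocks E.
ay = [sum(A[i][j] * y[order[j]] for j in range(4)) for i in range(4)]

# Each nonzero off-diagonal entry A[i][j] is an edge from column j to row i.
edges = [(order[j], order[i])
         for i in range(4) for j in range(4)
         if i != j and A[i][j] != 0.0]
print(edges)  # [('X', 'W'), ('V', 'X'), ('X', 'Z')]
```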


III. Graphs and Probability

1. LIMITATIONS OF THIS COURSE.

• Acyclical graphs only.

• Causal sufficiency. A graph is causally sufficient when no variable not included in the graph is a cause of more than one variable included in the graph. Causal sufficiency rules out latent variables and correlation among random error terms.


2. IDENTIFICATION.

Two concepts of causal identification:

• Theoretical = actual causal structure that generated the data is recoverable (an important goal);

• Uniqueness = data correspond to a unique causal representation (failure of which is an important obstacle).


Folk theorem: economic data cannot be identified in the sense of uniqueness without the imposition of a causal structure selected by a priori theory identifying a model in the sense of theoretical identification. A priori theoretical identification is a necessary and sufficient condition for identification in the sense of uniqueness.

Corollary: Tests of overidentifying restrictions are conditional on a priori identification.

The folk theorem and its corollary are false.


Proof by counterexample:

Simple data-generating process

$A = \varepsilon_A$
$B = \varepsilon_B$
$C = a_{CA}A + a_{CB}B + \varepsilon_C$

Its graph

A → C ← B


In matrix form

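\[
AY =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
-a_{CA} & -a_{CB} & 1
\end{bmatrix}
\begin{bmatrix} A \\ B \\ C \end{bmatrix}
=
\begin{bmatrix} \varepsilon_A \\ \varepsilon_B \\ \varepsilon_C \end{bmatrix}
= E
\]

and

\[
\Omega = \operatorname{E}(EE') = \operatorname{E}
\begin{bmatrix}
\varepsilon_A\varepsilon_A & \varepsilon_A\varepsilon_B & \varepsilon_A\varepsilon_C \\
\varepsilon_B\varepsilon_A & \varepsilon_B\varepsilon_B & \varepsilon_B\varepsilon_C \\
\varepsilon_C\varepsilon_A & \varepsilon_C\varepsilon_B & \varepsilon_C\varepsilon_C
\end{bmatrix}
=
\begin{bmatrix}
\sigma_{AA} & 0 & 0 \\
0 & \sigma_{BB} & 0 \\
0 & 0 & \sigma_{CC}
\end{bmatrix}.
\]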


If form of system (and its graph) were known, then it could easily be estimated.

The identification problem is that we do not know it, unless theory tells us a priori.

Key Question: Can the DGP be recovered from the data alone?

• Folk theorem says, NO.

• Real answer, SOMETIMES: the point of the course.


The reduced form describes the probability distribution of the data:

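\[
A^{-1}AY = Y = U = A^{-1}E,
\]

that is,

\[
Y = \begin{bmatrix} A \\ B \\ C \end{bmatrix}
= \begin{bmatrix} \upsilon_A \\ \upsilon_B \\ \upsilon_C \end{bmatrix} = U
\]

and

\[
\Sigma = \operatorname{E}(UU') = \operatorname{E}
\begin{bmatrix}
\upsilon_A\upsilon_A & \upsilon_A\upsilon_B & \upsilon_A\upsilon_C \\
\upsilon_B\upsilon_A & \upsilon_B\upsilon_B & \upsilon_B\upsilon_C \\
\upsilon_C\upsilon_A & \upsilon_C\upsilon_B & \upsilon_C\upsilon_C
\end{bmatrix}
=
\begin{bmatrix}
\omega_{AA} & \omega_{AB} & \omega_{AC} \\
\omega_{BA} & \omega_{BB} & \omega_{BC} \\
\omega_{CA} & \omega_{CB} & \omega_{CC}
\end{bmatrix}.
\]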


Σ is easily estimated.

Identification requires a diagonal covariance matrix.

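Many matrices P exist, such that

\[
P^{-1}Y = P^{-1}U = \Psi
\]

such that

\[
\Lambda = \operatorname{E}(\Psi\Psi') =
\begin{bmatrix}
\psi_{AA} & 0 & 0 \\
0 & \psi_{BB} & 0 \\
0 & 0 & \psi_{CC}
\end{bmatrix}.
\]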


• symmetry implies that $P^{-1}$ imposes 3 restrictions on Λ;

• therefore, $P^{-1}$ has at most 3 degrees of freedom if it is just-identified in the sense of uniqueness;

• or, equivalently, 3 restrictions.

• In general, if Y contains n variables, the covariance matrix of the reduced form must be diagonal, imposing $n(n-1)/2$ restrictions, and the matrix multiplying the variables imposes another $n(n-1)/2$, for a total of $n(n-1)$ to achieve identification.


• For a given ordering of the variables in Y (here ABC), there exists a unique lower triangular matrix P such that

\[
\Lambda = \operatorname{E}\big(P^{-1}U\,(P^{-1}U)'\big).
\]

• P is called the Choleski matrix, and a transformation using $P^{-1}$ to achieve orthogonal error terms (i.e., Λ diagonal) is a Choleski transformation (or decomposition).

• Every just-identified system can be expressed in a Choleski ordering in which $P^{-1}$ is lower triangular.

• There is one Choleski ordering for each ordering of the variables in Y; here 3! = 6 orderings; in general n! orderings.


• Starting with $\Sigma = \operatorname{E}(UU')$, finding a Choleski ordering is just a matter of calculation.

• All just-identified systems of the same variables (i.e., all Choleski orderings) have the same likelihood.


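Consider a simulation based on the data-generating process:

\[
AY =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
-1/3 & -1/3 & 1
\end{bmatrix}
\begin{bmatrix} A \\ B \\ C \end{bmatrix}
=
\begin{bmatrix} \varepsilon_A \\ \varepsilon_B \\ \varepsilon_C \end{bmatrix}
= E,
\]

where $\varepsilon_A, \varepsilon_B \sim$ independent $N(0,1)$ and $\varepsilon_C \sim$ independent $N(0, 1/9)$.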

Consider the DGP unobservable.


The reduced form, however, can be estimated:

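\[
Y = \begin{bmatrix} A \\ B \\ C \end{bmatrix}
= \hat{c} + \begin{bmatrix} \hat{\upsilon}_A \\ \hat{\upsilon}_B \\ \hat{\upsilon}_C \end{bmatrix}
= \hat{c} + \hat{U},
\]

where each of the estimated constants in $\hat{c}$ (0.05, 0.03, and 0.05 in absolute value) is statistically insignificant.

\[
\hat{\Sigma} = \operatorname{E}(\hat{U}\hat{U}') =
\begin{bmatrix}
1 & 0.01 & 0.58 \\
0.01 & 1 & 0.59 \\
0.58 & 0.59 & 1
\end{bmatrix},
\]

standardized to be the correlation matrix.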

Alternative Choleski Orderings

$P^{-1}$ for order ABC:

\[
\begin{bmatrix} 1 & 0 & 0 \\ -0.012 & 1 & 0 \\ -0.33 & -0.36 & 1 \end{bmatrix}
\]

log likelihood = –558

$P^{-1}$ for order BAC:

\[
\begin{bmatrix} 1 & 0.01 & 0 \\ 0 & 1 & 0 \\ -0.33 & -0.36 & 1 \end{bmatrix}
\]

log likelihood = –558

$P^{-1}$ for order ACB:

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0.47 & 1 & -1.43 \\ -0.34 & 0 & 1 \end{bmatrix}
\]

log likelihood = –558

$P^{-1}$ for order BCA:

\[
\begin{bmatrix} 1 & 0.53 & 1.50 \\ 0 & 1 & 0 \\ 0 & -0.36 & 1 \end{bmatrix}
\]

log likelihood = –558

$P^{-1}$ for order CAB:

\[
\begin{bmatrix} 1 & 0 & -0.99 \\ 0.47 & 1 & -1.430 \\ 0 & 0 & 1 \end{bmatrix}
\]

log likelihood = –558

$P^{-1}$ for order CBA:

\[
\begin{bmatrix} 1 & 0.53 & -1.50 \\ 0 & 1 & -0.96 \\ 0 & 0 & 1 \end{bmatrix}
\]

log likelihood = –558
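Why every ordering yields the same likelihood can be seen from the determinant. The sketch below starts from the population covariance matrix implied by the simulated DGP (unit variances for A and B, variance 1/3 for C, covariances of 1/3 between C and each of A and B), not from the estimated matrix on the slides. For each ordering it computes a unit-lower-triangular factorization with a hand-written helper `ldl`, an illustrative stand-in for the Choleski calculation. The product of the orthogonalized shock variances is the determinant of Σ in every ordering, and the Gaussian likelihood depends on the orthogonalization only through that determinant.

```python
from itertools import permutations
from math import log, prod

names = ["A", "B", "C"]
# Population covariance implied by C = A/3 + B/3 + e_C, Var(e_C) = 1/9.
sigma = [[1.0, 0.0, 1 / 3],
         [0.0, 1.0, 1 / 3],
         [1 / 3, 1 / 3, 1 / 3]]

def ldl(s):
    """Factor s = L diag(d) L' with L unit lower triangular."""
    n = len(s)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = s[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            acc = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (s[i][j] - acc) / d[j]
    return L, d

results = {}
for perm in permutations(range(3)):
    # Reorder rows and columns of sigma to match this causal ordering.
    s = [[sigma[i][j] for j in perm] for i in perm]
    _, d = ldl(s)
    results["".join(names[i] for i in perm)] = d

for order, d in results.items():
    log_det = sum(log(v) for v in d)
    print(order, [round(v, 3) for v in d], round(log_det, 6))
```

The individual shock variances differ from ordering to ordering, but their logs always sum to the same value, so the likelihood alone cannot rank the six orderings.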


• Folk theorem appears correct: quantitatively very different orderings indistinguishable on the basis of likelihood.

• But, not all the information in the likelihood function has been exploited.

• Consider the correlation matrix of the reduced form:

1 0.01 0.58

0.01 1 0.59

0.58 0.59 1

• The correlation between A and B is statistically indistinguishable from zero.


That correlation could not be zero if the graph of the DGP were

A → B → C

or any other ordering in a line. Nor could it be

A ← C → B

or any other ordering with a fork.


Only ordering consistent with the zero correlation:

A → C ← B

And that’s the graph of the DGP!


Its specification:

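\[
\hat{A}Y =
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
-0.33 & -0.36 & 1
\end{bmatrix}
\begin{bmatrix} A \\ B \\ C \end{bmatrix}
=
\begin{bmatrix} \hat{\varepsilon}_A \\ \hat{\varepsilon}_B \\ \hat{\varepsilon}_C \end{bmatrix}
= \hat{E}
\]

\[
\hat{\Omega} = \operatorname{E}(\hat{E}\hat{E}') =
\begin{bmatrix}
0.99 & 0 & 0 \\
0 & 0.94 & 0 \\
0 & 0 & 0.33
\end{bmatrix}
\]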


• Contrary to the folk theorem, the graph of the DGP was identified from the data alone, without first positing a just-identified model and then testing the overidentifying restriction.

• In fact, the DGP is not even nested in the Choleski orderings ACB, BCA, CAB, CBA.

• The DGP is nested in both ABC and BAC.

• In a sense, overidentification precedes identification.

• Still, we can test the overidentified model: the p-value vs. the null of the ABC Choleski order is p = 0.87.


3. PRINCIPLES OF SEARCH.

• Model in example identified from unconditional independence information.

• To generalize beyond the 3-variable case, consider conditional independence:

A independent of B conditional on Z iff P(A, B|Z) = P(A|Z)P(B|Z).


Patterns of Dependence and Independence

Case 1: Screen

A → B → C

A and C are probabilistically dependent (correlated), but conditional on B, they are independent: B screens off the influence of A on C.

Case 2: Common Cause

A ← C → B

A and B are probabilistically dependent (correlated), but conditional on C, they are independent: C is a common cause of A and B; and C screens off the correlation between A and B.


Case 3: Unshielded Collider

A → C ← B

A and B are probabilistically independent (uncorrelated), but conditional on C, they are dependent: C is a common effect of A and B (or an unshielded collider on the path ACB); and conditioning on C induces a correlation between A and B.

• Illustration: Let A = the state of charge of a car’s battery (charged/flat); B = the position of the car’s starter switch (off/on); and C = the state of the car’s operation (starts/doesn’t start). A and B may be completely independent. Yet, if we know that the car doesn’t start, then knowing that the switch is on raises the probability that the battery is flat.


• Search algorithms construct graphs from data exploiting patterns of unconditional and conditional independence based on these basic forms.

• Key idea: Reichenbach’s (1956) Principle of the Common Cause: if any two variables, A and B, are truly correlated, then either A causes B (A → B), or B causes A (A ← B), or they have a common cause.

• Generalization: Causal Markov Condition.

• d-separation: A set Z d-separates X from Y (where X and Y may be variables or sets of variables) iff Z blocks every path from a node in X to a node in Y. A path is blocked when a member of Z screens it (as the middle node of a chain or as a common cause), or when it contains an unshielded collider that is not in Z and has no descendant in Z.
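A d-separation check is short enough to sketch directly. The version below uses the moralized ancestral graph criterion, a standard equivalent formulation rather than the path-by-path definition in the text, and it is not the routine used in Tetrad. The function names and the two test graphs are hypothetical; each graph is written as a mapping from a node to its parents.

```python
def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, xs, ys, zs):
    """True if the set zs d-separates xs from ys in the given DAG."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Moralize: join each node to its parents and marry co-parents.
    nbrs = {v: set() for v in keep}
    for child in keep:
        ps = [p for p in parents.get(child, ()) if p in keep]
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # Search outward from xs without ever entering a node in zs.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(w for w in nbrs[v] if w not in zs and w not in seen)
    return not (seen & ys)

chain = {"B": ["A"], "C": ["B"]}        # A -> B -> C
collider = {"C": ["A", "B"], "D": ["C"]}  # A -> C <- B, C -> D

print(d_separated(chain, {"A"}, {"C"}, set()))     # False
print(d_separated(chain, {"A"}, {"C"}, {"B"}))     # True
print(d_separated(collider, {"A"}, {"B"}, set()))  # True
print(d_separated(collider, {"A"}, {"B"}, {"C"}))  # False
print(d_separated(collider, {"A"}, {"B"}, {"D"}))  # False
```

The last line shows why descendants matter: conditioning on D, a descendant of the collider C, opens the path between A and B just as conditioning on C does.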


[Figure: a graph on the nodes A, B, C, D, E, and F.]

C d-separates A and E;

C is the sepset for A and E.


[Figure: a graph on the nodes A, B, C, D, E, and F.]

A and B d-separate D and C;

{A, B} is the sepset for D and C.


Causal Markov Condition

[Figure: a graph on the nodes A, B, C, D, E, and F.]

A and B are parents of C.

C is a child (or daughter) of A and B.

[NB. children may have only one parent or many – the modern family.]


[Figure: a graph on the nodes A, B, C, D, E, and F.]

D is an ancestor of B, C, E, and F.

B, C, E, and F are descendants of D.


Causal Markov Condition: a variable in a graph is, conditional on its parents, probabilistically independent of all other variables that are neither its parents nor its descendants.

• Essentially, the Causal Markov Condition holds when a graph corresponds to the conditional independence relationships in the associated probability distribution.

• A graph is said to be faithful if, and only if, the conditional independence relations found in the probability distribution are exactly those implied by the Causal Markov Condition applied to the graph.


Failure of Faithfulness

X = εX

Y = αX + βZ + εY

Z = δX + εZ

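Graph: X → Y; X → Z; Z → Y.

Suppose that it just happens that

$\delta = -\alpha/\beta$;

then

$Y = \alpha X + \beta(-\alpha X/\beta + \varepsilon_Z) + \varepsilon_Y = \beta\varepsilon_Z + \varepsilon_Y$.

So, X and Y are uncorrelated!

But, say Spirtes et al. and Pearl, such cases are rare (Lebesgue measure zero).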


X = εX

Y = αX + βZ + εY

Z = δX + εZ

Graph: X → Y; X → Z; Z → Y.

Interpretation 1: X = actions of waves on a ship; Y = the course of the ship; Z = the position of the helm.

Interpretation 2: X = drivers of the business cycle; Y = GDP; Z = policymaker’s instrument.

Optimal control is ubiquitous in economics. Such cases are not rare (hardly Lebesgue measure zero situations).


Equivalent Graphs

Observational Equivalence Theorem (Pearl): Any probability distribution that can be faithfully represented in a causally sufficient, acyclical graph can equally well be represented by another acyclical graph that has the same skeleton and the same unshielded colliders.


[Figure: original graph on the nodes A, B, C, D, E, and F.]

Unshielded colliders: C on ACE; C on BCE.

[Figure: a second graph on the same nodes.]

Equivalent graph


[Figure: original graph on the nodes A, B, C, D, E, and F.]

Unshielded colliders: C on ACE; C on BCE.

[Figure: a second graph on the same nodes.]

Nonequivalent graph: removes two unshielded colliders.


[Figure: original graph on the nodes A, B, C, D, E, and F.]

Unshielded colliders: C on ACE; C on BCE.

[Figure: a second graph on the same nodes.]

Nonequivalent graph: adds an unshielded collider (B on ABD).


Earlier Cases Reexamined

All two-variable systems are observationally equivalent:

A → B

A ← B

No unshielded colliders.


All three-variable systems, except one with a common effect, are observationally equivalent:

Chains: A → B → C

A ← B ← C

Common Cause: A ← B → C

No unshielded colliders.


Common Effect: A → B ← C

One unshielded collider.
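The criterion in the Observational Equivalence Theorem (same skeleton, same unshielded colliders) can be applied mechanically to the three-variable cases. The function names below are hypothetical, and each graph is written as a set of (cause, effect) pairs.

```python
from itertools import combinations

def skeleton(edges):
    """The pattern of edges irrespective of orientation."""
    return {frozenset(e) for e in edges}

def unshielded_colliders(edges):
    """Triples (a, c, b) with a -> c <- b and no edge between a and b."""
    skel = skeleton(edges)
    nodes = {v for e in edges for v in e}
    found = set()
    for c in nodes:
        parents = sorted(u for (u, v) in edges if v == c)
        for a, b in combinations(parents, 2):
            if frozenset((a, b)) not in skel:
                found.add((a, c, b))
    return found

def observationally_equivalent(g1, g2):
    """Same skeleton and same unshielded colliders."""
    return (skeleton(g1) == skeleton(g2)
            and unshielded_colliders(g1) == unshielded_colliders(g2))

chain_forward = {("A", "B"), ("B", "C")}   # A -> B -> C
chain_backward = {("C", "B"), ("B", "A")}  # A <- B <- C
common_cause = {("B", "A"), ("B", "C")}    # A <- B -> C
common_effect = {("A", "B"), ("C", "B")}   # A -> B <- C

print(observationally_equivalent(chain_forward, chain_backward))  # True
print(observationally_equivalent(chain_forward, common_cause))    # True
print(observationally_equivalent(chain_forward, common_effect))   # False
print(unshielded_colliders(common_effect))  # {('A', 'B', 'C')}
```

All four graphs share the skeleton A — B — C, so only the unshielded collider at B separates the common effect from the other three.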


End of Lecture 1