
Causal Data Mining

Richard Scheines

Dept. of Philosophy, Machine Learning, & Human-Computer Interaction

Carnegie Mellon

Causal Graphs

Causal Graph G = {V, E}. Each edge X → Y represents a direct causal claim:

X is a direct cause of Y relative to V

Example (Chicken Pox):

Exposure → Rash (relative to V = {Exposure, Rash})

Exposure → Infection → Rash (relative to V = {Exposure, Infection, Rash})
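Whether an edge counts as "direct" depends on the variable set V. A minimal sketch, assuming a DAG given as a list of (cause, effect) pairs: X → Y appears in the graph over a subset of the variables when some directed path from X to Y runs only through variables that were dropped. The function name `project_directed_edges` is hypothetical, and the sketch handles directed edges only (it ignores the extra structure that dropping a common cause would introduce).

```python
def project_directed_edges(edges, keep):
    """Return the directed edges X -> Y among the variables in `keep`, where
    X -> Y holds if some directed path X -> ... -> Y in the original DAG has
    every intermediate node outside `keep`. Confounding is not handled."""
    children = {}
    for cause, effect in edges:
        children.setdefault(cause, set()).add(effect)
    keep = set(keep)
    result = set()
    for x in keep:
        # Depth-first search that only continues through dropped variables
        stack = list(children.get(x, ()))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in keep:
                result.add((x, node))  # reached a kept variable: direct relative to `keep`
            else:
                stack.extend(children.get(node, ()))
    return result


chain = [("Exposure", "Infection"), ("Infection", "Rash")]
print(project_directed_edges(chain, {"Exposure", "Infection", "Rash"}))
print(project_directed_edges(chain, {"Exposure", "Rash"}))
```

With Infection in V, Exposure is only an indirect cause of Rash; once Infection is left out of V, Exposure → Rash becomes a direct edge.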

Causal Bayes Networks

P(S = 0) = .7, P(S = 1) = .3

Smoking [0,1]

Lung Cancer [0,1]

Yellow Fingers [0,1]

Graph: Lung Cancer ← Smoking → Yellow Fingers

P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
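The factorization can be checked numerically. Only P(S) comes from the slide; the tables for Yellow Fingers and Lung Cancer given Smoking are made-up numbers for illustration. The sketch builds the joint distribution from the three factors and shows the pattern this graph implies: Yellow Fingers and Lung Cancer are associated, but independent once Smoking is fixed.

```python
from itertools import product

P_S = {0: 0.7, 1: 0.3}  # P(Smoking) from the slide
# HYPOTHETICAL conditional tables (not on the slide): P_YF[s][yf], P_LC[s][lc]
P_YF = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_LC = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.8, 1: 0.2}}

# P(S, YF, LC) = P(S) P(YF | S) P(LC | S)
joint = {
    (s, yf, lc): P_S[s] * P_YF[s][yf] * P_LC[s][lc]
    for s, yf, lc in product([0, 1], repeat=3)
}

def p_lc1_given(yf, s=None):
    """P(LC = 1 | YF = yf), or P(LC = 1 | YF = yf, S = s) when s is given."""
    match = [k for k in joint if k[1] == yf and (s is None or k[0] == s)]
    num = sum(joint[k] for k in match if k[2] == 1)
    return num / sum(joint[k] for k in match)

print(sum(joint.values()))                        # sums to 1 (up to rounding)
print(p_lc1_given(1), p_lc1_given(0))             # differ: YF and LC are dependent
print(p_lc1_given(1, s=1), p_lc1_given(0, s=1))   # equal: independent given S
```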

The Joint Distribution Factors

According to the Causal Graph, i.e.,

$$P(V) = \prod_{X \in V} P(X \mid \text{Immediate Causes of}(X))$$
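A minimal sketch of the product formula for discrete variables, assuming each variable's conditional probability table is stored as a dictionary keyed by (value, parent values). `joint_prob` and `binary_cpt` are hypothetical helper names and the probabilities are invented; the chain reuses the Chicken Pox variables.

```python
from itertools import product

def joint_prob(assignment, parents, cpts):
    """P(V = assignment) = product over X in V of P(X | immediate causes of X).

    parents[X] is a tuple of X's immediate causes; cpts[X] maps
    (value of X, tuple of parent values) to a probability."""
    p = 1.0
    for x, pa in parents.items():
        pa_values = tuple(assignment[q] for q in pa)
        p *= cpts[x][(assignment[x], pa_values)]
    return p

def binary_cpt(p_one):
    """Expand {parent values: P(X = 1 | parent values)} into a full table."""
    table = {}
    for pa_values, p1 in p_one.items():
        table[(1, pa_values)] = p1
        table[(0, pa_values)] = 1 - p1
    return table

# Exposure -> Infection -> Rash with HYPOTHETICAL probabilities
parents = {"Exposure": (), "Infection": ("Exposure",), "Rash": ("Infection",)}
cpts = {
    "Exposure": binary_cpt({(): 0.4}),
    "Infection": binary_cpt({(0,): 0.01, (1,): 0.5}),
    "Rash": binary_cpt({(0,): 0.05, (1,): 0.9}),
}

total = sum(
    joint_prob(dict(zip(parents, values)), parents, cpts)
    for values in product([0, 1], repeat=len(parents))
)
print(total)  # a proper distribution: sums to 1 up to rounding
```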

Structural Equation Models

• Structural Equations: one equation for each variable V in the graph:

V = f(parents(V), error_V); for SEMs (linear regression), f is a linear function

• Statistical Constraints: Joint Distribution over the Error terms

Causal Graph: Income ← Education → Longevity

Structural Equation Models

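Equations:

$\text{Education} = \varepsilon_{ed}$

$\text{Income} = \beta_1 \, \text{Education} + \varepsilon_{Income}$

$\text{Longevity} = \beta_2 \, \text{Education} + \varepsilon_{Longevity}$

Statistical Constraints: $(\varepsilon_{ed}, \varepsilon_{Income}, \varepsilon_{Longevity}) \sim N(0, \Sigma^2)$

$\Sigma^2$ diagonal; no variance is zero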
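A sketch of what this linear SEM implies, with made-up coefficients `beta1 = 2.0` and `beta2 = 0.5` and independent unit-variance normal errors (the slide gives no numeric values). Because Education is the only common cause, Income and Longevity come out correlated, while their partial correlation given Education should be close to zero.

```python
import math
import random

def simulate(n, beta1=2.0, beta2=0.5, seed=0):
    """Sample n rows (Education, Income, Longevity) from the linear SEM
    with independent N(0, 1) errors. Coefficients are HYPOTHETICAL."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        education = rng.gauss(0, 1)                      # Education = e_ed
        income = beta1 * education + rng.gauss(0, 1)     # Income = b1*Education + e_Income
        longevity = beta2 * education + rng.gauss(0, 1)  # Longevity = b2*Education + e_Longevity
        rows.append((education, income, longevity))
    return rows

def corr(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after partialling out z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

edu, inc, lon = zip(*simulate(20000))
r_il = corr(inc, lon)
print(r_il)  # clearly nonzero (population value 0.4 with these coefficients)
print(partial_corr(r_il, corr(inc, edu), corr(lon, edu)))  # close to 0
```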

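Causal Graph: Income ← Education → Longevity

SEM Graph (path diagram): Income ← Education → Longevity, with the error terms $\varepsilon_{Income}$ and $\varepsilon_{Longevity}$ drawn as explicit parents of Income and Longevity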

Tetrad 4: Demo

www.phil.cmu.edu/projects/tetrad

Causal Data Mining in Educational Research

1. Collect Raw Data

2. Build Meaningful Variables

3. Constrain Model Space with Background Knowledge

4. Search for Models

5. Estimate and Test

6. Interpret

CSR Online

Are Online students learning as much?

What features of online behavior matter?

CSR Online

Are Online students learning as much?

Raw Data: Pitt 2001, 87 students

For everyone: Pre-test, Recitation attendance, Final exam

For online students, logged: voluntary question attempts, online quizzes, requests to print modules

CSR Online

Build Meaningful Variables:

1. Online [0,1]

2. Pre-test [%]

3. Recitation Attendance [%]

4. Final Exam [%]

CSR Online

Data: Correlation Matrix (corrs.dat, N=83)

         Pre      Online   Rec      Final
Pre      1.0
Online    .023    1.0
Rec      -.004    -.255    1.0
Final     .287     .182     .297    1.0
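Constraint-based searches such as PC decide which edges to keep by testing (conditional) independence. For roughly Gaussian data a common choice is Fisher's z test on (partial) correlations; the sketch below applies it to the correlations and N reported above. It is an illustration only, not necessarily the test or significance level used in the original analysis, and the helper names are hypothetical.

```python
import math

N = 83
# Correlations from corrs.dat as reported on the slide
CORR = {
    ("Pre", "Online"): 0.023, ("Pre", "Rec"): -0.004, ("Pre", "Final"): 0.287,
    ("Online", "Rec"): -0.255, ("Online", "Final"): 0.182, ("Rec", "Final"): 0.297,
}

def r(a, b):
    """Look up a correlation in either order."""
    if a == b:
        return 1.0
    return CORR[(a, b)] if (a, b) in CORR else CORR[(b, a)]

def partial_r(a, b, c):
    """First-order partial correlation of a and b given c."""
    num = r(a, b) - r(a, c) * r(b, c)
    return num / math.sqrt((1 - r(a, c) ** 2) * (1 - r(b, c) ** 2))

def fisher_z_pvalue(rho, n, k):
    """Two-sided p-value for H0: (partial) correlation = 0; k = size of conditioning set."""
    z = math.atanh(rho) * math.sqrt(n - k - 3)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(fisher_z_pvalue(r("Pre", "Online"), N, 0))                   # large: Pre and Online look independent
print(fisher_z_pvalue(r("Online", "Final"), N, 0))                 # above .05 marginally
print(fisher_z_pvalue(partial_r("Online", "Final", "Rec"), N, 1))  # below .05 once Rec is held fixed
```

Holding Rec fixed strengthens the association between Online and Final (a partial correlation of about .28 against .18 marginally), since Online is negatively correlated with attendance while attendance is positively correlated with the final score.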

CSR Online

Background Knowledge:

Temporal Tiers:

1. Online, Pre

2. Rec

3. Final
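Tiered background knowledge translates into a list of edges a search never has to consider. A minimal sketch; the dictionary encoding and `forbidden_edges` are hypothetical illustrations, not Tetrad's API.

```python
from itertools import permutations

# Temporal tiers from the slide: a variable can only cause variables in the same or a later tier
TIERS = {"Online": 1, "Pre": 1, "Rec": 2, "Final": 3}

def forbidden_edges(tiers):
    """Edges X -> Y ruled out because X sits in a later tier than Y."""
    return sorted((x, y) for x, y in permutations(tiers, 2) if tiers[x] > tiers[y])

for x, y in forbidden_edges(TIERS):
    print(f"{x} -> {y} forbidden")
```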

CSR Online

Model Search:

No latents (patterns, with PC or GES)

- no time order: 729 models

- temporal tiers: 96 models

With latents (PAGs, with FCI search)

- no time order: 4,096 models

- temporal tiers: 2,916 models
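The latent-free counts can be reproduced if each of the 6 pairs among the 4 variables may have no edge or an edge in either direction (3^6 = 729), and temporal tiers leave only "no edge" or "earlier → later" for pairs in different tiers (3 × 2^5 = 96). This counting rule is an assumption about how the slide arrived at its numbers; the PAG counts involve other edge types and are not reproduced here.

```python
from itertools import combinations, product

VARIABLES = ["Online", "Pre", "Rec", "Final"]
TIERS = {"Online": 1, "Pre": 1, "Rec": 2, "Final": 3}

def pair_options(a, b, tiers=None):
    """Possible adjacencies for one pair: none, a -> b, or b -> a.
    With tiers, drop any edge pointing from a later tier to an earlier one."""
    options = [None, (a, b), (b, a)]
    if tiers is None:
        return options
    return [e for e in options if e is None or tiers[e[0]] <= tiers[e[1]]]

def count_models(tiers=None):
    """Count every combination of per-pair choices."""
    choices = [pair_options(a, b, tiers) for a, b in combinations(VARIABLES, 2)]
    return sum(1 for _ in product(*choices))

print(count_models())       # 729
print(count_models(TIERS))  # 96
```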

Tetrad Demo

Online vs. Lecture

Data file: corrs.dat

Estimate and Test: Results

• Model fit excellent

• Online students attended 10% fewer recitations

• Each recitation gives an increase of 2% on the final exam

• Online students scored half a standard deviation higher on the final exam than lecture students (p = .059)

[Figure: estimated path model over Online, Pre-test (%), Recitation Attendance (%), and Final Exam (%)]
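Estimation can also be sketched directly from the correlation matrix. Assuming, hypothetically, that Final's immediate causes are Pre, Online, and Rec and that Rec's only immediate cause is Online, the standardized coefficients for Final solve R_XX b = r_Xy, and Online's total effect on Final is its direct coefficient plus the product along Online → Rec → Final. These are standardized estimates from the rounded correlations, not the unstandardized percentages reported above.

```python
# Correlations among the assumed causes of Final, and with Final, from corrs.dat
NAMES = ["Pre", "Online", "Rec"]
R_XX = [
    [1.0, 0.023, -0.004],
    [0.023, 1.0, -0.255],
    [-0.004, -0.255, 1.0],
]
r_xy = [0.287, 0.182, 0.297]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

beta = dict(zip(NAMES, solve(R_XX, r_xy)))  # standardized coefficients for Final
online_to_rec = -0.255                      # HYPOTHETICAL: Rec's only cause is Online
indirect = online_to_rec * beta["Rec"]      # Online -> Rec -> Final
total = beta["Online"] + indirect
print(beta, indirect, total)
```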

