Moving away from Linear-Gaussian assumptions


Upload: gala

Post on 24-Feb-2016


TRANSCRIPT

Page 1: Moving away from Linear-Gaussian assumptions

Moving away from Linear-Gaussian assumptions

Cons: Some things become much harder.

No baked-in test of global fit

Non-recursive models

Error correlations and latent variables are harder to deal with

How do we label an arrow?

Pros: Flexibility to model nodes with whatever statistical assumption we want to make.

Better inference

Better predictions

Page 2: Moving away from Linear-Gaussian assumptions

Causal Effects in Non-linear models: How big is the effect?

[Figure: firesev versus age]

Page 3: Moving away from Linear-Gaussian assumptions

The Logic of Graphs: Conditional Independences, Missing link & Testable implications

How do we test structure of the model without Var-Cov matrix?

[Figure: causal graph with nodes x, y1, y2, y3]

For directed, acyclic models where all nodes are observed,

$V_i \perp \text{Non-Child}(V_j) \mid \text{Pa}(V_i, V_j)$

The residuals of each pair of nodes not connected by a link should be independent.

Each missing link represents a local test of the model structure

Individual test results can be combined using Fisher’s C to give a global test of structure.

$$C = -2 \sum_{i=1}^{k} \ln(p_i)$$
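As a minimal sketch of how the combination works: with k local tests, each giving a p-value p_i, Fisher's C is minus twice the sum of the log p-values and, if the tests are independent and all implied independences hold, it is compared against a chi-square distribution with 2k degrees of freedom. The function name `fisher_c` is hypothetical; the p-values are the rounded ones reported on the nl.detect3 output slide of this deck.

```r
# Combine k independent local-test p-values into Fisher's C.
# fisher_c is an illustrative helper, not part of glmsem.r.
fisher_c <- function(p_values) {
  k <- length(p_values)
  c_stat <- -2 * sum(log(p_values))
  list(
    fisher.c = c_stat,
    d.f = 2 * k,
    # Upper-tail probability: a small value means the graph is rejected.
    p.value = pchisq(c_stat, df = 2 * k, lower.tail = FALSE)
  )
}

# Rounded p-values for the five missing links of the wildfire submodel.
p_vals <- c(0.058, 0.252, 0.523, 0.872, 0.134)
result <- fisher_c(p_vals)
print(result)
```

With these inputs the statistic comes out at about 14.04 on 10 degrees of freedom, in line with the values shown on that slide.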

Page 4: Moving away from Linear-Gaussian assumptions

The Logic of Graphs: Conditional Independences, Missing link & Testable implications

How do we test structure of the model without Var-Cov matrix?

[Figure: causal graph with nodes x, y1, y2, y3]

How many implied conditional independences are there?

N(N-1)/2 - L

where N = number of nodes and L = number of links.
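A small sketch of this count: N(N-1)/2 is the number of node pairs, and every pair without a link contributes one implied independence. The helper name `n_implied_ci` is hypothetical. The example uses the wildfire submodel that appears later in the deck (nodes dist, age, firesev, cover, rich with five links).

```r
# Number of implied conditional independences in a DAG with all nodes observed:
# all possible pairs minus the pairs that are joined by a link.
n_implied_ci <- function(n_nodes, n_links) {
  choose(n_nodes, 2) - n_links
}

n_ci <- n_implied_ci(n_nodes = 5, n_links = 5)
print(n_ci)  # 5
```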

Page 5: Moving away from Linear-Gaussian assumptions

Strategy for local estimation analysis

1. Create a causal graph

2. Model all nodes as functions of the variables given by the graph (using model selection to pick the functional form)

3. Evaluate all conditional independences implied by the graph using model residuals

4. If a conditional independence test fails, modify the graph and go to step 2

Page 6: Moving away from Linear-Gaussian assumptions

Generalized Linear Models – 3 components

A probability distribution from the exponential family: Normal, Log-Normal, Gamma, beta, binomial, Poisson, geometric

A linear predictor: $\eta = X\beta$

A link function g such that $g(E[y]) = \eta$: Identity, Log, Logit, Inverse
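To make the three components concrete, here is a minimal sketch on simulated data (all values are made up for illustration): a Poisson distribution, a linear predictor in one covariate, and a log link. In R's `glm` the distribution and link are both chosen through the `family` argument.

```r
set.seed(123)
n <- 2000
x <- runif(n, 0, 2)

eta <- 0.3 + 0.5 * x        # linear predictor (illustrative coefficients)
mu  <- exp(eta)             # log link: log(mu) = eta, so mu = exp(eta)
y   <- rpois(n, lambda = mu)  # probability distribution: Poisson

# family = distribution + link; the formula supplies the linear predictor.
fit <- glm(y ~ x, family = poisson(link = "log"))
print(coef(fit))
print(family(fit))
```

Swapping `poisson(link = "log")` for, say, `Gamma(link = "inverse")` changes the distribution and link while leaving the linear predictor untouched.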

Page 7: Moving away from Linear-Gaussian assumptions

California wildfires example

[Figure: causal graph with nodes distance, age, firesev, cover, abio, hetero, rich]

Page 8: Moving away from Linear-Gaussian assumptions

California wildfires example

[Figure: causal graph with nodes distance, age, firesev, cover, abio, hetero, rich]

Page 9: Moving away from Linear-Gaussian assumptions

A. Submodel – its causal assumptions and testable implications

Causal Assumptions:

dist → age
age → firesev
firesev → cover
cover → rich
dist → rich

Implied Conditional Independences:

firesev ⏊ dist | (age)
cover ⏊ dist | (firesev)
cover ⏊ age | (firesev)
rich ⏊ age | (cover, dist)
rich ⏊ firesev | (cover, dist)
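One way to arrive at this list, sketched under the assumption that the local Markov property is being used (each node is independent of its non-adjacent predecessors given its own parents). The data structures and names here are hypothetical and not taken from glmsem.r.

```r
# Directed links of the submodel, one row per arrow (from -> to).
links <- data.frame(
  from = c("dist", "age", "firesev", "cover", "dist"),
  to   = c("age", "firesev", "cover", "rich", "rich"),
  stringsAsFactors = FALSE
)

# A causal ordering of the nodes: causes come before their effects.
node_order <- c("dist", "age", "firesev", "cover", "rich")

parents_of <- function(node) links$from[links$to == node]

implied_ci <- character(0)
for (i in seq_along(node_order)) {
  node <- node_order[i]
  pa <- parents_of(node)
  # Earlier nodes that are not parents of `node` have no link to it.
  for (other in setdiff(node_order[seq_len(i - 1)], pa)) {
    implied_ci <- c(
      implied_ci,
      sprintf("%s _||_ %s | (%s)", node, other, paste(pa, collapse = ", "))
    )
  }
}
print(implied_ci)
```

For this graph the loop yields five statements, matching the count N(N-1)/2 - L = 10 - 5.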

Page 10: Moving away from Linear-Gaussian assumptions

A. Functional Specification I – Models of Uncertainty

Variable   Potential values   Prob. dist.
age        {0,1,2,3,…}        Negative binomial
rich       {0,1,2,3,…}        Negative binomial
firesev    (0, ∞)             Gamma
cover      (0, ∞)             Gamma

Page 11: Moving away from Linear-Gaussian assumptions

A. Functional Specification II – Models for Expected Values

Page 12: Moving away from Linear-Gaussian assumptions

B. Modeling the Nodes - Age

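[Figure: age versus dist]

> library(MASS)   # glm.nb
> library(bbmle)  # AICtab (assumed to come from bbmle)
> a1.lin <- glm.nb(age ~ distance, data = dat)
> a1.q <- glm.nb(age ~ distance + I(distance^2), …)

> AICtab(a1.lin, a1.q, weights = T)

        dAIC  df  weight
a1.q     0.0  4   0.99662
a1.lin  11.4  3   0.00338

> # p.l and p.q are presumably the coefficient vectors of a1.lin and a1.q
> curve(exp(p.l[1] + p.l[2]*x), from = 0, to = 100, add = T)
> curve(exp(p.q[1] + p.q[2]*x + p.q[3]*x^2), from = 0, to = 100, add = T, lty = 2)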
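A self-contained sketch of the same comparison on simulated data, since `dat` is not included in the transcript. The data-generating values are made up (loosely in the range of the estimates shown later in the deck), and the dAIC and Akaike weights are computed by hand with base R instead of `AICtab`.

```r
library(MASS)  # glm.nb

set.seed(2016)
n <- 300
sim <- data.frame(distance = runif(n, 0, 100))
# Hypothetical U-shaped relationship on the log scale.
mu <- exp(3.46 - 0.023 * sim$distance + 0.00026 * sim$distance^2)
sim$age <- rnbinom(n, size = 5, mu = mu)

m_lin  <- glm.nb(age ~ distance, data = sim)
m_quad <- glm.nb(age ~ distance + I(distance^2), data = sim)

# dAIC is the difference from the best model; Akaike weights rescale
# exp(-dAIC / 2) so that they sum to one across the candidate set.
aic <- c(m_lin = AIC(m_lin), m_quad = AIC(m_quad))
d_aic <- aic - min(aic)
akaike_w <- exp(-0.5 * d_aic) / sum(exp(-0.5 * d_aic))
print(round(cbind(dAIC = d_aic, weight = akaike_w), 4))
```

The weight can be read as the relative support for each functional form within the set of models compared, which is how the tables on these slides are used to pick a form for each node.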

Page 13: Moving away from Linear-Gaussian assumptions

B. Modeling the Nodes - Firesev

[Figure: firesev versus age]

> f.lin <- glm(firesev ~ age, family = Gamma(link = "log"), …)

> # p.f.lin is presumably the coefficient vector of f.lin
> curve(exp(p.f.lin[1] + p.f.lin[2]*x), from = 0, to = 100, add = T)

Page 14: Moving away from Linear-Gaussian assumptions

Aside - Linearization of a saturating function

$$y = \frac{ax}{1 + bax}$$

$$\frac{1}{y} = \frac{1 + bax}{ax}$$
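Expanding the right-hand side gives 1/y = b + (1/a)(1/x), which is linear in 1/x: the intercept is b and the slope is 1/a. That is why an inverse link with 1/x as the predictor can fit a saturating curve. The sketch below checks this numerically on noise-free data; the values of `a` and `b` are arbitrary.

```r
# Arbitrary illustrative parameters of the saturating curve.
a <- 0.7
b <- 0.15

x <- seq(1, 60, by = 1)
y <- a * x / (1 + b * a * x)

# Regress 1/y on 1/x: the intercept should equal b and the slope 1/a.
lin_fit <- lm(I(1 / y) ~ I(1 / x))
beta <- unname(coef(lin_fit))

b_hat <- beta[1]
a_hat <- 1 / beta[2]
print(c(a_hat = a_hat, b_hat = b_hat))
```

As x grows, y approaches the asymptote 1/b, so the intercept on the inverse scale directly controls the saturation level.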

Page 15: Moving away from Linear-Gaussian assumptions

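B. Modeling the Nodes - Firesev

[Figure: firesev versus age]

> f.sat <- glm(firesev ~ I(1/age), family = Gamma(link = "inverse"), …)

> # p.f.sat is presumably the coefficient vector of f.sat
> curve(1/p.f.sat[2]*x/(1 + 1/p.f.sat[2]*p.f.sat[1]*x), from = 0, to = 65, add = T, lty = 2)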

Page 16: Moving away from Linear-Gaussian assumptions

B. Modeling the Nodes - Firesev

[Figure: firesev versus age]

> AICtab(f.lin, f.sat, weights = T)

       dAIC  df  weight
f.sat   0.0  3   1
f.lin  16.2  3   <0.001

Page 17: Moving away from Linear-Gaussian assumptions

B. Modeling the Nodes - Cover

[Figure: cover versus firesev]

> c.lin <- glm(cover ~ firesev, family = Gamma(link = "log"), …)

> # p.c is presumably the coefficient vector of c.lin
> curve(exp(p.c[1] + p.c[2]*x), from = 0, to = 9, add = T, lwd = 2)

Page 18: Moving away from Linear-Gaussian assumptions

B. Modeling the Nodes - Richness

[Figure: labels cover, firesev, dist]

> r.lin <- glm.nb(rich ~ distance + cover, data = dat)

> r.q <- glm.nb(rich ~ distance + I(distance^2) + cover, …)

> AICtab(r.lin, r.q, weights = T)

       dAIC  df  weight
r.q     0.0  5   0.99767
r.lin  12.1  4   0.00233

Page 19: Moving away from Linear-Gaussian assumptions

C. Testing the conditional independences

Implied Conditional Independences:

firesev ⏊ dist | (age)
cover ⏊ dist | (firesev)
cover ⏊ age | (firesev)
rich ⏊ age | (cover, dist)
rich ⏊ firesev | (cover, dist)

Method for testing conditional independences:

For each implied conditional independence statement:

1. Hypothesize that a link between the variables exists

2. Quantify the evidence that the link explains residual variation in the variable chosen as the response.
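A rough sketch of one such local test, for firesev ⏊ dist | (age), on simulated data. This is an assumption about how the idea can be carried out, not a description of what nl.detect3 in glmsem.r does: the node is fitted on its parent, and the residuals are then regressed on the variable from the missing link. All data-generating values are made up.

```r
set.seed(1)
n <- 200
dist <- runif(n, 0, 100)
# Chain dist -> age -> firesev, with no direct dist -> firesev link.
age <- rpois(n, lambda = exp(3 - 0.01 * dist)) + 1
firesev <- rgamma(n, shape = 5, rate = 5 / exp(0.5 + 0.02 * age))

# Step 0: fit the node on its parents, as specified by the graph.
f_fit <- glm(firesev ~ age, family = Gamma(link = "log"))
res <- residuals(f_fit, type = "deviance")

# Steps 1-2: hypothesize a dist -> firesev link and ask whether dist
# explains any of the residual variation left in firesev.
ci_test <- summary(lm(res ~ dist))
p_val <- coef(ci_test)["dist", "Pr(>|t|)"]
print(p_val)
```

A large p-value is consistent with the missing link; the p-values from all missing links can then be pooled with Fisher's C. A linear residual regression only detects linear dependence, so a more flexible smoother may be preferable in practice.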

Page 20: Moving away from Linear-Gaussian assumptions

C. Testing the conditional independences

Page 21: Moving away from Linear-Gaussian assumptions

C. Testing the conditional independences

Page 22: Moving away from Linear-Gaussian assumptions

C. Testing the conditional independences

What we need:

1. List of all implied conditional independences
2. Residuals for all fitted nodes

> source('glmsem.r')

> fits = c("a1.q", "f.sat", "c.lin", "r.q")

> stuff <- get.stuff.glm(fits, dat)

get.stuff.glm returns:

1. R^2 for each node ($R.sq)
2. Estimated Causal Effect* (over obs. range) ($est.causal.effects)
3. Graph-implied conditional independences ($miss.links)
4. Predicted values for each node ($predictions)
5. Residuals for each node ($residuals)
6. Matrix of links in the graph ($links)
7. Matrix of prediction equations ($pred.eqns)

Page 23: Moving away from Linear-Gaussian assumptions

C. Testing the conditional independences

> nl.detect3(dat, stuff$residuals, stuff$miss.links)

$p.vals
distance-firesev   distance-cover   age-cover   age-rich   firesev-rich
           0.058            0.252       0.523      0.872          0.134

$fisher.c
[1] 14.04139

$d.f
[1] 10

$fisher.c.p.val
[1] 0.1711122

Page 24: Moving away from Linear-Gaussian assumptions

D. Check Model - Residuals

>pairs(stuff$residuals)

Page 25: Moving away from Linear-Gaussian assumptions

D. Check Model- Parameter Estimates

> sapply(fits, function(x) summary(get(x))$coefficients)

$a1.q
                    Estimate   Std. Error    z value      Pr(>|z|)
(Intercept)     3.4600063194 8.944635e-02  38.682476  0.0000000000
distance       -0.0228871119 5.925116e-03  -3.862728  0.0001121277
I(distance^2)   0.0002595776 6.729042e-05   3.857571  0.0001145194

$f.sat
             Estimate  Std. Error   t value      Pr(>|t|)
(Intercept)  0.150971  0.01325182  11.39247  5.264449e-19
I(1/age)     1.427400  0.26099889   5.46899  4.189435e-07

$c.lin
              Estimate  Std. Error    t value      Pr(>|t|)
(Intercept)   0.213267   0.1382210   1.542942  1.264334e-01
firesev      -0.132441   0.0284891  -4.648832  1.166142e-05

$r.q
                    Estimate   Std. Error    z value      Pr(>|z|)
(Intercept)     3.4603244955 7.030880e-02  49.216093  0.000000e+00
distance        0.0164087246 3.150035e-03   5.209060  1.897993e-07
I(distance^2)  -0.0001408172 3.540241e-05  -3.977617  6.960945e-05
cover           0.2361592759 8.581527e-02   2.751949  5.924170e-03

Page 26: Moving away from Linear-Gaussian assumptions

D. Check Model- Print Resulting Graph

> # requires graphviz and {PNG}
> glmsem.graph(stuff)

Page 27: Moving away from Linear-Gaussian assumptions

E. Run a Query (intervention)

> new.dat <- dat
> new.dat[, 'age'] <- 2
> dat.int <- calc.intervention.glm(fits, stuff$links, "age", new.dat)
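The internals of calc.intervention.glm are not shown in the deck. As a hedged sketch of what an intervention query can look like, the code below fixes age at 2 and pushes predicted means down the chain age → firesev → cover → rich, using models fitted to simulated data. The data, coefficients, and object names are all hypothetical, and only expected values are propagated, not their uncertainty.

```r
library(MASS)  # glm.nb

set.seed(42)
n <- 300
distance <- runif(n, 0, 100)
age      <- rnbinom(n, size = 5, mu = exp(3 - 0.01 * distance)) + 1
firesev  <- rgamma(n, shape = 8, rate = 8 * (0.15 + 1.4 / age))
cover    <- rgamma(n, shape = 6, rate = 6 / exp(0.2 - 0.13 * firesev))
rich     <- rnbinom(n, size = 10,
                    mu = exp(3.4 + 0.005 * distance + 0.24 * cover))
sim <- data.frame(distance, age, firesev, cover, rich)

# Node models for everything downstream of age.
f_sat <- glm(firesev ~ I(1 / age), family = Gamma(link = "inverse"), data = sim)
c_lin <- glm(cover ~ firesev, family = Gamma(link = "log"), data = sim)
r_lin <- glm.nb(rich ~ distance + cover, data = sim)

# Intervention: set age to 2 for every plot, then update each descendant
# in causal order using the predicted mean of its parents.
int <- sim
int$age     <- 2
int$firesev <- predict(f_sat, newdata = int, type = "response")
int$cover   <- predict(c_lin, newdata = int, type = "response")
int$rich    <- predict(r_lin, newdata = int, type = "response")

print(colMeans(int) - colMeans(sim))  # shift in each node's mean
```

Note that distance is left untouched: an intervention on age cuts the dist → age arrow, so only descendants of age change.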

Page 28: Moving away from Linear-Gaussian assumptions

Discussion

Get glmsem.r, these slides, and the R code for the example at: www.msu.edu/~schoolm4/Code_and_More.html