a projection-based approach for modeling non-gaussian ... · murali haran, penn state 3. house...

29
A Projection-based Approach for Modeling Non-Gaussian Spatial Data Based on joint work with Yawen Guan, SAMSI/NC State John Hughes, U Colorado-Denver BIRS, CMO, Oaxaca, Mexico November 2017 Murali Haran Department of Statistics, Penn State University Murali Haran, Penn State 1

Upload: others

Post on 16-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

A Projection-based Approach for Modeling

Non-Gaussian Spatial DataBased on joint work with

Yawen Guan, SAMSI/NC State

John Hughes, U Colorado-Denver

BIRS, CMO, Oaxaca, Mexico

November 2017

Murali Haran

Department of Statistics, Penn State University

Murali Haran, Penn State 1

Page 2: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

US Infant Mortality Data by County

!

!

!

!

!

!!

!

!

!

!

!!

!!

!

!

!

!

!!

!

!

!

!

!

!

!!

!

!

!!

!!

!

!!

!!

!

!

!!

!

!

!!

!

!

!

!

!!

!

!

!

!!

!

!!

!!

!

!

!

!

!

!

!

! !

!!

!

!

!

!

!

!

! !

!

!! !

!!

!

!

!

!

!

!

!

!

!!

!!

! !!

!!

!

!!

!

!!

!

!!

!

!

!

!

!

!

!!

!!

!!

!

!

!!

!

!

!

!!

!

!

!

!!!

!

!!!

!

!

!

!!

!

!!

!!!

!!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

! !

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!!

!

!

!

! !

!

!

! !

!

! !

!!

!

!

!

!!

!!

!

! !!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!!!

!

!

!

!

!

!

!

!! !! !!!

!!

!!

!

!! !

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!

!!

!

!

! !!

!

!

!

!

!!

!

!

!

!

!!

!

!!

!

!! !

!!

!

!

!

!!

!! !

!

!!!

!!!

!

!

!!!

!!

!!

!!

!!

!!

!!

!

!

!

!

!!

! ! !

!

!

!

!

!!

!

!

!!

!

!!

!

!

!!

!

!

!!

!

!

!!

!

!

! ! !

!

!

!

!

!

!

!!

!!

!!

!

!

!!

!

!

!

!

!

!!!!!

!

!

!!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!!!!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

! !!

!

!!

!!

!

!

!

!

!!

!

!

!

!

!

!

!

!!

!!!

!!

!

!

!

!

!

!

!!

!

!!

!

!

!

!

!!!

!!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!!

! !!!

!

!

!!

!

!!

!

!

!!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!!

!

!

!

!! !

!

!

!

!

!

!

!! !

!

!

!

!!

!!

!

!

!

!

!

!

!!! !

!

!

!

!

!

!

!

! !!

! !

!

!

!!!

!

! !!

!

!!

!

!

!

!!

!!

!

!

!

!

!

!

!!

!

!

!

!

!!

!

!!

!!

!

!!

!!!

!

!!

!!

!

!!

!

!

!

!

!!

!!

!!

!!

!!

!

!

!

!

!

!!

!

!

!

!

!!

! !

! !!

!

!!

!!

!!

!

!!!!

!

!

!

!!!

!

!!! !

!!

!!

!!

!

!

! !

!!!

!!

!

!

!

!

!!!

!

!

!!

!

!

!

!

!!

!!

!

!!

!

!

!!

!

!

!

!

!!

!

! !!

!

!

!

!

!!!

!!

!

!! !

!!

!

!

!

!!

!

! !

!! !

!! !

!

!

! !

!

!

!

!!

!

!

!

!

!

!!

! !

!

!

!!

!

! !!

!

!

!

!

!!

! !!

!

!!!

!

! !

!

!!

!

!!

!

!!!

!

!!!

!

!

!!

!

!!!!

!

!

!

!

!

!

!

!

!

!

!

!

! !

!

!!

!

!

!

!

!

!!

!! !!

!!

!!!

!

!!! !

!! !!

!

!

!!

!

!!

!

! !

!

!

!

!!

!

!

!!!

!!!

!

!

! !

!

!

!

!!

!!

!!

!!

!!

!!

!

!

!

!

!

! !

!

! !!

!

!!

!

!

!!

! ! !

!

!!

!!

!!!!

!

!! !!! !

!! !

!

!

!

!

! !!

!!

!!

!

!

!

!

!!

!

!

!!!

!

!

!

!

!

!!

!

!

!

!!

!

!!

!

!!

!

!!

!!

!!

!!

!!!

!

!

!

!

!

!!

!

!

!!

!!

!!

!

!! !

!

!

!

!

!!

!

!

!

!

!!

!!

!

!!

!

!!!!!

!!

!

!

!

!

! !

!

!

!

!

!

!

!

!

! !!!!!

!!

!

!

!

!

!

!!!

!!

! !

! !

!!!

! !!!

!

! !!

!

!!

!

!! !!

!!!!

!

!

!!!!

!

!

!

!!

!

!

!

!

!

! !!

!

!!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!!

!

!

!

!!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!!

! !!

!

!

!

!

!

!!

!!

!

!

!

!!

!

!!

!

!

!

!

!

! !!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!!!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!

!

!

!

!

!!

!

!

!

!

!

! !!

!!

! !

!!

! !

!

!!!!

!

!!

!

!!!

!

!

!! ! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!!

!

!!!

!!

!

!!!

!!

!

!!

! ! !! !

!

!

!

!

!

!

!

!

!

!!!

!

!

!

! !

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!!

!!

!!

!

!!

!

!!

!!

!

!

!!

!!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!!

!

!

!!

!!

!

!

!

!!

!

!!

!

!

!

!!

!

!

!

!!

! !

!

!!!

!

!

!

! !!

!!

!!

!!

!

!!

!

!

!!

!

!

!!

!

!

!

!!

!

!

! !

!!

!

!

!!

!

!

!

!

!

!

!!

!

!

!

!

!

!!

!

!

!

! !

!

!

!

!!

!

!

!!

!!

!

! !

!!!

!

!

!

!

! !

!

!

!!! !

!

!!

!

!

!!

!

!

!

!

!

!

!!

!

!!

!!

!

!!

! !!

!! !

! !

!

!!

!!

!!!

!!

!

! !!

!

!

!

!

!!!

!!

!

!

!!

!!!!

!!

!

!!

! !

!

!!!

!

!

!

!

!

!

!

! !! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

!!

!

!

!

!!!!!

!

!

!!!!

!

!

!!!!!!

!

!

!

!!!!

!

!

!

!

!

!!

! !

!

!

!

!!

!

!

!

! !

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!!

!

!!

!

!

!! ! !!

!!

!

! !

!!

!

!

!

!!!

!

!!

!!

!!

!

!!

!

!

!

!

!!

!

!!!

!!

!

!!!

! !

!!

!

!

! !

!!!

!

!! !

!

!!

! ! !!

!

!

!

!!

!

!

!!

!

!!

!!

!!!

! !! !

!

!

!

!

!!

!

!!!

!

!

!!

!

!!! !

!!

!

!

!

!!

!!

!

!

!

!

!!

!

!

!!!

!

!

!!

!

!! !

!!

!!

! !

!

!

!!

!!

!!

! !!!

!

!

!

!

!

!

!

! !

!

!

!

! !

!

!!

!

!

!

!!

!!

!

!

! !!!

!!

!

!! !

!

!

!

!

!

!

!!

!!

!

!

!

!!

!

!

!

!!

!

!

! !

!!

!

!!

!!

!!

!

!

!

!

!

!

!!!

!

!

!

!!

!

!! !

!

!!

!!

!

!!

!

!

!!

!!

!

!

!

!

!

!! !! !

!!!

!!

!!!

!

!

! !

!

!

!

!

!

!!

!

!!

!

!!!

!! !

!

!

!

!

!

!!

!

!!

!

!

!

!

!!!

!

!

!!

!!

! !

!!

!

!!

!

!!!

! !

!

!

! !!

!!

!

!

!

!

!

!

!

!

!

!!

!! !

! !!!

!!!

!

!

!

!!

!

!

!

!!

!

!

!!

!

!!

!!

! !

!

!!

!

!

!

!

!

!

! ! !

!

!!

!

! !

!! !

!

!! !

!

!

!

!

!!!!

!

!!

!

!!!

! !!

!

! !! !!

!!

!

!!

!

!

!!!

!!!

!

!

!! !

!!!!

!!

!

!! !!

! !

!!

!!

!

!!

!

!!!

!

!

!

!

!

!!!!!

!!!

!

!!

!

!!

!

!! !

!

!

!!

!!

! !!

!

!

!

!

!

!!

! !!!

!!

!!

!

!

!!

!

!!

!

!

!

!

!!

!

!

!!

!

!

!

!

!

!

!

! !

!!

!

!

!

!

! !

!

!!!

!

!

!

!

!

!!!

!!

!

!

!!

! !

!

!!

!!!

!!

!

!!

!!!

! ! !!

!

!

!

!

!! ! !

!

!

!!

!!

!

!!!

!!!!

!!!!

!

!

!

!

!

! !

!

!

!

!

! !

!

! !

!!!!

! !!

!!!

!!

!

!!

!

!

! !!! !!

!

!

!!!

!

!

!

!!!

!

!

!

!!

!

!!!

!

!

! !!

!!

!

!

! !!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!!

! !

!

!

!!

!!

!

!

!

!

!

!

!

!

!!

!

!!

!

!

!

!

!

!!

!

!! !

!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!!

!

!

!

!!

!

!

!

!

!

!!

!

!

!

! !!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!

!

!!

!

!!!

!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!

!

!!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!

!

! !

!!

!

!

!!

! !

!

!

!!

!

!

!

!

!

!

!!

!

! ! !

!

!

!!

!

!

!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!

! !

!

!

!

! !

!

!!

!

!

!

!

!

!

!

!

!

!

!

!

!

!!

!!

!!!

!

!

!

!

!

!!!!!!

!

!

!

!

!!

!!! !!!

!

!!

!!

!!!

!!

!

!

!!

!

!

!

!

!

!!

!

!!

!

!

!

!

!

!!

!

!

!!

!!

!

!

!!!

!!!

!

!

!

!

!

!

!

!

!

! ! !!

!

!!

!!

!! !

!

!

!!

!!

!

!!

!

!!

!!

!!!

!

!

!

! !

!!!!

!

!!

!

!

!!

!

!!

!!

!

!

!!!

!

!

!

!!

!!

!

!!

!!

!

!

!

!!

!

!

!

!!

!!

!

!

!

!!!

!

!

!!

!

!

!

!!!

!

!

!

!!!

!!

!

!

!

!

!

!

!

!

!

!!

!

!

!

!

!!!

!

!!!!

!!

!

!!

!

!

!

!

!

!!

!

!

!!

!! !

!

!

!

!!

!

!

!

! !

!

!

!

!

!

!

!

!

!

!

!!

!

!!

!

!

!

!

!!

!

!

!!

!

!

!

!!

!

!!

!

!

!

!

!

!

!

!

!

!

!!

!!!

!

!

!!

!

!

!

!

!

! !

!

!! !

!

!

!

!

!

!

!

!!

Ratio of deaths to births, each averaged over 2002-2004.

Darker indicates higher rate. n =3071

Question (regression): which factors impact infant mortality?

Murali Haran, Penn State 2

Page 3: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Greenland Ice Sheet Thickness

Bamber et al. (2001)

Question: How to interpolate this surface?

How to calibrate (infer parameters for) ice sheet model based

on these data? (Chang, Haran, Applegate, Pollard, 2016a,b,c)Murali Haran, Penn State 3

Page 4: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

House Finch Abundances

Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966 - 2014

Question (interpolation): Abundance at unsampled location?

Murali Haran, Penn State 4

Page 5: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Talk Summary

I Spatial data are common in environmental science:

disease modeling, ecology, climate...I Spatial generalized linear mixed models (SGLMMs)

I Popular for lattice or areal data

Besag, York, Mollie (1991) ≈ 3,000 citationsI and continuous-domain data

Diggle et al. (1998) ≈ 2,000 citations

I Shortcomings of SGLMMs:1. Inference presents difficult computational issues, especially

with large data sets

2. Regression parameter interpretation is unreliable

I I will describe projection-based methods that

simultaneously resolve both these issues

Murali Haran, Penn State 5

Page 6: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Spatial Generalized Linear Mixed Models

I Spatial linear mixed models (SLMMs): for Gaussian data

I Spatial generalized linear mixed models (SGLMMs): for

non-Gaussian data

I What are these models used for?

1. interpolation (continuous-domain) or smoothing the spatial

field (lattice-domain)

2. regression while adjusting for residual spatial dependence

3. as a component in a complex hierarchical model

Murali Haran, Penn State 6

Page 7: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Spatial Linear Mixed Models (SLMMs)

I Spatial process at location s ∈ D ⊂ Rd is

Z (s) = X (s)β + W (s)

I X (s) is covariate at s, and β is a vector of coefficientsI Model dependence among spatial random variables by

imposing it on W (s), the random effects

I Same framework works for both lattice data andcontinuous-domain data. Model for W (s)

I Continuous domain: Gaussian process (GP)I Lattice data: Gaussian Markov Random field (GMRF)

Murali Haran, Penn State 7

Page 8: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Gaussian Processes

Infinite dimensional process {W (s) : s ∈ D} such that

(W (s1), . . .W (sn))T | Θ ∼ N(0,Σ(Θ))

I Covariance often specified via a positive definite

covariance function with parameters Θ

I E.g. (stationary) exponential covariance function

I Θ = (σ2, φ)

Σij(Θ) = Cov(W (si),W (sj)) = σ2 exp(−|si − sj |/φ)

Murali Haran, Penn State 8

Page 9: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Gaussian Markov Random Fields

(W (s1), . . .W (sn))T | Θ ∼ N(0,Q(Θ)−1)

Q(Θ) is a precision matrix based on a graph that describes a

neighborhood structure: adjacencies specify dependence

(skip details....)

Murali Haran, Penn State 9

Page 10: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Inference for Spatial Linear Mixed Models

I MLE involves low-dimensional optimization

arg maxΘ,β

L(Θ,β; Z)

I Bayesian inference:I Priors for Θ,β

I Inference based on π(Θ,β | Z) ∝ L(Θ,β; Z)p(Θ)p(β)

I Markov chain Monte Carlo with low-dimensional posterior

Murali Haran, Penn State 10

Page 11: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Literature on Computing for Spatial Linear Models

I Likelihood: high-dimensional matrices, O(n3) operations

I Lots of excellent approaches that scale very wellI Multiresolution methods, with parallelizations (Katzfuss,

2017; Katzfuss and Hammerling, 2014)I Nearest neighbor process (Datta et al., 2016)I Random projections (Banerjee, A., Tokdar, Dunson, 2013)I Stochastic PDEs (Lindgren et al., 2011)I Lattice kriging (Nychka et al., 2010)I Predictive process (Banerjee, Gelfand, Finley, Sang 2008)

Largely a “solved” problem

Murali Haran, Penn State 11

Page 12: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Spatial Generalized Linear Mixed Models (SGLMMs)

Model for Z at location si

1. Z (si)|β,Θ,W (si), i = 1, . . . ,n, conditionally independent

E.g. Z (si) | β,W (si) ∼Poisson(µ(si))

2. Link function g(µ(si)) = X (si)β + W (si)

E.g. log(µi) = X (si)β + W (si)

3. W = (W (s1), . . . ,W (sn))T modeled asI Gaussian Markov random field model (Besag et al., 1991)I Gaussian processes (Diggle et al., 1998)

4. Priors for Θ,β

Commonly embedded within hierarchical models (cf. Banerjee,

Carlin, Gelfand, 2014)

Murali Haran, Penn State 12

Page 13: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Problem 1. Computational Challenge

I MLE: low-dimensional optimization of integrated likelihood

arg maxΘ,β

∫L(Θ,β,W; Z)dW

High-dimensional integration due to W

MCMC-EM or MCMC-MLE: slow, challenging to implement

(Zhang, 2002, 2003; Christensen, 2004)

I Bayesian inference based on

π(Θ,β,W | Z)

Murali Haran, Penn State 13

Page 14: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Computing for SGLMMsBayes approach:

I Handle missing data easilyI Combine multiple data sets and uncertainties elegantlyI Rich inference about parameters, functions of parametersI MCMC-based inference is easier than for MLE

But... MCMC algorithms are not easy/scalable

I MCMC is slow per iteration due to high-dimensional

π(Θ,β,W | Z)

I Markov chain is slow mixing (need longer chain) due to

strong cross-correlations among WI Can become impractical for large N

Murali Haran, Penn State 14

Page 15: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

MCMC for SGLMMs

I Markov chain is slow mixing (need longer Markov chain)

due to strong cross-correlations among W

I Block updating schemes may help. E.g. blocks:

π(W | Θ,β,Z) π(Θ | β,W,Z) π(β | Θ,W,Z)

I Challenging to obtain good proposals for W, especially for

high-dimensionsI Computationally expensive per update

Attempts to address these issues: Rue and Held (2005),

Christensen et al. (2006), Haran and Tierney (2012)

They do not scale well (problem for N > 1000 )

Murali Haran, Penn State 15

Page 16: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Problem 2. Spatial Confounding

I Let P = X (X T X )−1X T , and P⊥ = I − P

g{E(Z | β,W,Θ} = Xβ + W = Xβ + PW + P⊥W

I PW is in span of X

I Basic regression issue: multicollinearity

Leads to variance inflation, unstable estimates of β

(Hodges and Reich 2010; Paciorek, 2010)

Hints of the symptom, without diagnosis, by others (e.g. Diggle,

1994)

Murali Haran, Penn State 16

Page 17: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Sketch of Our General Solution

I Culprit: W is cause of confounding as well as

computational challenges

I W: just a device to induce dependenceI Idea: project W on random effects δ such that

I Preserve spatial dependence implied by original WI δ is low-dimensionalI δ is less dependent (“cross-correlated”)I Project orthogonal to space spanned by X

I Applies to both Gaussian process and GMRF modelsI GMRF models: projection based on Moran operator which

uses neighborhood structure (Hughes and Haran, 2013)I GPs and GMRFs: general approach using

eigendecomposition (Guan and Haran, 2017)

Murali Haran, Penn State 17

Page 18: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Outline of Projection-based Approach

1. Fast approximation to the principal components of Σφ

I Approximate first m eigenvectors U = (u1, . . . ,um) and

eigenvalues Dm = diag(λ1, ..., λm)

2. Replace n-dimensional W with UD1/2m δ

δ: lower dimensional and ≈ independent

faster and better mixing MCMC algorithm

3. Project UD1/2m δ to C⊥(X )

Makes random effects orthogonal to fixed effects

handles confounding issues

4. Fit the reduced model under Bayesian framework

Murali Haran, Penn State 18

Page 19: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Step 1: EigendecompositionFor speed we use a fast approximate eigendecomposition

Left: deterministic approximation

Center: random approximation

Right: exact eigendecomposition

I Random projections used in Banerjee, Tokdar, Dunson

(2013); also Sarlos (2006), Halko et al. (2009)Murali Haran, Penn State 19

Page 20: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Step 2: Reducing Dimensions via Projection

I Approximates the leading m eigencomponents of the

covariance matrix Σφ

I Replace W with UD1/2m δ

Murali Haran, Penn State 20

Page 21: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Step 3: Projection to Handle Confounding

I Let P = X (X T X )−1X T , and P⊥ = I − P

I Recall: PW is in span of X, causes confounding

I Solution: Remove it

g{E(Z | β,W, σ2, φ)} = Xβ + W = Xβ + PW + P⊥W

[cf. Reich et al., 2006; Hughes and Haran, 2013]

Murali Haran, Penn State 21

Page 22: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Step 4: Inference Based on ReparameterizationI Spatial generalized linear mixed models

Usual: inference based on π(β, σ2, φ,W | Z)

I Obtain U,Dm of Σφ

I Dm is m-dim diagonal matrix with Dii = i th eigenvalueI FRP: replace W with UD1/2

m δ to approximate SGLMM or

RRP: replace W with P⊥UD1/2m δ to approximate restricted

spatial modelI Reduced Model:

g {E(Zi | β,U,Dm, δ)} = Xiβ + (P⊥UD1/2m )iδ

δ | . . . approx∼ Nm(0, σ2I)

Now: inference based on π(β, σ2, φ, δ | Z)

Murali Haran, Penn State 22

Page 23: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Reduced Correlations

I Reparameterized random effects are approximately

independent of each other and fixed effects

Murali Haran, Penn State 23

Page 24: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Computational Speed-up

I Drastic reduction in dimension of random effects, e.g. m =

50 for n = 1,000, or m = 60 for n = 3,000,...

I Reparameterized random effects are approximately

independent of each other and fixed effects

I Easy to construct fast-mixing MCMC algorithm

I Eg. 10 to 50 to 300-fold reduction in compute time

I Scale beyond n > 10,000?I computational cost is of order nm2

I discretization of space/pre-computingI new decomposition algorithms/parallelization

Murali Haran, Penn State 24

Page 25: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Prediction Study: Poisson SGLMMI Simulate n = 1000 spatial count dataI Prediction on 20 x 20 grid using rank = 50

FRP: full modelRRP: restricted model (orthogonalized random effects)

Murali Haran, Penn State 25

Page 26: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Summary of Projected SGLMMI reduces dimensions + better MCMC mixingI adjusts for spatial confoundingI simple to implement, mostly “automated”I good inference and prediction performance

I In the context of other reduced-rank approachesI Our approach does not result in exchangeability between

observed and predicted. Predictive process does.I But we use optimal (minimal truncation error) projectionI And prediction is still straightforward

I Other approachesI may be better for the basic linear modelI our approach works better for SGLMMsI our approach and predictive process approach: easy for

more complex hierarchical settings

Murali Haran, Penn State 26

Page 27: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Summary of Projected SGLMMI reduces dimensions + better MCMC mixingI adjusts for spatial confoundingI simple to implement, mostly “automated”I good inference and prediction performanceI In the context of other reduced-rank approaches

I Our approach does not result in exchangeability between

observed and predicted. Predictive process does.I But we use optimal (minimal truncation error) projectionI And prediction is still straightforward

I Other approachesI may be better for the basic linear modelI our approach works better for SGLMMsI our approach and predictive process approach: easy for

more complex hierarchical settingsMurali Haran, Penn State 26

Page 28: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Acknowledgments

I Yawen Guan, SAMSI/NC State

I John Hughes, U of Colorado-Denver

I Alan Gelfand (Duke U.)

I Peter Hoff (Duke U.)

I Bo Li (Purdue U.)

I Doug Nychka (NCAR)

I Dorit Hammerling (NCAR)

I Support from NSF-CDSE/DMS-1418090

Murali Haran, Penn State 27

Page 29: A Projection-based Approach for Modeling Non-Gaussian ... · Murali Haran, Penn State 3. House Finch Abundances Pardieck et al. 2015. North American Breeding Bird Survey Dataset 1966

Key References

I Guan and Haran (2017), A Computationally Efficient

Projection-Based Approach for Spatial Generalized Linear

Mixed Models, arxiv.orgI Hughes and Haran (2013), Dimension reduction and

alleviation... Journal of the Royal Statistical Society (B)I R package (CRAN) ngspatial

I Banerjee A, Tokdar, S., Dunson, D. (2013) Efficient

Gaussian process regression for large datasets, Biometrika

I Reich et al. (2006), Effects of residual smoothing on the

posterior of the fixed effects... Biometrics

I Haran (2011) Gaussian random field models for spatial

data, Handbook of MCMC

Murali Haran, Penn State 28