spatial regression modelling for large dataset: a precompression … · 2019-06-24 · open data...


Post on 05-Aug-2020


Page 1

Spatial regression modelling for large dataset: A precompression approach

Daisuke Murakami

The Institute of Statistical Mathematics

Joint work with Daniel A. Griffith

Page 2

Outline

• Objective
‒ Development of a fast regression approach for large spatial data

• Outline
‒ Introduction of a fast additive modeling (AM) approach for large samples, which is implemented in the R package mgcv
‒ Development of another fast additive model for large spatial data
‒ Comparison through Monte Carlo simulations
‒ Application to crime data

mgcv: developed by Dr Wood (University of Bristol)

Page 3

Spatial data are getting bigger and bigger

• Participatory sensing: people flow, health, tweets, energy consumption, …
• Ground temperature
• Air pollution
• Remote sensing: climate, temperature, land cover, …
• Mobile GPS
• Google Earth Engine (https://earthengine.google.com/)
• Estimated data: population, productivity, birth/death counts, …
e.g., WorldPop (http://www.worldpop.org.uk/): global socioeconomic statistics on 1 km grids
• Increase of open data: traffic counts, OpenStreetMap (roads, buildings, …)

Page 4

Regression for large samples

Regression problems containing from tens of thousands to millions of observations are now commonplace (Wood et al., 2015).

• Additive (mixed) model (AM) is useful
✓ Regression accounting for linear, non-linear, group, and other effects
✓ Fast estimation methods have been developed

• Review
✓ AM in applied statistics
✓ AM-related models in geostatistics

Page 5

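Additive model (AM)

• Linear AM

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \sum_{k=1}^{K}\mathbf{f}(\mathbf{z}_k) + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$

$$\mathbf{f}(\mathbf{z}_k) = \mathbf{E}_k\boldsymbol{\gamma}_k, \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\mathbf{S}_k)$$

$\mathbf{f}(\mathbf{z}_k)$: unknown smooth function representing the effect of a covariate $\mathbf{z}_k$
$\mathbf{E}_k$: known matrix consisting of $L$ ($\ll N$) basis functions
$\boldsymbol{\gamma}_k$: random coefficients with known covariance matrix $\mathbf{S}_k$
✓ One variance parameter $\tau_k^2$ for each $\mathbf{f}(\mathbf{z}_k)$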
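A minimal NumPy sketch of the linear AM above, included only to make the shapes concrete. The Gaussian radial basis functions used for $\mathbf{E}_k$, the choice $\mathbf{S}_k = \mathbf{I}$, the helper name `rbf_basis`, and all numeric values are illustrative assumptions; the slides do not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_basis(z, centers, width):
    """Gaussian radial basis functions, one column per center (hypothetical basis choice)."""
    return np.exp(-((z[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

N, L, K = 1000, 10, 2                  # sample size, basis functions per term, smooth terms
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # linear part: intercept + one covariate
beta = np.array([1.0, 0.5])
tau2 = [1.0, 0.3]                      # one variance parameter per smooth term

Z = rng.uniform(0, 1, size=(N, K))     # covariates z_1, ..., z_K
centers = np.linspace(0, 1, L)
E = [rbf_basis(Z[:, k], centers, 0.1) for k in range(K)]   # E_k: N x L
S = [np.eye(L) for _ in range(K)]      # S_k: known covariance (identity is an assumption)

# gamma_k ~ N(0, tau_k^2 S_k) and f(z_k) = E_k gamma_k
gamma = [rng.multivariate_normal(np.zeros(L), tau2[k] * S[k]) for k in range(K)]
f = [E[k] @ gamma[k] for k in range(K)]

sigma2 = 0.25
y = X @ beta + sum(f) + rng.normal(scale=np.sqrt(sigma2), size=N)

# Stacked design [X, E_1, ..., E_K], as used when the AM is estimated by REML
X_tilde = np.hstack([X] + E)
```

Each smooth term contributes $L$ columns and a single variance parameter, so the number of parameters to estimate by REML stays at $K$ however large $N$ is.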

Page 6

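Additive model (AM) is useful

• To estimate a wide variety of effects behind data, each modeled as $\mathbf{f}(\mathbf{z}_k) = \mathbf{E}_k\boldsymbol{\gamma}_k$, $\boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\mathbf{S}_k)$:
‒ Nonlinear effects ($\mathbf{z}_k$: view): effects of the openness of the view on housing prices
‒ Time-varying effects ($\mathbf{z}_k$: time): electricity use prediction
‒ Space-varying effects ($\mathbf{z}_k$: location): local impact of racial diversity on crime risks

[Figure: example $\mathbf{f}(\mathbf{z}_k)$ for each of the three effects]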

Page 7

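Estimation of the variance parameters $\{\tau_1^2, \cdots, \tau_K^2\}$

Model:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \sum_{k=1}^{K}\mathbf{E}_k\boldsymbol{\gamma}_k + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}), \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\mathbf{S}_k)$$

which is rewritten as $\mathbf{y} = \tilde{\mathbf{X}}\tilde{\boldsymbol{\beta}} + \boldsymbol{\varepsilon}$ with $\tilde{\mathbf{X}} = [\mathbf{X}, \mathbf{E}_1, \cdots, \mathbf{E}_K]$ and $\tilde{\boldsymbol{\beta}} = [\boldsymbol{\beta}', \boldsymbol{\gamma}_1', \cdots, \boldsymbol{\gamma}_K']'$.

Log-restricted likelihood (REML); the criterion below is its negative, up to a constant ($M$: number of columns of $\mathbf{X}$):

$$\frac{\|\mathbf{y} - \tilde{\mathbf{X}}\tilde{\boldsymbol{\beta}}\|^2 + \tilde{\boldsymbol{\beta}}'\tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}\tilde{\boldsymbol{\beta}}}{2\sigma^2} + \frac{N - M}{2}\log(2\pi\sigma^2) + \frac{\log|\tilde{\mathbf{X}}'\tilde{\mathbf{X}} + \tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}| - \log|\tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}|}{2}$$

$$\tilde{\boldsymbol{\beta}} = (\tilde{\mathbf{X}}'\tilde{\mathbf{X}} + \tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1})^{-1}\tilde{\mathbf{X}}'\mathbf{y}$$

$\tilde{\mathbf{S}}_{\boldsymbol{\tau}}$: a block diagonal matrix whose $k$-th block equals $\tau_k^2\mathbf{S}_k$

The matrices and vectors whose size depends on $N$ are $\mathbf{y}$ and $\tilde{\mathbf{X}}$.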
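As a sanity check on the REML criterion above, the following Python sketch evaluates it for given $\tau_k^2$ and $\sigma^2$ and compares it with the same criterion computed from the marginal model $\mathbf{y} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{V})$, which needs an $N \times N$ matrix. It assumes that $\tilde{\mathbf{S}}_{\boldsymbol{\tau}}$ is scaled relative to $\sigma^2$ (so that the $\tilde{\boldsymbol{\beta}}$ formula holds as written) and that $\log|\tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}|$ runs over the random-effect blocks only. The function names are hypothetical.

```python
import numpy as np

def neg_reml_slide(y, X, E_list, S_list, tau2, sigma2):
    """Negative log restricted likelihood in the slide's form (constant dropped).

    Assumption: S_tilde is relative to sigma^2, i.e. gamma_k ~ N(0, sigma^2 tau_k^2 S_k).
    """
    N, M = X.shape
    Xt = np.hstack([X] + E_list)                      # X_tilde = [X, E_1, ..., E_K]
    # S_tilde^{-1}: zero block for the unpenalized beta, (tau_k^2 S_k)^{-1} for gamma_k
    blocks = [np.zeros((M, M))] + [np.linalg.inv(t * S) for t, S in zip(tau2, S_list)]
    P = np.zeros((Xt.shape[1], Xt.shape[1]))
    i = 0
    for B in blocks:
        P[i:i + len(B), i:i + len(B)] = B
        i += len(B)
    A = Xt.T @ Xt + P
    beta_t = np.linalg.solve(A, Xt.T @ y)             # beta_tilde
    quad = np.sum((y - Xt @ beta_t) ** 2) + beta_t @ P @ beta_t
    # log|S_tilde^{-1}| over the penalized (random-effect) blocks only
    logdet_Sinv = sum(np.linalg.slogdet(B)[1] for B in blocks[1:])
    value = (quad / (2 * sigma2) + 0.5 * (N - M) * np.log(2 * np.pi * sigma2)
             + 0.5 * (np.linalg.slogdet(A)[1] - logdet_Sinv))
    return value, beta_t

def neg_reml_direct(y, X, E_list, S_list, tau2, sigma2):
    """Same criterion from y ~ N(X beta, sigma^2 V), using an N x N matrix V."""
    N, M = X.shape
    V = np.eye(N) + sum(t * E @ S @ E.T for t, E, S in zip(tau2, E_list, S_list))
    Vinv = np.linalg.inv(V)
    XVX = X.T @ Vinv @ X
    b = np.linalg.solve(XVX, X.T @ Vinv @ y)          # GLS estimate of beta
    r = y - X @ b
    return 0.5 * ((N - M) * np.log(2 * np.pi * sigma2) + np.linalg.slogdet(V)[1]
                  + np.linalg.slogdet(XVX)[1] + r @ Vinv @ r / sigma2)

rng = np.random.default_rng(1)
N, L = 200, 5
X = np.column_stack([np.ones(N), rng.normal(size=N)])
E_list = [rng.normal(size=(N, L)) for _ in range(2)]
S_list = [np.eye(L), np.diag(np.linspace(1.0, 2.0, L))]
y = rng.normal(size=N)
val, beta_t = neg_reml_slide(y, X, E_list, S_list, tau2=[0.5, 2.0], sigma2=1.3)
```

Only the direct version forms an $N \times N$ matrix; the slide form still needs $\tilde{\mathbf{X}}$ and $\mathbf{y}$, which fast REML then compresses further.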

Page 8

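Fast REML (Wood et al., 2011; 2015; 2017)

• Applicable to gigadata (Wood et al., 2015)
‒ Fast: linear-time estimation of $\{\tau_1^2, \cdots, \tau_K^2\}$
‒ Small memory: memory-efficient procedures are available

• Fast REML
‒ Once $\tilde{\mathbf{X}}$ is decomposed as $\tilde{\mathbf{X}} = \mathbf{Q}\mathbf{R}$, the CP (computational) cost of estimating the parameters $\{\tau_1^2, \cdots, \tau_K^2\}$ is independent of $N$.
‒ The large $\mathbf{Q}$ matrix can be discarded before the estimation.

$$\frac{\|\mathbf{f} - \mathbf{R}\tilde{\boldsymbol{\beta}}\|^2 + \|\mathbf{r}\|^2 + \tilde{\boldsymbol{\beta}}'\tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}\tilde{\boldsymbol{\beta}}}{2\sigma^2} + \frac{N - M}{2}\log(2\pi\sigma^2) + \frac{\log|\mathbf{R}'\mathbf{R} + \tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}| - \log|\tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1}|}{2}$$

$$\tilde{\boldsymbol{\beta}} = (\mathbf{R}'\mathbf{R} + \tilde{\mathbf{S}}_{\boldsymbol{\tau}}^{-1})^{-1}\mathbf{R}'\mathbf{f}, \qquad \mathbf{f} = \mathbf{Q}'\mathbf{y}, \qquad \|\mathbf{r}\|^2 = \|\mathbf{y}\|^2 - \|\mathbf{f}\|^2$$

This works because $\tilde{\mathbf{X}}'\tilde{\mathbf{X}} = \mathbf{R}'\mathbf{R}$ and $\tilde{\mathbf{X}}'\mathbf{y} = \mathbf{R}'\mathbf{f}$, so no remaining term has a size that depends on $N$.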
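The precompression step can be checked numerically. This NumPy sketch (random data and a hypothetical penalty matrix, neither taken from the slides) computes a thin QR of $\tilde{\mathbf{X}}$ once, discards $\mathbf{Q}$, and verifies that both the residual sum of squares and the penalized estimate are recovered from $\mathbf{R}$, $\mathbf{f}$, and $\|\mathbf{r}\|^2$ alone.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 5000, 12
Xt = rng.normal(size=(N, p))          # stand-in for X_tilde = [X, E_1, ..., E_K]
y = rng.normal(size=N)

# One pass over the data: thin QR, then keep only R (p x p), f = Q'y (p) and ||r||^2 (scalar)
Q, R = np.linalg.qr(Xt)               # reduced mode: Q is N x p with orthonormal columns
f = Q.T @ y
r2 = y @ y - f @ f                    # ||r||^2 = ||y||^2 - ||f||^2
del Q                                 # the large Q is no longer needed

# Residual sum of squares for any coefficient vector, from p-sized quantities only
b = rng.normal(size=p)
rss_full = np.sum((y - Xt @ b) ** 2)
rss_compressed = np.sum((f - R @ b) ** 2) + r2

# Penalized estimate: (R'R + S^-1)^-1 R'f equals (Xt'Xt + S^-1)^-1 Xt'y
S_inv = np.diag(np.r_[np.zeros(2), np.full(p - 2, 0.5)])   # hypothetical penalty, 2 unpenalized columns
beta_comp = np.linalg.solve(R.T @ R + S_inv, R.T @ f)
beta_full = np.linalg.solve(Xt.T @ Xt + S_inv, Xt.T @ y)
```

After the single QR pass, every REML evaluation for a new $\boldsymbol{\tau}$ touches only $p \times p$ matrices.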

Page 9

An R package mgcv

• Implements (generalized) AM accounting for a wide variety of effects
✓ Linear, non-linear, varying-coefficient, group, …

• Computationally very efficient
✓ bam function: fast and memory-efficient estimation for big data (fast REML is the default)
✓ Among the R packages for fast regression modeling that I reviewed (e.g., INLA, R2BayesX, RStan, …), mgcv was the fastest.

Later, I will compare my algorithm with the fast REML.

Page 10

Regression for large samples

Regression problems containing from tens of thousands to millions of observations are now commonplace (Wood et al., 2015, Appl. Stat.).

• Additive mixed model (AM) is useful
✓ Regression accounting for linear, non-linear, group, and other effects
✓ Fast estimation methods have been developed

• Review
✓ AM in applied statistics
✓ AM-related models in geostatistics

Page 11

Spatial correlation

The most basic property of spatial data: the first law of geography (Tobler, 1970)

"Nearby things are more strongly related to each other."

R. A. Fisher (1935)

Page 12

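Modeling spatial correlation

Spatially correlated process $\boldsymbol{\beta}$ behind data (e.g., the process behind air pollution):

$$\boldsymbol{\beta} \sim N(b\mathbf{1}, \tau^2\mathbf{C}_r), \qquad \mathbf{C}_r = \begin{bmatrix} c_r(d_{1,1}) & \cdots & c_r(d_{1,N}) \\ \vdots & \ddots & \vdots \\ c_r(d_{N,1}) & \cdots & c_r(d_{N,N}) \end{bmatrix}$$

$c_r(d_{i,j})$ is a distance-decay function of the distance $d_{i,j}$ with scale parameter $r$.

• Gaussian process (GP) is widely used, e.g., with

$$c_r(d_{i,j}) = \exp\left(-\frac{d_{i,j}}{r}\right)$$

[Figure: correlation against distance $d_{i,j}$ for a given $r$]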
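A short NumPy sketch of the exponential covariance $c_r(d_{i,j}) = \exp(-d_{i,j}/r)$ and one draw of $\boldsymbol{\beta} \sim N(b\mathbf{1}, \tau^2\mathbf{C}_r)$. The random locations, the parameter values, the helper name `exp_cov`, and the small diagonal jitter are illustrative assumptions. Building and factorizing the full $N \times N$ matrix is exactly the $O(N^3)$ step that becomes infeasible for large samples.

```python
import numpy as np

def exp_cov(coords, r):
    """C_r with entries c_r(d_ij) = exp(-d_ij / r), using Euclidean distances."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    return np.exp(-d / r)

rng = np.random.default_rng(3)
N = 300
coords = rng.uniform(0, 1, size=(N, 2))     # hypothetical locations in the unit square
b, tau2, r = 1.0, 0.5, 0.2                  # mean, variance, scale (illustrative values)

C = exp_cov(coords, r)
# beta ~ N(b 1, tau^2 C_r): draw through a Cholesky factor (tiny jitter for numerical safety)
Lc = np.linalg.cholesky(tau2 * C + 1e-10 * np.eye(N))
beta = b + Lc @ rng.normal(size=N)
```

A larger `r` makes the correlation decay more slowly with distance, so the drawn surface varies on a larger spatial scale.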

Page 13

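Spatially varying coefficient (SVC) model

– A particular type of AM in geostatistics that estimates spatially correlated processes behind the regression coefficients

$$\mathbf{y} = \sum_{k=1}^{K}\mathbf{x}_k \circ \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$

$$\boldsymbol{\beta}_k \sim N(b_k\mathbf{1}, \tau_k^2\mathbf{C}_{r_k})$$

($\circ$ denotes the element-wise product.)

[Figure: processes behind the coefficients: spatial pattern of $\boldsymbol{\beta}_1$ (small $r_1$) and spatial pattern of $\boldsymbol{\beta}_2$ (large $r_2$)]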

Page 14

Geostatistical approach is too slow

• Geostatistical approaches estimate SVCs (i.e., $\boldsymbol{\beta}_k$) accurately, but they are too slow and thus not suitable for large samples.

[Figure: CP time (seconds) of SVC models against sample size (20,000 to 100,000)]

Page 15

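The fast REML in mgcv is available for SVC modeling

‒ Because the SVC model is a particular type of AM.

Geo-additive model (GeoAM; Kammann and Wand, 2003)

$$\mathbf{y} = \sum_{k=1}^{K}\mathbf{x}_k \circ \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$

$$\boldsymbol{\beta}_k = b_k\mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k, \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\boldsymbol{\Lambda}_k)$$

This is a rank-reduced GP (the scale $r$ is given a priori):

$$\boldsymbol{\beta}_k \sim N(b_k\mathbf{1}, \tau_k^2\mathbf{C}), \qquad \mathbf{C} = \underbrace{\mathbf{E}}_{N\times L}\,\underbrace{\boldsymbol{\Lambda}_k}_{L\times L}\,\underbrace{\mathbf{E}'}_{L\times N}$$

$\mathbf{E}$: matrix composed of $L$ basis functions

The model is therefore an AM (here $\mathbf{x}_k \circ \mathbf{E}$ multiplies every column of $\mathbf{E}$ element-wise by $\mathbf{x}_k$):

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \mathbf{X} = [\mathbf{x}_1, \cdots, \mathbf{x}_K, (\mathbf{x}_1 \circ \mathbf{E}), \cdots, (\mathbf{x}_K \circ \mathbf{E})], \qquad \boldsymbol{\beta} = [b_1, \cdots, b_K, \boldsymbol{\gamma}_1', \cdots, \boldsymbol{\gamma}_K']'$$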
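The slides do not say how $\mathbf{E}$ and $\boldsymbol{\Lambda}_k$ are constructed. One common choice, assumed here purely for illustration, is to take the $L$ leading eigenpairs of $\mathbf{C}_r$ for a fixed $r$. The NumPy sketch below builds such a rank-$L$ approximation, draws SVCs $\boldsymbol{\beta}_k = b_k\mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k$ (using the same $\boldsymbol{\Lambda}$ for every $k$, another simplification), and checks that $\sum_k \mathbf{x}_k \circ \boldsymbol{\beta}_k$ equals $\mathbf{X}\boldsymbol{\beta}$ with the stacked design matrix above.

```python
import numpy as np

rng = np.random.default_rng(4)
N, L, K = 400, 30, 2
coords = rng.uniform(0, 1, size=(N, 2))
d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
r = 0.3                                   # scale fixed a priori, as in GeoAM
C = np.exp(-d / r)

# Assumed construction: leading L eigenpairs of C_r give E (N x L) and Lambda (L x L)
vals, vecs = np.linalg.eigh(C)            # eigenvalues in ascending order
idx = np.argsort(vals)[::-1][:L]
E, lam = vecs[:, idx], vals[idx]
C_low = E @ np.diag(lam) @ E.T            # rank-L approximation E Lambda E'

# Covariates (first column is an intercept) and SVCs beta_k = b_k 1 + E gamma_k
x = np.column_stack([np.ones(N), rng.normal(size=N)])
b = np.array([1.0, -0.5])
tau2 = np.array([0.5, 0.2])
gamma = [rng.normal(size=L) * np.sqrt(tau2[k] * lam) for k in range(K)]  # gamma_k ~ N(0, tau_k^2 Lambda)
y_mean = sum(x[:, k] * (b[k] + E @ gamma[k]) for k in range(K))        # sum_k x_k o beta_k

# Same mean written as an AM: X = [x_1, ..., x_K, x_1 o E, ..., x_K o E]
X_am = np.hstack([x] + [x[:, [k]] * E for k in range(K)])
coef = np.concatenate([b] + gamma)
```

Because the SVC mean is linear in $[b_1, \cdots, b_K, \boldsymbol{\gamma}_1', \cdots, \boldsymbol{\gamma}_K']'$, any AM software that accepts this design and one variance parameter per $\boldsymbol{\gamma}_k$ can fit it, provided $r$ stays fixed.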

Page 16

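Remaining computational issue

• The SVC model has more parameters than a typical AM
– AM: $\boldsymbol{\Theta} \in \{\tau_1^2, \cdots, \tau_K^2\}$ (mgcv and other AM studies)
– SVC model: $\boldsymbol{\Theta} \in \{\tau_1^2, \cdots, \tau_K^2, r_1, \cdots, r_K\}$ (ours)

[Figure: spatial covariance $\tau_k^2 c(d_{i,j}; r_k)$ against distance $d_{i,j}$; $r_k$ controls the decay speed]

• So the terms of the restricted likelihood that depend on $\boldsymbol{\Theta}$ (those involving $\mathbf{S}_{\boldsymbol{\Theta}}$) might be slow to evaluate:

$$\frac{\|\mathbf{f} - \mathbf{R}\boldsymbol{\beta}\|^2 + \|\mathbf{r}\|^2 + \boldsymbol{\beta}'\mathbf{S}_{\boldsymbol{\Theta}}\boldsymbol{\beta}}{2\sigma^2} + \frac{N - M}{2}\log(2\pi\sigma^2) + \frac{\log|\mathbf{R}'\mathbf{R} + \mathbf{S}_{\boldsymbol{\Theta}}| - \log|\mathbf{S}_{\boldsymbol{\Theta}}|}{2}$$

$$\boldsymbol{\beta} = (\mathbf{R}'\mathbf{R} + \mathbf{S}_{\boldsymbol{\Theta}})^{-1}\mathbf{R}'\mathbf{f}$$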

Page 17

Objective

• Summary
‒ Geostatistical methods are too slow.
‒ AM (mgcv) in applied statistics is much faster, but it might be slow for SVC modeling because of the many variance parameters.

• I developed another fast REML for SVC modeling and other AMs
‒ Applicable to AMs with SVCs even if
✓ N (sample size) is large (e.g., millions)
✓ K (number of SVCs and other effects) is large
‒ Note: my development was done independently of Wood's fast REML.
✓ Simply because I didn't know of his study…

Page 18

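Our SVC model

– We develop a fast approach to estimate $\boldsymbol{\Theta} \in \{\tau_1^2, \cdots, \tau_K^2, r_1, \cdots, r_K\}$.

$$\mathbf{y} = \sum_{k=1}^{K}\mathbf{x}_k \circ \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$

$$\boldsymbol{\beta}_k = b_k\mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k, \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\boldsymbol{\Lambda}_{r_k}) \qquad \text{(rank-reduced GP, rank } L\text{)}$$

[Figure: patterns behind the coefficients: spatial pattern of $\boldsymbol{\beta}_1$ (small $r_1$) and spatial pattern of $\boldsymbol{\beta}_2$ (large $r_2$)]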

Page 19

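Type II restricted log-likelihood (see Bates, 2010)

• Our SVC model

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{u} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}), \qquad \mathbf{u} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$

$\tilde{\mathbf{E}} = [\mathbf{x}_1 \circ \mathbf{E}, \cdots, \mathbf{x}_K \circ \mathbf{E}]$: matrix of all the basis functions ($N \times KL$)
$\tilde{\mathbf{V}}(\boldsymbol{\Theta})$: diagonal matrix determining the variance structure

• Our restricted log-likelihood

$$l_R(\boldsymbol{\Theta}) = -\frac{1}{2}\ln\begin{vmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\tilde{\mathbf{E}}'\mathbf{X} & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\tilde{\mathbf{E}}'\tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) + \mathbf{I} \end{vmatrix} - \frac{N-K}{2}\left[1 + \ln\left(\frac{2\pi d(\boldsymbol{\Theta})}{N-K}\right)\right]$$

$$\begin{bmatrix}\hat{\mathbf{b}} \\ \hat{\mathbf{u}}\end{bmatrix} = \begin{bmatrix} \mathbf{X}'\mathbf{X} & \mathbf{X}'\tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\tilde{\mathbf{E}}'\mathbf{X} & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\tilde{\mathbf{E}}'\tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) + \mathbf{I} \end{bmatrix}^{-1}\begin{bmatrix}\mathbf{X}'\mathbf{y} \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\tilde{\mathbf{E}}'\mathbf{y}\end{bmatrix}$$

$$d(\boldsymbol{\Theta}) = \|\mathbf{y} - \mathbf{X}\hat{\mathbf{b}} - \tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta})\hat{\mathbf{u}}\|^2 + \|\hat{\mathbf{u}}\|^2, \qquad \hat{\sigma}^2 = \frac{\|\mathbf{y} - \mathbf{X}\hat{\mathbf{b}} - \tilde{\mathbf{E}}\tilde{\mathbf{V}}(\boldsymbol{\Theta})\hat{\mathbf{u}}\|^2}{N - K}$$

The matrices and vectors whose size depends on $N$ are $\mathbf{X}$, $\tilde{\mathbf{E}}$, and $\mathbf{y}$.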
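A direct, $N$-dependent evaluation of $l_R(\boldsymbol{\Theta})$ can be sketched in NumPy as follows. $\tilde{\mathbf{V}}(\boldsymbol{\Theta})$ is passed in as its diagonal `v`; how that diagonal is computed from $\tau_k^2$ and $r_k$ is not shown on the slide and is left out here. The solution $[\hat{\mathbf{b}}; \hat{\mathbf{u}}]$ is the minimizer of the penalized residual sum of squares $\|\mathbf{y} - \mathbf{X}\mathbf{b} - \tilde{\mathbf{E}}\tilde{\mathbf{V}}\mathbf{u}\|^2 + \|\mathbf{u}\|^2$, whose minimum is $d(\boldsymbol{\Theta})$. The function name `type2_reml`, the random basis, and the values in `v` are hypothetical.

```python
import numpy as np

def type2_reml(y, X, Et, v):
    """Type II restricted log-likelihood l_R(Theta) for a given diagonal of V(Theta).

    y: (N,) response, X: (N, K) fixed-effect design, Et: (N, KL) basis matrix E_tilde,
    v: (KL,) diagonal of V(Theta).
    """
    N, K = X.shape
    EV = Et * v                                          # E_tilde V(Theta): scales column j by v[j]
    P = np.block([[X.T @ X, X.T @ EV],
                  [EV.T @ X, EV.T @ EV + np.eye(EV.shape[1])]])
    rhs = np.concatenate([X.T @ y, EV.T @ y])
    sol = np.linalg.solve(P, rhs)
    b_hat, u_hat = sol[:K], sol[K:]
    resid = y - X @ b_hat - EV @ u_hat
    d = resid @ resid + u_hat @ u_hat                    # d(Theta)
    l_R = (-0.5 * np.linalg.slogdet(P)[1]
           - 0.5 * (N - K) * (1.0 + np.log(2.0 * np.pi * d / (N - K))))
    sigma2_hat = resid @ resid / (N - K)                 # sigma^2 estimate as written on the slide
    return l_R, b_hat, u_hat, sigma2_hat

rng = np.random.default_rng(5)
N, K, L = 500, 2, 8
X = np.column_stack([np.ones(N), rng.normal(size=N)])   # x_1 = intercept, x_2 = covariate
E = rng.normal(size=(N, L))                              # stand-in basis matrix
Et = np.hstack([X[:, [k]] * E for k in range(K)])        # [x_1 o E, ..., x_K o E]
v = np.repeat([0.8, 0.3], L)                             # hypothetical diagonal of V(Theta)
y = rng.normal(size=N)
l_R, b_hat, u_hat, sigma2_hat = type2_reml(y, X, Et, v)
```

Every call builds $N$-sized products such as $\tilde{\mathbf{E}}\tilde{\mathbf{V}}$, which is the cost the precompression step removes.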

Page 20

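Eliminating N from $l_R(\boldsymbol{\Theta})$ (similar to Wood et al., 2015)

(1) Evaluate $\mathbf{M}_{\mathbf{XX}} = \mathbf{X}'\mathbf{X}$, $\mathbf{M}_{\mathbf{XE}} = \mathbf{X}'\tilde{\mathbf{E}}$, $\mathbf{M}_{\mathbf{EE}} = \tilde{\mathbf{E}}'\tilde{\mathbf{E}}$, $\mathbf{m}_{\mathbf{Xy}} = \mathbf{X}'\mathbf{y}$, $\mathbf{m}_{\mathbf{Ey}} = \tilde{\mathbf{E}}'\mathbf{y}$, and $m_{\mathbf{yy}} = \mathbf{y}'\mathbf{y}$.

(2) Rewrite $l_R(\boldsymbol{\Theta})$ as below.

→ The large matrices and vectors are eliminated.
→ Complexity: $O((K+KL)^3) \ll O(N^3)$

$$l_R(\boldsymbol{\Theta}) = -\frac{1}{2}\ln\begin{vmatrix} \mathbf{M}_{\mathbf{XX}} & \mathbf{M}_{\mathbf{XE}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{\mathbf{XE}}' & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{\mathbf{EE}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) + \mathbf{I}\end{vmatrix} - \frac{N-K}{2}\left[1 + \ln\left(\frac{2\pi d(\boldsymbol{\Theta})}{N-K}\right)\right]$$

$$d(\boldsymbol{\Theta}) = \|\hat{\boldsymbol{\varepsilon}}\|^2 + \|\hat{\mathbf{u}}\|^2$$

$$\|\hat{\boldsymbol{\varepsilon}}\|^2 = m_{\mathbf{yy}} - 2[\hat{\mathbf{b}}', \hat{\mathbf{u}}']\begin{bmatrix}\mathbf{m}_{\mathbf{Xy}} \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{m}_{\mathbf{Ey}}\end{bmatrix} + [\hat{\mathbf{b}}', \hat{\mathbf{u}}']\begin{bmatrix}\mathbf{M}_{\mathbf{XX}} & \mathbf{M}_{\mathbf{XE}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{\mathbf{XE}}' & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{\mathbf{EE}}\tilde{\mathbf{V}}(\boldsymbol{\Theta})\end{bmatrix}\begin{bmatrix}\hat{\mathbf{b}} \\ \hat{\mathbf{u}}\end{bmatrix}$$

$$\begin{bmatrix}\hat{\mathbf{b}} \\ \hat{\mathbf{u}}\end{bmatrix} = \begin{bmatrix}\mathbf{M}_{\mathbf{XX}} & \mathbf{M}_{\mathbf{XE}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{\mathbf{XE}}' & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{\mathbf{EE}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) + \mathbf{I}\end{bmatrix}^{-1}\begin{bmatrix}\mathbf{m}_{\mathbf{Xy}} \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{m}_{\mathbf{Ey}}\end{bmatrix}$$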
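The precompressed form can be checked numerically. In this NumPy sketch, `precompress` makes a single pass over the $N$ observations, and `reml_compressed` evaluates $l_R(\boldsymbol{\Theta})$ from the cross-products alone, so its cost no longer depends on $N$. Both function names, the random data, and the values in `v` are hypothetical.

```python
import numpy as np

def precompress(y, X, Et):
    """One pass over the N observations; everything returned has size independent of N."""
    return dict(MXX=X.T @ X, MXE=X.T @ Et, MEE=Et.T @ Et,
                mXy=X.T @ y, mEy=Et.T @ y, myy=y @ y, N=len(y))

def reml_compressed(m, v):
    """l_R(Theta) from the cross-products only; v is the diagonal of V(Theta)."""
    N, K = m["N"], m["MXX"].shape[0]
    MXEV = m["MXE"] * v                                  # M_XE V
    VMEEV = (v[:, None] * m["MEE"]) * v[None, :]         # V M_EE V
    G = np.block([[m["MXX"], MXEV], [MXEV.T, VMEEV]])    # block matrix without the identity
    P = G + np.diag(np.r_[np.zeros(K), np.ones(len(v))])
    rhs = np.concatenate([m["mXy"], v * m["mEy"]])
    sol = np.linalg.solve(P, rhs)                        # [b_hat; u_hat]
    eps2 = m["myy"] - 2 * sol @ rhs + sol @ G @ sol      # ||eps_hat||^2
    d = eps2 + sol[K:] @ sol[K:]
    return (-0.5 * np.linalg.slogdet(P)[1]
            - 0.5 * (N - K) * (1 + np.log(2 * np.pi * d / (N - K))))

rng = np.random.default_rng(6)
N, K, L = 2000, 2, 10
X = np.column_stack([np.ones(N), rng.normal(size=N)])
E = rng.normal(size=(N, L))
Et = np.hstack([X[:, [k]] * E for k in range(K)])       # [x_1 o E, ..., x_K o E]
y = rng.normal(size=N)
m = precompress(y, X, Et)                # the N-sized arrays are not needed after this
v = np.repeat([0.7, 0.2], L)
lr = reml_compressed(m, v)
```

Evaluating `reml_compressed` for a new `v` costs $O((K+KL)^3)$ whatever $N$ is, which is what makes repeated evaluation inside an optimizer affordable.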

Page 21

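Fast maximization of $l_R(\boldsymbol{\Theta})$

Still, the maximization of $l_R(\boldsymbol{\Theta})$ is slow if K is large.
- $\boldsymbol{\Theta} \in \{\tau_1^2, \cdots, \tau_K^2, r_1, \cdots, r_K\}$
- It involves $\mathbf{P}(\boldsymbol{\Theta})^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$, where $\mathbf{P}(\boldsymbol{\Theta})$ is a $(K+KL) \times (K+KL)$ matrix:

$$l_R(\boldsymbol{\Theta}) = -\frac{1}{2}\ln|\mathbf{P}(\boldsymbol{\Theta})| - \frac{N-K}{2}\left[1 + \ln\left(\frac{2\pi d(\boldsymbol{\Theta})}{N-K}\right)\right], \qquad \mathbf{P}(\boldsymbol{\Theta}) = \begin{bmatrix}\mathbf{M}_{\mathbf{XX}} & \mathbf{M}_{\mathbf{XE}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) \\ \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{\mathbf{XE}}' & \tilde{\mathbf{V}}(\boldsymbol{\Theta})\mathbf{M}_{\mathbf{EE}}\tilde{\mathbf{V}}(\boldsymbol{\Theta}) + \mathbf{I}\end{bmatrix}$$

and $d(\boldsymbol{\Theta})$ includes $\mathbf{P}(\boldsymbol{\Theta})^{-1}$ through $\hat{\mathbf{b}}$ and $\hat{\mathbf{u}}$.

We apply a sequential update ($\boldsymbol{\theta}_k \in \{\tau_k^2, r_k\}$):

max $l_R(\boldsymbol{\theta}_1 \mid \boldsymbol{\theta}_2, \cdots, \boldsymbol{\theta}_K)$ → partial update of $\mathbf{P}(\boldsymbol{\Theta})^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$
max $l_R(\boldsymbol{\theta}_2 \mid \boldsymbol{\theta}_1, \boldsymbol{\theta}_3, \cdots, \boldsymbol{\theta}_K)$ → partial update of $\mathbf{P}(\boldsymbol{\Theta})^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$
⋮
max $l_R(\boldsymbol{\theta}_K \mid \boldsymbol{\theta}_1, \cdots, \boldsymbol{\theta}_{K-1})$ → partial update of $\mathbf{P}(\boldsymbol{\Theta})^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$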
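The sequential update is a block-coordinate ascent. The sketch below, which relies on SciPy's `minimize` with the Nelder-Mead method, shows only the update order: it maximizes over one block $\boldsymbol{\theta}_k$ at a time with the others held fixed and sweeps until the objective stops changing. It re-evaluates the objective in full and does not implement the partial updates of $\mathbf{P}(\boldsymbol{\Theta})^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$; the function name `sequential_maximize` and the toy quadratic objective (a stand-in for $l_R$) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def sequential_maximize(loglik, theta0, n_sweeps=50, tol=1e-10):
    """Block-coordinate ascent: maximize loglik over one block theta_k at a time.

    loglik: function of a list of 1-D parameter blocks (e.g. [tau_k^2, r_k] per term).
    This sketch re-evaluates loglik in full; it does not do partial updates.
    """
    theta = [np.asarray(t, dtype=float).copy() for t in theta0]
    prev = cur = loglik(theta)
    for _ in range(n_sweeps):
        for k in range(len(theta)):
            def neg_block(tk, k=k):
                # all blocks except k are held at their current values
                return -loglik(theta[:k] + [tk] + theta[k + 1:])
            res = minimize(neg_block, theta[k], method="Nelder-Mead",
                           options={"xatol": 1e-10, "fatol": 1e-12})
            theta[k] = res.x
        cur = loglik(theta)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return theta, cur

# Toy concave objective with coupling between the two blocks (hypothetical stand-in for l_R)
a = np.array([1.0, 2.0, -1.0, 0.5])
H = np.array([[2.0, 0.3, 0.5, 0.0],
              [0.3, 1.0, 0.0, 0.4],
              [0.5, 0.0, 1.5, 0.2],
              [0.0, 0.4, 0.2, 1.0]])

def toy_loglik(blocks):
    t = np.concatenate(blocks) - a
    return -t @ H @ t

theta_hat, value = sequential_maximize(toy_loglik, [np.ones(2), np.ones(2)])
```

In the actual method each block would be $\{\tau_k^2, r_k\}$ and the objective $l_R$; both parameters must stay positive, so optimizing $\log\tau_k^2$ and $\log r_k$ would be one option, which this sketch does not handle.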

Page 22

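K-th step of the sequential updates

• $l_R(\boldsymbol{\theta}_K)$ is maximized with respect to $\boldsymbol{\theta}_K \in \{\tau_K^2, r_K\}$.
‒ We keep $\boldsymbol{\theta}_K$ outside the large matrices to enable a partial update.

$$l_R(\boldsymbol{\theta}_K) = -\frac{1}{2}\ln|\mathbf{P}(\boldsymbol{\theta}_K)| - \frac{N-K}{2}\left[1 + \ln\left(\frac{2\pi d(\boldsymbol{\theta}_K)}{N-K}\right)\right]$$

$$d(\boldsymbol{\theta}_K) = \|\hat{\boldsymbol{\varepsilon}}(\boldsymbol{\theta}_K)\|^2 + \sum_{k=1}^{K}\|\hat{\mathbf{u}}_k\|^2$$

$$\|\hat{\boldsymbol{\varepsilon}}(\boldsymbol{\theta}_K)\|^2 = m_{\mathbf{yy}} - 2[\hat{\mathbf{b}}', \hat{\mathbf{u}}_1', \cdots, \hat{\mathbf{u}}_K']\begin{bmatrix}\mathbf{m}_0 \\ \mathbf{V}_1\mathbf{m}_1 \\ \vdots \\ \mathbf{V}(\boldsymbol{\theta}_K)\mathbf{m}_K\end{bmatrix} + [\hat{\mathbf{b}}', \hat{\mathbf{u}}_1', \cdots, \hat{\mathbf{u}}_K']\,\mathbf{P}_0\begin{bmatrix}\hat{\mathbf{b}} \\ \hat{\mathbf{u}}_1 \\ \vdots \\ \hat{\mathbf{u}}_K\end{bmatrix}$$

$$\begin{bmatrix}\hat{\mathbf{b}} \\ \hat{\mathbf{u}}_1 \\ \vdots \\ \hat{\mathbf{u}}_K\end{bmatrix} = \mathbf{P}(\boldsymbol{\theta}_K)^{-1}\begin{bmatrix}\mathbf{m}_0 \\ \mathbf{V}_1\mathbf{m}_1 \\ \vdots \\ \mathbf{V}(\boldsymbol{\theta}_K)\mathbf{m}_K\end{bmatrix}$$

Here $\mathbf{m}_0 = \mathbf{m}_{\mathbf{Xy}}$, $\mathbf{m}_k$ is the block of $\mathbf{m}_{\mathbf{Ey}}$ for the $k$-th term, $\mathbf{V}_k$ is the $k$-th diagonal block of $\tilde{\mathbf{V}}(\boldsymbol{\Theta})$, and $\mathbf{P}_0$ is the same block matrix as $\mathbf{P}(\boldsymbol{\theta}_K)$ without the added identity.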

Page 23

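K-th step of the sequential updates

• The terms including $\mathbf{P}(\boldsymbol{\theta}_K)^{-1}$ and $|\mathbf{P}(\boldsymbol{\Theta})|$ are expanded as follows. The subscript $-K$ denotes the fixed effects and terms $1, \cdots, K-1$; $\tilde{\mathbf{M}}$ is the cross-product matrix with blocks $\mathbf{M}_{\mathbf{XX}}$, $\mathbf{M}_{\mathbf{XE}}$, $\mathbf{M}_{\mathbf{XE}}'$, $\mathbf{M}_{\mathbf{EE}}$, partitioned accordingly; $\mathbf{m}_{-K}$ and $\mathbf{m}_K$ are the corresponding blocks of $[\mathbf{m}_0; \mathbf{m}_1; \cdots; \mathbf{m}_K]$; and the fixed-effect block of $\tilde{\mathbf{V}}_{-K}$ is taken as the identity, with zero contribution to $\tilde{\mathbf{V}}_{-K}^{-2}$.

$$\begin{bmatrix}\hat{\mathbf{b}} \\ \hat{\mathbf{u}}_1 \\ \vdots \\ \hat{\mathbf{u}}_K\end{bmatrix} = \begin{bmatrix}\tilde{\mathbf{V}}_{-K}^{-1} & \mathbf{O} \\ \mathbf{O} & \mathbf{V}(\boldsymbol{\theta}_K)^{-1}\end{bmatrix}\mathbf{Q}^{-1}\begin{bmatrix}\mathbf{m}_{-K} \\ \mathbf{m}_K\end{bmatrix}, \qquad \mathbf{Q} = \begin{bmatrix}\tilde{\mathbf{M}}_{-K,-K} + \tilde{\mathbf{V}}_{-K}^{-2} & \tilde{\mathbf{M}}_{-K,K} \\ \tilde{\mathbf{M}}_{K,-K} & \tilde{\mathbf{M}}_{K,K} + \mathbf{V}(\boldsymbol{\theta}_K)^{-2}\end{bmatrix}$$

With $\mathbf{Q}^{*}_{K,-K} = \tilde{\mathbf{M}}_{K,-K}(\tilde{\mathbf{M}}_{-K,-K} + \tilde{\mathbf{V}}_{-K}^{-2})^{-1}$ and $\mathbf{Q}^{*}_{K,K} = \tilde{\mathbf{M}}_{K,K} - \mathbf{Q}^{*}_{K,-K}\tilde{\mathbf{M}}_{-K,K}$:

$$\hat{\mathbf{u}}_K = \mathbf{V}(\boldsymbol{\theta}_K)^{-1}\left(\mathbf{V}(\boldsymbol{\theta}_K)^{-2} + \mathbf{Q}^{*}_{K,K}\right)^{-1}\left(\mathbf{m}_K - \mathbf{Q}^{*}_{K,-K}\mathbf{m}_{-K}\right)$$

$$\begin{bmatrix}\hat{\mathbf{b}} \\ \hat{\mathbf{u}}_{-K}\end{bmatrix} = \tilde{\mathbf{V}}_{-K}^{-1}\left[(\tilde{\mathbf{M}}_{-K,-K} + \tilde{\mathbf{V}}_{-K}^{-2})^{-1}\mathbf{m}_{-K} - \mathbf{Q}^{*\prime}_{K,-K}\,\mathbf{V}(\boldsymbol{\theta}_K)\hat{\mathbf{u}}_K\right]$$

$$|\mathbf{P}(\boldsymbol{\Theta})| = |\tilde{\mathbf{V}}_{-K}|^2\,|\mathbf{V}(\boldsymbol{\theta}_K)|^2\,|\tilde{\mathbf{M}}_{-K,-K} + \tilde{\mathbf{V}}_{-K}^{-2}|\,\left|\mathbf{V}(\boldsymbol{\theta}_K)^{-2} + \tilde{\mathbf{M}}_{K,K} - \tilde{\mathbf{M}}_{K,-K}(\tilde{\mathbf{M}}_{-K,-K} + \tilde{\mathbf{V}}_{-K}^{-2})^{-1}\tilde{\mathbf{M}}_{-K,K}\right|$$

• The large matrices that do not involve $\boldsymbol{\theta}_K$, such as $(\tilde{\mathbf{M}}_{-K,-K} + \tilde{\mathbf{V}}_{-K}^{-2})^{-1}$, $\mathbf{Q}^{*}_{K,-K}$, and $\mathbf{Q}^{*}_{K,K}$, are processed only once before the iteration to maximize $l_R(\boldsymbol{\theta}_K)$; within the iteration, only $L \times L$ matrices need to be inverted.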
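The partitioned formulas can be checked numerically. In this NumPy sketch, random stand-ins are used for $\tilde{\mathbf{M}}$, the right-hand side $[\mathbf{m}_0; \cdots; \mathbf{m}_K]$, and the diagonal of $\tilde{\mathbf{V}}$. Everything that does not involve $\boldsymbol{\theta}_K$ is computed once; `partial` then needs only an $L \times L$ solve per candidate $\mathbf{V}(\boldsymbol{\theta}_K)$, and its estimates and $\ln|\mathbf{P}|$ match a full solve with $\mathbf{P} = \tilde{\mathbf{V}}\tilde{\mathbf{M}}\tilde{\mathbf{V}} + \mathrm{diag}(\mathbf{0}, \mathbf{I})$. The helper names `direct` and `partial` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
p, K, L = 2, 3, 4                       # fixed effects, number of terms, basis functions per term
n = p + K * L
Z = rng.normal(size=(50, n))
M = Z.T @ Z                              # stand-in for the cross-product matrix M_tilde
m = rng.normal(size=n)                   # stand-in for [m_0; m_1; ...; m_K]
v = rng.uniform(0.5, 2.0, size=K * L)    # diagonal of V(Theta) for the K random terms

def direct(vK):
    """Full (p + KL)-sized solve with P = V M V + diag(0, I), for comparison."""
    Vd = np.r_[np.ones(p), v[:-L], vK]
    P = Vd[:, None] * M * Vd[None, :] + np.diag(np.r_[np.zeros(p), np.ones(K * L)])
    return np.linalg.solve(P, Vd * m), np.linalg.slogdet(P)[1]

# Done once per K-th step: everything that does not involve theta_K
mk, kk = slice(0, n - L), slice(n - L, n)              # "-K" block and "K" block
Vm = np.r_[np.ones(p), v[:-L]]                         # diagonal of V_{-K} (identity on fixed block)
A = M[mk, mk] + np.diag(np.r_[np.zeros(p), v[:-L] ** -2.0])   # M_{-K,-K} + V_{-K}^{-2}
A_inv = np.linalg.inv(A)
Q_Km = M[kk, mk] @ A_inv                               # Q*_{K,-K}
Q_KK = M[kk, kk] - Q_Km @ M[mk, kk]                    # Q*_{K,K}
logdet_fixed = 2 * np.sum(np.log(Vm)) + np.linalg.slogdet(A)[1]

def partial(vK):
    """Per-candidate work for V(theta_K) = diag(vK): only an L x L system is solved."""
    S = np.diag(vK ** -2.0) + Q_KK
    g = np.linalg.solve(S, m[kk] - Q_Km @ m[mk])       # equals V(theta_K) u_K
    u_K = g / vK
    rest = (A_inv @ m[mk] - Q_Km.T @ g) / Vm           # [b; u_1; ...; u_{K-1}]
    logdet = logdet_fixed + 2 * np.sum(np.log(vK)) + np.linalg.slogdet(S)[1]
    return np.r_[rest, u_K], logdet

sol_part, logdet_part = partial(v[-L:])
```

During the search over $\boldsymbol{\theta}_K$, only `partial` is called repeatedly; the inversion of the large block `A` is not repeated.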

Page 24

Does the sequential estimation work appropriately?

• [Table: the lowest correlation coefficients between the SVCs estimated by the sequential and the simultaneous optimization in the 200 simulations]
✓ The sequential estimation returns almost the same SVCs as the simultaneous one.

Page 25

Summary: fast estimation approach

• Usual: the N samples are processed iteratively during the estimation
‒ Not suitable for large N.

• Ours: the N samples are compressed a priori
‒ The CP cost of the estimation is independent of N.
→ Iteration of O(N³) → O(N) + iteration of O(K³L³)
‒ The estimation of Θ is split into K steps.
→ Iteration of O(K³L³) → O(K³L³) + iteration of O(L³)

[Diagram: usual approach, data → model with an iterative estimation (e.g., MCMC) costing O(N³) per iteration for a GP; proposed approach, rank reduction + precompression, then the estimation split into K steps of O(L³)]

Page 26

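Summary of the proposed approach

• An estimation approach for the SVC model is proposed
✓ Pre-compression: iteration of O(N³) → O(N) + iteration of O(K³L³)
✓ Sequential estimation: iteration of O(K³L³) → O(K³L³) + iteration of O(L³)

• Although I assumed the SVC model below, this approach is applicable to other AMs (and additive mixed models)
✓ Useful for estimating regression models with SVCs, group effects, non-linear effects, time-varying effects, …

$$\mathbf{y} = \sum_{k=1}^{K}\mathbf{x}_k \circ \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$

$$\boldsymbol{\beta}_k = b_k\mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k, \qquad \boldsymbol{\gamma}_k \sim N(\mathbf{0}, \tau_k^2\boldsymbol{\Lambda}_{r_k})$$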

Page 27

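Simulation

• In each case, the true data are generated from (shown for K = 4)

$$\mathbf{y} = \underbrace{\sum_{k=1}^{2}\mathbf{x}_k \circ \boldsymbol{\beta}_k}_{\text{global SVCs}} + \underbrace{\sum_{k=3}^{4}\mathbf{x}_k \circ \boldsymbol{\beta}_k}_{\text{local SVCs}} + \boldsymbol{\varepsilon}_g + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$

Global SVC: $\boldsymbol{\beta}_k = \mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k$, $\boldsymbol{\gamma}_k \sim N(\mathbf{0}, \boldsymbol{\Lambda}_k^{3})$
Local SVC: $\boldsymbol{\beta}_k = \mathbf{1} + \mathbf{E}\boldsymbol{\gamma}_k$, $\boldsymbol{\gamma}_k \sim N(\mathbf{0}, \boldsymbol{\Lambda}_k^{0.5})$
Group effect: $\boldsymbol{\varepsilon}_g \sim N(\mathbf{0}, \sigma_g^2\mathbf{G})$, i.e., coefficients for randomly generated groups of 20 samples

• Models are estimated 200 times in each case, where
✓ N ∈ {500, 1,000, 3,000, 8,000, 20,000}
✓ K ∈ {4, 6, 8}
✓ $\sigma_g^2 = \mathrm{Var}[\sum_{k=1}^{K}\mathbf{x}_k \circ \boldsymbol{\beta}_k]$

[Figure: example coefficient patterns when K = 4: $\boldsymbol{\beta}_1$, $\boldsymbol{\beta}_2$, $\boldsymbol{\beta}_5$, $\boldsymbol{\beta}_6$]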
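A sketch of the data-generating process above, for small N only. Several details are not given on the slides and are assumed here: $\mathbf{E}$ and $\boldsymbol{\Lambda}$ come from the leading eigenpairs of an exponential covariance with scale `r`, the eigenvalues are rescaled to a maximum of 1, the covariates are standard normal, the first half of the SVCs are global and the rest local, $\sigma^2 = 1$, $\sigma_g^2$ is the empirical variance of the SVC part, and $\mathbf{G}$ corresponds to independent effects shared within each randomly formed group of 20 samples. The function name `simulate` is hypothetical.

```python
import numpy as np

def simulate(N, K, L=50, r=0.3, sigma2=1.0, group_size=20, seed=0):
    """Draw y = sum_k x_k o beta_k + eps_g + eps under the assumptions stated above."""
    rng = np.random.default_rng(seed)
    coords = rng.uniform(0, 1, size=(N, 2))
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    vals, vecs = np.linalg.eigh(np.exp(-d / r))
    idx = np.argsort(vals)[::-1][:L]
    E, lam = vecs[:, idx], vals[idx] / vals[idx].max()   # rescaled eigenvalues (assumption)
    x = rng.normal(size=(N, K))
    # Lambda^3 for global SVCs (first half), Lambda^0.5 for local SVCs (second half)
    powers = [3.0 if k < K // 2 else 0.5 for k in range(K)]
    beta = np.column_stack([1.0 + E @ (rng.normal(size=L) * np.sqrt(lam ** a)) for a in powers])
    signal = (x * beta).sum(axis=1)                       # sum_k x_k o beta_k
    sigma_g2 = signal.var()                               # sigma_g^2 = Var[sum_k x_k o beta_k]
    groups = rng.permutation(np.arange(N) // group_size)  # random groups of group_size samples
    eps_g = rng.normal(scale=np.sqrt(sigma_g2), size=groups.max() + 1)[groups]
    y = signal + eps_g + rng.normal(scale=np.sqrt(sigma2), size=N)
    return y, x, beta, groups

y, x, beta, groups = simulate(N=1000, K=4)
```

The dense $N \times N$ distance matrix limits this sketch to a few thousand samples; the larger settings (N = 8,000 and 20,000) would need a cheaper basis construction.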

Page 28

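Specifications

$$\mathbf{y} = \sum_{k=1}^{K}\mathbf{x}_k \circ \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}_g + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2\mathbf{I}), \qquad f(\mathbf{z}_j) = \mathbf{E}\boldsymbol{\gamma}$$

Table columns: Approach; SVCs $\boldsymbol{\beta}_k$ (model, scale estimation); group effects $\boldsymbol{\varepsilon}_g$; R function

• AM: 2D cubic spline ×; R function: bam (mgcv)
• GeoAM: low rank GP (spatial range is fixed following studies in applied statistics) ×; R function: bam (mgcv)
• Propose1: low rank GP ×; R function: resf_vc (spmoran)
• Propose2: low rank GP × ×; R function: resf_vc (spmoran)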

Page 29

RMSE of the estimated parameters

[Figure: RMSE of β (local), β (global), and β (group) for K = 4, 6, and 8; approaches: AM, GeoAM, Propose 1, Propose 2]

Page 30

Estimated SVCs in the 1st iteration (N = 8,000; K = 6)

[Figure: β4 (local) and β2 (global): true values and estimates by AM, GeoAM, and Proposed2]

• AM: the bivariate spline is too simple.
• GeoAM: a GP with a pre-specified scale parameter is not acceptable.

Page 31

Estimated group effects in the 1st iteration (K = 6)

[Figure: true vs. estimated group effects for AM, GeoAM, and Proposed2; N = 8,000 (400 groups) and N = 20,000 (1,000 groups)]

Too much shrinkage? SVC estimation error might blur the group effects.

Page 32

Computation time (N ≤ 200,000)

[Figure: CP time (seconds) of GeoAM (mgcv) and Propose 2 against sample size and against the number of variance parameters; panel annotations: N = 200,000; K = 8]

• If N and the number of SVCs are the same, Propose 2 is slower than GeoAM. However, when N > 10,000, the CP time of Propose 2 increases with N as slowly as that of GeoAM.
• If N and the number of variance parameters are the same, Propose 2 is faster than GeoAM (when N is large).

Page 33

CP time comparison with other geostatistical approaches

• Conventional
✓ The CP time increases rapidly as N grows
✓ Did not work when N > 15,000

• Proposed
✓ CP time is very short even if N and K are large

[Figure: CP time (seconds) against sample size (20,000 to 100,000) for the proposed approach (a practical method) and a Bayesian SVC model (which our model approximates)]

Page 34

Application to crime data

Joint work with Mami Kajita and Seiji Kajita


Page 35

Crime analysis is a hot topic

PredPol (https://www.predpol.com/)

• Crime prediction: predict when and where crimes happen
‒ Considering streetlight locations, bar opening hours, …

• Patrol control: optimization of patrols
‒ Crimes decreased by 20% in Santa Cruz, CA, and by 47% in LA

Page 36

Number of crimes in Tokyo

[Figure: counts per area of brutal and violent crimes in 2009, 2012, 2015, and 2017]

Page 37

Number of crimes in Tokyo

[Figure: counts per area of burglary and non-burglary crimes in 2009, 2012, 2015, and 2017]

Page 38

Determinants of crimes

• Repeat-ness
‒ Crimes tend to repeat in the same area [Figure: near-repeated crimes in Tokyo]

• Risk factors
‒ Education, income, population, number of pedestrians, …

• Regional properties
‒ Slums, image of the neighborhood, …

Page 39

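Our applied additive mixed model with SVCs

$$y_c(\mathbf{s}, t) = \underbrace{\sum_{c'=1}^{4} y_{c'}(\mathbf{s}, t-1)\, b_{c'}(\mathbf{s})}_{\text{repeat-ness}} + \underbrace{\sum_{k=1}^{K} x_k(\mathbf{s}, t)\,\beta_k(\mathbf{s})}_{\text{risk factors}} + \underbrace{u_{sp}(\mathbf{s}) + u_{ns}(\mathbf{s})}_{\text{local effects}} + \varepsilon(\mathbf{s}, t)$$

Spatially varying coefficients (GP): $\mathbf{b}_{c'} \sim N(\mathbf{0}, \mathbf{C}(\boldsymbol{\theta}_{c'}))$, $\boldsymbol{\beta}_k \sim N(\mathbf{0}, \mathbf{C}(\boldsymbol{\theta}_k))$, $\mathbf{u}_{sp} \sim N(\mathbf{0}, \mathbf{C}(\boldsymbol{\theta}_{sp}))$ (spatial effect)
Independent normal (group effect): $\mathbf{u}_{ns} \sim N(\mathbf{0}, \sigma_{ns}^2\mathbf{I})$ (district effect)

- $y_c(\mathbf{s}, t)$: log(crime density)
- c: crime type: brutal, violent, burglary, non-burglary
- s: district: 3,128 districts
- t: year: 2009–2016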
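The repeat-ness term regresses $y_c(\mathbf{s}, t)$ on all four crime densities of the same district in the previous year. The NumPy sketch below only shows how such lagged regressors can be aligned with the response; the array layout (year × district × crime type), the synthetic values, and the function name `lagged_design` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
years = np.arange(2009, 2017)                 # t: 2009-2016
n_dist, n_type = 3128, 4                      # districts; crime types (brutal, violent, burglary, non-burglary)
log_dens = rng.normal(size=(len(years), n_dist, n_type))   # synthetic stand-in for log(crime density)

def lagged_design(log_dens, c):
    """Response y_c(s, t) for t >= second year and regressors y_c'(s, t-1) for all c'.

    Rows are ordered year-major, district-minor, so row i of y and row i of lag
    refer to the same district, one year apart.
    """
    T, S, C = log_dens.shape
    y = log_dens[1:, :, c].reshape(-1)                 # (T-1)*S responses
    lag = log_dens[:-1, :, :].reshape(-1, C)           # previous-year densities of all C types
    district = np.tile(np.arange(S), T - 1)            # district index s of each row
    return y, lag, district

y, lag, district = lagged_design(log_dens, c=0)       # c = 0: brutal crimes
```

In an SVC fit, each column of `lag` would be paired with its own spatially varying coefficient $b_{c'}(\mathbf{s})$, indexed through `district`.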

Page 40

Application to crime analysis

Explained variables: number of crimes per area (log scale) for brutal, violent, burglary, and non-burglary crimes

Explanatory variables:
• Repeat-ness: crime density in the previous year
• Risk factors: daytime pedestrian density, nighttime pedestrian density, unemployment rate, and 5 other variables
• Local effects: district-level effect, spatial effect

Page 41

Estimated district and spatial effects (brutal, violent, burglary, non-burglary)

• District-level (group) effect: high risk in crowded districts (representative crowded district: Kabuki-cho)
• Spatial effects: high risks in the north area; high risk in the center; high risk in the center and north-west area; high risk in the north area; high risk in the center and north area

Page 42

Estimated repeat-ness effects (brutal, violent, burglary, non-burglary)

• Tend to repeat near the center
• Tend to repeat near sub-centers (Shinjuku, Shibuya)

Page 43

Estimated effects of daytime pedestrian density (brutal, violent, burglary, non-burglary)

[Figure: estimated effects and their statistical significance]

• Increase crimes in suburban areas (noted for two crime types)
• Increase in sub-centers
• No impact

Page 44

Estimated effects of nighttime pedestrian density (brutal, violent, burglary, non-burglary)

[Figure: estimated effects and their statistical significance]

• Increase in crowded areas
• No impact
• Low density increases crimes in a bayside area
• High density increases crimes in local centers in the north

Page 45

Concluding remarks

• This study develops a fast SVC modeling approach for large samples
✓ It estimates SVCs accurately
✓ It was confirmed to be as fast as the fast REML in mgcv

• The SVC model is implemented in the R package spmoran

• This approach is available for other additive (mixed) models
✓ Non-linear, time-varying, group, …
✓ It might be possible to extend it to fast Bayesian sampling whose CP cost is independent of the sample size (after pre-compression)