REGULARIZED HORSESHOE
Aki Vehtari
Aalto University, [email protected]
@avehtari
Joint work with Juho Piironen
Outline
- Large p, small n regression
- Prior on weights vs. prior on shrinkage
- Horseshoe prior
- Regularized horseshoe prior
Large p, small n regression
- Linear or generalized linear regression: number of covariates p, number of observations n
- Large p, small n is common, e.g. in modern medical/bioinformatics studies (microarrays, GWAS) and brain imaging
- In our examples p is around 1e2–1e5, and usually n < 100
Large p, small n regression
- With noiseless observations we can fit a uniquely identified regression in n − 1 dimensions
- With noisy observations, more complicated
- With correlated covariates, more complicated
Large p, small n regression
Priors!
- Non-sparse priors assume most covariates are relevant but may have strong correlations → factor models
- Sparse priors assume only a small number of covariates are effectively non-zero, meff ≪ n
Example
[Figure: posterior intervals for (Intercept), x1–x20, and sigma under a Gaussian prior (left) and a horseshoe prior (right); made with rstanarm + bayesplot]
Example
Gaussian vs. horseshoe predictive performance using cross-validation (loo package; more in the Friday model selection tutorial):

```r
> compare(loog, loohs)
elpd_diff        se
      7.9       2.8
```
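The `compare` output reports the difference in expected log pointwise predictive density (elpd) between the two models and its standard error. As a rough illustration of how such a paired difference and SE are computed from pointwise elpd values (the arrays here are made up, not the slide's actual data):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100  # hypothetical number of observations

# Hypothetical pointwise elpd contributions for the two models
elpd_gaussian = rng.normal(-1.4, 0.5, n)
elpd_horseshoe = elpd_gaussian + rng.normal(0.08, 0.25, n)

# Paired comparison: sum of pointwise differences, and the standard
# error of that sum (sqrt(n) times the sample sd of the differences)
diff = elpd_horseshoe - elpd_gaussian
elpd_diff = diff.sum()
se = np.sqrt(n) * diff.std(ddof=1)
print(elpd_diff, se)
```

A positive `elpd_diff` several times larger than `se` (as in the 7.9 ± 2.8 above) favors the horseshoe model here.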
Large p, small n regression
Sparse priors assume only a small number of covariates are effectively non-zero, meff ≪ p
- Laplace prior ("Bayesian lasso"): computationally convenient (continuous and log-concave), but not really sparse
- Spike-and-slab (with a point mass at zero): prior on the number of non-zero covariates; discrete
- Horseshoe and hierarchical shrinkage priors: prior on the amount of shrinkage; continuous

Carvalho et al. (2009)
Prior on shrinkage
The slope of the prior at a specific value determines the amount of shrinkage

Carvalho et al. (2009)
Spike-and-slab vs. horseshoe prior
- A spike-and-slab prior (with a point mass at zero) is a mix of a continuous prior and a probability mass at zero; the parameter space is a mixture of continuous and discrete
- Hierarchical shrinkage and horseshoe priors are continuous

Piironen and Vehtari (2017a)
Horseshoe prior
Linear regression model with covariates x = (x1, . . . , xD):

yi = βᵀxi + εi,  εi ∼ N(0, σ²),  i = 1, . . . , n

The horseshoe prior:

βj | λj, τ ∼ N(0, λj²τ²),
λj ∼ C⁺(0, 1),  j = 1, . . . , D.

- The global parameter τ shrinks all βj towards zero
- The local parameters λj allow some βj to escape the shrinkage

[Figure: marginal prior density p(β1) and joint prior draws of (β1, β2)]
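A minimal sketch of drawing coefficients from this prior (Python rather than Stan; the values of D and τ are arbitrary illustrative choices, not from the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
D, tau = 1000, 0.1  # illustrative dimensionality and global scale

# Local scales lambda_j ~ C+(0, 1): absolute value of a standard Cauchy
lam = np.abs(rng.standard_cauchy(D))

# beta_j | lambda_j, tau ~ N(0, lambda_j^2 * tau^2)
beta = rng.normal(0.0, lam * tau)

# Most draws sit near zero (global shrinkage via tau), but the heavy
# Cauchy tails of lambda_j let a few beta_j escape far from zero
print(np.median(np.abs(beta)), np.abs(beta).max())
```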
Horseshoe prior
Given the hyperparameters, the posterior mean satisfies approximately

β̄j = (1 − κj) βj^ML,  κj = 1 / (1 + nσ⁻²τ²λj²),

where κj is the shrinkage factor.

With λj ∼ C⁺(0, 1), the prior for κj looks like:

[Figure: prior density of κ on (0, 1), shown for nσ⁻²τ² decreasing from 1.0 to 0.1]

- We expect both relevant (β̄j ≈ βj^ML) and irrelevant (β̄j ≈ 0) features
- Small τ ⇒ more coefficients ≈ 0
- How to specify the prior for τ?
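The shrinkage-factor formula can be checked numerically. This Python sketch (illustrative values; n, σ, τ chosen so that nσ⁻²τ² = 1, matching the first panel) shows the characteristic horseshoe shape of the implied prior on κj:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, tau = 100, 1.0, 0.1  # chosen so that n * sigma^-2 * tau^2 = 1

# Local scales lambda_j ~ C+(0, 1)
lam = np.abs(rng.standard_cauchy(100_000))

# Shrinkage factor kappa_j = 1 / (1 + n sigma^-2 tau^2 lambda_j^2)
kappa = 1.0 / (1.0 + n * sigma**-2 * tau**2 * lam**2)

# Horseshoe shape: with this scaling kappa ~ Beta(1/2, 1/2), so the
# prior mass piles up near 0 (no shrinkage) and near 1 (full shrinkage)
print((kappa < 0.1).mean(), (kappa > 0.9).mean())  # both around 0.2
```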
The global shrinkage parameter τ
Effective number of nonzero coefficients:

meff = ∑_{j=1}^{D} (1 − κj)

The prior mean can be shown to be

E[meff | τ, σ] = (τσ⁻¹√n / (1 + τσ⁻¹√n)) D

Setting E[meff | τ, σ] = p0 (the prior guess for the number of nonzero coefficients) yields for τ

τ0 = (p0 / (D − p0)) (σ / √n)

⇒ a prior guess for τ based on our beliefs about the sparsity
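These formulas are easy to sanity-check numerically; a small Python sketch (D, n, σ, p0 here match the illustration with D = 1000) confirms that plugging τ0 back into the prior mean recovers E[meff] = p0:

```python
import numpy as np

def tau0(p0, D, sigma, n):
    # Reference value tau0 = p0 / (D - p0) * sigma / sqrt(n)
    return p0 / (D - p0) * sigma / np.sqrt(n)

def expected_meff(tau, sigma, n, D):
    # Prior mean E[m_eff | tau, sigma] from the slide
    a = tau / sigma * np.sqrt(n)
    return a / (1.0 + a) * D

D, n, sigma, p0 = 1000, 100, 1.0, 5
t0 = tau0(p0, D, sigma, n)
print(t0, expected_meff(t0, sigma, n, D))  # E[m_eff] comes back as 5.0
```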
Illustration: p(τ) vs. p(meff)
Let n = 100, σ = 1, p0 = 5, τ0 = (p0 / (D − p0)) (σ / √n), with D the dimensionality.

p(meff) with different choices of p(τ):

[Figure: p(meff) for four choices of p(τ): τ = τ0, τ ∼ N⁺(0, τ0²), τ ∼ C⁺(0, τ0²), τ ∼ C⁺(0, 1); rows D = 10 and D = 1000. With τ ∼ C⁺(0, 1) and D = 1000, the prior on meff spreads over the whole range 0–1000.]
Non-Gaussian observation models
The reference value:

τ0 = (p0 / (D − p0)) (σ / √n)

- The framework can also be applied to non-Gaussian observation models by deriving appropriate plug-in values for σ
- Gaussian approximation to the likelihood, e.g. σ = 2 for logistic regression
Regularized horseshoe
- The horseshoe allows some coefficients to be completely unregularized, which allows complete separation in a logistic model with n ≪ p
- The regularized horseshoe adds an additional wide slab and maintains the division into relevant and non-relevant variables
Horseshoe vs. regularized horseshoe
The regularized horseshoe helps regularize the relevant variables
Regularized horseshoe in rstanarm
Easy in rstanarm (thanks to Ben Goodrich):

```r
p0 <- 5
tau0 <- p0 / (D - p0) * sigmaguess / sqrt(n)
fit <- stan_glm(y ~ x, gaussian(),
                prior = hs(global_scale = tau0, slab_scale = 2.5,
                           slab_df = 4))
```

- Note: rstanarm does not condition on σ, so tau0 needs to be scaled with a guess of the expected value of σ; luckily the result is not sensitive to the exact value
- Note 2: the hs() prior is called a "hierarchical shrinkage" prior, as it is an extension of the horseshoe (the horseshoe has local_df = 1); luckily the result is not sensitive to the exact value
Regularized horseshoe in rstanarm
Simulated regression example: n = 100, p = 200, true p0 = 7
Gaussian vs. "Bayesian LASSO" vs. regularized horseshoe
Experiments
Table: Summary of the real-world datasets; D denotes the number of predictors and n the dataset size.

| Dataset  | Type           | D    | n   |
|----------|----------------|------|-----|
| Ovarian  | Classification | 1536 | 54  |
| Colon    | Classification | 2000 | 62  |
| Prostate | Classification | 5966 | 102 |
| ALLAML   | Classification | 7129 | 72  |
Horseshoe vs. regularized horseshoe
The regularized horseshoe helps to reduce the number of divergences, too
Example
[Figure: horseshoe posterior intervals for (Intercept), x1–x20, and sigma]
- Even if the horseshoe shrinks a lot, the coefficient posterior has uncertainty and is not exactly zero
- Tomorrow in the model selection tutorial: how to select the most relevant variables, and how to do inference after the selection while taking into account the uncertainties in the full model
Summary of the regularized horseshoe prior
- As sparse as the horseshoe, but more robust inference and computation
- Better performance than LASSO and Bayesian LASSO

References (with code examples for Stan included):
- Juho Piironen and Aki Vehtari (2017). Sparsity information and regularization in the horseshoe and other shrinkage priors. Electronic Journal of Statistics, 11(2):5018–5051. https://projecteuclid.org/euclid.ejs/1513306866
- Juho Piironen and Aki Vehtari (2017). On the hyperprior choice for the global shrinkage parameter in the horseshoe prior. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, PMLR 54:905–913. http://proceedings.mlr.press/v54/piironen17a.html
- Juho Piironen and Aki Vehtari (2018). Iterative supervised principal components. Proceedings of the 21st International Conference on Artificial Intelligence and Statistics, accepted for publication. https://arxiv.org/abs/1710.06229
- See also the model selection tutorial with notebooks using the regularized horseshoe: https://github.com/avehtari/modelselection_tutorial