The horseshoe estimator for sparse signals
CARLOS M. CARVALHO
NICHOLAS G. POLSON
JAMES G. SCOTT
Biometrika (2010)
Presented by Eric Wang
10/14/2010
Overview
• This paper proposes the horseshoe estimator, an analytically tractable approach that is more robust and adaptive to different sparsity patterns than existing methods.
• Two theorems are proved: one characterizing the proposed estimator’s tail robustness, and one demonstrating a super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations.
• The proposed estimator’s performance is demonstrated using both real and simulated data. The authors show that its answers correspond quite closely to those obtained by Bayesian model averaging.
• Consider the p-dimensional vector y | θ ~ N(θ, σ²I), where θ is sparse. The authors propose the following model for estimation and prediction:
θᵢ | λᵢ ~ N(0, λᵢ²τ²),   λᵢ ~ C⁺(0, 1),
where C⁺(0, a) denotes a half-Cauchy distribution on the positive reals with location 0 and scale parameter a (see the simulation sketch below).
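As a concrete illustration of this hierarchy (mine, not from the paper), the following sketch draws one high-dimensional θ from the prior; the choice τ = 1 and all variable names are assumptions for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)
p, tau = 100_000, 1.0          # tau = 1 is an assumption for this demo

# Local scales: a standard half-Cauchy C+(0, 1) draw is |Cauchy(0, 1)|.
lam = np.abs(rng.standard_cauchy(p))

# theta_i | lam_i ~ N(0, lam_i^2 * tau^2)
theta = rng.normal(0.0, lam * tau)

# Many draws sit near zero, yet the Cauchy tails produce occasional very large signals.
print(np.mean(np.abs(theta) < 0.1), np.quantile(np.abs(theta), 0.999))
```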
• The name horseshoe prior arises from the observation that, for fixed values τ = σ = 1, the posterior mean is E(θᵢ | y) = {1 − E(κᵢ | y)} yᵢ, where κᵢ = 1/(1 + λᵢ²) is the amount of shrinkage toward zero, a posteriori. The shrinkage weight κᵢ has a horseshoe-shaped Be(1/2, 1/2) prior.
The horseshoe estimator
• The meaning of κᵢ is as follows: κᵢ ≈ 0 yields virtually no shrinkage and describes signals, while κᵢ ≈ 1 yields near-total shrinkage and (hopefully) describes noise.
• [Figure: the horseshoe-shaped Be(1/2, 1/2) prior density on the shrinkage coefficient κᵢ, unbounded at both κᵢ = 0 and κᵢ = 1.]
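A quick Monte Carlo check (not from the paper) that κᵢ = 1/(1 + λᵢ²) with λᵢ ~ C⁺(0, 1) really does follow the horseshoe-shaped Be(1/2, 1/2) law:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
lam = np.abs(rng.standard_cauchy(200_000))
kappa = 1.0 / (1.0 + lam**2)   # shrinkage weight in [0, 1]

# Compare the sample against Be(1/2, 1/2): mass piles up near 0 (signals)
# and near 1 (noise). The KS statistic is ~1e-3, consistent with an exact match.
print(stats.kstest(kappa, stats.beta(0.5, 0.5).cdf).statistic)
```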
The horseshoe density function
• The horseshoe prior density lacks an analytic form, but very tight bounds are available:
Theorem 1. The univariate horseshoe density p(θ) satisfies the following:
(a) lim_{θ→0} p(θ) = ∞;
(b) for θ ≠ 0,
(K/2) log(1 + 4/θ²) < p(θ) < K log(1 + 2/θ²),
where K = 1/√(2π³).
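A numerical sanity check of Theorem 1(b) (my sketch, writing the horseshoe density as a half-Cauchy mixture of normals and assuming the constants above are as stated):

```python
import numpy as np
from scipy import integrate

K = 1.0 / np.sqrt(2.0 * np.pi ** 3)

def horseshoe_density(theta):
    # p(theta) = int_0^inf N(theta | 0, lam^2) * (2/pi) / (1 + lam^2) dlam
    f = lambda lam: (np.exp(-theta ** 2 / (2.0 * lam ** 2)) / np.sqrt(2.0 * np.pi * lam ** 2)
                     * 2.0 / (np.pi * (1.0 + lam ** 2)))
    val, _ = integrate.quad(f, 0.0, np.inf, limit=200)
    return val

for theta in (0.1, 0.5, 1.0, 3.0):
    lower = 0.5 * K * np.log(1.0 + 4.0 / theta ** 2)
    upper = K * np.log(1.0 + 2.0 / theta ** 2)
    print(f"theta = {theta:4.1f}:  {lower:.5f} < {horseshoe_density(theta):.5f} < {upper:.5f}")
```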
• Alternatively, it is possible to integrate over the global scale τ first, yielding a joint prior on θ directly, though the dependence among the θᵢ that this induces causes more issues. Therefore the authors do not take this approach.
Review of similar methods
• Scott & Berger (2006) studied the discrete mixture
p(θᵢ) = w · N(θᵢ | 0, τ²) + (1 − w) · δ₀,
where δ₀ is a point mass at zero and w is the prior inclusion probability.
• Tipping (2001) studied the Student-t prior, which is defined by an inverse-gamma mixing density: λᵢ² ~ IG(a, b).
• The double-exponential prior (the Bayesian lasso) has an exponential mixing density, λᵢ² ~ Exp(rate = γ²/2), which yields θᵢ ~ DE(γ).
• The normal-Jeffreys prior is an improper prior induced by placing Jeffreys’ prior on each variance term, p(λᵢ²) ∝ 1/λᵢ², leading to p(θᵢ) ∝ 1/|θᵢ|. This choice is commonly used in the absence of a global scale parameter.
• The Strawderman-Berger prior does not have an analytic form, but arises from assuming θᵢ | κᵢ ~ N(0, κᵢ⁻¹ − 1), with κᵢ ~ Be(1/2, 1).
• The normal-exponential-gamma family of priors generalizes the lasso specification, using a gamma distribution to mix over the exponential rate parameter; this produces heavier, polynomial tails. (The sketch below compares the shrinkage profiles implied by several of these priors.)
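To make the comparison concrete, a small simulation (mine, not from the paper) of the shrinkage weight κᵢ = 1/(1 + λᵢ²) implied by three of the priors above, with τ = σ = 1 assumed. The horseshoe alone puts heavy mass near both κ = 0 (leave signals alone) and κ = 1 (shrink noise hard):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Mixing draws for lam^2 under three priors (tau = sigma = 1 assumed):
lam2_hs = rng.standard_cauchy(n) ** 2          # horseshoe: lam ~ C+(0, 1)
lam2_de = rng.exponential(2.0, n)              # Bayesian lasso: lam^2 ~ Exp(rate 1/2)
lam2_sb = 1.0 / rng.beta(0.5, 1.0, n) - 1.0    # Strawderman-Berger: kappa ~ Be(1/2, 1)

for name, lam2 in [("horseshoe", lam2_hs), ("lasso", lam2_de), ("S-B", lam2_sb)]:
    kappa = 1.0 / (1.0 + lam2)
    print(f"{name:9s}  P(kappa < .05) = {np.mean(kappa < .05):.3f}   "
          f"P(kappa > .95) = {np.mean(kappa > .95):.3f}")
```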
Robustness to large signals
• Theorem 2. Let p(y | θ) be the likelihood, and suppose that p(θ) is a zero-mean scale mixture of normals: θ | λ ~ N(0, λ²), with λ having proper prior p(λ). Assume further that the likelihood and p(θ) are such that the marginal density m(y) = ∫ p(y | θ) p(θ) dθ is finite for all y. Define the following three pseudo-densities, which may be improper:
p*(λ) = λ² p(λ),   p*(θ) = ∫ N(θ | 0, λ²) p*(λ) dλ,   m*(y) = ∫ p(y | θ) p*(θ) dθ.
Then, for a location likelihood p(y | θ) = f(y − θ),
E(θ | y) = −{(d/dy) m*(y)} / m(y).
• If p(y | θ) is a Gaussian likelihood, the result of Theorem 2 reduces to
E(θ | y) = y + (d/dy) log m(y).
• A key consequence of Theorem 2 is that if the prior on θ is chosen so that the derivative of the log prior density is bounded, then the derivative of the log predictive density, (d/dy) log m(y), tends to 0 for large |y|. This happens for heavy-tailed priors, including the proposed horseshoe prior, and yields E(θ | y) ≈ y for large signals (a numerical check follows below).
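A numerical check (mine) of the Gaussian-likelihood identity above; a standard Cauchy prior is used as a stand-in heavy-tailed scale mixture of normals, since the horseshoe density itself has no closed form:

```python
import numpy as np
from scipy import integrate

def marginal_and_mean(y, prior_pdf):
    # m(y) = int N(y | theta, 1) p(theta) dtheta and E(theta | y), by quadrature.
    lik = lambda th: np.exp(-0.5 * (y - th) ** 2) / np.sqrt(2.0 * np.pi)
    m, _ = integrate.quad(lambda th: lik(th) * prior_pdf(th), y - 15, y + 15,
                          epsabs=1e-12, epsrel=1e-12)
    num, _ = integrate.quad(lambda th: th * lik(th) * prior_pdf(th), y - 15, y + 15,
                            epsabs=1e-12, epsrel=1e-12)
    return m, num / m

# Heavy-tailed stand-in prior (a Cauchy is a scale mixture of normals).
cauchy_pdf = lambda th: 1.0 / (np.pi * (1.0 + th ** 2))

eps = 1e-3   # step for a central difference of log m(y)
for y in (0.5, 2.0, 5.0, 10.0):
    m_hi, _ = marginal_and_mean(y + eps, cauchy_pdf)
    m_lo, _ = marginal_and_mean(y - eps, cauchy_pdf)
    via_score = y + (np.log(m_hi) - np.log(m_lo)) / (2.0 * eps)
    _, direct = marginal_and_mean(y, cauchy_pdf)
    print(f"y = {y:5.1f}   direct E(theta|y) = {direct:.4f}   y + dlogm/dy = {via_score:.4f}")
```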
The horseshoe score function
• Theorem 3. Suppose y ~ N(θ, 1). Let m(y) denote the predictive density under the horseshoe prior for known scale parameter τ, i.e. θ | λ ~ N(0, λ²τ²) where λ ~ C⁺(0, 1). Then, for some constant b that depends upon τ,
|(d/dy) log m(y)| ≤ b for all y,   and   lim_{|y|→∞} (d/dy) log m(y) = 0.
• Corollary: lim_{|y|→∞} {E(θ | y) − y} = 0, i.e. the horseshoe estimator leaves very large signals essentially unshrunk.
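The corollary can be seen numerically. The sketch below (mine) compares the posterior-mean bias under the double-exponential (lasso) prior, which never vanishes, with a heavy-tailed Cauchy prior used as a convenient proxy for the horseshoe's tails:

```python
import numpy as np
from scipy import integrate

def post_mean(y, prior_pdf):
    # E(theta | y) under y ~ N(theta, 1), by quadrature around the likelihood peak.
    lik = lambda th: np.exp(-0.5 * (y - th) ** 2)
    m, _ = integrate.quad(lambda th: lik(th) * prior_pdf(th), y - 15, y + 15)
    num, _ = integrate.quad(lambda th: th * lik(th) * prior_pdf(th), y - 15, y + 15)
    return num / m

laplace_pdf = lambda th: 0.5 * np.exp(-np.abs(th))        # double-exponential (lasso) prior
cauchy_pdf = lambda th: 1.0 / (np.pi * (1.0 + th ** 2))   # heavy-tailed proxy for the horseshoe

# Lasso bias settles near -1; the heavy-tailed bias decays toward 0 as |y| grows.
for y in (2.0, 5.0, 10.0, 20.0):
    print(f"y = {y:5.1f}   lasso bias = {post_mean(y, laplace_pdf) - y:+.3f}   "
          f"heavy-tailed bias = {post_mean(y, cauchy_pdf) - y:+.3f}")
```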
• Although the horseshoe prior has no analytic form, it does lead to the following posterior mean:
E(θᵢ | yᵢ) = yᵢ {1 − E(κᵢ | yᵢ)},
where E(κᵢ | yᵢ) can be written as a ratio of Φ₁ functions, the degenerate hypergeometric function of two variables.
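Since the Φ₁ representation is awkward to evaluate directly, a quadrature sketch (mine, not from the paper) computes the same posterior mean by integrating over the local scale λ; all names are illustrative:

```python
import numpy as np
from scipy import integrate

def horseshoe_post_mean(y, tau=1.0):
    """E(theta | y) for y ~ N(theta, 1), theta | lam ~ N(0, lam^2 tau^2), lam ~ C+(0, 1)."""
    def posterior_weight(lam):
        v = 1.0 + (tau * lam) ** 2                  # marginal variance of y given lam
        return (np.exp(-0.5 * y ** 2 / v) / np.sqrt(v)
                * 2.0 / (np.pi * (1.0 + lam ** 2)))
    def shrink(lam):
        # (1 - kappa) = lam^2 tau^2 / (1 + lam^2 tau^2), weighted by p(lam | y)
        return (tau * lam) ** 2 / (1.0 + (tau * lam) ** 2) * posterior_weight(lam)
    m, _ = integrate.quad(posterior_weight, 0.0, np.inf)
    s, _ = integrate.quad(shrink, 0.0, np.inf)
    return y * s / m                                 # E(theta | y) = y * E(1 - kappa | y)

for y in (0.5, 1.0, 2.0, 5.0):
    print(f"y = {y:4.1f}   E(theta | y) = {horseshoe_post_mean(y):.4f}")
```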
Estimating τ
• The conditional posterior distribution of τ², given θ and the λᵢ, is approximately
p(τ² | θ, λ) ∝ (τ²)^(−p/2) exp{−Σᵢ θᵢ²/(2λᵢ²τ²)}
if the dimensionality p is large, since the likelihood then swamps the prior on τ.
• This is approximately an inverse-gamma distribution for τ², with scale driven by Σᵢ θᵢ²/λᵢ².
• If most observations are shrunk toward 0, then will be small with high probability.
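A minimal Gibbs-sampler sketch for the full model (mine, not the sampler used in the paper), using the standard inverse-gamma data augmentation of the half-Cauchy and assuming σ = 1 known:

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated sparse normal-means data; sigma = 1 is assumed known for simplicity.
p = 200
theta_true = np.zeros(p); theta_true[:10] = 8.0
y = theta_true + rng.normal(size=p)

def inv_gamma(shape, scale, size=None):
    # Draw from IG(shape, scale), density ~ x^(-shape-1) exp(-scale/x).
    return scale / rng.gamma(shape, 1.0, size=size)

# Augmentation: lam ~ C+(0,1) iff lam^2 | nu ~ IG(1/2, 1/nu) with nu ~ IG(1/2, 1),
# and likewise tau ~ C+(0,1) via xi; every conditional below is then conjugate.
lam2, nu, tau2, xi = np.ones(p), np.ones(p), 1.0, 1.0
total = np.zeros(p)
n_iter, burn = 2000, 500
for t in range(n_iter):
    w = lam2 * tau2 / (1.0 + lam2 * tau2)            # 1 - kappa_i, the shrinkage weights
    theta = rng.normal(w * y, np.sqrt(w))            # theta_i | rest ~ N(w_i y_i, w_i)
    lam2 = inv_gamma(1.0, 1.0 / nu + theta ** 2 / (2.0 * tau2), size=p)
    nu = inv_gamma(1.0, 1.0 + 1.0 / lam2, size=p)
    tau2 = inv_gamma(0.5 * (p + 1), 1.0 / xi + np.sum(theta ** 2 / lam2) / 2.0)
    xi = inv_gamma(1.0, 1.0 + 1.0 / tau2)
    if t >= burn:
        total += theta
print((total / (n_iter - burn))[:12].round(2))       # signals stay near 8, noise near 0
```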
Super-efficient convergence
• Theorem 4. Suppose the true sampling model is yᵢ ~ N(θ₀, 1), independently. Then:
(1) For the predictive density under the horseshoe prior, the Kullback-Leibler risk converges at a super-efficient rate when θ₀ = 0: faster than the usual n⁻¹ log n rate, by a factor involving a constant b. When θ₀ ≠ 0, the optimal rate is the usual n⁻¹ log n.
(2) Suppose p(θ) is any other prior density that is continuous, bounded above, and strictly positive on a neighborhood of the true value θ₀. For the predictive density under p(θ), the optimal rate of convergence, regardless of θ₀, is of order n⁻¹ log n.
Example: Vanguard mutual-fund data
• Here, the authors show how the horseshoe can provide a regularized estimate of a large covariance matrix whose inverse may be sparse.
• The Vanguard mutual-fund dataset contains n = 86 weekly returns for p = 59 funds.
• Suppose the observation matrix is Y = (y₁, …, y_n)ᵀ, with each p-dimensional vector yᵢ drawn from a zero-mean Gaussian with covariance matrix Σ.
• We will model Σ through the Cholesky decomposition of its inverse.
• The goal is to estimate the ensemble of regression models in the implied triangular system, in which y⁽ʲ⁾, the j-th column of Y, is regressed on the preceding columns y⁽¹⁾, …, y⁽ʲ⁻¹⁾.
• The regression coefficients are assumed to have horseshoe priors, and posterior means were computed using MCMC; a structural sketch of the triangular system follows below.
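A structural sketch (mine) of the triangular system on simulated stand-in data, with ordinary least squares as a placeholder where the authors use horseshoe posterior means:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy stand-in for the Vanguard returns matrix (n = 86, p = 59 in the paper).
n, p = 86, 59
Y = rng.normal(size=(n, p))

# Triangular system: regress column j on columns 0..j-1. Plain least squares is a
# placeholder for the horseshoe-prior posterior means computed by MCMC in the paper.
B = np.zeros((p, p))            # strictly lower-triangular coefficient matrix
d = np.zeros(p)                 # residual variances
d[0] = Y[:, 0].var()
for j in range(1, p):
    X = Y[:, :j]
    beta, *_ = np.linalg.lstsq(X, Y[:, j], rcond=None)
    B[j, :j] = beta
    d[j] = np.mean((Y[:, j] - X @ beta) ** 2)

# Map the regressions back to the covariance: (I - B) Sigma (I - B)^T = diag(d),
# so Sigma^{-1} = (I - B)^T diag(1/d) (I - B); sparse rows of B give a sparse inverse.
T = np.eye(p) - B
Sigma_inv = T.T @ np.diag(1.0 / d) @ T
Sigma = np.linalg.inv(Sigma_inv)
print(Sigma.shape)
```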