TRANSCRIPT
Raymond J. Carroll
Texas A&M University and University of Technology Sydney
http://stat.tamu.edu/~carroll
Bayesian Methods for Density and Regression Deconvolution
Co-Authors
Bani Mallick Abhra Sarkar
John Staudenmayer Debdeep Pati
Longtime Collaborators in Deconvolution
Peter Hall Aurore Delaigle
Len Stefanski
Overview
• My main application interest is in nutrition
• Nutritional intake is necessarily multivariate
• Smart nutritionists have recognized that in cancers, it is the patterns of nutrition that matter, not single causes such as saturated fat
• To affect public health practice, nutritionists have developed scores that characterize how well one eats
• Healthy Eating Index, Dash score, Mediterranean score, etc.
Overview
• One day of French fries/Chips will not kill you
• It is your long-term average pattern that is important
• In population public health science, long term averages cannot be measured
• The best you can get is some version of self-report, e.g., multiple 24 hour recalls
• This fact has been the driver behind much of measurement error modeling, especially including density deconvolution
Overview
• Analysis is complicated by the fact that on a given day, people will not consume certain foods, e.g., whole grains, legumes, etc.
• My long term goal has been to develop methods that take into account measurement error, the multivariate nature of nutrition, and excess zeros.
Why it Matters
• What % of U.S. kids have alarmingly bad diets?
• Ignore measurement error, 28%
• Account for it, 8%
• What are the relative rates of colon cancer for those with a HEI score of 70 versus those with 40?
• Ignore measurement error, decrease 10%
• Account for it, decrease 35%
Overview
• We have perfectly serviceable and practical methods that involve transformations, random effects, latent variables and measurement errors
• The methods are widely and internationally used in nutritional surveillance and nutritional epidemiology
• For the multivariate case, computation is “Bayesian”
• Eventually though, anything random is assumed to be Gaussian
• Can we not do better?
Background
• In the classical measurement error – deconvolution problem, there is a variable, X, that is not observable
• Instead, a proxy for it, W, is observed
• In the density problem, the goal is to estimate the density of X using only observations on W
• Also, in population science contexts, the distribution of X given covariates Z is important (very small literature on this)
Background
• In the regression problem, there is a response Y
• One goal is to estimate E(Y | X)
• Another goal is to estimate the distribution of Y given X, because variances are not always nuisance parameters
Background
• In the classic problem, W = X + U, with U independent of X.
• Deconvoluting kernel methods that result in consistent estimation of the density of X were discovered in 1988 (Stefanski, Hall, Fan, and Carroll)
• They are kernel density estimates with kernel function K_decon(x)
Background
• In the classic problem, W = X + U, with U independent of X.
• The deconvoluting kernel is a corrected score for an ordinary kernel density function, with the property that for a bandwidth h,
E[ K_decon{(W - x_0)/h} | X ] = K{(X - x_0)/h}
• Lots of results on rates of convergence, etc.
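For some error laws the deconvoluting kernel has a closed form. As a minimal sketch (my own illustration, not code from the talk): assuming Laplace(0, b) measurement error with b known and a Gaussian base kernel, the deconvoluting kernel is K_decon(z) = φ(z) - (b/h)² φ''(z), where φ''(z) = (z² - 1) φ(z).

```python
import numpy as np

def decon_kde_laplace(x_grid, W, h, b):
    """Deconvoluting KDE of the density of X from W = X + U,
    U ~ Laplace(0, b), with a Gaussian base kernel.  For this error
    law the deconvoluting kernel is
        K_decon(z) = phi(z) - (b/h)^2 * phi''(z),
    with phi''(z) = (z^2 - 1) phi(z)."""
    z = (x_grid[:, None] - W[None, :]) / h
    phi = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    K_decon = phi - (b / h) ** 2 * (z**2 - 1) * phi
    return K_decon.mean(axis=1) / h

rng = np.random.default_rng(1)
n, b, h = 2000, 0.4, 0.35            # illustrative sample size, error scale, bandwidth
X = rng.normal(0.0, 1.0, n)          # latent truth
W = X + rng.laplace(0.0, b, n)       # contaminated observations
grid = np.linspace(-4, 4, 401)
f_hat = decon_kde_laplace(grid, W, h, b)
mass = f_hat.sum() * (grid[1] - grid[0])   # should be close to 1
```

The correction term integrates to zero, so the estimate still integrates to (approximately) 1, but unlike an ordinary KDE it can dip negative in the tails.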
Background
• There is an R package called decon
• However, a paper to appear by A. Delaigle discusses problems with the package’s bandwidth selectors
• Her web site has Matlab code for cases where the measurement error is independent of X, including bandwidth selection
Problem Considered Here
• Here is a general class of models. First, W and X:
• The W's are independent given X
W_ij = X_i + U_ij(X_i)
E{ U_ij(X_i) | X_i } = 0
var{ U_ij(X_i) | X_i } = s_u^2(X_i)
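A quick way to see what this model family generates: the sketch below (my own illustration; the variance function is an arbitrary choice, not one from the talk) simulates Gaussian U with conditional variance s_u²(X) and checks the two moment restrictions on a slice of subjects with X near 1.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 6000, 3                       # subjects, replicates per subject

def s2_u(x):
    # illustrative heteroscedastic variance function (my choice, not from the talk)
    return 0.3 + 0.2 * x**2

X = rng.normal(0.0, 1.0, n)                              # latent truth
U = np.sqrt(s2_u(X))[:, None] * rng.normal(0.0, 1.0, (n, m))
W = X[:, None] + U                                       # W_ij = X_i + U_ij(X_i)

# check E{U|X} = 0 and var{U|X} = s_u^2(X) on subjects with X near 1
idx = np.abs(X - 1.0) < 0.1
slice_err = (W[idx] - X[idx, None]).ravel()
```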
Background
• There is a substantial econometric literature on technical conditions for identification in many different contexts (S. Schennach, X. Chen, Y. Hu)
• The problem I have stated is known to be nonparametrically identified if there are 3 replicates (and certain technical completeness assumptions hold)
Problem Considered Here
• Here is a general class of models. First, Y:
• The classical heteroscedastic model where the variance is important
• Identified if there are 2 replicate W's
Y_i = g(X_i) + ε_i(X_i)
E{ ε_i(X_i) | X_i } = 0
var{ ε_i(X_i) | X_i } = s_ε^2(X_i)
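As a small illustration of why the variance is not a nuisance parameter here, the sketch below (my own choices of g and s_ε², not from the talk) simulates the model and reads off both the conditional mean and the conditional variance on a slice of subjects near x = 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8000

def g(x):
    # illustrative mean function (my choice, not from the talk)
    return np.sin(x)

def s2_eps(x):
    # illustrative variance function (my choice, not from the talk)
    return 0.1 + 0.1 * x**2

X = rng.uniform(-2, 2, n)
Y = g(X) + np.sqrt(s2_eps(X)) * rng.normal(0.0, 1.0, n)

# both E(Y|X) and var(Y|X) carry information; read them off near x = 1
idx = np.abs(X - 1.0) < 0.1
m_hat, v_hat = Y[idx].mean(), Y[idx].var()
```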
Background
• The econometric literature invariably uses sieves with orthogonal basis functions
• The theory follows X. Shen’s 1997 paper
Background
• In practice, as with non-penalized splines, 5-7 basis functions are used to represent all densities and functions
• Constraints (such as being positive and integrating to 1 for densities) are often ignored
• In the problem I eventually want to solve, the dimension of the two densities = 19 (latent stuff all around)
• Maybe use multivariate Hermite series?
Problem Considered Here
• There is no deconvoluting kernel method that does density or regression deconvolution in the context that the distribution of the measurement error depends on X
Problem Considered Here
• It seems to me that there are two ways to handle this problem in general
• Sieves: be an econometrician
• Bayesian with flexible models
• Our methodology is explicitly Bayesian, but borrows basis function ideas from the sieve approach
Model Formulation
• We borrow from Hu and Schennach’s example and also Staudenmayer, Ruppert and Buonaccorsi
• Here, U is assumed independent of X
• Also, e is independent of X
W_ij = X_i + s_u^{1/2}(X_i) U_ij
Y_i = g(X_i) + s_ε^{1/2}(X_i) ε_i
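A B-spline variance function with positive coefficients can be sketched with a hand-rolled Cox–de Boor basis (the knots, degree, and coefficient values below are illustrative choices, not from the talk); exponentiating unconstrained coefficients is one standard way to keep the B-spline coefficients, and hence the fitted variance function, strictly positive.

```python
import numpy as np

def bspline_basis(x, t, k):
    """All degree-k B-spline basis functions at points x, by the
    Cox-de Boor recursion.  t is a clamped knot vector; the basis is
    valid for x in [t[k], t[-k-1])."""
    x = np.asarray(x)
    B = np.zeros((x.size, len(t) - 1))
    for i in range(len(t) - 1):                      # degree-0 indicators
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    for d in range(1, k + 1):                        # raise the degree
        Bn = np.zeros((x.size, len(t) - d - 1))
        for i in range(len(t) - d - 1):
            if t[i + d] > t[i]:
                Bn[:, i] += (x - t[i]) / (t[i + d] - t[i]) * B[:, i]
            if t[i + d + 1] > t[i + 1]:
                Bn[:, i] += (t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * B[:, i + 1]
        B = Bn
    return B

# quadratic splines on [0, 1]; knot placement and coefficients are my choices
k = 2
t = np.r_[[0, 0, 0], [0.25, 0.5, 0.75], [1, 1, 1]]
x = np.linspace(0.0, 0.999, 200)
B = bspline_basis(x, t, k)

# exp() keeps the spline coefficients, and so s_u^2(x), strictly positive
beta = np.array([-0.5, 0.3, 1.0, 0.2, -1.0, 0.4])    # illustrative values
s2_u = B @ np.exp(beta)
```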
Model Formulation
• Our model is
W_ij = X_i + s_u^{1/2}(X_i) U_ij
Y_i = g(X_i) + s_ε^{1/2}(X_i) ε_i
• Like previous authors, we model s_ε(X_i) and s_u(X_i) as B-splines with positive coefficients
• We model g(X_i) as a B-spline
• As frequentists, we could model the densities of X, U, and ε by sieves, and appeal to Hu and Schennach for theory
• We have not investigated this
Model Formulation
• Our model is
W_ij = X_i + s_u^{1/2}(X_i) U_ij
Y_i = g(X_i) + s_ε^{1/2}(X_i) ε_i
• As Bayesians, we have modeled the densities of X, U, and ε by DPMM
• We have found that a mixture of normals with an unknown number of components is much faster, just as effective, and very stable numerically
Model Formulation
• We found that fixing the number of components at a largish number works best
• The method concentrates on a lower number of components (Rousseau and Mengersen found this in a non-measurement error context)
• There are lots of issues involved: (a) starting values; (b) hyper-parameters; (c) MH candidates; (d) constraints (e.g., zero means); (e) data standardization, etc.
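The full Bayesian fit is beyond a slide, but the "fixed largish number of components" idea can be illustrated by its simplest frequentist stand-in: plain EM for an overfitted K = 6 normal mixture on two-component data (entirely my own sketch; the talk's method is MCMC with priors, under which the extra components get emptied out, as in Rousseau and Mengersen).

```python
import numpy as np

def em_normal_mixture(y, K, n_iter=100, seed=0):
    """Plain EM for a K-component univariate normal mixture.
    A frequentist stand-in for the talk's overfitted Bayesian mixture."""
    rng = np.random.default_rng(seed)
    n = y.size
    w = np.full(K, 1.0 / K)
    mu = rng.choice(y, K, replace=False).astype(float)
    sig2 = np.full(K, y.var())
    loglik = []
    for _ in range(n_iter):
        # E-step: responsibilities r_ik
        dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / sig2) / np.sqrt(2 * np.pi * sig2)
        num = w * dens
        tot = num.sum(axis=1, keepdims=True)
        loglik.append(np.log(tot).sum())
        r = num / tot
        # M-step: weighted moments
        nk = r.sum(axis=0) + 1e-12
        w = nk / n
        mu = (r * y[:, None]).sum(axis=0) / nk
        sig2 = (r * (y[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-8
    return w, mu, sig2, np.array(loglik)

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(-2, 0.7, 600), rng.normal(2, 0.7, 400)])
w, mu, sig2, ll = em_normal_mixture(y, K=6)
```

EM guarantees a non-decreasing log-likelihood, which is a convenient sanity check on the implementation.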
Model Formulation
• Here is a simulation example of density deconvolution under homoscedasticity, with a mixture of normals for X and a Laplace for U
• The settings come from a paper not by us
• There are 3 replicates, so the density of U is also estimated by our method (we let DKDE know the truth)
• I ran our R code as is, with no fine tuning
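The flavor of such a simulation (the mixture components and Laplace scale below are my own illustrative choices, not the cited paper's settings): with 3 replicates, within-person differences identify the error variance, while naive analyses of W overstate the spread of X.

```python
import numpy as np

rng = np.random.default_rng(11)
n, m = 4000, 3                        # subjects, replicates (3 identify the error law)

# illustrative two-component normal mixture for X (not the paper's settings)
X = np.where(rng.random(n) < 0.5,
             rng.normal(-1.5, 0.5, n),
             rng.normal(1.5, 0.5, n))         # var(X) = 0.25 + 1.5^2 = 2.5

b = 0.5                               # Laplace scale; var(U) = 2 b^2 = 0.5
W = X[:, None] + rng.laplace(0.0, b, (n, m))
Wbar = W.mean(axis=1)

# within-person differences identify var(U) without seeing X,
# while naive spread estimates from W are inflated by the error
var_u_hat = ((W[:, 0] - W[:, 1]) ** 2).mean() / 2
```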
Model Formulation
Model Formulation
• Here is another example
• Y = sodium intake as measured by a food frequency questionnaire (known to be biased)
• W = same thing, but measured by a 24 hour recall (known to be almost unbiased)
• We have R code for this
Model Formulation
The dashed line is the Y=X line, indicating the bias of the FFQ
Multivariate Deconvolution
• There are also multivariate problems of density deconvolution
• We have found 4 papers about this
• 3 deconvoluting kernel papers, all assume the density of the measurement errors is known
• 1 of those papers has a bandwidth selector
• Bovy et al (2011, AoAS) model X as a mixture of normals, and assume U is independent of X and Gaussian with known covariance matrix. They use an EM algorithm.
Multivariate Deconvolution
• We have generalized our 1-dimensional deconvolution approach as
W_ijk = X_ij + s_uj^{1/2}(X_ij) U_ijk
• Again, X is a mixture of multivariate normals, as is U
• However, standard multivariate inverse Wishart computations fail miserably
Multivariate Deconvolution
• We have generalized our 1-dimensional deconvolution approach as
W_ijk = X_ij + s_uj^{1/2}(X_ij) U_ijk
• We use a factor analytic representation of the component-specific covariance matrices with sparsity-inducing shrinkage priors on the factor loading matrices (A. Bhattacharya and D. Dunson)
• This is crucial in flexibly lowering the dimension of the covariance matrices
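A sketch of the factor-analytic idea (the number of factors and the hyper-parameter values are my own illustrative choices): with p = 19 and q = 3 factors, Σ = ΛΛᵀ + diag(ψ) has p·q + p = 76 free parameters instead of the p(p+1)/2 = 190 of an unconstrained covariance, and a multiplicative-gamma prior draw in the style of Bhattacharya and Dunson shrinks the later columns of Λ.

```python
import numpy as np

rng = np.random.default_rng(5)
p, q = 19, 3                   # p = nutrition dimension from the talk; q is my choice
nu, a1, a2 = 3.0, 2.0, 3.0     # illustrative shrinkage hyper-parameters

# multiplicative gamma process: precisions tau_h increase with the column index h,
# so later factor-loading columns are shrunk toward zero
delta = np.concatenate([rng.gamma(a1, 1.0, 1), rng.gamma(a2, 1.0, q - 1)])
tau = np.cumprod(delta)
phi = rng.gamma(nu / 2, 2 / nu, (p, q))            # local precisions
Lam = rng.normal(0.0, 1.0, (p, q)) / np.sqrt(phi * tau)

psi = rng.gamma(2.0, 0.5, p)                       # diagonal (unique) variances
Sigma = Lam @ Lam.T + np.diag(psi)                 # component covariance, always PSD

n_free = p * q + p                                 # 76, versus p*(p+1)//2 = 190
```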
Multivariate Deconvolution
Multivariate inverse Wisharts on top, Latent factor model on bottom
Blue = MIW, green = MLFA.
Variables are (a) carbs; (b) fiber; (c) protein and (d) potassium
Conclusion
• I still want to get to my problem of multiple nutrients/foods, excess zeros and measurement error
• Dimension reduction and flexible models seem a practical way to go
• Final point: for health risk estimation and nutritional surveillance, only a 1-dimensional summary is needed, hence better rates of convergence