TRANSCRIPT
Raymond J. Carroll
Texas A&M University and University of Technology Sydney
http://stat.tamu.edu/~carroll
Bayesian Methods for Density and Regression Deconvolution
Co-Authors
Bani Mallick Abhra Sarkar
John Staudenmayer Debdeep Pati
Longtime Collaborators in Deconvolution
Peter Hall Aurore Delaigle
Len Stefanski
Overview
• My main application interest is in nutrition
• Nutritional intake is necessarily multivariate
• Smart nutritionists have recognized that in cancers, it is the patterns of nutrition that matter, not single causes such as saturated fat
• To affect public health practice, nutritionists have developed scores that characterize how well one eats
• Healthy Eating Index, Dash score, Mediterranean score, etc.
Overview
• One day of French fries/Chips will not kill you
• It is your long-term average pattern that is important
• In population public health science, long term averages cannot be measured
• The best you can get is some version of self-report, e.g., multiple 24 hour recalls
• This fact has been the driver behind much of measurement error modeling, especially including density deconvolution
Overview
• Analysis is complicated by the fact that on a given day, people will not consume certain foods, e.g., whole grains, legumes, etc.
• My long term goal has been to develop methods that take into account measurement error, the multivariate nature of nutrition, and excess zeros.
Why it Matters
• What % of U.S. kids have alarmingly bad diets?
• Ignore measurement error, 28%
• Account for it, 8%
• What are the relative rates of colon cancer for those with a HEI score of 70 versus those with 40?
• Ignore measurement error, decrease 10%
• Account for it, decrease 35%
Overview
• We have perfectly serviceable and practical methods that involve transformations, random effects, latent variables and measurement errors
• The methods are widely and internationally used in nutritional surveillance and nutritional epidemiology
• For the multivariate case, computation is “Bayesian”
• Eventually though, anything random is assumed to be Gaussian
• Can we not do better?
Background
• In the classical measurement error – deconvolution problem, there is a variable, X, that is not observable
• Instead, a proxy for it, W, is observed
• In the density problem, the goal is to estimate the density of X using only observations on W
• Also, in population science contexts, the distribution of X given covariates Z is important (very small literature on this)
Background
• In the regression problem, there is a response Y
• One goal is to estimate E(Y | X)
• Another goal is to estimate the distribution of Y given X, because variances are not always nuisance parameters
Background
• In the classic problem, W = X + U, with U independent of X.
• Deconvoluting kernel methods that result in consistent estimation of the density of X were discovered in 1988 (Stefanski, Hall, Fan, and Carroll)
• They are kernel density estimates with kernel function K_decon(x)
Background
• In the classic problem, W = X + U, with U independent of X.
• The deconvoluting kernel is a corrected score for an ordinary kernel density function, with the property that for a bandwidth h,
E[ K_decon{(W - x_0)/h} | X ] = K{(X - x_0)/h}
• Lots of results on rates of convergence, etc.
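For some error laws the deconvoluting kernel has a closed form. As a minimal sketch (my own illustration, not code from the talk): assuming Laplace(0, b) measurement error with b known and a Gaussian base kernel, the deconvoluting kernel is K_decon(z) = φ(z) - (b/h)² φ''(z), where φ''(z) = (z² - 1) φ(z).

```python
import numpy as np

def decon_kde_laplace(x_grid, W, h, b):
    """Deconvoluting KDE of the density of X from W = X + U,
    U ~ Laplace(0, b), with a Gaussian base kernel.  For this error
    law the deconvoluting kernel is
        K_decon(z) = phi(z) - (b/h)^2 * phi''(z),
    with phi''(z) = (z^2 - 1) phi(z)."""
    z = (x_grid[:, None] - W[None, :]) / h
    phi = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    K_decon = phi - (b / h) ** 2 * (z**2 - 1) * phi
    return K_decon.mean(axis=1) / h

rng = np.random.default_rng(1)
n, b, h = 2000, 0.4, 0.35            # illustrative sample size, error scale, bandwidth
X = rng.normal(0.0, 1.0, n)          # latent truth
W = X + rng.laplace(0.0, b, n)       # contaminated observations
grid = np.linspace(-4, 4, 401)
f_hat = decon_kde_laplace(grid, W, h, b)
mass = f_hat.sum() * (grid[1] - grid[0])   # should be close to 1
```

The correction term integrates to zero, so the estimate still integrates to (approximately) 1, but unlike an ordinary KDE it can dip negative in the tails.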
Background
• There is an R package called decon
• However, a paper to appear by A. Delaigle discusses problems with the package’s bandwidth selectors
• Her web site has Matlab code for cases where the measurement error is independent of X, including bandwidth selection
Problem Considered Here
• Here is a general class of models. First, W and X:
• The W's are independent given X
W_ij = X_i + U_ij(X_i)
E{ U_ij(X_i) | X_i } = 0
var{ U_ij(X_i) | X_i } = s_u^2(X_i)
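A quick way to see what this model family generates: the sketch below (my own illustration; the variance function is an arbitrary choice, not one from the talk) simulates Gaussian U with conditional variance s_u²(X) and checks the two moment restrictions on a slice of subjects with X near 1.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 6000, 3                       # subjects, replicates per subject

def s2_u(x):
    # illustrative heteroscedastic variance function (my choice, not from the talk)
    return 0.3 + 0.2 * x**2

X = rng.normal(0.0, 1.0, n)                              # latent truth
U = np.sqrt(s2_u(X))[:, None] * rng.normal(0.0, 1.0, (n, m))
W = X[:, None] + U                                       # W_ij = X_i + U_ij(X_i)

# check E{U|X} = 0 and var{U|X} = s_u^2(X) on subjects with X near 1
idx = np.abs(X - 1.0) < 0.1
slice_err = (W[idx] - X[idx, None]).ravel()
```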
Background
• There is a substantial econometric literature on technical conditions for identification in many different contexts (S. Schennach, X. Chen, Y. Hu)
• The problem I have stated is known to be nonparametrically identified if there are 3 replicates (and certain technical completeness assumptions hold)
Problem Considered Here
• Here is a general class of models. First, Y:
• The classical heteroscedastic model where the variance is important
• Identified if there are 2 replicate W's
Y_i = g(X_i) + ε_i(X_i)
E{ ε_i(X_i) | X_i } = 0
var{ ε_i(X_i) | X_i } = s_ε^2(X_i)
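As a small illustration of why the variance is not a nuisance parameter here, the sketch below (my own choices of g and s_ε², not from the talk) simulates the model and reads off both the conditional mean and the conditional variance on a slice of subjects near x = 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8000

def g(x):
    # illustrative mean function (my choice, not from the talk)
    return np.sin(x)

def s2_eps(x):
    # illustrative variance function (my choice, not from the talk)
    return 0.1 + 0.1 * x**2

X = rng.uniform(-2, 2, n)
Y = g(X) + np.sqrt(s2_eps(X)) * rng.normal(0.0, 1.0, n)

# both E(Y|X) and var(Y|X) carry information; read them off near x = 1
idx = np.abs(X - 1.0) < 0.1
m_hat, v_hat = Y[idx].mean(), Y[idx].var()
```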
Background
• The econometric literature invariably uses sieves with orthogonal basis functions
• The theory follows X. Shen’s 1997 paper
Background
• In practice, as with non-penalized splines, 5-7 basis functions are used to represent all densities and functions
• Constraints (such as being positive and integrating to 1 for densities) are often ignored
• In the problem I eventually want to solve, the dimension of the two densities = 19 (latent stuff all around)
• Maybe use multivariate Hermite series?
Problem Considered Here
• There is no deconvoluting kernel method that does density or regression deconvolution in the context that the distribution of the measurement error depends on X
Problem Considered Here
• It seems to me that there are two ways to handle this problem in general
• Sieves: be an econometrician
• Bayesian with flexible models
• Our methodology is explicitly Bayesian, but borrows basis function ideas from the sieve approach
Model Formulation
• We borrow from Hu and Schennach’s example and also Staudenmayer, Ruppert and Buonaccorsi
• Here, U is assumed independent of X
• Also, e is independent of X
W_ij = X_i + s_u^{1/2}(X_i) U_ij
Y_i = g(X_i) + s_ε^{1/2}(X_i) ε_i
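A B-spline variance function with positive coefficients can be sketched with a hand-rolled Cox–de Boor basis (the knots, degree, and coefficient values below are illustrative choices, not from the talk); exponentiating unconstrained coefficients is one standard way to keep the B-spline coefficients, and hence the fitted variance function, strictly positive.

```python
import numpy as np

def bspline_basis(x, t, k):
    """All degree-k B-spline basis functions at points x, by the
    Cox-de Boor recursion.  t is a clamped knot vector; the basis is
    valid for x in [t[k], t[-k-1])."""
    x = np.asarray(x)
    B = np.zeros((x.size, len(t) - 1))
    for i in range(len(t) - 1):                      # degree-0 indicators
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    for d in range(1, k + 1):                        # raise the degree
        Bn = np.zeros((x.size, len(t) - d - 1))
        for i in range(len(t) - d - 1):
            if t[i + d] > t[i]:
                Bn[:, i] += (x - t[i]) / (t[i + d] - t[i]) * B[:, i]
            if t[i + d + 1] > t[i + 1]:
                Bn[:, i] += (t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * B[:, i + 1]
        B = Bn
    return B

# quadratic splines on [0, 1]; knot placement and coefficients are my choices
k = 2
t = np.r_[[0, 0, 0], [0.25, 0.5, 0.75], [1, 1, 1]]
x = np.linspace(0.0, 0.999, 200)
B = bspline_basis(x, t, k)

# exp() keeps the spline coefficients, and so s_u^2(x), strictly positive
beta = np.array([-0.5, 0.3, 1.0, 0.2, -1.0, 0.4])    # illustrative values
s2_u = B @ np.exp(beta)
```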
Model Formulation
• Our model is
W_ij = X_i + s_u^{1/2}(X_i) U_ij
Y_i = g(X_i) + s_ε^{1/2}(X_i) ε_i
• Like previous authors, we model s_ε(X_i) and s_u(X_i) as B-splines with positive coefficients
• We model g(X_i) as a B-spline
• As frequentists, we could model the densities of X, U, and ε by sieves, and appeal to Hu and Schennach for theory
• We have not investigated this
Model Formulation
• Our model is
W_ij = X_i + s_u^{1/2}(X_i) U_ij
Y_i = g(X_i) + s_ε^{1/2}(X_i) ε_i
• As Bayesians, we have modeled the densities of X, U, and ε by DPMM
• We have found that a mixture of normals with an unknown number of components is much faster, just as effective, and very stable numerically
Model Formulation
• We found that fixing the number of components at a largish number works best
• The method concentrates on a lower number of components (Rousseau and Mengersen found this in a non-measurement error context)
• There are lots of issues involved: (a) starting values; (b) hyper-parameters; (c) MH candidates; (d) constraints (e.g., zero means); (e) data standardization, etc.
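The full Bayesian fit is beyond a slide, but the "fixed largish number of components" idea can be illustrated by its simplest frequentist stand-in: plain EM for an overfitted K = 6 normal mixture on two-component data (entirely my own sketch; the talk's method is MCMC with priors, under which the extra components get emptied out, as in Rousseau and Mengersen).

```python
import numpy as np

def em_normal_mixture(y, K, n_iter=100, seed=0):
    """Plain EM for a K-component univariate normal mixture.
    A frequentist stand-in for the talk's overfitted Bayesian mixture."""
    rng = np.random.default_rng(seed)
    n = y.size
    w = np.full(K, 1.0 / K)
    mu = rng.choice(y, K, replace=False).astype(float)
    sig2 = np.full(K, y.var())
    loglik = []
    for _ in range(n_iter):
        # E-step: responsibilities r_ik
        dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / sig2) / np.sqrt(2 * np.pi * sig2)
        num = w * dens
        tot = num.sum(axis=1, keepdims=True)
        loglik.append(np.log(tot).sum())
        r = num / tot
        # M-step: weighted moments
        nk = r.sum(axis=0) + 1e-12
        w = nk / n
        mu = (r * y[:, None]).sum(axis=0) / nk
        sig2 = (r * (y[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-8
    return w, mu, sig2, np.array(loglik)

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(-2, 0.7, 600), rng.normal(2, 0.7, 400)])
w, mu, sig2, ll = em_normal_mixture(y, K=6)
```

EM guarantees a non-decreasing log-likelihood, which is a convenient sanity check on the implementation.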
Model Formulation
• Here is a simulation example of density deconvolution under homoscedasticity, with a mixture of normals for X and a Laplace for U
• The settings come from a paper not by us
• There are 3 replicates, so the density of U is also estimated by our method (we let DKDE know the truth)
• I ran our R code as is, with no fine tuning
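The flavor of such a simulation (the mixture components and Laplace scale below are my own illustrative choices, not the cited paper's settings): with 3 replicates, within-person differences identify the error variance, while naive analyses of W overstate the spread of X.

```python
import numpy as np

rng = np.random.default_rng(11)
n, m = 4000, 3                        # subjects, replicates (3 identify the error law)

# illustrative two-component normal mixture for X (not the paper's settings)
X = np.where(rng.random(n) < 0.5,
             rng.normal(-1.5, 0.5, n),
             rng.normal(1.5, 0.5, n))         # var(X) = 0.25 + 1.5^2 = 2.5

b = 0.5                               # Laplace scale; var(U) = 2 b^2 = 0.5
W = X[:, None] + rng.laplace(0.0, b, (n, m))
Wbar = W.mean(axis=1)

# within-person differences identify var(U) without seeing X,
# while naive spread estimates from W are inflated by the error
var_u_hat = ((W[:, 0] - W[:, 1]) ** 2).mean() / 2
```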
Model Formulation
Model Formulation
• Here is another example
• Y = sodium intake as measured by a food frequency questionnaire (known to be biased)
• W = same thing, but measured by a 24 hour recall (known to be almost unbiased)
• We have R code for this
Model Formulation
The dashed line is the Y=X line, indicating the bias of the FFQ
Multivariate Deconvolution
• There are also multivariate problems of density deconvolution
• We have found 4 papers about this
• 3 deconvoluting kernel papers, all assume the density of the measurement errors is known
• 1 of those papers has a bandwidth selector
• Bovy et al (2011, AoAS) model X as a mixture of normals, and assume U is independent of X and Gaussian with known covariance matrix. They use an EM algorithm.
Multivariate Deconvolution
• We have generalized our 1-dimensional deconvolution approach as
W_ijk = X_ij + s_uj^{1/2}(X_ij) U_ijk
• Again, X is a mixture of multivariate normals, as is U
• However, standard multivariate inverse Wishart computations fail miserably
Multivariate Deconvolution
• We have generalized our 1-dimensional deconvolution approach as
W_ijk = X_ij + s_uj^{1/2}(X_ij) U_ijk
• We use a factor analytic representation of the component-specific covariance matrices with sparsity-inducing shrinkage priors on the factor loading matrices (A. Bhattacharya and D. Dunson)
• This is crucial in flexibly lowering the dimension of the covariance matrices
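A sketch of the factor-analytic idea (the number of factors and the hyper-parameter values are my own illustrative choices): with p = 19 and q = 3 factors, Σ = ΛΛᵀ + diag(ψ) has p·q + p = 76 free parameters instead of the p(p+1)/2 = 190 of an unconstrained covariance, and a multiplicative-gamma prior draw in the style of Bhattacharya and Dunson shrinks the later columns of Λ.

```python
import numpy as np

rng = np.random.default_rng(5)
p, q = 19, 3                   # p = nutrition dimension from the talk; q is my choice
nu, a1, a2 = 3.0, 2.0, 3.0     # illustrative shrinkage hyper-parameters

# multiplicative gamma process: precisions tau_h increase with the column index h,
# so later factor-loading columns are shrunk toward zero
delta = np.concatenate([rng.gamma(a1, 1.0, 1), rng.gamma(a2, 1.0, q - 1)])
tau = np.cumprod(delta)
phi = rng.gamma(nu / 2, 2 / nu, (p, q))            # local precisions
Lam = rng.normal(0.0, 1.0, (p, q)) / np.sqrt(phi * tau)

psi = rng.gamma(2.0, 0.5, p)                       # diagonal (unique) variances
Sigma = Lam @ Lam.T + np.diag(psi)                 # component covariance, always PSD

n_free = p * q + p                                 # 76, versus p*(p+1)//2 = 190
```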
Multivariate Deconvolution
Multivariate inverse Wisharts on top, Latent factor model on bottom
Blue = MIW, green = MLFA.
Variables are (a) carbs; (b) fiber; (c) protein and (d) potassium
Conclusion
• I still want to get to my problem of multiple nutrients/foods, excess zeros and measurement error
• Dimension reduction and flexible models seem a practical way to go
• Final point: for health risk estimation and nutritional surveillance, only a 1-dimensional summary is needed, hence better rates of convergence