Bayesian generalized linear models and an appropriate default prior
Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su
Columbia University
14 August 2008
Logistic regression
[Figure: the curve y = logit−1(x) for x from −6 to 6, rising from 0 to 1; its slope at the center is 1/4.]
A clean example
[Figure: binary data y plotted against x with the fitted curve, estimated Pr(y=1) = logit−1(−1.40 + 0.33 x); the slope at the center is 0.33/4.]
The problem of separation
[Figure: binary data y plotted against x with the 0s and 1s completely separated; slope = infinity?]
Separation is no joke!
    glm(vote ~ female + black + income, family=binomial(link="logit"))

                    1960                  1968
                coef.est  coef.se     coef.est  coef.se
    (Intercept)   -0.14     0.23         0.47     0.24
    female         0.24     0.14        -0.01     0.15
    black         -1.03     0.36        -3.64     0.59
    income         0.03     0.06        -0.03     0.07

                    1964                  1972
                coef.est  coef.se     coef.est  coef.se
    (Intercept)   -1.15     0.22         0.67     0.18
    female        -0.09     0.14        -0.25     0.12
    black        -16.83   420.40        -2.63     0.27
    income         0.19     0.06         0.09     0.05
bayesglm()
- Bayesian logistic regression
- In the arm (Applied Regression and Multilevel modeling) package
- Replaces glm(); estimates are more numerically and computationally stable (a minimal call is sketched below)
- Student-t prior distributions for regression coefs
- Uses an EM-like algorithm
- We went inside glm.fit to augment the iteratively weighted least squares step
- Default choices for tuning parameters (we'll get back to this!)
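A minimal sketch of the drop-in usage described above, assuming the arm package is available; nes1964 is a hypothetical data frame standing in for the data behind the 1964 column on the previous slide.

    library(arm)                      # provides bayesglm() and display()
    fit <- bayesglm(vote ~ female + black + income,
                    family = binomial(link = "logit"), data = nes1964)
    display(fit)                      # the 'black' coefficient no longer blows up under separation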
Regularization in action!
What else is out there?
- glm (maximum likelihood): fails under separation, gives noisy answers for sparse data
- Augment with prior "successes" and "failures": doesn't work well for multiple predictors
- brlr (Jeffreys-like prior distribution): computationally unstable
- brglm (improvement on brlr): doesn't do enough smoothing
- BBR (Laplace prior distribution): OK, not quite as good as bayesglm
- Non-Bayesian machine learning algorithms: understate uncertainty in predictions
Weakly informative priors
Information in prior distributions
- Informative prior dist
  - A full generative model for the data
- Noninformative prior dist
  - Let the data speak
  - Goal: valid inference for any θ
- Weakly informative prior dist
  - Purposely include less information than we actually have
  - Goal: regularization, stabilization
Weakly informative priors for logistic regression coefficients
- Separation in logistic regression
- Some prior info: logistic regression coefs are almost always between −5 and 5:
  - 5 on the logit scale takes you from 0.01 to 0.50, or from 0.50 to 0.99
  - Smoking and lung cancer
- Independent Cauchy prior dists with center 0 and scale 2.5
- Rescale each predictor to have mean 0 and sd 1/2
- Fast implementation using EM; easy adaptation of glm (a sketch of the default settings follows)
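A sketch of these defaults written out explicitly rather than relying on bayesglm's internal scaling; x is a hypothetical continuous predictor and y a hypothetical binary outcome.

    library(arm)
    z <- (x - mean(x)) / (2 * sd(x))   # rescale to mean 0 and sd 1/2
    fit <- bayesglm(y ~ z, family = binomial(link = "logit"),
                    prior.mean = 0, prior.scale = 2.5, prior.df = 1)   # Cauchy(0, 2.5) on the coef
    display(fit)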
Prior distributions
[Figure: prior distributions for a coefficient θ, plotted over the range −10 to 10.]
Another example

     Dose    #deaths/#animals
    −0.86          0/5
    −0.30          1/5
    −0.05          3/5
     0.73          5/5

- Slope of a logistic regression of Pr(death) on dose (fit in the sketch below):
  - Maximum likelihood est is 7.8 ± 4.9
  - With weakly-informative prior: Bayes est is 4.4 ± 1.9
- Which is truly conservative?
- The sociology of shrinkage
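The bioassay data in the table above can be fit both ways with a few lines of R; the estimates quoted on the slide (roughly 7.8 ± 4.9 for maximum likelihood, 4.4 ± 1.9 with the weakly informative default) are what these two calls should roughly reproduce.

    library(arm)
    dose   <- c(-0.86, -0.30, -0.05, 0.73)
    deaths <- c(0, 1, 3, 5)
    n      <- rep(5, 4)
    fit_ml    <- glm(cbind(deaths, n - deaths) ~ dose, family = binomial(link = "logit"))
    fit_bayes <- bayesglm(cbind(deaths, n - deaths) ~ dose, family = binomial(link = "logit"))
    display(fit_ml)     # maximum-likelihood slope, large standard error
    display(fit_bayes)  # shrunk toward zero by the Cauchy(0, 2.5) default prior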
Maximum likelihood and Bayesian estimates
[Figure: probability of death vs. dose, showing the fitted curves from glm (maximum likelihood) and bayesglm.]
Conservatism of Bayesian inference
- Problems with maximum likelihood when data show separation:
  - Coefficient estimate of −∞
  - Estimated predictive probability of 0 for new cases
- Is this conservative?
- Not if evaluated by log score or predictive log-likelihood (illustrated below)
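A quick numerical illustration of the point above: under the log score, a prediction of probability 0 for an event that then occurs incurs an infinite penalty, while a regularized prediction pays only a modest cost. The log_score helper is just for this illustration.

    log_score <- function(p, y) sum(y * log(p) + (1 - y) * log(1 - p))
    log_score(p = 0.05, y = 1)   # regularized prediction: finite penalty
    log_score(p = 1e-8, y = 1)   # prediction pushed toward the boundary: huge penalty
    log_score(p = 0,    y = 1)   # separation-style prediction of exactly 0: -Inf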
Which one is conservative?
[Figure: the same dose-response data, probability of death vs. dose, with the glm and bayesglm fitted curves.]
Prior as population distribution
- Consider many possible datasets
- The "true prior" is the distribution of β's across these datasets
- Fit one dataset at a time
- A "weakly informative prior" has less information (wider variance) than the true prior
- Open question: How to formalize the tradeoffs from using different priors?
Evaluation using a corpus of datasets
- Compare classical glm to Bayesian estimates using various prior distributions
- Evaluate using 5-fold cross-validation and average predictive error (a sketch of the procedure follows)
- The optimal prior distribution for β's is (approx) Cauchy(0, 1)
- Our Cauchy(0, 2.5) prior distribution is weakly informative!
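A rough sketch of the cross-validation described above for a single dataset; dat is a hypothetical data frame with a binary outcome y, standing in for one member of the corpus, which is not reproduced here.

    library(arm)
    cv_logscore <- function(dat, prior.scale, K = 5) {
      fold <- sample(rep(1:K, length.out = nrow(dat)))
      loss <- 0
      for (k in 1:K) {
        fit <- bayesglm(y ~ ., data = dat[fold != k, ],
                        family = binomial(link = "logit"),
                        prior.scale = prior.scale, prior.df = 1)   # Cauchy prior, given scale
        p <- predict(fit, newdata = dat[fold == k, ], type = "response")
        y <- dat$y[fold == k]
        loss <- loss - sum(y * log(p) + (1 - y) * log(1 - p))      # -log test likelihood
      }
      loss / nrow(dat)
    }
    # Average over a corpus of datasets and a grid of prior scales, e.g.:
    # sapply(c(0.5, 1, 2.5, 5), function(s) cv_logscore(dat, s))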
Expected predictive loss, avg over a corpus of datasets
[Figure: −log test likelihood vs. scale of prior, averaged over the corpus: curves for t priors with df = 0.5, 1, 2, 4, and 8, compared with GLM (1.79) and with BBR under Laplace, BBR(l), and Gaussian, BBR(g), priors.]
Priors for other regression models
- Probit
- Ordered logit/probit
- Poisson
- Linear regression with normal errors (example calls for these are sketched below)
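Hedged example calls showing how the same default carries over to other models, using functions from the arm package; the data frame dat and its variables are hypothetical.

    library(arm)
    fit_probit  <- bayesglm(y ~ x1 + x2, family = binomial(link = "probit"), data = dat)
    fit_ordered <- bayespolr(y_ordered ~ x1 + x2, data = dat)   # y_ordered: an ordered factor
    fit_poisson <- bayesglm(count ~ x1 + x2, family = poisson, data = dat)
    fit_normal  <- bayesglm(y_cont ~ x1 + x2, family = gaussian, data = dat)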
Other examples of weakly informative priors
- Variance parameters
- Covariance matrices
- Population variation in a physiological model
- Mixture models
- Intentional underpooling in hierarchical models
Conclusions
- "Noninformative priors" are actually weakly informative
- "Weakly informative" is a more general and useful concept
  - Regularization
  - Better inferences
  - Stability of computation (bayesglm)
- Why use weakly informative priors rather than informative priors?
  - Conformity with statistical culture ("conservatism")
  - Labor-saving device
  - Robustness
Extra stuff
Weakly informative priors for variance parameters
- Basic hierarchical model
- Traditional inverse-gamma(0.001, 0.001) prior can be highly informative (in a bad way)!
- Noninformative uniform prior works better
- But if the number of groups is small (J = 2, 3, even 5), a weakly informative prior helps by shutting down huge values of τ (the two priors are compared below)
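A small comparison of the two priors mentioned above, as implied densities on the group-level standard deviation σ; the half-Cauchy(25) choice is the one used in the 3-schools figure below, and dinvgamma is a throwaway helper for this sketch, not a library function.

    dinvgamma <- function(x, a, b) exp(a * log(b) - lgamma(a) - (a + 1) * log(x) - b / x)
    sigma <- seq(0.01, 50, length.out = 1000)
    d_ig  <- 2 * sigma * dinvgamma(sigma^2, 0.001, 0.001)   # sigma^2 ~ inv-gamma(0.001, 0.001)
    d_hc  <- 2 * dcauchy(sigma, location = 0, scale = 25)   # sigma   ~ half-Cauchy(25)
    plot(sigma, d_hc, type = "l", xlab = expression(sigma), ylab = "prior density")
    lines(sigma, d_ig, lty = 2)   # sharply peaked near zero, far from flat, unlike the half-Cauchy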
Priors for variance parameter: J = 8 groups
[Figure: 8 schools example: posterior for σα under a uniform prior on σα, an inverse-gamma(1, 1) prior on σα², and an inverse-gamma(0.001, 0.001) prior on σα², each plotted over 0 to 30.]
Priors for variance parameter: J = 3 groups
[Figure: 3 schools example: posterior for σα under a uniform prior on σα and under a half-Cauchy(25) prior on σα, each plotted over 0 to 200.]
Weakly informative priors for covariance matrices
- Inverse-Wishart has problems
- Correlations can be between −1 and 1
- Set up models so prior expectation of correlations is 0
- Goal: to be weakly informative about correlations and variances
- Scaled inverse-Wishart model uses redundant parameterization (a rough sketch follows)
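A rough sketch of the redundant-parameterization idea, under assumptions of my own for the pieces the slide does not specify (the Wishart degrees of freedom, the identity scale matrix, and a lognormal draw for the extra scale factors); an illustration, not the exact model from the talk.

    d <- 3
    riwish <- function(df, S) solve(rWishart(1, df, solve(S))[, , 1])   # one inverse-Wishart draw
    Q     <- riwish(d + 1, diag(d))     # unscaled covariance; supplies the correlations
    xi    <- exp(rnorm(d, 0, 1))        # redundant per-dimension scale factors
    Sigma <- diag(xi) %*% Q %*% diag(xi)
    cov2cor(Sigma)                      # correlations come from Q; variances get extra flexibility from xi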
Weakly informative priors for population variation in a physiological model
- Pharmacokinetic parameters such as the "Michaelis-Menten coefficient"
- Wide uncertainty: prior guess for θ is 15 with a factor of 100 of uncertainty, log θ ∼ N(log(15), log(10)²)
- Population model: data on several people j, log θ_j ∼ N(log(15), log(10)²) ????
- Hierarchical prior distribution:
  - log θ_j ∼ N(µ, σ²), σ ≈ log(2)
  - µ ∼ N(log(15), log(10)²)
- Weakly informative (a small simulation follows)
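A small simulation of the hierarchical prior written out above, using the values from the slide: the population location is uncertain by orders of magnitude, while individuals differ from each other only by about a factor of 2.

    set.seed(1)
    J <- 5
    mu        <- rnorm(1, mean = log(15), sd = log(10))   # population location, very uncertain
    log_theta <- rnorm(J, mean = mu, sd = log(2))          # person-level coefficients, tightly clustered
    round(exp(log_theta), 2)                               # theta_j on the original scale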
Weakly informative priors for mixture models
I Well-known problem of fitting the mixture model likelihood
I The maximum likelihood fits are weird, with a single point taking half the mixture
I Bayes with flat prior is just as bad
I These solutions don't "look" like mixtures
I There must be additional prior information, or, to put it another way, regularization
I Simple constraints, for example, a prior distribution on the variance ratio (see the sketch below)
I Weakly informative
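One way to read "a prior distribution on the variance ratio" is as a penalty on the log ratio of the component scales added to the mixture log-likelihood. The R sketch below is an illustration of that idea, not code from the talk: the simulated data, starting values, and prior scale log(3) are all assumptions.

# Penalized fit of a two-component normal mixture; a weakly informative prior
# on the log scale ratio keeps one component from collapsing onto a single point.
set.seed(1)
y <- c(rnorm(50, 0, 1), rnorm(50, 4, 1))        # simulated data (illustrative)

mix_logpost <- function(par, y, prior_sd = log(3)) {   # prior_sd is an assumed scale
  p  <- plogis(par[1])                          # mixing proportion
  mu <- par[2:3]                                # component means
  s  <- exp(par[4:5])                           # component sds
  loglik <- sum(log(p * dnorm(y, mu[1], s[1]) + (1 - p) * dnorm(y, mu[2], s[2])))
  loglik + dnorm(log(s[1] / s[2]), 0, prior_sd, log = TRUE)  # prior on log sd ratio
}

fit <- optim(c(0, 0, 4, 0, 0), mix_logpost, y = y,
             method = "BFGS", control = list(fnscale = -1))  # maximize
exp(fit$par[4:5])                               # regularized sd estimates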
Intentional underpooling in hierarchical models
I Basic hierarchical model:
  I Data yj on parameters θj
  I Group-level model θj ∼ N(µ, τ²)
  I No-pooling estimate θ̂j = yj
I Bayesian partial-pooling estimate E(θj | y)
I Weak Bayes estimate: same as Bayes, but replacing τ with 2τ (the three estimates are compared in the sketch below)
I An example of the “incompatible Gibbs” algorithm
I Why would we do this??
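In the basic normal-normal model all three estimates have closed forms, so they can be compared directly. In the R sketch below the data, standard errors, and plug-in values of µ and τ are illustrative assumptions, not from the talk.

# No-pooling, partial-pooling, and "weak Bayes" (tau replaced by 2*tau) estimates
# in the normal-normal hierarchical model; all numbers are illustrative.
y   <- c(28, 8, -3, 7, -1, 1, 18, 12)     # group-level estimates y_j
se  <- c(15, 10, 16, 11, 9, 11, 10, 18)   # their standard errors
mu  <- mean(y)                            # plug-in group-level mean (assumed known)
tau <- 5                                  # plug-in group-level sd (assumed known)

partial_pool <- function(y, se, mu, tau)  # posterior mean E(theta_j | y)
  (y / se^2 + mu / tau^2) / (1 / se^2 + 1 / tau^2)

no_pool <- y                              # theta_hat_j = y_j
bayes   <- partial_pool(y, se, mu, tau)
weak    <- partial_pool(y, se, mu, 2 * tau)   # weak Bayes: less shrinkage toward mu
round(cbind(no_pool, bayes, weak), 1)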