Markov-Chain Monte Carlo methods Linear regression
Wanna fit a straight line to your data?
(from old-fashioned least-squares to Markov Chain Monte-Carlo methods)
Javier Gorgas
Multidisciplinary seminar. Dpto Física de la Tierra y Astrofísica, 22/11/18
Outline: Ordinary least squares · Problems with error bars · MCMC · Comparison of methods
“Old-fashioned” least squares
Ordinary linear regression using least squares (OLS) is equivalent to maximum-likelihood estimation.
Do not forget the hypotheses:
- Independence among the y values
- Linearity: there is an actual linear relation between x and y
- The dispersion is only due to errors in the dependent variable y (the x values have no errors)
- Gaussianity of the y errors: for a fixed x, the errors in y follow a normal distribution N(0, σ)
- Homoscedasticity: constant errors (fixed σ)
Plus: to make inference about the correlation, the x values must follow a Gaussian distribution.
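As a minimal illustration (with made-up data; the variable names are ours, not from the talk), an OLS fit in R is a single call to lm(), whose estimates coincide with the maximum-likelihood ones under the hypotheses above:

  # hypothetical data satisfying the OLS hypotheses
  set.seed(42)
  x <- runif(50, 0, 10)                  # error-free independent variable
  y <- 1.0 + 0.5*x + rnorm(50, sd = 1)   # linear relation + Gaussian, homoscedastic errors in y
  fit <- lm(y ~ x)                       # ordinary least squares
  summary(fit)                           # intercept, slope and their standard errors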
[Figure: Y|X and X|Y least-squares lines fitted to the same data set.]
Remember: if you perform a Y|X OLS fit you are assuming that X is an independent, error-free variable and Y is a dependent variable. For data with two observed variables, each with its own uncertainties (a symmetric case), Y|X OLS underestimates the slope of the actual relation!
The scientific literature is plagued with this problem.
An example of our own: Sánchez-Blázquez et al. (2006) (the presentation of the MILES library).
How to solve this? Easy: symmetrical linear regression.
See “Linear Regression in Astronomy”, Isobe et al. (1990, ApJ, 364, 104) and Feigelson & Babu (1992, ApJ, 397, 55):
Asymmetrical treatment of X and Y: OLS(Y|X), OLS(X|Y).
Symmetrical treatment of X and Y:
- Bisector OLS: fit that bisects the OLS(Y|X) and OLS(X|Y) lines
- Orthogonal regression (OR): minimizes the perpendicular distances to the line
- Reduced major axis (RMA): minimizes the sum of distances in X and Y
- Mean OLS: arithmetic mean of OLS(Y|X) and OLS(X|Y)
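A minimal R sketch of these slope estimators, using the standard expressions from Isobe et al. (1990); x and y are assumed to be already-defined data vectors, and this is illustrative rather than the code used in the talk:

  b1 <- cov(x, y) / var(x)        # OLS(Y|X) slope
  b2 <- var(y) / cov(x, y)        # OLS(X|Y) slope, written as dy/dx
  b_bis  <- (b1*b2 - 1 + sqrt((1 + b1^2)*(1 + b2^2))) / (b1 + b2)   # bisector OLS
  b_rma  <- sign(cov(x, y)) * sd(y) / sd(x)                         # reduced major axis
  b_mean <- (b1 + b2) / 2                                           # mean OLS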
[Figure: log(L/L_sol) vs. log(σ) (km/s) with the six fits overplotted: OLS(Y|X), OLS(X|Y), bisector OLS, orthogonal regression, RMA and mean OLS.]
Different methods provide different solutions; you have to choose the most appropriate one. Among the four symmetrical methods, the bisector OLS is usually the best choice (the other ones depend on the particular units).
Error bars in the Y values: weighted least squares
Minimize:
M = Σ_{i=1..n} d_i²/σ_i² = Σ_{i=1..n} (1/σ_i²) (y*_i − y_i)² = Σ_{i=1..n} w_i (a + b x_i − y_i)²,  with w_i = 1/σ_i²
However, the classic formulae (see e.g. Bevington, 2002, Data Reduction and Error Analysis for the Physical Sciences) are only valid when the error bars fully explain the observed dispersion.
σ²_typ = Σ(1/σ_i²) / Σ(1/σ_i⁴) ≈ S_r²
S_r² = [n₀/(n₀ − 2)] · [Σ (y_i − y*_i)²/σ_i²] / [Σ 1/σ_i²]
n₀ = (Σ 1/σ_i²)² / Σ(1/σ_i⁴)   (effective number of points)
Note that, apart from the measurement errors given by the individual error bars, the usual situation in the experimental sciences is to have actual, real dispersion.
Alternative, and only approximate, formulae to derive the errors in the derived coefficients in this case:
σ_b² = S_r² · [Σ(1/σ_i⁴)/Σ(1/σ_i²)] · (1/Δ) · Σ(1/σ_i²) = S_r² Σ(1/σ_i⁴)/Δ
σ_a² = S_r² · [Σ(1/σ_i⁴)/Σ(1/σ_i²)] · (1/Δ) · Σ(x_i²/σ_i²)
(Δ is the usual weighted least-squares determinant, Δ = Σ(1/σ_i²)·Σ(x_i²/σ_i²) − (Σ x_i/σ_i²)².)
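A sketch in R of the weighted fit together with the approximate corrected errors above (illustrative; x, y and the individual errors sig are assumed to be given):

  w   <- 1/sig^2                        # weights w_i = 1/sigma_i^2
  S   <- sum(w); Sx <- sum(w*x); Sy <- sum(w*y)
  Sxx <- sum(w*x^2); Sxy <- sum(w*x*y)
  Delta <- S*Sxx - Sx^2                 # usual weighted least-squares determinant
  b <- (S*Sxy - Sx*Sy) / Delta          # slope
  a <- (Sxx*Sy - Sx*Sxy) / Delta        # intercept
  n0  <- S^2 / sum(w^2)                 # effective number of points
  Sr2 <- n0/(n0 - 2) * sum(w*(y - a - b*x)^2) / S    # residual variance S_r^2
  sb2 <- Sr2 * sum(w^2) / Delta                      # corrected variance of the slope
  sa2 <- Sr2 * (sum(w^2)/S) * Sxx / Delta            # corrected variance of the intercept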
If you have individual error bars plus actual dispersion you are in trouble:
Example: velocity dispersions and ages (in Gyr) for a sample of 53 elliptical galaxies (one of the first pieces of evidence for downsizing); Sánchez-Blázquez et al. (2006, A&A, 457, 809).
[Figure: weighted vs. non-weighted fits. Typical error = 0.037, residual error = 0.131.]
A possible approach is to run simulations: bootstrap + Monte Carlo. This helps to provide a first order of magnitude for the solutions and to understand the distribution of the parameters, but it does not provide fair estimates of the parameters and their uncertainties.
[Figure: distributions of the slope from the classic weighted and the corrected weighted fits.]
Error bars in both the X and Y variables
Homoscedasticity (constant errors) without actual (additional) dispersion. See Feigelson & Babu (1992, ApJ, 397, 55) and Fuller (1987, “Measurement Error Models”).
Typical case in astronomy (e.g. the relation between two magnitudes, or colors, with the same measurement errors).
Approximate solutions available
b = [S_yy − h S_xx + √((S_yy − h S_xx)² + 4 h S_xy²)] / (2 S_xy),   with h = σ_y²/σ_x² = σ_η²/σ_μ²
h = 0: OLS(X|Y);  h = 1: OR;  h = ∞: OLS(Y|X)
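In R this slope can be computed directly from the moment sums (an illustrative sketch; sigx and sigy are the assumed constant measurement errors, and the intercept is taken through the centroid):

  Sxx <- sum((x - mean(x))^2)
  Syy <- sum((y - mean(y))^2)
  Sxy <- sum((x - mean(x)) * (y - mean(y)))
  h <- sigy^2 / sigx^2                  # ratio of error variances
  b <- (Syy - h*Sxx + sqrt((Syy - h*Sxx)^2 + 4*h*Sxy^2)) / (2*Sxy)
  a <- mean(y) - b*mean(x)              # intercept through the centroid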
Heteroscedasticity without actual (additional) dispersion
Some rough methods (not very good)
York (1966, Can. J. Phys., 44, 1079)
Numerical Recipes (Press et al. 2007, 3rd ed.)
Heteroscedasticity with actual (additional) dispersion
BCES estimator (bivariate correlated errors and intrinsic scatter), Akritas & Bershady (1996, ApJ, 470, 706); SIMEX algorithm (simulation-extrapolation), Carroll et al. (2006, “Measurement Error in Nonlinear Models”).
Some proposed, but unsatisfactory, solutions. This is the most usual situation.
AIM: To compute, by numerical methods, the posterior probability of a parameter or a model in the context of Bayesian statistics.
P(θ|D) = P(D|θ) P(θ) / ∫_Ω P(D|θ′) P(θ′) dθ′
SOFTWARE
- Simultaneous determination of thousands of parameters
- Easy inclusion of prior information
- Easy implementation of many different probability distributions
- Discrimination among models using information criteria (e.g. WAIC)
- Very powerful hierarchical models
- Etc., etc., etc.
JAGS (Just Another Gibbs Sampler), used here from R (R + JAGS)
Stan (Sampling Through Adaptive Neighborhoods), a Hamiltonian Monte Carlo sampler, used from Python (Python + Stan)
Also: emcee
The Markov Chain Monte-Carlo approach
In Bayes' theorem above, the left-hand side is the posterior and the numerator is the prior times the likelihood. The objective is to sample from the posterior using sampling algorithms that navigate through the parameter space (e.g. the Metropolis algorithm). This delivers chains which reproduce the probability distribution function (pdf) of the parameters. It is not a minimization procedure.
  model{
    for ( i in 1:Ntotal ) {
      zy[i] ~ dnorm(mu[i], 1/zsigma^2)   # Gaussian dispersion around the line
      mu[i] <- zbeta0 + zbeta1*zx[i]     # straight line
    }
    zbeta0 ~ dnorm(0, 1/10^2)            # vague priors on intercept and slope
    zbeta1 ~ dnorm(0, 1/10^2)
    zsigma ~ dunif(1.0E-3, 1.0E+3)       # vague prior on the dispersion
  }
No error bars, data with real dispersion. Example: the Faber-Jackson relation, with data from Schechter (1980).
3 parameters: slope, intercept and dispersion
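A minimal sketch of how such a JAGS model might be run from R with the rjags package (the data-list contents and the name modelString are illustrative assumptions, not the talk's actual script):

  library(rjags)
  dataList <- list(zx = zx, zy = zy, Ntotal = length(zy))   # standardized data assumed prepared
  jm <- jags.model(textConnection(modelString), data = dataList, n.chains = 3)
  update(jm, 1000)                                          # burn-in
  samp <- coda.samples(jm, c("zbeta0", "zbeta1", "zsigma"), n.iter = 10000)
  summary(samp)                                             # posterior summaries of the 3 parameters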
The results are the full pdfs of the parameters, not just mean expected values with error bars that assume a particular distribution.
  model{
    for ( i in 1:Ntotal ) {
      zobsy[i] ~ dnorm(zy[i], 1/zerry[i]^2)   # measurement errors in y
      zy[i] ~ dnorm(mu[i], 1/zsigma^2)        # intrinsic (actual) dispersion
      mu[i] <- zbeta0 + zbeta1*zx[i]
    }
  }
Error bars in Y plus actual dispersion
Typical case in astrophysics and many other sciences. E.g.: the mass-age relation for elliptical galaxies.
No errors: β1 = 0.38 ± 0.09
Weighted (lm): β1 = 0.59 ± 0.08
Weighted (lm, erry): β1 = 0.59 ± 0.15
MCMC: β1 = 0.41 ± 0.10
In this case the intrinsic, actual dispersion (σ = 0.144 ± 0.023) dominates over the typical error (0.037). The classic method of weighted least squares overestimates the weight of the data with the smallest error bars. The MCMC fit, which treats the dispersion as a parameter, also lets you decide whether there is real dispersion.
3 parameters: slope, intercept and dispersion
[Figure: the three fits compared: no errors, error-weighted and MCMC.]
Error bars in X and Y plus actual dispersion. E.g.: the relation between galaxy velocity dispersion and the mass of the central black hole.
It is necessary to specify a prior distribution for the data in X. It can be a uniform distribution or, for instance, a Gaussian with hyperpriors, etc.
3 parameters: slope, intercept and dispersion
Data from Tremaine (2002)
β0 = −1.10 ± 0.80,  β1 = 4.01 ± 0.35,  σ = 0.326 ± 0.056
  model{
    for ( i in 1:Ntotal ) {
      zobsx[i] ~ dnorm(zx[i], 1/zerrx[i]^2)   # measurement errors in x
      zobsy[i] ~ dnorm(zy[i], 1/zerry[i]^2)   # measurement errors in y
      zx[i] ~ dunif(-1.E3, 1.E3)              # prior for the true x values
      zy[i] ~ dnorm(mu[i], 1/zsigma^2)        # intrinsic dispersion
      mu[i] <- zbeta0 + zbeta1*zx[i]
    }
    zbeta0 ~ dnorm(0, 1/10^2)
    zbeta1 ~ dnorm(0, 1/10^2)
    zsigma ~ dunif(1.0E-3, 1.0E+3)
  }
Clear actual dispersion. This is the most usual situation, and it is a piece of cake with MCMC!
[Figure: measured HdA index vs. interpolated HdA, with the posterior distributions of the fit parameters: α: mean = 0.17, 95% HDI [−0.0598, 0.402], 7.3% < 0 < 92.7%; β: mean = 0.98, 95% HDI [0.912, 1.05], 72.1% < 1 < 27.9%; σ: mean = 1.28, 95% HDI [1.13, 1.43], 99.6% < 1.51 < 0.4%.]
Comparison of the measured line-strength indices with those derived by interpolating in the previous library for the corresponding stellar parameters. Linear regression with errors in X and Y plus real dispersion: MCMC is the only correct approach → posterior probability distributions for the intercept, slope and dispersion, with 95% Highest Density Intervals (HDI). Programmed in JAGS:
  model{
    for (i in 1:Ntotal) {
      zobx[i] ~ dnorm(zx[i], 1/zerrx[i]^2)            # measurement errors in x
      zoby[i] ~ dnorm(zy[i], 1/zerry[i]^2)            # measurement errors in y
      zx[i] ~ dnorm(zmux, 1/zsigx^2) T(zxmin,zxmax)   # Gaussian prior (with hyperpriors) for the true x
      mu[i] <- zalpha + zbeta*zx[i]
      zy[i] ~ dnorm(mu[i], 1/zsigma^2)                # intrinsic dispersion
    }
    zalpha ~ dnorm(0, 1/10^2)
    zbeta ~ dnorm(1, 1/10^2)
    zsigma ~ dunif(1E-5, 1.E+3)
    zmux ~ dnorm(0, 1)                                # hyperpriors on the x distribution
    zsigx ~ dunif(0, 100)
  }
Another example: Extension of the MILES library
More advantages: hierarchical models
Figure 17.6: A model of dependencies for robust hierarchical linear regression. Compare with Figure 17.2 on p. 463. Copyright © Kruschke, J. K. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd Edition. Academic Press / Elsevier.
For a sample with subgroups: fit a straight line for each subgroup, but relate all the intercepts and slopes through a higher-order distribution (e.g. color-magnitude relations for galaxies in different clusters).
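A minimal JAGS sketch of this idea, with illustrative variable names and Gaussian higher-order distributions (not the exact model used later in the talk):

  model{
    for (i in 1:Ntotal) {
      y[i] ~ dnorm(beta0[group[i]] + beta1[group[i]]*x[i], 1/sigma^2)
    }
    for (j in 1:Ngroups) {
      beta0[j] ~ dnorm(beta0mu, 1/beta0sigma^2)   # each group's intercept and slope are drawn
      beta1[j] ~ dnorm(beta1mu, 1/beta1sigma^2)   # from a common higher-order distribution
    }
    beta0mu ~ dnorm(0, 1/10^2)                    # hyperpriors
    beta1mu ~ dnorm(0, 1/10^2)
    beta0sigma ~ dunif(1.0E-3, 1.0E+3)
    beta1sigma ~ dunif(1.0E-3, 1.0E+3)
    sigma ~ dunif(1.0E-3, 1.0E+3)
  }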
Figure 17.5: Fictitious data for demonstrating hierarchical linear regression, with posterior predicted lines superimposed. Upper panel: all data together, with individuals represented by connected segments. Lower panels: plots of individual data (Units 1 to 25). Notice that the final two subjects have only single data points, yet the hierarchical model has fairly tight estimates of the individual slopes and intercepts. Copyright © Kruschke, J. K. (2014). Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd Edition. Academic Press / Elsevier.
Note that each data point informs not only the straight line of its particular group, but also all the other regression lines. There is an important "shrinkage" effect: the regression lines of each group are much better defined than in individual fits. The net effect is to increase the signal-to-noise. You can even fit a straight line to a single point!
Hierarchical fits to derive age and metallicity gradients
Galaxies from CALIFA. The intercepts and slopes form a higher-order distribution, and t-distributions are used instead of normal ones. Programmed in JAGS.
Ex. Stellar population gradients in the disks of spiral galaxies
[Figure: log(age) (LW) vs. r (hs) for IC4566, with three fits: weighted least squares, non-hierarchical MCMC and hierarchical MCMC.]
[Figure: posterior distributions for IC4566: β0 mean = −0.112, 95% HDI [−0.138, −0.0876]; β1 mean = −0.0303, 95% HDI [−0.0375, −0.0231]. Also shown: errors in the slopes, hierarchical (ebeta1adj) vs. non-hierarchical (ebeta1ind), and [Z/H] (LW) vs. r (hs) for NGC5378 with the same three fits.]
If you have different kinds of objects (e.g. galaxy types) you can compare the higher-order distributions for each kind.
Ex. Stellar population gradients in the disks of spiral galaxies
[Figure: LW metallicity and LW age gradients per morphological type (E−S0, S0a−Sab, Sb, Sbc, Sc−Sdm), with the posterior distributions of the group-level intercepts and slopes:
beta0mu[1]: mean = −0.0145, 95% HDI [−0.0709, 0.0417]; beta0mu[2]: −0.0378 [−0.0799, 0.0047]; beta0mu[3]: −0.0914 [−0.135, −0.0484]; beta0mu[4]: −0.158 [−0.198, −0.116]; beta0mu[5]: −0.318 [−0.402, −0.239].
beta1mu[1]: −0.0185 [−0.0385, 0.00152]; beta1mu[2]: −0.0336 [−0.0473, −0.0196]; beta1mu[3]: −0.032 [−0.0441, −0.0202]; beta1mu[4]: −0.0242 [−0.0355, −0.0131]; beta1mu[5]: −0.00355 [−0.0319, 0.0249].
Difference between Sb and Sc−Sdm: Δ(β1) mean = −0.0315, 95% HDI [−0.0645, 0.000795], 97.2% < 0 < 2.8% (metallicity gradients); Δ(β1) mean = −0.0599, 95% HDI [−0.101, −0.0161], 99.7% < 0 < 0.3% (age gradients).]
Note that the comparison is not made between the final mean values but using all the steps of the MCMC chains → probability distribution of the differences among types.
And more advantages: different error models
You don't need to assume Gaussian errors (in X or Y); you can work with different distributions.
A t-distribution for the errors allows you to carry out a robust regression, in which outliers have a much smaller effect on the derived parameters. The number of degrees of freedom of the t-distribution is then an additional parameter to derive.
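A minimal JAGS sketch of the change with respect to the Gaussian model (only the likelihood and one extra prior differ; the particular prior on the degrees of freedom follows the choice popularized in Kruschke's book and is an assumption here):

  model{
    for ( i in 1:Ntotal ) {
      zy[i] ~ dt(mu[i], 1/zsigma^2, nu)   # Student-t likelihood: robust to outliers
      mu[i] <- zbeta0 + zbeta1*zx[i]
    }
    zbeta0 ~ dnorm(0, 1/10^2)
    zbeta1 ~ dnorm(0, 1/10^2)
    zsigma ~ dunif(1.0E-3, 1.0E+3)
    nuMinusOne ~ dexp(1/29.0)             # degrees of freedom as an extra parameter (nu >= 1)
    nu <- nuMinusOne + 1
  }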
Towards generalized linear models (a minimal log-normal sketch follows this list):
- Log-normal models: y > 0, Var(y) ∝ E(y)²
- Beta models: 0 < y < 1
- Gamma models
- Inverse Gaussian models
[Figure: example fits with log-normal and beta error models.]
Plus logistic regression, binomial regression, beta-binomial regression, Poissonian models, negative binomial models, etc.
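As one example of these error models, a minimal log-normal regression sketch in JAGS (illustrative names; the linear predictor acts on the log scale, which is what gives y > 0 and Var(y) ∝ E(y)²):

  model{
    for ( i in 1:Ntotal ) {
      y[i] ~ dlnorm(mu[i], 1/sigma^2)   # log-normal likelihood: y strictly positive
      mu[i] <- beta0 + beta1*x[i]       # linear predictor on the log scale
    }
    beta0 ~ dnorm(0, 1/10^2)
    beta1 ~ dnorm(0, 1/10^2)
    sigma ~ dunif(1.0E-3, 1.0E+3)
  }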
Simple linear regression fits: comparison of methods
See Andreon & Hurn (2013, Statistical Analysis and Data Mining, 6, 15) for a description of the different methods: ordinary least squares, weighted least squares, symmetrical fits (bisector, orthogonal regression, RMA, etc.), maximum-likelihood estimation (MLE), robust regression, BCES (Bivariate Correlated Errors and intrinsic Scatter), survival analysis, Bayesian methods, etc.
There are many potential complications when performing simple regression fits: heteroscedasticity in the errors, intrinsic actual dispersion, selection effects, complex data structures, non-uniform populations (Malmquist bias), non-Gaussianity, outliers, etc. First: why are you making the fit? Parameter estimation, prediction, or model selection? The procedures and difficulties are different in each case.
Comparative analysis of some methods (ordinary least squares OLS Y|X, bisector OLS, and a Bayesian procedure with MCMC) in their usefulness to: 1) estimate the straight-line parameters, and 2) make predictions (for y, having observed x). (See Andreon & Weaver for results on BCES and MLE.)
Simulated data: values of x and y following the relation
y = x
with x following a uniform distribution in the (−3, 3) interval and with known measurement errors (σx = σy = 1) in x and in y. Without actual dispersion.
  model{
    x ~ dunif(-3, 3)        # true x value
    y <- x + 0              # true relation y = x
    obsx ~ dnorm(x, 1)      # observed values: precision 1, i.e. sigma = 1
    obsy ~ dnorm(y, 1)
  }
We generate 10000 data pairs, from which we are going to use 100 random pairs to make the fits and the rest to check the predictions (data are created using JAGS)
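The same simulated data can also be generated directly in R (an equivalent sketch: dnorm(·, 1) in JAGS means precision 1, i.e. σ = 1):

  set.seed(1)
  n <- 10000
  x <- runif(n, -3, 3)           # true x values
  y <- x                         # true relation y = x, no intrinsic dispersion
  obsx <- x + rnorm(n, sd = 1)   # observed values = true values + measurement errors
  obsy <- y + rnorm(n, sd = 1)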
[Figure: y vs. obsx and obsy vs. x; actual values in red, observed values in black.]
Note that, for a narrow range of values of x (around a given obsx), the observed values of y (obsy) are not symmetrically distributed around obsy = obsx; there is an important bias toward lower obsy values.
Example: for <obsx> = 3.0, <obsy> = 2.2.
For a high value of observed x it is more likely to obtain an observed y value below the actual one (and the other way around): a kind of Malmquist bias. This bias occurs for any value of obsx, and is more important at the extremes of the distribution. Repeating this for all values of obsx gives <obsy> as a function of obsx.
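A quick R check of this bias with the simulated data above (the exact numbers depend on the random seed):

  sel <- abs(obsx - 3.0) < 0.25          # points observed near obsx ≈ 3
  mean(obsy[sel])                        # biased low (around 2.2, cf. the example above)
  binned <- tapply(obsy, cut(obsx, seq(-5, 5, 0.5)), mean)
  plot(seq(-4.75, 4.75, 0.5), binned)    # <obsy> vs. obsx flattens toward the extremes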
Linear regression:
  model{
    for (i in 1:length(obx)) {
      x[i] ~ dunif(-3, 3)               # prior for the true x values
      y[i] <- alpha + beta*x[i]         # true straight line
      obx[i] ~ dnorm(x[i], 1)           # measurement errors (sigma = 1)
      oby[i] ~ dnorm(y[i], 1)
    }
    alpha ~ dnorm(0., 1.E-4)            # vague priors
    beta ~ dt(0, 1, 1)
  }

  # R driver: only 100 of the observed y values are provided; the rest stay NA and are predicted
  jags.params <- c("alpha", "beta", "oby")
  obx <- obsx
  oby <- rep(NA, length(obsy))
  for (i in 1:100) {
    oby[i*100] <- obsy[i*100]
  }
  dataList <- list(obx = obx, oby = oby)
Bayesian fit using MCMC
Results are compatible with the theoretical relation (y = 0 + 1·x).
We fit a straight line to a subsample of 100 random points. For the rest we set the observed y values obsy to NA (Not Available) and we make predictions from their observed obsx values. A fit with ~9902 parameters!
Example: predicted distribution for the y value corresponding to obsx(5347).
y = (0.20 ± 0.13) + (1.09 ± 0.07) x
Ordinary least-squares fit (OLS Y|X): y = (0.24 ± 0.12) + (0.84 ± 0.06) x
Bisector OLS fit: y = (0.26 ± 0.42) + (1.04 ± 0.06) x
Bayesian fit (MCMC): y = (0.20 ± 0.13) + (1.09 ± 0.07) x
OLS Y|X underestimates the slope of the theoretical straight line (y = 0 + 1x) whilst Bisector OLS and the Bayesian fits recover the input value.
Prediction
[Figure: residuals (observed − predicted) vs. observed x for the three methods: OLS Y|X, bisector OLS and MCMC.]
We check the predictive power of the three fits using the 9900 (10000 − 100) points not used to derive the regression parameters. For OLS Y|X and bisector OLS we use the fitted straight lines directly. MCMC computes the predicted values as additional parameters of the model, not using the fitted straight line. We plot the residuals (observed − predicted) versus the observed x values.
The bisector OLS method provides quite biased results (the same holds for all symmetric methods); the classic OLS Y|X works better but is also biased. The Bayesian MCMC method does not introduce any significant bias. Non-Bayesian fitting methods are not valid for making predictions. The problem is that the relation between y and obsx is not linear: applying a linear regression to make predictions can never work. This is avoided in MCMC, which derives the linear regression between the real values and does not use this derived linear regression to make predictions.
Conclusions:
old-fashioned ordinary least-squares stink
Use MCMC!
References
Doing Bayesian Data Analysis, A Tutorial with R, JAGS and Stan, 2nd edition, John K. Kruschke, 2015, Elsevier
Bayesian Methods for the Physical Sciences, Learning from Examples in Astronomy and Physics, Stefano Andreon & Brian Weaver, 2015, Springer Series in Astrostatistics
The BUGS Book, A Practical Introduction to Bayesian Analysis, David Lunn et al., 2013, Texts in Statistical Science, CRC Press
Bayesian Models for Astrophysical Data using R, JAGS, Python and Stan, Joseph M. Hilbe, Rafael S. de Souza and Emille E.O. Ishida, 2017, Cambridge University Press
Statistical Rethinking, Richard McElreath, 2016, Texts in Statistical Science, CRC Press
Bayesian Modeling Using WinBUGS, Ioannis Ntzoufras, 2009, Wiley Series in Computational Statistics, Wiley
Bayesian Data Analysis, 3rd edition, Andrew Gelman et al., 2014, Texts in Statistical Science, CRC Press
Bayesian Computation with R, 2nd edition, Jim Albert, 2009, Use R!, Springer
See many examples in the OpenBUGS User Manual (http://www.openbugs.net/Manuals/Manual.html)