
Bayesian Inference and the Parametric Bootstrap

Bradley Efron

Stanford University

Importance Sampling for Bayes Posterior Distribution

• Newton and Raftery (1994, JRSS-B):

“Nonparametric Bootstrap: good choice”

(actually used a smoothed nonparametric bootstrap)

• Today Parametric Bootstrap

– Good computational properties when applicable

– Connection between Bayes and frequentist inference

Bayesian Inference 1

Student Score Data

(Mardia, Kent and Bibby)

• n = 22 students’ scores on two tests: mechanics, vectors

• Data $y = (y_1, y_2, \ldots, y_{22})$ with $y_i = (\text{mec}_i, \text{vec}_i)$

• Parameter of interest $\theta = $ correlation(mec, vec)

• Sample correlation coefficient $\hat\theta = 0.498 \pm\ ??$

Bayesian Inference 2

[Figure: Scatterplot of mechanics vs. vectors scores for the 22 students (sampled from 88; Mardia, Kent and Bibby). Sample correlation coefficient .498 ± ??]

Bayesian Inference 3

R. A. Fisher (1915)

• Density

$$f_\theta(\hat\theta) = \frac{n-2}{\pi}\,(1-\theta^2)^{(n-1)/2}\,(1-\hat\theta^2)^{(n-4)/2} \int_0^\infty \big[\cosh(w) - \theta\hat\theta\big]^{-(n-1)}\,dw$$

• Bivariate normal $y_i \overset{ind}{\sim} N_2(\mu, \Sigma)$, $i = 1, 2, \ldots, n$

• 95% confidence limits (from z transform): $\theta \in (0.084, 0.754)$
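A minimal R sketch of the z-transform interval, using the slide's values (r = .498, n = 22); the small discrepancy from the quoted limits (.084, .754) presumably reflects extra correction terms in Fisher's version.

```r
r <- 0.498; n <- 22
z  <- atanh(r)                             # Fisher z transform
se <- 1 / sqrt(n - 3)                      # approximate SE on the z scale
tanh(z + c(-1, 1) * qnorm(0.975) * se)     # back to the correlation scale
```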

Bayesian Inference 4

[Figure: Fisher density for the correlation coefficient given corr = .498, with a histogram of 10,000 parametric bootstrap replications. Red triangles mark the Fisher 95% confidence limits .084 and .754; corrhat = .498.]

Bayesian Inference 5

Parametric Bootstrap Computations

• Assume $y_i \overset{ind}{\sim} N_2(\mu, \Sigma)$

• Estimate $\mu$ and $\Sigma$ by MLE: $(\hat\mu, \hat\Sigma)$

• Bootstrap sample $y_i^* \overset{ind}{\sim} N_2(\hat\mu, \hat\Sigma)$ for $i = 1, 2, \ldots, 22$

• Bootstrap replication $\hat\theta^* = $ corr coeff for $y^* = (y_1^*, y_2^*, \ldots, y_{22}^*)$

• Bootstrap standard deviation $\widehat{\rm sd}(\hat\theta) = 0.168$ ($B = 10{,}000$)

• Percentile 95% confidence limits (0.109, 0.761)
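A minimal R sketch of these computations, assuming the 22 × 2 score matrix is available as scores (a hypothetical name):

```r
library(MASS)                                  # for mvrnorm

muhat    <- colMeans(scores)
Sigmahat <- cov(scores) * (nrow(scores) - 1) / nrow(scores)   # MLE of Sigma

B <- 10000
thetastar <- replicate(B, {
  ystar <- mvrnorm(nrow(scores), muhat, Sigmahat)   # bootstrap sample
  cor(ystar[, 1], ystar[, 2])                       # bootstrap replication
})

sd(thetastar)                            # bootstrap standard deviation (~.168)
quantile(thetastar, c(0.025, 0.975))     # percentile 95% limits (~(.109, .761))
```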

Bayesian Inference 6

Better Bootstrap Confidence Limits (BCa)

• Idea Reweighting the $B$ bootstrap replications improves coverage

• Weights involve z0 (bias correction) and a (acceleration)

$$W_{\rm BCa}(\theta) = \frac{\varphi\big(Z_\theta/(1+aZ_\theta) - z_0\big)}{(1+aZ_\theta)^2\,\varphi(Z_\theta + z_0)}, \qquad Z_\theta = \Phi^{-1}\hat G(\theta) - z_0 \quad (\hat G = \text{bootstrap cdf})$$

• Reweighting the $B = 10{,}000$ bootstraps gave weighted percentiles (0.075, 0.748); Fisher: (0.084, 0.754)
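A minimal R sketch of the BCa reweighting, assuming thetastar from the previous sketch and already-estimated values z0 and a (assumed inputs); the weighted percentiles are read off the weighted bootstrap cdf:

```r
bca_weights <- function(thetastar, z0, a) {
  G <- rank(thetastar) / (length(thetastar) + 1)   # bootstrap cdf Ghat
  Z <- qnorm(G) - z0
  dnorm(Z / (1 + a * Z) - z0) / ((1 + a * Z)^2 * dnorm(Z + z0))
}

w   <- bca_weights(thetastar, z0, a)
ord <- order(thetastar)
cw  <- cumsum(w[ord]) / sum(w)                     # weighted bootstrap cdf
thetastar[ord][c(which(cw >= 0.025)[1], which(cw >= 0.975)[1])]  # weighted 95% limits
```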

Bayesian Inference 7

Bayes Posterior Intervals

• Prior $\pi(\theta)$

• Posterior expectation for $t(\theta)$:

$$E\{t(\theta)\mid\hat\theta\} = \int t(\theta)\,\pi(\theta)\,f_\theta(\hat\theta)\,d\theta \Big/ \int \pi(\theta)\,f_\theta(\hat\theta)\,d\theta$$

• Conversion factor Ratio of likelihood to bootstrap density:

$$R(\theta) = f_\theta(\hat\theta) \big/ f_{\hat\theta}(\theta) \qquad (\text{``}\theta\text{''} = \hat\theta^*)$$

• Bootstrap integrals

$$E\{t(\theta)\mid\hat\theta\} = \int t(\theta)\,\pi(\theta)\,R(\theta)\,f_{\hat\theta}(\theta)\,d\theta \Big/ \int \pi(\theta)\,R(\theta)\,f_{\hat\theta}(\theta)\,d\theta$$

Bayesian Inference 8

Bootstrap Estimation of Bayes Expectation E{t(θ)|θ}

• Parametric bootstrap replications $f_{\hat\theta}(\cdot) \to \theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_B$

• $t_i = t(\theta_i)$, $\pi_i = \pi(\theta_i)$, $R_i = R(\theta_i)$:

$$\hat E\{t(\theta)\mid\hat\theta\} = \sum_{i=1}^B t_i\,\pi_i\,R_i \Big/ \sum_{i=1}^B \pi_i\,R_i$$

• Reweighting Weight $\pi_i R_i$ on $\theta_i$

• Importance sampling estimate:

$$\hat E\{t(\theta)\mid\hat\theta\} \to E\{t(\theta)\mid\hat\theta\} \quad \text{as } B \to \infty$$
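A minimal R sketch of this importance-sampling estimate, assuming a replication vector thetastar plus functions t_fun, prior, and conv_R for $t$, $\pi$, and $R$ (all hypothetical names):

```r
post_expect <- function(thetastar, t_fun, prior, conv_R) {
  w <- prior(thetastar) * conv_R(thetastar)   # weight pi_i * R_i on theta_i
  sum(t_fun(thetastar) * w) / sum(w)
}

# e.g., posterior mean of the correlation under the Jeffreys prior of the
# next slide, pi(theta) = 1/(1 - theta^2):
# post_expect(thetastar, identity, function(th) 1 / (1 - th^2), conv_R)
```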

Bayesian Inference 9

Jeffreys Priors

• Harold Jeffreys (1930s):

Theory of “uninformative” prior distributions

(“invariant”, “objective”, “reference”, . . . )

• For a density family $f_\theta(y)$, $\theta \in R^p$:

$$\pi(\theta) = c\,|I(\theta)|^{1/2}, \qquad I(\theta) = \text{Fisher information matrix}$$

• Normal correlation: $\pi(\theta) = 1/(1-\theta^2)$

Bayesian Inference 10

[Figure: Importance-sampling posterior density for the correlation (black) using the Jeffreys prior π(θ) = 1/(1−θ²); BCa weighted density (green); raw bootstrap density (red). Triangles show 95% posterior limits. Axes: correlation vs. weighted counts.]

Bayesian Inference 11

Multiparameter Exponential Families in $R^p$

• $p$-dimensional sufficient statistic $\hat\beta$

• $p$-dimensional expectation parameter $\beta = E\{\hat\beta\}$

(Poisson: $e^{-\lambda}\lambda^x/x!$ has $\hat\beta = x$, $\beta = \lambda$)

• Hoeffding

$$f_\beta(\hat\beta) = f_{\hat\beta}(\hat\beta)\,e^{-D(\hat\beta,\beta)/2}$$

• Deviance

$$D(\beta_1, \beta_2) = 2\,E_{\beta_1}\log\big\{f_{\beta_1}(\hat\beta) \big/ f_{\beta_2}(\hat\beta)\big\}$$

Bayesian Inference 12

Conversion Factor in Exponential Families

• Conversion factor $R(\beta) = f_\beta(\hat\beta) \big/ f_{\hat\beta}(\beta)$ (with $\hat\beta$ fixed):

$$R(\beta) = \nu(\beta)\,e^{\Delta(\beta)}, \qquad \Delta(\beta) = \big[D(\beta, \hat\beta) - D(\hat\beta, \beta)\big]\big/2$$

• $\nu(\beta) = f_{\hat\beta}(\hat\beta) \big/ f_\beta(\beta) \doteq 1/\text{Jeffreys prior}$ (Laplace)

• Jeffreys prior: $\pi(\beta)\,R(\beta) \doteq e^{\Delta(\beta)}$

Bayesian Inference 13

Example: Multivariate Normal

• $y_i \overset{ind}{\sim} N_d(\mu, \Sigma)$ for $i = 1, 2, \ldots, n$

$$\Delta = n\left[(\mu - \hat\mu)'\,\frac{\hat\Sigma^{-1} - \Sigma^{-1}}{2}\,(\mu - \hat\mu) + \frac{1}{2}\,{\rm tr}\big(\Sigma\hat\Sigma^{-1} - \hat\Sigma\Sigma^{-1}\big) + \log\frac{|\hat\Sigma|}{|\Sigma|}\right]$$

• "eigenratio" (student score data):

$$\theta = \frac{\lambda_1}{\lambda_1 + \lambda_2} \qquad \hat\theta = \frac{\hat\lambda_1}{\hat\lambda_1 + \hat\lambda_2} = 0.793 \pm\ ??$$

• Bootstrap $y_i^* \overset{ind}{\sim} N_d(\hat\mu, \hat\Sigma)$, $i = 1, 2, \ldots, n \to \hat\theta_1^*, \hat\theta_2^*, \ldots, \hat\theta_B^*$
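A minimal R sketch of the eigenratio bootstrap, again assuming the score matrix scores (hypothetical name):

```r
library(MASS)

muhat    <- colMeans(scores)
Sigmahat <- cov(scores) * (nrow(scores) - 1) / nrow(scores)

eigenratio <- function(S) {
  lam <- eigen(S, symmetric = TRUE)$values   # eigenvalues, decreasing order
  lam[1] / (lam[1] + lam[2])
}

thetastar <- replicate(10000, {
  ystar <- mvrnorm(nrow(scores), muhat, Sigmahat)
  eigenratio(cov(ystar) * (nrow(ystar) - 1) / nrow(ystar))
})
```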

Bayesian Inference 14

[Figure: Density estimates for the student score eigenratio (B = 10,000): raw bootstrap (red), Jeffreys 5-dimensional prior (black), BCa (green; z0 = −.182, a = 0). Triangles show 95% limits: BCa (.598, .890), Bayes (.650, .908).]

Bayesian Inference 15

[Figure: Reweighting the parametric bootstrap points: replications scattered around β̂ (bhat), reweighted by exp(Δ).]

Bayesian Inference 16

Prostate Cancer Study

(Singh et al., 2002)

• Microarray study: 102 men, 52 with prostate cancer and 50 healthy controls

• 6033 genes: $z_i$ = test statistic for $H_{0i}$: "no difference"

$$H_{0i}: z_i \sim N(0, 1)$$

• Goal Identify genes involved in prostate cancer

Bayesian Inference 17

[Figure: Histogram of prostate cancer z-values for the 6033 genes (52 patients vs. 50 healthy controls); N(0,1) density (black) and "log poly 4" fit (red).]

Bayesian Inference 18

Poisson Regression Models

• Histogram counts $y_j = \#\{z_i \in \text{bin}_j\}$

• $x_j$ = midpoint of bin $j$, $j = 1, 2, \ldots, J$

• Poisson model $y_j \overset{ind}{\sim} \text{Poi}(\mu_j)$ with $\log(\mu_j) = \text{poly}(x_j, \text{degree} = m)$

• Exponential family of degree $p = m + 1$

• Let $\eta_j = \log(\mu_j)$; then

$$\Delta = (\eta - \hat\eta)'(\mu + \hat\mu) - 2\Big(\sum \mu_j - \sum \hat\mu_j\Big)$$

Bayesian Inference 19

Parametric False Discovery Rate Estimates

• glm(y ∼ poly(x,4), Poisson) $\to \{\hat\mu_j\}$ ("log poly 4")

• $\theta = \widehat{\rm Fdr}(3) = [1 - \Phi(3)] \big/ [1 - \hat F(3)] \approx \Pr\{\text{null} \mid z \ge 3\}$

($\hat F$ is the smoothed CDF estimate)

• Model 4: $\widehat{\rm Fdr}(3) = 0.192 \pm\ ??$

• Parametric bootstrap $y_j^* \overset{ind}{\sim} \text{Poi}(\hat\mu_j) \to \{\hat\mu_j^*\} \to \widehat{\rm Fdr}(3)^*$
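A minimal R sketch of this bootstrap, assuming bin counts y and bin midpoints x (hypothetical names); the smoothed tail $1 - \hat F(3)$ is approximated here by the fitted-count proportion in bins with $x \ge 3$, an assumption on my part:

```r
fdr3 <- function(y, x) {
  mu <- fitted(glm(y ~ poly(x, 4), family = poisson))   # "log poly 4" fit
  (1 - pnorm(3)) / (sum(mu[x >= 3]) / sum(mu))          # [1 - Phi(3)] / [1 - Fhat(3)]
}

muhat   <- fitted(glm(y ~ poly(x, 4), family = poisson))
fdrstar <- replicate(4000, fdr3(rpois(length(y), muhat), x))   # Fdr(3)* values
```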

Bayesian Inference 20

[Figure: Fdr(3) estimation, Model 4; prostate data, 4000 parametric bootstraps. Posterior densities: Jeffreys Bayes (black), raw bootstrap (red), BCa (green). Triangles show the 95% Jeffreys/BCa limits (.154, .241).]

Bayesian Inference 21

[Figure: Model 4 (line) vs. Model 8 (solid): histograms of 4000 parametric bootstrap replications of Fdr(3). Triangles are 95% BCa limits: (.141, .239) vs. (.154, .241).]

Bayesian Inference 22

Model Selection: AIC = Deviance + 2 · df

Model    AIC    boot counts   Bayes counts†
M2      142.6        0           0.0
M3      143.1        0           0.0
M4       73.3     1266        1458  (36%)
M5       74.3      415         462  (12%)
M6       75.8      215         197   (5%)
M7       77.8       54          67   (2%)
M8       75.6     2050        1816  (45%)

† Model 8, Jeffreys prior

Bayesian Inference 23

Internal Accuracy of Bayes Estimates

• How large must $B$ be for computing

$$\hat E\{t(\beta)\mid\hat\beta\} = \sum_1^B t_i\,\pi_i\,R_i \Big/ \sum_1^B \pi_i\,R_i\ ?$$

• Define $r_i = \pi_i R_i$ and $s_i = t_i\pi_i R_i$; $c_{ss} = \sum_1^B (s_i - \bar s)^2/B$, etc. Then

$${\rm CV}\{\hat E\}^2 = \frac{1}{B}\left(\frac{c_{ss}}{\bar s^2} - \frac{2\,c_{sr}}{\bar s\,\bar r} + \frac{c_{rr}}{\bar r^2}\right)$$

• Example $t = \widehat{\rm Fdr}(3)$, 4000 parametric bootreps, Jeffreys Bayes:

• Model 4: $\hat E = 0.193$, CV = 0.0019

• Model 8: $\hat E = 0.179$, CV = 0.0025
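A minimal R sketch of this internal-accuracy formula, assuming vectors ti, pii, Ri of length B (hypothetical names):

```r
internal_cv <- function(ti, pii, Ri) {
  r <- pii * Ri
  s <- ti * r
  B <- length(r)
  css <- mean((s - mean(s))^2)                  # c_ss; c_rr and c_sr likewise
  crr <- mean((r - mean(r))^2)
  csr <- mean((s - mean(s)) * (r - mean(r)))
  sqrt((css / mean(s)^2 - 2 * csr / (mean(s) * mean(r)) + crr / mean(r)^2) / B)
}
```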

Bayesian Inference 24

Sampling Variability of Bayes Estimates

• How much would $\hat E\{t(\beta)\mid\hat\beta\}$ vary for new $\hat\beta$'s?

(i.e., frequentist properties of Bayes estimates)

• Need to bootstrap the parametric bootstrap calculations for $\hat E\{t(\beta)\mid\hat\beta\}$

• Shortcut "Bootstrap after bootstrap"

Bayesian Inference 25

Bootstrap-after-Bootstrap Calculations

• $f_{\hat\beta}(\cdot) \to \beta_1, \beta_2, \ldots, \beta_i, \ldots, \beta_B$, the original bootstrap replications under $\hat\beta$

• Want bootreps under some other value $\gamma$

• Idea Reweight the originals

• Weight $Q_\gamma(\beta) = f_\gamma(\beta) \big/ f_{\hat\beta}(\beta)$:

$$\hat E\{t\mid\gamma\} = \sum_1^B t_i\,\pi_i\,R_i\,Q_\gamma(\beta_i) \Big/ \sum_1^B \pi_i\,R_i\,Q_\gamma(\beta_i)$$
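A minimal R sketch of this reweighting, assuming a function dens(betas, center) that evaluates $f_{\rm center}$ at each replication (a hypothetical helper), plus the ti, pii, Ri vectors from before:

```r
bab_expect <- function(ti, pii, Ri, betas, gamma, betahat, dens) {
  Q <- dens(betas, gamma) / dens(betas, betahat)   # weights Q_gamma(beta_i)
  w <- pii * Ri * Q
  sum(ti * w) / sum(w)
}
```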

Bayesian Inference 26

Bootstrap-after-Bootstrap Standard Errors

• BAB

$$f_{\hat\beta}(\cdot) \to \gamma_1, \gamma_2, \ldots, \gamma_K \tag{1}$$

$$\hat E_k = \hat E\{t\mid\gamma_k\} \tag{2}$$

$$\widehat{\rm se}\{\hat E\} = \Big[\sum_k \big(\hat E_k - \hat E_\cdot\big)^2 \Big/ K\Big]^{1/2} \tag{3}$$

• Example Model 4: 97.5% credible limit for $\widehat{\rm Fdr}(3) = 0.241 \pm\ ??$

Bayesian Inference 27

[Figure: Histogram of 400 BAB replications of the 97.5% credible upper limit for Fdr(3) (point estimate .240); line histogram shows the same for the BCa upper limit. BAB standard errors: .030 (Bayes), .034 (BCa); correlation .98.]

Bayesian Inference 28

BAB SDs for Bayes Model Selection Proportions

Model   Bayes post. prob.   BAB SD
M4            36%           ± 20%
M5            12%           ± 14%
M6             5%           ±  8%
M7             2%           ±  6%
M8            45%           ± 27%

• Model averaging?

Bayesian Inference 29

A Few Conclusions . . .

• Parametric bootstrap closely related to objective Bayes.

(That’s why it’s a good importance sampling choice.)

• When it applies, the parametric bootstrap approach has both computational and interpretational advantages over MCMC/Gibbs sampling.

• Objective Bayes analyses should be checked frequentistically.

Bayesian Inference 30

References

• Bootstrap DiCiccio & Efron (1996), Statist. Sci., 189–228

• Bayes and Bootstrap Newton & Raftery (1994), JRSS-B, 3–48 (see Discussion!)

• Objective Priors Berger & Bernardo (1989), JASA, 200–207; Ghosh (2011), Statist. Sci., 187–202

• Exponential Families Efron (2010), Large-Scale Inference, Appendix 1

• MCMC and Bootstrap Efron (2011), J. Biopharm. Statist., 1052–1062

Bayesian Inference 31