

Lecture 5 and 6: Bootstrap

Applied Statistics 2015

1 / 37


The bootstrap, first introduced by Efron (1979), is a resampling method often used to find

1 standard errors for estimators

2 confidence intervals for unknown parameters

3 p-values for test statistics under a null hypothesis

2 / 37


An example

Let $X_1, \dots, X_n$ be a random sample from a distribution $F$ with mean $\theta$. Let the sample mean $\bar X_n$ be an estimator of $\theta$.

Question: What is the sampling distribution of $\bar X_n$?

If we knew that $F = N(\theta, 1)$, then $\bar X_n \overset{d}{=} N(\theta, 1/n)$.

If we do not know the distribution but could draw many samples of size $n$ from $F$, we would obtain $\{\bar X_{n,1}, \dots, \bar X_{n,m}\}$, which can be regarded as a random sample of $\bar X_n$. The empirical distribution function based on this sample is then a good approximation of the distribution of $\bar X_n$.

What if we do not know the distribution and can only afford one random sample?

3 / 37


The bootstrap idea

The sample stands in for the population, and we resample from the sample many times.

4 / 37


FIGURE 14.4 (a) The idea of the sampling distribution of the sample mean $\bar x$: take very many samples, collect the $\bar x$-values from each, and look at the distribution of these values. (b) The theory shortcut: if we know that the population values follow a normal distribution, theory tells us that the sampling distribution of $\bar x$ is also normal. (c) The bootstrap idea: when theory fails and we can afford only one sample, that sample stands in for the population, and the distribution of $\bar x$ in many resamples stands in for the sampling distribution.

5 / 37


Stage 1: A functional point of view

Let $X_1, \dots, X_n$ be i.i.d. from $F$ and let $T_n = g(X_1, \dots, X_n)$ be an estimator of $\theta$. Often, $\theta$ is a functional of $F$. Some examples follow.

Quantities

1 $\theta = E(X_1) = \int x \, dF(x)$

2 $\theta = \mathrm{med}(X_1) = F^{-1}(1/2)$

3 $\theta = \sup_{x \in \mathbb{R}} |F(x) - F_0(x)|$, for a given cdf $F_0$.

Estimators

1 $\hat\theta_n = \frac{1}{n} \sum_{i=1}^n X_i = \int x \, dF_n(x)$

2 $\hat\theta_n = X_{(m)} = F_n^{-1}(1/2)$, where $n = 2m - 1$, for convenience.

3 $\hat\theta_n = \sup_{x \in \mathbb{R}} |F_n(x) - F_0(x)|$

Put $\theta = h(F)$. The estimator is obtained by plugging $F_n$ into the functional $h$: $\hat\theta_n = h(F_n)$.
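The three plug-in estimators can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the supremum is computed under the assumption that $F_0$ is continuous, so that it suffices to check the jump points of $F_n$.

```python
import statistics

def ecdf(sample):
    """Return the empirical cdf F_n of the sample as a function of x."""
    xs = sorted(sample)
    n = len(xs)
    def F_n(x):
        # proportion of observations that are <= x
        return sum(1 for v in xs if v <= x) / n
    return F_n

def plug_in_mean(sample):
    # h(F_n) for h(F) = integral of x dF(x): the sample mean
    return sum(sample) / len(sample)

def plug_in_median(sample):
    # h(F_n) for h(F) = F^{-1}(1/2); for odd n = 2m - 1 this is X_(m)
    return statistics.median(sample)

def plug_in_sup_distance(sample, F0):
    # h(F_n) for h(F) = sup_x |F(x) - F0(x)|, assuming F0 is continuous:
    # F_n jumps from i/n to (i+1)/n at the (i+1)-th order statistic,
    # so both sides of every jump are compared with F0
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - F0(x), F0(x) - i / n) for i, x in enumerate(xs))

def uniform_cdf(x):
    # cdf of U[0, 1], used here as the given F0
    return min(max(x, 0.0), 1.0)

data = [0.1, 0.5, 0.9]
estimates = (plug_in_mean(data), plug_in_median(data), plug_in_sup_distance(data, uniform_cdf))
```

In each case the estimator is the same functional $h$ applied to $F_n$ instead of $F$.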

6 / 37



Measures of performance of θn

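(A) $\lambda_n(F) = P_F\big(\sqrt{n}\,(\hat\theta_n - h(F)) \le a\big)$

(B) $\lambda_n(F) = P_F\big(\sqrt{n}\,(\hat\theta_n - h(F))/\tau(F) \le a\big)$, for some scaling factor $\tau(F)$.

(C) $\lambda_n(F) = E_F(\hat\theta_n - h(F))$

(D) $\lambda_n(F) = \mathrm{Var}_F(\hat\theta_n - h(F))$

The task is to develop a procedure for estimating $\lambda_n(F)$. The idea is similar to how we derive the estimator of $\theta$: the estimator of $\lambda_n(F)$ is obtained by plugging in $F_n$. For instance,

(A) $\lambda_n(F_n) = P_{F_n}\big(\sqrt{n}\,(\hat\theta_n^* - h(F_n)) \le a\big)$,

where $\hat\theta_n^* = T(X_1^*, \dots, X_n^*)$ and $X_1^*, \dots, X_n^*$ are i.i.d. from $F_n$. Here, $h(F_n) = \hat\theta_n$ is a parameter in the bootstrap space.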

7 / 37






An example: estimating the error distribution of the mean

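Let $\theta = h(F) = E_F(X)$ and $\hat\theta_n = h(F_n) = \bar X_n$. Consider the very simple scenario $n = 2$. Suppose that the realization of $(X_1, X_2)$ is $(c, d)$. How do we compute the estimated cdf of the error, $\lambda_n(F_n) = P_{F_n}\big(\sqrt{n}\,(\hat\theta_n^* - h(F_n)) \le a\big)$?

Let $(X_1^*, X_2^*)$ be a random sample from $F_n$. Then

$P(X_i^* = c) = P(X_i^* = d) = 1/2$, for $i = 1, 2$.

Note that $h(F_n) = (c + d)/2$ and $\hat\theta_n^* = (X_1^* + X_2^*)/2$.

Prob.                          1/4          1/2                  1/4
$(X_1^*, X_2^*)$               (c, c)       (c, d) or (d, c)     (d, d)
$\hat\theta_n^*$               c            (c + d)/2            d
$\hat\theta_n^* - \hat\theta_n$   (c - d)/2    0                    (d - c)/2

The last row, together with the probabilities in the first row, gives the distribution of $\hat\theta_n^* - \hat\theta_n$ and hence, after scaling by $\sqrt{n}$, $\lambda_n(F_n)$.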
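For very small $n$ this exact calculation can be automated by enumerating all $n^n$ equally likely resamples. The sketch below is illustrative only: the function name is hypothetical, and the values $c = 1$, $d = 3$ are chosen just to have numbers to work with.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def exact_bootstrap_distribution(sample, statistic):
    """Exact distribution of statistic(X*) - statistic(X) under F_n,
    obtained by enumerating all n**n equally likely resamples."""
    n = len(sample)
    theta_hat = statistic(sample)
    dist = defaultdict(Fraction)  # missing keys start at probability 0
    for resample in product(sample, repeat=n):
        dist[statistic(resample) - theta_hat] += Fraction(1, n ** n)
    return dict(dist)

def mean(xs):
    return sum(xs) / len(xs)

# c = 1, d = 3: mass 1/4 at (c - d)/2 = -1, 1/2 at 0, 1/4 at (d - c)/2 = 1
dist = exact_bootstrap_distribution([1, 3], mean)
```

The number of resamples grows as $n^n$, so already for $n = 10$ there are $10^{10}$ of them. This is why exact computation is limited to very small samples.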

8 / 37


Stage 2: Resampling

In theory, $\lambda_n(F_n)$ is known to us because the randomness is governed completely by $F_n$, which depends only on the data $(X_1, \dots, X_n)$.

In practice, the exact calculation of $\lambda_n(F_n)$ is not feasible, except for small values of $n$.

Efron (1979) provides a way to estimate $\lambda_n(F_n)$ by sampling from $F_n$.

9 / 37


Stage 2: Sampling from Fn

We aim to estimate

\[ \lambda_n(F_n) = P_{F_n}\big(\sqrt{n}\,(T(X_1^*, \dots, X_n^*) - \hat\theta_n) \le a\big) = P_{F_n}\big(\sqrt{n}\,(\hat\theta_n^* - \hat\theta_n) \le a\big). \]

Draw $B$ samples, each of size $n$, from $F_n$.

Note that $F_n$ is a discrete uniform df, assigning probability mass $1/n$ to each $X_i$.

for i in 1:B
    draw X*_1, ..., X*_n with replacement from {X_1, ..., X_n}
    compute theta*_i = T(X*_1, ..., X*_n)

The resulting vector $\{\hat\theta^*_{1,n}, \dots, \hat\theta^*_{B,n}\}$ is a random sample of $\hat\theta_n^*$.

Now $\lambda_n(F_n)$ can be approximated by its empirical counterpart:

\[ \lambda^*_{B,n} = \frac{1}{B} \sum_{i=1}^{B} I\big(\sqrt{n}\,(\hat\theta^*_{i,n} - \hat\theta_n) \le a\big). \]
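The resampling loop and the empirical counterpart can be written out as a short Python sketch. The function names, the default $B = 2000$ and the fixed seed are choices made for this illustration, not part of the method.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def bootstrap_replicates(sample, statistic, B, rng):
    """Draw B resamples of size n with replacement; return theta*_1, ..., theta*_B."""
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(B)]

def bootstrap_cdf(sample, statistic, a, B=2000, seed=0):
    """Monte Carlo approximation of P_{F_n}(sqrt(n) (theta* - theta_hat) <= a)."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(sample)
    theta_hat = statistic(sample)
    reps = bootstrap_replicates(sample, statistic, B, rng)
    # fraction of replicates whose scaled error is at most a
    return sum(math.sqrt(n) * (t - theta_hat) <= a for t in reps) / B

# For the sample (c, d) = (1, 3) the exact value at a = 0 is 1/4 + 1/2 = 3/4
approx = bootstrap_cdf([1, 3], mean, a=0.0)
```

With $B = 2000$ the approximation should typically be within a few hundredths of the exact value $3/4$.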

10 / 37


Stage 2: Sampling from Fn

Similarly, $\lambda_n(F_n) = E_{F_n}\big(T(X_1^*, \dots, X_n^*) - \hat\theta_n\big)$ can be estimated by

\[ \lambda^*_{B,n} = \frac{1}{B} \sum_{i=1}^{B} \big(\hat\theta^*_{i,n} - \hat\theta_n\big). \]
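As an illustration of this bias estimate, the sketch below applies it to the plug-in variance estimator, an example chosen here (it is not taken from the slides) because its bias under $F_n$ is known exactly: it equals $-1/n$ times the plug-in variance of the sample. Function names and defaults are assumptions of the sketch.

```python
import random

def plug_in_variance(xs):
    # variance functional applied to the empirical distribution (divides by n)
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def bootstrap_bias(sample, statistic, B=2000, seed=0):
    """Approximate E_{F_n}(theta*) - theta_hat by the average of B replicates."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(sample)
    theta_hat = statistic(sample)
    reps = [statistic(rng.choices(sample, k=n)) for _ in range(B)]
    return sum(reps) / B - theta_hat

data = list(range(10))                   # plug-in variance is 8.25
bias_estimate = bootstrap_bias(data, plug_in_variance)
exact_bias = -plug_in_variance(data) / len(data)  # -0.825 under F_n
```

Comparing `bias_estimate` with `exact_bias` shows the size of the Monte Carlo error of the second stage for this choice of $B$.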

11 / 37


Remarks

This is a two-stage procedure, called bootstrapping.

(i) estimating $\lambda_n(F)$ by $\lambda_n(F_n)$
(ii) approximating $\lambda_n(F_n)$ by $\lambda^*_{B,n}$

The approximation in the second stage can be made highly accurate by choosing $B$ sufficiently large. Thus $\lambda^*_{B,n}$ is an approximator rather than an estimator of $\lambda_n(F_n)$.

Many alternatives are available for each of the two stages.

Plugging in a different estimator of $F$ to estimate $\lambda_n(F)$, for instance a parametric estimator when dealing with a parametric family.

Resampling $m$ observations from $F_n$, with $m = o(n)$. This is called the m out of n bootstrap.

12 / 37


Remarks

There are two sources of random variation in bootstrap distributions or bootstrap samples.

Choosing an original sample at random from the population. (Stage 1)

Choosing bootstrap resamples at random from the original sample. (Stage 2)

Again, the variation due to the first stage dominates.

13 / 37


FIGURE 14.12 Five random samples (n = 50) from the same population, with a bootstrap distribution for the sample mean formed by resampling from each of the five samples. At the right are five more bootstrap distributions from the first sample.

14 / 37


Remarks

One may feel that bootstrapping achieves the impossible: providing additional information (about $\lambda_n(F)$) without acquiring more data. This is NOT true. What $\lambda^*_{B,n}$ does is provide a simple and accurate approximation to $\lambda_n(F_n)$ when the latter is too complicated to compute directly.

15 / 37


Does bootstrapping work?

Definition

Let $\rho$ be a metric on the space of cdfs, so that it measures the distance between two cdfs. We say that the bootstrap is consistent under $\rho$ if, as $n \to \infty$,

\[ \rho\big(\lambda_n(F), \lambda^*_{B,n}\big) \xrightarrow{P} 0. \]

Kolmogorov metric. Let $F_1$ and $F_2$ be two cdfs. Then

\[ K(F_1, F_2) = \sup_{x \in \mathbb{R}} |F_1(x) - F_2(x)|. \]
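When both cdfs are empirical cdfs, as when comparing a simulated sampling distribution with a set of bootstrap replicates, the Kolmogorov distance can be computed exactly. The sketch below assumes both inputs are finite samples; the function name is made up for this example.

```python
def kolmogorov_distance(sample1, sample2):
    """sup_x |F1(x) - F2(x)| for the empirical cdfs F1, F2 of two samples."""
    xs, ys = sorted(sample1), sorted(sample2)

    def ecdf(sorted_vals, x):
        # proportion of values <= x
        return sum(1 for v in sorted_vals if v <= x) / len(sorted_vals)

    # Both cdfs are right-continuous step functions, so their difference is
    # constant between jump points and the supremum is attained at a jump point.
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in xs + ys)

d = kolmogorov_distance([1, 2], [1.5, 2.5])  # the two step functions differ by at most 1/2
```

This direct version costs on the order of $(n_1 + n_2)^2$ operations, which is fine for illustration. A merge over the two sorted samples would be faster.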

16 / 37


Consistency

Theorem (functions of mean)

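Write $\mu_1 = E(X_1)$. Let $\theta = g(\mu_1)$ and $\hat\theta_n = g(\bar X_n)$. Suppose that $E(X_1^2) < \infty$ and $g$ is continuously differentiable at $\mu_1$. Then, as $n \to \infty$,

\[ K\big(\lambda_n(F), \lambda^*_{B,n}\big) \xrightarrow{a.s.} 0, \]

where $\lambda_n(F) = P_F\big(\sqrt{n}\,(g(\bar X_n) - \theta) \le x\big)$ and

\[ \lambda^*_{B,n} = \frac{\#\{i : \sqrt{n}\,(g(\bar X_i^*) - \hat\theta_n) \le x\}}{B}, \]

with $\bar X_i^*$ the average of the $i$-th bootstrap sample, i.e. $\bar X_i^* = \frac{1}{n} \sum_{j=1}^{n} X^*_{i,j}$.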

17 / 37


Consistency

Theorem (quantiles)

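For $0 < p < 1$, let $\theta = F^{-1}(p)$ and $\hat\theta_n = F_n^{-1}(p)$. Suppose that $F$ has a positive derivative at $\theta$. Then, as $n \to \infty$,

\[ K\big(\lambda_n(F), \lambda^*_{B,n}\big) \xrightarrow{a.s.} 0, \]

where $\lambda_n(F) = P_F\big(\sqrt{n}\,(F_n^{-1}(p) - \theta) \le x\big)$ and

\[ \lambda^*_{B,n} = \frac{\#\{i : \sqrt{n}\,(X^*_{i,(np)} - \hat\theta_n) \le x\}}{B}, \]

with $X^*_{i,(np)}$ the $p$-th sample quantile of the $i$-th bootstrap sample $(X^*_{i,1}, \dots, X^*_{i,n})$.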

18 / 37


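Failure of the bootstrap

Suppose $X_1, \dots, X_n$ is a random sample from a distribution $F$ and that $X_1$ has mean $\mu$ and unit variance. Let $\theta = |\mu|$ and $\hat\theta_n = |\bar X_n|$. If $\mu = 0$, then the bootstrap is not consistent for estimating the distribution of $E_n = \sqrt{n}\,(|\bar X_n| - |\mu|)$.

If $\mu = 0$, then $E_n \xrightarrow{d} |Z|$, where $Z \sim N(0, 1)$.

It can be shown that

\[ \big(\sqrt{n}\,(\bar X_n - \mu), \sqrt{n}\,(\bar X_n^* - \bar X_n)\big) \xrightarrow{d} (Z_1, Z_2), \]

where $Z_1$ and $Z_2$ are independent $N(0, 1)$ random variables.

Since, for $\mu = 0$,

\[ E_n^* = \sqrt{n}\,(|\bar X_n^*| - |\bar X_n|) = \big|\sqrt{n}\,(\bar X_n^* - \bar X_n) + \sqrt{n}\,\bar X_n\big| - \big|\sqrt{n}\,\bar X_n\big| \xrightarrow{d} |Z_1 + Z_2| - |Z_1|, \]

$E_n^*$ does not converge in distribution to the absolute value of a standard normal random variable.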
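A small simulation makes the mismatch visible. When $\mu = 0$, $E_n = \sqrt{n}\,|\bar X_n|$ is never negative, yet a sizeable share of the bootstrap replicates $E_n^*$ is. The sketch below assumes standard normal data with $n = 400$; the function name, sample size, $B$ and seeds are arbitrary choices for the illustration.

```python
import math
import random

def bootstrap_abs_mean_errors(sample, B=2000, seed=0):
    """Replicates of E*_n = sqrt(n) (|mean(X*)| - |mean(X)|)."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(sample)
    abs_mean = abs(sum(sample) / n)
    reps = []
    for _ in range(B):
        resample = rng.choices(sample, k=n)  # n draws with replacement
        reps.append(math.sqrt(n) * (abs(sum(resample) / n) - abs_mean))
    return reps

data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(400)]  # mu = 0, unit variance
errors = bootstrap_abs_mean_errors(sample)
# share of replicates below zero; the limit |Z| of E_n puts no mass there
share_negative = sum(e < 0 for e in errors) / len(errors)
```

The value of `share_negative` depends on the realized $\sqrt{n}\,\bar X_n$, in line with the limit $|Z_1 + Z_2| - |Z_1|$, which depends on $Z_1$.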

19 / 37


Failure of the bootstrap


[Figure: densities of the limit distributions of T and T*; horizontal axis from 0 to 4, vertical axis "Density" from 0.0 to 0.8.]

Here T should be read as $E_n$ and T* as $E_n^*$.

20 / 37



It does not always work. The following are a few situations where the simple bootstrap fails to estimate the CDF of $E_n$ consistently:

$E_n = \sqrt{n}\,(\bar X_n - \mu)$ when $\mathrm{Var}(X_1) = \infty$.

$E_n = \sqrt{n}\,(g(\bar X_n) - g(\mu))$ and $g'(\mu) = 0$.

$E_n = \sqrt{n}\,(g(\bar X_n) - g(\mu))$ and $g'(\mu)$ does not exist.

$E_n = \sqrt{n}\,(F_n^{-1}(p) - F^{-1}(p))$ and $F'(F^{-1}(p)) = 0$, or $F$ has unequal right and left derivatives at $F^{-1}(p)$.

The underlying population $F_\theta$ is indexed by a parameter $\theta$, and the support of $F_\theta$ depends on the value of $\theta$.

Some of these problems might be solved by more advanced bootstrap procedures.

21 / 37


There are several ways to construct bootstrap confidence intervals for θ.

Normal interval

Pivotal interval (or, bootstrap basic interval)

Percentile interval

Bias corrected interval

Studentised pivotal interval (or, bootstrap-t interval)

Let $\{\hat\theta^*_{1,n}, \dots, \hat\theta^*_{B,n}\}$ be the bootstrap sample; see Slide 10. Denote the ordered sample by $\hat\theta^*_{(1)} \le \hat\theta^*_{(2)} \le \dots \le \hat\theta^*_{(B)}$.

In this section, Φ denotes the CDF of N(0, 1) and Φ(zα) = α.

22 / 37


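Normal interval

Assumption: $(\hat\theta_n - \theta)/\sigma_n \xrightarrow{d} N(0, 1)$, where $\sigma_n^2 = \mathrm{Var}(\hat\theta_n)$.

If $\sigma_n$ is known, then the $(1 - \alpha)$ CI of $\theta$ is given by $[\hat\theta_n \pm z_{\alpha/2}\,\sigma_n]$, where $z_{\alpha/2}$ is the $\alpha/2$ quantile of $N(0, 1)$.

If $\sigma_n$ is unknown, then we need to estimate it by bootstrapping. Note that this corresponds to (D) on Slide 7. That is,

\[ \lambda_n(F) = \mathrm{Var}_F(\hat\theta_n) = E_F\big(\hat\theta_n - E_F(\hat\theta_n)\big)^2. \]

Stage 1: $\lambda_n(F_n) = E_{F_n}\big(\hat\theta_n^* - E_{F_n}(\hat\theta_n^*)\big)^2$,

Stage 2: $\lambda^*_{B,n} = \frac{1}{B} \sum_{i=1}^{B} \big(\hat\theta^*_{i,n} - \bar\theta_n^*\big)^2$, where $\bar\theta_n^* = \frac{1}{B} \sum_{i=1}^{B} \hat\theta^*_{i,n}$.

The bootstrap normal interval is given by

\[ \hat\theta_n \pm z_{\alpha/2} \sqrt{\frac{1}{B} \sum_{i=1}^{B} \big(\hat\theta^*_{i,n} - \bar\theta_n^*\big)^2}. \]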
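A minimal Python sketch of the bootstrap normal interval follows. The function name and defaults are illustrative assumptions. The code uses the upper quantile $z_{1-\alpha/2} = -z_{\alpha/2}$, which gives the same interval as the $\pm z_{\alpha/2}$ form.

```python
import random
from statistics import NormalDist

def bootstrap_normal_interval(sample, statistic, alpha=0.05, B=2000, seed=0):
    """theta_hat -/+ z * se_boot, with se_boot the Stage 2 standard deviation."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(sample)
    theta_hat = statistic(sample)
    reps = [statistic(rng.choices(sample, k=n)) for _ in range(B)]
    rep_mean = sum(reps) / B
    # square root of (1/B) * sum of squared deviations of the replicates
    se_boot = (sum((t - rep_mean) ** 2 for t in reps) / B) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2} = -z_{alpha/2}
    return theta_hat - z * se_boot, theta_hat + z * se_boot

def mean(xs):
    return sum(xs) / len(xs)

lower, upper = bootstrap_normal_interval(list(range(1, 21)), mean)
```

For the sample mean, `se_boot` should be close to the plug-in standard deviation divided by $\sqrt{n}$, which gives a quick sanity check on the resampling.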

23 / 37


Pivotal interval (Bootstrap basic interval)

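Define the pivot $R_n = \hat\theta_n - \theta$. Let $H(r)$ denote the CDF of $R_n$. Let $r_{\alpha/2} = H^{-1}(\alpha/2)$ and $r_{1-\alpha/2} = H^{-1}(1 - \alpha/2)$. Then

\[ P\big(r_{\alpha/2} \le \hat\theta_n - \theta \le r_{1-\alpha/2}\big) = 1 - \alpha. \]

We need to estimate $H$. This corresponds to (A) on Slide 7. A bootstrap estimator of $H$ is given by

\[ \lambda^*_{B,n} = \frac{1}{B} \sum_{i=1}^{B} I\big(R^*_{i,n} \le r\big) =: H_{B,n}(r), \]

where $R^*_{i,n} = \hat\theta^*_{i,n} - \hat\theta_n$.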

24 / 37


Pivotal interval

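The estimators of $r_{\alpha/2}$ and $r_{1-\alpha/2}$ are given by

\[ \hat r_{\alpha/2} = H_{B,n}^{-1}(\alpha/2) = R^*_{(\alpha B/2)} \quad \text{and} \quad \hat r_{1-\alpha/2} = R^*_{((1-\alpha/2)B)}, \]

where $R^*_{(i)}$ denotes the $i$-th order statistic of the sample $\{R^*_{i,n}, i = 1, \dots, B\}$. Thus, the bootstrap basic confidence interval is

\[ \big[\hat\theta_n - R^*_{((1-\alpha/2)B)}, \ \hat\theta_n - R^*_{(\alpha B/2)}\big] \]

or equivalently

\[ \big[2\hat\theta_n - \hat\theta^*_{((1-\alpha/2)B)}, \ 2\hat\theta_n - \hat\theta^*_{(\alpha B/2)}\big]. \]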
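The basic interval can be sketched as follows. The slides leave open what to do when $\alpha B/2$ is not an integer, so rounding the index up and clamping it to $1, \dots, B$ is an assumption of this sketch, as are the function names and defaults.

```python
import math
import random

def order_stat(sorted_vals, q):
    """The ceil(q * B)-th order statistic (1-indexed), index clamped to 1..B."""
    B = len(sorted_vals)
    # the small offset guards against floating-point noise in q * B
    k = min(max(math.ceil(q * B - 1e-9), 1), B)
    return sorted_vals[k - 1]

def bootstrap_basic_interval(sample, statistic, alpha=0.05, B=2000, seed=0):
    """[2 theta_hat - theta*_((1 - alpha/2) B), 2 theta_hat - theta*_(alpha B / 2)]."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(sample)
    theta_hat = statistic(sample)
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))
    lower = 2 * theta_hat - order_stat(reps, 1 - alpha / 2)
    upper = 2 * theta_hat - order_stat(reps, alpha / 2)
    return lower, upper

def mean(xs):
    return sum(xs) / len(xs)

lower, upper = bootstrap_basic_interval(list(range(1, 21)), mean)
```

Note that the upper bootstrap quantile determines the lower endpoint and vice versa, which is the reflection around $\hat\theta_n$ built into the pivot.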

25 / 37


Percentile interval

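Assumption: $R_n = \hat\theta_n - \theta$ has a symmetric distribution around 0.

Because of the symmetry, $r_{\alpha/2} = -r_{1-\alpha/2}$. Hence

\[ P\big(\hat\theta_n + r_{\alpha/2} \le \theta \le \hat\theta_n + r_{1-\alpha/2}\big) = 1 - \alpha. \]

Plugging in the bootstrap estimators of $r_{\alpha/2}$ and $r_{1-\alpha/2}$, the percentile interval is given by

\[ \big[\hat\theta_n + R^*_{(\alpha B/2)}, \ \hat\theta_n + R^*_{((1-\alpha/2)B)}\big] \]

or equivalently

\[ \big[\hat\theta^*_{(\alpha B/2)}, \ \hat\theta^*_{((1-\alpha/2)B)}\big]. \]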
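In code, the percentile interval simply reads off two order statistics of the bootstrap replicates. As before, the function names, the rounding of non-integer indices and the defaults are assumptions of this sketch.

```python
import math
import random

def order_stat(sorted_vals, q):
    """The ceil(q * B)-th order statistic (1-indexed), index clamped to 1..B."""
    B = len(sorted_vals)
    k = min(max(math.ceil(q * B - 1e-9), 1), B)
    return sorted_vals[k - 1]

def bootstrap_percentile_interval(sample, statistic, alpha=0.05, B=2000, seed=0):
    """[theta*_(alpha B / 2), theta*_((1 - alpha/2) B)]."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(sample)
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))
    return order_stat(reps, alpha / 2), order_stat(reps, 1 - alpha / 2)

def mean(xs):
    return sum(xs) / len(xs)

def exp_mean(xs):
    # an increasing transformation of the mean, used to illustrate invariance
    return math.exp(mean(xs) / 10)

data = list(range(1, 21))
lower, upper = bootstrap_percentile_interval(data, mean)
lower_t, upper_t = bootstrap_percentile_interval(data, exp_mean)
```

With the same resamples, the interval for an increasing transformation of the estimator is the transformation of the original endpoints, since order statistics are preserved by increasing maps.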

Homework. The assumption can be relaxed as follows: there exists an unknown increasing transformation $h$ such that $h(\hat\theta_n) - h(\theta)$ has a symmetric distribution around 0.

26 / 37


The BC (bias-corrected) percentile method

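Assumption: there exists an unknown increasing transformation $h$ such that $h(\hat\theta_n) - h(\theta)$ is (asymptotically) distributed as $N(w, 1)$. Here $w$ is unknown, so first we estimate $w$.

\[ P(\hat\theta_n \le \theta) = P\big(h(\hat\theta_n) \le h(\theta)\big) = P\big(h(\hat\theta_n) - h(\theta) - w \le -w\big) = \Phi(-w). \]

Then $w = -\Phi^{-1}(\beta) = -z_\beta$, where $\beta = P(\hat\theta_n \le \theta)$. $\beta$ can be estimated by

\[ \frac{1}{B} \sum_{i=1}^{B} I\big(\hat\theta^*_{i,n} \le \hat\theta_n\big). \]

Thus

\[ \hat w = -\Phi^{-1}\Big(\frac{1}{B} \sum_{i=1}^{B} I\big(\hat\theta^*_{i,n} \le \hat\theta_n\big)\Big). \]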

27 / 37


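The BC (bias-corrected) percentile method

From the normality, we have

\[ P\big(z_{\alpha/2} \le h(\hat\theta_n) - h(\theta) - w \le z_{1-\alpha/2}\big) = 1 - \alpha, \]

equivalently

\[ P\big(h^{-1}(h(\hat\theta_n) - w + z_{\alpha/2}) \le \theta \le h^{-1}(h(\hat\theta_n) - w - z_{\alpha/2})\big) = 1 - \alpha. \]

Denote the lower and upper bounds of $\theta$ by $\theta_l$ and $\theta_u$, respectively. We need to estimate the bounds because $h$ is unknown.

\begin{align*}
P_{F_n}(\hat\theta_n^* \le \theta_l) &= P_{F_n}\big(h(\hat\theta_n^*) \le h(\theta_l)\big) \\
&= P_{F_n}\big(h(\hat\theta_n^*) \le h(\hat\theta_n) - w + z_{\alpha/2}\big) \\
&= P_{F_n}\big(h(\hat\theta_n^*) - h(\hat\theta_n) - w \le z_{\alpha/2} - 2w\big) && \text{(bootstrap world)} \\
&\approx P_F\big(h(\hat\theta_n) - h(\theta) - w \le z_{\alpha/2} - 2w\big) && \text{(real world)} \\
&= \Phi(z_{\alpha/2} - 2w).
\end{align*}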

28 / 37



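We have $P_{F_n}(\hat\theta_n^* \le \theta_l) \approx \Phi(z_{\alpha/2} - 2w)$. This means that $\theta_l$ is approximately the quantile of $\hat\theta_n^*$ at probability level $\Phi(z_{\alpha/2} - 2w)$. Hence it can be estimated by the corresponding empirical quantile of $\hat\theta_n^*$:

\[ \hat\theta^*_{(B\,\Phi(z_{\alpha/2} - 2\hat w))}. \]

In the same manner, we obtain the estimator of $\theta_u$. The BC bootstrap interval of $\theta$ is given by

\[ \big[\hat\theta^*_{(B\,\Phi(z_{\alpha/2} - 2\hat w))}, \ \hat\theta^*_{(B\,\Phi(z_{1-\alpha/2} - 2\hat w))}\big], \]

where $\hat w$ is the estimator of $w$ defined previously.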
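The BC interval can be sketched as below. The sign of $\hat w$ follows from $\beta = \Phi(-w)$ in the derivation above, so that $\hat w = -\Phi^{-1}(\hat\beta)$. Clipping $\hat\beta$ away from 0 and 1, rounding indices up, and all names and defaults are assumptions of this sketch.

```python
import math
import random
from statistics import NormalDist

def bootstrap_bc_interval(sample, statistic, alpha=0.05, B=2000, seed=0):
    """Bias-corrected percentile interval based on B bootstrap replicates."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(sample)
    theta_hat = statistic(sample)
    reps = sorted(statistic(rng.choices(sample, k=n)) for _ in range(B))
    std_normal = NormalDist()
    # beta_hat estimates P(theta_hat <= theta) by the share of theta* <= theta_hat;
    # it is clipped away from 0 and 1 (an added safeguard) so inv_cdf is defined
    beta_hat = sum(t <= theta_hat for t in reps) / B
    beta_hat = min(max(beta_hat, 1 / (B + 1)), B / (B + 1))
    w_hat = -std_normal.inv_cdf(beta_hat)  # from beta = Phi(-w)
    z = std_normal.inv_cdf(alpha / 2)      # z_{alpha/2} < 0 and z_{1-alpha/2} = -z

    def quantile(q):
        # ceil(q * B)-th order statistic, index clamped to 1..B
        k = min(max(math.ceil(q * B), 1), B)
        return reps[k - 1]

    return (quantile(std_normal.cdf(z - 2 * w_hat)),
            quantile(std_normal.cdf(-z - 2 * w_hat)))

def mean(xs):
    return sum(xs) / len(xs)

lower, upper = bootstrap_bc_interval(list(range(1, 21)), mean)
```

When about half of the replicates fall below $\hat\theta_n$, $\hat w$ is close to 0 and the BC interval nearly coincides with the percentile interval.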

29 / 37



What if we only know that $h(\hat\theta_n) - h(\theta)$ is (asymptotically) distributed as $N(w, \sigma^2)$?

If $\sigma$ does not depend on $h(\theta)$, then the bias-corrected percentile method can still be used. Why?

If $\sigma$ depends on $h(\theta)$, then we should use the BCa (accelerated bias-corrected bootstrap percentile) method. We do not discuss the BCa method here.

30 / 37


The studentised interval

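Consider a studentised pivot: $R_n = (\hat\theta_n - \theta)/\sigma_n$, where $\sigma_n^2 = \mathrm{Var}(\hat\theta_n)$. Let $r_{\alpha/2}$ and $r_{1-\alpha/2}$ be the corresponding quantiles of $R_n$. Then,

\[ P\big(\sigma_n r_{\alpha/2} \le \hat\theta_n - \theta \le \sigma_n r_{1-\alpha/2}\big) = 1 - \alpha. \]

Suppose $\sigma_n$ is known, or we are able to find a consistent estimator of $\sigma_n$: $\hat\sigma_n = \sigma(X_1, \dots, X_n)$ with a known function $\sigma$.

We estimate $r_{\alpha/2}$ and $r_{1-\alpha/2}$ by bootstrapping. The bootstrap-t interval is given by

\[ \big[\hat\theta_n - \hat\sigma_n R^*_{((1-\alpha/2)B)}, \ \hat\theta_n - \hat\sigma_n R^*_{(\alpha B/2)}\big], \]

where $R^*_{(j)}$ denotes the $j$-th order statistic of the sample $\{R^*_{i,n} = (\hat\theta^*_{i,n} - \hat\theta_n)/\sigma^*_{i,n}, \ i = 1, \dots, B\}$ and $\sigma^*_{i,n} = \sigma(X^*_{i,1}, \dots, X^*_{i,n})$.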
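For the sample mean, a known function $\sigma$ is available: the sample standard deviation divided by $\sqrt{n}$. The sketch below implements the bootstrap-t interval for that case only. The function names, the index rounding, the skipping of degenerate resamples and the defaults are assumptions of this illustration.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def se_mean(xs):
    # known function sigma(X_1, ..., X_n): sample sd divided by sqrt(n)
    n, m = len(xs), mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1)) / math.sqrt(n)

def order_stat(sorted_vals, q):
    """The ceil(q * B)-th order statistic (1-indexed), index clamped to 1..B."""
    B = len(sorted_vals)
    k = min(max(math.ceil(q * B - 1e-9), 1), B)
    return sorted_vals[k - 1]

def bootstrap_t_interval(sample, alpha=0.05, B=2000, seed=0):
    """Bootstrap-t interval for the mean, studentising each resample."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    n = len(sample)
    theta_hat, se_hat = mean(sample), se_mean(sample)
    pivots = []
    for _ in range(B):
        resample = rng.choices(sample, k=n)
        se_star = se_mean(resample)
        if se_star > 0:  # skip degenerate resamples in which all values coincide
            pivots.append((mean(resample) - theta_hat) / se_star)
    pivots.sort()
    lower = theta_hat - se_hat * order_stat(pivots, 1 - alpha / 2)
    upper = theta_hat - se_hat * order_stat(pivots, alpha / 2)
    return lower, upper

lower, upper = bootstrap_t_interval(list(range(1, 21)))
```

Each resample is studentised with its own standard error, which is what distinguishes this interval from the basic interval.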

31 / 37


The studentised interval

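If $\sigma_n$ is unknown, we estimate it by $\hat\sigma_{B,n} = \sqrt{\frac{1}{B} \sum_{i=1}^{B} \big(\hat\theta^*_{i,n} - \bar\theta_n^*\big)^2}$; see the normal interval.

Note that the way of estimating the quantiles of $R_n$ differs from the case where $\sigma_n$ is known. We consider

\[ R^*_{i,n} = \frac{\hat\theta^*_{i,n} - \hat\theta_n}{se^*_{i,n}}, \]

where $se^*_{i,n}$ needs to be computed for each bootstrap sample, which might require a second bootstrap within each bootstrap sample. The resulting CI is

\[ \big[\hat\theta_n - \hat\sigma_{B,n} R^*_{((1-\alpha/2)B)}, \ \hat\theta_n - \hat\sigma_{B,n} R^*_{(\alpha B/2)}\big], \]

with $R^*_{(j)}$ the $j$-th order statistic of $\{R^*_{i,n}, \ i = 1, \dots, B\}$.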

32 / 37


Accuracy

A confidence interval CI with nominal coverage $1 - \alpha$ is said to be first-order accurate if $P(\theta \in \mathrm{CI}) = 1 - \alpha + O(n^{-1/2})$, and second-order accurate if $P(\theta \in \mathrm{CI}) = 1 - \alpha + O(n^{-1})$.

Under regularity conditions ("when the bootstrap works"), the normal interval, bootstrap basic interval, percentile interval, and BC interval are first-order accurate. BCa and bootstrap-t are second-order accurate.

33 / 37


Group Presentation (March 9)

Group 8

Consider 31 measurements of polished window strength for a glass airplane window. In reliability tests, researchers often rely on parametric assumptions to characterize observed lifetimes. Please implement a composite GoF test to see whether a Weibull distribution is appropriate. Use the Cramér-von Mises test. The data are as follows.

18.830, 20.800, 21.657, 23.030, 23.230, 24.050,

24.321, 25.500, 25.520, 25.800, 26.690, 26.770,

26.780, 27.050, 27.670, 29.900, 31.110, 33.200,

33.730, 33.760, 33.890, 34.760, 35.750, 35.910,

36.980, 37.080, 37.090, 39.580, 44.045, 45.290, 45.381

The Weibull distribution function is given by

\[ F_{\beta,\gamma}(x) = \begin{cases} 0, & x < 0, \\ 1 - \exp\big(-[x/\gamma]^{\beta}\big), & x \ge 0. \end{cases} \]

This is a scale-shape family.

What is your test statistic?

Implement the parametric bootstrap to compute the p-value.

34 / 37


Group Presentation (March 9)

Group 9

Why is this method called 'bootstrap'? Please do a literature review.

In a controlled clinical trial, participants were randomly assigned to two groups: (i) Aspirin and (ii) Placebo, where the aspirin group took 325 mg of aspirin every second day. At the end of the trial, the number of participants who had suffered a myocardial infarction was assessed. The counts are given in the following table:

          MyoInf   No MyoInf   Total
Aspirin   104      10933       11037
Placebo   189      10845       11034

The risk ratio (RR), defined as the ratio of the proportions of cases (risks) in the two groups, is a popular measure for assessing results in clinical trials. From the table,

\[ RR = \frac{R_a}{R_p} = \frac{104/11037}{189/11034} = 0.55. \]

Construct a bootstrap estimate for the variability of RR.

35 / 37



Group 10

Consider the uniform distribution on $[0, \theta]$. Suppose that $X_1, \dots, X_n$ are a random sample from $U[0, \theta]$. Estimate $\theta$ with $X_{(n)}$, the maximum of the sample. Use the bootstrap to estimate the distribution of $E_n = n(X_{(n)} - \theta)$. Choose $n = 30$.

What is the limit distribution of $E_n$?

Implement non-parametric bootstrapping: draw bootstrap samples from $F_n$. Make a histogram to depict the distribution of $E_n^* = n(X^*_{(n)} - X_{(n)})$. Is it close to the limit distribution of $E_n$?

Implement parametric bootstrapping: draw bootstrap samples from $F_{\hat\theta}$, with $\hat\theta = X_{(n)}$. Make a histogram to depict the distribution of $E_n^* = n(X^*_{(n)} - X_{(n)})$. Is it close to the limit distribution of $E_n$?

36 / 37



Group 11

Suppose that $X_1, \dots, X_{50}$ are i.i.d. from $F$. How do you construct a 95% confidence interval for the mean $E(X_1)$? Consider at least the following methods: the CLT and the bootstrap confidence intervals we discussed.

5.67 5.04 2.23 2.30 1.32 0.49 0.00 0.11 0.22 1.07 3.90

2.66 0.17 1.01 1.64 3.81 2.01 1.94 0.70 0.01 0.89 0.08

0.67 2.21 1.14 0.51 0.52 0.10 4.44 1.80 0.05 0.06 0.22

0.99 0.21 0.61 1.06 6.56 0.42 1.49 1.10 1.04 3.27 0.73

3.01 5.06 0.36 0.56 1.75 5.87

37 / 37
