Intro Bootstrap

Upload: michalaki-xrisoula

Post on 04-Apr-2018


  • 7/29/2019 Intro Bootstrap (Slide 1/18)

    Introduction to the Bootstrap

    Machelle D. Wilson

  • Slide 2/18

    Outline

    Why the Bootstrap?

    Limitations of traditional statistics

    How Does it Work?

    The Empirical Distribution Function and the Plug-in Principle

    Accuracy of an estimate: Bootstrap standard error and confidence intervals

    Examples

    How Good is the Bootstrap?

  • Slide 3/18

    Limitations of Traditional Statistics: Problems with Distributional Assumptions

    Often data cannot safely be assumed to be from an identifiable distribution.

    Sometimes the distribution of the statistic is mathematically intractable, even assuming that distributional assumptions can be made.

    Hence, the bootstrap often provides a superior alternative to parametric statistics.

  • Slide 4/18

    An Example Data Set

    [Figure: histogram of 1000 bootstrapped means (mean dose; mean conc. and dose rate fixed). Red lines = bootstrap CI; black lines = normal CI.]

  • Slide 5/18

    An Example Data Set

    [Figure: histogram of 1000 bootstrapped means (mean dose; mean conc. and dose rate random). Red lines = bootstrap CI; black lines = normal CI.]

  • Slide 6/18

    Statistics in the Computer Age

    Efron and Tibshirani, 1991, in Science: "Most of our familiar statistical methods, such as hypothesis testing, linear regression, analysis of variance, and maximum likelihood estimation, were designed to be implemented on mechanical calculators. Modern electronic computation has encouraged a host of new statistical methods that require fewer distributional assumptions than their predecessors and can be applied to more complicated statistical estimators without the usual concerns for mathematical tractability."

  • Slide 7/18

    The Bootstrap Solution

    With the advent of cheap, high-powered computing, it has become relatively easy to use resampling techniques, such as the bootstrap, to estimate the distribution of sample statistics empirically rather than making distributional assumptions.

    The bootstrap resamples the data with equal probability and with replacement and calculates the statistic of interest at each resampling. The resulting histogram, mean, quantiles, and variance of the bootstrapped statistics provide an estimate of its distribution.
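The procedure just described can be sketched in Python; this is a minimal illustration, with the data vector, seed, and B chosen arbitrarily:

```python
import random
import statistics

def bootstrap(data, statistic, B=1000, seed=0):
    """Resample `data` with replacement B times and return the list of
    bootstrapped statistics, one per resample."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n points with equal probability, with replacement.
    return [statistic(rng.choices(data, k=n)) for _ in range(B)]

# Illustrative data; any numeric sample works.
x = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3]
boot_means = bootstrap(x, statistics.mean)
# The histogram, mean, quantiles, and variance of boot_means estimate
# the sampling distribution of the sample mean.
```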

  • Slide 8/18

    Example

    Take the data set {1, 2, 3}. There are 10 possible resamplings, where re-orderings are considered the same sampling:

    {1,2,3}  {1,1,2}  {1,1,3}  {2,2,1}  {2,2,3}
    {3,3,1}  {3,3,2}  {1,1,1}  {2,2,2}  {3,3,3}

    [Figure: histogram of the means of the resamples.]
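These ten resamplings can be enumerated directly: ignoring order, samples of size 3 taken with replacement are exactly what Python's `itertools.combinations_with_replacement` generates.

```python
from itertools import combinations_with_replacement

data = (1, 2, 3)
# All unordered resamples of size 3 drawn with replacement.
resamples = list(combinations_with_replacement(data, 3))
print(len(resamples))  # 10

# The means of these resamples are what the histogram on the slide shows.
means = sorted(sum(r) / 3 for r in resamples)
```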

  • Slide 9/18

    The Bootstrap Solution

    In general, the number of distinct bootstrap samples, Cn, is

        Cn = C(2n-1, n)   (the binomial coefficient "2n-1 choose n")

    Table of possible distinct bootstrap re-samplings by sample size:

    n    5    10      12        15        20         25         30
    Cn   126  92,378  1.35x10^6 7.76x10^7 6.89x10^10 6.32x10^13 5.91x10^16
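The count Cn = C(2n-1, n) can be verified with Python's `math.comb` (Python 3.8+); the loop reproduces the table:

```python
from math import comb

def n_bootstrap_samples(n):
    # Number of distinct resamples of size n drawn with replacement,
    # ignoring order: the binomial coefficient C(2n-1, n).
    return comb(2 * n - 1, n)

for n in (5, 10, 12, 15, 20, 25, 30):
    print(n, n_bootstrap_samples(n))
# n = 5 gives 126 and n = 10 gives 92,378, as in the table; n = 3
# gives the 10 resamplings of the {1, 2, 3} example.
```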

  • Slide 10/18

    The Empirical Distribution Function

    Having observed a random sample of size n, x1, x2, ..., xn, from a probability distribution F, the empirical distribution function (edf), F̂, assigns to a set A in the sample space of x its empirical probability

        F̂(A) = P̂(A) = #{xi in A} / n

  • Slide 11/18

    Example

    A random sample of 100 throws of a die yields 13 ones, 19 twos, 10 threes, 17 fours, 14 fives, and 27 sixes. Hence the edf is

        F̂(1) = 0.13    F̂(4) = 0.17
        F̂(2) = 0.19    F̂(5) = 0.14
        F̂(3) = 0.10    F̂(6) = 0.27
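A short sketch of this computation, with the counts taken from the slide:

```python
from collections import Counter

# 100 throws of a die, summarized as counts per face.
counts = Counter({1: 13, 2: 19, 3: 10, 4: 17, 5: 14, 6: 27})
n = sum(counts.values())  # 100

# The edf assigns each face its observed relative frequency.
edf = {face: counts[face] / n for face in range(1, 7)}
```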

  • Slide 12/18

    The Plug-in Principle

    It can be shown that F̂ is a sufficient statistic for F. That is, all the information about F contained in x is also contained in F̂.

    The plug-in principle estimates T(F) by T(F̂).

  • Slide 13/18

    The Plug-in Principle

    If the only information about F comes from the sample x, then T(F̂) is a minimum variance unbiased estimator of T(F).

    The bootstrap is drawing B samples from the empirical distribution to estimate B statistics of interest, T(x*).

    Hence, the bootstrap is both sampling from an edf (of the original sample) and generating an edf (of the statistic).
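For instance, if T(F) = P(X > c), the plug-in estimate T(F̂) is the sample fraction exceeding c; the data and threshold below are purely illustrative:

```python
data = [2.1, 3.4, 1.8, 4.7, 2.9, 3.3, 5.0, 2.2]
c = 3.0  # illustrative threshold

# Plug-in principle: replace F by the edf F-hat, so P(X > c) is
# estimated by the proportion of sample points exceeding c.
plug_in_estimate = sum(v > c for v in data) / len(data)
print(plug_in_estimate)  # 0.5
```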

  • Slide 14/18

    Graphical Representation of the Bootstrap

    x = {x1, x2, ..., xn}

    x*1  x*2  x*3  ...  x*B            (bootstrap resamples)

    T(x*1)  T(x*2)  T(x*3)  ...  T(x*B)    (bootstrapped statistics)

        se[T(x)] = { (1/(B-1)) * sum over b=1..B of [T(x*b) - t̄]^2 }^(1/2)

        where t̄ = (1/B) * sum over b=1..B of T(x*b)
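The schematic translates directly into code: resample, evaluate T on each resample, then apply the standard-deviation formula with t̄ the bootstrap mean. The data, seed, and B below are illustrative:

```python
import math
import random

def bootstrap_se(data, statistic, B=1000, seed=0):
    """Bootstrap standard error: the standard deviation (divisor B-1)
    of T(x*1), ..., T(x*B) around their mean t-bar."""
    rng = random.Random(seed)
    n = len(data)
    t = [statistic(rng.choices(data, k=n)) for _ in range(B)]
    t_bar = sum(t) / B
    return math.sqrt(sum((tb - t_bar) ** 2 for tb in t) / (B - 1))

x = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3]
se_mean = bootstrap_se(x, lambda s: sum(s) / len(s))
# For the sample mean this lands close to the textbook s / sqrt(n).
```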

  • Slide 15/18

    Bootstrap Standard Error and Confidence Intervals

    The bootstrap estimate of the mean is just the empirical average of the statistic over all bootstrap samples.

    The bootstrap estimate of standard error is just the empirical standard deviation of the bootstrap statistic over all bootstrap samples.

  • Slide 16/18

    Bootstrap Confidence Intervals

    The percentile interval: the bootstrap confidence interval for any statistic is simply the α/2 and 1-α/2 quantiles of its bootstrap distribution.

    For example, if B = 1000 and α = 0.05, then to construct the BS confidence interval we rank the statistics and take the 25th and the 975th values.

    There are other BS CIs, but this is the easiest and makes the fewest assumptions.
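A sketch of the percentile interval: sort the B bootstrapped statistics and read off the α/2 and 1-α/2 order statistics. The data here are illustrative:

```python
import random

def percentile_ci(data, statistic, alpha=0.05, B=1000, seed=0):
    """Percentile bootstrap CI: the alpha/2 and 1-alpha/2 empirical
    quantiles of the bootstrapped statistic."""
    rng = random.Random(seed)
    n = len(data)
    t = sorted(statistic(rng.choices(data, k=n)) for _ in range(B))
    lo = t[round(B * alpha / 2) - 1]        # the 25th sorted value when B = 1000
    hi = t[round(B * (1 - alpha / 2)) - 1]  # the 975th sorted value
    return lo, hi

x = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.3]
lo, hi = percentile_ci(x, lambda s: sum(s) / len(s))
```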

  • Slide 17/18

    Example: Bootstrap of the Median

    Go to S-PLUS.
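The live demo is in S-PLUS; an equivalent median bootstrap can be sketched in Python (the sample below is illustrative):

```python
import random
import statistics

def bootstrap_medians(data, B=1000, seed=0):
    """Bootstrap distribution of the sample median, sorted."""
    rng = random.Random(seed)
    n = len(data)
    return sorted(statistics.median(rng.choices(data, k=n)) for _ in range(B))

# Illustrative skewed sample, where a normal-theory interval for the
# median would be hard to justify.
x = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 3.0, 3.8, 5.6, 9.9]
meds = bootstrap_medians(x)
ci = (meds[24], meds[974])  # 95% percentile interval with B = 1000
```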

  • Slide 18/18

    How Good is the Bootstrap?

    The bootstrap, in most cases, is as good as the empirical distribution function.

    The bootstrap is not optimal when there is good information about F that did not come from the data, i.e., prior information or strong, valid distributional assumptions.

    The bootstrap does not work well for extreme values and needs somewhat difficult modifications for autocorrelated data such as time series.

    When all our information comes from the sample itself, we cannot do better than the bootstrap.