Bootstrap, Jackknife and other resampling methods
Part IV: Jackknife

Rozenn Dahyot
Room 128, Department of Statistics, Trinity College Dublin, Ireland
2005
R. Dahyot (TCD) 453 Modern statistical methods 2005 1 / 22
Introduction
The bootstrap method is not always the best one. One main reason is that the bootstrap samples are generated from the empirical distribution F̂ and not from the true distribution F. Can we find samples/resamples exactly generated from F?

If we look for samples of size n, then the answer is no!

If we look for samples of size m (m < n), then we can indeed find (re)samples of size m exactly generated from F, simply by looking at different subsets of our original sample x.

Looking at different subsets of our original sample amounts to sampling without replacement from the observations x_1, ..., x_n to get (re)samples (now called subsamples) of size m. This leads us to subsampling and the jackknife.
Jackknife
The jackknife was proposed by Quenouille in the mid 1950's.

In fact, the jackknife predates the bootstrap.

The jackknife (with m = n − 1) is less computer-intensive than the bootstrap.

A jackknife is a Swiss penknife, easy to carry around. By analogy, Tukey (1958) coined the term in statistics as a general approach for testing hypotheses and calculating confidence intervals.
Jackknife samples
Definition
The jackknife samples are computed by leaving out one observation x_i from x = (x_1, x_2, ..., x_n) at a time:

    x_(i) = (x_1, x_2, ..., x_{i−1}, x_{i+1}, ..., x_n)

The dimension of the jackknife sample x_(i) is m = n − 1.

There are n different jackknife samples: {x_(i)}, i = 1, ..., n.

No sampling method is needed to compute the n jackknife samples.
Available BOOTSTRAP MATLAB TOOLBOX, by Abdelhak M. Zoubir and D. Robert Iskander,http://www.csp.curtin.edu.au/downloads/bootstrap toolbox.html
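In code, the n jackknife samples come from a single leave-one-out loop; no random sampling is involved. A minimal sketch in Python (the function name jackknife_samples is ours, not part of the toolbox above):

```python
import numpy as np

def jackknife_samples(x):
    """Return the n leave-one-out jackknife samples of x, each of size n - 1."""
    x = np.asarray(x)
    return np.array([np.delete(x, i) for i in range(len(x))])

samples = jackknife_samples([1.0, 2.0, 3.0, 4.0])
# 4 samples of size 3; the i-th sample omits observation x_i:
# samples[0] is (2, 3, 4) and samples[3] is (1, 2, 3)
```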
Jackknife replications
Definition
The ith jackknife replication θ̂_(i) of the statistic θ̂ = s(x) is:

    θ̂_(i) = s(x_(i)),  ∀i = 1, ..., n

Jackknife replication of the mean:

    s(x_(i)) = (1/(n−1)) Σ_{j≠i} x_j = (n x̄ − x_i)/(n − 1) = x̄_(i)
Jackknife estimation of the standard error
1. Compute the n jackknife subsamples x_(1), ..., x_(n) from x.
2. Evaluate the n jackknife replications θ̂_(i) = s(x_(i)).
3. The jackknife estimate of the standard error is defined by:

    se_jack = [ ((n−1)/n) Σ_{i=1}^{n} (θ̂_(·) − θ̂_(i))² ]^{1/2}

where θ̂_(·) = (1/n) Σ_{i=1}^{n} θ̂_(i).
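The three steps translate directly into code. A minimal sketch, assuming the statistic is given as a plain function of the sample (jackknife_se is our own helper name):

```python
import numpy as np

def jackknife_se(x, stat):
    """Jackknife estimate of the standard error of stat(x)."""
    x = np.asarray(x)
    n = len(x)
    # Steps 1 and 2: replications on the n leave-one-out subsamples
    theta = np.array([stat(np.delete(x, i)) for i in range(n)])
    theta_dot = theta.mean()
    # Step 3: inflate by (n - 1)/n and take the square root
    return np.sqrt((n - 1) / n * np.sum((theta_dot - theta) ** 2))
```

For the mean, this agrees with the familiar σ̂/√n with σ̂ the unbiased standard deviation estimate.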
Jackknife estimation of the standard error of the mean
For θ̂ = x̄, it is easy to show that:

    x̄_(i) = (n x̄ − x_i)/(n − 1)

    x̄_(·) = (1/n) Σ_{i=1}^{n} x̄_(i) = x̄

Therefore:

    se_jack = [ Σ_{i=1}^{n} (x_i − x̄)² / ((n−1) n) ]^{1/2} = σ̂/√n

where σ̂² is the unbiased estimate of the variance.
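This identity is easy to check numerically. The sketch below uses the closed-form replication x̄_(i) = (n x̄ − x_i)/(n − 1) instead of recomputing each subsample mean:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=100)
n = len(x)

# Jackknife replications of the mean, via the closed form above
theta = (n * x.mean() - x) / (n - 1)
theta_dot = theta.mean()  # equals the sample mean x-bar

# Jackknife standard error vs. the textbook formula sigma-hat / sqrt(n)
se_jack = np.sqrt((n - 1) / n * np.sum((theta - theta_dot) ** 2))
se_formula = x.std(ddof=1) / np.sqrt(n)
```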
Jackknife estimation of the standard error
The factor (n−1)/n is much larger than the 1/(B−1) used in the bootstrap.

Intuitively, this inflation factor is needed because the jackknife deviations (θ̂_(i) − θ̂_(·))² tend to be smaller than the bootstrap deviations (θ̂*(b) − θ̂*(·))² (each jackknife sample is more similar to the original data x than a bootstrap sample is).

In fact, the factor (n−1)/n is derived by considering the special case θ̂ = x̄ (a somewhat arbitrary convention).
Comparison of Jackknife and Bootstrap on an example
Example A: θ̂ = x̄

F(x) = 0.2 N(µ=1, σ=2) + 0.8 N(µ=6, σ=1), x = (x_1, ..., x_100).

Bootstrap standard error and bias w.r.t. the number B of bootstrap samples:

B        10       20       50       100      500      1000     10000
se_B     0.1386   0.2188   0.2245   0.2142   0.2248   0.2212   0.2187
Bias_B   0.0617   -0.0419  0.0274   -0.0087  -0.0025  0.0064   0.0025

Jackknife: se_jack = 0.2207 and Bias_jack = 0.

Using textbook formulas: se_F̂ = σ̄/√n = 0.2196 (plug-in standard deviation σ̄); with the unbiased estimate, σ̂/√n = 0.2207.
Jackknife estimation of the bias
1. Compute the n jackknife subsamples x_(1), ..., x_(n) from x.
2. Evaluate the n jackknife replications θ̂_(i) = s(x_(i)).
3. The jackknife estimate of the bias is defined as:

    Bias_jack = (n − 1)(θ̂_(·) − θ̂)

where θ̂_(·) = (1/n) Σ_{i=1}^{n} θ̂_(i).
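Once the replications are available, the bias estimate is one line. A minimal sketch (jackknife_bias is our own helper name):

```python
import numpy as np

def jackknife_bias(x, stat):
    """Jackknife estimate of the bias of stat(x)."""
    x = np.asarray(x)
    n = len(x)
    theta_hat = stat(x)
    theta = np.array([stat(np.delete(x, i)) for i in range(n)])
    return (n - 1) * (theta.mean() - theta_hat)

# The mean is unbiased: its jackknife bias estimate is 0 (up to rounding)
x = np.array([1.0, 3.0, 7.0, 9.0])
print(jackknife_bias(x, np.mean))
```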
Jackknife estimation of the bias
Note the inflation factor (n − 1) (compared to the bootstrap bias estimate).

θ̂ = x̄ is unbiased, so the correspondence is made by considering the plug-in estimate of the variance, σ̄² = Σ_{i=1}^{n} (x_i − x̄)² / n.

The jackknife estimate of the bias for the plug-in estimate of the variance is then:

    Bias_jack = −σ̂²/n

where σ̂² is the unbiased variance estimate.
Histogram of the replications
Example A
[Figure: Histograms of the bootstrap replications {θ̂*(b)}, b = 1, ..., B = 1000 (left), and the jackknife replications {θ̂_(i)}, i = 1, ..., n = 100 (right).]
Histogram of the replications
Example A
[Figure: Histograms of the bootstrap replications {θ̂*(b)}, b = 1, ..., B = 1000 (left), and the inflated jackknife replications {√(n−1) (θ̂_(i) − θ̂_(·)) + θ̂_(·)}, i = 1, ..., n = 100 (right).]
Relationship between jackknife and bootstrap
When n is small, it is easier (faster) to compute the n jackknife replications.

However, the jackknife uses less information (fewer samples) than the bootstrap.

In fact, the jackknife is an approximation to the bootstrap!
Relationship between jackknife and bootstrap
Consider a linear statistic:

    θ̂ = s(x) = µ + (1/n) Σ_{i=1}^{n} α(x_i) = µ + (1/n) Σ_{i=1}^{n} α_i

Mean θ̂ = x̄: the mean is linear, with µ = 0 and α(x_i) = α_i = x_i, ∀i ∈ {1, ..., n}.

There is no loss of information in using the jackknife to compute the standard error (compared to the bootstrap) for a linear statistic. Indeed, knowledge of the n jackknife replications {θ̂_(i)} gives the value of θ̂ for any bootstrap data set.

For non-linear statistics, the jackknife makes a linear approximation to the bootstrap for the standard error.
Relationship between jackknife and bootstrap
Consider a quadratic statistic:

    θ̂ = s(x) = µ + (1/n) Σ_{i=1}^{n} α(x_i) + (1/n²) Σ_{i,j} β(x_i, x_j)

Variance θ̂ = σ̄²: the plug-in variance σ̄² = (1/n) Σ_{i=1}^{n} (x_i − x̄)² is a quadratic statistic.

Again, knowledge of the n jackknife replications {θ̂_(i)} gives the value of θ̂ for any bootstrap data set. The jackknife and bootstrap estimates of the bias agree for quadratic statistics.
Relationship between jackknife and bootstrap
The Law School example: θ̂ = corr(x, y).

The correlation is a non-linear statistic.

From B = 3200 bootstrap replications, se_{B=3200} = 0.132.

From n = 15 jackknife replications, se_jack = 0.1425.

Textbook formula: se_F̂ = (1 − corr²)/√(n − 3) = 0.1147.
Failure of the jackknife
The jackknife can fail if the estimate θ̂ is not smooth (i.e. a small change in the data can cause a large change in the statistic). A simple non-smooth statistic is the median.

On the mouse data:

Compute the jackknife replications of the median for xCont = (10, 27, 31, 40, 46, 50, 52, 104, 146) (control group data). You should find 48, 48, 48, 48, 45, 43, 43, 43, 43 (a).

Only three different values appear, as a consequence of the lack of smoothness of the median (b).

(a) The median of an even number of data points is the average of the middle 2 values.
(b) The median is not a differentiable function of x.
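The replications quoted above are quick to reproduce. A minimal sketch:

```python
import numpy as np

x_cont = np.array([10, 27, 31, 40, 46, 50, 52, 104, 146])  # control group

# Leave-one-out medians: each subsample has 8 points, so the median is
# the average of its 4th and 5th order statistics
reps = [np.median(np.delete(x_cont, i)) for i in range(len(x_cont))]
print(reps)  # [48.0, 48.0, 48.0, 48.0, 45.0, 43.0, 43.0, 43.0, 43.0]
```

Only three distinct values appear among the nine replications, illustrating the non-smoothness of the median.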
Delete-d Jackknife samples
Definition
The delete-d jackknife subsamples are computed by leaving out d observations from x at a time.

The dimension of each subsample is n − d.

The number of possible subsamples rises to C(n, d) = n! / (d!(n−d)!).

Choice: √n < d < n − 1.
Delete-d jackknife
1. Compute all C(n, d) delete-d jackknife subsamples x_(1), ..., x_(C(n,d)) from x.
2. Evaluate the jackknife replications θ̂_(i) = s(x_(i)).
3. Estimation of the standard error (when n = r · d):

    se_{d-jack} = [ (r / C(n, d)) Σ_i (θ̂_(i) − θ̂_(·))² ]^{1/2}

where θ̂_(·) = Σ_i θ̂_(i) / C(n, d).
Concluding remarks
The inconsistency of the jackknife with non-smooth statistics can be fixed using delete-d jackknife subsamples.

The subsamples (jackknife or delete-d jackknife) are actually samples (of smaller size) from the true distribution F, whereas resamples (bootstrap) are samples from F̂.
Summary
Bias and standard error estimates have been introduced using jackknife replications.

The jackknife standard error estimate is a linear approximation of the bootstrap standard error.

The jackknife bias estimate is a quadratic approximation of the bootstrap bias.

Using smaller subsamples (delete-d jackknife) can improve performance for non-smooth statistics such as the median.