Conditional Predictive Inference Post Model Selection

Hannes Leeb
Department of Statistics, Yale University

Model Selection Workshop, Vienna, July 25, 2008
Hannes Leeb Conditional Predictive Inference Post Model Selection
Introduction Setting Goal 1 Goal 2 Conclusion
Problem statement

Predictive inference post model selection in a setting with large dimension and (comparatively) small sample size.

Example: Stenbakken & Souders (1987, 1991): Predict performance of D/A converters. Select 64 explanatory variables from a total of 8,192 based on a sample of size 88.

Features of this example:

Large number of candidate models
Selected model is complex in relation to sample size
Focus on predictive performance and inference, not on correctness
Model is selected and fitted to the data once and then used repeatedly for prediction
Problem studied here:

Given a training sample of size n and a collection M of candidate models, find a ‘good’ model m ∈ M and conduct predictive inference based on the selected model, conditional on the training sample. Features:

#M ≫ n, i.e., potentially many candidate models
|m| ∼ n, i.e., potentially complex candidate models
no strong regularity conditions
Overview of results
We consider a model selector and a prediction interval post model selection (that are based on a variant of generalized cross-validation) in linear regression with random design.

For Gaussian data we show:

The prediction interval is ‘approximately valid and short’ conditional on the training sample, except on an event whose probability is less than

C₁ #M exp[−C₂ (n − |M|)],

where #M denotes the number of candidate models, and |M| denotes the number of parameters in the most complex candidate model. This finite-sample result holds uniformly over all data-generating processes that we consider.
The data-generating process
Gaussian linear model with random design

Consider a response y that is related to a (possibly infinite) number of explanatory variables x_j, j ≥ 1, by

y = Σ_{j≥1} x_j θ_j + u    (1)

with x_1 = 1. Assume that u has mean zero and is uncorrelated with the x_j’s. Moreover, assume that the x_j’s for j > 1 and u are jointly non-degenerate Gaussian, such that the sum converges in L².
The unknown parameters here are θ, the variance of u, as well as the means and the variance/covariance structure of the x_j’s.
No further regularity conditions are imposed.
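The data-generating process (1) is easy to simulate once the infinite sum is truncated. A minimal sketch, assuming a finite number p of regressors and illustrative choices for θ and the error variance (none of these numbers come from the talk):

```python
import numpy as np

# Simulate a truncated version of model (1): y = sum_j x_j * theta_j + u.
# The truncation at p = 50, the decay theta_j = 1/j, and sigma_u = 1.0
# are illustrative assumptions, not values from the talk.
rng = np.random.default_rng(0)

n, p = 100, 50
theta = 1.0 / np.arange(1, p + 1)          # hypothetical coefficients
sigma_u = 1.0

# Design: x_1 = 1 (intercept); the remaining x_j's jointly Gaussian.
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
u = sigma_u * rng.standard_normal(n)       # error, independent of the x_j's
Y = X @ theta + u                          # response as in (1)

print(X.shape, Y.shape)
```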
The candidate models and predictors
Consider a sample (X, Y) of n independent realizations of (x, y) as in (1), and a collection M of candidate models. Each model m ∈ M is assumed to satisfy |m| < n − 1. Each model m is fit to the data by least-squares. Given a new set of explanatory variables x^(f), the corresponding response y^(f) is predicted by

ŷ^(f)(m) = Σ_{j≥1} x_j^(f) θ̂_j(m)

when using model m. Here, (x^(f), y^(f)) is another independent realization from (1), and θ̂(m) is the restricted least-squares estimator corresponding to m.
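The restricted least-squares predictor can be sketched as follows; the helper names (fit_model, predict) and the synthetic data are ours, not from the talk:

```python
import numpy as np

# Fit a candidate model m by least squares using only the columns in m,
# then predict y^(f) from a new covariate vector x^(f).
rng = np.random.default_rng(1)

def fit_model(X, Y, m):
    """Restricted least-squares estimate: coefficients for the columns in m."""
    theta_m, *_ = np.linalg.lstsq(X[:, m], Y, rcond=None)
    return theta_m

def predict(x_f, m, theta_m):
    """Predicted response for a new covariate vector x_f under model m."""
    return x_f[m] @ theta_m

n, p = 88, 10
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Y = X @ rng.standard_normal(p) + rng.standard_normal(n)

m = [0, 1, 2]                      # a candidate model: intercept plus two regressors
theta_m = fit_model(X, Y, m)
x_f = np.concatenate([[1.0], rng.standard_normal(p - 1)])
y_hat = predict(x_f, m, theta_m)
print(float(y_hat))
```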
Two goals
(i) Select a ‘good’ model from M for prediction out-of-sample, and (ii) conduct predictive inference based on the selected model, both conditional on the training sample.

Two Quantities of Interest

For m ∈ M, let ρ²(m) denote the conditional mean-squared error of the predictor ŷ^(f)(m) given the training sample, i.e.,

ρ²(m) = E[ (y^(f) − ŷ^(f)(m))² | X, Y ].

For m ∈ M, the conditional distribution of the prediction error ŷ^(f)(m) − y^(f) given the training sample is

ŷ^(f)(m) − y^(f) | X, Y ∼ N(ν(m), δ²(m)) ≡ L(m).

Note that ρ²(m) = ν²(m) + δ²(m).
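Since ρ²(m) conditions on the training sample, for a fixed fitted model it can be approximated by Monte Carlo over fresh draws of (x^(f), y^(f)). A sketch with illustrative parameters:

```python
import numpy as np

# Approximate rho^2(m), the conditional mean-squared prediction error given
# the training sample, by averaging over many fresh draws (x_f, y_f).
# The data-generating parameters below are illustrative.
rng = np.random.default_rng(2)

n, p = 200, 5
theta = np.array([1.0, 0.5, 0.25, 0.0, 0.0])
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Y = X @ theta + rng.standard_normal(n)

m = [0, 1, 2]                                  # candidate model
theta_m, *_ = np.linalg.lstsq(X[:, m], Y, rcond=None)

# Fresh test draws, conditional on the fitted theta_m (i.e., on X, Y).
n_mc = 100_000
X_f = np.column_stack([np.ones(n_mc), rng.standard_normal((n_mc, p - 1))])
Y_f = X_f @ theta + rng.standard_normal(n_mc)
rho2_m = np.mean((Y_f - X_f[:, m] @ theta_m) ** 2)
print(rho2_m)   # roughly sigma^2(m) = 1 here, since the excluded coefficients are zero
```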
A useful observation
Write σ²(m) for the conditional variance of the response given those explanatory variables that are included in model m, i.e.,

σ²(m) = Var[ y | x_j included in model m, j ≥ 1 ].

Lemma

δ²(m) ∼ σ²(m) (1 + χ²_{|m|−1} / χ²_{n−|m|+1}),

where the χ²-random variables are independent. Similarly,

ν²(m) ∼ (1/n) σ²(m) (1 + χ²_{|m|−1} / χ²_{n−|m|+1}),

and σ̂²(m) = RSS(m)/(n − |m|) ∼ σ²(m) χ²_{n−|m|} / (n − |m|).

[The Lemma extends Theorem 1.3 of Breiman & Freedman (1983).]
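The last claim of the Lemma, that σ̂²(m) behaves like σ²(m) χ²_{n−|m|}/(n − |m|), can be checked by simulation; the setup below (a response independent of a Gaussian design, so that σ²(m) = 1) is our own illustrative choice:

```python
import numpy as np

# With Gaussian random design, sigma_hat^2(m) = RSS(m)/(n - |m|) should
# behave like sigma^2(m) * chi^2_{n-|m|} / (n - |m|). Here y is drawn
# independently of X, so sigma^2(m) = Var[y] = 1. Parameters illustrative.
rng = np.random.default_rng(3)

n, k, reps = 60, 5, 2000         # k = |m|
sigma2_hat = np.empty(reps)
for r in range(reps):
    X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
    y = rng.standard_normal(n)   # independent of X
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ coef) ** 2)
    sigma2_hat[r] = rss / (n - k)

# chi^2_{n-k}/(n-k) has mean 1 and variance 2/(n-k).
print(sigma2_hat.mean(), sigma2_hat.var())
```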
Estimators for ρ²(m)

Note that

E[ρ²(m)] = σ²(m) · (n − 2)/(n − 1 − |m|) · (1 + 1/n).

The Sp criterion (Tukey, 1967):

Sp(m) = σ̂²(m) · (n − 2)/(n − 1 − |m|).

The GCV criterion (Craven & Wahba, 1978):

GCV(m) = σ̂²(m) · n/(n − |m|).

An auxiliary criterion:

ρ̂²(m) = σ̂²(m) · n/(n + 1 − |m|).
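All three criteria are simple functions of RSS(m), n, and |m|; a sketch (the input numbers are illustrative):

```python
# Sp, GCV, and the auxiliary criterion rho_hat^2 from this slide, computed
# via sigma_hat^2(m) = RSS(m)/(n - |m|). Inputs are illustrative.
def criteria(rss, n, k):
    s2 = rss / (n - k)                  # sigma_hat^2(m)
    sp = s2 * (n - 2) / (n - 1 - k)     # Tukey's Sp
    gcv = s2 * n / (n - k)              # generalized cross-validation
    rho2 = s2 * n / (n + 1 - k)         # auxiliary criterion rho_hat^2(m)
    return sp, gcv, rho2

sp, gcv, rho2 = criteria(rss=80.0, n=100, k=10)
print(sp, gcv, rho2)
```

All three agree to first order when |m|/n is small; they differ in how strongly they penalize complex models.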
Performance of ρ̂²(m)

Want: ρ̂²(m)/ρ²(m) ≈ 1 or, equivalently, log ρ̂²(m)/ρ²(m) ≈ 0 with high probability.

Theorem

For each ε > 0, we have

P( |log ρ̂²(m)/ρ²(m)| > ε ) ≤ 6 exp[ −(n − |m|) ε² / (8(ε + 8)) ],

for each sample size n and uniformly over all data-generating processes as in (1).
A similar result holds for the absolute difference |ρ̂²(m) − ρ²(m)|, uniformly over all data-generating processes with bounded variance, i.e., where Var[y] ≤ s² (with an upper bound of the form C₁ exp[−(n − |m|) C(ε, s²)]; here s² is a fixed constant).
Method of proof: Chernoff’s method or variations thereof (Gaussian case); Marchenko-Pastur law (non-Gaussian case).
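Reading the Theorem's displayed bound as 6 exp[−(n − |m|) ε²/(8(ε + 8))], one can tabulate when it becomes informative; the function name and the example values are ours:

```python
import math

# Evaluate the Theorem's bound (as read from the slide) for a few values
# of n - |m|. The bound is only nontrivial once n - |m| is large relative
# to 1/eps^2; for small samples it exceeds 1 and is vacuous.
def theorem_bound(n, m_size, eps):
    return 6.0 * math.exp(-(n - m_size) * eps**2 / (8.0 * (eps + 8.0)))

for n_minus_m in (10, 100, 1000, 10000):
    print(n_minus_m, theorem_bound(n_minus_m, 0, 0.5))
```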
Selecting the empirically best model
Write m* and m̂ for the truly best and the empirically best candidate model, i.e.,

m* = argmin_{m∈M} ρ²(m) and m̂ = argmin_{m∈M} ρ̂²(m).

Moreover, write |M| for the number of parameters in the most complex candidate model.

Corollary

For each fixed sample size n and uniformly over all data-generating processes as in (1), we have

P( log ρ²(m̂)/ρ²(m*) > ε ) ≤ 6 exp[ log #M − (n − |M|) ε² / (16(ε + 16)) ],

P( |log ρ̂²(m̂)/ρ²(m̂)| > ε ) ≤ 6 exp[ log #M − (n − |M|) ε² / (8(ε + 8)) ],

for each ε > 0.
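Selecting the empirically best model m̂ amounts to minimizing ρ̂²(m) over the candidate collection. A sketch over a small nested family (the data and the candidate set are illustrative):

```python
import numpy as np

# Compute rho_hat^2(m) for every candidate model and keep the empirically
# best one, m_hat. The nested candidate list and the data are illustrative.
rng = np.random.default_rng(4)

n, p = 120, 6
theta = np.array([1.0, 0.8, 0.6, 0.0, 0.0, 0.0])
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
Y = X @ theta + rng.standard_normal(n)

# Nested candidates: first column only, first two columns, ..., all p.
candidates = [list(range(k)) for k in range(1, p + 1)]

def rho_hat2(m):
    coef, *_ = np.linalg.lstsq(X[:, m], Y, rcond=None)
    rss = np.sum((Y - X[:, m] @ coef) ** 2)
    return rss / (n - len(m)) * n / (n + 1 - len(m))

m_hat = min(candidates, key=rho_hat2)
print(m_hat)
```

With the strong signals above, m̂ should retain the three truly nonzero coefficients; whether it also picks up a noise variable depends on the sample.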
Other model selectors
Consider AIC (Akaike, 1969), AICc (Hurvich & Tsai, 1989), FPE (Akaike, 1970), and BIC (Schwarz, 1978). Taking the exponential of the objective functions of AIC, AICc, and BIC, and using the fact that GCV(m) = (1/n) RSS(m)/(1 − |m|/n)² ≈ ρ̂²(m), we get

AIC(m) = (1/n) RSS(m) e^{2|m|/n} ≈ ρ̂²(m) e^{2|m|/n} (1 − |m|/n)²

AICc(m) = (1/n) RSS(m) e^{2(|m|−1)/(n−|m|−2)} ≈ ρ̂²(m) e^{2(|m|−1)/(n−|m|−2)} (1 − |m|/n)²

FPE(m) = (1/n) RSS(m) (1 + |m|/n)/(1 − |m|/n) ≈ ρ̂²(m) (1 + |m|/n)(1 − |m|/n)

BIC(m) = (1/n) RSS(m) e^{log(n)|m|/n} ≈ ρ̂²(m) n^{|m|/n} (1 − |m|/n)².
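These exponentiated selectors are again simple functions of RSS(m), n, and |m|; a sketch (the AICc exponent is transcribed as 2(|m| − 1)/(n − |m| − 2) from the slide, and the inputs are illustrative):

```python
import numpy as np

# The rescaled (exponentiated) selectors from this slide. Inputs illustrative.
def selectors(rss, n, k):
    base = rss / n
    aic = base * np.exp(2 * k / n)
    aicc = base * np.exp(2 * (k - 1) / (n - k - 2))   # exponent as on the slide
    fpe = base * (1 + k / n) / (1 - k / n)
    bic = base * np.exp(np.log(n) * k / n)            # = base * n**(k/n)
    gcv = base / (1 - k / n) ** 2
    return aic, aicc, fpe, bic, gcv

vals = selectors(rss=80.0, n=100, k=10)
print(vals)
```

For small |m|/n all of these behave like GCV times a model-size penalty; BIC penalizes dimension most heavily when n is large.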
Simulation Scenario I
Consider one sample of size n = 1300 from (1) with E[xj ] = 0,
E[xixj ] = δi,j , and E[u2] = 1.
Hannes Leeb Conditional Predictive Inference Post Model Selection
![Page 35: Conditional Predictive Inference Post Model Selection...Predictive inference post model selection in setting with large dimension and (comparatively) small sample size. Problem studied](https://reader030.vdocuments.us/reader030/viewer/2022041116/5f286f074133384a7669a1fa/html5/thumbnails/35.jpg)
Introduction Setting Goal 1 Goal 2 Conclusion
Simulation Scenario I
Consider one sample of size n = 1300 from (1) with E[xj ] = 0,
E[xixj ] = δi,j , and E[u2] = 1.The first 1000 components of θ are shown (in absolute value) below, theremaining components are zero:
Hannes Leeb Conditional Predictive Inference Post Model Selection
![Page 36: Conditional Predictive Inference Post Model Selection...Predictive inference post model selection in setting with large dimension and (comparatively) small sample size. Problem studied](https://reader030.vdocuments.us/reader030/viewer/2022041116/5f286f074133384a7669a1fa/html5/thumbnails/36.jpg)
Introduction Setting Goal 1 Goal 2 Conclusion
Simulation Scenario I

Consider one sample of size n = 1300 from (1) with E[x_j] = 0, E[x_i x_j] = δ_{i,j}, and E[u²] = 1. The first 1000 components of θ are shown (in absolute value) below; the remaining components are zero.

The non-zero coefficients of θ are 'sparse': most are small, but there are a few groups of adjacent large coefficients.

Choose candidate models that can pick out the few important groups: divide the first 1000 coefficients of θ into 20 consecutive blocks of equal length and consider all candidate models that include or exclude one block at a time, resulting in 2^20 candidate models.

The model space is searched using a general-to-specific greedy strategy.
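The design and search described above can be sketched as follows. This is a minimal illustration, not the talk's actual code: the dimensions are scaled down (n = 260, p = 200, keeping n ≈ 1.3p) so it runs in seconds, the locations of the large coefficient groups are assumed (the slides plot θ but do not list it), and an FPE-style prediction-risk estimate stands in for the talk's selection criterion ρ̂²(m), which is defined on earlier slides not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions scaled down from the slides (n = 1300, p = 1000) for speed.
n, p, n_blocks = 260, 200, 20
block_len = p // n_blocks  # 10

# Hypothetical 'sparse' theta: mostly tiny coefficients with two groups
# of adjacent large ones (assumed locations, for illustration only).
theta = 0.01 * rng.standard_normal(p)
theta[20:30] += 1.0    # block 2
theta[120:130] += 1.0  # block 12

X = rng.standard_normal((n, p))         # E[x_j] = 0, E[x_i x_j] = delta_{i,j}
y = X @ theta + rng.standard_normal(n)  # E[u^2] = 1

def fpe(blocks):
    """FPE-style risk estimate RSS/(n-k) * (1 + k/n) for the least-squares
    fit that uses only the given blocks of regressors (stand-in criterion)."""
    cols = np.concatenate([np.arange(b * block_len, (b + 1) * block_len)
                           for b in sorted(blocks)])
    beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    rss = float(np.sum((y - X[:, cols] @ beta) ** 2))
    k = len(cols)
    return rss / (n - k) * (1 + k / n)

# General-to-specific greedy search: start from the full model and keep
# dropping the block whose removal most improves the criterion.
current = set(range(n_blocks))
while len(current) > 1:
    best_score, best_drop = fpe(current), None
    for b in sorted(current):
        score = fpe(current - {b})
        if score < best_score:
            best_score, best_drop = score, b
    if best_drop is None:
        break  # no single-block deletion improves the criterion
    current.remove(best_drop)

print(sorted(current))
```

Under this setup the greedy search retains the two blocks carrying the large coefficient groups and discards most of the noise blocks, which is the behavior the block structure of the candidate set is designed to exploit.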
Results for X Gaussian, u Gaussian: Runs 1–4 (figures).

Results for X Exponential, u Bernoulli (scaled and centered): Run 1 (figure).

Results for X Bernoulli, u Exponential (scaled and centered): Run 1 (figure).
Simulation Scenario II

Consider the same setting as in Scenario I, but instead of a parameter θ that is 'sparse,' consider a case where none of the candidate models fits particularly well. The first 1000 components of θ are shown (in absolute value) below; the remaining components are zero.

Results for X Gaussian, u Gaussian: Run 1 (figure).

Results for X Exponential, u Bernoulli (scaled and centered): Run 1 (figure).

Results for X Bernoulli, u Exponential (scaled and centered): Run 1 (figure).
Predictive Inference based on model m

Idea: Estimate the conditional distribution of the prediction error, i.e., L(m) ≡ N(ν(m), δ²(m)), by

L̂(m) ≡ N(0, δ̂²(m)),

where δ̂²(m) is defined as ρ̂²(m) before.

Theorem
For each fixed sample size n and uniformly over all data-generating processes as in (1), we have

P( ‖L̂(m) − L(m)‖_TV > 1/√n + ε ) ≤ 7 exp[ −(n − |m|)/2 · ε²/(ε + 2) ]

for each ε ≤ log(2) ≈ 0.69.
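The rate in the theorem can be made concrete by plugging in the Scenario I dimensions. The sketch below evaluates the right-hand side for n = 1300 and |m| = 100, the size of a model consisting of two blocks of 50 coefficients (an assumed value, for illustration):

```python
import math

def tv_bound(n, m, eps):
    """Right-hand side of the theorem: 7 exp(-((n - |m|)/2) * eps^2 / (eps + 2))."""
    assert eps <= math.log(2)  # the theorem requires eps <= log 2
    return 7 * math.exp(-((n - m) / 2) * eps**2 / (eps + 2))

# Scenario I dimensions: n = 1300; |m| = 100 (two blocks of 50, assumed).
for eps in (0.1, 0.2, 0.5):
    print(f"eps={eps}: P(TV error > 1/sqrt(n) + eps) <= {tv_bound(1300, 100, eps):.2e}")
```

Even for moderate ε the bound is already tiny at these dimensions (for ε = 0.2 it is of order 10⁻⁴), consistent with the claim that the estimated conditional distribution is close in total variation with high probability.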
Prediction intervals post model selection

Recall that y_(f)(m) − y_(f) | X, Y ∼ N(ν(m), δ²(m)) ≡ L(m) for each m ∈ M. Based on model m, the 'prediction interval'

y_(f)(m) − ν(m) ± q_{α/2} δ(m)

has coverage probability 1 − α conditional on the training sample X, Y, but is infeasible.

In terms of the width of this interval, the 'best' model is one that minimizes δ(m). Set

m° = argmin_{m∈M} δ²(m).

For fixed m ∈ M, a feasible prediction interval is

I(m): y_(f)(m) ± q_{α/2} δ̂(m).
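Constructing the feasible interval I(m) for a fixed model m can be sketched as below. This is an illustration under assumed values: θ and the dimensions are made up, and the textbook Gaussian prediction standard error stands in for the talk's δ̂(m), which is defined via ρ̂(m) on earlier slides not reproduced here.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
n, k, alpha = 200, 5, 0.05

# Training sample (X, Y) from a linear model restricted to k regressors;
# theta is an illustrative choice, not from the slides.
X = rng.standard_normal((n, k))
theta = np.array([1.0, -0.5, 0.0, 0.3, 0.0])
y = X @ theta + rng.standard_normal(n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = float(np.sum((y - X @ beta) ** 2))

x_f = rng.standard_normal(k)   # features of the future observation
y_hat = float(x_f @ beta)      # point prediction y_(f)(m)

# Stand-in for delta-hat(m): the textbook standard error of a new response,
# sqrt((1 + x_f' (X'X)^{-1} x_f) * RSS / (n - k)).
XtX_inv = np.linalg.inv(X.T @ X)
delta_hat = float(np.sqrt((1.0 + x_f @ XtX_inv @ x_f) * rss / (n - k)))

q = NormalDist().inv_cdf(1 - alpha / 2)  # q_{alpha/2}, about 1.96
lo, hi = y_hat - q * delta_hat, y_hat + q * delta_hat

# Monte Carlo check of coverage conditional on (X, Y) and x_f.
y_f = float(x_f @ theta) + rng.standard_normal(100_000)
coverage = float(np.mean((lo <= y_f) & (y_f <= hi)))
print(f"I(m) = ({lo:.3f}, {hi:.3f}), coverage ~ {coverage:.3f}")
```

The Monte Carlo coverage lands near the nominal 1 − α, which is the sense in which the feasible interval mimics the infeasible one when δ̂(m) is a good estimate of δ(m).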
Prediction interval is approx. valid & adaptive

Proposition
For each ε ≤ log 2 and each fixed sample size n, we have

| (1 − α) − P( y_(f) ∈ I(m̂) | X, Y ) | ≤ 1/√n + ε

and

| log( δ(m̂)/δ(m°) ) | ≤ ε,

except on an event whose probability is not larger than

11 exp[ log #M − (n − |M|)/2 · ε²/(ε + 2) ],

uniformly over all data-generating processes as in (1).
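The exceptional-event bound can likewise be evaluated at the Scenario I dimensions. In the sketch below, #M = 2^20 and n = 1300 come from the simulation setup, while |M| is read as the dimension of the largest candidate model (1000 regressors), which is an assumption about the notation:

```python
import math

def uniform_bound(n, M_dim, n_models, eps):
    """The exceptional-event bound 11 exp(log #M - ((n - |M|)/2) * eps^2 / (eps + 2))."""
    return 11 * math.exp(math.log(n_models) - ((n - M_dim) / 2) * eps**2 / (eps + 2))

# Scenario I: 2**20 candidate models, n = 1300, |M| = 1000 (assumed reading).
for eps in (0.3, 0.5, math.log(2)):
    print(f"eps={eps:.3f}: exceptional probability <= {uniform_bound(1300, 1000, 2**20, eps):.2e}")
```

At these dimensions the log #M term (about 13.9) dominates for small ε, so the bound only becomes small for ε near its upper limit log 2, where it drops to order 10⁻⁵: with 2^20 candidate models and only 300 'spare' observations, uniformity over the model space is expensive but still attainable.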
Conclusion

Caution: The 'large p / small n' behavior of model selectors can be markedly different from their properties for 'small p / large n'.

Proof of concept: the two goals are achievable. In 'large p / small n' settings and under minimal assumptions, good models can be found, and the resulting prediction intervals post model selection are approximately valid and adaptive (in finite samples, with high probability, uniformly over all data-generating processes considered).