Robust strategies and model selection
Stefan Van Aelst
Department of Applied Mathematics and Computer Science, Ghent University, Belgium
ERCIM09 - COMISEF/COST Tutorial
Outline
1. Regression model
2. Least squares
3. Manual variable selection approach
4. Automatic variable selection approach
5. Robustness
6. Robust variable selection: sequencing
7. Robust variable selection: segmentation
Robust selection procedures Stefan Van Aelst 2
Regression model
Regression setting
Consider a dataset $Z_n = \{(y_i, x_{i1}, \dots, x_{id}) = (y_i, x_i);\ i = 1, \dots, n\} \subset \mathbb{R}^{d+1}$.

$Y$ is the response variable

$X_1, \dots, X_d$ are the candidate regressors

The corresponding linear model is
$$y_i = \beta_1 x_{i1} + \cdots + \beta_d x_{id} + \epsilon_i = x_i^t \beta + \epsilon_i, \qquad i = 1, \dots, n,$$
where the errors $\epsilon_i$ are assumed to be iid with $E(\epsilon_i) = 0$ and $\mathrm{Var}(\epsilon_i) = \sigma^2 > 0$.

Estimate the regression coefficients $\beta$ from the data.
Least squares
Least squares solution
$\hat\beta_{LS}$ solves
$$\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^t \beta\right)^2$$

Write $X = (x_1, \dots, x_n)^t$ and $y = (y_1, \dots, y_n)^t$. Then $\hat\beta_{LS}$ solves
$$\min_{\beta}\ (y - X\beta)^t (y - X\beta)$$
$$\Rightarrow\quad \hat\beta_{LS} = (X^t X)^{-1} X^t y$$
and the fitted values are
$$\hat y = X\hat\beta_{LS} = X (X^t X)^{-1} X^t y = H y.$$
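The closed-form solution can be checked numerically; a minimal numpy sketch (the toy data and variable names are illustrative, not from the slides):

```python
import numpy as np

# Toy data (illustrative): n = 50 observations, d = 3 candidate regressors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
beta_true = np.array([2.0, 0.0, -1.0])
y = X @ beta_true + 0.1 * rng.normal(size=50)

# beta_LS = (X'X)^{-1} X'y, computed by solving the normal equations
# (numerically preferable to forming the inverse explicitly).
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values via the hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
```

Note that $H$ is a projection matrix, so applying it twice changes nothing.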
Least squares
Least squares properties
Unbiased estimator: $E(\hat\beta_{LS}) = \beta$

Gauss-Markov theorem: LS has the smallest variance among all unbiased linear estimators of $\beta$.
Why do variable selection?
Least squares
Expected prediction error
Assume the true regression function is linear: $Y|x = f(x) + \epsilon = x^t\beta + \epsilon$

Predict the response $Y_0$ at $x_0$: $Y_0 = x_0^t\beta + \epsilon_0 = f(x_0) + \epsilon_0$

Use an estimator $\hat\beta$ of the regression coefficients

Estimated prediction: $\hat f(x_0) = x_0^t\hat\beta$

Expected prediction error: $E[(Y_0 - \hat f(x_0))^2]$
Least squares
Expected prediction error
$$E[(Y_0 - \hat f(x_0))^2] = E[(f(x_0) + \epsilon_0 - \hat f(x_0))^2] = \sigma^2 + E[(f(x_0) - \hat f(x_0))^2] = \sigma^2 + \mathrm{MSE}(\hat f(x_0))$$

$\sigma^2$: irreducible variance of the new observation $Y_0$

$\mathrm{MSE}(\hat f(x_0))$: mean squared error of the prediction at $x_0$ by the estimator $\hat f$
Least squares
MSE of a prediction
$$\begin{aligned} \mathrm{MSE}(\hat f(x_0)) &= E[(\hat f(x_0) - f(x_0))^2] \\ &= E\big[[x_0^t(\hat\beta - \beta)]^2\big] \\ &= E\big[[x_0^t(\hat\beta - E(\hat\beta) + E(\hat\beta) - \beta)]^2\big] \\ &= \mathrm{bias}(\hat f(x_0))^2 + \mathrm{Var}(\hat f(x_0)) \end{aligned}$$

LS is unbiased $\Rightarrow \mathrm{bias}(\hat f(x_0)) = 0$

LS minimizes $\mathrm{Var}(\hat f(x_0))$ (Gauss-Markov)

$\Rightarrow$ LS has the smallest MSPE among all linear unbiased estimators
Least squares
LS instability
LS becomes unstable, with a large MSPE, if $\mathrm{Var}(\hat f(x_0))$ is high. This can happen if there are
Many noise variables among the candidate regressors
Highly correlated predictors (multicollinearity)
⇒ Improve on the least squares MSPE by trading (a little) bias for (a lot of) variance!
Manual variable selection approach
Manual variable selection
Try to determine the set of the most important regressors
Remove the noise regressors from the model
Avoid multicollinearity
Methods
All subsets
Backward elimination
Forward selection
Stepwise selection
→ choose a selection criterion
Manual variable selection approach
Submodels
Dataset $Z_n = \{(y_i, x_{i1}, \dots, x_{id}) = (y_i, x_i);\ i = 1, \dots, n\} \subset \mathbb{R}^{d+1}$.

Let $\alpha \subset \{1, \dots, d\}$ denote the predictors included in a submodel.

The corresponding submodel is
$$y_i = x_{\alpha i}^t \beta_\alpha + \epsilon_{\alpha i}, \qquad i = 1, \dots, n.$$
A selected model is considered a good model if
It is parsimonious
It fits the data well
It yields good predictions for similar data
Manual variable selection approach
Some standard selection criteria
Adjusted $R^2$: $A(\alpha) = 1 - \dfrac{RSS(\alpha)/(n - d(\alpha))}{RSS(1)/(n - 1)}$

Mallows' $C_p$: $C(\alpha) = \dfrac{RSS(\alpha)}{\hat\sigma^2} - (n - 2d(\alpha))$

Final Prediction Error: $FPE(\alpha) = \dfrac{RSS(\alpha)}{\hat\sigma^2} + 2d(\alpha)$

AIC: $AIC(\alpha) = -2L(\alpha) + 2d(\alpha)$

BIC: $BIC(\alpha) = -2L(\alpha) + \log(n)\,d(\alpha)$

where $\hat\sigma$ is the residual scale estimate in the "full" model and $d(\alpha)$ is the number of predictors in submodel $\alpha$.
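These criteria are simple functions of the residual sums of squares. A small sketch — the function name and the Gaussian simplification $-2L(\alpha) = n\log(RSS(\alpha)/n)$ up to an additive constant are assumptions for illustration:

```python
import numpy as np

def selection_criteria(rss_sub, d_sub, rss_full, d_full, tss, n):
    """Standard selection criteria for a submodel alpha with d_sub
    predictors and residual sum of squares rss_sub.
    tss = RSS(1) is the RSS of the intercept-only model; the residual
    variance sigma^2 is estimated from the full model."""
    sigma2 = rss_full / (n - d_full)
    return {
        "adjR2": 1.0 - (rss_sub / (n - d_sub)) / (tss / (n - 1)),
        "Cp":    rss_sub / sigma2 - (n - 2 * d_sub),
        "FPE":   rss_sub / sigma2 + 2 * d_sub,
        # Gaussian log-likelihood up to an additive constant
        "AIC":   n * np.log(rss_sub / n) + 2 * d_sub,
        "BIC":   n * np.log(rss_sub / n) + np.log(n) * d_sub,
    }
```

Evaluating these for every candidate subset and ranking the results is exactly the manual selection loop described above.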
Manual variable selection approach
Resampling based selection criteria
Consider the (conditional) expected prediction error
$$PE(\alpha) = E\left[\frac{1}{n}\sum_{i=1}^{n}\left(z_i - x_{\alpha i}^t\hat\beta_\alpha\right)^2 \,\middle|\, y, X\right],$$
where the $z_i$ are new responses at the observed design points.

Estimates of the PE can be used as a selection criterion; they can be obtained by cross-validation or the bootstrap.

A more advanced selection criterion takes both goodness-of-fit and PE into account:
$$PPE(\alpha) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_{\alpha i}^t\hat\beta_\alpha\right)^2 + f(n)\,d(\alpha) + E\left[\frac{1}{n}\sum_{i=1}^{n}\left(z_i - x_{\alpha i}^t\hat\beta_\alpha\right)^2 \,\middle|\, y, X\right]$$
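A cross-validation estimate of the prediction error of a submodel's LS fit can be sketched as follows (the fold count, seed, and function name are illustrative choices, not from the slides):

```python
import numpy as np

def cv_prediction_error(X, y, k=5, seed=0):
    """K-fold cross-validation estimate of the expected squared
    prediction error of the least squares fit on the columns of X
    (a sketch of one way to estimate PE(alpha))."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    sq_errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
        sq_errs.append((y[fold] - X[fold] @ beta) ** 2)
    return float(np.mean(np.concatenate(sq_errs)))
```

Computing this for each candidate subset $\alpha$ and picking the minimizer is the resampling analogue of the closed-form criteria on the previous slide.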
Automatic variable selection approach
Automatic variable selection
Try to find a stable model that fits the data well
Shrinkage: constrained least squares optimization
Stagewise forward procedures
Methods
Ridge regression
Lasso
Least Angle regression
L2 Boosting
Elastic Net
Automatic variable selection approach
Lasso
Least Absolute Shrinkage and Selection Operator
$$\hat\beta_{lasso} = \arg\min_{\beta}\ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{d}\beta_j x_{ij}\Big)^2 \quad \text{subject to}\quad \|\beta\|_1 = \sum_{j=1}^{d}|\beta_j| \le t,$$

where $0 < t < \|\hat\beta_{LS}\|_1$ is a tuning parameter.
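In the equivalent penalized (Lagrangian) form, $\min_\beta \tfrac12\|y - X\beta\|^2 + \lambda\|\beta\|_1$, the lasso can be computed by proximal-gradient (ISTA) iterations. A minimal sketch — the solver choice, step size, and iteration count are assumptions, not the slides' method:

```python
import numpy as np

def soft_threshold(z, gamma):
    """Proximal operator of gamma * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Lasso in penalized form via ISTA (proximal gradient):
    a gradient step on 0.5*||y - X beta||^2, then soft-thresholding."""
    beta = np.zeros(X.shape[1])
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        beta = soft_threshold(beta - step * (X.T @ (X @ beta - y)), step * lam)
    return beta
```

As $\lambda$ grows (i.e. $t$ shrinks), coefficients hit exactly zero, which is what makes the lasso a selection method as well as a shrinkage method.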
Automatic variable selection approach
Example: LASSO fits
[Figure: LASSO coefficient paths — standardized coefficients plotted against degrees of freedom; each curve traces one coefficient as the constraint is relaxed, entering the model at zero.]
Automatic variable selection approach
Least angle regression
Standardize the variables.
1. Select $x_1$ such that $|\mathrm{cor}(y, x_1)| = \max_j |\mathrm{cor}(y, x_j)|$.
2. Put $r = y - \gamma x_1$, where $\gamma$ is determined such that $|\mathrm{cor}(r, x_1)| = \max_{j \neq 1} |\mathrm{cor}(r, x_j)|$.
3. Select $x_2$ corresponding to the maximum above. Determine the equiangular direction $b$ such that $x_1^t b = x_2^t b$.
4. Put $r = r - \gamma b$, where $\gamma$ is determined such that $|\mathrm{cor}(r, x_1)| = |\mathrm{cor}(r, x_2)| = \max_{j \neq 1,2} |\mathrm{cor}(r, x_j)|$.
5. Continue the procedure . . .
Automatic variable selection approach
Properties of LAR
Least angle regression (LAR) selects the predictors in order of importance.

LAR changes the contributions of the predictors gradually as they are needed.

LAR is very similar to the LASSO and can easily be adjusted to produce the LASSO solution.
LAR only uses the means, variances and correlations ofthe variables.
LAR is computationally as efficient as LS
Automatic variable selection approach
Example: LAR fits
[Figure: LAR coefficient paths — standardized coefficients against degrees of freedom, closely resembling the LASSO paths.]
Automatic variable selection approach
L2 boosting
Standardize the variables.
1. Put $r = y$ and $F_0 = 0$.
2. Select $x_1$ such that $|\mathrm{cor}(r, x_1)| = \max_j |\mathrm{cor}(r, x_j)|$.
3. Update $r = y - \nu\,\hat f(x_1)$, where $0 < \nu \le 1$ is the step length and $\hat f(x_1)$ are the fitted values from the LS regression of $y$ on $x_1$. Similarly, update $F_1 = F_0 + \nu\,\hat f(x_1)$.
4. Continue the procedure . . .
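With standardized variables, the LS fit of the residual on a single $x_j$ is just a scalar coefficient times $x_j$, so the procedure reduces to repeated componentwise updates. A compact sketch ($\nu$ and the number of steps are illustrative choices):

```python
import numpy as np

def l2_boost(X, y, nu=0.1, steps=200):
    """Componentwise L2 boosting: at each step, refit the current
    residual on the single best-correlated predictor and take a small
    step nu towards that fit. Returns coefficients on the
    standardized scale."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = y - y.mean()
    r = y.copy()
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        # standardized columns have equal norms, so max |X'r| = max |cor(r, x_j)|
        j = int(np.argmax(np.abs(X.T @ r)))
        gamma = (X[:, j] @ r) / (X[:, j] @ X[:, j])  # LS fit of r on x_j
        beta[j] += nu * gamma
        r -= nu * gamma * X[:, j]
    return beta
```

The small step length $\nu$ is what makes the coefficient paths evolve gradually, much like LAR.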
Automatic variable selection approach
Sequencing variables
Several selection algorithms sequence the predictors in "order of importance" or screen for the most relevant variables
Forward/stepwise selection
Stagewise forward selection
Penalty methods
Least angle regression
L2 boosting
These methods are computationally very efficient because they are only based on means, variances and correlations.
Robustness
Robustness: Data with outliers
Question: How many partners do men and women desire to have in the next 30 years?

Men: mean = 64.3, median = 1
→ The mean is sensitive to outliers.
→ The median is robust and thus more reliable.
Robustness
Least squares regression
[Figure: scatter plot of log light intensity versus log surface temperature with the LS fit.]

LS: minimize $\sum_i r_i^2(\beta)$
Robustness
Outliers
[Figure: the same scatter plot with outliers added; the LS line is pulled towards them.]
Outliers attract LS!
Robustness
Robust regression estimators
[Figure: the same data with both the LS fit and the robust MM fit; the MM line follows the bulk of the data.]
Robust MM estimator is less influenced by outliers!
Robustness
Robust univariate location estimators
The sample mean $\bar X_n$ satisfies the equation
$$\sum_{i=1}^{n} (X_i - \bar X_n) = 0$$

The ML estimator $\hat\theta$ solves the equation
$$\sum_{i=1}^{n} \left.\frac{\partial}{\partial\theta}\log f_\theta(X_i)\right|_{\theta=\hat\theta} = 0$$

For a suitable score function $\psi$, the M-estimator $T_n$ solves the equation
$$\sum_{i=1}^{n} \psi(X_i - T_n) = 0$$
Robustness
Univariate location M-estimators
$$\sum_{i=1}^{n} \psi(X_i - T_n) = 0$$

Consistent if $\int \psi(y)\,dF(y) = E_F(\psi(y)) = 0$

Asymptotic efficiency: $\dfrac{\big(\int \psi'\,d\Phi\big)^2}{\int \psi^2\,d\Phi}$

Robustness: maximal breakdown point (50%) if $\psi(y)$ is bounded!
Robustness
Examples of M-estimators
Sample mean: $\psi(t) = t$. Unbounded! Efficiency: 100%

Median: $\psi(t) = \mathrm{sign}(t)$. Bounded; efficiency: 63.7%

Huber estimator, with $b > 0$:
$$\psi_b(t) = \min\{b, \max\{t, -b\}\} = \begin{cases} t & \text{if } |t| \le b \\ \mathrm{sign}(t)\,b & \text{if } |t| > b \end{cases}$$
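A location M-estimator with the Huber $\psi$ can be computed by iteratively reweighted averaging, using the MAD as an auxiliary scale. A sketch — starting from the median, with $b = 1.345$, the common 95%-efficiency tuning (close to the 1.37 quoted later in these slides):

```python
import numpy as np

def huber_location(x, b=1.345, n_iter=100):
    """Huber M-estimator of location via iterative reweighting:
    solve sum_i psi_b((x_i - T)/s) = 0 with s fixed at the MAD."""
    x = np.asarray(x, dtype=float)
    s = 1.483 * np.median(np.abs(x - np.median(x)))  # MAD scale
    t = np.median(x)                                  # robust start
    for _ in range(n_iter):
        u = (x - t) / s
        w = np.ones_like(u)                # weights psi_b(u)/u
        nz = u != 0
        w[nz] = np.clip(u[nz], -b, b) / u[nz]
        t = np.sum(w * x) / np.sum(w)      # weighted-mean update
    return t
```

Observations far from the current estimate get weight $b/|u|$, so their influence stays bounded.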
Robustness
Huber psi function
[Figure: the Huber psi function — the identity on [−b, b], constant at ±b outside.]
Robustness
Tuning the Huber M-estimator
The Huber M-estimator has maximal breakdown point for any $b < \infty$
→ $b$ can be chosen for good efficiency at $\Phi$

$b = 1.37$ yields 95% efficiency
→ trade-off between robustness and efficiency!
Robustness
Example: Copper content in flour
Copper content (parts per million) in 24 wholemeal flour samples

[Figure: the 24 copper measurements; the axis runs from 5 to 30 ppm.]
Robustness
Example: Copper content in flour
Copper content (parts per million) in 24 wholemeal flour samples
Sample mean: 4.28
Sample median: 3.39
Huber M-estimator: 3.21
Robustness
Monotone M-estimates
Huber M-estimator has a monotone psi-function
If the function $\psi(t)$ is monotone, then

The equation $\sum_{i=1}^{n} \psi(X_i - T_n) = 0$ has a unique solution

$T_n$ is easy to compute

$T_n$ has maximal breakdown point

Large outliers still affect the estimate (although the effect remains bounded)
Robustness
Redescending M-estimates
If the function $\psi(t)$ is not monotone, but redescends to zero, then the equation $\sum_{i=1}^{n} \psi(X_i - T_n) = 0$ has multiple solutions.

Define $\rho(t)$ such that $\rho'(t) = \psi(t)$; then we need the solution of
$$\min_{T_n} \sum_{i=1}^{n} \rho(X_i - T_n).$$

$T_n$ can be more difficult to compute

$T_n$ has maximal breakdown point

The effect of large outliers on the estimate reduces to zero!

Increased robustness against large outliers
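The same reweighting scheme works for a redescending $\psi$; because of the multiple solutions, the iteration is started from the median so it settles in a good local minimum of $\sum_i \rho$. A sketch with the Tukey biweight and $c = 4.685$, the usual 95%-efficiency tuning (an assumption, not from the slides):

```python
import numpy as np

def biweight_location(x, c=4.685, n_iter=100):
    """Redescending (Tukey biweight) location M-estimator via
    iterative reweighting; observations with |x_i - T| > c*s get
    weight exactly zero."""
    x = np.asarray(x, dtype=float)
    s = 1.483 * np.median(np.abs(x - np.median(x)))  # MAD scale
    t = np.median(x)                                  # robust start
    for _ in range(n_iter):
        u = (x - t) / s
        w = np.where(np.abs(u) <= c, (1.0 - (u / c) ** 2) ** 2, 0.0)
        t = np.sum(w * x) / np.sum(w)
    return t
```

Unlike the Huber weights, these weights vanish completely for gross outliers, which is the "effect reduces to zero" property above.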
Robustness
Redescending M-estimates
A popular family of redescending loss functions is the Tukey biweight (bisquare) family:
$$\rho_c(t) = \begin{cases} \dfrac{t^2}{2} - \dfrac{t^4}{2c^2} + \dfrac{t^6}{6c^4} & \text{if } |t| \le c \\[4pt] \dfrac{c^2}{6} & \text{if } |t| > c. \end{cases}$$

The constant $c$ can be tuned for efficiency.
Robustness
Tukey biweight ρ functions
[Figure: Tukey biweight ρ functions for c = 2, c = 3 and c = ∞ (ordinary squared loss); the loss levels off at c²/6, sooner for smaller c.]
Robustness
Tukey biweight ψ function
[Figure: the Huber and Tukey biweight ψ functions compared — Huber stays constant beyond ±b, while the Tukey ψ redescends to zero beyond ±c.]
Robustness
Example: Copper content in flour
Copper content (parts per million) in 24 wholemeal flour samples
Sample mean: 4.28
Sample median: 3.39
Huber M-estimator: 3.21
Tukey biweight M-estimator: 3.16
Robustness
Univariate scale estimators
Example: Copper content (parts per million) in 24 wholemeal flour samples

Standard deviation: 5.30

Median absolute deviation (MAD):
$$S_n = 1.483\ \mathrm{med}_i\,|X_i - \mathrm{med}_j(X_j)|$$

MAD: 0.53

→ The standard deviation is sensitive to outliers.
→ The MAD is robust and thus more reliable.
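The MAD is just two medians and a scaling constant; a one-function sketch:

```python
import numpy as np

def mad(x):
    """Median absolute deviation, scaled by 1.483 so that it is
    consistent for the standard deviation at the normal model."""
    x = np.asarray(x, dtype=float)
    return 1.483 * np.median(np.abs(x - np.median(x)))
```

A single wild value can make the standard deviation arbitrarily large while leaving the MAD essentially untouched.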
Robustness
M-estimators of scale
An M-estimator of scale is the solution $S_n$ of
$$\sum_{i=1}^{n} \psi(X_i / S_n) = 0.$$

For symmetric distributions, use symmetric $\psi$ functions. Consistent if $\int \psi(y)\,dF(y) = E_F(\psi(y)) = 0$.

The Tukey biweight loss functions $\rho_c$ are symmetric. Put $b = E_\Phi(\rho_c)$ and define $\psi_c(t) = \rho_c(t) - b$; then the Tukey biweight M-estimator of scale $S_n$ solves
$$\sum_{i=1}^{n} \psi_c(X_i / S_n) = 0, \quad \text{or equivalently} \quad \frac{1}{n}\sum_{i=1}^{n} \rho_c(X_i / S_n) = b.$$
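The defining equation $\frac{1}{n}\sum_i \rho_c(X_i/S_n) = b$ can be solved by a fixed-point iteration on $S^2$. A sketch — $c = 1.547$ (the usual 50%-breakdown tuning) is an assumption, and $b = E_\Phi(\rho_c)$ is approximated by numerical integration:

```python
import numpy as np

def rho_biweight(t, c):
    """Tukey biweight loss (algebraically the same formula as above)."""
    u = np.minimum((t / c) ** 2, 1.0)
    return (c ** 2 / 6.0) * (3 * u - 3 * u ** 2 + u ** 3)

def m_scale(x, c=1.547, n_iter=100):
    """Biweight M-estimator of scale: iterate
    S^2 <- S^2 * mean(rho_c(x/S)) / b  until  mean(rho_c(x/S)) = b."""
    x = np.asarray(x, dtype=float)
    # b = E_Phi(rho_c) via a Riemann sum over a fine grid
    t = np.linspace(-8.0, 8.0, 16001)
    phi = np.exp(-t ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
    b = float(np.sum(rho_biweight(t, c) * phi) * (t[1] - t[0]))
    s = 1.483 * np.median(np.abs(x - np.median(x)))  # MAD start
    for _ in range(n_iter):
        s = s * np.sqrt(np.mean(rho_biweight(x / s, c)) / b)
    return s
```

If the current $S$ is too small, the mean loss exceeds $b$ and the update inflates $S$, and vice versa, so the iteration homes in on the defining equation.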
Robustness
Example: Copper content in flour
Copper content (parts per million) in 24 wholemeal flour samples
Standard deviation: 5.30
Median absolute deviation: 0.53
Tukey biweight M-estimator: 0.66
Robustness
Robust regression
Denote by $r_i(\beta) = y_i - x_i^t\beta$ the residuals corresponding to $\beta$.

$\hat\beta_{LS}$ solves
$$\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^t\beta\right)^2 = \sum_{i=1}^{n} r_i(\beta)^2$$

Denote by $\hat\sigma(\beta) = \sqrt{\dfrac{\sum_{i=1}^{n} r_i(\beta)^2}{n-d}}$ the estimate of the residual scale. The LS estimator $\hat\beta_{LS}$ then equivalently solves $\min_{\beta}\,\hat\sigma(\beta)$.

⇒ Instead, minimize a robust estimate of the residual scale.
Robustness
Least Median of Squares regression
LS: minimize $\dfrac{1}{n-d}\sum_{i=1}^{n} r_i(\beta)^2$ $\longrightarrow$ LMS: minimize $\mathrm{med}_i\; r_i(\beta)^2$

Maximal breakdown point (50%)
Small bias
Slow rate of convergence ($n^{-1/3}$)
Inefficient
Robustness
Least Trimmed Squares regression
LS: minimize $\dfrac{1}{n-d}\sum_{i=1}^{n} r_i(\beta)^2$ $\longrightarrow$ LTS: minimize $\dfrac{1}{h}\sum_{i=1}^{h} \big(r(\beta)^2\big)_{i:n}$

where $(r(\beta)^2)_{1:n} \le \cdots \le (r(\beta)^2)_{n:n}$ are the ordered squared residuals.

Breakdown point is $\min\{h, n-h\}/n \le 50\%$
Asymptotically normal
Trade-off robustness-efficiency
Low efficiency (less than 10%)
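The LTS objective can be minimized with concentration steps, the core idea of the FAST-LTS algorithm: fit LS on a subset, keep the $h$ observations with smallest squared residuals, refit, and repeat from many random starts. A compact sketch (the numbers of starts and steps are illustrative; this is not the full algorithm):

```python
import numpy as np

def lts(X, y, h, n_starts=50, n_csteps=20, seed=0):
    """Least Trimmed Squares via random elemental starts plus
    concentration steps (C-steps)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        subset = rng.choice(n, size=d, replace=False)  # elemental start
        beta = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
        for _ in range(n_csteps):
            r2 = (y - X @ beta) ** 2
            subset = np.argsort(r2)[:h]      # h smallest squared residuals
            beta = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
        obj = np.mean(np.sort((y - X @ beta) ** 2)[:h])
        if obj < best_obj:
            best_obj, best_beta = obj, beta
    return best_beta
```

Each C-step cannot increase the objective, so every start converges; the best fit over all starts approximates the LTS estimator.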
Robustness
Regression S-estimators
LS: minimize $\dfrac{1}{n}\sum_{i=1}^{n} r_i(\beta)^2$ $\longrightarrow$ S-estimate: minimize $\hat\sigma(\beta)$

where, for each $\beta$, $\hat\sigma(\beta)$ solves $\dfrac{1}{n}\sum_i \rho_c\!\left(\dfrac{r_i(\beta)}{\hat\sigma}\right) = b$

$c$ determines both robustness and efficiency
Trade-off robustness-efficiency
Breakdown point can be up to 50%
Asymptotically normal
Efficiency can still be low (less than 35%)
Robustness
Regression M-estimators
LS: minimize $\sum_{i=1}^{n} r_i(\beta)^2$ $\longrightarrow$ M-estimate: minimize $\sum_{i=1}^{n} \rho\!\left(\dfrac{r_i(\beta)}{\hat\sigma}\right)$, or solve $\sum_{i=1}^{n} \psi\!\left(\dfrac{r_i(\beta)}{\hat\sigma}\right) x_i = 0$

Requires a robust scale estimate $\hat\sigma$!
Robustness
MM estimates
LS: minimize $\sum_{i=1}^{n} r_i(\beta)^2$ $\longrightarrow$ MM-estimate: minimize $\sum_{i=1}^{n} \rho\!\left(\dfrac{r_i(\beta)}{\hat\sigma}\right)$

$\hat\sigma$ is the S-estimator's M-scale

The M- and S-estimators both use Tukey biweight $\rho_c$ functions

The S-estimator is tuned for robustness (breakdown point)

The redescending M-estimator is tuned for efficiency
Robustness
MM: loss functions
Tukey biweight family, normalized to maximum 1:
$$\rho_c(t) = \begin{cases} 3\dfrac{t^2}{c^2} - 3\dfrac{t^4}{c^4} + \dfrac{t^6}{c^6} & \text{if } |t| \le c \\ 1 & \text{if } |t| > c, \end{cases}$$

[Figure: two biweight losses $\rho_0$ (constant $c_0$) and $\rho_1$ (constant $c_1 > c_0$).]

$\rho_0$ determines the breakdown point (S-estimator)
$\rho_1$ determines the efficiency (MM-estimator)
Robustness
MM estimates
As summarized above, the MM-estimate minimizes $\sum_{i=1}^{n} \rho\!\left(\dfrac{r_i(\beta)}{\hat\sigma}\right)$ with $\hat\sigma$ the S-estimator's M-scale: the S-part is tuned for the breakdown point and the redescending M-part for efficiency.

Highly robust and efficient!
Robustness
Redescending psi function
⋆ A redescending psi function is needed for robustness, but this implies:

Multiple solutions of the score equations
The global solution is needed (high breakdown point)
Difficult (time-consuming) to compute
Robust variable selection: sequencing
Robust variable selection
Issues
Robust regression estimators are computationally demanding
’Outliers’ depend on the model under consideration
High dimensional data: Outlying cases?
Our approach: a two-step procedure

1. Sequencing: construct a reduced sequence of good predictors in an efficient way.
2. Segmentation: build an optimal model from the reduced set of predictors.
Robust variable selection: sequencing
Sequencing the variables in order of importance
Automatic variable selection methods such as forward/stepwise selection, LAR and L2 boosting are computationally efficient methods to sequence predictors.

These methods are based only on the means, variances and correlations of the data.
⇒ Construct computationally efficient, robust methods to sequence predictors by using computationally efficient and highly robust estimates of center, scale and correlation.
![Page 53: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/53.jpg)
Robust variable selection: sequencing
Robust building blocks
Location: Median
Scatter: Median Absolute Deviation
Correlation: Bivariate Winsorization
Correlation: Bivariate M-estimators
Correlation: Gnanadesikan-Kettenring estimators
![Page 54: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/54.jpg)
Robust variable selection: sequencing
Winsorized correlation estimates
1 Robustly standardize the data using median and MAD
2 Transform the data by shifting outliers towards the center
3 Calculate the Pearson correlation of the transformed data
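The three steps above can be sketched in a few lines of code. This is an illustrative implementation, not the authors' code; the function name is hypothetical and the Huber constant c = 2 follows the example on the next slide.

```python
import numpy as np

def winsorized_corr(x, y, c=2.0):
    """Univariate Winsorized correlation: robustly standardize with
    median/MAD, shrink outliers with the Huber psi function, then take
    the Pearson correlation of the transformed data."""
    def standardize(v):
        med = np.median(v)
        mad = np.median(np.abs(v - med)) / 0.6745  # MAD, consistent at the normal
        return (v - med) / mad
    u = np.clip(standardize(np.asarray(x, float)), -c, c)  # Huber psi_c
    w = np.clip(standardize(np.asarray(y, float)), -c, c)
    return np.corrcoef(u, w)[0, 1]
```

Unlike the plain Pearson correlation, the estimate is barely moved by a small fraction of gross outliers, at the price of some attenuation on clean data.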
![Page 55: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/55.jpg)
Robust variable selection: sequencing
Univariate Winsorization
Componentwise transformation: u = ψc(x) = min(max(−c, x), c)
[Figure: the Huber ψ function ψc(x) with c = 2]
![Page 56: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/56.jpg)
Robust variable selection: sequencing
Univariate Winsorization
Componentwise transformation: u = ψc(x) = min(max(−c, x), c)
[Figure: univariate Winsorization applied to a bivariate scatter (variable 1 vs. variable 2)]
![Page 57: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/57.jpg)
Robust variable selection: sequencing
Bivariate Winsorization
Bivariate transformation:

$$u = \min\left(\sqrt{c/D(x)},\, 1\right) x \quad \text{with } c = F^{-1}_{\chi^2_2}(0.95)$$

$D(x) = x' R_0^{-1} x$, with $R_0$ an initial bivariate correlation matrix.
[Figure: bivariate Winsorization applied to the same scatter (variable 1 vs. variable 2)]
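A minimal sketch of the bivariate transformation, assuming the two variables have already been robustly standardized and an initial correlation matrix R0 is available (the next slides construct R0 via adjusted Winsorization). The χ²₂ quantile is hard-coded to avoid a SciPy dependency; the function names are illustrative.

```python
import numpy as np

def bivariate_winsorize(X, R0):
    """Shrink each (robustly standardized) point x toward the origin by
    min(sqrt(c / D(x)), 1), where D(x) = x' R0^{-1} x is the Mahalanobis
    distance w.r.t. an initial correlation matrix R0 and c is the 0.95
    quantile of the chi-squared distribution with 2 degrees of freedom."""
    c = -2.0 * np.log(0.05)                       # chi^2_2 0.95 quantile = 5.9915
    D = np.einsum('ij,jk,ik->i', X, np.linalg.inv(R0), X)
    shrink = np.minimum(np.sqrt(c / np.maximum(D, 1e-12)), 1.0)
    return X * shrink[:, None]

def bivariate_winsorized_corr(x, y, R0):
    """Pearson correlation of the bivariate-Winsorized data."""
    U = bivariate_winsorize(np.column_stack([x, y]), R0)
    return np.corrcoef(U[:, 0], U[:, 1])[0, 1]
```

Because points are shrunk along the direction of the initial correlation structure, bivariate outliers that hide inside the componentwise clipping region are still pulled in.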
![Page 61: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/61.jpg)
Robust variable selection: sequencing
Initial correlation estimate
Adjusted Winsorization: univariate Winsorization with different tuning constants for different quadrants.
Denote by h the ratio of the number of observations in the second and fourth quadrants to the number in the first and third quadrants.
Suppose h ≤ 1. Then:
Use constant c1 for Winsorizing points in the first and third quadrants
Use c2 = √h · c1 for the second and fourth quadrants
R0 is the correlation matrix of the adjusted Winsorized data
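The adjusted Winsorization rule can be sketched as follows. The default c1 = 2 is an assumption (the slides do not fix c1), and the code handles the general case by always applying the smaller constant to the minority pair of quadrants, which reduces to the h ≤ 1 rule above.

```python
import numpy as np

def adjusted_winsorized_corr(x, y, c1=2.0):
    """Initial correlation estimate via adjusted Winsorization: after
    robust standardization, clip with c1 in the majority pair of
    quadrants and with c2 = sqrt(h) * c1 in the minority pair, where h
    is the minority-to-majority count ratio (h <= 1)."""
    def rs(v):
        med = np.median(v)
        return (v - med) / (np.median(np.abs(v - med)) / 0.6745)
    x, y = rs(np.asarray(x, float)), rs(np.asarray(y, float))
    same = (x * y) >= 0                        # first/third quadrants
    n13, n24 = same.sum(), (~same).sum()
    h = min(n13, n24) / max(n13, n24)          # ratio of counts, <= 1
    c2 = np.sqrt(h) * c1
    cmaj, cmin = (c1, c2) if n13 >= n24 else (c2, c1)
    u = np.where(same, np.clip(x, -cmaj, cmaj), np.clip(x, -cmin, cmin))
    w = np.where(same, np.clip(y, -cmaj, cmaj), np.clip(y, -cmin, cmin))
    return np.corrcoef(u, w)[0, 1]
```

The quadrant-dependent constants clip a cluster of outliers sitting in the "wrong" quadrants much harder than plain univariate Winsorization would.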
![Page 62: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/62.jpg)
Robust variable selection: sequencing
Initial correlation estimate
[Figure: adjusted Winsorization applied to a bivariate scatter (variable 1 vs. variable 2)]
![Page 63: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/63.jpg)
Robust variable selection: sequencing
Initial correlation estimate
Univariate Winsorization
[Figure: univariate Winsorization of the same scatter, for comparison]
![Page 64: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/64.jpg)
Robust variable selection: sequencing
Correlation M-estimators
1 First center the two variables using their medians
2 An M-estimate of the covariance matrix is the solution V of the equation

$$\frac{1}{n}\sum_i u_2(d_i^2)\, x_i x_i' = V,$$

where $d_i^2 = x_i' V^{-1} x_i$ and $u_2(t) = \min(\chi^2_{2}(0.99)/t,\, 1)$
3 Calculate the correlation corresponding to the bivariate covariance matrix V
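The fixed-point equation in step 2 suggests a simple iterative reweighting scheme. The sketch below is illustrative: the classical covariance as starting value and the iteration controls are assumptions, not prescribed by the slides.

```python
import numpy as np

def cov_m_estimate(X, max_iter=200, tol=1e-9):
    """Bivariate M-estimate of covariance: center at the medians, then
    iterate the fixed-point equation V = (1/n) sum_i u2(d_i^2) x_i x_i',
    with d_i^2 = x_i' V^{-1} x_i and u2(t) = min(chi2_2(0.99)/t, 1)."""
    X = np.asarray(X, float) - np.median(X, axis=0)   # step 1: median-center
    k = -2.0 * np.log(0.01)                           # chi^2_2 0.99 quantile = 9.2103
    V = np.cov(X, rowvar=False)                       # starting value (assumption)
    for _ in range(max_iter):
        d2 = np.einsum('ij,jk,ik->i', X, np.linalg.inv(V), X)
        w = np.minimum(k / np.maximum(d2, 1e-12), 1.0)   # u2(d_i^2)
        V_new = (w[:, None] * X).T @ X / len(X)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

def corr_m_estimate(x, y):
    """Step 3: correlation from the bivariate M-covariance."""
    V = cov_m_estimate(np.column_stack([x, y]))
    return V[0, 1] / np.sqrt(V[0, 0] * V[1, 1])
```

Each iteration downweights points that are far from the current fit, so gross bivariate outliers lose influence as the iteration proceeds.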
![Page 65: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/65.jpg)
Robust variable selection: sequencing
Gnanadesikan-Kettenring correlation estimators
Consider the identity

$$\mathrm{cov}(X, Y) = \frac{1}{4}\left(\mathrm{sd}(X+Y)^2 - \mathrm{sd}(X-Y)^2\right)$$

Replace the sample standard deviations by robust estimates of scale to obtain robust correlation estimates
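A sketch of this recipe with the MAD as the robust scale. Standardizing first and normalizing by the sum of the two squared scales is a common Gnanadesikan-Kettenring variant (an assumption here) that keeps the estimate in [−1, 1].

```python
import numpy as np

def mad(v):
    """Median absolute deviation, rescaled for consistency at the normal."""
    return np.median(np.abs(v - np.median(v))) / 0.6745

def gk_corr(x, y):
    """Gnanadesikan-Kettenring correlation: plug a robust scale (the MAD)
    into cov(X, Y) = (sd(X+Y)^2 - sd(X-Y)^2) / 4. For robustly
    standardized u, v the normalized form below estimates the correlation
    directly, since var(u+v) - var(u-v) = 4 corr and
    var(u+v) + var(u-v) = 4."""
    u = (x - np.median(x)) / mad(x)
    v = (y - np.median(y)) / mad(y)
    s_plus, s_minus = mad(u + v), mad(u - v)
    return (s_plus**2 - s_minus**2) / (s_plus**2 + s_minus**2)
```

The estimate needs only univariate medians, which is what makes this building block so cheap.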
![Page 66: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/66.jpg)
Robust variable selection: sequencing
Robust correlations: Computational efficiency
[Figure: CPU time versus sample size for the Uni-Winsor, Adj-Winsor, Bi-Winsor and Maronna correlation estimates]
![Page 67: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/67.jpg)
Robust variable selection: sequencing
Robust LAR: Computational efficiency
The computational efficiency of the correlation estimates largely determines the computing time of robust LAR
[Figure: CPU time versus dimension for LARS, W-RLARS and M-RLARS]
![Page 68: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/68.jpg)
Robust variable selection: sequencing
Bootstrapping the sequencing algorithms
Use bootstrap averages to obtain more reliable and stable sequences.
Procedure:
1 Generate 50 bootstrap samples
2 Sequence the predictors in each sample
3 Rank the predictors according to their average rank over the bootstrap samples
Not all predictors have to be ranked in each bootstrap sample
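The rank-averaging procedure can be sketched generically. Here `sequencer` stands for any sequencing method (e.g. a robust LAR); assigning unranked predictors rank d + 1 is one possible convention, assumed for this sketch.

```python
import numpy as np

def bootstrap_sequence(X, y, sequencer, B=50, seed=0):
    """Rank predictors by their average rank over B bootstrap samples.
    `sequencer` is any function returning predictor indices in
    decreasing order of importance; it may rank only part of the
    predictors, in which case the rest get rank d + 1 in that sample."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    ranks = np.full((B, d), d + 1, dtype=float)
    for b in range(B):
        idx = rng.integers(0, n, size=n)            # bootstrap sample
        for r, j in enumerate(sequencer(X[idx], y[idx]), start=1):
            ranks[b, j] = r
    return np.argsort(ranks.mean(axis=0))           # final sequence
```

For example, plugging in a toy correlation-screening sequencer returns the target predictors first even though individual bootstrap sequences may vary.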
![Page 69: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/69.jpg)
Robust variable selection: sequencing
Bootstrap effect on robust LAR
Simulation design
Samples of size 150 in 200 dimensions
10 target predictors
20 noise covariates correlated with target predictors
170 independent noise covariates
10% of symmetric or asymmetric high leverage outliers
We compare with random forests, using variable importance measures to sequence the variables
![Page 70: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/70.jpg)
Robust variable selection: sequencing
Bootstrap RLAR vs RLAR/Random Forests
[Figure: number of target variables recovered versus number of sequenced variables for B-RLARS, RLARS, RF-OOB and RF-IMP; left panel: symmetric high-leverage outliers, right panel: asymmetric high-leverage outliers]
![Page 71: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/71.jpg)
Robust variable selection: sequencing
Example: Demographic data
n = 50 states of USA, d = 25 covariates.
Response y = murder rate
One outlier
5-fold cross validation selects a model with 7 variables
We sequence the variables using B-RLARS and construct a learning curve:
a graphical tool to select the size of the reduced sequence in practice, based on a robust R² measure, e.g.

$$R^2 = 1 - \frac{\mathrm{Med}(\text{residual}^2)}{\mathrm{MAD}^2(y)}$$
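The robust R² driving the learning curve is a one-liner; the function name is illustrative.

```python
import numpy as np

def robust_r2(y, residuals):
    """Robust R^2 from the slide: 1 - Med(residual^2) / MAD(y)^2, with
    the raw (unscaled) MAD so that the intercept-only model, whose
    residuals are y - Med(y), gets R^2 = 0 exactly."""
    mad_y = np.median(np.abs(y - np.median(y)))
    return 1.0 - np.median(residuals**2) / mad_y**2
```

Evaluating this for submodels of growing size along the sequence, and plotting it against the number of variables, gives the learning curve on the next slide.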
![Page 72: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/72.jpg)
Robust variable selection: sequencing
Demographic data: learning curve
[Figure: learning curve, plotting the learning rate against the number of variables in the model]
⇒ Reduced set of at most 12 predictors
![Page 73: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/73.jpg)
Robust variable selection: sequencing
Demographic data: models
Full CV model: 7 predictors
B-RLAR+CV: 6 predictors
LAR+CV: 8 predictors
RF-SEL: 5 predictors
RF-SEL+CV: 4 predictors
RF-RED+CV: 5 predictors
MSVM-RFE: 8 predictors
MSVM-RFE+CV: 6 predictors
![Page 74: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/74.jpg)
Robust variable selection: sequencing
Demographic data: model comparison
Density estimates based on 1000 5-fold CV-MSPE estimates.
[Figure: density estimates of the 5-fold CV-MSPE for Full-CV, LARS+CV, B-RLARS+CV, RF-SEL+CV, RF-RED+CV and MSVM-RFE]
![Page 75: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/75.jpg)
Robust variable selection: sequencing
Example: Protein data
n = 4141 protein sequences, d = 77 covariates.
Training sample of size 2072 and test sample of size 2069.
We selected predictors using:
B-RLAR: 5 predictors
RF using OOB importance: 22 predictors
MSVM-RFE: 22 predictors
For RF we could determine an optimal submodel in the reduced sequence using robust MM-estimates with robust FPE. ⇒ RF+RFPE: 18 predictors
![Page 76: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/76.jpg)
Robust variable selection: sequencing
Protein data: test sample errors
Trimmed means of squared prediction errors
| Model | 1% | 5% | 10% |
|---|---|---|---|
| B-RLAR | 116.19 | 97.73 | 84.67 |
| RF | 111.11 | 93.80 | 81.30 |
| RF-RFPE | 111.30 | 93.92 | 81.27 |
| MSVM-RFE | 173.70 | 150.48 | 133.17 |

(columns give the trimming fraction)
![Page 77: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/77.jpg)
Robust variable selection: sequencing
Example: Particle data
Quantum physics data with d = 64 predictors.
Training sample of size 5,000, test sample of size 45,000.
FS and SW produced a model with 25 predictors.
Robust FS and SW produced a model with only 1 predictor.
Indeed, for more than 80% of the cases X1 = Y = 0.
For the cases with X1 ≠ 0, FS produced a model with 5 predictors.
We fit the final models using MM-estimators.
![Page 78: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/78.jpg)
Robust variable selection: sequencing
Particle data: test sample errors
Trimmed means of squared prediction errors
| Model | 1% | 5% |
|---|---|---|
| FS | 0.110 | 0.012 |
| Robust FS | 0.032 | 0.001 |

(columns give the trimming fraction)
![Page 79: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/79.jpg)
Robust variable selection: segmentation
Segmentation: Robust adjusted R-squared
Adjusted R²:

$$A(\alpha) = 1 - \frac{RSS(\alpha)/(n - d(\alpha))}{RSS(1)/(n - 1)}$$

Based on a robust regression estimator we can construct a robust adjusted R²:

$$RR_a^2(\alpha) = 1 - \frac{\hat\sigma_\alpha^2/(n - d(\alpha))}{\hat\sigma_0^2/(n - 1)},$$

where σ̂α is the robust residual scale of the submodel with predictors indexed by α, and σ̂0 is the robust residual scale of the intercept-only model.
![Page 80: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/80.jpg)
Robust variable selection: segmentation
Segmentation: Robust FPE
$$FPE(\alpha) = \frac{RSS(\alpha)}{\hat\sigma^2} + 2\, d(\alpha)$$ estimates the final prediction error

$$FPE(\alpha) = \frac{1}{\sigma^2}\sum_{i=1}^{n} E\left[(z_i - x_{\alpha i}'\hat\beta_\alpha)^2\right],$$ assuming that the model is correct.
Consider now the robust final prediction error:

$$RFPE(\alpha) = \sum_{i=1}^{n} E\left[\rho\left(\frac{z_i - x_{\alpha i}'\hat\beta_\alpha}{\sigma}\right)\right].$$

Assuming that the model is correct and using a second-order Taylor expansion, this can be estimated by

$$\widehat{RFPE}(\alpha) = \sum_{i=1}^{n}\rho\!\left(\frac{r_i(\hat\beta_\alpha)}{\hat\sigma_n}\right) + d(\alpha)\,\frac{\sum_{i=1}^{n}\psi^2\!\left(r_i(\hat\beta_\alpha)/\hat\sigma_n\right)}{\sum_{i=1}^{n}\psi'\!\left(r_i(\hat\beta_\alpha)/\hat\sigma_n\right)}$$

σ̂n is the robust scale estimate of a 'full' model αf. Usually, αf = {1, . . . , d}.
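Given residuals, a submodel size and a full-model scale, the RFPE estimate is a direct transcription of the formula above. The Tukey bisquare is used as a concrete MM loss here, which is a common choice but an assumption: the slides do not fix a particular ρ.

```python
import numpy as np

# Tukey bisquare rho, psi = rho', and psi' (assumed MM loss; c = 4.685
# is the usual 95%-efficiency tuning constant)
def rho(t, c=4.685):
    u = np.clip(t / c, -1.0, 1.0)
    return (c**2 / 6.0) * (1.0 - (1.0 - u**2)**3)

def psi(t, c=4.685):
    u = t / c
    return np.where(np.abs(u) <= 1.0, t * (1.0 - u**2)**2, 0.0)

def psi_prime(t, c=4.685):
    u = t / c
    return np.where(np.abs(u) <= 1.0, (1.0 - u**2) * (1.0 - 5.0 * u**2), 0.0)

def rfpe(residuals, d_alpha, sigma_n):
    """Robust Final Prediction Error estimate:
    sum rho(r_i/sigma_n) + d(alpha) * sum psi^2(r_i/sigma_n)
                                     / sum psi'(r_i/sigma_n)."""
    r = residuals / sigma_n
    return np.sum(rho(r)) + d_alpha * np.sum(psi(r)**2) / np.sum(psi_prime(r))
```

As with the classical FPE, the second term penalizes model size, so among submodels with comparable residuals the smaller one wins.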
![Page 81: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/81.jpg)
Robust variable selection: segmentation
Robust resampling based selection criteria
Robust equivalents of the resampling-based selection criteria:

$$RPE(\alpha) = \frac{\hat\sigma_n^2}{n}\, E^\star\!\left[\left.\sum_{i=1}^{n}\rho\left(\frac{z_i - x_{\alpha i}'\hat\beta_\alpha^\star}{\hat\sigma_n}\right)\,\right|\, y, X\right]$$

$$PRPE(\alpha) = \frac{\hat\sigma_n^2}{n}\left\{\sum_{i=1}^{n}\rho\left(\frac{y_i - x_{\alpha i}'\hat\beta_\alpha}{\hat\sigma_n}\right) + f(n)\, d(\alpha)\right\} + M_n(\alpha)$$

ρ is the MM loss function and β̂α,n is the MM estimate
f(n) d(α) is the penalty term, with e.g. f(n) = 2 log n
σ̂n is the robust scale estimate of a 'full' model αf. Usually, αf = {1, . . . , d}
E⋆ is a robust resampling estimate of the expected value
![Page 82: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/82.jpg)
Robust variable selection: segmentation
Robustness and resampling
Resampling robust estimators causes problems with
robustness
speed
Stratified bootstrap (Müller and Welsh, JASA, 2005) only solves the first problem. → Limited practical use.
The fast and robust bootstrap solves both problems.
![Page 83: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/83.jpg)
Robust variable selection: segmentation
MM-estimators revisited
For the model comparison we use slightly adjusted MM-estimators: the MM-estimates β̂α satisfy

$$\frac{1}{n}\sum_{i=1}^{n}\psi_1\!\left(\frac{y_i - x_{\alpha i}'\hat\beta_\alpha}{\hat\sigma_n}\right) x_{\alpha i} = 0\,,$$

where σ̂n minimizes the M-scale σn(β), which for any β ∈ R^d is defined as the solution of

$$\frac{1}{n}\sum_{i=1}^{n}\rho_0\!\left(\frac{y_i - x_i'\beta}{\sigma_n(\beta)}\right) = b$$

ρ0 determines the breakdown point (S-estimator)
ρ1 determines the efficiency (MM-estimator)
![Page 84: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/84.jpg)
Robust variable selection: segmentation
Bootstrapping MM-estimates
Weighted least squares representation of the MM-estimator:

$$\hat\beta_{\alpha,n} = \left[\sum_{i=1}^{n}\omega_{\alpha i}\, x_{\alpha i} x_{\alpha i}'\right]^{-1}\sum_{i=1}^{n}\omega_{\alpha i}\, x_{\alpha i}\, y_i$$

with $\omega_{\alpha i} = \rho_1'(r_{\alpha i}/\hat\sigma_n)/r_{\alpha i}$ and $r_{\alpha i} = y_i - \hat\beta_{\alpha,n}' x_{\alpha i}$.

Let $(y_i^\star, x_{\alpha i}^\star)$, $i = 1, \ldots, m$ be a bootstrap sample of size $m \le n$. Then $\hat\beta_\alpha^\star$ satisfies

$$\hat\beta_{\alpha,m}^\star = \left[\sum_{i=1}^{m}\omega_{\alpha i}^\star\, x_{\alpha i}^\star x_{\alpha i}^{\star\prime}\right]^{-1}\sum_{i=1}^{m}\omega_{\alpha i}^\star\, x_{\alpha i}^\star\, y_i^\star$$

with $\omega_{\alpha i}^\star = \rho_1'(r_{\alpha i}^\star/\hat\sigma_n^\star)/r_{\alpha i}^\star$ and $r_{\alpha i}^\star = y_i^\star - \hat\beta_{\alpha,m}^{\star\prime} x_{\alpha i}^\star$
![Page 85: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/85.jpg)
Robust variable selection: segmentation
Fast and robust bootstrap
Weighted least squares representation of the MM-estimator:

$$\hat\beta_{\alpha,n} = \left[\sum_{i=1}^{n}\omega_{\alpha i}\, x_{\alpha i} x_{\alpha i}'\right]^{-1}\sum_{i=1}^{n}\omega_{\alpha i}\, x_{\alpha i}\, y_i$$

with $\omega_{\alpha i} = \rho_1'(r_{\alpha i}/\hat\sigma_n)/r_{\alpha i}$ and $r_{\alpha i} = y_i - \hat\beta_{\alpha,n}' x_{\alpha i}$.

Let $(y_i^\star, x_{\alpha i}^\star)$, $i = 1, \ldots, m$ be a bootstrap sample of size $m \le n$. Define $\hat\beta_\alpha^{1,\star}$ by

$$\hat\beta_{\alpha,m}^{1,\star} = \left[\sum_{i=1}^{m}\omega_{\alpha i}^\star\, x_{\alpha i}^\star x_{\alpha i}^{\star\prime}\right]^{-1}\sum_{i=1}^{m}\omega_{\alpha i}^\star\, x_{\alpha i}^\star\, y_i^\star$$

with $\omega_{\alpha i}^\star = \rho_1'(r_{\alpha i}^\star/\hat\sigma_n)/r_{\alpha i}^\star$ and $r_{\alpha i}^\star = y_i^\star - \hat\beta_{\alpha,n}' x_{\alpha i}^\star$

Note that β̂α,n and σ̂n are not recalculated!
![Page 86: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/86.jpg)
Robust variable selection: segmentation
Fast and robust bootstrap
The estimates $\hat\beta_{\alpha,m}^{1,\star}$ will under-estimate the variability of the completely recalculated estimates $\hat\beta_{\alpha,m}^\star$
→ A correction is needed.

The fast and robust bootstrap estimates $\hat\beta_{\alpha,m}^{R\star}$ are given by

$$\hat\beta_{\alpha,m}^{R\star} = \hat\beta_{\alpha,n} + K_{\alpha,n}\left(\hat\beta_{\alpha,m}^{1,\star} - \hat\beta_{\alpha,n}\right)$$

where

$$K_{\alpha,n} = \hat\sigma_n\left[\sum_{i=1}^{n}\rho_1''(r_{\alpha i}/\hat\sigma_n)\, x_{\alpha i} x_{\alpha i}'\right]^{-1}\sum_{i=1}^{n}\omega_{\alpha i}\, x_{\alpha i} x_{\alpha i}'$$

Note that $K_{\alpha,n}$ is computed only once, for the original sample.
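The two slides above can be condensed into one routine: each bootstrap draw is only a weighted least-squares solve with frozen weights, followed by the linear correction K. This is an illustrative sketch, assuming an MM fit (β̂, σ̂) and its loss derivatives are supplied; `frb_draws` is a hypothetical name.

```python
import numpy as np

def frb_draws(X, y, beta, sigma, psi1, psi1_prime, n_boot=200, m=None, seed=0):
    """Fast and robust bootstrap: each draw redoes only the weighted
    least-squares step, with the weights omega_i, beta and sigma all
    frozen at the original MM fit, and is then corrected by the matrix
    K (computed once). psi1 = rho_1' and psi1_prime = rho_1''."""
    n, d = X.shape
    m = n if m is None else m
    r = y - X @ beta                                 # residuals of the MM fit
    w = psi1(r / sigma) / r                          # omega_i (assumes r_i != 0)
    A = (psi1_prime(r / sigma)[:, None] * X).T @ X
    K = sigma * np.linalg.solve(A, (w[:, None] * X).T @ X)
    rng = np.random.default_rng(seed)
    draws = np.empty((n_boot, d))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=m)             # m-out-of-n resample
        Xb, yb, wb = X[idx], y[idx], w[idx]
        beta1 = np.linalg.solve((wb[:, None] * Xb).T @ Xb, (wb * yb) @ Xb)
        draws[b] = beta + K @ (beta1 - beta)         # corrected draw
    return draws
```

Because the weights are never recomputed on the bootstrap samples, outliers that were downweighted in the original fit stay downweighted in every draw, which is exactly what makes the scheme both fast and robust.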
![Page 87: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/87.jpg)
Robust variable selection: segmentation
Properties of fast and robust bootstrap
Computationally efficient: weighted least squares calculations
Robust: No recalculation of observation weights
![Page 88: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/88.jpg)
Robust variable selection: segmentation
Consistent model selection
Suppose a true model α0 ⊂ {1, . . . , d} exists and is included in the set A of models considered.
If we select the model that minimizes RPE(α) or PRPE(α), that is,

$$\hat\alpha_{m,n} = \operatorname*{argmin}_{\alpha\in\mathcal{A}} RPE(\alpha) \qquad \text{and} \qquad \tilde\alpha_{m,n} = \operatorname*{argmin}_{\alpha\in\mathcal{A}} PRPE(\alpha),$$

then, under appropriate regularity conditions, the model selection criteria are consistent in the sense that

$$\lim_{n\to\infty} P(\hat\alpha_{m,n} = \alpha_0) = 1 \qquad \text{and} \qquad \lim_{n\to\infty} P(\tilde\alpha_{m,n} = \alpha_0) = 1\,.$$

Two conditions have practical consequences:
m = o(n) (m out of n bootstrap)
f(n) = o(n/m)
![Page 90: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/90.jpg)
Robust variable selection: segmentation
Examples
We compare the full model with models selected by backward elimination based on:
RPE(α)
PRPE(α) with f(n) = log(n)
RFPE
For each of the models we report RR²a(α), the robust adjusted R².
To compare predictive power we calculate the 5-fold CV trimmed MSPE.
![Page 91: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/91.jpg)
Robust variable selection: segmentation
Example 1: Ozone data
Los Angeles Ozone Pollution Data, 1976.
366 observations (different days) on 9 variables.
Response: temperature (degrees F) at El Monte, CA.
Covariates: measurements of temperature, pressure, humidity, ozone, etc. at other places in CA.
We start from the full quadratic model (d = 45).

| model | size | RR²a | 5% trimmed MSPE |
|---|---|---|---|
| Full | 45 | 0.8660 | 10.78 |
| RFPE | 23 | 0.8174 | 10.66 |
| α̂m,n | 10 | 0.7583 | 11.67 |
| α̃m,n | 7 | 0.7643 | 10.45 |
![Page 92: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/92.jpg)
Robust variable selection: segmentation
Example 2: Diabetes data
442 observations on 16 variables.
Response: measure of disease progression one year after baseline.
Covariates: 10 baseline variables (age, sex, BMI, blood pressure, ...).
We start from a quadratic model with some interactions (d = 65).

| model | size | RR²a | 5% trimmed MSE |
|---|---|---|---|
| Full | 65 | 0.7731 | 4988.1 |
| RFPE | 16 | 0.6045 | 2231.2 |
| α̂m,n | 11 | 0.5127 | 2657.2 |
| α̃m,n | 7 | 0.5302 | 2497.0 |
![Page 93: Robust strategies and model selection · 2013. 1. 12. · 1 Regression model 2 Least squares 3 Manual variable selection approach 4 Automatic variable selection approach 5 Robustness](https://reader036.vdocuments.us/reader036/viewer/2022081615/5fe061f3916ef964b32923ed/html5/thumbnails/93.jpg)
References
◮ Khan, J.A., Van Aelst, S., and Zamar, R.H. (2007). Building a Robust Linear Model with Forward Selection and Stepwise Procedures. Computational Statistics and Data Analysis, 52, 239-248.
◮ Khan, J.A., Van Aelst, S., and Zamar, R.H. (2007). Robust Linear Model Selection Based on Least Angle Regression. Journal of the American Statistical Association, 102, 1289-1299.
◮ Lutz, R.W., Kalisch, M., and Bühlmann, P. (2008). Robustified L2 boosting. Computational Statistics and Data Analysis, 52, 3331-3341.
◮ Maronna, R.A., Martin, D.R., and Yohai, V.J. (2006). Robust Statistics: Theory and Methods. Wiley: New York.
◮ Salibian-Barrera, M. and Van Aelst, S. (2007). Robust Model Selection Using Fast and Robust Bootstrap. Computational Statistics and Data Analysis, 52, 5121-5135.