Introduction Problems With Outliers Iterative Reweighting Simulations Conclusion References
Robustness of Kernel Based Regression: a
Comparison of Iterative Weighting Schemes
K. De Brabanter1, K. Pelckmans2, J. De Brabanter1,3, M. Debruyne4, J.A.K. Suykens1, M. Hubert5, B. De Moor1
1Department of Electrical Engineering (ESAT)-SCD, K.U. Leuven, Belgium
2Department of Information Technology, Uppsala University, Sweden
3Department Industrieel Ingenieur (Associatie K.U. Leuven), Gent, Belgium
4Department of Mathematics & Computer Science, Univ. Antwerpen, Belgium
5Department of Statistics, K.U. Leuven, Belgium
ICANN 2009, Limassol (Cyprus), September 16, 2009
Outline
1 Introduction: Least Squares Support Vector Machines for Regression; Robustifying LS-SVM
2 Problems With Outliers in Nonparametric Regression: Problems With Outliers; Toy Example
3 Iteratively Reweighted Kernel Based Regression: Weight Functions; Iteratively Reweighted LS-SVM; Speed of Convergence - Robustness Trade-Off
4 Simulations: Toy Example; Real Data Sets
5 Conclusion
LS-SVM for Regression: Problem Formulation
Dn = {(Xk, Yk) : Xk ∈ R^d, Yk ∈ R; k = 1, ..., n}
Model: Yk = g(Xk) + ek, k = 1, ..., n, and g ∈ C^r(R) with r ≥ 2
Goal: estimate the function g
Optimization problem (P) (Suykens et al., 1999)
min_{w,b,e} J_P(w, e) = (1/2) w^T w + (γ/2) ∑_{k=1}^n e_k^2
s.t. Yk = w^T ϕ(Xk) + b + ek, k = 1, ..., n.
LS-SVM for Regression: Dual Formulation
Lagrange multipliers ⇒ no explicit expression of ϕ needed
Solution given by solving the linear system

[ 0    1n^T    ] [ b ]   [ 0 ]
[ 1n   Ω + Dγ  ] [ α ] = [ Y ]

Ωkl = ϕ(Xk)^T ϕ(Xl) = K(Xk, Xl)
Dγ = (1/γ) In
Model in the dual space: g(x) = ∑_{k=1}^n αk K(x, Xk) + b
K(·, ·) has to be positive definite
Weighted LS-SVM
Replacing the L2 loss with the L1 loss, the Huber loss, ... ⇒ leads to a QP
Higher computational cost
Other way: weighting the residuals of the classical LS-SVM
Results in solving (multiple) linear systems
Optimization problem (P) (Suykens et al., 2002)
min_{w,b,e} J_P(w, e) = (1/2) w^T w + (γ/2) ∑_{k=1}^n vk e_k^2
s.t. Yk = w^T ϕ(Xk) + b + ek, k = 1, ..., n,
with weights
vk = 1,                          if |ek/s| ≤ c1;
     (c2 − |ek/s|)/(c2 − c1),    if c1 < |ek/s| ≤ c2;
     10^-8,                      otherwise,
where s is a robust estimate of the standard deviation of the residuals ek.
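As a sketch of the weighting step (assuming the common robust scale estimate s = 1.483·MAD, which is consistent for Gaussian noise, and the usual choices c1 = 2.5, c2 = 3):

```python
import numpy as np

def lssvm_weights(e, c1=2.5, c2=3.0):
    """Weights v_k from residuals e_k: 1 inside c1, linear decay on
    (c1, c2], and 10^-8 beyond c2, all on the scaled residual |e_k / s|."""
    # robust scale estimate: 1.483 * MAD (assumption; consistent at the Gaussian)
    s = 1.483 * np.median(np.abs(e - np.median(e)))
    r = np.abs(e / s)
    return np.where(r <= c1, 1.0,
           np.where(r <= c2, (c2 - r) / (c2 - c1), 1e-8))

# small residuals plus one gross outlier
e = np.array([0.10, -0.20, 0.15, 0.05, -0.10, 0.10, -0.05, 5.0])
v = lssvm_weights(e)
# the outlier is effectively removed; the inliers keep full weight
```

Because the scale s comes from the median of absolute deviations, the single large residual barely inflates it, so the outlier lands far beyond c2.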
Problems With Outliers in Nonparametric Regression
Nadaraya-Watson, local polynomial regression, splines, ... are all based on the L2 risk
Simple mathematics, fast computation
Sensitive to outliers ⇒ unbounded influence function (IF)
Different types of kernels ⇒ sensitivity to leverage points
Bounded IF when using decreasing kernels, e.g. the RBF kernel
Robust CV is absolutely necessary!
Toy Example
[Figure: four panels of a toy example on X ∈ [0, 1], each showing the real function and the data. (1) ek ∼ N(0, 0.05): LS-SVM vs. SVM. (2) ek ∼ N(0, 0.05) + 3 outliers: LS-SVM vs. SVM, outliers marked. (3) same data: weighted LS-SVM vs. SVM. (4) ek ∼ (1 − ε) N(0, 0.1) + ε C3: weighted LS-SVM vs. SVM.]
Weight Functions
Weight function V(r) and corresponding loss L(r):

Huber:    V(r) = 1, if |r| < β; β/|r|, if |r| ≥ β.
          L(r) = r², if |r| < β; β|r| − β²/2, if |r| ≥ β.
Hampel:   V(r) = 1, if |r| < b1; (b2 − |r|)/(b2 − b1), if b1 ≤ |r| ≤ b2; 0, if |r| > b2.
          L(r) = r², if |r| < b1; (b2 r² − |r|³)/(b2 − b1), if b1 ≤ |r| ≤ b2; 0, if |r| > b2.
Logistic: V(r) = tanh(r)/r.       L(r) = r tanh(r).
Myriad:   V(r) = δ²/(δ² + r²).    L(r) = log(δ² + r²).

[Plots of the score functions ψ(r) omitted.]
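In code, the four weight functions V(r) might look as follows (a sketch; the parameter defaults are illustrative, not prescribed by the slides):

```python
import numpy as np

def v_huber(r, beta=1.0):
    """Huber: 1 inside beta, beta/|r| outside."""
    r = np.abs(np.asarray(r, dtype=float))
    return np.where(r < beta, 1.0, beta / np.maximum(r, 1e-300))

def v_hampel(r, b1=2.5, b2=3.0):
    """Hampel: 1, then linear decay on [b1, b2], then hard zero."""
    r = np.abs(np.asarray(r, dtype=float))
    return np.where(r < b1, 1.0,
           np.where(r <= b2, (b2 - r) / (b2 - b1), 0.0))

def v_logistic(r):
    """Logistic: tanh(r)/r, with the limit value 1 at r = 0."""
    r = np.asarray(r, dtype=float)
    safe = np.where(r == 0.0, 1.0, r)        # avoid 0/0 in the vectorized division
    return np.where(r == 0.0, 1.0, np.tanh(safe) / safe)

def v_myriad(r, delta=1.0):
    """Myriad: delta^2 / (delta^2 + r^2), smooth and never exactly zero."""
    r = np.asarray(r, dtype=float)
    return delta ** 2 / (delta ** 2 + r ** 2)
```

Note the qualitative difference: Hampel assigns exactly zero weight beyond b2 (hard rejection), while Huber, Logistic, and Myriad only downweight, with Myriad decaying fastest for large |r| when δ is small.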
Algorithm
Iteratively Reweighted LS-SVM
1 Given tuning parameters and residuals e_k = α_k/γ from the unweighted LS-SVM
2 repeat
3   Compute a robust estimate s of the standard deviation from the e_k^(i) distribution
4   Choose a weight function and compute weights v_k^(i) based on r_k^(i) = e_k^(i)/s
5   Set Dγ = diag(1/(γ v_1^(i)), ..., 1/(γ v_n^(i)))
6   Solve the linear system ⇒ α^(i), b^(i)
7   Set i := i + 1
8 until max_k |α_k^(i) − α_k^(i−1)| ≤ 10^-4
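A runnable sketch of this loop (not the authors' implementation): it reuses the LS-SVM linear system with an RBF kernel, and assumes Huber weights and the robust scale s = 1.483·MAD, both of which are choices, not requirements of the algorithm.

```python
import numpy as np

def rbf(A, B, sigma):
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def irls_svm(X, Y, gamma, sigma, beta=1.0, tol=1e-4, max_iter=50):
    """Iteratively reweighted LS-SVM with Huber weights (illustrative)."""
    n = len(Y)
    Omega = rbf(X, X, sigma)
    v = np.ones(n)                           # start from the unweighted LS-SVM
    alpha_prev = np.zeros(n)
    for _ in range(max_iter):
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = A[1:, 0] = 1.0
        A[1:, 1:] = Omega + np.diag(1.0 / (gamma * v))   # Omega + D_gamma
        sol = np.linalg.solve(A, np.concatenate(([0.0], Y)))
        b, alpha = sol[0], sol[1:]
        e = alpha / (gamma * v)              # residuals: e_k = alpha_k / (gamma v_k)
        s = 1.483 * np.median(np.abs(e - np.median(e)))  # robust scale (assumption)
        r = np.abs(e) / s
        v = np.where(r < beta, 1.0, beta / r)            # Huber weights
        if np.max(np.abs(alpha - alpha_prev)) <= tol:    # stopping rule on alpha
            break
        alpha_prev = alpha
    return alpha, b

# sine with three gross outliers: the reweighted fit should stay on the curve
X = np.linspace(0.0, 1.0, 80).reshape(-1, 1)
Y = np.sin(2 * np.pi * X).ravel()
Y[[10, 40, 70]] += 5.0
alpha, b = irls_svm(X, Y, gamma=100.0, sigma=0.15)
Yhat = rbf(X, X, 0.15) @ alpha + b
```

Each pass only re-solves a linear system with an updated diagonal, which is the computational advantage over loss-function approaches that require a QP.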
Speed of Convergence - Robustness Trade-Off
Influence Function (Hampel, 1968)
The IF describes the effect of an additional observation in any
point x on a statistic T , given a sample with distribution F .
Main Result of Debruyne et al., 2008
The influence function (IF) of reweighted least squares kernel based regression (LS-KBR) with a bounded kernel converges to a bounded IF, even when the initial LS-KBR is not robust, if
(c1) ψ : R → R is a measurable, real, odd function
(c2) ψ is continuous and differentiable
(c3) ψ is bounded
(c4) E_{Pe}[ψ′(e)] > 0
Speed of Convergence - Robustness Trade-Off (cont’d)
Define d = E_{Pe}[ψ(e)/e] and c = d − E_{Pe}[ψ′(e)]
The higher the ratio c/d, the higher the degree of robustness, but the slower the reduction of the IF at each iteration, and vice versa

Weight      Parameter            N(0, 1)                 C(0, 1)
function    settings          c      d      c/d       c      d      c/d
Huber       β = 0.5          0.32   0.71   0.46      0.26   0.55   0.47
Huber       β = 1            0.22   0.91   0.25      0.22   0.72   0.31
Huber       β = 2            0.04   0.99   0.04      0.14   0.85   0.17
Logistic    -                0.22   0.82   0.26      0.21   0.66   0.32
Hampel      b1 = 2.5, b2 = 3 0.006  0.99   0.006     0.02   0.78   0.025
Myriad      δ = 0.1          0.11   0.12   0.92      0.083  0.091  0.91
Myriad      δ = 0.6475       0.31   0.53   0.60      0.24   0.40   0.60
Myriad      δ = 1            0.31   0.66   0.47      0.25   0.50   0.50
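These constants can be checked by simulation. A sketch for the Huber row with β = 1 under N(0, 1), using the standard relation V(r) = ψ(r)/r, so ψ(e)/e = V(e) and ψ′(e) is the indicator of |e| < β:

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.standard_normal(500_000)             # e ~ N(0, 1)
beta = 1.0

V = np.where(np.abs(e) < beta, 1.0, beta / np.abs(e))  # psi(e)/e for Huber
dpsi = (np.abs(e) < beta).astype(float)                # psi'(e) for Huber

d = V.mean()                  # d = E[psi(e)/e]
c = d - dpsi.mean()           # c = d - E[psi'(e)]
# d ≈ 0.91, c ≈ 0.22, c/d ≈ 0.25: the Huber beta = 1 column under N(0, 1)
```

The same recipe with Cauchy samples (`rng.standard_cauchy`) reproduces the C(0, 1) columns.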
Toy Example (cont’d)
[Figure: two panels of the toy example with ek ∼ (1 − ε) N(0, 0.1) + ε C3, each showing the real function and the data. Left: IRLS-SVM (Huber) and IRLS-SVM (Hampel). Right: IRLS-SVM (Logistic) and IRLS-SVM (Myriad).]
ε = 0.3     L1     L2     L∞    imax
Huber       0.06   0.005  0.12     7
Hampel      0.06   0.005  0.13     4
Logistic    0.06   0.005  0.11    11
Myriad      0.03   0.002  0.06    17
Octane and Demographic Data Set
Octane: 39 production samples of NIR absorbance spectra over 226 wavelengths
Demographic: information about the 50 states of the USA on 25 variables
Performances are given on test data of sizes 10 and 15 respectively (200 Monte Carlo simulations)
Robust CV
                          Octane                                     Demographic
weights               L1         L2         L∞        imax      L1         L2         L∞        imax
IRLS-SVM (Huber)      0.19(0.03) 0.07(0.02) 0.51(0.10)  15      0.31(0.01) 0.14(0.02) 0.83(0.06)   8
IRLS-SVM (Hampel)     0.22(0.03) 0.07(0.03) 0.55(0.14)   2      0.33(0.01) 0.18(0.04) 0.97(0.02)   3
IRLS-SVM (Logistic)   0.20(0.03) 0.06(0.02) 0.51(0.10)  18      0.30(0.02) 0.13(0.01) 0.80(0.07)  10
IRLS-SVM (Myriad)     0.20(0.03) 0.06(0.02) 0.50(0.09)  22      0.30(0.01) 0.13(0.01) 0.79(0.06)  12
WLS-SVM               0.22(0.03) 0.08(0.02) 0.60(0.15)   1      0.33(0.02) 0.15(0.01) 0.80(0.02)   1
SVM                   0.28(0.03) 0.12(0.02) 0.56(0.13)   -      0.37(0.02) 0.21(0.02) 0.90(0.06)   -
Conclusion
Comparison of 4 weight functions to robustify LS-SVM
Reweighting is a useful alternative to QPs
Robustness is obtained by solving (multiple) linear systems
Trade-off between speed of convergence and degree of robustness
The Myriad weight function is highly robust, but has a slower speed of convergence
References
Debruyne, M., Christmann, A., Hubert, M., Suykens, J.A.K. (2008), Robustness and Stability of Reweighted Kernel Based Regression. Technical Report 06-09, Department of Mathematics, K.U. Leuven (Leuven, Belgium).
Hampel, F. R. (1968), Contributions to the Theory of Robust Estimation, Ph.D. Thesis, University of California, Berkeley.
Suykens, J. A. K. and Vandewalle, J. (1999), Least squares support vector machine classifiers, Neural Processing Letters, 9(3): 293-300.
Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B. and Vandewalle, J. (2002), Least Squares Support Vector Machines. World Scientific, Singapore.
Suykens, J. A. K., De Brabanter, J., Lukas, L., Vandewalle, J. (2002), Weighted Least Squares Support Vector Machines: Robustness and Sparse Approximation, Neurocomputing, 48(1-4): 85-105.