Introduction Problems With Outliers Iterative Reweighting Simulations Conclusion References
Robustness of Kernel Based Regression: a
Comparison of Iterative Weighting Schemes
K. De Brabanter1, K. Pelckmans2, J. De Brabanter1,3, M. Debruyne4, J.A.K. Suykens1, M. Hubert5, B. De Moor1
1Department of Electrical Engineering (ESAT)-SCD, K.U. Leuven, Belgium
2Department of Information Technology, Uppsala University, Sweden
3Department Industrieel Ingenieur (Associatie K.U. Leuven), Gent, Belgium
4Department of Mathematics & Computer Science, Univ. Antwerpen, Belgium
5Department of Statistics, K.U. Leuven, Belgium
ICANN 2009, Limassol (Cyprus), September 16, 2009
Outline
1 Introduction: Least Squares Support Vector Machines for Regression; Robustifying LS-SVM
2 Problems With Outliers in Nonparametric Regression: Problems With Outliers; Toy Example
3 Iteratively Reweighted Kernel Based Regression: Weight Functions; Iteratively Reweighted LS-SVM; Speed of Convergence - Robustness Trade-Off
4 Simulations: Toy Example; Real Data Sets
5 Conclusion
LS-SVM for Regression: Problem Formulation
Dn = {(Xk, Yk) : Xk ∈ R^d, Yk ∈ R; k = 1, ..., n}
Model: Yk = g(Xk) + ek, k = 1, ..., n, and g ∈ C^r(R) with r ≥ 2
Goal: estimate the function g
Optimization problem (P) (Suykens et al., 1999)
min_{w,b,e} J_P(w, e) = (1/2) w^T w + (γ/2) ∑_{k=1}^n e_k^2
s.t. Yk = w^T ϕ(Xk) + b + ek, k = 1, ..., n.
LS-SVM for Regression: Dual Formulation
Lagrange multipliers ⇒ no explicit expression of ϕ needed
Solution given by solving the linear system

[ 0    1n^T    ] [ b ]   [ 0 ]
[ 1n   Ω + Dγ  ] [ α ] = [ Y ]

Ωkl = ϕ(Xk)^T ϕ(Xl) = K(Xk, Xl)
Dγ = (1/γ) In
Model in the dual space: g(x) = ∑_{k=1}^n αk K(x, Xk) + b
K(·, ·) has to be positive definite
Weighted LS-SVM
Replacing the L2 loss with the L1 loss, the Huber loss, ... ⇒ leads to a QP
Higher computational cost
Other way: weighting the residuals of the classical LS-SVM
Results in solving (multiple) linear systems
Optimization problem (P) (Suykens et al., 2002)
min_{w,b,e} J_P(w, e) = (1/2) w^T w + (γ/2) ∑_{k=1}^n vk e_k^2
s.t. Yk = w^T ϕ(Xk) + b + ek, k = 1, ..., n,
with weights
vk = 1,                          if |ek/s| ≤ c1;
     (c2 − |ek/s|)/(c2 − c1),    if c1 < |ek/s| ≤ c2;
     10^-8,                      otherwise,
where s is a robust estimate of the standard deviation of the residuals ek.
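As a sketch of the weighting step (assuming the common robust scale estimate s = 1.483·MAD, which is consistent for Gaussian noise, and the usual choices c1 = 2.5, c2 = 3):

```python
import numpy as np

def lssvm_weights(e, c1=2.5, c2=3.0):
    """Weights v_k from residuals e_k: 1 inside c1, linear decay on
    (c1, c2], and 10^-8 beyond c2, all on the scaled residual |e_k / s|."""
    # robust scale estimate: 1.483 * MAD (assumption; consistent at the Gaussian)
    s = 1.483 * np.median(np.abs(e - np.median(e)))
    r = np.abs(e / s)
    return np.where(r <= c1, 1.0,
           np.where(r <= c2, (c2 - r) / (c2 - c1), 1e-8))

# small residuals plus one gross outlier
e = np.array([0.10, -0.20, 0.15, 0.05, -0.10, 0.10, -0.05, 5.0])
v = lssvm_weights(e)
# the outlier is effectively removed; the inliers keep full weight
```

Because the scale s comes from the median of absolute deviations, the single large residual barely inflates it, so the outlier lands far beyond c2.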
Problems With Outliers in Nonparametric Regression
Nadaraya-Watson, local polynomial regression, splines, ... are all based on the L2 risk
Simple mathematics, fast computation
Sensitive to outliers ⇒ unbounded influence function (IF)
Different types of kernels ⇒ sensitivity to leverage points
Bounded IF when using decreasing kernels, e.g. the RBF kernel
Robust CV is absolutely necessary!
Toy Example
[Figure: four panels of a toy example on X ∈ [0, 1], each showing the real function and the data. (1) ek ∼ N(0, 0.05): LS-SVM vs. SVM. (2) ek ∼ N(0, 0.05) + 3 outliers: LS-SVM vs. SVM, outliers marked. (3) same data: weighted LS-SVM vs. SVM. (4) ek ∼ (1 − ε) N(0, 0.1) + ε C3: weighted LS-SVM vs. SVM.]
Weight Functions
Weight function V(r) and corresponding loss L(r):

Huber:    V(r) = 1, if |r| < β; β/|r|, if |r| ≥ β.
          L(r) = r², if |r| < β; β|r| − β²/2, if |r| ≥ β.
Hampel:   V(r) = 1, if |r| < b1; (b2 − |r|)/(b2 − b1), if b1 ≤ |r| ≤ b2; 0, if |r| > b2.
          L(r) = r², if |r| < b1; (b2 r² − |r|³)/(b2 − b1), if b1 ≤ |r| ≤ b2; 0, if |r| > b2.
Logistic: V(r) = tanh(r)/r.       L(r) = r tanh(r).
Myriad:   V(r) = δ²/(δ² + r²).    L(r) = log(δ² + r²).

[Plots of the score functions ψ(r) omitted.]
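In code, the four weight functions V(r) might look as follows (a sketch; the parameter defaults are illustrative, not prescribed by the slides):

```python
import numpy as np

def v_huber(r, beta=1.0):
    """Huber: 1 inside beta, beta/|r| outside."""
    r = np.abs(np.asarray(r, dtype=float))
    return np.where(r < beta, 1.0, beta / np.maximum(r, 1e-300))

def v_hampel(r, b1=2.5, b2=3.0):
    """Hampel: 1, then linear decay on [b1, b2], then hard zero."""
    r = np.abs(np.asarray(r, dtype=float))
    return np.where(r < b1, 1.0,
           np.where(r <= b2, (b2 - r) / (b2 - b1), 0.0))

def v_logistic(r):
    """Logistic: tanh(r)/r, with the limit value 1 at r = 0."""
    r = np.asarray(r, dtype=float)
    safe = np.where(r == 0.0, 1.0, r)        # avoid 0/0 in the vectorized division
    return np.where(r == 0.0, 1.0, np.tanh(safe) / safe)

def v_myriad(r, delta=1.0):
    """Myriad: delta^2 / (delta^2 + r^2), smooth and never exactly zero."""
    r = np.asarray(r, dtype=float)
    return delta ** 2 / (delta ** 2 + r ** 2)
```

Note the qualitative difference: Hampel assigns exactly zero weight beyond b2 (hard rejection), while Huber, Logistic, and Myriad only downweight, with Myriad decaying fastest for large |r| when δ is small.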
Algorithm
Iteratively Reweighted LS-SVM
1 Given tuning parameters and residuals e_k = α_k/γ from the unweighted LS-SVM
2 repeat
3   Compute a robust estimate s of the standard deviation from the e_k^(i) distribution
4   Choose a weight function and compute weights v_k^(i) based on r_k^(i) = e_k^(i)/s
5   Set Dγ = diag(1/(γ v_1^(i)), ..., 1/(γ v_n^(i)))
6   Solve the linear system ⇒ α^(i), b^(i)
7   Set i := i + 1
8 until max_k |α_k^(i) − α_k^(i−1)| ≤ 10^-4
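A runnable sketch of this loop (not the authors' implementation): it reuses the LS-SVM linear system with an RBF kernel, and assumes Huber weights and the robust scale s = 1.483·MAD, both of which are choices, not requirements of the algorithm.

```python
import numpy as np

def rbf(A, B, sigma):
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def irls_svm(X, Y, gamma, sigma, beta=1.0, tol=1e-4, max_iter=50):
    """Iteratively reweighted LS-SVM with Huber weights (illustrative)."""
    n = len(Y)
    Omega = rbf(X, X, sigma)
    v = np.ones(n)                           # start from the unweighted LS-SVM
    alpha_prev = np.zeros(n)
    for _ in range(max_iter):
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = A[1:, 0] = 1.0
        A[1:, 1:] = Omega + np.diag(1.0 / (gamma * v))   # Omega + D_gamma
        sol = np.linalg.solve(A, np.concatenate(([0.0], Y)))
        b, alpha = sol[0], sol[1:]
        e = alpha / (gamma * v)              # residuals: e_k = alpha_k / (gamma v_k)
        s = 1.483 * np.median(np.abs(e - np.median(e)))  # robust scale (assumption)
        r = np.abs(e) / s
        v = np.where(r < beta, 1.0, beta / r)            # Huber weights
        if np.max(np.abs(alpha - alpha_prev)) <= tol:    # stopping rule on alpha
            break
        alpha_prev = alpha
    return alpha, b

# sine with three gross outliers: the reweighted fit should stay on the curve
X = np.linspace(0.0, 1.0, 80).reshape(-1, 1)
Y = np.sin(2 * np.pi * X).ravel()
Y[[10, 40, 70]] += 5.0
alpha, b = irls_svm(X, Y, gamma=100.0, sigma=0.15)
Yhat = rbf(X, X, 0.15) @ alpha + b
```

Each pass only re-solves a linear system with an updated diagonal, which is the computational advantage over loss-function approaches that require a QP.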
Speed of Convergence - Robustness Trade-Off
Influence Function (Hampel, 1968)
The IF describes the effect of an additional observation in any
point x on a statistic T , given a sample with distribution F .
Main Result of Debruyne et al., 2008
The influence function (IF) of reweighted least squares kernel based regression (LS-KBR) with a bounded kernel converges to a bounded IF, even when the initial LS-KBR is not robust, if
(c1) ψ : R → R is a measurable, real, odd function
(c2) ψ is continuous and differentiable
(c3) ψ is bounded
(c4) E_{Pe}[ψ′(e)] > 0
Speed of Convergence - Robustness Trade-Off (cont’d)
Define d = E_{Pe}[ψ(e)/e] and c = d − E_{Pe}[ψ′(e)]
The higher the ratio c/d, the higher the degree of robustness, but the slower the reduction of the IF at each iteration, and vice versa

Weight      Parameter            N(0, 1)                 C(0, 1)
function    settings          c      d      c/d       c      d      c/d
Huber       β = 0.5          0.32   0.71   0.46      0.26   0.55   0.47
Huber       β = 1            0.22   0.91   0.25      0.22   0.72   0.31
Huber       β = 2            0.04   0.99   0.04      0.14   0.85   0.17
Logistic    -                0.22   0.82   0.26      0.21   0.66   0.32
Hampel      b1 = 2.5, b2 = 3 0.006  0.99   0.006     0.02   0.78   0.025
Myriad      δ = 0.1          0.11   0.12   0.92      0.083  0.091  0.91
Myriad      δ = 0.6475       0.31   0.53   0.60      0.24   0.40   0.60
Myriad      δ = 1            0.31   0.66   0.47      0.25   0.50   0.50
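These constants can be checked by simulation. A sketch for the Huber row with β = 1 under N(0, 1), using the standard relation V(r) = ψ(r)/r, so ψ(e)/e = V(e) and ψ′(e) is the indicator of |e| < β:

```python
import numpy as np

rng = np.random.default_rng(0)
e = rng.standard_normal(500_000)             # e ~ N(0, 1)
beta = 1.0

V = np.where(np.abs(e) < beta, 1.0, beta / np.abs(e))  # psi(e)/e for Huber
dpsi = (np.abs(e) < beta).astype(float)                # psi'(e) for Huber

d = V.mean()                  # d = E[psi(e)/e]
c = d - dpsi.mean()           # c = d - E[psi'(e)]
# d ≈ 0.91, c ≈ 0.22, c/d ≈ 0.25: the Huber beta = 1 column under N(0, 1)
```

The same recipe with Cauchy samples (`rng.standard_cauchy`) reproduces the C(0, 1) columns.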
Toy Example (cont’d)
[Figure: two panels of the toy example with ek ∼ (1 − ε) N(0, 0.1) + ε C3, each showing the real function and the data. Left: IRLS-SVM (Huber) and IRLS-SVM (Hampel). Right: IRLS-SVM (Logistic) and IRLS-SVM (Myriad).]
ε = 0.3     L1     L2     L∞    imax
Huber       0.06   0.005  0.12     7
Hampel      0.06   0.005  0.13     4
Logistic    0.06   0.005  0.11    11
Myriad      0.03   0.002  0.06    17
Octane and Demographic Data Set
Octane: 39 production samples of NIR absorbance spectra over 226 wavelengths
Demographic: information about the 50 states of the USA on 25 variables
Performances are given on test data of sizes 10 and 15 respectively (200 Monte Carlo simulations)
Robust CV
                          Octane                                     Demographic
weights               L1         L2         L∞        imax      L1         L2         L∞        imax
IRLS-SVM (Huber)      0.19(0.03) 0.07(0.02) 0.51(0.10)  15      0.31(0.01) 0.14(0.02) 0.83(0.06)   8
IRLS-SVM (Hampel)     0.22(0.03) 0.07(0.03) 0.55(0.14)   2      0.33(0.01) 0.18(0.04) 0.97(0.02)   3
IRLS-SVM (Logistic)   0.20(0.03) 0.06(0.02) 0.51(0.10)  18      0.30(0.02) 0.13(0.01) 0.80(0.07)  10
IRLS-SVM (Myriad)     0.20(0.03) 0.06(0.02) 0.50(0.09)  22      0.30(0.01) 0.13(0.01) 0.79(0.06)  12
WLS-SVM               0.22(0.03) 0.08(0.02) 0.60(0.15)   1      0.33(0.02) 0.15(0.01) 0.80(0.02)   1
SVM                   0.28(0.03) 0.12(0.02) 0.56(0.13)   -      0.37(0.02) 0.21(0.02) 0.90(0.06)   -
Conclusion
Comparison of 4 weight functions to robustify LS-SVM
Reweighting is a useful alternative to QPs
Robustness is obtained by solving (multiple) linear systems
Trade-off between speed of convergence and degree of robustness
The Myriad weight function is highly robust, but has a slower speed of convergence
References
Debruyne, M., Christmann, A., Hubert, M., Suykens, J.A.K. (2008), Robustness and Stability of Reweighted Kernel Based Regression. Technical Report 06-09, Department of Mathematics, K.U. Leuven (Leuven, Belgium).
Hampel, F. R. (1968), Contributions to the Theory of Robust Estimation, Ph.D. Thesis, University of California, Berkeley.
Suykens, J. A. K. and Vandewalle, J. (1999), Least squares support vector machine classifiers, Neural Processing Letters, 9(3): 293-300.
Suykens, J. A. K., Van Gestel, T., De Brabanter, J., De Moor, B. and Vandewalle, J. (2002), Least Squares Support Vector Machines. World Scientific, Singapore.
Suykens, J. A. K., De Brabanter, J., Lukas, L., Vandewalle, J. (2002), Weighted Least Squares Support Vector Machines: Robustness and Sparse Approximation, Neurocomputing, 48(1-4): 85-105.