
Page 1: Robustness of Kernel Based Regression: a Comparison of Iterative Weighting Schemes

K. De Brabanter (1), K. Pelckmans (2), J. De Brabanter (1,3), M. Debruyne (4), J.A.K. Suykens (1), M. Hubert (5), B. De Moor (1)

(1) Department of Electrical Engineering (ESAT)-SCD, K.U. Leuven, Belgium
(2) Department of Information Technology, Uppsala University, Sweden
(3) Department Industrieel Ingenieur (Associatie K.U. Leuven), Gent, Belgium
(4) Department of Mathematics & Computer Science, Univ. Antwerpen, Belgium
(5) Department of Statistics, K.U. Leuven, Belgium

ICANN 2009, Limassol (Cyprus), September 16, 2009

Page 2: Outline

1. Introduction
   - Least Squares Support Vector Machines for Regression
   - Robustifying LS-SVM
2. Problems With Outliers in Nonparametric Regression
   - Problems With Outliers
   - Toy Example
3. Iteratively Reweighted Kernel Based Regression
   - Weight Functions
   - Iteratively Reweighted LS-SVM
   - Speed of Convergence - Robustness Trade-Off
4. Simulations
   - Toy Example
   - Real Data Sets
5. Conclusion


Page 4: LS-SVM for Regression: Problem Formulation

Training data: $D_n = \{(X_k, Y_k) : X_k \in \mathbb{R}^d,\ Y_k \in \mathbb{R};\ k = 1, \ldots, n\}$

Model: $Y_k = g(X_k) + e_k$, $k = 1, \ldots, n$, with $g \in C^r(\mathbb{R})$ and $r \geq 2$

Goal: estimate the function $g$

Optimization problem (P) (Suykens et al., 1999):

$$\min_{w,b,e}\ J_P(w, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{n} e_k^2$$

s.t. $Y_k = w^T \varphi(X_k) + b + e_k$, $k = 1, \ldots, n$.
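The dual formulation on the next slide follows from the standard LS-SVM derivation; as a brief reminder of the intermediate step, the Lagrangian of (P) is

$$\mathcal{L}(w, b, e; \alpha) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{n} e_k^2 - \sum_{k=1}^{n} \alpha_k \left( w^T \varphi(X_k) + b + e_k - Y_k \right),$$

and setting its partial derivatives to zero gives

$$w = \sum_{k=1}^{n} \alpha_k \varphi(X_k), \qquad \sum_{k=1}^{n} \alpha_k = 0, \qquad \alpha_k = \gamma e_k.$$

Eliminating $w$ and $e$ yields the linear system on the next slide; the condition $\alpha_k = \gamma e_k$ is also why the residuals $e_k = \alpha_k / \gamma$ come for free in the reweighting algorithm later on.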


Page 5: LS-SVM for Regression: Dual Formulation

- Lagrange multipliers ⇒ no explicit expression for $\varphi$ is needed
- Solution given by solving the linear system

$$\begin{pmatrix} 0 & 1_n^T \\ 1_n & \Omega + D_\gamma \end{pmatrix} \begin{pmatrix} b \\ \alpha \end{pmatrix} = \begin{pmatrix} 0 \\ Y \end{pmatrix}$$

- $\Omega_{kl} = \varphi(X_k)^T \varphi(X_l) = K(X_k, X_l)$
- $D_\gamma = \frac{1}{\gamma} I_n$
- Model in the dual space: $g(x) = \sum_{k=1}^{n} \alpha_k K(x, X_k) + b$
- $K(\cdot, \cdot)$ has to be positive definite
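The slides contain no code, but assembling and solving this system takes only a few lines. A minimal sketch, assuming an RBF kernel; the helper names (rbf_kernel, lssvm_fit, lssvm_predict) and all parameter defaults are illustrative, not from the paper:

```python
# Minimal sketch of the LS-SVM dual solution with an RBF kernel.
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / sigma^2); sigma is a tuning parameter."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma ** 2)

def lssvm_fit(X, Y, gamma=10.0, sigma=1.0, weights=None):
    """Solve [[0, 1_n^T], [1_n, Omega + D_gamma]] [b; alpha] = [0; Y].

    weights=None gives the unweighted LS-SVM (D_gamma = I_n / gamma);
    passing weights v_k gives D_gamma = diag(1/(gamma v_1), ..., 1/(gamma v_n)),
    as needed for the weighted variant two slides ahead.
    """
    n = len(Y)
    v = np.ones(n) if weights is None else weights
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.diag(1.0 / (gamma * v))
    sol = np.linalg.solve(A, np.concatenate(([0.0], Y)))
    return sol[1:], sol[0]  # alpha, b

def lssvm_predict(x, X, alpha, b, sigma=1.0):
    """Dual model: g(x) = sum_k alpha_k K(x, X_k) + b."""
    return rbf_kernel(np.atleast_2d(x), X, sigma) @ alpha + b
```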


Page 6: Weighted LS-SVM

- Replacing the L2 loss by the L1 loss, the Huber loss function, ... ⇒ QP
- Higher computational cost
- Alternative: weight the residuals of the classical LS-SVM
- Results in solving (multiple) linear systems

Optimization problem (P) (Suykens et al., 2002):

$$\min_{w,b,e}\ J_P(w, e) = \frac{1}{2} w^T w + \frac{\gamma}{2} \sum_{k=1}^{n} v_k e_k^2$$

s.t. $Y_k = w^T \varphi(X_k) + b + e_k$, $k = 1, \ldots, n$, with weights

$$v_k = \begin{cases} 1, & |e_k/s| \leq c_1; \\[4pt] \dfrac{c_2 - |e_k/s|}{c_2 - c_1}, & c_1 \leq |e_k/s| \leq c_2; \\[4pt] 10^{-8}, & \text{otherwise}, \end{cases}$$

where $s$ is a robust estimate of the standard deviation of the residuals $e_k$.
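A sketch of this weighting rule in Python; the MAD-based scale estimate and the cutoffs c1 = 2.5, c2 = 3.0 are common choices, not prescriptions from the slides, which only require a robust estimate $s$:

```python
# Sketch of the Suykens et al. (2002) weights v_k; defaults are illustrative.
import numpy as np

def robust_scale(e):
    """Normalized median absolute deviation: a robust estimate of the
    standard deviation, consistent under Gaussian noise (factor 1.483)."""
    return 1.483 * np.median(np.abs(e - np.median(e)))

def suykens_weights(e, c1=2.5, c2=3.0):
    """v_k = 1 if |e_k/s| <= c1, linear decay on [c1, c2], ~0 beyond."""
    r = np.abs(e / robust_scale(e))
    return np.where(r <= c1, 1.0,
                    np.where(r <= c2, (c2 - r) / (c2 - c1), 1e-8))
```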



Page 8: Problems With Outliers in Nonparametric Regression

- Nadaraya-Watson (NW), local polynomial regression, splines, ... ⇒ all based on the L2 risk
- Simple mathematics, fast computation
- Sensitive to outliers ⇒ unbounded influence function (IF); see the sketch below
- Different types of kernels ⇒ leverage points
- Bounded IF when using decreasing kernels, e.g. the RBF kernel
- Robust cross-validation (CV) is absolutely necessary!
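To make the sensitivity concrete, a small illustrative sketch (my own, not from the slides): a single gross outlier visibly pulls the L2-based Nadaraya-Watson estimate away from the true function near the contaminated point, and the distortion grows without bound as the outlier grows.

```python
# One outlier locally distorts the Nadaraya-Watson estimate; since the
# estimate is linear in Y, its influence function is unbounded in Y.
import numpy as np

def nadaraya_watson(x, X, Y, h=0.05):
    """ghat(x) = sum_k K_h(x - X_k) Y_k / sum_k K_h(x - X_k), Gaussian K_h."""
    w = np.exp(-0.5 * ((x[:, None] - X[None, :]) / h) ** 2)
    return (w * Y).sum(axis=1) / w.sum(axis=1)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 100))
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.1, 100)
Y_out = Y.copy()
Y_out[50] += 10.0  # one gross outlier
x = np.linspace(0, 1, 200)
shift = np.max(np.abs(nadaraya_watson(x, X, Y_out) - nadaraya_watson(x, X, Y)))
print(f"max shift of the fit caused by a single outlier: {shift:.2f}")
```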


Page 9: Toy Example

[Figure: four panels of fits on simulated data over $X \in [0, 1]$. Panel 1: LS-SVM and SVM vs. the real function under noise $e_k \sim N(0, 0.05)$. Panel 2: the same data with 3 added outliers; LS-SVM is pulled toward them. Panel 3: weighted LS-SVM and SVM on the data with 3 outliers; the weighted fit stays close to the real function. Panel 4: weighted LS-SVM and SVM under contaminated noise $e_k \sim (1 - \varepsilon)\, N(0, 0.1) + \varepsilon\, C_3$.]


Page 11: Weight Functions

Huber:

$$V(r) = \begin{cases} 1, & |r| < \beta; \\ \beta/|r|, & |r| \geq \beta, \end{cases} \qquad L(r) = \begin{cases} r^2, & |r| < \beta; \\ \beta|r| - \frac{1}{2}\beta^2, & |r| \geq \beta. \end{cases}$$

Hampel:

$$V(r) = \begin{cases} 1, & |r| < b_1; \\ \dfrac{b_2 - |r|}{b_2 - b_1}, & b_1 \leq |r| \leq b_2; \\ 0, & |r| > b_2, \end{cases} \qquad L(r) = \begin{cases} r^2, & |r| < b_1; \\ \dfrac{b_2 r^2 - |r|^3}{b_2 - b_1}, & b_1 \leq |r| \leq b_2; \\ 0, & |r| > b_2. \end{cases}$$

Logistic:

$$V(r) = \frac{\tanh(r)}{r}, \qquad L(r) = r \tanh(r).$$

Myriad:

$$V(r) = \frac{\delta^2}{\delta^2 + r^2}, \qquad L(r) = \log(\delta^2 + r^2).$$

[The slide also plotted the corresponding score functions $\psi(r)$; plots not reproduced here.]
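The four weight functions as vectorized Python, complementing the formulas above; parameter defaults are placeholders:

```python
# The four weight functions V(r) from this slide, vectorized in NumPy.
import numpy as np

def huber_w(r, beta=1.0):
    r = np.abs(np.asarray(r, dtype=float))
    # equals 1 for |r| <= beta and beta/|r| beyond, without dividing by 0
    return beta / np.maximum(r, beta)

def hampel_w(r, b1=2.5, b2=3.0):
    r = np.abs(np.asarray(r, dtype=float))
    return np.where(r < b1, 1.0,
                    np.where(r <= b2, (b2 - r) / (b2 - b1), 0.0))

def logistic_w(r):
    r = np.asarray(r, dtype=float)
    r_safe = np.where(r == 0.0, 1.0, r)  # avoid 0/0; the limit at 0 is 1
    return np.where(r == 0.0, 1.0, np.tanh(r_safe) / r_safe)

def myriad_w(r, delta=1.0):
    r = np.asarray(r, dtype=float)
    return delta ** 2 / (delta ** 2 + r ** 2)
```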


Page 12: Algorithm: Iteratively Reweighted LS-SVM

1. Given the tuning parameters and the residuals $e_k = \alpha_k / \gamma$ from the unweighted LS-SVM
2. repeat
3. Compute a robust estimate $s$ of the standard deviation from the distribution of the $e_k^{(i)}$
4. Choose a weight function and compute the weights $v_k^{(i)}$ based on $r_k^{(i)} = e_k^{(i)} / s$
5. Set $D_\gamma = \operatorname{diag}\left( \frac{1}{\gamma v_1^{(i)}}, \ldots, \frac{1}{\gamma v_n^{(i)}} \right)$
6. Solve the linear system ⇒ $\alpha^{(i)}, b^{(i)}$
7. Set $i := i + 1$
8. until $\max_k \left| \alpha_k^{(i-1)} - \alpha_k^{(i)} \right| \leq 10^{-4}$
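Putting the pieces together, a minimal sketch of this loop, reusing the illustrative helpers sketched earlier (lssvm_fit, robust_scale, and the weight functions); all names and defaults are assumptions, not the authors' code:

```python
# Sketch of iteratively reweighted LS-SVM, following steps 1-8 above.
import numpy as np

def irls_svm(X, Y, weight_fn, gamma=10.0, sigma=1.0, tol=1e-4, max_iter=50):
    alpha, b = lssvm_fit(X, Y, gamma, sigma)  # step 1: unweighted LS-SVM
    v = np.ones(len(Y))
    for _ in range(max_iter):
        e = alpha / (gamma * v)        # residuals; e_k = alpha_k/gamma when v = 1
        s = robust_scale(e)            # step 3: robust scale estimate
        v = weight_fn(e / s)           # step 4: weights v_k^(i)
        alpha_new, b = lssvm_fit(X, Y, gamma, sigma, weights=v)  # steps 5-6
        converged = np.max(np.abs(alpha_new - alpha)) <= tol     # step 8
        alpha = alpha_new              # step 7
        if converged:
            break
    return alpha, b

# Usage, e.g. with the Myriad weights: alpha, b = irls_svm(X, Y, myriad_w)
```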


Page 13: Speed of Convergence - Robustness Trade-Off

Influence Function (Hampel, 1968)

The IF describes the effect of an additional observation at any point $x$ on a statistic $T$, given a sample with distribution $F$.
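In Hampel's standard formulation (the slide states it only in words), this is

$$\mathrm{IF}(x; T, F) = \lim_{t \downarrow 0} \frac{T\big( (1 - t) F + t \Delta_x \big) - T(F)}{t},$$

where $\Delta_x$ denotes the point mass at $x$; a bounded IF means that a single contaminating observation can change the statistic only by a bounded amount.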

Main Result (Debruyne et al., 2008)

The influence function of reweighted least squares kernel based regression (LS-KBR) with a bounded kernel converges to a bounded influence function, even when the initial LS-KBR estimator is not robust, provided that

(c1) $\psi: \mathbb{R} \to \mathbb{R}$ is a measurable, real, odd function;
(c2) $\psi$ is continuous and differentiable;
(c3) $\psi$ is bounded;
(c4) $E_{P_e} \psi'(e) > 0$.


Page 14: Speed of Convergence - Robustness Trade-Off (cont'd)

Define $d = E_{P_e}\!\left[ \frac{\psi(e)}{e} \right]$ and $c = d - E_{P_e} \psi'(e)$.

The higher the ratio $c/d$, the higher the degree of robustness, but the slower the reduction of the IF at each iteration, and vice versa.

| Weight function | Parameter settings | c (N(0,1)) | d | c/d | c (C(0,1)) | d | c/d |
|---|---|---|---|---|---|---|---|
| Huber | β = 0.5 | 0.32 | 0.71 | 0.46 | 0.26 | 0.55 | 0.47 |
| Huber | β = 1 | 0.22 | 0.91 | 0.25 | 0.22 | 0.72 | 0.31 |
| Huber | β = 2 | 0.04 | 0.99 | 0.04 | 0.14 | 0.85 | 0.17 |
| Logistic | - | 0.22 | 0.82 | 0.26 | 0.21 | 0.66 | 0.32 |
| Hampel | b1 = 2.5, b2 = 3 | 0.006 | 0.99 | 0.006 | 0.02 | 0.78 | 0.025 |
| Myriad | δ = 0.1 | 0.11 | 0.12 | 0.92 | 0.083 | 0.091 | 0.91 |
| Myriad | δ = 0.6475 | 0.31 | 0.53 | 0.60 | 0.24 | 0.40 | 0.60 |
| Myriad | δ = 1 | 0.31 | 0.66 | 0.47 | 0.25 | 0.50 | 0.50 |
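As a sanity check on these numbers (my own verification, assuming the logistic score function is $\psi(r) = \tanh(r)$, consistent with $V(r) = \tanh(r)/r$), the logistic row under N(0,1) can be reproduced by numerical integration:

```python
# Reproduce c, d and c/d for the logistic weights under N(0,1),
# assuming psi(e) = tanh(e), so psi(e)/e = V(e) and psi'(e) = sech(e)^2.
import numpy as np
from scipy import integrate, stats

phi = stats.norm.pdf

def v(e):  # psi(e)/e, with the removable singularity at 0 filled in
    return np.tanh(e) / e if e != 0.0 else 1.0

d, _ = integrate.quad(lambda e: v(e) * phi(e), -30, 30)
E_psi_prime, _ = integrate.quad(lambda e: phi(e) / np.cosh(e) ** 2, -30, 30)
c = d - E_psi_prime
print(f"d = {d:.2f}, c = {c:.2f}, c/d = {c/d:.2f}")
# Matches the table: d = 0.82, c = 0.22, c/d = 0.26.
```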


Page 16: Toy Example (cont'd)

[Figure: two panels of fits on data with contaminated noise $e_k \sim (1 - \varepsilon)\, N(0, 0.1) + \varepsilon\, C_3$, each showing the real function and the data; left: IRLS-SVM with Huber and Hampel weights; right: IRLS-SVM with Logistic and Myriad weights.]

Results for $\varepsilon = 0.3$:

| Weights | L1 | L2 | L∞ | i_max |
|---|---|---|---|---|
| Huber | 0.06 | 0.005 | 0.12 | 7 |
| Hampel | 0.06 | 0.005 | 0.13 | 4 |
| Logistic | 0.06 | 0.005 | 0.11 | 11 |
| Myriad | 0.03 | 0.002 | 0.06 | 17 |
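For completeness, a sketch of how such contaminated noise could be sampled. Both readings here are assumptions, not spelled out on the slides: $\varepsilon$ as the contamination fraction of a two-component mixture, and $C_3$ as a cubed standard Cauchy variate:

```python
# Sketch of e_k ~ (1 - eps) N(0, 0.1) + eps C_3.
# ASSUMPTIONS: C_3 read as a cubed standard Cauchy; 0.1 read as the
# variance of the Gaussian component. Neither is stated on the slides.
import numpy as np

def contaminated_noise(n, eps=0.3, rng=None):
    rng = np.random.default_rng(rng)
    is_outlier = rng.uniform(size=n) < eps           # mixture indicator
    gauss = rng.normal(0.0, np.sqrt(0.1), size=n)    # N(0, 0.1) component
    heavy = rng.standard_cauchy(size=n) ** 3         # C_3 component
    return np.where(is_outlier, heavy, gauss)
```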


Page 17: Octane and Demographic Data Sets

- Octane: 39 production samples of NIR absorbance spectra over 226 wavelengths
- Demographic: information on the 50 states of the USA across 25 variables
- Performance is evaluated on test sets of size 10 and 15, respectively (200 Monte Carlo simulations)
- Robust CV is used for tuning

Octane:

| Method | L1 | L2 | L∞ | i_max |
|---|---|---|---|---|
| IRLS-SVM (Huber) | 0.19 (0.03) | 0.07 (0.02) | 0.51 (0.10) | 15 |
| IRLS-SVM (Hampel) | 0.22 (0.03) | 0.07 (0.03) | 0.55 (0.14) | 2 |
| IRLS-SVM (Logistic) | 0.20 (0.03) | 0.06 (0.02) | 0.51 (0.10) | 18 |
| IRLS-SVM (Myriad) | 0.20 (0.03) | 0.06 (0.02) | 0.50 (0.09) | 22 |
| WLS-SVM | 0.22 (0.03) | 0.08 (0.02) | 0.60 (0.15) | 1 |
| SVM | 0.28 (0.03) | 0.12 (0.02) | 0.56 (0.13) | - |

Demographic:

| Method | L1 | L2 | L∞ | i_max |
|---|---|---|---|---|
| IRLS-SVM (Huber) | 0.31 (0.01) | 0.14 (0.02) | 0.83 (0.06) | 8 |
| IRLS-SVM (Hampel) | 0.33 (0.01) | 0.18 (0.04) | 0.97 (0.02) | 3 |
| IRLS-SVM (Logistic) | 0.30 (0.02) | 0.13 (0.01) | 0.80 (0.07) | 10 |
| IRLS-SVM (Myriad) | 0.30 (0.01) | 0.13 (0.01) | 0.79 (0.06) | 12 |
| WLS-SVM | 0.33 (0.02) | 0.15 (0.01) | 0.80 (0.02) | 1 |
| SVM | 0.37 (0.02) | 0.21 (0.02) | 0.90 (0.06) | - |


Page 18: Conclusion

- Comparison of four weight functions to robustify LS-SVM
- Reweighting is a useful alternative to solving QPs
- Robustness is obtained by solving (multiple) linear systems
- There is a trade-off between speed of convergence and degree of robustness
- The Myriad weight function is highly robust, but converges more slowly


Page 19: References

- Debruyne, M., Christmann, A., Hubert, M. and Suykens, J.A.K. (2008), Robustness and stability of reweighted kernel based regression, Technical Report 06-09, Department of Mathematics, K.U. Leuven, Belgium.
- Hampel, F.R. (1968), Contributions to the Theory of Robust Estimation, Ph.D. thesis, University of California, Berkeley.
- Suykens, J.A.K. and Vandewalle, J. (1999), Least squares support vector machine classifiers, Neural Processing Letters, 9(3): 293-300.
- Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B. and Vandewalle, J. (2002), Least Squares Support Vector Machines, World Scientific, Singapore.
- Suykens, J.A.K., De Brabanter, J., Lukas, L. and Vandewalle, J. (2002), Weighted least squares support vector machines: robustness and sparse approximation, Neurocomputing, 48(1-4): 85-105.