least squares support vector regression with...

94
Goal & Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions Least Squares Support Vector Regression with Applications to Large-Scale Data: a Statistical Approach Kris De Brabanter Public Defense April, 27 2011 Promotor: Prof. dr. ir. B. De Moor Co-Promotor: Prof. dr. ir. J. Suykens 1/39

Upload: lecong

Post on 13-Jul-2018

223 views

Category:

Documents


0 download

TRANSCRIPT

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least Squares Support Vector Regression with

Applications to Large-Scale Data a Statistical

Approach

Kris De Brabanter

Public Defense

April 27 2011

Promotor Prof dr ir B De MoorCo-Promotor Prof dr ir J Suykens

139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Goal of the Thesis

Goal of the Thesis

Study the properties of Least Squares Support Vector Machines forregression with an emphasis on statistical aspects and develop aframework for large scale data

439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Overview

Least Squares

Support Vector

Machines

Introduction

(Chapter 1)

Model

Building

(Chapter 2)

Model

Selection

(Chapter 3)Large Scale

Data Sets

(Chapter 4)

Robustness

(Chapter 5)

Correlated

Errors

(Chapter 6)

Confidence

Intervals

(Chapter 7)

Applications

amp

Case Studies

(Chapter 8)

Conclusions

amp

Further

Research

(Chapter 9)

539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Goal of the Thesis

Goal of the Thesis

Study the properties of Least Squares Support Vector Machines forregression with an emphasis on statistical aspects and develop aframework for large scale data

439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Overview

Least Squares

Support Vector

Machines

Introduction

(Chapter 1)

Model

Building

(Chapter 2)

Model

Selection

(Chapter 3)Large Scale

Data Sets

(Chapter 4)

Robustness

(Chapter 5)

Correlated

Errors

(Chapter 6)

Confidence

Intervals

(Chapter 7)

Applications

amp

Case Studies

(Chapter 8)

Conclusions

amp

Further

Research

(Chapter 9)

539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Goal of the Thesis

Goal of the Thesis

Study the properties of Least Squares Support Vector Machines forregression with an emphasis on statistical aspects and develop aframework for large scale data

439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Overview

Least Squares

Support Vector

Machines

Introduction

(Chapter 1)

Model

Building

(Chapter 2)

Model

Selection

(Chapter 3)Large Scale

Data Sets

(Chapter 4)

Robustness

(Chapter 5)

Correlated

Errors

(Chapter 6)

Confidence

Intervals

(Chapter 7)

Applications

amp

Case Studies

(Chapter 8)

Conclusions

amp

Further

Research

(Chapter 9)

539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Goal of the Thesis

Goal of the Thesis

Study the properties of Least Squares Support Vector Machines forregression with an emphasis on statistical aspects and develop aframework for large scale data

439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Overview

Least Squares

Support Vector

Machines

Introduction

(Chapter 1)

Model

Building

(Chapter 2)

Model

Selection

(Chapter 3)Large Scale

Data Sets

(Chapter 4)

Robustness

(Chapter 5)

Correlated

Errors

(Chapter 6)

Confidence

Intervals

(Chapter 7)

Applications

amp

Case Studies

(Chapter 8)

Conclusions

amp

Further

Research

(Chapter 9)

539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Overview

Least Squares

Support Vector

Machines

Introduction

(Chapter 1)

Model

Building

(Chapter 2)

Model

Selection

(Chapter 3)Large Scale

Data Sets

(Chapter 4)

Robustness

(Chapter 5)

Correlated

Errors

(Chapter 6)

Confidence

Intervals

(Chapter 7)

Applications

amp

Case Studies

(Chapter 8)

Conclusions

amp

Further

Research

(Chapter 9)

539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

A simple example

minus5 0 5minus30

minus20

minus10

0

10

20

30

Y = aX + b

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

Y =

PARAMETRIC FORM IS NOT ALWAYS EASY TO FIND

739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Construction of a nonparametric estimate NW smoother

0 02 04 06 08 1

minus1

minus05

0

05

1

15

x

mn(x)

mn(x) =nsum

i=1

K(xminusXih )

sumnj=1K(

xminusXjh )

Yi

839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Other nonparametric regression estimates

Local constant regression (Nadaraya 1964 Watson 1964)

Regression trees (Breiman et al 1984)

Wavelets (Daubechies 1992)

Nearest Neighbors (Devroye et al 1994)

Local linear regression (Fan amp Gijbels 1996)

Support vector machines (Vapnik 1995)

Splines (Wahba 1990 Eubank 1999)

Partitioning estimates (Gyorfi et al 2002)

Least squares support vector machines (Suykens et al 2002)

939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Least squares support vector machines

Primal formulation (LS-SVM formulation for regression)

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

⋆⋆

⋆⋆

⋆⋆

input space feature space

⋆⋆

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ( )

ϕ(middot)

1039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVM solution + model selection

Dn = (Xk Yk) Xk isin Rd Yk isin R k = 1 n

iidsim (XY )

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Dual formulation(

0 1Tn

1n Ω+ Inγ

)

(

b

α

)

=

(

0

Y

)

Ωkl = ϕ(Xk)Tϕ(Xl) = K(XkXl) = (2π)minusd2 exp(minusXkminusXl2

2h2 )

K has to be positive definite ieint

exp(minusjωx)K(x) dx ge 0

Model in dual space mn(x) =sumn

k=1 αkK(xXk) + b

γ and h tuning parameters rArr cross-validation1139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Effect of the tuning parameters

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too small

0 10 20 30 40 50 60minus150

minus100

minus50

0

50

100

h too large

Model selection criteria are ABSOLUTELY needed

1239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Estimation in Primal Space

LS-SVM formulation for regression

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wTϕ(Xk) + b+ ek = Yk k = 1 n

Can we solve the LS-SVM in primal space instead of dual

Approximation of feature map ϕ needed

Is it possible to compute such a mapping

ϕ can be infinite dimensional

Solutionuse a fixed size m of support vectors to approximate ϕ

Solve the above as primal ridge regression

1439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with Large Scale Data

1 Calculation andor storage kernel matrix Ω

N = 1000 rArr Ω rArr 8 MBN = 10000 rArr Ω rArr 763 MBN = 20000 rArr Ω rArr 3051 MB

2 If possible to compute how long would it take

rArr Solution Matrix Approximations (Nystrom 1930)

ϕi(x)m≪nasymp

radicm

λ(m)i

summk=1K(Xk x)u

(m)ki

1539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Fixed Size LS-SVM formulation

Given approximation to the feature map

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st wT ϕ(Xk) + b+ ek = Yk k = 1 n

Solution(

wb

)

=(

ΦTe Φe +

Im+1

γ

)minus1ΦTe Y

with

Φe =

ϕ1(X1) middot middot middot ϕm(X1) 1

ϕ1(Xn) middot middot middot ϕm(Xn) 1

1639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Selection of support vectors Renyi Entropy

Maximize quadratic Renyi entropy HmR2 = minus log

int

f(x)2 dx

Theorem (Maximizing Entropy)

The Renyi entropy on a closed interval [a b] with a b isin R and no

additional moment constraints is maximized for the uniform

density 1(bminus a)

(Selecting SV) (Wrong Bandwidth)

1739

svavi
Media File (videoavi)
wbandavi
Media File (videoavi)

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Identification of a pilot scale distillation column

Joint work with Bart Huyck (CIT)

Task identify bottom temperature column with LS-SVM

0 1000 2000 3000 4000 5000 6000 7000 800077

775

78

785

79

795

Tem

p

Time (s) 1839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions1939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in parametric regression

model Yk = aXk + b+ ek k = 1 n

(a b) estimated from data

LS principle (a b) = argminabisinR2

1n

sumnk=1 [Yk minus (aXk + b)]2

0 1 2 3 4 5minus10

minus5

0

5

10

15

20

25

30

35

40

X

Y

LS principle is NOT robust

Solution LAD (a b) = argminabisinR2

1n

sumnk=1 |Yk minus (aXk + b)|

2039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

f(X)

X

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Problems with outliers in nonparametric regression

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X)

LS principle rArr sensitive to outliers (and leverage points)

Linearpolynomial kernel rArr non-robust methods

Using appropriate CV rArr robust CV

2139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 e

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Iterative Reweighting amp Weight Functions

Primal formulation

minwbe

JP (w e) =12w

Tw + γ2

sumnk=1 vke

2k

st Yk = wTϕ(Xk) + b+ ek k = 1 n

Huber Hampel Logistic Myriad

V (r)

1 if |r| lt ββ|r| if |r| ge β

1 if |r| lt b1 b2minus|r|b2minusb1

if b1 le |r| le b2

0 if |r| gt b2

tanh(r)r

δ2

δ2+r2

ψ(r)

L(r)

r2 if |r| lt β

β|r| minus 1

2β2 if |r| ge β

r2 if |r| lt b1b2r2minus|r3|

b2minusb1 if b1 le |r| le b2

0 if |r| gt b2

r tanh(r) log(δ2+r2)

2239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Properties of the Myriad

if δ rarr infin =rArr Myriad converges to sample mean

if δ rarr 0 =rArr Myriad converges to the sample mode

X1 X2X3X4 X5

δ = 075

δ = 2

δ = 20

2339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

To obtain a fully robust solution

robust smootherbounded kernel

robust CV rArr Lprime bounded

RCV (θ) =1

n

nsum

i=1

L(

Yi minus m(minusi)n (Xi θ)

)

0 02 04 06 08 1minus15

minus1

minus05

0

05

1

15

2

25

outliers

X

f(X

)

0 02 04 06 08 1

minus2

minus1

0

1

2

X

f(X

)

2439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions2539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

We have a problem

0 02 04 06 08 1minus2

minus1

0

1

2

3

4

5

6

x

m(x)m

n(x)

VIOLATION OF IID ASSUMPTION2639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What went wrong

Yi = m(xi) + ei model selection =rArr Cov[ei ej ] = 0

In previous example Cov[ei ej ] 6= 0

correlation (covariance) Strength of relationship between ei and ej

Example 1

Suppose there are two technology stocks If they are affected bythe same industry trends their prices will tend to rise or falltogether

Example 2

Housing prices Sensitive to offer and demand

Example 3

In our toy example ei+1 was affected by the value of ei and soon

2739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Removing correlation effects main theorem

Breakdown bandwidth selection procedures smoother stays consistent

Theorem

Assume x equiv in x isin [0 1] E[e] = 0 Cov[ei ei+k] = E[eiei+k] = γk and γk sim kminusa

for some a gt 2 Assume that Yi = m(xi) + ei and

(C1) K is Lipschitz continuous at x = 0

(C2)intK(u) du = 1 lim|u|rarrinfin |uK(u)| = 0

int|K(u)| du lt infin supu |K(u)| lt infin

(C3)int|k(u)| du lt infin and K is symmetric

Further assume that boundary effects are ignored and that h rarr 0 as n rarr infin such

that nh2 rarr infin then for the NW smoother it follows that

E[CV(h)]=1

nE

nsum

i=1

[

m(xi)minusm(minusi)n (xi)

]2

+σ2minus

4K(0)

nhminusK(0)

infinsum

k=1

γk+o(nminus1hminus1)

No prior knowledge about correlation structure needed

2839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Suitable kernels amp drawback

minus08 minus06 minus04 minus02 0 02 04 06 080

05

1

15

2

25

minus3 minus2 minus1 0 1 2 30

005

01

015

02

025

03

035

04

045

minus5 0 50

002

004

006

008

01

012

014

016

018

02

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

0 02 04 06 08 1minus1

0

1

2

3

4

5

6

rArrrArr Decreased Mean Squared Error rArrrArr2939

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Some real life examples

Beveridge index of wheat prices

1500 1550 1600 1650 1700 1750 1800 1850 19002

25

3

35

4

45

5

55

6

Yearlog(B

everidge

wheatprices)

US monthly birth rate

1940 1941 1942 1943 1944 1945 1946 1947 19481800

2000

2200

2400

2600

2800

3000

Year

Birth

rate

3039

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3139

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimates

Can we say something about the true function m given mn

We want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level α

Pointwise vs simultaneousuniform confidence intervals

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

What are confidence intervals

How accurate are our nonparametric estimatesCan we say something about the true function m given mnWe want something of the formLn(x) le m(x) le Un(x) forallx for some confidence level αPointwise vs simultaneousuniform confidence intervals

minus1 minus05 0 05 1minus05

0

05

1

15

2

25

X

Ym

n(X

)

3239

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

In practice

In general

These intervals give the user the ability to see how well a certainmodel explains the true underlying process while taking statisticalproperties of the estimator into account

Fault detection

In fault detection CI are used for reducing the number of falsealarms

3339

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Outline

1 Goal amp Overview

2 IntroductionParametric vs nonparametric regressionNonparametric regression estimates an overview

3 Fixed-Size Least Squares Support Vector MachinesFixed Size LS-SVM formulationSelection of Support VectorsPractical identification problem

4 Robust Nonparametric MethodsProblems with outliersRobust nonparametric regression

5 Correlated ErrorsProblems with correlation in nonparametric regressionRemoving correlation effects

6 Confidence Intervals

7 Conclusions3439

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Conclusions

Goal of the Thesis

Study the properties of LS-SVM for regression with an emphasison statistical aspects and develop a framework for large scale data

Main Achievements

Framework for large data sets

Method for minimizing model selection criteria score functions

Robustification of kernel based method

Weight function with attractive properties

Framework for correlated errors based on bimodal kernels

Asymptotic normality of linear smoothers

Bias amp variance estimators for LS-SVM

Pointwise amp simultaneous CI + comparison bootstrap

3539

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

LS-SVMLab software

Free available (for research purposes) Matlab toolboxhttpwwwesatkuleuvenacbesistalssvmlab

Userrsquos guide with applications (p 113 De Brabanter et al 2010)

3639

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

Falck T Dreesen P De Brabanter K Pelckmans K De Moor B SuykensJAK Least-Squares Support Vector Machines for the Identification ofWiener-Hammerstein Systems Submitted 2011

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression in the Presence of Correlated Errors Submitted 2011

De Brabanter K Karsmakers P De Brabanter J Suykens JAK De Moor BConfidence Bands for Least Squares Support Vector Machine Classifiers ARegression Approach Submitted 2010

De Brabanter K De Brabanter J Suykens JAK De Moor B ApproximateConfidence and Prediction Intervals for Least Squares Support VectorRegression IEEE Transactions on Neural Networks 22(1)110ndash120 2011

Sahhaf S De Brabanter K Degraeve R Suykens JAK De Moor BGroeseneken G Modelling of Charge TrappingDe-trapping Induced VoltageInstability in High-k Gate Dielectrics Submitted 2010

Karsmakers P Pelckmans K De Brabanter K Van Hamme H SuykensJAK Sparse Conjugate Directions Pursuit with Application to Fixed-sizeKernel Models Submitted 2010

Sahhaf S Degraeve R Cho M De Brabanter K Roussel PhJ Zahid MBGroeseneken G Detailed Analysis of Charge Pumping and Id minus Vg Hysteresis forProfiling Traps in SiO2HfSiO(N) Microelectronic Engineering87(12)2614ndash2619 2010

3739

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B OptimizedFixed-Size Kernel Models for Large Data Sets Computational Statistics amp DataAnalysis 54(6)1484ndash1504 2010

Lopez J De Brabanter K Dorronsoro JR Suykens JAK Sparse LS-SVMswith L0-Norm Minimization Accepted for publication in Proc of the 19thEuropean Symposium on Artificial Neural Networks Computational Intelligenceand Machine Learning (ESANN) Brugge (Belgium) 2011

Huyck B De Brabanter K Logist F De Brabanter J Van Impe J De MoorB Identification of a Pilot Scale Distillation Column A Kernel Based ApproachAccepted for publication in 18th World Congress of the International Federationof Automatic Control (IFAC) 2011

De Brabanter K Karsmakers P De Brabanter J Pelckmans K SuykensJAK De Moor B On Robustness in Kernel Based Regression NIPS 2010Robust Statistical Learning (ROBUSTML) (NIPS 2010) Whistler CanadaDecember 2010

De Brabanter K Sahhaf S Karsmakers P De Brabanter J Suykens JAKDe Moor B Nonparametric Comparison of Densities Based on StatisticalBootstrap in Proc of the Fourth European Conference on the Use of ModernInformation and Communication Technologies (ECUMICT) Gent BelgiumMarch 2010 pp 179ndash190

3839

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions

Goal amp Overview Introduction FS-LSSVM Robust Nonparametric Methods Correlated Errors Confidence Intervals Conclusions

Publications

De Brabanter K De Brabanter J Suykens JAK De Moor B KernelRegression with Correlated Errors in Proc of the the 11th InternationalSymposium on Computer Applications in Biotechnology (CAB) LeuvenBelgium July 2010 pp 13ndash18

De Brabanter K Pelckmans K De Brabanter J Debruyne M SuykensJAK Hubert M De Moor B Robustness of Kernel Based Regression aComparison of Iterative Weighting Schemes in Proc of the 19th InternationalConference on Artificial Neural Networks (ICANN) Limassol Cyprus September2009 pp 100ndash110

De Brabanter K Dreesen P Karsmakers P Pelckmans K De Brabanter JSuykens JAK De Moor B Fixed-Size LS-SVM Applied to theWiener-Hammerstein Benchmark in Proc of the 15th IFAC Symposium onSystem Identification (SYSID 2009) Saint-Malo France July 2009 pp826ndash831

De Brabanter K Karsmakers P Ojeda F Alzate C De Brabanter JPelckmans K De Moor B Vandewalle J Suykens JAK LS-SVMlab ToolboxUserrsquos Guide version 17rdquo Internal Report 10-146 ESAT-SISTA KULeuven(Leuven Belgium) 2010

3939

  • Goal amp Overview
  • Introduction
    • Parametric vs nonparametric regression
    • Nonparametric regression estimates an overview
      • Fixed-Size Least Squares Support Vector Machines
        • Fixed Size LS-SVM formulation
        • Selection of Support Vectors
        • Practical identification problem
          • Robust Nonparametric Methods
            • Problems with outliers
            • Robust nonparametric regression
              • Correlated Errors
                • Problems with correlation in nonparametric regression
                • Removing correlation effects
                  • Confidence Intervals
                  • Conclusions