Optimization Methods for Machine Learning: Radial Basis Functions

Outline: Interpolation RBF · Regularized RBF · Generalized RBF · XOR problem

Page 1: Optimization Methods for Machine Learning: Radial Basis Functions

Laura Palagi
http://www.dis.uniroma1.it/~palagi

Dipartimento di Ingegneria informatica automatica e gestionale "A. Ruberti"
Sapienza Università di Roma, Via Ariosto 25

Page 2: Interpolation problem

Given P distinct points in R^n,

X = {x^i ∈ R^n, i = 1, . . . , P},

and a corresponding set of real numbers

Y = {y^i ∈ R, i = 1, . . . , P},

the interpolation problem consists in finding a function f : R^n → R, in a given class of real functions F, which satisfies

f(x^i) = y^i,  i = 1, . . . , P.  (1)


Page 3: Interpolation properties

For n = 1 the interpolation problem can be solved explicitly using polynomials:

f(x) = ∑_{i=0}^{P−1} c_i x^i.

For n > 1, the 2-layer MLP with g not polynomial satisfies

∑_{j=1}^{P} v_j g((w^j)^T x^i − b_j) = y^i,  i = 1, . . . , P,

for some w^j ∈ R^n and v_j, b_j ∈ R.

The MLP can approximate arbitrarily well a continuous function, provided that an arbitrarily large number of units is available.


Page 4: Interpolation properties

Being a universal approximator may not be enough from a theoretical point of view. An important property is the

existence of a best approximation.

Informally: given a function f belonging to some set of functions F and given a subset A of F, find an element of A which is closest to f. If d(f, g) is the distance between two elements f, g of F, we consider the problem

d*_A = inf_{a ∈ A} d(f, a).

If there exists a* ∈ A that attains the infimum, namely d*_A = d(f, a*), then a* is the best approximation to f from A.


Page 5: Best approximation properties

The MLP does not have the best approximation property.

Consider another approximation scheme, based on Radial Basis Functions (RBF):

φ(‖x − x^j‖),  j = 1, . . . , P.

φ : R+ → R is a suitable continuous function, called a radial basis function since its argument is the radius r = ‖x − x^j‖.


Page 6: Gaussian

φ(r) = e^{−(r/σ)²},  with r > 0


Page 7: Multiquadric

φ(r) = (r² + σ²)^{1/2}


Page 8: Inverse multiquadric

φ(r) = (r² + σ²)^{−1/2}


Page 9: Other RBFs

• φ(r) = r (linear spline)
• φ(r) = r³ (cubic spline)
• φ(r) = r² log r (thin plate spline)
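For quick reference, here is a minimal sketch (in Python with NumPy; not part of the original slides) of the radial basis functions introduced above, each mapping a radius r ≥ 0 to R:

```python
import numpy as np

def gaussian(r, sigma=1.0):
    # phi(r) = exp(-(r/sigma)^2)
    return np.exp(-(r / sigma) ** 2)

def multiquadric(r, sigma=1.0):
    # phi(r) = (r^2 + sigma^2)^(1/2)
    return np.sqrt(r**2 + sigma**2)

def inverse_multiquadric(r, sigma=1.0):
    # phi(r) = (r^2 + sigma^2)^(-1/2)
    return 1.0 / np.sqrt(r**2 + sigma**2)

def linear_spline(r):
    # phi(r) = r
    return r

def cubic_spline(r):
    # phi(r) = r^3
    return r**3

def thin_plate_spline(r):
    # phi(r) = r^2 log r, extended by continuity with value 0 at r = 0
    r = np.asarray(r, dtype=float)
    return np.where(r > 0, r**2 * np.log(np.where(r > 0, r, 1.0)), 0.0)
```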


Page 10: Interpolation by RBF

Given P distinct points in R^n,

X = {x^i ∈ R^n, i = 1, . . . , P},

consider functions of the form

f(x) = ∑_{j=1}^{P} w_j φ(‖x − x^j‖),  (2)

where the data points x^j ∈ X are the so-called centers and the coefficients w_j ∈ R are the weights.


Page 11: Interpolation by RBF

By imposing the interpolation conditions we get

∑_{j=1}^{P} w_j φ(‖x^i − x^j‖) = y^i,  i = 1, . . . , P.  (3)

This is a linear system of P equations in P unknowns. Define the vectors w = (w_1 · · · w_P)^T and y = (y^1 · · · y^P)^T, and the symmetric P × P matrix Φ with elements

Φ_{ij} = φ(‖x^i − x^j‖),  1 ≤ i, j ≤ P.

System (3) can then be written as

Φw = y.


Page 12: Interpolation by RBF (continued)

The matrix Φ is nonsingular, provided that P ≥ 2, that the interpolation points x^j, j = 1, . . . , P, are distinct, and that φ is one of

• the Gaussian (Φ positive definite)
• the multiquadric
• the inverse multiquadric (Φ positive definite)
• the linear spline

Thus, the interpolation problem Φw = y admits a unique solution. When φ is positive definite, the solution can be computed by minimizing the (strictly) convex quadratic function on R^P

F(w) = (1/2) w^T Φ w − y^T w,

whose gradient is given by ∇F(w) = Φw − y.
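A minimal sketch of this construction (Python/NumPy; the Gaussian basis is an assumption here, any of the nonsingular choices above would do): build Φ from the pairwise distances and solve Φw = y directly.

```python
import numpy as np

def rbf_interpolate(X, y, sigma=1.0):
    """Compute the interpolation weights solving Phi w = y.

    X: (P, n) array of distinct points; y: (P,) array of targets.
    """
    # Pairwise distances ||x^i - x^j||, shape (P, P)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = np.exp(-(dists / sigma) ** 2)  # Gaussian: Phi is positive definite
    return np.linalg.solve(Phi, y)       # unique solution

def rbf_eval(x, X, w, sigma=1.0):
    # f(x) = sum_j w_j phi(||x - x^j||)
    r = np.linalg.norm(X - x, axis=1)
    return np.exp(-(r / sigma) ** 2) @ w
```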


Page 13: From interpolation to approximation properties

Because of these remarkable properties, the RBF method is one of the most widely used approaches in multivariable interpolation.

This has motivated the attempt to employ RBFs also within approximation algorithms for the solution of classification and regression problems in data mining.


Page 14: Regularized RBF neural networks

Suppose that the data set {(x^p, y^p), p = 1, . . . , P} has been obtained by randomly sampling, in the presence of noise, a function belonging to some space of functions X.

The problem of recovering the function, or an estimate of it, from the data is clearly ill-posed, since it has an infinite number of solutions.

In order to choose one particular solution we need some a priori knowledge of the function to be reconstructed.

The most common form of a priori knowledge consists in assuming that the function is smooth, in the sense that two similar inputs correspond to two similar outputs.


Page 15: Regularized RBF neural networks

The solution can be obtained from a variational principle which contains both the data and the smoothness information.

Smoothness is a measure of the "oscillatory" behavior of f: within a class of differentiable functions, one function is said to be smoother than another if it oscillates less. A smoothness functional E_2(f) is defined and we consider

min_f E(f) = E_1(f) + λ E_2(f) = (1/2) ∑_{i=1}^{P} [y^i − f(x^i)]² + λ E_2(f),

where the first term enforces closeness to the data and the second smoothness, while the regularization parameter λ > 0 controls the trade-off between these two terms.


Page 16: Regularized RBF neural networks

It can be shown that, for a wide class of smoothness functionals E_2(f), the solutions of the minimization all have the same form

f(x) = ∑_{i=1}^{P} w_i φ(‖x − c^i‖),

where the centers coincide with the inputs,

c^i = x^i,  i = 1, . . . , P,

and the weights solve the regularized system

(Φ + λI)w = y,

where Φ = {Φ_{ij}}_{i,j=1,...,P} = {φ(‖x^i − x^j‖)}_{i,j=1,...,P}.
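A minimal sketch of the resulting fit (same Gaussian-kernel assumption as before): the only change with respect to pure interpolation is the λI shift on the diagonal of Φ.

```python
import numpy as np

def regularized_rbf_fit(X, y, lam=1e-2, sigma=1.0):
    # Centers coincide with the inputs: c^i = x^i
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = np.exp(-(dists / sigma) ** 2)
    # Weights solve (Phi + lam I) w = y
    return np.linalg.solve(Phi + lam * np.eye(len(y)), y)
```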


Page 17: 2-layer regularized RBF network

[Figure: network diagram. The input x feeds P hidden units φ(‖x − x^1‖), φ(‖x − x^2‖), . . . , φ(‖x − x^P‖); their outputs, weighted by w_1, w_2, . . . , w_P, are summed to produce the output y(x).]


Page 18: 2-layer regularized RBF network

• RBF networks are universal approximators: any continuous function can be approximated arbitrarily well on a compact set, provided a sufficiently large number of units is available and the parameters are chosen appropriately.
• RBF networks possess the best approximation property: the best approximation exists and, under assumptions that are often satisfied, it is unique (the RBF model is linear in the parameters w).
• The value of λ can be selected by cross-validation techniques, which may require solving the system (Φ + λI)w = y several times; a minimal sketch follows below.
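A sketch of one simple variant (a hold-out split rather than full k-fold cross-validation, and an illustrative grid of λ values; both are assumptions): re-solve the system for each candidate λ and keep the value with the smallest validation error.

```python
import numpy as np

def select_lambda(X_tr, y_tr, X_val, y_val, sigma=1.0,
                  grid=(1e-4, 1e-3, 1e-2, 1e-1, 1.0)):
    d_tr = np.linalg.norm(X_tr[:, None, :] - X_tr[None, :, :], axis=2)
    Phi_tr = np.exp(-(d_tr / sigma) ** 2)
    # Basis functions evaluated at validation points (centers = training inputs)
    d_val = np.linalg.norm(X_val[:, None, :] - X_tr[None, :, :], axis=2)
    Phi_val = np.exp(-(d_val / sigma) ** 2)
    best_lam, best_err = None, np.inf
    for lam in grid:
        w = np.linalg.solve(Phi_tr + lam * np.eye(len(y_tr)), y_tr)
        err = np.mean((Phi_val @ w - y_val) ** 2)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```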


Page 19: 2-layer generalized RBF network

When P is very large, the cost of constructing a regularized RBF network can be prohibitive: computing the weights w ∈ R^P requires the solution of a possibly ill-conditioned linear system, which costs O(P³).

Generalized RBF (GRBF) neural networks are used instead, where the number N of neural units is much smaller than P.

The output of the network is defined by

y(x) = ∑_{j=1}^{N} w_j φ(‖x − c^j‖),  (4)

where both the centers c^j ∈ R^n and the weights w_j, j = 1, . . . , N, must be selected appropriately.
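A sketch of one common construction (the k-means placement of the centers is an assumption, not prescribed by the slides): choose N ≪ P centers by clustering the inputs, then, with the centers fixed, compute the weights by linear least squares.

```python
import numpy as np
from scipy.cluster.vq import kmeans  # one common way to place the centers

def grbf_fit(X, y, N, sigma=1.0):
    # Choose N << P centers by clustering the inputs (an assumption)
    centers, _ = kmeans(np.asarray(X, dtype=float), N)
    # Design matrix: Phi[p, j] = phi(||x^p - c^j||), shape (P, N)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    Phi = np.exp(-(dists / sigma) ** 2)
    # With fixed centers the model is linear in w: solve by least squares
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, w
```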


Page 20: 2-layer generalized RBF network

[Figure: network diagram. The input x feeds N hidden units φ(‖x − c^1‖), φ(‖x − c^2‖), . . . , φ(‖x − c^N‖); their outputs, weighted by w_1, w_2, . . . , w_N, are summed to produce the output y(x).]


Page 21: 2-layer generalized RBF network

• GRBF networks are universal approximators: any continuous function can be approximated arbitrarily well on a compact set, provided a sufficiently large number of units is available and the parameters are chosen appropriately.
• GRBF networks may NOT possess the best approximation property. However, if the centers are fixed, the approximation problem becomes linear with respect to w and the existence of a best approximation is guaranteed.
• In the general case, both the centers and the weights are treated as variable parameters and the approximation is nonlinear.
• Since N ≪ P, a GRBF network inherently performs a structural stabilization, which may prevent overtraining.


Page 22: An example: Exclusive OR

The logical function XOR:

p | x_1 | x_2 | y^p
1 | −1  | −1  | −1
2 | −1  |  1  |  1
3 |  1  | −1  |  1
4 |  1  |  1  | −1

[Figure: the four XOR points in the (x_1, x_2) plane; the two classes sit on opposite diagonals.]

A perceptron (linear separator) does not work.


Page 23: Two-layer MLP

[Figure: two-layer MLP for XOR. Inputs x_1, x_2 feed two hidden units with weights w_11, w_12, w_21, w_22, biases b_1, b_2, and activation sign(·); the hidden outputs are combined with weights v_1, v_2 and output bias b_3, followed by a final sign(·), to give y(x).]


Page 24: Two-layer MLP

Choose w_11 = w_22 = 1, w_12 = w_21 = −1, b_1 = b_2 = −1, v_1 = v_2 = 1, b_3 = 0.1 (output bias). We get

a_1 = x_1 − x_2 − 1,  z_1 = sign(a_1)
a_2 = −x_1 + x_2 − 1,  z_2 = sign(a_2)
y = sign(z_1 + z_2 + 0.1)

p | a_1 | a_2 | z_1 | z_2 | z_1 + z_2 + 0.1 | y
1 | −1  | −1  | −1  | −1  | −1.9 | −1
2 | −3  |  1  | −1  |  1  |  0.1 |  1
3 |  1  | −3  |  1  | −1  |  0.1 |  1
4 | −1  | −1  | −1  | −1  | −1.9 | −1
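The table can be checked line by line; a minimal sketch in Python (not in the slides), using the weights chosen above:

```python
import numpy as np

def mlp_xor(x1, x2):
    a1 = x1 - x2 - 1               # hidden unit 1
    a2 = -x1 + x2 - 1              # hidden unit 2
    z1, z2 = np.sign(a1), np.sign(a2)
    return np.sign(z1 + z2 + 0.1)  # output unit, bias b3 = 0.1

# Reproduces the table above on all four XOR patterns
for x1, x2, y in [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]:
    assert mlp_xor(x1, x2) == y
```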


Page 25: Two-layer MLP

This MLP network with two hidden nodes realizes a nonlinear separation: each hidden node describes one of the two lines, and the output node combines the outputs of the two hidden nodes.

[Figure: the four XOR points in the (x_1, x_2) plane together with the two parallel separating lines x_1 − x_2 = 1 and x_2 − x_1 = 1 realized by the hidden units.]


Page 26: RBF network

Consider an RBF network with two units (N = 2), with centers c^1, c^2, and assume the activation function is a Gaussian, g_j = e^{−(‖x−c^j‖/σ)²}.

[Figure: RBF network for XOR. Inputs x_1, x_2 feed the two Gaussian units z_1 = e^{−‖x−c^1‖²/σ²} and z_2 = e^{−‖x−c^2‖²/σ²}; their outputs are combined with weights w_1, w_2 and bias b, followed by sign(·), to give y(x).]


Page 27: RBF network

Choose σ = √2, c^1 = (1, 1)^T and c^2 = (−1, −1)^T. This transforms the problem into a linearly separable one:

p | z_1 = e^{−‖x−c^1‖²/σ²} | z_2 = e^{−‖x−c^2‖²/σ²} | y^p
1 | e^{−4} | 1 | −1
2 | e^{−2} | e^{−2} | 1
3 | e^{−2} | e^{−2} | 1
4 | 1 | e^{−4} | −1

[Figure: the transformed points in the (z_1, z_2) plane; points 2 and 3 coincide, and a single line now separates the two classes.]


Page 28: RBF network

The output takes the form

y(x) = w_1 e^{−‖x−c^1‖²/σ²} + w_2 e^{−‖x−c^2‖²/σ²} + b.

Minimizing the training error,

min_{w,b} ∑_{p=1}^{4} (y(x^p) − y^p)²,

we get the optimal solution (w*, b*), which gives E = 0:

(w_1, w_2, b) = (−2.675065656, −2.675065656, 1.72406123),

and the RBF network has been trained.
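The whole computation fits in a few lines; a sketch (Python/NumPy, not part of the slides) that builds the 4 × 3 least-squares problem in (w_1, w_2, b) and recovers the values above:

```python
import numpy as np

sigma2 = 2.0                                   # sigma = sqrt(2)
c1, c2 = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, -1.0])

z1 = np.exp(-np.sum((X - c1) ** 2, axis=1) / sigma2)
z2 = np.exp(-np.sum((X - c2) ** 2, axis=1) / sigma2)
A = np.column_stack([z1, z2, np.ones(4)])      # columns: w1, w2, b

sol, *_ = np.linalg.lstsq(A, y, rcond=None)
print(sol)           # approx. [-2.675065656, -2.675065656, 1.72406123]
print(A @ sol - y)   # residuals are numerically zero: E = 0
```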
