Radial Basis Function (RBF) Networks - Michael I Mandel · mr-pc.org/t/cse5526/pdf/04-rbf.pdf


Post on 07-Sep-2019


1 CSE 5526: RBF Networks

CSE 5526: Introduction to Neural Networks

Radial Basis Function (RBF) Networks


Function approximation

• We have been using MLPs as pattern classifiers
• But in general, they are function approximators
  • Depending on the output-layer nonlinearity
  • And the error function being minimized
• As function approximators, MLPs are nonlinear, semiparametric, and universal


• Radial basis function (RBF) networks are similar function approximators
  • Also nonlinear, semiparametric, universal
  • Can also be visualized as a layered network of nodes
  • Easier to train than MLPs
    • Do not require backpropagation
    • But do not necessarily find an optimal solution


RBF net illustration


Function approximation background

• Before getting into RBF networks, let’s discuss approximating scalar functions of a single variable

• Weierstrass approximation theorem: any continuous real function on a closed interval can be approximated arbitrarily well (uniformly) by polynomials

• Taylor expansion approximates a sufficiently differentiable function by a polynomial in the neighborhood of a point

• Fourier series gives a way of approximating a well-behaved periodic function by a sum of sines and cosines


Linear projection

• Approximate function $f(\mathbf{x})$ by a linear combination of simpler functions

$F(\mathbf{x}) = \sum_j w_j \varphi_j(\mathbf{x})$

• If the $w_j$'s can be chosen so that the approximation error is arbitrarily small for any function $f(\mathbf{x})$ over the domain of interest, then $\{\varphi_j\}$ has the property of universal approximation, or $\{\varphi_j\}$ is complete


Example incomplete basis: sinc

$\mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}$

$\varphi_j(x) = \mathrm{sinc}(x - \mu_j)$

• Can approximate any sufficiently smooth function, but not arbitrary functions, which is why the basis is incomplete


Example orthogonal complete basis: sinusoids

$\varphi_{2n}(x) = \sin(2\pi n x)$
$\varphi_{2n+1}(x) = \cos(2\pi n x)$

Complete on the interval [0,1]
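As a rough illustration of linear projection onto this basis, the sketch below fits the weights by least squares and records how the worst-case error shrinks as more sinusoids are added. The target function (a triangle wave), the sample grid, and the helper name `sinusoid_design_matrix` are assumptions made for the example, not part of the slides.

```python
import numpy as np

def sinusoid_design_matrix(x, n_max):
    """Columns: a constant, then sin(2*pi*n*x) and cos(2*pi*n*x) for n = 1..n_max."""
    cols = [np.ones_like(x)]
    for n in range(1, n_max + 1):
        cols.append(np.sin(2 * np.pi * n * x))
        cols.append(np.cos(2 * np.pi * n * x))
    return np.stack(cols, axis=1)

# Hypothetical target: a triangle wave on [0, 1) (continuous and periodic).
x = np.linspace(0.0, 1.0, 200, endpoint=False)
f = 1.0 - 2.0 * np.abs(x - 0.5)

errors = {}
for n_max in (1, 3, 10):
    Phi = sinusoid_design_matrix(x, n_max)
    w, *_ = np.linalg.lstsq(Phi, f, rcond=None)   # least-squares weights w_j
    errors[n_max] = np.max(np.abs(Phi @ w - f))   # worst-case approximation error
```

The dictionary `errors` maps the number of harmonics to the maximum absolute error on the grid, which should decrease as harmonics are added.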


Example orthogonal complete basis: Chebyshev polynomials

$T_0(x) = 1$
$T_1(x) = x$
$T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$

$T_2(x) = 2x^2 - 1$
$T_3(x) = 4x^3 - 3x$

etc.

Orthogonal (with respect to the weight $1/\sqrt{1 - x^2}$) and complete on the interval [-1, 1]
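The recurrence can be checked numerically. The following is a minimal sketch; the function name `chebyshev_T` is made up for this example.

```python
import numpy as np

def chebyshev_T(n, x):
    """Evaluate T_n(x) with the recurrence T_{n+1} = 2 x T_n - T_{n-1}."""
    x = np.asarray(x, dtype=float)
    t_prev, t_curr = np.ones_like(x), x.copy()    # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

x = np.linspace(-1.0, 1.0, 11)
T2 = chebyshev_T(2, x)   # should agree with 2 x^2 - 1
T3 = chebyshev_T(3, x)   # should agree with 4 x^3 - 3 x
```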

"Chebyshev Polynomials of the 1st Kind (n=0-5, x=(-1,1))" by Inductiveload - Own work. Licensed under Public domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Chebyshev_Polynomials_of_the_1st_Kind_(n%3D0-5,_x%3D(-1,1)).svg


Radial basis functions

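• A radial basis function (RBF) is a basis function of the form $\varphi_j(\mathbf{x}) = \varphi(\|\mathbf{x} - \boldsymbol{\mu}_j\|)$
  • Where $\varphi(r)$ is positive with a monotonic derivative for $r > 0$

• Consider a Gaussian RBF

$\varphi_j(\mathbf{x}) = \exp\left(-\frac{1}{2\sigma^2}\|\mathbf{x} - \mathbf{x}_j\|^2\right) = G(\|\mathbf{x} - \mathbf{x}_j\|)$

• A local basis function, falling off from the center

[Figure: $G(\|x - x_j\|)$ plotted against $x$, with peak value 1 at $x = x_j$]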
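A Gaussian RBF of this form takes only a few lines to evaluate. This is a sketch under the assumption that inputs and centers are rows of 2-D NumPy arrays; the name `gaussian_rbf` is illustrative.

```python
import numpy as np

def gaussian_rbf(X, centers, sigma):
    """Return G[i, j] = exp(-||X[i] - centers[j]||^2 / (2 sigma^2))."""
    X = np.atleast_2d(X)
    centers = np.atleast_2d(centers)
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma**2))

center = np.array([[0.0, 0.0]])
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])   # distances 0, 1, 3 from the center
G = gaussian_rbf(points, center, sigma=1.0)
```

The activation equals 1 at the center and decays with distance, which is what makes the basis function local.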


Radial basis functions (cont.)

• Thus approximation by Gaussian RBFs becomes

$F(\mathbf{x}) = \sum_j w_j\, G(\|\mathbf{x} - \mathbf{x}_j\|)$

• Gaussians are universal approximators
  • I.e., they form a complete basis


RBF net illustration


Remarks (cont.)

• Other RBFs exist, but we won’t be using them
  • Multiquadrics: $\varphi(x) = \sqrt{x^2 + c^2}$
  • Inverse multiquadrics: $\varphi(x) = \frac{1}{\sqrt{x^2 + c^2}}$

• Micchelli’s theorem (1986)
  • Let $\{\mathbf{x}_i\}$ be a set of $N$ distinct points, $\varphi(\cdot)$ be an RBF
  • Then the matrix $\phi_{ij} = \varphi(\|\mathbf{x}_i - \mathbf{x}_j\|)$ is non-singular
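One practical consequence of a non-singular interpolation matrix is that placing one Gaussian on every data point allows the targets to be reproduced exactly. The sketch below checks this numerically; the random points, the target values, and the width `sigma = 0.5` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(8, 2))      # N = 8 distinct points in 2-D
d = np.sin(X[:, 0]) + X[:, 1] ** 2           # hypothetical target values

sigma = 0.5
sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
Phi = np.exp(-sq_dist / (2.0 * sigma**2))    # phi_ij = G(||x_i - x_j||)

rank = np.linalg.matrix_rank(Phi)            # full rank means non-singular
w = np.linalg.solve(Phi, d)                  # unique weights for exact interpolation
residual = np.max(np.abs(Phi @ w - d))
```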


Four questions to answer for RBF nets

• If we want to use Gaussian RBFs to approximate a function specified by training data
  1. How do we choose the Gaussian centers?
  2. How do we determine the Gaussian widths?
  3. How do we determine the weights $w_j$?
  4. How do we select the number of bases?


1. How do we choose the Gaussian centers?

• Easy way: select K data points at random
• Potentially better way: unsupervised clustering, e.g. using the K-means algorithm


K-means algorithm

• Goal: Divide N input patterns into K clusters with minimum total variance

• In other words, partition patterns into K clusters $C_j$ to minimize the following cost function

$J = \sum_{j=1}^{K} \sum_{i \in C_j} \|\mathbf{x}_i - \mathbf{u}_j\|^2$

where $\mathbf{u}_j = \frac{1}{|C_j|} \sum_{i \in C_j} \mathbf{x}_i$ is the mean (center) of cluster j


1. Choose a set of K cluster centers randomly from the input patterns

2. Assign the N input patterns to the K clusters using the squared Euclidean distance rule: $\mathbf{x}$ is assigned to $C_j$ if $\|\mathbf{x} - \mathbf{u}_j\|^2 \le \|\mathbf{x} - \mathbf{u}_i\|^2$ for all $i \ne j$

3. Update cluster centers

$\mathbf{u}_j = \frac{1}{|C_j|} \sum_{i \in C_j} \mathbf{x}_i$

4. If any cluster center changes, go to step 2; else stop
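The four steps map directly onto code. The following is a minimal sketch rather than a tuned implementation: the function name `kmeans`, the `max_iter` safety cap, the rule of keeping an old center when a cluster goes empty, and the two-blob test data are all assumptions added for the example.

```python
import numpy as np

def kmeans(X, K, seed=0, max_iter=100):
    rng = np.random.default_rng(seed)
    # Step 1: pick K distinct input patterns as the initial centers.
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(max_iter):
        # Step 2: assign each pattern to the nearest center (squared Euclidean distance).
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Step 3: move each center to the mean of its cluster.
        new_centers = centers.copy()
        for j in range(K):
            members = X[labels == j]
            if len(members) > 0:          # assumption: keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        # Step 4: stop once no center changes.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and cost J for the returned centers.
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dist.argmin(axis=1)
    cost = sq_dist[np.arange(len(X)), labels].sum()
    return centers, labels, cost

# Hypothetical data: two well-separated blobs around (0, 0) and (3, 3).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(3.0, 0.3, size=(50, 2))])
centers, labels, cost = kmeans(X, K=2)
```

Because the result depends on the random initial centers, a common precaution is to run the procedure from several seeds and keep the run with the lowest cost.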


K-means illustration

From Bishop (2006)


K-means cost function

From Bishop (2006)


K-means algorithm remarks

• The K-means algorithm always converges, but only to a local minimum


2. How to determine the Gaussian widths?

• Once cluster centers are determined, the variance within each cluster can be set to

$\sigma_j^2 = \frac{1}{|C_j|} \sum_{i \in C_j} \|\mathbf{u}_j - \mathbf{x}_i\|^2$

• Remark: to simplify the RBF net design, all clusters can assume the same Gaussian width:

$\sigma = \frac{d_{\max}}{\sqrt{2K}}$

where $d_{\max}$ is the maximum distance between any two of the K cluster centers
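Both width rules are easy to compute once centers and cluster assignments are available. The sketch below reads the shared-width rule as sigma = d_max / sqrt(2K); the function names and the tiny hand-made data set are assumptions for illustration.

```python
import numpy as np

def cluster_variances(X, centers, labels):
    """Per-cluster sigma_j^2: mean squared distance from the members to their center."""
    var = np.zeros(len(centers))
    for j in range(len(centers)):
        members = X[labels == j]
        if len(members) > 0:
            var[j] = ((members - centers[j]) ** 2).sum(axis=1).mean()
    return var

def shared_width(centers):
    """Single sigma = d_max / sqrt(2 K), with d_max the largest center-to-center distance."""
    K = len(centers)
    diffs = centers[:, None, :] - centers[None, :, :]
    d_max = np.sqrt((diffs ** 2).sum(axis=2)).max()
    return d_max / np.sqrt(2.0 * K)

# Hypothetical clustering result: two clusters of two points each.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [10.0, 4.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([[1.0, 0.0], [10.0, 2.0]])

var = cluster_variances(X, centers, labels)
sigma = shared_width(centers)
```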


3. How do we determine the weights $w_j$?

• With the hidden layer decided, weight training can be treated as a linear regression problem

$\boldsymbol{\Phi}\mathbf{w} = \mathbf{d}$

• Can solve using the LMS algorithm
• The textbook discusses recursive least squares (RLS) solutions
• Can also solve in one shot using the pseudo-inverse

$\mathbf{w} = \boldsymbol{\Phi}^{+}\mathbf{d} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{d}$

• Note that a bias term needs to be included in $\boldsymbol{\Phi}$
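The one-shot pseudo-inverse solution can be sketched as follows. The regression problem (noisy samples of a sine), the evenly spaced centers, and `sigma = 1.0` are assumptions for the example; the bias is handled by appending a column of ones to the matrix of Gaussian activations.

```python
import numpy as np

def design_matrix(X, centers, sigma):
    """Gaussian activation for every (pattern, center) pair, plus a bias column of ones."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-sq_dist / (2.0 * sigma**2))
    return np.hstack([G, np.ones((len(X), 1))])

# Hypothetical 1-D regression problem: noisy samples of sin(x).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 60)[:, None]
d = np.sin(X[:, 0]) + 0.05 * rng.normal(size=60)

centers = np.linspace(0.0, 2.0 * np.pi, 8)[:, None]   # fixed centers, for illustration only
Phi = design_matrix(X, centers, sigma=1.0)

w = np.linalg.pinv(Phi) @ d                            # w = Phi^+ d
rmse = np.sqrt(np.mean((Phi @ w - d) ** 2))
```

`np.linalg.pinv` computes the pseudo-inverse through an SVD, which coincides with $(\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T$ when the columns of the matrix are linearly independent and tends to be better behaved numerically when they are nearly dependent.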


4. How do we select the number of bases?

• The same problem as that of selecting the size of an MLP for classification
• The short answer: (cross-)validation
• The long answer: by balancing bias and variance
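The "short answer" can be sketched with a single hold-out split, which is a simpler stand-in for full cross-validation. Everything specific here is an assumption: the sine data, the 80/40 split, the candidate values of K, random training points as centers, and a shared width of d_max / sqrt(2K).

```python
import numpy as np

def features(X, centers, sigma):
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-sq_dist / (2.0 * sigma**2))
    return np.hstack([G, np.ones((len(X), 1))])       # bias column

def fit_rbf(X, d, K, rng):
    # "Easy way" centers: K training points chosen at random.
    centers = X[rng.choice(len(X), size=K, replace=False)]
    diffs = centers[:, None, :] - centers[None, :, :]
    d_max = np.sqrt((diffs ** 2).sum(axis=2)).max()
    sigma = d_max / np.sqrt(2.0 * K)                  # shared width, requires K >= 2
    w = np.linalg.pinv(features(X, centers, sigma)) @ d
    return centers, sigma, w

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(120, 1))
d = np.sin(X[:, 0]) + 0.1 * rng.normal(size=120)
X_train, d_train = X[:80], d[:80]
X_val, d_val = X[80:], d[80:]

val_rmse = {}
for K in (2, 4, 8, 16):
    centers, sigma, w = fit_rbf(X_train, d_train, K, rng)
    pred = features(X_val, centers, sigma) @ w
    val_rmse[K] = np.sqrt(np.mean((pred - d_val) ** 2))

best_K = min(val_rmse, key=val_rmse.get)              # K with the lowest validation error
```

With K-fold cross-validation the same loop would be repeated over several train/validation splits and the errors averaged before picking K.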


Bias and variance

• Bias: training error
  • Difference between desired output and actual output for a particular training sample
• Variance: generalization error
  • Difference between the learned function from a particular training sample and the function derived from all training samples
• Two extreme cases: zero bias and zero variance
• A good-sized model is one where both bias and variance are low
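One common way to make these two terms precise, offered here as a sketch rather than as the course's own definition, is the squared-error decomposition at a fixed input. The notation is an assumption: $f(\mathbf{x})$ is the target function, $F(\mathbf{x};\mathcal{D})$ is the function learned from training set $\mathcal{D}$, and $E_{\mathcal{D}}$ averages over training sets.

```latex
E_{\mathcal{D}}\big[(F(\mathbf{x};\mathcal{D}) - f(\mathbf{x}))^2\big]
  = \underbrace{\big(E_{\mathcal{D}}[F(\mathbf{x};\mathcal{D})] - f(\mathbf{x})\big)^2}_{\text{bias}^2}
  + \underbrace{E_{\mathcal{D}}\big[\big(F(\mathbf{x};\mathcal{D}) - E_{\mathcal{D}}[F(\mathbf{x};\mathcal{D})]\big)^2\big]}_{\text{variance}}
```

Too few bases tends to leave the first term large, while too many tends to inflate the second.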


RBF net training summary

• To train
  1. Choose the Gaussian centers using K-means, etc.
  2. Determine the Gaussian widths as the variance of each cluster, or using $d_{\max}$
  3. Determine the weights $w_j$ using linear regression
• Select the number of bases using (cross-)validation
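Put together, the three training steps might look like the sketch below. It assumes K-means centers, the shared width d_max / sqrt(2K), and pseudo-inverse weights; the function names, the sine regression data, and K = 8 are illustrative choices rather than anything prescribed by the slides.

```python
import numpy as np

def kmeans(X, K, rng, max_iter=100):
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(max_iter):
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Mean of each cluster; an empty cluster keeps its old center (an assumption).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def hidden_layer(X, centers, sigma):
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    G = np.exp(-sq_dist / (2.0 * sigma**2))
    return np.hstack([G, np.ones((len(X), 1))])        # bias column

def train_rbf(X, d, K, seed=0):
    rng = np.random.default_rng(seed)
    centers = kmeans(X, K, rng)                        # 1. centers
    diffs = centers[:, None, :] - centers[None, :, :]
    d_max = np.sqrt((diffs ** 2).sum(axis=2)).max()
    sigma = d_max / np.sqrt(2.0 * K)                   # 2. shared width
    w = np.linalg.pinv(hidden_layer(X, centers, sigma)) @ d   # 3. weights
    return centers, sigma, w

def predict_rbf(X, centers, sigma, w):
    return hidden_layer(X, centers, sigma) @ w

# Hypothetical regression data: noisy samples of sin(x) on [0, 2*pi].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(100, 1))
d = np.sin(X[:, 0]) + 0.05 * rng.normal(size=100)

centers, sigma, w = train_rbf(X, d, K=8)
train_rmse = np.sqrt(np.mean((predict_rbf(X, centers, sigma, w) - d) ** 2))
```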


RBF net illustration


Comparison between RBF net and MLP

• For RBF nets, bases are local, while for MLP, “bases” are global

• Generally, more bases are needed for an RBF net than hidden units for an MLP

• Training is more efficient for RBF nets


XOR problem, again

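• RBF nets can also be applied to pattern classification problems
  • XOR problem revisited

Let

$\varphi_1(\mathbf{x}) = \exp(-\|\mathbf{x} - \mathbf{t}_1\|^2)$
$\varphi_2(\mathbf{x}) = \exp(-\|\mathbf{x} - \mathbf{t}_2\|^2)$

where

$\mathbf{t}_1 = [1, 1]^T$, $\mathbf{t}_2 = [0, 0]^T$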
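Evaluating the two basis functions on the four XOR inputs shows why this mapping helps: in the new two-dimensional feature space the classes can be split by a straight line. The sketch below uses the centers given above; the decision rule (thresholding the sum of the two activations at 0.9) is an assumption picked for illustration, and any threshold between the two class values would work.

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = np.array([0, 1, 1, 0])                 # XOR of the two inputs

t1 = np.array([1.0, 1.0])
t2 = np.array([0.0, 0.0])

phi1 = np.exp(-((X - t1) ** 2).sum(axis=1))     # exp(-||x - t1||^2)
phi2 = np.exp(-((X - t2) ** 2).sum(axis=1))     # exp(-||x - t2||^2)

# Class 0 inputs give phi1 + phi2 = 1 + e^-2 (about 1.135);
# class 1 inputs give phi1 + phi2 = 2 e^-1 (about 0.736).
predicted = (phi1 + phi2 < 0.9).astype(int)     # hypothetical linear decision rule
```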


XOR problem (cont.)


RBF net on double moon data, d = -5


RBF net on double moon data, d = -6
