Radial Basis Function ANN, an alternative to back propagation, uses clustering of examples in the training set
Example of Radial Basis Function (RBF) network
Input vector: d dimensions
K radial basis functions
Single output
Structure used for multivariate regression or binary classification
Review: the RBF network provides an alternative to back propagation. Each hidden node is associated with a cluster of input instances. The hidden layer is connected to the output by linear least squares.
Gaussians are the most frequently used radial basis function:

    φj(x) = exp(−½ (‖x − mj‖ / sj)²)

Clusters of input instances are parameterized by a mean and variance.
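As a minimal sketch, the Gaussian hidden-layer activations above can be computed as follows; `rbf_activations` and its argument names are illustrative, not from the slides:

```python
import numpy as np

def rbf_activations(X, means, sigmas):
    """Gaussian RBF activations phi_j(x) = exp(-0.5 * (||x - m_j|| / s_j)^2).

    X: (N, d) inputs; means: (K, d) cluster means; sigmas: (K,) cluster widths.
    Returns an (N, K) matrix of hidden-node activations.
    """
    # Distance from every input to every cluster mean, shape (N, K)
    dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return np.exp(-0.5 * (dists / sigmas) ** 2)
```

An input at a cluster mean activates that hidden node maximally (value 1), and activation decays with distance on the scale set by sj.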
Linear least squares with basis functions

Given training set X = {x^t, r^t}, t = 1…N, and the mean and variance of K clusters of input data, construct the N×K matrix D of basis-function values and the N×1 column vector r of targets:

    D = | φ1(x^1) φ2(x^1) … φK(x^1) |        r = | r^1 |
        | φ1(x^2) φ2(x^2) … φK(x^2) |            | r^2 |
        |    ⋮        ⋮          ⋮    |            |  ⋮  |
        | φ1(x^N) φ2(x^N) … φK(x^N) |            | r^N |

Add a column of ones to include a bias node. Solve the normal equations DᵀDw = Dᵀr for a vector w of K weights connecting hidden nodes to the output node.
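The least-squares step can be sketched as below; `rbf_output_weights` is an illustrative name, and `numpy.linalg.lstsq` is used as a numerically safer equivalent of forming and solving DᵀDw = Dᵀr directly:

```python
import numpy as np

def rbf_output_weights(D, r):
    """Solve the least-squares problem equivalent to D^T D w = D^T r.

    D: (N, K) matrix of basis-function values; a column of ones is appended
    here for the bias node. r: (N,) targets.
    Returns w of length K + 1 (last entry is the bias weight).
    """
    D1 = np.hstack([D, np.ones((D.shape[0], 1))])  # bias column
    w, *_ = np.linalg.lstsq(D1, r, rcond=None)
    return w
```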
RBF networks perform best with large datasets
With large datasets, expect redundancy (i.e. multiple examples expressing the same general pattern)
In RBF network, hidden layer is a feature-space representation of the data where redundancy has been used to reduce noise.
A validation set may be helpful to determine K, the best number of clusters of input data
Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)
Supervised learning: mapping input to output
Unsupervised learning: find regularities in the input. Regularities reflect some probability distribution of attribute vectors, p(x^t). Discovering p(x^t) is called "density estimation". A parametric method uses MLE to find θ in p(x^t|θ).
In clustering, we look for regularities as group membership. Assume we know the number of clusters, K. Given K and dataset X, we want to find the size of each group P(Gi) and its component density p(x|Gi).
Background on clustering
Find group labels using the geometric interpretation of a cluster as points in attribute space closer to a "center" than to points outside the cluster. Define trial centers by reference vectors mj, j = 1…K.

Define group labels based on the nearest center:

    b_i^t = 1 if ‖x^t − m_i‖ = min_j ‖x^t − m_j‖, otherwise 0

Get new trial centers:

    m_i = Σ_t b_i^t x^t / Σ_t b_i^t

Judge convergence by the total reconstruction error:

    E({m_i}_{i=1}^K | X) = Σ_t Σ_i b_i^t ‖x^t − m_i‖²
K-Means Clustering: hard labels
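The update rules above can be sketched as a minimal K-means loop. Names are illustrative, and the simple "first K points" initialization is an assumption for brevity (k-means++ is common in practice):

```python
import numpy as np

def kmeans(X, K, n_iter=100):
    """Hard-label K-means: assign each x^t to its nearest center (b_i^t),
    then recompute each center as the mean of its assigned instances."""
    m = X[:K].astype(float)  # simple init; k-means++ is better in practice
    for _ in range(n_iter):
        # Hard label b[t] = index of the nearest center
        b = np.argmin(np.linalg.norm(X[:, None] - m[None], axis=2), axis=1)
        # New centers are the means of the assigned instances
        new_m = np.array([X[b == i].mean(axis=0) if np.any(b == i) else m[i]
                          for i in range(K)])
        if np.allclose(new_m, m):  # converged: centers stopped moving
            break
        m = new_m
    return m, b
```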
K-means clustering pseudo code
Example of pseudo code application
Example of K-means with arbitrary starting centers and convergence plot
Convergence
K-means is an example of the Expectation-Maximization (EM) approach to MLE
The log likelihood of the mixture model cannot be solved analytically for Φ:

    L(Φ | X) = Σ_t log Σ_{i=1}^k p(x^t | G_i) P(G_i)

Use a 2-step iterative method:
E-step: estimate labels of x^t given current knowledge of the mixture components.
M-step: update component knowledge using the labels from the E-step.
E-step
M-step
K-means clustering pseudo code with steps labeled
Given converged K-means centers, estimate the variance for RBFs by s² = d²_max / (2K), where d_max is the largest distance between clusters.
Gaussian mixture theory is another approach to getting RBFs
Application of K-means clustering to RBF-ANN
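The width heuristic above can be sketched directly; `rbf_width` is an illustrative name:

```python
import numpy as np

def rbf_width(centers, K):
    """Width heuristic s^2 = d_max^2 / (2K), where d_max is the largest
    distance between the K converged cluster centers."""
    d = np.linalg.norm(centers[:, None] - centers[None], axis=2)
    return d.max() ** 2 / (2 * K)
```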
    p(x^t) = Σ_{i=1}^k p(x^t | G_i) P(G_i)

X = {x^t}_t is made up of K groups (clusters):
P(G_i) is the proportion of X in group i
attributes in each group are Gaussian distributed: p(x^t | G_i) = N_d(μ_i, Σ_i)
μ_i is the mean of the x^t in group i
Σ_i is the covariance matrix of the x^t in group i
Distribution of attributes is mixture of Gaussians
Gaussian Mixture Densities
Given a group label for each data point, r_i^t, MLE provides estimates of the parameters of the Gaussian mixture

    p(x^t | Φ) = Σ_{i=1}^k p(x^t | G_i) P(G_i)

where p(x | G_i) ~ N_d(μ_i, Σ_i) and Φ = {P(G_i), μ_i, Σ_i}_{i=1}^k.
    P(G_i) = Σ_t r_i^t / N
    m_i = Σ_t r_i^t x^t / Σ_t r_i^t
    S_i = Σ_t r_i^t (x^t − m_i)(x^t − m_i)ᵀ / Σ_t r_i^t
Estimators
    p(x) = N(μ, σ²) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))

MLE for μ and σ²:

    m = Σ_t x^t / N
    s² = Σ_t (x^t − m)² / N
1D Gaussian distribution
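The MLE formulas above amount to the sample mean and the mean squared deviation; a minimal sketch (`gaussian_mle` is an illustrative name):

```python
import numpy as np

def gaussian_mle(x):
    """MLE for a 1D Gaussian: m = sample mean, s^2 = mean squared deviation.
    Note the division by N, not N - 1."""
    m = x.mean()
    s2 = ((x - m) ** 2).mean()
    return m, s2
```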
    p(x) = N_d(μ, Σ) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))

Mahalanobis distance: (x − μ)ᵀ Σ⁻¹ (x − μ), analogous to (x − m)²/s²
x − μ is a d×1 column vector; Σ is a d×d matrix; the M-distance is a scalar
Measures the distance of x from the mean in units of Σ
d denotes the number of attributes
d-dimensional Gaussian distribution
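The Mahalanobis distance can be sketched in a few lines; `mahalanobis_sq` is an illustrative name, and `np.linalg.solve` is used instead of forming Σ⁻¹ explicitly:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu),
    the d-dimensional analogue of (x - m)^2 / s^2."""
    diff = x - mu
    # solve(Sigma, diff) computes Sigma^{-1} (x - mu) without inverting Sigma
    return float(diff @ np.linalg.solve(Sigma, diff))
```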
• If the xi are independent, the off-diagonals of Σ are 0
• p(x) is the product of the probabilities for each component of x:

    p(x) = ∏_{i=1}^d p_i(x_i) = (1 / ((2π)^(d/2) ∏_{i=1}^d σ_i)) exp(−½ Σ_{i=1}^d ((x_i − μ_i)/σ_i)²)
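The factorization above can be checked numerically: the diagonal-covariance density written directly equals the product of 1D Gaussians. Function names are illustrative:

```python
import numpy as np

def diag_gauss_pdf(x, mu, sigmas):
    """Multivariate normal with diagonal covariance, written directly."""
    d = len(x)
    norm = (2 * np.pi) ** (d / 2) * np.prod(sigmas)
    return np.exp(-0.5 * np.sum(((x - mu) / sigmas) ** 2)) / norm

def product_of_1d(x, mu, sigmas):
    """The same density as a product of independent 1D Gaussians."""
    return np.prod(np.exp(-0.5 * ((x - mu) / sigmas) ** 2)
                   / (np.sqrt(2 * np.pi) * sigmas))
```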
Replace the hard labels, r_i^t, by soft labels, h_i^t, the probability that x^t belongs to cluster i. Assume the cluster densities p(x^t | Φ) are Gaussian; then the mixture proportions, means, and covariance matrices are estimated by

    P(G_i) = Σ_t h_i^t / N
    m_i = Σ_t h_i^t x^t / Σ_t h_i^t
    S_i = Σ_t h_i^t (x^t − m_i)(x^t − m_i)ᵀ / Σ_t h_i^t

where the h_i^t are the soft labels from the previous E-step.
Gaussian mixture model by EM: soft labels
Initialize by k-means clustering. After a few iterations, use the centers m_i and the instances covered by each center to estimate the covariance matrices S_i and mixture proportions π_i.

From m_i, S_i, and π_i, calculate the soft labels h_i^t by

    h_i^t = π_i |S_i|^(−1/2) exp[−½ (x^t − m_i)ᵀ S_i⁻¹ (x^t − m_i)]
            / Σ_j π_j |S_j|^(−1/2) exp[−½ (x^t − m_j)ᵀ S_j⁻¹ (x^t − m_j)]

Calculate new proportions, centers, and covariances by

    π_i = Σ_t h_i^t / N
    m_i = Σ_t h_i^t x^t / Σ_t h_i^t
    S_i = Σ_t h_i^t (x^t − m_i)(x^t − m_i)ᵀ / Σ_t h_i^t

Use these to calculate new soft labels.

Gaussian mixture model by EM: soft labels
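The E-step and M-step above can be sketched as one loop. This is a minimal illustration with assumed names; it omits the safeguards (covariance regularization, log-space densities) that real implementations need:

```python
import numpy as np

def em_gmm(X, means, covs, props, n_iter=20):
    """EM for a Gaussian mixture.
    E-step: soft labels h[t, i] proportional to pi_i * N(x^t | m_i, S_i).
    M-step: re-estimate proportions, means, covariances from h."""
    N, d = X.shape
    K = len(props)
    for _ in range(n_iter):
        # E-step: responsibilities from the current mixture parameters
        h = np.empty((N, K))
        for i in range(K):
            diff = X - means[i]
            inv = np.linalg.inv(covs[i])
            det = np.linalg.det(covs[i])
            expo = -0.5 * np.sum(diff @ inv * diff, axis=1)
            h[:, i] = props[i] * np.exp(expo) / np.sqrt((2 * np.pi) ** d * det)
        h /= h.sum(axis=1, keepdims=True)
        # M-step: update proportions, means, covariances
        Ni = h.sum(axis=0)
        props = Ni / N
        means = (h.T @ X) / Ni[:, None]
        covs = []
        for i in range(K):
            diff = X - means[i]
            covs.append((h[:, i, None] * diff).T @ diff / Ni[i])
    return means, covs, props
```

As the slides suggest, `means`, `covs`, and `props` would typically be initialized from a few iterations of k-means.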
K-means: hard labels, centers marked.
EM Gaussian mixtures with soft labels: contours show 1 standard deviation; colors show mixture proportions.
k-means hard labels
P(G1|x)=0.5
Data points are color-coded by the greater soft label. Contours show m ± s of the Gaussian densities. The dashed contour is the "separating" curve.
Gaussian mixtures, soft labels; x marks the cluster mean.
Outliers?
In applications of Gaussian mixtures to RBFs, correlation of attributes is ignored and the diagonal elements of the covariance matrix are equal.
In this approximation the Mahalanobis distance reduces to Euclidean distance.
    π_i = Σ_t h_i^t / N
    m_i = Σ_t h_i^t x^t / Σ_t h_i^t
    s_i² = Σ_t h_i^t ‖x^t − m_i‖² / Σ_t h_i^t
Variance parameter of radial basis function becomes a scalar
• Cluster based on similarities (distances)
• Distance measures between instances x^r and x^s:

Minkowski (L_p) distance (Euclidean for p = 2):

    d_m(x^r, x^s) = (Σ_{j=1}^d |x_j^r − x_j^s|^p)^(1/p)

City-block distance:

    d_cb(x^r, x^s) = Σ_{j=1}^d |x_j^r − x_j^s|
Hierarchical Clustering
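The two distance measures above can be sketched directly (illustrative names):

```python
import numpy as np

def minkowski(xr, xs, p=2):
    """Minkowski (L_p) distance; Euclidean for p = 2."""
    return np.sum(np.abs(xr - xs) ** p) ** (1.0 / p)

def city_block(xr, xs):
    """City-block (L_1) distance, i.e. Minkowski with p = 1."""
    return np.sum(np.abs(xr - xs))
```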
• Start with N groups, each containing one instance, and merge the two closest groups at each iteration
• Distance between two groups Gi and Gj:
• Single-link: smallest distance over all pairs with one instance from each group:

    d(G_i, G_j) = min { d(x^r, x^s) : x^r ∈ G_i, x^s ∈ G_j }

• Complete-link: largest distance over all such pairs:

    d(G_i, G_j) = max { d(x^r, x^s) : x^r ∈ G_i, x^s ∈ G_j }

• Average-link: distance between the group centroids m_i = Σ_t b_i^t x^t / Σ_t b_i^t
Agglomerative Clustering
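The merge loop above can be sketched for the single-link case; `single_link_cluster` is an illustrative name, and this naive version recomputes group distances each iteration rather than maintaining them efficiently:

```python
import numpy as np

def single_link_cluster(X, n_clusters):
    """Agglomerative clustering: start with one group per instance and
    repeatedly merge the two groups with the smallest single-link distance
    (minimum distance over pairs with one instance in each group)."""
    groups = [[i] for i in range(len(X))]
    D = np.linalg.norm(X[:, None] - X[None], axis=2)  # pairwise distances
    while len(groups) > n_clusters:
        best = (0, 1, np.inf)
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                d = min(D[r, s] for r in groups[a] for s in groups[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        groups[a] += groups.pop(b)  # merge the two closest groups
    return groups
```

Stopping the loop at different group counts corresponds to cutting the dendrogram at different heights.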
Dendrogram
At heights h between sqrt(2) and 2, the dendrogram has the 3 clusters shown on the data graph. At h > 2 the dendrogram shows 2 clusters; c, d, and f form one cluster at this distance.
Example: single-linked clusters
• Application specific
• Plot the data (after PCA, for example) and check for clusters
• Add one cluster at a time using a validation set
Choosing K (how many clusters?)
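The "add one at a time" idea can be sketched as follows. Everything here is an assumed illustration: `choose_k`, the relative-improvement threshold `tol`, and the validation error (mean distance to the nearest center) are all choices, not from the slides:

```python
import numpy as np

def kmeans_centers(X, K, n_iter=50):
    """Minimal K-means used only to produce centers for this sketch."""
    m = X[:K].astype(float)  # simple deterministic init
    for _ in range(n_iter):
        b = np.argmin(np.linalg.norm(X[:, None] - m[None], axis=2), axis=1)
        for i in range(K):
            if np.any(b == i):
                m[i] = X[b == i].mean(axis=0)
    return m

def choose_k(X_train, X_val, k_max=5, tol=0.1):
    """Add one cluster at a time; stop when the validation error (mean
    distance to the nearest center) no longer improves by a relative
    factor tol, and return the last K that helped."""
    prev = np.inf
    for K in range(1, k_max + 1):
        m = kmeans_centers(X_train, K)
        err = np.linalg.norm(X_val[:, None] - m[None], axis=2).min(axis=1).mean()
        if err > (1 - tol) * prev:  # no meaningful improvement: keep K - 1
            return K - 1
        prev = err
    return k_max
```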