Local Methods - Texas A&M University
TRANSCRIPT
![Page 1: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/1.jpg)
Local Methods
• Olive slides: Alpaydin
• Blue slides: Haykin, clarifications/notations
![Page 2: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/2.jpg)
Introduction 3
Divide the input space into local regions and learn simple (constant/linear) models in each patch
Unsupervised: Competitive, online clustering
Supervised: Radial-basis functions, mixture of experts
![Page 3: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/3.jpg)
Competitive Learning
• X = {xt}t : set of samples (green).
• mi, i = 1, 2, ..., k: cluster centers (red).
• bti = 1 if mi is closest to xt, 0 otherwise.
• Note: t = index for input, i = index for cluster center.
![Page 4: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/4.jpg)
Competitive Learning: k-Means
• Batch: update each cluster center to the mean of the samples currently assigned to it.
• Online: use stochastic gradient descent.
– Note: mi is a vector, with scalar components mij (see next page).
• Both are iterated until convergence is achieved.
![Page 5: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/5.jpg)
Competitive Learning
[Figure: the input x pulls the center m to a new position m′ along the direction x − m.]
• Updating the center.
• Δm = η(x − m)
• m ← m + η(x − m)
• “Move center closer to the current input”
![Page 6: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/6.jpg)
Competitive Learning 4

bti = 1 if ‖xt − mi‖ = minl ‖xt − ml‖, 0 otherwise

Batch k-means:
mi = Σt bti xt / Σt bti

Online k-means:
E({mi}i=1..k | X) = Σt Σi bti ‖xt − mi‖²
Δmij = η bti (xtj − mij)
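Below is a minimal NumPy sketch of the two procedures above; the function names, toy defaults, and hyperparameters (η = 0.1, epoch count) are illustrative assumptions, not from the slides.

```python
import numpy as np

def batch_kmeans_step(X, m):
    """One batch update: each center m_i becomes the mean of its assigned samples."""
    b = np.argmin(np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2), axis=1)
    return np.array([X[b == i].mean(axis=0) if np.any(b == i) else m[i]
                     for i in range(len(m))])

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Online k-means: move the winning center toward each sample in turn."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for x in rng.permutation(X):
            i = np.argmin(np.linalg.norm(x - m, axis=1))  # winner: b_i^t = 1
            m[i] += eta * (x - m[i])                      # Δm_i = η(x^t − m_i)
    return m
```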
![Page 7: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/7.jpg)
Replacing bti, etc.
• We can use lateral inhibition to implement bti in a more biologically plausible manner (see figure in next slide). This needs iteration until the values settle (see the sketch below).
• We can also use the dot product (a similarity measure) instead of Euclidean distance.
• Hebbian learning is usually used for biologically plausible models.
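A toy sketch of bti computed by iterated lateral inhibition rather than an explicit argmax; the self-excitation and inhibition constants are assumptions chosen so the values settle to a near one-hot pattern.

```python
import numpy as np

def winner_take_all(a, self_exc=1.2, inhib=0.2, iters=50):
    """Each unit excites itself and inhibits the others until values settle."""
    a = a.astype(float).copy()
    for _ in range(iters):
        a = np.clip(self_exc * a - inhib * (a.sum() - a), 0.0, 1.0)
    return a  # approximately one-hot: plays the role of b_i

print(winner_take_all(np.array([0.4, 0.9, 0.7])))  # -> [0. 1. 0.]
```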
![Page 8: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/8.jpg)
Winner-take-all network
5
![Page 9: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/9.jpg)
Adaptive Resonance Theory 6

Incremental: a new cluster is added if the input is not covered by an existing center; coverage is defined by the vigilance, ρ.

bi = ‖xt − mi‖ = minl=1..k ‖xt − ml‖

mk+1 ← xt if bi > ρ (add a new cluster)
mi ← mi + η(xt − mi) otherwise

(Carpenter and Grossberg, 1988)
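A short sketch of this incremental rule; the interface and defaults are my assumptions, while the add-versus-update logic follows the slide.

```python
import numpy as np

def art_cluster(X, rho, eta=0.5):
    """Incremental clustering: add a center when the nearest one exceeds vigilance rho."""
    centers = [X[0].astype(float)]                 # first sample seeds the first cluster
    for x in X[1:]:
        d = [np.linalg.norm(x - m) for m in centers]
        i = int(np.argmin(d))                      # nearest center m_i
        if d[i] > rho:
            centers.append(x.astype(float))        # not covered: m_{k+1} <- x
        else:
            centers[i] += eta * (x - centers[i])   # covered: move m_i toward x
    return np.array(centers)
```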
![Page 10: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/10.jpg)
SOM Overview
SOM is based on three principles:
• Competition: each neuron calculates a discriminant function.
The neuron with the highest value is declared the winner.
• Cooperation: Neurons nearby the winner on the lattice get a
chance to adapt.
• Adaptation: The winner and its neighbors increase their
discriminant function value relative to the current input.
Subsequent presentation of the current input should result in
enhanced function value.
Redundancy in the input is needed!
5
![Page 11: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/11.jpg)
Redundancy, etc.
• Unsupervised learning methods such as SOM require redundancy in the
data.
• The following are intimately related:
– Redundancy
– Structure (or organization)
– Information content relative to channel capacity
6
![Page 12: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/12.jpg)
Redundancy, etc. (cont’d)
|                 | Left | Right |
|-----------------|------|-------|
| Structure       | No   | Yes   |
| Redundancy      | No   | Yes   |
| Info < Capacity | No   | Yes   |
Consider each pixel as one random variable.
7
![Page 13: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/13.jpg)
Self-Organizing Map (SOM)
[Figure: a 2D SOM layer over the input; the input x = (x1, x2) feeds every unit i, each with its own weight vector wi = (wi1, wi2).]
Kohonen (1982)
• 1-D, 2-D, or 3-D layout of units.
• One weight vector for each unit.
• Unsupervised learning (no target output).
9
![Page 14: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/14.jpg)
SOM Algorithm
[Figure: the same 2D SOM layer; the BMU and its lattice neighbors are updated toward the input x.]
1. Randomly initialize weight vectors wi
2. Randomly sample input vector x
3. Find Best Matching Unit (BMU):
i(x) = argminj‖x−wj‖
4. Update weight vectors:
wj ← wj + ηh(j, i(x))(x−wj)
η : learning rate
h(j, i(x)) : neighborhood function of BMU.
5. Repeat steps 2 – 4 (see the sketch below).
10
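A compact NumPy sketch of steps 1 to 4; the lattice size, η, σ, and fixed iteration count are illustrative, and in practice η and σ are usually decayed over time.

```python
import numpy as np

def train_som(X, rows=10, cols=10, eta=0.5, sigma=2.0, iters=5000, seed=0):
    """Train a 2-D SOM with a Gaussian neighborhood."""
    rng = np.random.default_rng(seed)
    w = rng.random((rows * cols, X.shape[1]))                 # 1. random weight vectors
    r = np.array([(i // cols, i % cols)
                  for i in range(rows * cols)], dtype=float)  # lattice coordinates r_j
    for _ in range(iters):
        x = X[rng.integers(len(X))]                           # 2. sample an input x
        bmu = np.argmin(np.linalg.norm(x - w, axis=1))        # 3. BMU: argmin_j ||x - w_j||
        h = np.exp(-np.sum((r - r[bmu]) ** 2, axis=1) / (2 * sigma ** 2))
        w += eta * h[:, None] * (x - w)                       # 4. w_j += η h(j, i(x)) (x - w_j)
    return w.reshape(rows, cols, -1)
```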
![Page 15: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/15.jpg)
SOM Learning
[Figure: SOM lattice and input space; each weight vector is plotted in the input space, and the winner wc moves toward the input along (x − wc).]
• Weight vectors can be plotted in the input space.
• Weight vectors move, not according to their proximity to the input
in the input space, but according to their proximity in the lattice.
11
![Page 16: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/16.jpg)
Self-Organizing Maps 7

Units have a neighborhood defined: mi is "between" mi−1 and mi+1, and all are updated together.

One-dim map (Kohonen, 1990):

Δml = η e(l, i)(xt − ml)

e(l, i) = exp(−(l − i)²/2σ²) / (√(2π) σ)
![Page 17: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/17.jpg)
Typical Neighborhood Functions

[Plot: Gaussian neighborhood exp(−(x² + y²)/2) over the lattice.]
• Gaussian: h(j, i(x)) = exp(−‖rj − ri(x)‖2/2σ2)
• Flat: h(j, i(x)) = 1 if ‖rj − ri(x)‖ ≤ σ, and 0 otherwise.
• σ is called the neighborhood radius.
• rj is the location of unit j on the lattice.
13
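The two neighborhood functions written out directly; rj and ri are the lattice coordinate vectors defined above, and the example positions are made up.

```python
import numpy as np

def h_gauss(r_j, r_i, sigma):
    """Gaussian neighborhood: smooth decay with lattice distance."""
    return np.exp(-np.sum((r_j - r_i) ** 2) / (2 * sigma ** 2))

def h_flat(r_j, r_i, sigma):
    """Flat neighborhood: 1 within radius sigma, 0 outside."""
    return 1.0 if np.linalg.norm(r_j - r_i) <= sigma else 0.0

# Units at lattice positions (0, 0) and (1, 2), neighborhood radius 2.
r_i, r_j = np.array([0.0, 0.0]), np.array([1.0, 2.0])
print(h_gauss(r_j, r_i, sigma=2.0), h_flat(r_j, r_i, sigma=2.0))
```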
![Page 18: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/18.jpg)
Radial-Basis Functions 8

Locally-tuned units:

pht = exp(−‖xt − mh‖²/2sh²)

yt = Σh=1..H wh pht + w0
![Page 19: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/19.jpg)
RBF
• Input x to p: the cluster centers mh and radii (spreads) sh are estimated (e.g., by k-means).
• p to output weights w can be calculated in one shot using the pseudo-inverse (output units are usually linear). There are n input vectors (each row of P holds the RBF activation values generated from one input vector), H RBF units, and m output units; the equations below are written for one output.
⎡ p11 p12 ⋯ p1H ⎤ ⎡ w1 ⎤   ⎡ y1 ⎤
⎢ p21 p22 ⋯ p2H ⎥ ⎢ w2 ⎥ = ⎢ y2 ⎥
⎢  ⋮   ⋮   ⋱  ⋮  ⎥ ⎢  ⋮ ⎥   ⎢  ⋮ ⎥
⎣ pn1 pn2 ⋯ pnH ⎦ ⎣ wH ⎦   ⎣ yn ⎦

Pw = y

w = P⁻¹y, if n = H
w = (PᵀP)⁻¹Pᵀy, if n > H
• Note: (PᵀP)⁻¹PᵀP = I by the definition of the inverse. The chain (PᵀP)⁻¹PᵀP = P⁻¹(Pᵀ)⁻¹PᵀP = P⁻¹P = I additionally requires P to be square and invertible.
• Other iterative methods also exist (see next few slides).
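A sketch of the one-shot solution, assuming the Gaussian units of the previous slide and a single linear output; np.linalg.lstsq solves the same least-squares problem as (PᵀP)⁻¹Pᵀy but avoids forming the inverse explicitly, which is numerically safer.

```python
import numpy as np

def rbf_weights(X, y, m, s):
    """Solve Pw = y for the output weights, given fixed centers m and radii s."""
    # P[t, h] = exp(-||x^t - m_h||^2 / (2 s_h^2)): the n x H activation matrix
    d2 = np.sum((X[:, None, :] - m[None, :, :]) ** 2, axis=2)
    P = np.exp(-d2 / (2 * s ** 2))
    w, *_ = np.linalg.lstsq(P, y, rcond=None)   # least-squares solution for w
    return w
```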
![Page 20: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/20.jpg)
Training RBF 10
Hybrid learning:
– First layer centers and spreads: unsupervised k-means
– Second layer weights: supervised gradient descent
Fully supervised: all parameters trained by gradient descent (see the Regression slide)
(Broomhead and Lowe, 1988; Moody and Darken, 1989)
![Page 21: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/21.jpg)
Regression 11

E({mh, sh, wih}h | X) = ½ Σt Σi (rti − yti)²

yti = Σh=1..H wih pht + wi0

Δwih = η Σt (rti − yti) pht

Δmhj = η Σt [Σi (rti − yti) wih] pht (xtj − mhj)/sh²

Δsh = η Σt [Σi (rti − yti) wih] pht ‖xt − mh‖²/sh³
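A sketch of one batch gradient step implementing these updates, simplified to a single output unit; the array shapes and in-place style are my choices, not the slide's.

```python
import numpy as np

def rbf_grad_step(X, r, m, s, w, w0, eta=0.01):
    """One gradient step on E = 1/2 * sum_t (r^t - y^t)^2 (all arrays must be float)."""
    d2 = np.sum((X[:, None, :] - m[None, :, :]) ** 2, axis=2)  # ||x^t - m_h||^2
    p = np.exp(-d2 / (2 * s ** 2))                             # p_h^t, shape (n, H)
    e = r - (p @ w + w0)                                       # r^t - y^t
    g = e[:, None] * w[None, :] * p                            # (r^t - y^t) w_h p_h^t
    w += eta * (p.T @ e)                                       # Δw_h
    w0 += eta * e.sum()                                        # Δw_0
    m += eta * np.sum(g[:, :, None] * (X[:, None, :] - m) / s[None, :, None] ** 2, axis=0)
    s += eta * np.sum(g * d2 / s ** 3, axis=0)                 # Δs_h
    return m, s, w, w0
```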
![Page 22: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/22.jpg)
Learning Vector Quantization 20
mi ← mi + η(xt − mi) if label(xt) = label(mi)
mi ← mi − η(xt − mi) otherwise

H units per class, prelabeled (Kohonen, 1990).

Given x, mi is the closest:

[Figure: input x, its closest center mi, and another center mj.]
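A sketch of this LVQ1-style update; the center array m and its labels are assumed prelabeled as on the slide, and η is illustrative.

```python
import numpy as np

def lvq_step(x, x_label, m, m_labels, eta=0.1):
    """Move the closest center toward x if labels agree, away otherwise."""
    i = np.argmin(np.linalg.norm(x - m, axis=1))  # closest center m_i
    if m_labels[i] == x_label:
        m[i] += eta * (x - m[i])                  # same class: attract
    else:
        m[i] -= eta * (x - m[i])                  # different class: repel
    return m
```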
![Page 23: Local Methods - Texas A&M University](https://reader035.vdocuments.us/reader035/viewer/2022071613/615731fd2f925653a00e3a60/html5/thumbnails/23.jpg)
References