Local Methods - Texas A&M University


Page 1: Local Methods

• Olive slides: Alpaydin

• Blue slides: Haykin, clarifications/notations

Page 2: Introduction

Divide the input space into local regions and learn simple (constant/linear) models in each patch

• Unsupervised: competitive learning, online clustering

• Supervised: radial-basis functions, mixtures of experts

Page 3: Competitive Learning

• X = {x^t}_t : set of samples (green).

• m_i, i = 1, 2, ..., k : cluster centers (red).

• b_i^t : 1 if m_i is closest to x^t, 0 otherwise.

• Note: t = index for input, i = index for cluster center.

Page 4: Competitive Learning: k-Means

• Batch: update each cluster center to the simple “mean” of the samples currently assigned to it.

• Online: Use stochastic gradient descent.

– Note: m_i is a vector, having scalar components m_ij (see next page).

• Both are run iteratively until convergence is achieved.

Page 5: Competitive Learning

∆m = η (x − m)

[Figure: the center m moves toward the input x along the direction x − m, giving the updated center m′.]

• Updating the center.

• ∆m = η (x − m)

• m ← m + η (x − m)

• “Move center closer to the current input”

Page 6: Competitive Learning

Batch k-means:

b_i^t = 1 if ‖x^t − m_i‖ = min_l ‖x^t − m_l‖, and 0 otherwise

m_i = Σ_t b_i^t x^t / Σ_t b_i^t

Online k-means (stochastic gradient descent on the reconstruction error):

E({m_i}_{i=1}^k | X) = Σ_t Σ_i b_i^t ‖x^t − m_i‖²

∆m_ij = η b_i^t (x_j^t − m_ij)
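To make the online rule concrete, here is a minimal online k-means sketch in Python/NumPy; it is only an illustration of the update above, and the learning rate, epoch count, and initialization scheme are placeholder choices rather than anything fixed by the slides.

    import numpy as np

    def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
        rng = np.random.default_rng(seed)
        # Initialize centers from k distinct random samples.
        m = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(epochs):
            for x in rng.permutation(X):
                i = np.argmin(np.linalg.norm(x - m, axis=1))  # winner: b_i^t = 1
                m[i] += eta * (x - m[i])                      # move winner toward x^t
        return m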

Page 7: Replacing b_i^t, etc.

• We can use lateral inhibition to implement b_i^t in a more biologically plausible manner (see the figure on the next slide). This needs iteration until the values settle.

• We can also use the dot product, instead of Euclidean distance, as the (similarity) measure.

• Hebbian learning is usually used for biologically plausible models.

Page 8: Winner-take-all network

[Figure: a winner-take-all network implementing b_i^t through lateral inhibition; only the winning unit i stays active.]

Page 9: Adaptive Resonance Theory

Incremental: a new cluster is added if the input is not covered by an existing one; coverage is defined by the vigilance parameter ρ.

b_i = ‖x^t − m_i‖ = min_{l=1..k} ‖x^t − m_l‖

If b_i > ρ : m_{k+1} ← x^t (start a new cluster at the input)

Otherwise : ∆m_i = η (x^t − m_i)

(Carpenter and Grossberg, 1988)
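A minimal sketch of this incremental scheme in Python/NumPy; the Euclidean distance, vigilance rho, and learning rate eta follow the rule above, while the function name and default values are illustrative assumptions.

    import numpy as np

    def art_cluster(X, rho, eta=0.1):
        centers = [np.array(X[0], dtype=float)]  # the first input seeds the first cluster
        for x in X[1:]:
            d = [np.linalg.norm(x - m) for m in centers]
            i = int(np.argmin(d))
            if d[i] > rho:
                centers.append(np.array(x, dtype=float))  # not covered: add a new cluster
            else:
                centers[i] += eta * (x - centers[i])      # covered: move the winner toward x
        return centers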

Page 10: SOM Overview

SOM is based on three principles:

• Competition: each neuron computes a discriminant function; the neuron with the highest value is declared the winner.

• Cooperation: neurons nearby the winner on the lattice also get a chance to adapt.

• Adaptation: the winner and its neighbors increase their discriminant function value for the current input, so that a subsequent presentation of the same input results in an enhanced response.

Redundancy in the input is needed!


Page 11: Redundancy, etc.

• Unsupervised learning methods such as SOM require redundancy in the data.

• The following are intimately related:

– Redundancy

– Structure (or organization)

– Information content relative to channel capacity


Page 12: Redundancy, etc. (cont’d)

                    Left    Right
  Structure          No      Yes
  Redundancy         No      Yes
  Info < Capacity    No      Yes

(“Left” and “Right” refer to two example images on the original slide.)

Consider each pixel as one random variable.


Page 13: Self-Organizing Map (SOM)

[Figure: a 2D SOM layer over a two-dimensional input x = (x1, x2); each unit i has its own weight vector w_i = (w_i1, w_i2).]

Kohonen (1982)

• 1-D, 2-D, or 3-D layout of units.

• One weight vector for each unit.

• Unsupervised learning (no target output).


Page 14: SOM Algorithm

[Figure: the same 2D SOM layer and input, with the lattice neighbors of the winning unit highlighted.]

1. Randomly initialize the weight vectors w_i.

2. Randomly sample an input vector x.

3. Find the Best Matching Unit (BMU):

i(x) = argmin_j ‖x − w_j‖

4. Update the weight vectors:

w_j ← w_j + η h(j, i(x)) (x − w_j)

η : learning rate

h(j, i(x)) : neighborhood function centered on the BMU.

5. Repeat steps 2 – 4.
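A compact sketch of steps 1 – 4 in Python/NumPy, here for a 1-D lattice with the Gaussian neighborhood defined on a later slide; the grid size, sigma, eta, and step count are illustrative choices, not part of the original slides.

    import numpy as np

    def train_som(X, n_units=10, eta=0.1, sigma=1.5, steps=1000, seed=0):
        rng = np.random.default_rng(seed)
        w = rng.random((n_units, X.shape[1]))    # 1. random initial weight vectors
        r = np.arange(n_units)                   # lattice positions r_j (1-D grid)
        for _ in range(steps):
            x = X[rng.integers(len(X))]          # 2. random input sample
            bmu = np.argmin(np.linalg.norm(x - w, axis=1))     # 3. BMU
            h = np.exp(-((r - r[bmu]) ** 2) / (2 * sigma**2))  # neighborhood weights
            w += eta * h[:, None] * (x - w)      # 4. update winner and neighbors
        return w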


Page 15: SOM Learning

[Figure: the SOM lattice and the corresponding weight vectors plotted in the input space; the winner’s weight vector w_c is pulled toward the input along x − w_c.]

• Weight vectors can be plotted in the input space.

• Weight vectors move according to their proximity to the winner on the lattice, not according to their proximity to the input in the input space.


Page 16: Self-Organizing Maps

∆m_l = η e(l, i) (x^t − m_l)

e(l, i) = (1 / (√(2π) σ)) exp[ −(l − i)² / (2σ²) ]

Units have a neighborhood defined: m_i is “between” m_{i−1} and m_{i+1}, and all of them are updated together.

One-dim map: (Kohonen, 1990)

Page 17: Typical Neighborhood Functions

[Figure: surface plot of the Gaussian neighborhood exp(−(x² + y²)/2).]

• Gaussian: h(j, i(x)) = exp(−‖r_j − r_i(x)‖² / (2σ²))

• Flat: h(j, i(x)) = 1 if ‖r_j − r_i(x)‖ ≤ σ, and 0 otherwise.

• σ is called the neighborhood radius.

• r_j is the location of unit j on the lattice.
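Both neighborhood functions as a small Python/NumPy sketch; r is assumed to be an array of lattice coordinates with one row per unit, which is an illustrative layout rather than anything fixed by the slides.

    import numpy as np

    def h_gaussian(r, bmu, sigma):
        # h(j, i(x)) = exp(-||r_j - r_i(x)||^2 / (2 sigma^2))
        return np.exp(-np.linalg.norm(r - r[bmu], axis=-1) ** 2 / (2 * sigma**2))

    def h_flat(r, bmu, sigma):
        # h(j, i(x)) = 1 inside the radius sigma, 0 outside
        return (np.linalg.norm(r - r[bmu], axis=-1) <= sigma).astype(float)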


Page 18: Radial-Basis Functions

Locally-tuned units:

p_h^t = exp[ −‖x^t − m_h‖² / (2 s_h²) ]

y^t = Σ_{h=1}^{H} w_h p_h^t + w_0
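The locally-tuned unit and the output sum, as a minimal Python/NumPy sketch; m (the H centers), s (the H spreads), w (the H weights), and w0 are assumed to be given parameters, and the function name is illustrative.

    import numpy as np

    def rbf_forward(x, m, s, w, w0):
        # p_h = exp(-||x - m_h||^2 / (2 s_h^2)), one activation per hidden unit
        p = np.exp(-np.linalg.norm(x - m, axis=1) ** 2 / (2 * s**2))
        # y = sum_h w_h p_h + w_0 (linear output unit)
        return w @ p + w0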

Page 19: RBF

• Input x to p: the cluster centers m and the radii (spreads) s are estimated.

• The p-to-output weights w can be calculated in one shot using the pseudo-inverse (the output units are usually linear). Below, each row of P holds the H RBF activation values generated from one of the n input vectors; with m output units, w and y generalize from vectors to H × m and n × m matrices.

| p_11  p_12  ···  p_1H | | w_1 |   | y^1 |
| p_21  p_22  ···  p_2H | | w_2 | = | y^2 |
|   ⋮     ⋮    ⋱     ⋮  | |  ⋮  |   |  ⋮  |
| p_n1  p_n2  ···  p_nH | | w_H |   | y^n |

P w = y

w = P⁻¹ y, if n = H

w = (Pᵀ P)⁻¹ Pᵀ y, if n > H

• Note: when P is square and invertible, (Pᵀ P)⁻¹ Pᵀ = P⁻¹ (Pᵀ)⁻¹ Pᵀ = P⁻¹, so (Pᵀ P)⁻¹ Pᵀ P = P⁻¹ P = I and the pseudo-inverse solution agrees with the exact one.

• Other iterative methods also exist (see next few slides).
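The one-shot solve above, as a minimal Python/NumPy sketch; np.linalg.lstsq covers both the n = H and the n > H cases, and P and y are assumed to be already built from the data.

    import numpy as np

    def solve_rbf_weights(P, y):
        # Least-squares solution of Pw = y: equals P^{-1} y when P is square,
        # and (P^T P)^{-1} P^T y (the pseudo-inverse solution) when n > H.
        w, *_ = np.linalg.lstsq(P, y, rcond=None)
        return w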

Page 20: Training RBF

Hybrid learning:

• First layer (centers and spreads): unsupervised k-means.

• Second layer (weights): supervised gradient descent.

Alternatively, fully supervised training of all parameters.

(Broomhead and Lowe, 1988; Moody and Darken, 1989)

Page 21: Regression

E({m_h, s_h, w_ih}_{i,h} | X) = (1/2) Σ_t Σ_i (r_i^t − y_i^t)²

y_i^t = Σ_{h=1}^{H} w_ih p_h^t + w_i0

∆w_ih = η Σ_t (r_i^t − y_i^t) p_h^t

∆m_hj = η Σ_t [ Σ_i (r_i^t − y_i^t) w_ih ] p_h^t (x_j^t − m_hj) / s_h²

∆s_h = η Σ_t [ Σ_i (r_i^t − y_i^t) w_ih ] p_h^t ‖x^t − m_h‖² / s_h³
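One batch gradient step under these update rules, as a Python/NumPy sketch; the shapes (X: n × d inputs, R: n × m targets, M: H × d centers, s: H spreads, W: m × H weights, w0: m biases) and the function name are illustrative assumptions.

    import numpy as np

    def rbf_grad_step(X, R, M, s, W, w0, eta=0.01):
        D2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(-1)  # ||x^t - m_h||^2, (n, H)
        P = np.exp(-D2 / (2 * s**2))                         # p_h^t, (n, H)
        E = R - (P @ W.T + w0)                               # r_i^t - y_i^t, (n, m)
        G = (E @ W) * P                                      # [sum_i e_i^t w_ih] p_h^t, (n, H)
        W += eta * E.T @ P                                   # Delta w_ih
        w0 += eta * E.sum(0)                                 # Delta w_i0
        M += eta * (G[:, :, None] * (X[:, None, :] - M)).sum(0) / s[:, None]**2  # Delta m_hj
        s += eta * (G * D2).sum(0) / s**3                    # Delta s_h
        return M, s, W, w0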

Page 22: Learning Vector Quantization

∆m_i = η (x^t − m_i) if label(x^t) = label(m_i)

∆m_i = −η (x^t − m_i) otherwise

H units per class, prelabeled (Kohonen, 1990).

Given x, m_i is the closest:

[Figure: input x shown between centers m_i and m_j; the closest center m_i is updated according to the rule above.]
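A minimal LVQ1-style sketch of one update in Python/NumPy; m holds the prelabeled centers, mlab their class labels, and the names and learning rate are illustrative.

    import numpy as np

    def lvq_step(x, xlab, m, mlab, eta=0.05):
        i = np.argmin(np.linalg.norm(x - m, axis=1))  # closest prelabeled center
        if mlab[i] == xlab:
            m[i] += eta * (x - m[i])   # same label: attract toward x
        else:
            m[i] -= eta * (x - m[i])   # different label: repel from x
        return m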

Page 23: References

• Broomhead, D. S., and Lowe, D. (1988). Multivariable functional interpolation and adaptive networks. Complex Systems, 2:321–355.

• Carpenter, G. A., and Grossberg, S. (1988). The ART of adaptive pattern recognition by a self-organizing neural network. Computer, 21(3):77–88.

• Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59–69.

• Kohonen, T. (1990). The self-organizing map. Proceedings of the IEEE, 78(9):1464–1480.

• Moody, J., and Darken, C. J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2):281–294.