381 Self-Organization Map: Learning without Examples

Upload: susanna-cook

Post on 21-Jan-2016


Page 1

Self-Organization Map

Learning without Examples

Page 2

Learning Objectives

1. Introduction
2. Hamming Net and MAXNET
3. Clustering
4. Feature Map
5. Self-organizing Feature Map
6. Conclusion

Page 3

Introduction

1. Learning without examples (unsupervised)

2. Data are presented to the system one at a time

3. The data are mapped into a one- or higher-dimensional space

4. Competitive learning

Page 4

Hamming Distance (1/5)

1. Define the Hamming distance HD between two binary codes A and B of the same length as the number of places in which a_i and b_i differ.

2. Ex:
A: [ 1 -1  1  1  1 ]
B: [ 1  1 -1 -1  1 ]
=> HD(A, B) = 3
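This definition, together with the scalar-product identity derived on the following slides, can be checked with a short sketch (plain Python; the vectors are the A and B above):

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length codes differ."""
    return sum(ai != bi for ai, bi in zip(a, b))

A = [1, -1, 1, 1, 1]
B = [1, 1, -1, -1, 1]

hd = hamming_distance(A, B)                    # 3, as on the slide
dot = sum(ai * bi for ai, bi in zip(A, B))     # scalar product A.B

# For bipolar vectors of length n: A.B = n - 2 * HD(A, B)
n = len(A)
assert dot == n - 2 * hd
print(hd, dot)  # 3 -1
```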

Page 5

Hamming Distance (2/5)

Page 6

Hamming Distance (3/5)

Page 7

Hamming Distance (4/5)

3. Or, HD can be defined as the lowest number of edges that must be traversed between the two relevant codes.

4. If A and B have bipolar binary components, then the scalar product of A and B is:

Page 8

Hamming Distance (5/5)

A^t B = (bits agreed) - (bits different)
      = (n - HD(A, B)) - HD(A, B)
      = n - 2 HD(A, B)

so HD(A, B) = (n - A^t B) / 2

Page 9

Hamming Net and MAXNET (1/30)

1. For a two-layer classifier of bipolar binary vectors, with p classes and p output neurons, the strongest response of a neuron indicates it has the minimum HD value to the category this neuron represents.

Page 10

Hamming Net and MAXNET (2/30)

Page 11

Hamming Net and MAXNET (3/30)

2. The MAXNET operates to suppress all but the maximum of the output values of the Hamming network.

3. For the Hamming net in the next figure, we have input vector X,
p classes => p neurons for output,
output vector Y = [ y_1, ..., y_p ]

Page 12

Hamming Net and MAXNET (4/30)

Page 13

Hamming Net and MAXNET (5/30)

4. For any output neuron m, m = 1, ..., p, we have
W_m = [ w_m1, w_m2, ..., w_mn ]^t, m = 1, 2, ..., p
as the weights between input X and each output neuron.

5. Also, assume that for each class m one has the prototype vector S^(m) as the standard to be matched.

Page 14

Hamming Net and MAXNET (6/30)

6. For classifying p classes, one can say the m'th output is 1 if and only if X = S^(m), which happens only when W^(m) = S^(m).
The outputs of the classifier are
X^t S^(1), X^t S^(2), ..., X^t S^(m), ..., X^t S^(p)
So when X = S^(m), the m'th output is n and the other outputs are smaller than n.

Page 15

Hamming Net and MAXNET (7/30)

7. X^t S^(m) = (n - HD(X, S^(m))) - HD(X, S^(m))
∴ ½ X^t S^(m) = n/2 - HD(X, S^(m))

So the weight matrix is W_H = ½ S, i.e.

W_H = ½ [ s_1^(1)  s_2^(1)  ...  s_n^(1)
          s_1^(2)  s_2^(2)  ...  s_n^(2)
          ...
          s_1^(p)  s_2^(p)  ...  s_n^(p) ]

Page 16

Hamming Net and MAXNET (8/30)

8. By giving a fixed bias n/2 to the input, then
net_m = ½ X^t S^(m) + n/2 for m = 1, 2, ..., p
or
net_m = n - HD(X, S^(m))

Page 17

Hamming Net and MAXNET (9/30)

9. To scale the outputs from the range 0~n down to 0~1, one can apply the transfer function
f(net_m) = net_m / n for m = 1, 2, ..., p

Page 18

Hamming Net and MAXNET (10/30)

Page 19

Hamming Net and MAXNET (11/30)

10. So the node with the highest output is the node with the smallest HD between the input and the prototype vectors S^(1) ... S^(p), i.e.
f(net_m) = 1
and for the other nodes
f(net_m) < 1

Page 20

Hamming Net and MAXNET (12/30)

11. MAXNET is employed as a second layer only for the cases where an enhancement of the initial dominant response of the m'th node is required.
i.e., the purpose of MAXNET is to let max{ y_1, ..., y_p } equal 1 and let the others equal 0.

Page 21

Hamming Net and MAXNET (13/30)

Page 22

Hamming Net and MAXNET (14/30)

12. To achieve this, one can let
y_i^(k+1) = f( y_i^(k) - ε Σ_{j≠i} y_j^(k) )
where i = 1, ..., p; j = 1, ..., p; j ≠ i
∵ y_i is bounded by 0 ≦ y_i ≦ 1 for i = 1, ..., p
(y_i^(0) is the output of the Hamming net, and y_i^(k) ≤ 1; after convergence the outputs can only be 1 or 0.)

Page 23

Hamming Net and MAXNET (15/30)

13. So ε is bounded by 0 < ε < 1/p
ε: lateral interaction coefficient

W_M = [  1  -ε  ...  -ε
        -ε   1  ...  -ε
        ...
        -ε  -ε  ...   1 ]   (p × p)

Page 24

Hamming Net and MAXNET (16/30)

And
net^(k) = W_M y^(k)

Page 25

Hamming Net and MAXNET (17/30)

14. So the transfer function
f(net) = net  for net ≥ 0
f(net) = 0    for net < 0

Page 26

Hamming Net and MAXNET (18/30)

Page 27

Hamming Net and MAXNET (19/30)

Ex: To have a Hamming net for classifying C, I, T, then
S^(1) = [ 1 1 1 1 -1 -1 1 1 1 ]^t
S^(2) = [ -1 1 -1 -1 1 -1 -1 1 -1 ]^t
S^(3) = [ 1 1 1 -1 1 -1 -1 1 -1 ]^t

Page 28

Hamming Net and MAXNET (20/30)

Page 29

Hamming Net and MAXNET (21/30)

So
W_H = ½ [  1  1  1  1 -1 -1  1  1  1
          -1  1 -1 -1  1 -1 -1  1 -1
           1  1  1 -1  1 -1 -1  1 -1 ]
and
net = W_H X + (n/2) [ 1 1 1 ]^t,  with n = 9

Page 30

Hamming Net and MAXNET (22/30)

For X = [ 1 1 1 1 1 1 1 1 1 ]^t,
net = [ 7 3 5 ]^t
and
Y = f(net) = [ 7/9 3/9 5/9 ]^t = [ 0.777 0.333 0.555 ]^t
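The forward pass on this slide (W_H = ½S, bias n/2, scaling by 1/n) can be reproduced numerically. A minimal sketch assuming NumPy, using the C, I, T prototypes given earlier:

```python
import numpy as np

S = np.array([
    [ 1, 1, 1,  1, -1, -1,  1, 1,  1],   # S^(1): C
    [-1, 1, -1, -1, 1, -1, -1, 1, -1],   # S^(2): I
    [ 1, 1, 1, -1,  1, -1, -1, 1, -1],   # S^(3): T
])
n = S.shape[1]            # 9
W_H = 0.5 * S             # Hamming-net weight matrix, W_H = 1/2 S

X = np.ones(n)            # test input, all ones
net = W_H @ X + n / 2     # net_m = 1/2 X^t S^(m) + n/2 = n - HD(X, S^(m))
Y = net / n               # scale the range 0..n down to 0..1

print(net)  # [7. 3. 5.]
print(Y)    # [0.777... 0.333... 0.555...]
```

The largest entry of net counts the bits that agree with each prototype, so class C (the first neuron) responds most strongly.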

Page 31

Hamming Net and MAXNET (23/30)

Input Y to MAXNET and select ε = 0.2 < 1/3 (= 1/p)
So
W_M = [  1    -1/5  -1/5
        -1/5   1    -1/5
        -1/5  -1/5   1   ]

Page 32

Hamming Net and MAXNET (24/30)

And
net^(k) = W_M Y^(k)
Y^(k+1) = f( net^(k) )

Page 33

Hamming Net and MAXNET (25/30)

k = 0:
net^(0) = W_M Y^(0) = [ 1 -0.2 -0.2 ; -0.2 1 -0.2 ; -0.2 -0.2 1 ] [ 0.777 0.333 0.555 ]^t = [ 0.599 0.067 0.333 ]^t
Y^(1) = f( net^(0) ) = [ 0.599 0.067 0.333 ]^t

Page 34

Hamming Net and MAXNET (27/30)

k = 1:
net^(1) = [ 0.520 -0.120 0.200 ]^t
Y^(2) = f( net^(1) ) = [ 0.520 0 0.200 ]^t

Page 35

Hamming Net and MAXNET (28/30)

k = 2:
net^(2) = [ 0.480 -0.144 0.096 ]^t
Y^(3) = f( net^(2) ) = [ 0.480 0 0.096 ]^t

Page 36

Hamming Net and MAXNET (29/30)

k = 3:
net^(3) = [ 0.461 -0.115 0 ]^t
Y^(4) = f( net^(3) ) = [ 0.461 0 0 ]^t
so only the first neuron (class C) remains active.
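The four MAXNET iterations above can be reproduced by applying Y^(k+1) = f(W_M Y^(k)) until only one positive output remains. A sketch assuming NumPy (small differences in the third decimal appear because the slides round Y^(0) to three digits):

```python
import numpy as np

eps = 0.2                              # lateral interaction coefficient, 0 < eps < 1/p
W_M = np.array([[ 1.0, -eps, -eps],
                [-eps,  1.0, -eps],
                [-eps, -eps,  1.0]])

y = np.array([7/9, 3/9, 5/9])          # Hamming-net output Y^(0)
for k in range(4):
    y = np.maximum(W_M @ y, 0.0)       # Y^(k+1) = f(W_M Y^(k)), f = ramp
    print(k, np.round(y, 3))

# after k = 3 only the first neuron (class C) is still active
```

Each iteration subtracts ε times the competitors' activity, so the weakest responses are driven below zero and clipped, and only the dominant node survives.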

Page 37

Hamming Net and MAXNET (30/30)

Summary:
The Hamming network only tells which class the input most likely belongs to; it does not restore the distorted pattern.

Page 38

Clustering—Unsupervised Learning (1/20)

1. Introduction
a. to categorize or cluster data
b. grouping similar objects and separating dissimilar ones

2. Assume a pattern set { X_1, X_2, ..., X_N } is submitted to determine the decision function required to identify possible clusters => by similarity rules

Page 39

Clustering—Unsupervised Learning (2/20)

Euclidean distance:
|| X - X_i || = sqrt( ( X - X_i )^t ( X - X_i ) )

Angle:
cos ψ = X^t X_i / ( || X || || X_i || )
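Both similarity rules can be written directly. A minimal sketch assuming NumPy (the two sample vectors are illustrative):

```python
import numpy as np

def euclidean(x, xi):
    """||X - X_i|| = sqrt((X - X_i)^t (X - X_i))"""
    d = x - xi
    return float(np.sqrt(d @ d))

def cos_psi(x, xi):
    """cos of the angle between X and X_i."""
    return float(x @ xi / (np.linalg.norm(x) * np.linalg.norm(xi)))

x  = np.array([1.0, 0.0])
xi = np.array([0.0, 1.0])
print(euclidean(x, xi))   # sqrt(2) = 1.414...
print(cos_psi(x, xi))     # 0.0 (orthogonal)
```

Note that for normalized vectors the two measures agree in ranking: a smaller distance means a larger cosine.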

Page 40

Clustering—Unsupervised Learning (3/20)

Page 41

Clustering—Unsupervised Learning (4/20)

3. Winner-Take-All learning
Assume the input vectors are to be classified into one of a specified number p of categories, according to the clusters detected in the training set { X_1, X_2, ..., X_N }.

Page 42

Clustering—Unsupervised Learning (5/20)

Page 43

Clustering—Unsupervised Learning (6/20)

Kohonen Network
Y = f( W X )
for vector size n:
W = [ W_1^t
      W_2^t
      ...
      W_p^t ]
and W_i = [ w_i1, ..., w_in ]^t for i = 1, ..., p

Page 44

Clustering—Unsupervised Learning (7/20)

Prior to the learning, normalization of all weight vectors is required.
So
Ŵ_i = W_i / || W_i ||

Page 45

Clustering—Unsupervised Learning (8/20)

The meaning of training is to find Ŵ_m from all the Ŵ_i such that
|| X - Ŵ_m || = min_{i=1,...,p} || X - Ŵ_i ||
i.e., the m'th neuron, with weight vector Ŵ_m, is the closest approximation of the current X.

Page 46

Clustering—Unsupervised Learning (9/20)

From
|| X - Ŵ_m || = min_{i=1,...,p} || X - Ŵ_i ||
and
|| X - Ŵ_i ||² = X^t X - 2 Ŵ_i^t X + 1
we get
min_{i=1,...,p} || X - Ŵ_i ||  <=>  max_{i=1,...,p} ( Ŵ_i^t X )

Page 47

Clustering—Unsupervised Learning (10/20)

So
Ŵ_m^t X = max_{i=1,...,p} ( Ŵ_i^t X )
The m'th neuron is the winning neuron and has the largest value of net_i, i = 1, ..., p.

Page 48

Clustering—Unsupervised Learning (11/20)

Thus W_m should be adjusted such that || X - W_m || is reduced.
Then || X - W_m || can be reduced along the direction of the negative gradient:
∇_{W_m} || X - W_m ||² = -2 ( X - W_m )
Or: increase W_m in the ( X - W_m ) direction.

Page 49

Clustering—Unsupervised Learning (12/20)

Since any single X is just one step, we want only a fraction of ( X - W_m ); thus
for neuron m:
ΔW_m = α ( X - W_m ),  0.1 < α < 0.7
for the other neurons:
ΔW_i = 0

Page 50

Clustering—Unsupervised Learning (13/20)

In general:
W_m^(k+1) = W_m^(k) + α^(k) ( X - W_m^(k) ),  where m: winning neuron
W_i^(k+1) = W_i^(k) for i ≠ m
α: learning constant
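One step of this winner-take-all rule, with the weight normalization from the earlier slide, might look like the following sketch (assumes NumPy; α and the sample data are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(w):
    """Normalize each weight row to unit length."""
    return w / np.linalg.norm(w, axis=1, keepdims=True)

def wta_step(W, x, alpha=0.4):
    """One winner-take-all update: only the winning row moves toward x."""
    W = normalize(W)
    m = int(np.argmax(W @ x))          # winner = largest net_i = W_i^t x
    W[m] += alpha * (x - W[m])         # W_m^(k+1) = W_m^(k) + alpha (x - W_m^(k))
    return normalize(W), m             # renormalize for the next stage

W = rng.random((3, 2))                 # p = 3 neurons, n = 2 inputs, from U(0,1)
x = np.array([0.0, 1.0])
W2, m = wta_step(W, x)
# the winner's weight vector rotated toward x; the other rows are unchanged
```

As the geometric interpretation on the next slides says, the update is mostly a rotation of W_m toward X, which is why the renormalization at the end changes little.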

Page 51

Clustering—Unsupervised Learning (15/20)

4. Geometrical interpretation
The input vector X is normalized, and the weight vector Ŵ_m is the winner
=> Ŵ_m^t X = max_{i=1,...,p} ( Ŵ_i^t X )

Page 52

Clustering—Unsupervised Learning (16/20)

Page 53

Clustering—Unsupervised Learning (17/20)

So W_m' is created:
W_m' = W_m + ΔW_m
and
ΔW_m = α ( X - W_m )

Page 54

Clustering—Unsupervised Learning (18/20)

Thus the weight adjustment is mainly a rotation of the weight vector toward the input vector, without a significant length change.

W_m' is not normalized, so in the new training stage W_m' must be normalized again. Each Ŵ_i is a vector pointing to the center of gravity of its cluster on the unit sphere.

Page 55

Clustering—Unsupervised Learning (19/20)

5. In case some patterns are of known class, then
W_m' = W_m + α ( X - W_m )
with α > 0 for the correct node and α < 0 otherwise. This will accelerate the learning process significantly.

Page 56

Clustering—Unsupervised Learning (20/20)

6. Another modification is to adjust the weights of both the winner and the losers: leaky competitive learning.

7. Recall: Y = f( W X )

8. Initialization of weights:
randomly choose weights from U(0, 1), or use the convex combination
W_i = [ 1/√n  1/√n  ...  1/√n ]^t for i = 1, ..., p

Page 57

Feature Map (1/6)

1. Transform from a high-dimensional pattern space to a low-dimensional feature space.

2. Feature extraction:
two categories: natural structure / no natural structure
-- depends on whether the pattern is similar to human perception or not

Page 58

Feature Map (2/6)

Page 59

Feature Map (3/6)

3. Another important aspect is to represent the feature as naturally as possible.

4. A self-organizing neural array can map features from the pattern space.

Page 60

Feature Map (4/6)

5. Use a one-dimensional or a two-dimensional array.

6. Example:
X: input vector
i: neuron i
w_ij: weight from input x_j to neuron i
y_i: output of neuron i

Page 61

Feature Map (5/6)

Page 62

Feature Map (6/6)

7. So define
y_i = f( S( X, W_i ) )
where S( X, W_i ) means the similarity between X and W_i.

8. Thus, (b) is the result of (a).

Page 63

Self-organizing Feature Map (1/6)

1. One-dimensional mapping
Given a set of patterns X_i, i = 1, 2, ..., use a linear array of neurons i_1 > i_2 > i_3 > ...
such that, if X_1 > X_2 > ...,
y_{i1} = max y_i ( X_1 ),  y_{i2} = max y_i ( X_2 ),  i = 1, 2, ...
i.e., the ordering of the winning neurons follows the ordering of the inputs.

Page 64

Self-organizing Feature Map (2/6)

2. Thus, this learning is to find the best-matching neuron cells, which can activate their spatial neighbors to react to the same input.

3. Or, to find W_c such that
|| X - W_c || = min_i || X - W_i ||

Page 65

Self-organizing Feature Map (3/6)

4. In case of two winners, choose the lower i.

5. If c is the winning neuron, then define N_c as the neighborhood around c; N_c keeps changing as learning goes on.

6. Then
Δw_ij = α ( x_j - w_ij )  if neuron i is in the neighborhood N_c
Δw_ij = 0                 otherwise

Page 66

Self-organizing Feature Map (4/6)

7. α(t):
α(t) = α_0 ( 1 - t/T )
where t: current training iteration, T: total # of training steps to be done.

8. α starts from α_0 and is decreased until it reaches the value 0.

Page 67

Self-organizing Feature Map (5/6)

9. If a rectangular neighborhood is used, then
c - d < x < c + d
c - d < y < c + d
and as learning continues d is decreased from d_0 to 1:
d(t) = d_0 ( 1 - t/T )
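Putting items 5-9 together, a minimal one-dimensional SOFM with a shrinking rectangular neighborhood and the decaying gain α(t) = α_0(1 - t/T) could be sketched as follows (assumes NumPy; the array size p and the values α_0, d_0 are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

def train_sofm_1d(data, p=10, T=2000, alpha0=0.5, d0=3):
    """1-D SOFM: winner c, rectangular neighborhood |i - c| <= d."""
    W = rng.random((p, data.shape[1]))                   # random initial weights
    for t in range(T):
        x = data[rng.integers(len(data))]                # patterns one at a time
        c = int(np.argmin(np.linalg.norm(x - W, axis=1)))  # winning neuron
        alpha = alpha0 * (1 - t / T)                     # gain decays toward 0
        d = max(1, int(round(d0 * (1 - t / T))))         # neighborhood shrinks to 1
        for i in range(max(0, c - d), min(p, c + d + 1)):
            W[i] += alpha * (x - W[i])                   # update whole neighborhood N_c
    return W

data = rng.uniform(-1, 1, size=(500, 2))  # 500 points on the bipolar square
W = train_sofm_1d(data)
# after training, the weight rows trace an ordered curve through the square
```

Because neighbors of the winner are updated together, adjacent neurons end up with adjacent weight vectors, which is the topology-preserving effect shown in the example figures that follow.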

Page 68

Self-organizing Feature Map—Example

10. Example:
a. Input patterns are chosen randomly from a uniform distribution.
b. The data employed in the experiment comprised 500 points distributed uniformly over the bipolar square [-1, 1] × [-1, 1].
c. The points thus describe a geometrically square topology.
d. Thus, initial weights can be plotted as in the next figure.
e. A "-" connection line connects two adjacent neurons in the competitive layer.

Page 69

Self-organizing Feature Map—Example

Page 70

Self-organizing Feature Map—Example

Page 71

Self-organizing Feature Map—Example

Page 72

Self-organizing Feature Map—Example