
Self Organization Map

Learning without Examples

Learning Objectives

1. Introduction
2. Hamming Net and MAXNET
3. Clustering
4. Feature Map
5. Self-organizing Feature Map
6. Conclusion

Introduction

1. Learning without examples

2. Data are input to the system one by one

3. The data are mapped into a one- or higher-dimensional space

4. Competitive learning

Hamming Distance(1/5)

1. Define the Hamming distance HD between two binary codes A and B of the same length as the number of places in which a_i and b_i differ.

2. Ex:

   A: [ 1  -1   1   1   1   1 ]
   B: [ 1   1  -1  -1   1   1 ]

   => HD(A, B) = 3

Hamming Distance(2/5)

(figure)

Hamming Distance(3/5)

(figure)

Hamming Distance(4/5)

3. Alternatively, HD can be defined as the lowest number of edges that must be traversed between the two relevant codes.

4. If A and B have bipolar binary components, then the scalar product of A and B is:

Hamming Distance(5/5)

$$A^t B = \underbrace{(n - \mathrm{HD}(A,B))}_{\text{bits agreed}} - \underbrace{\mathrm{HD}(A,B)}_{\text{bits different}} = n - 2\,\mathrm{HD}(A,B)$$
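As a quick check of the definition and the scalar-product identity, a minimal sketch (using the A and B from the example above):

```python
import numpy as np

# Bipolar binary codes from the example above.
A = np.array([1, -1, 1, 1, 1, 1])
B = np.array([1, 1, -1, -1, 1, 1])

# Hamming distance: the number of places in which the components differ.
hd = int(np.sum(A != B))
print(hd)                  # 3

# Scalar-product identity: A^t B = (bits agreed) - (bits different) = n - 2 HD
n = len(A)
assert A @ B == n - 2 * hd
```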

Hamming Net and MAXNET(1/30)

1. Consider a two-layer classifier of binary bipolar vectors with p classes and p output neurons. The strongest response of a neuron indicates that it has the minimum HD value to the input, and identifies the category this neuron represents.

Hamming Net and MAXNET(2/30)

(figure)

Hamming Net and MAXNET(3/30)

2. The MAXNET operates on the output values of the Hamming network, suppressing all but the strongest response.

3. For the Hamming net in the next figure, we have input vector X;

   p classes => p neurons for output;

   output vector Y = [y_1, ..., y_p]

Hamming Net and MAXNET(4/30)

(figure)

Hamming Net and MAXNET(5/30)

4. For any output neuron m, m = 1, ..., p, we take

   $W_m = [w_{m1}, w_{m2}, \ldots, w_{mn}]^t$, m = 1, 2, ..., p

   to be the weights between input X and each output neuron.

5. Also, assume that each class m has a prototype vector $S^{(m)}$ as the standard to be matched.

Hamming Net and MAXNET(6/30)

6. For classifying p classes, one can say the m'th output is 1 if and only if $X = S^{(m)}$, which happens only when $W^{(m)} = S^{(m)}$.

   The outputs of the classifier are

   $$X^t S^{(1)},\ X^t S^{(2)},\ \ldots,\ X^t S^{(m)},\ \ldots,\ X^t S^{(p)}$$

   So when $X = S^{(m)}$, the m'th output is n and the other outputs are smaller than n.

Hamming Net and MAXNET(7/30)

7. $X^t S^{(m)} = (n - \mathrm{HD}(X, S^{(m)})) - \mathrm{HD}(X, S^{(m)})$

   $\therefore\ \tfrac{1}{2} X^t S^{(m)} = n/2 - \mathrm{HD}(X, S^{(m)})$

   So the weight matrix is

   $$W_H = \frac{1}{2}\begin{bmatrix} s_1^{(1)} & s_2^{(1)} & \cdots & s_n^{(1)} \\ s_1^{(2)} & s_2^{(2)} & \cdots & s_n^{(2)} \\ \vdots & & & \vdots \\ s_1^{(p)} & s_2^{(p)} & \cdots & s_n^{(p)} \end{bmatrix}$$

Hamming Net and MAXNET(8/30)

8. By giving a fixed bias n/2 to the input, we get

   $$net_m = \tfrac{1}{2} X^t S^{(m)} + n/2 \qquad \text{for } m = 1, 2, \ldots, p$$

   or

   $$net_m = n - \mathrm{HD}(X, S^{(m)})$$

Hamming Net and MAXNET(9/30)

9. To scale the activations down from the range 0~n to 0~1, one can apply the transfer function

   $$f(net_m) = \tfrac{1}{n}\, net_m \qquad \text{for } m = 1, 2, \ldots, p$$

Hamming Net and MAXNET(10/30)

(figure)

Hamming Net and MAXNET(11/30)

10. So the node with the highest output is the node with the smallest HD between the input and the prototype vectors $S^{(1)}, \ldots, S^{(p)}$; for an exact match

    $f(net_m) = 1$

    while for the other nodes

    $f(net_m) < 1$

Hamming Net and MAXNET(12/30)

11. MAXNET is employed as a second layer only for the cases where an enhancement of the initial dominant response of the m'th node is required.

    i.e., the purpose of MAXNET is to let max{y_1, ..., y_p} equal 1 and let the others equal 0.

Hamming Net and MAXNET(13/30)

(figure)

Hamming Net and MAXNET(14/30)

12. To achieve this, one can let

    $$y_i^{k+1} = f\Big(y_i^k - \varepsilon \sum_{j \neq i} y_j^k\Big)$$

    where i = 1, ..., p; j = 1, ..., p; j ≠ i.

    Since $y_i^0$ is the output of the Hamming net, $y_i$ is bounded by $0 \le y_i \le 1$ for i = 1, ..., p, and can only settle at 1 or 0.

Hamming Net and MAXNET(15/30)

13. So ε is bounded by 0 < ε < 1/p (ε: the lateral interaction coefficient), and

    $$W_M = \begin{bmatrix} 1 & -\varepsilon & \cdots & -\varepsilon \\ -\varepsilon & 1 & \cdots & -\varepsilon \\ \vdots & & \ddots & \vdots \\ -\varepsilon & -\varepsilon & \cdots & 1 \end{bmatrix}_{p \times p}$$

Hamming Net and MAXNET(16/30)

And

$$net^k = W_M\, y^k$$

Hamming Net and MAXNET(17/30)

14. So the transfer function is

$$f(net) = \begin{cases} net & net \ge 0 \\ 0 & net < 0 \end{cases}$$

Hamming Net and MAXNET(18/30)

(figure)

Hamming Net and MAXNET(19/30)

Ex: To build a Hamming net for classifying the characters C, I, T, take the prototypes

$S^{(1)} = [\ 1\ \ 1\ \ 1\ \ 1\ {-1}\ {-1}\ \ 1\ \ 1\ \ 1\ ]^t$ (C)

$S^{(2)} = [\ {-1}\ \ 1\ {-1}\ {-1}\ \ 1\ {-1}\ {-1}\ \ 1\ {-1}\ ]^t$ (I)

$S^{(3)} = [\ 1\ \ 1\ \ 1\ {-1}\ \ 1\ {-1}\ {-1}\ \ 1\ {-1}\ ]^t$ (T)

Hamming Net and MAXNET(20/30)

(figure)

Hamming Net and MAXNET(21/30)

So

$$W_H = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 & -1 & -1 & 1 & 1 & 1 \\ -1 & 1 & -1 & -1 & 1 & -1 & -1 & 1 & -1 \\ 1 & 1 & 1 & -1 & 1 & -1 & -1 & 1 & -1 \end{bmatrix}$$

and

$$net = W_H X + \frac{n}{2}\mathbf{1}, \qquad n = 9$$
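A minimal sketch of this construction in code (assuming the C, I, T prototypes above; the helper name is illustrative):

```python
import numpy as np

# Prototype vectors S^(1..3) for C, I, T as 9-component bipolar codes.
S = np.array([
    [ 1,  1,  1,  1, -1, -1,  1,  1,  1],   # C
    [-1,  1, -1, -1,  1, -1, -1,  1, -1],   # I
    [ 1,  1,  1, -1,  1, -1, -1,  1, -1],   # T
])
n = S.shape[1]        # n = 9
W_H = S / 2           # weight matrix W_H = (1/2) S
bias = n / 2          # fixed bias n/2

def hamming_net(X):
    """net_m = (1/2) X^t S^(m) + n/2 = n - HD(X, S^(m)), scaled to 0..1."""
    net = W_H @ X + bias
    return net / n    # transfer function f(net_m) = net_m / n

# Sanity check: a prototype as input drives its own node to 1.
print(hamming_net(S[0]))   # ~ [1.0, 0.333, 0.556] -> node 1 (class C) wins
```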

Hamming Net and MAXNET(22/30)

For a sample distorted input X,

$$net = W_H X + \tfrac{9}{2}\mathbf{1} = [\,7\ \ 3\ \ 5\,]^t$$

and

$$Y^0 = f(net) = \Big[\tfrac{7}{9}\ \ \tfrac{3}{9}\ \ \tfrac{5}{9}\Big]^t = [\,0.777\ \ 0.333\ \ 0.555\,]^t$$

Hamming Net and MAXNET(23/30)

Input $Y^0$ to the MAXNET and select ε = 0.2 < 1/3 (= 1/p).

So

$$W_M = \begin{bmatrix} 1 & -\tfrac{1}{5} & -\tfrac{1}{5} \\ -\tfrac{1}{5} & 1 & -\tfrac{1}{5} \\ -\tfrac{1}{5} & -\tfrac{1}{5} & 1 \end{bmatrix}$$

Hamming Net and MAXNET(24/30)

And

$$net^k = W_M\, Y^k, \qquad Y^{k+1} = f(net^k)$$

Hamming Net and MAXNET(25/30)

k = 0:

$$net^0 = W_M Y^0 = \begin{bmatrix} 1 & -0.2 & -0.2 \\ -0.2 & 1 & -0.2 \\ -0.2 & -0.2 & 1 \end{bmatrix}\begin{bmatrix} 0.777 \\ 0.333 \\ 0.555 \end{bmatrix} = \begin{bmatrix} 0.599 \\ 0.067 \\ 0.333 \end{bmatrix}$$

$$Y^1 = f(net^0) = [\,0.599\ \ 0.067\ \ 0.333\,]^t$$

Hamming Net and MAXNET(27/30)

k = 1:

$$net^1 = [\,0.520\ \ {-0.120}\ \ 0.200\,]^t$$

$$Y^2 = f(net^1) = [\,0.520\ \ 0\ \ 0.200\,]^t$$

Hamming Net and MAXNET(28/30)

k = 2:

$$net^2 = [\,0.480\ \ {-0.144}\ \ 0.096\,]^t$$

$$Y^3 = f(net^2) = [\,0.480\ \ 0\ \ 0.096\,]^t$$

Hamming Net and MAXNET(29/30)

k = 3:

$$net^3 = [\,0.461\ \ {-0.115}\ \ 0.000\,]^t$$

$$Y^4 = f(net^3) = [\,0.461\ \ 0\ \ 0\,]^t$$

The dominant response of node 1 has been enhanced, and all other outputs are suppressed to 0.
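The whole relaxation takes only a few lines (a sketch assuming the $Y^0$ and ε above; starting from the exact fractions 7/9, 3/9, 5/9, the printed values differ from the slide numbers only by rounding):

```python
import numpy as np

p, eps = 3, 0.2                     # must satisfy 0 < eps < 1/p
W_M = (1 + eps) * np.eye(p) - eps * np.ones((p, p))  # 1 on diagonal, -eps off
Y = np.array([7/9, 3/9, 5/9])       # Y^0: output of the Hamming net

for k in range(4):
    net = W_M @ Y                   # net^k = W_M Y^k
    Y = np.maximum(net, 0.0)        # f(net) = net for net >= 0, else 0
    print(k, np.round(Y, 3))
# After k = 3 only y_1 remains nonzero: the dominant response is isolated.
```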

Hamming Net and MAXNET(30/30)

Summary: the Hamming network only tells which class the input most likely belongs to; it does not restore the distorted pattern.

Clustering—Unsupervised Learning(1/20)

1. Introduction
   a. to categorize or cluster data
   b. grouping similar objects and separating dissimilar ones

2. Assume a pattern set {X_1, X_2, ..., X_N} is submitted, and the decision function required to identify possible clusters is determined by similarity rules.

Clustering—Unsupervised Learning(2/20)

Similarity measures:

Euclidean distance: $\|X - X_i\| = \sqrt{(X - X_i)^t (X - X_i)}$

Angle between vectors: $\cos\psi = \dfrac{X^t X_i}{\|X\|\,\|X_i\|}$
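Both measures in code (a small sketch; the two vectors are made up for illustration):

```python
import numpy as np

X  = np.array([1.0, 2.0, 2.0])
Xi = np.array([2.0, 0.0, 1.0])

# Euclidean distance: ||X - Xi|| = sqrt((X - Xi)^t (X - Xi))
dist = np.sqrt((X - Xi) @ (X - Xi))

# Angle between the vectors: cos(psi) = X^t Xi / (||X|| ||Xi||)
cos_psi = (X @ Xi) / (np.linalg.norm(X) * np.linalg.norm(Xi))

print(dist, cos_psi)
```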

Clustering—Unsupervised Learning(3/20)

(figure)

Clustering—Unsupervised Learning(4/20)

3. Winner-Take-All learning

   Assume the input vectors are to be classified into one of a specified number p of categories, according to the clusters detected in the training set {X_1, X_2, ..., X_N}.

Clustering—Unsupervised Learning(5/20)

(figure)

Clustering—Unsupervised Learning(6/20)

Kohonen Network:

$$Y = f(WX)$$

for vector size n, with

$$W = \begin{bmatrix} W_1^t \\ W_2^t \\ \vdots \\ W_p^t \end{bmatrix} \quad\text{and}\quad W_i = [w_{i1}, \ldots, w_{in}]^t \ \ \text{for } i = 1, \ldots, p$$

Clustering—Unsupervised Learning(7/20)

Prior to the learning, normalization of all weight vectors is required:

$$\hat{W}_i = \frac{W_i}{\|W_i\|}$$

Clustering—Unsupervised Learning(8/20)

The meaning of training is to find, from all $\hat{W}_i$, the $\hat{W}_m$ such that

$$\|X - \hat{W}_m\| = \min_{i=1,\ldots,p} \|X - \hat{W}_i\|$$

i.e., the m'th neuron with weight vector $\hat{W}_m$ is the closest approximation of the current X.

Clustering—Unsupervised Learning(9/20)

From

$$\|X - \hat{W}_m\| = \min_{i=1,\ldots,p} \|X - \hat{W}_i\|$$

and

$$\|X - \hat{W}_i\|^2 = \|X\|^2 - 2\,\hat{W}_i^t X + 1$$

it follows that

$$\min_{i=1,\ldots,p} \|X - \hat{W}_i\| \iff \max_{i=1,\ldots,p} \hat{W}_i^t X$$

Clustering—Unsupervised Learning(10/20)

So

$$\hat{W}_m^t X = \max_{i=1,\ldots,p} \big(\hat{W}_i^t X\big)$$

The m'th neuron is the winning neuron and has the largest value of $net_i$, i = 1, ..., p.

Clustering—Unsupervised Learning(11/20)

Thus $W_m$ should be adjusted such that $\|X - W_m\|$ is reduced. $\|X - W_m\|$ can be reduced by moving against the gradient:

$$\nabla_{W_m} \|X - W_m\|^2 = -2(X - W_m)$$

i.e., increase $W_m$ in the $(X - W_m)$ direction.

Clustering—Unsupervised Learning(12/20)

Since any single X contributes just a single step, we want only a fraction of $(X - W_m)$; thus

for neuron m: $\Delta W_m = \alpha(X - W_m)$, with $0.1 < \alpha < 0.7$

for the other neurons: $\Delta W_i = 0$

Clustering—Unsupervised Learning(13/20)

In general:

$$W_m^{k+1} = W_m^k + \alpha\,(X - W_m^k), \qquad W_i^{k+1} = W_i^k \ \ \text{for } i \neq m$$

where m is the winning neuron and α is the learning constant.
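One winner-take-all step under these rules might look as follows (a sketch, not the lecture's code; the data and shapes are illustrative):

```python
import numpy as np

def wta_step(W, x, alpha=0.4):
    """One winner-take-all update. W: (p, n) weight rows, x: (n,) input."""
    W = W / np.linalg.norm(W, axis=1, keepdims=True)  # normalize weights first
    m = int(np.argmax(W @ x))     # winner: largest W_i^t x (closest to x)
    W[m] += alpha * (x - W[m])    # rotate W_m toward x by a fraction alpha
    return W, m                   # W is re-normalized on the next call

rng = np.random.default_rng(0)
W = rng.uniform(0, 1, size=(3, 4))   # p = 3 neurons, n = 4 inputs
x = rng.uniform(0, 1, size=4)
x /= np.linalg.norm(x)               # inputs are assumed normalized
W, winner = wta_step(W, x)
print(winner)
```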

Clustering—Unsupervised Learning(15/20)

4. Geometrical interpretation

   With a normalized input vector and a winning weight vector $\hat{W}_m$:

   $$\hat{W}_m^t X = \max_{i=1,\ldots,p} \hat{W}_i^t X$$

(figure)

Clustering—Unsupervised Learning(16/20)

(figure)

Clustering—Unsupervised Learning(17/20)

So $W_m'$ is created:

$$W_m' = W_m + \Delta W_m, \qquad \Delta W_m = \alpha(X - W_m)$$

Clustering—Unsupervised Learning(18/20)

Thus the weight adjustment is mainly a rotation of the weight vector toward the input vector, without a significant length change.

$W_m'$ is not normalized, so at the next training stage $W_m'$ must be normalized again. Each $\hat{W}_i$ is a vector pointing to the center of gravity of its cluster on the unit sphere.

Clustering—Unsupervised Learning(19/20)

5. In the case where some patterns are of known class, the update

   $$\Delta W_m = \alpha(X - W_m)$$

   is applied with α > 0 for the correct node and α < 0 otherwise. This will accelerate the learning process significantly.

Clustering—Unsupervised Learning(20/20)

6. Another modification is to adjust the weights of both the winner and the losers: leaky competitive learning.

7. Recall: Y = f(WX)

8. Initialization of weights: randomly choose the weights from U(0,1), or use the convex combination

   $$\hat{W}_i = \Big[\tfrac{1}{\sqrt{n}}, \ldots, \tfrac{1}{\sqrt{n}}\Big]^t \quad \text{for } i = 1, \ldots, p$$

Feature Map(1/6)

1. Transform from a high-dimensional pattern space to a low-dimensional feature space.

2. Feature extraction falls into two categories, natural structure and no natural structure, depending on whether pattern similarity matches human perception or not.

Feature Map(2/6)

(figure)

Feature Map(3/6)

3. Another important aspect is to represent the features as naturally as possible.

4. A self-organizing neural array can map features from the pattern space.

Feature Map(4/6)

5. Use a one-dimensional array or a two-dimensional array.

6. Example notation:

   X: input vector
   i: neuron i
   w_ij: weight from input x_j to neuron i
   y_i: output of neuron i

Feature Map(5/6)

(figure)

Feature Map(6/6)

7. So define

   $$y_i = f\big(S(X, W_i)\big)$$

   where $S(X, W_i)$ means the similarity between X and $W_i$.

8. Thus, (b) is the result of (a).

Self-organizing Feature Map(1/6)

1. One-dimensional mapping: given a set of patterns $X^1, X^2, \ldots$, use a linear array of neurons $i_1 > i_2 > i_3 > \ldots$ such that

   $$y_{i_1} = \max_i y_i(X^1), \quad y_{i_2} = \max_i y_i(X^2), \quad \ldots \qquad i = 1, 2, \ldots$$

Self-organizing Feature Map(2/6)

2. Thus, this learning is to find the best-matching neuron cells, which can activate their spatial neighbors to react to the same input.

3. Or: find $W_c$ such that

   $$\|X - W_c\| = \min_i \|X - W_i\|$$

Self-organizing Feature Map(3/6)

4. In case of two winners, choose the lower i.

5. If c is the winning neuron, then define $N_c$ as the neighborhood around c; $N_c$ keeps changing as learning goes on.

6. Then (a code sketch combining points 3 through 9 follows point 9 below):

   $$\Delta w_{ij} = \begin{cases} \alpha\,(x_j - w_{ij}) & \text{if } i \text{ is in the neighborhood } N_c \\ 0 & \text{otherwise} \end{cases}$$

Self-organizing Feature Map(4/6)

7. The learning rate is scheduled as

   $$\alpha(t) = \alpha_0\Big(1 - \frac{t}{T}\Big)$$

   where t is the current training iteration and T is the total number of training steps to be done.

8. α starts from $\alpha_0$ and is decreased until it reaches the value 0.

Self-organizing Feature Map(5/6)

9. If a rectangular neighborhood is used, then it covers

   c - d < x < c + d,  c - d < y < c + d

   and as learning continues d is decreased from $d_0$ to 1:

   $$d(t) = d_0\Big(1 - \frac{t}{T}\Big)$$
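Putting points 5 through 9 together, a minimal 1-D SOM trainer (a sketch under the stated schedules; the winner is chosen by minimum Euclidean distance as in point 3, and all names are illustrative):

```python
import numpy as np

def train_som(X, p=10, alpha0=0.5, d0=5, T=1000, seed=0):
    """Train a 1-D SOM. X: (N, n) patterns; returns the (p, n) weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0, 1, size=(p, X.shape[1]))  # random U(0,1) initialization
    for t in range(T):
        x = X[rng.integers(len(X))]              # present one pattern
        c = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # winner: min ||x - W_i||
        alpha = alpha0 * (1 - t / T)             # alpha(t) = alpha0 (1 - t/T)
        d = max(1, round(d0 * (1 - t / T)))      # d(t) shrinks from d0 toward 1
        lo, hi = max(0, c - d), min(p, c + d + 1)  # neighborhood N_c around c
        W[lo:hi] += alpha * (x - W[lo:hi])       # update only neurons inside N_c
    return W
```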

Self-organizing Feature Map-Example

10. Example:
    a. Initial weights are chosen randomly from U[0,1].
    b. The data employed in the experiment comprised 500 points distributed uniformly over the bipolar square [-1,1] × [-1,1].
    c. The points thus describe a geometrically square topology.
    d. Thus, the initial weights can be plotted as in the next figure.
    e. A "-" connection line connects two adjacent neurons in the competitive layer.
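The experiment itself is then a short usage of the trainer sketched above (assuming train_som from that sketch is in scope):

```python
import numpy as np
# train_som is the 1-D SOM trainer from the sketch above.

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 2))  # 500 points on the bipolar square
W = train_som(X, p=50, alpha0=0.5, d0=10, T=5000)

# Plotting W[:, 0] against W[:, 1], with "-" lines between adjacent neurons,
# shows the initially tangled chain unfolding to cover the square.
```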

Self-organizing Feature Map-Example

(figures for the example above)
