
Self Organization Map

Learning without Examples

Learning Objectives

1. Introduction
2. Hamming Net and MAXNET
3. Clustering
4. Feature Map
5. Self-organizing Feature Map
6. Conclusion

Introduction

1. Learning without examples

2. Data are input to the system one by one

3. The data are mapped into a one- or higher-dimensional space

4. Competitive learning

Hamming Distance(1/5)

1. Define the Hamming distance HD between two binary codes A and B of the same length as the number of places in which a_i and b_i differ.

2. Ex:

   A: [ 1  -1   1   1   1   1 ]
   B: [ 1   1  -1  -1   1   1 ]

   => HD(A, B) = 3

Hamming Distance(2/5)

(figure)

Hamming Distance(3/5)

(figure)

Hamming Distance(4/5)

3. Alternatively, HD can be defined as the lowest number of edges that must be traversed between the two relevant codes.

4. If A and B have bipolar binary components, then the scalar product of A and B is:

Hamming Distance(5/5)

$$A^t B = \underbrace{(n - \mathrm{HD}(A,B))}_{\text{bits agreed}} - \underbrace{\mathrm{HD}(A,B)}_{\text{bits different}} = n - 2\,\mathrm{HD}(A,B)$$
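As a quick check of the definition and the scalar-product identity, a minimal sketch (using the A and B from the example above):

```python
import numpy as np

# Bipolar binary codes from the example above.
A = np.array([1, -1, 1, 1, 1, 1])
B = np.array([1, 1, -1, -1, 1, 1])

# Hamming distance: the number of places in which the components differ.
hd = int(np.sum(A != B))
print(hd)                  # 3

# Scalar-product identity: A^t B = (bits agreed) - (bits different) = n - 2 HD
n = len(A)
assert A @ B == n - 2 * hd
```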

Hamming Net and MAXNET(1/30)

1. Consider a two-layer classifier of binary bipolar vectors with p classes and p output neurons. The strongest response of a neuron indicates that it has the minimum HD value to the input, and identifies the category this neuron represents.

Hamming Net and MAXNET(2/30)

(figure)

Hamming Net and MAXNET(3/30)

2. The MAXNET operates on the output values of the Hamming network, suppressing all but the strongest response.

3. For the Hamming net in the next figure, we have input vector X;

   p classes => p neurons for output;

   output vector Y = [y_1, ..., y_p]

Hamming Net and MAXNET(4/30)

(figure)

Hamming Net and MAXNET(5/30)

4. For any output neuron m, m = 1, ..., p, we take

   $W_m = [w_{m1}, w_{m2}, \ldots, w_{mn}]^t$, m = 1, 2, ..., p

   to be the weights between input X and each output neuron.

5. Also, assume that each class m has a prototype vector $S^{(m)}$ as the standard to be matched.

Hamming Net and MAXNET(6/30)

6. For classifying p classes, one can say the m'th output is 1 if and only if $X = S^{(m)}$, which happens only when $W^{(m)} = S^{(m)}$.

   The outputs of the classifier are

   $$X^t S^{(1)},\ X^t S^{(2)},\ \ldots,\ X^t S^{(m)},\ \ldots,\ X^t S^{(p)}$$

   So when $X = S^{(m)}$, the m'th output is n and the other outputs are smaller than n.

Hamming Net and MAXNET(7/30)

7. $X^t S^{(m)} = (n - \mathrm{HD}(X, S^{(m)})) - \mathrm{HD}(X, S^{(m)})$

   $\therefore\ \tfrac{1}{2} X^t S^{(m)} = n/2 - \mathrm{HD}(X, S^{(m)})$

   So the weight matrix is

   $$W_H = \frac{1}{2}\begin{bmatrix} s_1^{(1)} & s_2^{(1)} & \cdots & s_n^{(1)} \\ s_1^{(2)} & s_2^{(2)} & \cdots & s_n^{(2)} \\ \vdots & & & \vdots \\ s_1^{(p)} & s_2^{(p)} & \cdots & s_n^{(p)} \end{bmatrix}$$

Hamming Net and MAXNET(8/30)

8. By giving a fixed bias n/2 to the input, we get

   $$net_m = \tfrac{1}{2} X^t S^{(m)} + n/2 \qquad \text{for } m = 1, 2, \ldots, p$$

   or

   $$net_m = n - \mathrm{HD}(X, S^{(m)})$$

Hamming Net and MAXNET(9/30)

9. To scale the activations down from the range 0~n to 0~1, one can apply the transfer function

   $$f(net_m) = \tfrac{1}{n}\, net_m \qquad \text{for } m = 1, 2, \ldots, p$$

Hamming Net and MAXNET(10/30)

(figure)

Hamming Net and MAXNET(11/30)

10. So the node with the highest output is the node with the smallest HD between the input and the prototype vectors $S^{(1)}, \ldots, S^{(p)}$; for an exact match

    $f(net_m) = 1$

    while for the other nodes

    $f(net_m) < 1$

Hamming Net and MAXNET(12/30)

11. MAXNET is employed as a second layer only for the cases where an enhancement of the initial dominant response of the m'th node is required.

    i.e., the purpose of MAXNET is to let max{y_1, ..., y_p} equal 1 and let the others equal 0.

Hamming Net and MAXNET(13/30)

(figure)

Hamming Net and MAXNET(14/30)

12. To achieve this, one can let

    $$y_i^{k+1} = f\Big(y_i^k - \varepsilon \sum_{j \neq i} y_j^k\Big)$$

    where i = 1, ..., p; j = 1, ..., p; j ≠ i.

    Since $y_i^0$ is the output of the Hamming net, $y_i$ is bounded by $0 \le y_i \le 1$ for i = 1, ..., p, and can only settle at 1 or 0.

Hamming Net and MAXNET(15/30)

13. So ε is bounded by 0 < ε < 1/p (ε: the lateral interaction coefficient), and

    $$W_M = \begin{bmatrix} 1 & -\varepsilon & \cdots & -\varepsilon \\ -\varepsilon & 1 & \cdots & -\varepsilon \\ \vdots & & \ddots & \vdots \\ -\varepsilon & -\varepsilon & \cdots & 1 \end{bmatrix}_{p \times p}$$

Hamming Net and MAXNET(16/30)

And

$$net^k = W_M\, y^k$$

Hamming Net and MAXNET(17/30)

14. So the transfer function is

$$f(net) = \begin{cases} net & net \ge 0 \\ 0 & net < 0 \end{cases}$$

Hamming Net and MAXNET(18/30)

(figure)

Hamming Net and MAXNET(19/30)

Ex: To build a Hamming net for classifying the characters C, I, T, take the prototypes

$S^{(1)} = [\ 1\ \ 1\ \ 1\ \ 1\ {-1}\ {-1}\ \ 1\ \ 1\ \ 1\ ]^t$ (C)

$S^{(2)} = [\ {-1}\ \ 1\ {-1}\ {-1}\ \ 1\ {-1}\ {-1}\ \ 1\ {-1}\ ]^t$ (I)

$S^{(3)} = [\ 1\ \ 1\ \ 1\ {-1}\ \ 1\ {-1}\ {-1}\ \ 1\ {-1}\ ]^t$ (T)

Hamming Net and MAXNET(20/30)

(figure)

Hamming Net and MAXNET(21/30)

So

$$W_H = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 & -1 & -1 & 1 & 1 & 1 \\ -1 & 1 & -1 & -1 & 1 & -1 & -1 & 1 & -1 \\ 1 & 1 & 1 & -1 & 1 & -1 & -1 & 1 & -1 \end{bmatrix}$$

and

$$net = W_H X + \frac{n}{2}\mathbf{1}, \qquad n = 9$$
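A minimal sketch of this construction in code (assuming the C, I, T prototypes above; the helper name is illustrative):

```python
import numpy as np

# Prototype vectors S^(1..3) for C, I, T as 9-component bipolar codes.
S = np.array([
    [ 1,  1,  1,  1, -1, -1,  1,  1,  1],   # C
    [-1,  1, -1, -1,  1, -1, -1,  1, -1],   # I
    [ 1,  1,  1, -1,  1, -1, -1,  1, -1],   # T
])
n = S.shape[1]        # n = 9
W_H = S / 2           # weight matrix W_H = (1/2) S
bias = n / 2          # fixed bias n/2

def hamming_net(X):
    """net_m = (1/2) X^t S^(m) + n/2 = n - HD(X, S^(m)), scaled to 0..1."""
    net = W_H @ X + bias
    return net / n    # transfer function f(net_m) = net_m / n

# Sanity check: a prototype as input drives its own node to 1.
print(hamming_net(S[0]))   # ~ [1.0, 0.333, 0.556] -> node 1 (class C) wins
```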

Hamming Net and MAXNET(22/30)

For a sample distorted input X,

$$net = W_H X + \tfrac{9}{2}\mathbf{1} = [\,7\ \ 3\ \ 5\,]^t$$

and

$$Y^0 = f(net) = \Big[\tfrac{7}{9}\ \ \tfrac{3}{9}\ \ \tfrac{5}{9}\Big]^t = [\,0.777\ \ 0.333\ \ 0.555\,]^t$$

Hamming Net and MAXNET(23/30)

Input $Y^0$ to the MAXNET and select ε = 0.2 < 1/3 (= 1/p).

So

$$W_M = \begin{bmatrix} 1 & -\tfrac{1}{5} & -\tfrac{1}{5} \\ -\tfrac{1}{5} & 1 & -\tfrac{1}{5} \\ -\tfrac{1}{5} & -\tfrac{1}{5} & 1 \end{bmatrix}$$

Hamming Net and MAXNET(24/30)

And

$$net^k = W_M\, Y^k, \qquad Y^{k+1} = f(net^k)$$

Hamming Net and MAXNET(25/30)

k = 0:

$$net^0 = W_M Y^0 = \begin{bmatrix} 1 & -0.2 & -0.2 \\ -0.2 & 1 & -0.2 \\ -0.2 & -0.2 & 1 \end{bmatrix}\begin{bmatrix} 0.777 \\ 0.333 \\ 0.555 \end{bmatrix} = \begin{bmatrix} 0.599 \\ 0.067 \\ 0.333 \end{bmatrix}$$

$$Y^1 = f(net^0) = [\,0.599\ \ 0.067\ \ 0.333\,]^t$$

Hamming Net and MAXNET(27/30)

k = 1:

$$net^1 = [\,0.520\ \ {-0.120}\ \ 0.200\,]^t$$

$$Y^2 = f(net^1) = [\,0.520\ \ 0\ \ 0.200\,]^t$$

Hamming Net and MAXNET(28/30)

k = 2:

$$net^2 = [\,0.480\ \ {-0.144}\ \ 0.096\,]^t$$

$$Y^3 = f(net^2) = [\,0.480\ \ 0\ \ 0.096\,]^t$$

Hamming Net and MAXNET(29/30)

k = 3:

$$net^3 = [\,0.461\ \ {-0.115}\ \ 0.000\,]^t$$

$$Y^4 = f(net^3) = [\,0.461\ \ 0\ \ 0\,]^t$$

The dominant response of node 1 has been enhanced, and all other outputs are suppressed to 0.
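The whole relaxation takes only a few lines (a sketch assuming the $Y^0$ and ε above; starting from the exact fractions 7/9, 3/9, 5/9, the printed values differ from the slide numbers only by rounding):

```python
import numpy as np

p, eps = 3, 0.2                     # must satisfy 0 < eps < 1/p
W_M = (1 + eps) * np.eye(p) - eps * np.ones((p, p))  # 1 on diagonal, -eps off
Y = np.array([7/9, 3/9, 5/9])       # Y^0: output of the Hamming net

for k in range(4):
    net = W_M @ Y                   # net^k = W_M Y^k
    Y = np.maximum(net, 0.0)        # f(net) = net for net >= 0, else 0
    print(k, np.round(Y, 3))
# After k = 3 only y_1 remains nonzero: the dominant response is isolated.
```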

Hamming Net and MAXNET(30/30)

Summary: the Hamming network only tells which class the input most likely belongs to; it does not restore the distorted pattern.

Clustering—Unsupervised Learning(1/20)

1. Introduction
   a. to categorize or cluster data
   b. grouping similar objects and separating dissimilar ones

2. Assume a pattern set {X_1, X_2, ..., X_N} is submitted, and the decision function required to identify possible clusters is determined by similarity rules.

Clustering—Unsupervised Learning(2/20)

Similarity measures:

Euclidean distance: $\|X - X_i\| = \sqrt{(X - X_i)^t (X - X_i)}$

Angle between vectors: $\cos\psi = \dfrac{X^t X_i}{\|X\|\,\|X_i\|}$
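Both measures in code (a small sketch; the two vectors are made up for illustration):

```python
import numpy as np

X  = np.array([1.0, 2.0, 2.0])
Xi = np.array([2.0, 0.0, 1.0])

# Euclidean distance: ||X - Xi|| = sqrt((X - Xi)^t (X - Xi))
dist = np.sqrt((X - Xi) @ (X - Xi))

# Angle between the vectors: cos(psi) = X^t Xi / (||X|| ||Xi||)
cos_psi = (X @ Xi) / (np.linalg.norm(X) * np.linalg.norm(Xi))

print(dist, cos_psi)
```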

Clustering—Unsupervised Learning(3/20)

(figure)

Clustering—Unsupervised Learning(4/20)

3. Winner-Take-All learning

   Assume the input vectors are to be classified into one of a specified number p of categories, according to the clusters detected in the training set {X_1, X_2, ..., X_N}.

Clustering—Unsupervised Learning(5/20)

(figure)

Clustering—Unsupervised Learning(6/20)

Kohonen Network:

$$Y = f(WX)$$

for vector size n, with

$$W = \begin{bmatrix} W_1^t \\ W_2^t \\ \vdots \\ W_p^t \end{bmatrix} \quad\text{and}\quad W_i = [w_{i1}, \ldots, w_{in}]^t \ \ \text{for } i = 1, \ldots, p$$

Clustering—Unsupervised Learning(7/20)

Prior to the learning, normalization of all weight vectors is required:

$$\hat{W}_i = \frac{W_i}{\|W_i\|}$$

Clustering—Unsupervised Learning(8/20)

The meaning of training is to find, from all $\hat{W}_i$, the $\hat{W}_m$ such that

$$\|X - \hat{W}_m\| = \min_{i=1,\ldots,p} \|X - \hat{W}_i\|$$

i.e., the m'th neuron with weight vector $\hat{W}_m$ is the closest approximation of the current X.

Clustering—Unsupervised Learning(9/20)

From

$$\|X - \hat{W}_m\| = \min_{i=1,\ldots,p} \|X - \hat{W}_i\|$$

and

$$\|X - \hat{W}_i\|^2 = \|X\|^2 - 2\,\hat{W}_i^t X + 1$$

it follows that

$$\min_{i=1,\ldots,p} \|X - \hat{W}_i\| \iff \max_{i=1,\ldots,p} \hat{W}_i^t X$$

Clustering—Unsupervised Learning(10/20)

So

$$\hat{W}_m^t X = \max_{i=1,\ldots,p} \big(\hat{W}_i^t X\big)$$

The m'th neuron is the winning neuron and has the largest value of $net_i$, i = 1, ..., p.

Clustering—Unsupervised Learning(11/20)

Thus $W_m$ should be adjusted such that $\|X - W_m\|$ is reduced. $\|X - W_m\|$ can be reduced by moving against the gradient:

$$\nabla_{W_m} \|X - W_m\|^2 = -2(X - W_m)$$

i.e., increase $W_m$ in the $(X - W_m)$ direction.

Clustering—Unsupervised Learning(12/20)

Since any single X contributes just a single step, we want only a fraction of $(X - W_m)$; thus

for neuron m: $\Delta W_m = \alpha(X - W_m)$, with $0.1 < \alpha < 0.7$

for the other neurons: $\Delta W_i = 0$

Clustering—Unsupervised Learning(13/20)

In general:

$$W_m^{k+1} = W_m^k + \alpha\,(X - W_m^k), \qquad W_i^{k+1} = W_i^k \ \ \text{for } i \neq m$$

where m is the winning neuron and α is the learning constant.
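One winner-take-all step under these rules might look as follows (a sketch, not the lecture's code; the data and shapes are illustrative):

```python
import numpy as np

def wta_step(W, x, alpha=0.4):
    """One winner-take-all update. W: (p, n) weight rows, x: (n,) input."""
    W = W / np.linalg.norm(W, axis=1, keepdims=True)  # normalize weights first
    m = int(np.argmax(W @ x))     # winner: largest W_i^t x (closest to x)
    W[m] += alpha * (x - W[m])    # rotate W_m toward x by a fraction alpha
    return W, m                   # W is re-normalized on the next call

rng = np.random.default_rng(0)
W = rng.uniform(0, 1, size=(3, 4))   # p = 3 neurons, n = 4 inputs
x = rng.uniform(0, 1, size=4)
x /= np.linalg.norm(x)               # inputs are assumed normalized
W, winner = wta_step(W, x)
print(winner)
```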

Clustering—Unsupervised Learning(15/20)

4. Geometrical interpretation

   With a normalized input vector and a winning weight vector $\hat{W}_m$:

   $$\hat{W}_m^t X = \max_{i=1,\ldots,p} \hat{W}_i^t X$$

(figure)

Clustering—Unsupervised Learning(16/20)

(figure)

Clustering—Unsupervised Learning(17/20)

So $W_m'$ is created:

$$W_m' = W_m + \Delta W_m, \qquad \Delta W_m = \alpha(X - W_m)$$

Clustering—Unsupervised Learning(18/20)

Thus the weight adjustment is mainly a rotation of the weight vector toward the input vector, without a significant length change.

$W_m'$ is not normalized, so at the next training stage $W_m'$ must be normalized again. Each $\hat{W}_i$ is a vector pointing to the center of gravity of its cluster on the unit sphere.

Clustering—Unsupervised Learning(19/20)

5. In the case where some patterns are of known class, the update

   $$\Delta W_m = \alpha(X - W_m)$$

   is applied with α > 0 for the correct node and α < 0 otherwise. This will accelerate the learning process significantly.

Clustering—Unsupervised Learning(20/20)

6. Another modification is to adjust the weights of both the winner and the losers: leaky competitive learning.

7. Recall: Y = f(WX)

8. Initialization of weights: randomly choose the weights from U(0,1), or use the convex combination

   $$\hat{W}_i = \Big[\tfrac{1}{\sqrt{n}}, \ldots, \tfrac{1}{\sqrt{n}}\Big]^t \quad \text{for } i = 1, \ldots, p$$

Feature Map(1/6)

1. Transform from a high-dimensional pattern space to a low-dimensional feature space.

2. Feature extraction falls into two categories, natural structure and no natural structure, depending on whether pattern similarity matches human perception or not.

Feature Map(2/6)

(figure)

Feature Map(3/6)

3. Another important aspect is to represent the features as naturally as possible.

4. A self-organizing neural array can map features from the pattern space.

Feature Map(4/6)

5. Use a one-dimensional array or a two-dimensional array.

6. Example notation:

   X: input vector
   i: neuron i
   w_ij: weight from input x_j to neuron i
   y_i: output of neuron i

Feature Map(5/6)

(figure)

Feature Map(6/6)

7. So define

   $$y_i = f\big(S(X, W_i)\big)$$

   where $S(X, W_i)$ means the similarity between X and $W_i$.

8. Thus, (b) is the result of (a).

Self-organizing Feature Map(1/6)

1. One-dimensional mapping: given a set of patterns $X^1, X^2, \ldots$, use a linear array of neurons $i_1 > i_2 > i_3 > \ldots$ such that

   $$y_{i_1} = \max_i y_i(X^1), \quad y_{i_2} = \max_i y_i(X^2), \quad \ldots \qquad i = 1, 2, \ldots$$

Self-organizing Feature Map(2/6)

2. Thus, this learning is to find the best-matching neuron cells, which can activate their spatial neighbors to react to the same input.

3. Or: find $W_c$ such that

   $$\|X - W_c\| = \min_i \|X - W_i\|$$

Self-organizing Feature Map(3/6)

4. In case of two winners, choose the lower i.

5. If c is the winning neuron, then define $N_c$ as the neighborhood around c; $N_c$ keeps changing as learning goes on.

6. Then (a code sketch combining points 3 through 9 follows point 9 below):

   $$\Delta w_{ij} = \begin{cases} \alpha\,(x_j - w_{ij}) & \text{if } i \text{ is in the neighborhood } N_c \\ 0 & \text{otherwise} \end{cases}$$

Self-organizing Feature Map(4/6)

7. The learning rate is scheduled as

   $$\alpha(t) = \alpha_0\Big(1 - \frac{t}{T}\Big)$$

   where t is the current training iteration and T is the total number of training steps to be done.

8. α starts from $\alpha_0$ and is decreased until it reaches the value 0.

Self-organizing Feature Map(5/6)

9. If a rectangular neighborhood is used, then it covers

   c - d < x < c + d,  c - d < y < c + d

   and as learning continues d is decreased from $d_0$ to 1:

   $$d(t) = d_0\Big(1 - \frac{t}{T}\Big)$$
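Putting points 5 through 9 together, a minimal 1-D SOM trainer (a sketch under the stated schedules; the winner is chosen by minimum Euclidean distance as in point 3, and all names are illustrative):

```python
import numpy as np

def train_som(X, p=10, alpha0=0.5, d0=5, T=1000, seed=0):
    """Train a 1-D SOM. X: (N, n) patterns; returns the (p, n) weights."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(0, 1, size=(p, X.shape[1]))  # random U(0,1) initialization
    for t in range(T):
        x = X[rng.integers(len(X))]              # present one pattern
        c = int(np.argmin(np.linalg.norm(W - x, axis=1)))  # winner: min ||x - W_i||
        alpha = alpha0 * (1 - t / T)             # alpha(t) = alpha0 (1 - t/T)
        d = max(1, round(d0 * (1 - t / T)))      # d(t) shrinks from d0 toward 1
        lo, hi = max(0, c - d), min(p, c + d + 1)  # neighborhood N_c around c
        W[lo:hi] += alpha * (x - W[lo:hi])       # update only neurons inside N_c
    return W
```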

Self-organizing Feature Map-Example

10. Example:
    a. Initial weights are chosen randomly from U[0,1].
    b. The data employed in the experiment comprised 500 points distributed uniformly over the bipolar square [-1,1] × [-1,1].
    c. The points thus describe a geometrically square topology.
    d. Thus, the initial weights can be plotted as in the next figure.
    e. A "-" connection line connects two adjacent neurons in the competitive layer.
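The experiment itself is then a short usage of the trainer sketched above (assuming train_som from that sketch is in scope):

```python
import numpy as np
# train_som is the 1-D SOM trainer from the sketch above.

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 2))  # 500 points on the bipolar square
W = train_som(X, p=50, alpha0=0.5, d0=10, T=5000)

# Plotting W[:, 0] against W[:, 1], with "-" lines between adjacent neurons,
# shows the initially tangled chain unfolding to cover the square.
```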

Self-organizing Feature Map-Example

(figures for the example above)
