
Page 1:

Neural Networks
Part 3

Dan Simon
Cleveland State University

Page 2:

Outline

1. Sugeno RBF Neurofuzzy Networks
2. Cardiology Application
3. Hopfield Networks
4. Kohonen Self-Organizing Maps
5. Adaptive Neuro-Fuzzy Inference Systems (ANFIS)

Page 3:

Sugeno RBF

Sugeno fuzzy system: p fuzzy rules, scalar output.

Rule i: $R_i$: If $x_1$ is $A_{i1}$ and $\ldots$ and $x_m$ is $A_{im}$, then $y = z_i(x)$, for $i = 1, \ldots, p$.

Defuzzified output (centroid defuzzification), with the summation over all p fuzzy rules:

$$ y = \frac{\sum_{i=1}^{p} w_i z_i(x)}{\sum_{i=1}^{p} w_i} $$

where $w_i$ = firing strength of the i-th rule (Chapter 4).

Suppose we use product inference. Then:

$$ w_i = \prod_{k=1}^{m} \mu_{ik}(x_k) $$

[Figure: input membership functions $\mu_{i1}(x_1)$ and $\mu_{i2}(x_2)$ combine to give the firing strength $w_i$.]

Page 4:

Substituting the product-inference firing strength into the defuzzified output:

$$ y = \frac{\sum_{i=1}^{p} z_i(x) \prod_{k=1}^{m} \mu_{ik}(x_k)}{\sum_{i=1}^{p} \prod_{k=1}^{m} \mu_{ik}(x_k)} $$

Suppose the outputs are singletons (zero-order Sugeno system). Then $z_i(x) = z_i$ and:

$$ y = \frac{\sum_{i=1}^{p} z_i \prod_{k=1}^{m} \mu_{ik}(x_k)}{\sum_{i=1}^{p} \prod_{k=1}^{m} \mu_{ik}(x_k)} $$

Page 5:

Suppose the input MFs are Gaussian. Then:

$$ \mu_{ik}(x_k) = \exp\left[ -(x_k - c_{ik})^2 / \sigma_{ik}^2 \right] $$

$$ m_i(x) = \prod_{k=1}^{m} \mu_{ik}(x_k) = \exp\left[ -(x_1 - c_{i1})^2/\sigma_{i1}^2 - \cdots - (x_m - c_{im})^2/\sigma_{im}^2 \right] = \exp\left[ -(x - c_i)^T P_i (x - c_i) \right] $$

where $c_i = (c_{i1}, \ldots, c_{im})^T$ and $P_i = \mathrm{diag}(\sigma_{i1}^{-2}, \ldots, \sigma_{im}^{-2})$. The output becomes

$$ y = \frac{\sum_{i=1}^{p} z_i m_i(x)}{\sum_{i=1}^{p} m_i(x)} $$

Recall the RBF network: $y = \sum_i w_i f(x, c_i) = \sum_i w_i \, \phi(\|x - c_i\|)$, where $\phi(\cdot)$ is a basis function and $\{ c_i \}$ are the RBF centers.

Page 6:

[Figure: RBF network with inputs $x_1, x_2, \ldots, x_m$, hidden units $m_1(x), m_2(x), \ldots, m_p(x)$, weights $w_1, w_2, \ldots, w_p$, and output $y$.]

$$ w_i = \frac{z_i}{\sum_{k=1}^{p} m_k(x)} $$

We started with a Sugeno fuzzy system and ended up with an RBF network that has input-dependent weights $w_i$. This is a neuro-fuzzy network.
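Below is a minimal Python sketch of the forward pass just derived (zero-order Sugeno system with Gaussian MFs). This is not the course's NeuroFuzzy.zip code; the function name and the toy parameter values are assumptions for illustration only.

```python
import numpy as np

def neurofuzzy_output(x, c, sigma, z):
    """Zero-order Sugeno RBF output for one input vector x.

    c, sigma : (p, m) arrays of Gaussian MF centers and widths
    z        : (p,) array of rule consequents (singletons)
    """
    # Firing strengths m_i(x) = prod_k exp(-(x_k - c_ik)^2 / sigma_ik^2)
    m = np.exp(-np.sum((x - c) ** 2 / sigma ** 2, axis=1))
    # Centroid defuzzification: y = sum_i z_i m_i(x) / sum_i m_i(x)
    return np.dot(z, m) / np.sum(m)

# Toy example with p = 2 rules and m = 2 inputs (illustrative values only)
c = np.array([[0.0, 0.0], [1.0, 1.0]])
sigma = np.ones((2, 2))
z = np.array([-1.0, 1.0])
print(neurofuzzy_output(np.array([0.9, 0.8]), c, sigma, z))
```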

Page 7:

$c_{ik}$ and $\sigma_{ik}$: $p \times m$ each; $z_i$: $p$. A total of $p(2m+1)$ adjustable parameters, where m = input dimension and p = number of hidden neurons (fuzzy rules). Train with gradient descent or BBO.

Chen and Linkens example: $y = x_2 \sin(x_1) + x_1 \cos(x_2)$

NeuroFuzzy.zip / BBO.m, p = 4
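A short sketch of setting up this benchmark is shown below. The training grid over [0, 3.5] x [0, 3.5] is an assumption based on the plots that follow; only the target function and the parameter count come from the slides.

```python
import numpy as np

# Chen and Linkens benchmark surface: y = x2*sin(x1) + x1*cos(x2)
def target(x1, x2):
    return x2 * np.sin(x1) + x1 * np.cos(x2)

# Training grid on [0, 3.5] x [0, 3.5] (grid size is an assumption)
x1, x2 = np.meshgrid(np.linspace(0, 3.5, 20), np.linspace(0, 3.5, 20))
X = np.column_stack([x1.ravel(), x2.ravel()])
y = target(X[:, 0], X[:, 1])

# Parameter count for the Sugeno RBF network: p*(2m + 1)
p, m = 4, 2
print("adjustable parameters:", p * (2 * m + 1))   # 4*(2*2 + 1) = 20
```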

[Plot: training error (RMS) vs. generation, 0 to 50 generations.]

Page 8:

[Surface plots: target function and neurofuzzy approximation over the input domain [0, 3.5] x [0, 3.5].]

6,000 BBO generations, RMS error = 0.6. We can also use gradient descent training.

Page 9:

Neurofuzzy Diagnosis of Heart Disease

• Cardiovascular disease is the leading cause of death in the western world
  – Over 800,000 deaths per year in the United States
  – One in five Americans has cardiovascular disease
• Cardiomyopathy: weakening of the heart muscle
• Could be inherited or acquired (unknown cause)
• Biochemical considerations indicate that cardiomyopathy will affect the P wave of an ECG

Page 10:

Neurofuzzy Diagnosis of Heart Disease

Cardiologists tell us that primary indicators include:
• P wave duration
• P wave amplitude
• P wave energy
• P wave inflection point
This gives us a neurofuzzy system with four inputs.

Page 11:

Neurofuzzy Diagnosis of Heart Disease

• ECG data collection
  – Data collected for 24 hours
  – Average P wave data calculated each minute
    • Duration
    • Inflection
    • Energy
    • Amplitude
  – 37 cardiomyopathy patients, 18 control patients

Page 12:

Neurofuzzy Diagnosis of Heart Disease

[Plot: normalized P wave features (duration, inflection, energy, amplitude) with 1-σ bars, cardiomyopathy vs. control.]

Data is complex due to its time-varying nature.

Page 13:

Neurofuzzy Diagnosis of Heart Disease

  p | Training Error | Training CCR (%) | Testing CCR (%)
    | Best    Mean   | Best     Mean    | Best     Mean
  2 | 0.85    0.88   | 76       72      | 66       58
  3 | 0.77    0.84   | 82       77      | 75       62
  4 | 0.78    0.83   | 84       77      | 65       55
  5 | 0.78    0.83   | 82       76      | 63       58

BBO training error and correct classification rate (CCR) percent as a function of the number of middle layer neurons p. What about statistical significance?

Page 14:

Neurofuzzy Diagnosis of Heart Disease

  Mutation rate (%) | Training Error | Training CCR (%) | Testing CCR (%)
                    | Best    Mean   | Best     Mean    | Best     Mean
  0.1               | 0.79    0.85   | 81       76      | 71       61
  0.2               | 0.82    0.86   | 80       75      | 72       59
  0.5               | 0.77    0.85   | 82       76      | 69       62
  1.0               | 0.80    0.85   | 80       74      | 67       57
  2.0               | 0.83    0.86   | 79       74      | 69       62
  5.0               | 0.82    0.87   | 81       74      | 68       58
  10.0              | 0.80    0.87   | 78       73      | 65       59

Training error and correct classification rate (CCR) percent for different mutation rates using BBO (p = 3).

Page 15:

Neurofuzzy Diagnosis of Heart Disease

Typical BBO training and test results

[Plots: training error (average cost and minimum cost) vs. generation, and success rate (%) vs. generation for training and test data, over 50 generations.]

Page 16:

Neurofuzzy Diagnosis of Heart Disease

[Plot: percent correct vs. patient number.]

Success varies from one patient to the next. Does demographic information need to be included in the classifier?

Page 17:

The Discrete Hopfield Net

• John Hopfield, molecular biologist, 1982
• Proc. of the National Academy of Sciences
• Autoassociative network: recall a stored pattern similar to the input pattern
• Number of neurons = pattern dimension
• Fully connected network except $w_{ii} = 0$
• Symmetric connections: $w_{ik} = w_{ki}$
• Stability proof

Page 18:

The Discrete Hopfield Net

• The neuron signals comprise an output pattern.
• The neuron signals are initially set equal to some input pattern.
• The network converges to the nearest stored pattern.

Example: Store [1, 0, 1], [1, 1, 0], and [0, 0, 1]
• Input [0.9, 0.4, 0.6]
• Network converges to [1, 0, 1]

Page 19:

Store P binary patterns, each with n dimensions: s(p) = [s_1(p), …, s_n(p)], p = 1, …, P

$$ w_{ik} = \sum_{p=1}^{P} \left[ 2 s_i(p) - 1 \right] \left[ 2 s_k(p) - 1 \right], \quad i \ne k; \qquad w_{ii} = 0 $$

Suppose the neuron signals are given by y = [s_1(q), …, s_n(q)]. When these signals are updated by the network, they are updated to

$$ y_i = \sum_{k=1}^{n} w_{ik} s_k(q) = \sum_{k=1}^{n} \sum_{p=1}^{P} \left[ 2 s_i(p) - 1 \right] \left[ 2 s_k(p) - 1 \right] s_k(q) $$

Page 20:

$$ y_i = \sum_{k=1}^{n} w_{ik} s_k(q) = \sum_{k=1}^{n} \sum_{p=1}^{P} \left[ 2 s_i(p) - 1 \right] \left[ 2 s_k(p) - 1 \right] s_k(q) $$

Recall $s_i \in \{0, 1\}$. Therefore, the average value of the term in brackets is 0, unless q = p, in which case the average value is n / 2. Therefore, we adjust the neuron signals as:

$$ y_i = f(\hat{y}_i) = \begin{cases} 1 & \text{if } \hat{y}_i > \theta_i \\ y_i & \text{if } \hat{y}_i = \theta_i \\ 0 & \text{if } \hat{y}_i < \theta_i \end{cases} $$

where $\hat{y}_i$ is the net input above and $\theta_i$ = threshold. This results in s(p) being a stable network pattern. (We still have not proven convergence.)

One neuron update at a time.

Page 21:

Binary Hopfield Net Example: Two patterns, p = 1 and p = 2, so P = 2
s(1) = [1, 0, 1, 0], s(2) = [1, 1, 1, 1]

$$ w_{ik} = \sum_{p=1}^{P} \left[ 2 s_i(p) - 1 \right] \left[ 2 s_k(p) - 1 \right] \ (i \ne k), \qquad w_{ii} = 0 $$

$$ 2 s(1) - 1 = [1, -1, 1, -1], \qquad 2 s(2) - 1 = [1, 1, 1, 1] $$

$$ w = \begin{bmatrix} 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix} $$

Page 22:

Input y = [1, 0, 1, 1] – close to s(2) = [1, 1, 1, 1]. Threshold $\theta_i$ = 1.

$$ w = \begin{bmatrix} 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix} $$

Update one neuron at a time:

$$ y_1 = f\Big(\textstyle\sum_{k=1}^{n} w_{1k} y_k\Big) = f(2) = 1, \text{ so } y = [1, 0, 1, 1] $$
$$ y_2 = f\Big(\textstyle\sum_{k=1}^{n} w_{2k} y_k\Big) = f(2) = 1, \text{ so } y = [1, 1, 1, 1] $$
$$ y_3 = f\Big(\textstyle\sum_{k=1}^{n} w_{3k} y_k\Big) = f(2) = 1, \text{ so } y = [1, 1, 1, 1] $$
$$ y_4 = f\Big(\textstyle\sum_{k=1}^{n} w_{4k} y_k\Big) = f(2) = 1, \text{ so } y = [1, 1, 1, 1] $$

Convergence.
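A minimal Python sketch of this example is given below (not code from the course). It builds the weight matrix from the two stored patterns and runs one asynchronous update sweep on the noisy input; the function names are assumptions.

```python
import numpy as np

def hopfield_weights(patterns):
    """Hebbian weights w_ik = sum_p (2 s_i - 1)(2 s_k - 1), with zero diagonal."""
    S = 2 * np.array(patterns) - 1          # convert {0,1} patterns to {-1,+1}
    W = S.T @ S
    np.fill_diagonal(W, 0)
    return W

def update(y, W, theta=1.0):
    """One asynchronous sweep: update neurons 1..n in order."""
    y = y.copy()
    for i in range(len(y)):
        net = W[i] @ y
        if net > theta:
            y[i] = 1
        elif net < theta:
            y[i] = 0
        # if net == theta, leave y[i] unchanged
    return y

W = hopfield_weights([[1, 0, 1, 0], [1, 1, 1, 1]])   # s(1), s(2) from the slide
print(W)                                             # matches the 4x4 matrix above
y = np.array([1, 0, 1, 1])                           # noisy version of s(2)
print(update(y, W))                                  # converges to [1, 1, 1, 1]
```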

Page 23:

$$ w = \begin{bmatrix} 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix} $$

Recall s(1) = [1, 0, 1, 0], s(2) = [1, 1, 1, 1]

Is s(1) stable? Is s(2) stable? Are any other patterns stable? (See the sketch below.)

$$ y_i = f\Big(\sum_{k=1}^{n} w_{ik} y_k\Big) $$

Storage capacity:
P = 0.15 n (experimental)
$P = n / (2 \log_2 n)$
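The stability questions above can be checked numerically. The following hedged sketch (illustrative code, not from the course) enumerates all 2^4 binary states and reports which ones are fixed points of a full asynchronous sweep with threshold 1.

```python
import itertools
import numpy as np

W = np.array([[0, 0, 2, 0],
              [0, 0, 0, 2],
              [2, 0, 0, 0],
              [0, 2, 0, 0]])
theta = 1.0

def sweep(y):
    """One asynchronous update sweep over all four neurons."""
    y = np.array(y)
    for i in range(4):
        net = W[i] @ y
        if net != theta:
            y[i] = 1 if net > theta else 0
    return y

# A state is a fixed point if a full sweep leaves it unchanged; this checks
# s(1) = [1,0,1,0], s(2) = [1,1,1,1], and every other binary state.
for y in itertools.product([0, 1], repeat=4):
    if np.array_equal(sweep(y), np.array(y)):
        print(y, "is stable")
```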

Page 24:

Hopfield Net Stability: Consider the "energy" function

$$ E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{k \ne i} y_i y_k w_{ik} + \sum_{i} y_i \theta_i $$

Is E bounded? How does E change when $y_i$ changes?

$$ \Delta E = -\Big[ \sum_{k} y_k w_{ik} - \theta_i \Big] \Delta y_i $$

Page 25:

Recall our activation function:

$$ y_i = f\Big(\sum_{k=1}^{n} w_{ik} y_k\Big) = \begin{cases} 1 & \text{if } \sum_k w_{ik} y_k > \theta_i \\ y_i & \text{if } \sum_k w_{ik} y_k = \theta_i \\ 0 & \text{if } \sum_k w_{ik} y_k < \theta_i \end{cases} $$

If $y_i = 1$, it will decrease if $\sum_k w_{ik} y_k < \theta_i$. This gives a negative change for E (see prev. page).
If $y_i = 0$, it will increase if $\sum_k w_{ik} y_k > \theta_i$. This gives a negative change for E (see prev. page).
We have a bounded E which never increases. E is a Lyapunov function.
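As a quick numerical check of the Lyapunov argument, the sketch below (illustrative code, not from the course) evaluates E along the asynchronous update sequence of the earlier example and shows that it never increases.

```python
import numpy as np

W = np.array([[0, 0, 2, 0],
              [0, 0, 0, 2],
              [2, 0, 0, 0],
              [0, 2, 0, 0]], dtype=float)
theta = np.ones(4)

def energy(y):
    """E = -1/2 sum_{i != k} y_i y_k w_ik + sum_i y_i theta_i (W has zero diagonal)."""
    return -0.5 * y @ W @ y + y @ theta

y = np.array([1.0, 0.0, 1.0, 1.0])        # the input pattern from the earlier example
print("E =", energy(y))
for i in range(4):                         # one asynchronous sweep
    net = W[i] @ y
    if net != theta[i]:
        y[i] = 1.0 if net > theta[i] else 0.0
    print("after updating neuron", i + 1, "E =", energy(y))   # E never increases
```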

Page 26:

Control Applications of Hopfield Nets

• If we have optimal control trajectories, and noise drives us away from the optimal trajectory, the Hopfield net can find the closest optimal trajectory

• Transform a linear-quadratic optimal control performance index into the form of the Hopfield network energy function. Use the Hopfield network dynamics to minimize the energy.


Page 27:

Kohonen Self-Organizing Map

Clustering; associative memory – given a set of input vectors { x }, find a mapping from the input vectors onto a grid of models { m } (cluster centers).
Nearby models are similar.
Visualize vector distributions.
Teuvo Kohonen, engineer, Finland, 1982.
Unsupervised learning.

Page 28:

All the weights from the n input dimensions to a given point in output space correspond to a cluster point.
Note: The inputs are not multiplied by the weights, unless the inputs are normalized – then $\max_k [x \cdot w(k)]$ gives the cluster point that is closest to x, because the dot product of x and w(k) is the cosine of the angle between them.
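The dot-product claim can be illustrated with a short sketch (illustrative data, not from the slides): for unit-length vectors, $\|x - w\|^2 = 2 - 2\,x \cdot w$, so the largest dot product and the smallest Euclidean distance pick the same winner.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)
W = rng.normal(size=(3, 5))                  # three candidate cluster points

# Normalize everything to unit length
x /= np.linalg.norm(x)
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Same winner either way, since ||x - w||^2 = 2 - 2 x.w for unit vectors
print(np.argmax(W @ x))
print(np.argmin(np.linalg.norm(W - x, axis=1)))
```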

Page 29:

Kohonen SOM Learning Algorithm

We are given a set of input vectors {x}, each of dimension n.
Choose the maximum number of clusters m.
Random weight initialization {w_ik}, i ∈ [1, n], k ∈ [1, m].
Note w_ik is the weight from x_i to cluster unit k.
Iterate for each input training sample x:

  For each k, compute $D(k) = \sum_{i=1}^{n} (w_{ik} - x_i)^2$

  Find k such that D(k) ≤ D(k′) for all k′.
  Scalar form: $w_{ik} \leftarrow w_{ik} + \alpha (x_i - w_{ik})$, for i ∈ [1, n]
  n-dimensional vector form: $w_k \leftarrow w_k + \alpha (x - w_k)$

α = some function that decreases with the distance between x and w_k, and decreases with time (# of iterations). This update equation moves the w_k vector closer to x.
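A minimal Python sketch of this loop is shown below. It is a hypothetical analogue of the Kohonen.m script mentioned later, not the actual file, and it uses the training vectors, initialization, and learning-rate schedule from the worked example on the following slides (only the winning unit is updated; no neighborhood function).

```python
import numpy as np

# The four training vectors and the m = 2 cluster points from the example that follows
X = np.array([[1, 1, 0, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 1]], dtype=float)
w = np.array([[0.2, 0.8],
              [0.6, 0.4],
              [0.5, 0.7],
              [0.9, 0.3]])            # columns are the cluster points

for t in range(50):                   # 50 epochs, as in the slides
    alpha = 0.6 * 0.95 ** t           # alpha(t) = (0.6)(0.95)^t
    for x in X:
        D = np.sum((w - x[:, None]) ** 2, axis=0)   # D(k) = sum_i (w_ik - x_i)^2
        k = np.argmin(D)                            # winning cluster unit
        w[:, k] += alpha * (x - w[:, k])            # move the winner toward x

print(w.T)   # rows should approach the converged values reported on the later slide
```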

Page 30:

Kohonen SOM Example

Cluster [1, 1, 0, 0]; [0, 0, 0, 1]; [1, 0, 0, 0]; [0, 0, 1, 1]
Maximum # of clusters m = 2
α(t) = (0.6)(0.95)^t, where t = iteration # (coarse clustering to start, fine-tuning later)

Random initialization:

$$ w = \begin{bmatrix} 0.2 & 0.8 \\ 0.6 & 0.4 \\ 0.5 & 0.7 \\ 0.9 & 0.3 \end{bmatrix} $$

First vector: D(1) = 1.86, D(2) = 0.98
$w_2 \leftarrow w_2 + 0.6(x - w_2) = [0.92, 0.76, 0.28, 0.12]^T$

Second vector: D(1) = 0.66, D(2) = 2.28
$w_1 \leftarrow w_1 + 0.6(x - w_1) = [0.08, 0.24, 0.20, 0.96]^T$

Page 31:

Third vector: D(1) = 1.87, D(2) = 0.68
$w_2 \leftarrow w_2 + 0.6(x - w_2) = [0.97, 0.30, 0.11, 0.05]^T$

Fourth vector: D(1) = 0.71, D(2) = 2.72
$w_1 \leftarrow w_1 + 0.6(x - w_1) = [0.03, 0.10, 0.68, 0.98]^T$

This is the end of the first iteration (epoch).

$$ w^T(1) = \begin{bmatrix} 0.0320 & 0.0960 & 0.6800 & 0.9840 \\ 0.9680 & 0.3040 & 0.1120 & 0.0480 \end{bmatrix} $$

Adjust α for the next iteration.

$$ w^T(2) = \begin{bmatrix} 0.0068 & 0.0203 & 0.6839 & 0.9966 \\ 0.9932 & 0.3127 & 0.0237 & 0.0102 \end{bmatrix} $$

$$ w^T(50) = \begin{bmatrix} 0.0000 & 0.0000 & 0.5144 & 1.0000 \\ 1.0000 & 0.4856 & 0.0000 & 0.0000 \end{bmatrix} $$

Each cluster point (weight column) is converging to about the average of the two sample inputs that are closest to it.

Kohonen.m

Page 32:

Control Applications of Kohonen Networks

Fault accommodation

• Suppose we have a family of controllers, one controller for each fault condition

• When a fault occurs, classify it in the correct fault class to choose the control

• This idea can also apply to operating modes, reference input types, user intent, etc.

Missing sensor data – the Kohonen network can fill in the most likely values of missing sensor data


Page 33:

Adaptive Neuro-Fuzzy Inference Systems

• Originally called adaptive network-based fuzzy inference systems

• Roger Jang, 1993 (Zadeh’s student)


Page 34:

Figure 12.1(b) in Jang's book: two-input, single-output ANFIS

Layer 1: Fuzzy system; outputs = membership grades
Layer 2: Product
Layer 3: Normalization
Layer 4: Sugeno fuzzy system
Layer 5: Sum

Page 35:

Layer 1 outputs: $\mu_{A_1}(x), \mu_{A_2}(x), \mu_{B_1}(y), \mu_{B_2}(y)$

Layer 2 outputs (product, or any other T-norm): $w_1 = \mu_{A_1}(x)\,\mu_{B_1}(y)$, $w_2 = \mu_{A_2}(x)\,\mu_{B_2}(y)$

Layer 3 outputs: $\bar{w}_1 = w_1 / (w_1 + w_2)$, $\bar{w}_2 = w_2 / (w_1 + w_2)$

Layer 4 outputs: $\bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i)$

Layer 5 output: $\sum_i \bar{w}_i f_i = \bar{w}_1 f_1 + \bar{w}_2 f_2$
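A minimal Python sketch of these five layers is given below. The Gaussian membership functions and all numerical values are assumptions for illustration (Jang's original formulation uses bell MFs); only the layer structure follows the slide.

```python
import numpy as np

def gauss(v, c, sigma):
    """Gaussian membership grade (MF shape is an assumption; bell MFs also work)."""
    return np.exp(-((v - c) / sigma) ** 2)

def anfis_forward(x, y, prem, cons):
    """Two-input, two-rule ANFIS forward pass (layers 1-5).

    prem : dict of (center, width) for A1, A2, B1, B2
    cons : (2, 3) array of consequent parameters [p_i, q_i, r_i]
    """
    # Layer 1: membership grades
    muA = [gauss(x, *prem["A1"]), gauss(x, *prem["A2"])]
    muB = [gauss(y, *prem["B1"]), gauss(y, *prem["B2"])]
    # Layer 2: firing strengths (product T-norm)
    w = np.array([muA[0] * muB[0], muA[1] * muB[1]])
    # Layer 3: normalization
    wbar = w / w.sum()
    # Layer 4: rule outputs f_i = p_i x + q_i y + r_i, weighted by wbar
    f = cons @ np.array([x, y, 1.0])
    # Layer 5: sum
    return np.dot(wbar, f)

prem = {"A1": (0.0, 1.0), "A2": (1.0, 1.0), "B1": (0.0, 1.0), "B2": (1.0, 1.0)}
cons = np.array([[1.0, 1.0, 0.0], [2.0, -1.0, 0.5]])   # illustrative values only
print(anfis_forward(0.3, 0.7, prem, cons))
```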

Page 36:

So ANFIS is a Sugeno fuzzy system.
• Neural network architecture
• It can be trained with neural network methods (e.g., backpropagation).

$$ f = \bar{w}_1 f_1 + \bar{w}_2 f_2 = \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2) = (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + \bar{w}_2 r_2 $$

• Consequent parameters = $p_i$, $q_i$, and $r_i$.
• Output is linear with respect to these parameters.
• We can optimize with respect to the consequent parameters using least-squares.
• This is called the forward pass.
• A 1st-order Sugeno system with n inputs and m fuzzy partitions per input ⇒ $3m^n$ linear parameters.
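Below is a minimal Python sketch of this least-squares (forward-pass) step. The membership functions, training data, and variable names are assumptions; the only idea taken from the slide is that, with the premise parameters held fixed, the output is linear in the consequent parameters, so they can be solved for directly.

```python
import numpy as np

# Normalized firing strengths from layers 1-3; a stand-in using fixed Gaussian MFs.
def wbar(x, y):
    w = np.array([np.exp(-x**2) * np.exp(-y**2),
                  np.exp(-(x - 1)**2) * np.exp(-(y - 1)**2)])
    return w / w.sum()

# Training data (x, y) -> target t (illustrative values only)
data = [(0.1, 0.2, 0.5), (0.8, 0.4, 1.1), (0.3, 0.9, 0.7), (0.7, 0.7, 1.3)]

A, t = [], []
for x, y, target in data:
    wb = wbar(x, y)
    # Row of regressors: [wbar1*x, wbar1*y, wbar1, wbar2*x, wbar2*y, wbar2]
    A.append([wb[0]*x, wb[0]*y, wb[0], wb[1]*x, wb[1]*y, wb[1]])
    t.append(target)

# Least-squares solution for the consequent parameters of the two rules
theta, *_ = np.linalg.lstsq(np.array(A), np.array(t), rcond=None)
p1, q1, r1, p2, q2, r2 = theta
print(theta)
```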

Page 37:

• Premise parameters = parameters of fuzzy sets A1, A2, B1, B2, etc.
• ANFIS output is nonlinear with respect to these parameters.
• Gradient descent can be used to optimize the output with respect to these parameters.
• This is called the backward pass.
• Premise fuzzy system with n inputs, q fuzzy partitions per input, and k parameters per MF ⇒ kqn nonlinear parameters.

Page 38:

References
• M. Chen and D. Linkens, "A systematic neuro-fuzzy modelling framework with application to material property prediction," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 31, no. 5, pp. 781-790, 2001
• M. Ovreiu and D. Simon, "Biogeography-Based Optimization of Neuro-Fuzzy System Parameters for Diagnosis of Cardiac Disease," Genetic and Evolutionary Computation Conference, Portland, Oregon, pp. 1235-1242, July 2010
• J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," 1982
• P. Simpson, Artificial Neural Systems, Pergamon Press, 1990
• L. Fausett, Fundamentals of Neural Networks, Prentice Hall
• www.scholarpedia.org/article/Kohonen_network
• J.-S. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, 1997