
Page 1:

Neural Networks
Part 3

Dan Simon
Cleveland State University

Page 2:

Outline

1. Sugeno RBF Neurofuzzy Networks
2. Cardiology Application
3. Hopfield Networks
4. Kohonen Self-Organizing Maps
5. Adaptive Neuro-Fuzzy Inference Systems (ANFIS)

Page 3:

Sugeno RBF

Sugeno fuzzy system: p fuzzy rules, scalar output.

Rule i: $R_i$: If $x_1$ is $A_{i1}$ and $\ldots$ and $x_m$ is $A_{im}$, then $y = z_i(x)$, for $i = 1, \ldots, p$.

Defuzzified output (centroid defuzzification), with the summation over all p fuzzy rules:

$$ y = \frac{\sum_{i=1}^{p} w_i z_i(x)}{\sum_{i=1}^{p} w_i} $$

where $w_i$ = firing strength of the i-th rule (Chapter 4).

Suppose we use product inference. Then:

$$ w_i = \prod_{k=1}^{m} \mu_{ik}(x_k) $$

[Figure: input membership functions $\mu_{i1}(x_1)$ and $\mu_{i2}(x_2)$ combine to give the firing strength $w_i$.]

Page 4:

Substituting the product-inference firing strength into the defuzzified output:

$$ y = \frac{\sum_{i=1}^{p} z_i(x) \prod_{k=1}^{m} \mu_{ik}(x_k)}{\sum_{i=1}^{p} \prod_{k=1}^{m} \mu_{ik}(x_k)} $$

Suppose the outputs are singletons (zero-order Sugeno system). Then $z_i(x) = z_i$ and:

$$ y = \frac{\sum_{i=1}^{p} z_i \prod_{k=1}^{m} \mu_{ik}(x_k)}{\sum_{i=1}^{p} \prod_{k=1}^{m} \mu_{ik}(x_k)} $$

Page 5:

Suppose the input MFs are Gaussian. Then:

$$ \mu_{ik}(x_k) = \exp\left[ -(x_k - c_{ik})^2 / \sigma_{ik}^2 \right] $$

$$ m_i(x) = \prod_{k=1}^{m} \mu_{ik}(x_k) = \exp\left[ -(x_1 - c_{i1})^2/\sigma_{i1}^2 - \cdots - (x_m - c_{im})^2/\sigma_{im}^2 \right] = \exp\left[ -(x - c_i)^T P_i (x - c_i) \right] $$

where $c_i = (c_{i1}, \ldots, c_{im})^T$ and $P_i = \mathrm{diag}(\sigma_{i1}^{-2}, \ldots, \sigma_{im}^{-2})$. The output becomes

$$ y = \frac{\sum_{i=1}^{p} z_i m_i(x)}{\sum_{i=1}^{p} m_i(x)} $$

Recall the RBF network: $y = \sum_i w_i f(x, c_i) = \sum_i w_i \, \phi(\|x - c_i\|)$, where $\phi(\cdot)$ is a basis function and $\{ c_i \}$ are the RBF centers.

Page 6:

[Figure: RBF network with inputs $x_1, x_2, \ldots, x_m$, hidden units $m_1(x), m_2(x), \ldots, m_p(x)$, weights $w_1, w_2, \ldots, w_p$, and output $y$.]

$$ w_i = \frac{z_i}{\sum_{k=1}^{p} m_k(x)} $$

We started with a Sugeno fuzzy system and ended up with an RBF network that has input-dependent weights $w_i$. This is a neuro-fuzzy network.
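Below is a minimal Python sketch of the forward pass just derived (zero-order Sugeno system with Gaussian MFs). This is not the course's NeuroFuzzy.zip code; the function name and the toy parameter values are assumptions for illustration only.

```python
import numpy as np

def neurofuzzy_output(x, c, sigma, z):
    """Zero-order Sugeno RBF output for one input vector x.

    c, sigma : (p, m) arrays of Gaussian MF centers and widths
    z        : (p,) array of rule consequents (singletons)
    """
    # Firing strengths m_i(x) = prod_k exp(-(x_k - c_ik)^2 / sigma_ik^2)
    m = np.exp(-np.sum((x - c) ** 2 / sigma ** 2, axis=1))
    # Centroid defuzzification: y = sum_i z_i m_i(x) / sum_i m_i(x)
    return np.dot(z, m) / np.sum(m)

# Toy example with p = 2 rules and m = 2 inputs (illustrative values only)
c = np.array([[0.0, 0.0], [1.0, 1.0]])
sigma = np.ones((2, 2))
z = np.array([-1.0, 1.0])
print(neurofuzzy_output(np.array([0.9, 0.8]), c, sigma, z))
```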

Page 7:

$c_{ik}$ and $\sigma_{ik}$: $p \times m$ each; $z_i$: $p$. A total of $p(2m+1)$ adjustable parameters, where m = input dimension and p = number of hidden neurons (fuzzy rules). Train with gradient descent or BBO.

Chen and Linkens example: $y = x_2 \sin(x_1) + x_1 \cos(x_2)$

NeuroFuzzy.zip / BBO.m, p = 4
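A short sketch of setting up this benchmark is shown below. The training grid over [0, 3.5] x [0, 3.5] is an assumption based on the plots that follow; only the target function and the parameter count come from the slides.

```python
import numpy as np

# Chen and Linkens benchmark surface: y = x2*sin(x1) + x1*cos(x2)
def target(x1, x2):
    return x2 * np.sin(x1) + x1 * np.cos(x2)

# Training grid on [0, 3.5] x [0, 3.5] (grid size is an assumption)
x1, x2 = np.meshgrid(np.linspace(0, 3.5, 20), np.linspace(0, 3.5, 20))
X = np.column_stack([x1.ravel(), x2.ravel()])
y = target(X[:, 0], X[:, 1])

# Parameter count for the Sugeno RBF network: p*(2m + 1)
p, m = 4, 2
print("adjustable parameters:", p * (2 * m + 1))   # 4*(2*2 + 1) = 20
```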

[Plot: training error (RMS) vs. generation, 0 to 50 generations.]

Page 8:

[Surface plots: target function and neurofuzzy approximation over the input domain [0, 3.5] x [0, 3.5].]

6,000 BBO generations, RMS error = 0.6. We can also use gradient descent training.

Page 9:

Neurofuzzy Diagnosis of Heart Disease

• Cardiovascular disease is the leading cause of death in the western world
  – Over 800,000 deaths per year in the United States
  – One in five Americans has cardiovascular disease
• Cardiomyopathy: weakening of the heart muscle
• Could be inherited or acquired (unknown cause)
• Biochemical considerations indicate that cardiomyopathy will affect the P wave of an ECG

Page 10:

Neurofuzzy Diagnosis of Heart Disease

Cardiologists tell us that primary indicators include:
• P wave duration
• P wave amplitude
• P wave energy
• P wave inflection point
This gives us a neurofuzzy system with four inputs.

Page 11:

Neurofuzzy Diagnosis of Heart Disease

• ECG data collection
  – Data collected for 24 hours
  – Average P wave data calculated each minute
    • Duration
    • Inflection
    • Energy
    • Amplitude
  – 37 cardiomyopathy patients, 18 control patients

Page 12:

Neurofuzzy Diagnosis of Heart Disease

[Plot: normalized P wave features (duration, inflection, energy, amplitude) with 1-σ bars, cardiomyopathy vs. control.]

Data is complex due to its time-varying nature.

Page 13:

Neurofuzzy Diagnosis of Heart Disease

  p | Training Error | Training CCR (%) | Testing CCR (%)
    | Best    Mean   | Best     Mean    | Best     Mean
  2 | 0.85    0.88   | 76       72      | 66       58
  3 | 0.77    0.84   | 82       77      | 75       62
  4 | 0.78    0.83   | 84       77      | 65       55
  5 | 0.78    0.83   | 82       76      | 63       58

BBO training error and correct classification rate (CCR) percent as a function of the number of middle layer neurons p. What about statistical significance?

Page 14:

Neurofuzzy Diagnosis of Heart Disease

  Mutation rate (%) | Training Error | Training CCR (%) | Testing CCR (%)
                    | Best    Mean   | Best     Mean    | Best     Mean
  0.1               | 0.79    0.85   | 81       76      | 71       61
  0.2               | 0.82    0.86   | 80       75      | 72       59
  0.5               | 0.77    0.85   | 82       76      | 69       62
  1.0               | 0.80    0.85   | 80       74      | 67       57
  2.0               | 0.83    0.86   | 79       74      | 69       62
  5.0               | 0.82    0.87   | 81       74      | 68       58
  10.0              | 0.80    0.87   | 78       73      | 65       59

Training error and correct classification rate (CCR) percent for different mutation rates using BBO (p = 3).

Page 15:

Neurofuzzy Diagnosis of Heart Disease

Typical BBO training and test results

[Plots: training error (average cost and minimum cost) vs. generation, and success rate (%) vs. generation for training and test data, over 50 generations.]

Page 16:

Neurofuzzy Diagnosis of Heart Disease

[Plot: percent correct vs. patient number.]

Success varies from one patient to the next. Does demographic information need to be included in the classifier?

Page 17:

The Discrete Hopfield Net

• John Hopfield, molecular biologist, 1982
• Proc. of the National Academy of Sciences
• Autoassociative network: recall a stored pattern similar to the input pattern
• Number of neurons = pattern dimension
• Fully connected network except $w_{ii} = 0$
• Symmetric connections: $w_{ik} = w_{ki}$
• Stability proof

Page 18:

The Discrete Hopfield Net

• The neuron signals comprise an output pattern.
• The neuron signals are initially set equal to some input pattern.
• The network converges to the nearest stored pattern.

Example: Store [1, 0, 1], [1, 1, 0], and [0, 0, 1]
• Input [0.9, 0.4, 0.6]
• Network converges to [1, 0, 1]

Page 19:

Store P binary patterns, each with n dimensions: s(p) = [s_1(p), …, s_n(p)], p = 1, …, P

$$ w_{ik} = \sum_{p=1}^{P} \left[ 2 s_i(p) - 1 \right] \left[ 2 s_k(p) - 1 \right], \quad i \ne k; \qquad w_{ii} = 0 $$

Suppose the neuron signals are given by y = [s_1(q), …, s_n(q)]. When these signals are updated by the network, they are updated to

$$ y_i = \sum_{k=1}^{n} w_{ik} s_k(q) = \sum_{k=1}^{n} \sum_{p=1}^{P} \left[ 2 s_i(p) - 1 \right] \left[ 2 s_k(p) - 1 \right] s_k(q) $$

Page 20:

$$ y_i = \sum_{k=1}^{n} w_{ik} s_k(q) = \sum_{k=1}^{n} \sum_{p=1}^{P} \left[ 2 s_i(p) - 1 \right] \left[ 2 s_k(p) - 1 \right] s_k(q) $$

Recall $s_i \in \{0, 1\}$. Therefore, the average value of the term in brackets is 0, unless q = p, in which case the average value is n / 2. Therefore, we adjust the neuron signals as:

$$ y_i = f(\hat{y}_i) = \begin{cases} 1 & \text{if } \hat{y}_i > \theta_i \\ y_i & \text{if } \hat{y}_i = \theta_i \\ 0 & \text{if } \hat{y}_i < \theta_i \end{cases} $$

where $\hat{y}_i$ is the net input above and $\theta_i$ = threshold. This results in s(p) being a stable network pattern. (We still have not proven convergence.)

One neuron update at a time.

Page 21:

Binary Hopfield Net Example: Two patterns, p = 1 and p = 2, so P = 2
s(1) = [1, 0, 1, 0], s(2) = [1, 1, 1, 1]

$$ w_{ik} = \sum_{p=1}^{P} \left[ 2 s_i(p) - 1 \right] \left[ 2 s_k(p) - 1 \right] \ (i \ne k), \qquad w_{ii} = 0 $$

$$ 2 s(1) - 1 = [1, -1, 1, -1], \qquad 2 s(2) - 1 = [1, 1, 1, 1] $$

$$ w = \begin{bmatrix} 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix} $$

Page 22:

Input y = [1, 0, 1, 1] – close to s(2) = [1, 1, 1, 1]. Threshold $\theta_i$ = 1.

$$ w = \begin{bmatrix} 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix} $$

Update one neuron at a time:

$$ y_1 = f\Big(\textstyle\sum_{k=1}^{n} w_{1k} y_k\Big) = f(2) = 1, \text{ so } y = [1, 0, 1, 1] $$
$$ y_2 = f\Big(\textstyle\sum_{k=1}^{n} w_{2k} y_k\Big) = f(2) = 1, \text{ so } y = [1, 1, 1, 1] $$
$$ y_3 = f\Big(\textstyle\sum_{k=1}^{n} w_{3k} y_k\Big) = f(2) = 1, \text{ so } y = [1, 1, 1, 1] $$
$$ y_4 = f\Big(\textstyle\sum_{k=1}^{n} w_{4k} y_k\Big) = f(2) = 1, \text{ so } y = [1, 1, 1, 1] $$

Convergence.
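A minimal Python sketch of this example is given below (not code from the course). It builds the weight matrix from the two stored patterns and runs one asynchronous update sweep on the noisy input; the function names are assumptions.

```python
import numpy as np

def hopfield_weights(patterns):
    """Hebbian weights w_ik = sum_p (2 s_i - 1)(2 s_k - 1), with zero diagonal."""
    S = 2 * np.array(patterns) - 1          # convert {0,1} patterns to {-1,+1}
    W = S.T @ S
    np.fill_diagonal(W, 0)
    return W

def update(y, W, theta=1.0):
    """One asynchronous sweep: update neurons 1..n in order."""
    y = y.copy()
    for i in range(len(y)):
        net = W[i] @ y
        if net > theta:
            y[i] = 1
        elif net < theta:
            y[i] = 0
        # if net == theta, leave y[i] unchanged
    return y

W = hopfield_weights([[1, 0, 1, 0], [1, 1, 1, 1]])   # s(1), s(2) from the slide
print(W)                                             # matches the 4x4 matrix above
y = np.array([1, 0, 1, 1])                           # noisy version of s(2)
print(update(y, W))                                  # converges to [1, 1, 1, 1]
```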

Page 23:

$$ w = \begin{bmatrix} 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \end{bmatrix} $$

Recall s(1) = [1, 0, 1, 0], s(2) = [1, 1, 1, 1]

Is s(1) stable? Is s(2) stable? Are any other patterns stable? (See the sketch below.)

$$ y_i = f\Big(\sum_{k=1}^{n} w_{ik} y_k\Big) $$

Storage capacity:
P = 0.15 n (experimental)
$P = n / (2 \log_2 n)$
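The stability questions above can be checked numerically. The following hedged sketch (illustrative code, not from the course) enumerates all 2^4 binary states and reports which ones are fixed points of a full asynchronous sweep with threshold 1.

```python
import itertools
import numpy as np

W = np.array([[0, 0, 2, 0],
              [0, 0, 0, 2],
              [2, 0, 0, 0],
              [0, 2, 0, 0]])
theta = 1.0

def sweep(y):
    """One asynchronous update sweep over all four neurons."""
    y = np.array(y)
    for i in range(4):
        net = W[i] @ y
        if net != theta:
            y[i] = 1 if net > theta else 0
    return y

# A state is a fixed point if a full sweep leaves it unchanged; this checks
# s(1) = [1,0,1,0], s(2) = [1,1,1,1], and every other binary state.
for y in itertools.product([0, 1], repeat=4):
    if np.array_equal(sweep(y), np.array(y)):
        print(y, "is stable")
```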

Page 24:

Hopfield Net Stability: Consider the "energy" function

$$ E = -\frac{1}{2} \sum_{i=1}^{n} \sum_{k \ne i} y_i y_k w_{ik} + \sum_{i} y_i \theta_i $$

Is E bounded? How does E change when $y_i$ changes?

$$ \Delta E = -\Big[ \sum_{k} y_k w_{ik} - \theta_i \Big] \Delta y_i $$

Page 25:

Recall our activation function:

$$ y_i = f\Big(\sum_{k=1}^{n} w_{ik} y_k\Big) = \begin{cases} 1 & \text{if } \sum_k w_{ik} y_k > \theta_i \\ y_i & \text{if } \sum_k w_{ik} y_k = \theta_i \\ 0 & \text{if } \sum_k w_{ik} y_k < \theta_i \end{cases} $$

If $y_i = 1$, it will decrease if $\sum_k w_{ik} y_k < \theta_i$. This gives a negative change for E (see prev. page).
If $y_i = 0$, it will increase if $\sum_k w_{ik} y_k > \theta_i$. This gives a negative change for E (see prev. page).
We have a bounded E which never increases. E is a Lyapunov function.
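As a quick numerical check of the Lyapunov argument, the sketch below (illustrative code, not from the course) evaluates E along the asynchronous update sequence of the earlier example and shows that it never increases.

```python
import numpy as np

W = np.array([[0, 0, 2, 0],
              [0, 0, 0, 2],
              [2, 0, 0, 0],
              [0, 2, 0, 0]], dtype=float)
theta = np.ones(4)

def energy(y):
    """E = -1/2 sum_{i != k} y_i y_k w_ik + sum_i y_i theta_i (W has zero diagonal)."""
    return -0.5 * y @ W @ y + y @ theta

y = np.array([1.0, 0.0, 1.0, 1.0])        # the input pattern from the earlier example
print("E =", energy(y))
for i in range(4):                         # one asynchronous sweep
    net = W[i] @ y
    if net != theta[i]:
        y[i] = 1.0 if net > theta[i] else 0.0
    print("after updating neuron", i + 1, "E =", energy(y))   # E never increases
```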

Page 26:

Control Applications of Hopfield Nets

• If we have optimal control trajectories, and noise drives us away from the optimal trajectory, the Hopfield net can find the closest optimal trajectory

• Transform a linear-quadratic optimal control performance index into the form of the Hopfield network energy function. Use the Hopfield network dynamics to minimize the energy.


Page 27:

Kohonen Self-Organizing Map

Clustering; associative memory – given a set of input vectors { x }, find a mapping from the input vectors onto a grid of models { m } (cluster centers).
Nearby models are similar.
Visualize vector distributions.
Teuvo Kohonen, engineer, Finland, 1982.
Unsupervised learning.

Page 28:

All the weights from the n input dimensions to a given point in output space correspond to a cluster point.
Note: The inputs are not multiplied by the weights, unless the inputs are normalized – then $\max_k [x \cdot w(k)]$ gives the cluster point that is closest to x, because the dot product of x and w(k) is the cosine of the angle between them.
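The dot-product claim can be illustrated with a short sketch (illustrative data, not from the slides): for unit-length vectors, $\|x - w\|^2 = 2 - 2\,x \cdot w$, so the largest dot product and the smallest Euclidean distance pick the same winner.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)
W = rng.normal(size=(3, 5))                  # three candidate cluster points

# Normalize everything to unit length
x /= np.linalg.norm(x)
W /= np.linalg.norm(W, axis=1, keepdims=True)

# Same winner either way, since ||x - w||^2 = 2 - 2 x.w for unit vectors
print(np.argmax(W @ x))
print(np.argmin(np.linalg.norm(W - x, axis=1)))
```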

Page 29:

Kohonen SOM Learning Algorithm

We are given a set of input vectors {x}, each of dimension n.
Choose the maximum number of clusters m.
Random weight initialization {w_ik}, i ∈ [1, n], k ∈ [1, m].
Note w_ik is the weight from x_i to cluster unit k.
Iterate for each input training sample x:

  For each k, compute $D(k) = \sum_{i=1}^{n} (w_{ik} - x_i)^2$

  Find k such that D(k) ≤ D(k′) for all k′.
  Scalar form: $w_{ik} \leftarrow w_{ik} + \alpha (x_i - w_{ik})$, for i ∈ [1, n]
  n-dimensional vector form: $w_k \leftarrow w_k + \alpha (x - w_k)$

α = some function that decreases with the distance between x and w_k, and decreases with time (# of iterations). This update equation moves the w_k vector closer to x.
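A minimal Python sketch of this loop is shown below. It is a hypothetical analogue of the Kohonen.m script mentioned later, not the actual file, and it uses the training vectors, initialization, and learning-rate schedule from the worked example on the following slides (only the winning unit is updated; no neighborhood function).

```python
import numpy as np

# The four training vectors and the m = 2 cluster points from the example that follows
X = np.array([[1, 1, 0, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 1]], dtype=float)
w = np.array([[0.2, 0.8],
              [0.6, 0.4],
              [0.5, 0.7],
              [0.9, 0.3]])            # columns are the cluster points

for t in range(50):                   # 50 epochs, as in the slides
    alpha = 0.6 * 0.95 ** t           # alpha(t) = (0.6)(0.95)^t
    for x in X:
        D = np.sum((w - x[:, None]) ** 2, axis=0)   # D(k) = sum_i (w_ik - x_i)^2
        k = np.argmin(D)                            # winning cluster unit
        w[:, k] += alpha * (x - w[:, k])            # move the winner toward x

print(w.T)   # rows should approach the converged values reported on the later slide
```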

Page 30:

Kohonen SOM Example

Cluster [1, 1, 0, 0]; [0, 0, 0, 1]; [1, 0, 0, 0]; [0, 0, 1, 1]
Maximum # of clusters m = 2
α(t) = (0.6)(0.95)^t, where t = iteration # (coarse clustering to start, fine-tuning later)

Random initialization:

$$ w = \begin{bmatrix} 0.2 & 0.8 \\ 0.6 & 0.4 \\ 0.5 & 0.7 \\ 0.9 & 0.3 \end{bmatrix} $$

First vector: D(1) = 1.86, D(2) = 0.98
$w_2 \leftarrow w_2 + 0.6(x - w_2) = [0.92, 0.76, 0.28, 0.12]^T$

Second vector: D(1) = 0.66, D(2) = 2.28
$w_1 \leftarrow w_1 + 0.6(x - w_1) = [0.08, 0.24, 0.20, 0.96]^T$

Page 31:

Third vector: D(1) = 1.87, D(2) = 0.68
$w_2 \leftarrow w_2 + 0.6(x - w_2) = [0.97, 0.30, 0.11, 0.05]^T$

Fourth vector: D(1) = 0.71, D(2) = 2.72
$w_1 \leftarrow w_1 + 0.6(x - w_1) = [0.03, 0.10, 0.68, 0.98]^T$

This is the end of the first iteration (epoch).

$$ w^T(1) = \begin{bmatrix} 0.0320 & 0.0960 & 0.6800 & 0.9840 \\ 0.9680 & 0.3040 & 0.1120 & 0.0480 \end{bmatrix} $$

Adjust α for the next iteration.

$$ w^T(2) = \begin{bmatrix} 0.0068 & 0.0203 & 0.6839 & 0.9966 \\ 0.9932 & 0.3127 & 0.0237 & 0.0102 \end{bmatrix} $$

$$ w^T(50) = \begin{bmatrix} 0.0000 & 0.0000 & 0.5144 & 1.0000 \\ 1.0000 & 0.4856 & 0.0000 & 0.0000 \end{bmatrix} $$

Each cluster point (weight column) is converging to about the average of the two sample inputs that are closest to it.

Kohonen.m

Page 32:

Control Applications of Kohonen Networks

Fault accommodation

• Suppose we have a family of controllers, one controller for each fault condition

• When a fault occurs, classify it in the correct fault class to choose the control

• This idea can also apply to operating modes, reference input types, user intent, etc.

Missing sensor data – the Kohonen network can fill in the most likely values of missing sensor data


Page 33:

Adaptive Neuro-Fuzzy Inference Systems

• Originally called adaptive network-based fuzzy inference systems

• Roger Jang, 1993 (Zadeh’s student)


Page 34:

Figure 12.1(b) in Jang's book: two-input, single-output ANFIS

Layer 1: Fuzzy system; outputs = membership grades
Layer 2: Product
Layer 3: Normalization
Layer 4: Sugeno fuzzy system
Layer 5: Sum

Page 35:

Layer 1 outputs: $\mu_{A_1}(x), \mu_{A_2}(x), \mu_{B_1}(y), \mu_{B_2}(y)$

Layer 2 outputs (product, or any other T-norm): $w_1 = \mu_{A_1}(x)\,\mu_{B_1}(y)$, $w_2 = \mu_{A_2}(x)\,\mu_{B_2}(y)$

Layer 3 outputs: $\bar{w}_1 = w_1 / (w_1 + w_2)$, $\bar{w}_2 = w_2 / (w_1 + w_2)$

Layer 4 outputs: $\bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i)$

Layer 5 output: $\sum_i \bar{w}_i f_i = \bar{w}_1 f_1 + \bar{w}_2 f_2$
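A minimal Python sketch of these five layers is given below. The Gaussian membership functions and all numerical values are assumptions for illustration (Jang's original formulation uses bell MFs); only the layer structure follows the slide.

```python
import numpy as np

def gauss(v, c, sigma):
    """Gaussian membership grade (MF shape is an assumption; bell MFs also work)."""
    return np.exp(-((v - c) / sigma) ** 2)

def anfis_forward(x, y, prem, cons):
    """Two-input, two-rule ANFIS forward pass (layers 1-5).

    prem : dict of (center, width) for A1, A2, B1, B2
    cons : (2, 3) array of consequent parameters [p_i, q_i, r_i]
    """
    # Layer 1: membership grades
    muA = [gauss(x, *prem["A1"]), gauss(x, *prem["A2"])]
    muB = [gauss(y, *prem["B1"]), gauss(y, *prem["B2"])]
    # Layer 2: firing strengths (product T-norm)
    w = np.array([muA[0] * muB[0], muA[1] * muB[1]])
    # Layer 3: normalization
    wbar = w / w.sum()
    # Layer 4: rule outputs f_i = p_i x + q_i y + r_i, weighted by wbar
    f = cons @ np.array([x, y, 1.0])
    # Layer 5: sum
    return np.dot(wbar, f)

prem = {"A1": (0.0, 1.0), "A2": (1.0, 1.0), "B1": (0.0, 1.0), "B2": (1.0, 1.0)}
cons = np.array([[1.0, 1.0, 0.0], [2.0, -1.0, 0.5]])   # illustrative values only
print(anfis_forward(0.3, 0.7, prem, cons))
```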

Page 36:

So ANFIS is a Sugeno fuzzy system.
• Neural network architecture
• It can be trained with neural network methods (e.g., backpropagation).

$$ f = \bar{w}_1 f_1 + \bar{w}_2 f_2 = \bar{w}_1 (p_1 x + q_1 y + r_1) + \bar{w}_2 (p_2 x + q_2 y + r_2) = (\bar{w}_1 x) p_1 + (\bar{w}_1 y) q_1 + \bar{w}_1 r_1 + (\bar{w}_2 x) p_2 + (\bar{w}_2 y) q_2 + \bar{w}_2 r_2 $$

• Consequent parameters = $p_i$, $q_i$, and $r_i$.
• Output is linear with respect to these parameters.
• We can optimize with respect to the consequent parameters using least-squares.
• This is called the forward pass.
• A 1st-order Sugeno system with n inputs and m fuzzy partitions per input ⇒ $3m^n$ linear parameters.
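Below is a minimal Python sketch of this least-squares (forward-pass) step. The membership functions, training data, and variable names are assumptions; the only idea taken from the slide is that, with the premise parameters held fixed, the output is linear in the consequent parameters, so they can be solved for directly.

```python
import numpy as np

# Normalized firing strengths from layers 1-3; a stand-in using fixed Gaussian MFs.
def wbar(x, y):
    w = np.array([np.exp(-x**2) * np.exp(-y**2),
                  np.exp(-(x - 1)**2) * np.exp(-(y - 1)**2)])
    return w / w.sum()

# Training data (x, y) -> target t (illustrative values only)
data = [(0.1, 0.2, 0.5), (0.8, 0.4, 1.1), (0.3, 0.9, 0.7), (0.7, 0.7, 1.3)]

A, t = [], []
for x, y, target in data:
    wb = wbar(x, y)
    # Row of regressors: [wbar1*x, wbar1*y, wbar1, wbar2*x, wbar2*y, wbar2]
    A.append([wb[0]*x, wb[0]*y, wb[0], wb[1]*x, wb[1]*y, wb[1]])
    t.append(target)

# Least-squares solution for the consequent parameters of the two rules
theta, *_ = np.linalg.lstsq(np.array(A), np.array(t), rcond=None)
p1, q1, r1, p2, q2, r2 = theta
print(theta)
```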

Page 37:

• Premise parameters = parameters of fuzzy sets A1, A2, B1, B2, etc.
• ANFIS output is nonlinear with respect to these parameters.
• Gradient descent can be used to optimize the output with respect to these parameters.
• This is called the backward pass.
• Premise fuzzy system with n inputs, q fuzzy partitions per input, and k parameters per MF ⇒ kqn nonlinear parameters.

Page 38:

References
• M. Chen and D. Linkens, "A systematic neuro-fuzzy modelling framework with application to material property prediction," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 31, no. 5, pp. 781-790, 2001
• M. Ovreiu and D. Simon, "Biogeography-Based Optimization of Neuro-Fuzzy System Parameters for Diagnosis of Cardiac Disease," Genetic and Evolutionary Computation Conference, Portland, Oregon, pp. 1235-1242, July 2010
• J. Hopfield, "Neural Networks and Physical Systems with Emergent Collective Computational Abilities," 1982
• P. Simpson, Artificial Neural Systems, Pergamon Press, 1990
• L. Fausett, Fundamentals of Neural Networks, Prentice Hall
• www.scholarpedia.org/article/Kohonen_network
• J.-S. Jang, C.-T. Sun, and E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, 1997