
Ministry of Education and Science of Ukraine

Sumy State University

4324 METHODOLOGICAL INSTRUCTIONS

for practical training

in “Modelling of Neural Networks”

for students of the speciality

8.04030101 “Applied Mathematics”

Qualification Master level

Full-time training

Sumy

Sumy State University

2017


Methodological instructions for practical training in “Modelling of Neural Networks” / compiler I. A. Knyaz’. – Sumy : Sumy State University, 2017. – 54 p.

Department of Applied Mathematics and Complex Systems Modelling


CONTENTS

DESIGNING AND TRAINING A PERCEPTRON
    The Perceptron Training Rule
    Gradient Descent and the Delta Rule
    Creating and Training a Perceptron (C++)
    Creating and Training a Perceptron with the NNTool
USING MATLAB FOR CLASSIFICATION OF LINEARLY SEPARABLE DATA
    Classification of a 2-Class Problem with a Perceptron
    Classification of a 4-Class Problem with a Perceptron
    Prepare Inputs & Outputs for Perceptron Training
    Creation and Training Perceptron
APPROXIMATION OF FUNCTIONS BY NEURAL NETWORKS
    Data Preparation
    Network Design
    Network Training
    Network Testing
    Conclusion
FUNCTION APPROXIMATION WITH RBFN
    Structure of RBF Neural Networks
    Example: Approximation with RBF
PATTERN RECOGNITION WITH NEURAL NETWORKS
    Data Preparation
    Network Design
    Network Training
    Network Testing
    Drawing the Results
HOPFIELD NEURAL NETWORK WITH IMPLEMENTATION IN MATLAB AND C
    The Hopfield Model
    Operation of the Hopfield Network
    Designing and Training the Hopfield Net. Implementation in C
    Designing of a Hopfield Network. Implementation in Matlab
COMPETITIVE NETWORKS – THE KOHONEN SELF-ORGANISING MAP
    Architecture of the Kohonen Network
    The Kohonen Network in Operation
    Training the Kohonen Network
    Example: Data Clustering


DESIGNING AND TRAINING A PERCEPTRON

The perceptron supports a wide range of activation functions. For a wide variety of problems it is convenient to choose the sign function as the activation function:

$$\operatorname{sgn}(y) = \begin{cases} 1, & y > 0, \\ -1, & y \le 0. \end{cases}$$

A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold and –1 otherwise.

More precisely, given inputs $x_1$ through $x_n$, the output $o(x_1, \ldots, x_n)$ computed by the perceptron is

$$o(x_1, \ldots, x_n) = \operatorname{sgn}\!\left(w_0 + w_1 x_1 + \cdots + w_n x_n\right),$$


where $w_i$ is the weight that determines the contribution of input $x_i$ to the perceptron output; the quantity $-w_0$ plays the role of the threshold.

Learning a perceptron involves choosing values for the weights $w_0, \ldots, w_n$. Therefore, the hypothesis space in perceptron learning is the set of all possible real-valued weight vectors.

A single perceptron can be used to represent many Boolean functions. For example, if we assume Boolean values of 1 (true) and –1 (false), then one way to use a two-input perceptron to implement the AND function is to set the weights w0 = –0.8, and w1 = w2 = 0.5.
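As a quick worked check of these weights (an added illustration):

$$o(1,1) = \operatorname{sgn}(-0.8 + 0.5 + 0.5) = \operatorname{sgn}(0.2) = 1, \qquad o(1,-1) = \operatorname{sgn}(-0.8 + 0.5 - 0.5) = \operatorname{sgn}(-0.8) = -1,$$

so the output is 1 (true) only when both inputs are true.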

In fact, AND and OR can be viewed as special cases of m-of-n functions: that is, functions where at least m of the n inputs to the perceptron must be true. However, some Boolean functions cannot be represented by a single perceptron, such as the XOR function (Figure 1).

Figure 1 – The decision surface represented by a two-input perceptron. x1 and x2 are the perceptron inputs. (a) A set of training examples and the decision surface of a perceptron that classifies them correctly. (b) A set of training examples that is not linearly separable

THE PERCEPTRON TRAINING RULE

The precise learning problem is to determine a weight vector that causes the perceptron to produce the correct +1, –1 output for each of the given training examples.


One way to learn an acceptable weight vector is:
1) begin with random weights;
2) iteratively apply the perceptron to each training example;
3) modify the perceptron weights whenever the perceptron misclassifies an example;
4) repeat this process until the perceptron classifies all training examples correctly.

Weights are modified at each step according to the perceptron training rule, which revises the weight $w_i$ associated with input $x_i$:

$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta\,(t - o)\,x_i,$$

where t is the target output for the current training example, o is the output generated by the perceptron, and $\eta$ is a positive constant called the learning rate.
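A minimal MATLAB sketch of this rule (illustrative only; X is assumed to be an n-by-m matrix of training inputs and T a 1-by-m vector of +1/-1 targets):

% Perceptron training rule (sketch): repeat passes over the data,
% updating the weights only when an example is misclassified.
eta = 0.1;                          % learning rate
w = zeros(size(X,1)+1, 1);          % weights; w(1) plays the role of w0
for epoch = 1:100
    for k = 1:size(X,2)
        xk = [1; X(:,k)];           % prepend the constant input x0 = 1
        o = sign(w'*xk);            % perceptron output
        if o == 0, o = -1; end      % treat the boundary case as -1
        w = w + eta*(T(k) - o)*xk;  % perceptron training rule update
    end
end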

GRADIENT DESCENT AND THE DELTA RULE

Although the perceptron rule finds a successful weight vector when the training examples are linearly separable, it can fail to converge if the examples are not linearly separable.

Gradient descent searches the hypothesis space of possible weight vectors, even when the training examples are not linearly separable, to find the weights that best fit the training examples.

The training error measures the difference between the targets and the outputs. Mathematically it is defined as

$$E(\vec{w}) = \frac{1}{2}\sum_{d \in D}\left(t_d - o_d\right)^2,$$

where D is the set of training examples, $t_d$ is the target output for training example d, and $o_d$ is the output of the linear unit for training example d.

Gradient descent algorithms search along the direction of steepest descent on the error surface. Such an algorithm determines a


weight vector that minimizes E by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps.

Linear units have a single global minimum in this error surface. Gradient descent continues the search until the global minimum of the error is reached.

Each training example is a pair of the form (x, t), where x is the vector of input values and t is the target output value. Differentiating E with respect to each weight gives the gradient-descent (delta) rule for updating the weights:

$$\Delta w_i = \eta \sum_{d \in D}\left(t_d - o_d\right) x_{id},$$

where $\eta$ is the learning rate and $x_{id}$ is the value of input $x_i$ for training example d. Thus each weight is updated in proportion to the error between target and output and to the learning rate.
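A minimal MATLAB sketch of batch gradient descent with this rule (illustrative; X is an n-by-m input matrix, T a 1-by-m target vector, and the unit is linear):

% Batch gradient descent (delta rule) for a linear unit: all examples are
% presented before the weights are updated.
eta = 0.01;                 % learning rate
w = zeros(size(X,1), 1);
for epoch = 1:1000
    O = w'*X;               % linear unit outputs for all training examples
    grad = -X*(T - O)';     % gradient of E = 0.5*sum((T - O).^2)
    w = w - eta*grad;       % step against the gradient
end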

CREATING AND TRAINING A PERCEPTRON (C++)

Let us look at logic table 1 for x1 AND x2.

Table 1 – Logic table

x1 x2 x1 AND x2

0 0 0

0 1 0

1 0 0

1 1 1

We can see that a neuron output is equal to 1 when both

inputs are activated. Let’s use a threshold of 0 (simple and convenient!) and set the inputs to –1 and 1.

The main function (C++) for training the perceptron:

#include <cstdio>

int main()
{
    int x0 = 1, d = 1, w0 = 0, w1 = 0, w2 = 0, i, j, yin, net;
    int x1[4], x2[4], t[4];
    i = 0;
    printf("Enter the truth table for the AND gate\n");
    printf("x1 \t x2 \t t \n");
    for (j = 0; j < 4; j++)
        scanf("%d%d%d", &x1[j], &x2[j], &t[j]);
    /* map 0/1 values to the bipolar representation -1/+1 */
    for (j = 0; j < 4; j++) {
        if (x1[j] == 0) x1[j] = -1;
        if (x2[j] == 0) x2[j] = -1;
        if (t[j] == 0)  t[j]  = -1;
    }
    yin = w0 + x1[0]*w1 + x2[0]*w2;
    yin = yin > 0 ? 1 : (yin == 0 ? 0 : -1);
    printf("\n yin=%d \n", yin);
    /* apply the training rule while the current example is misclassified */
    while (i < 4 && yin != t[i]) {
        printf("\nt=%d", t[i]);
        w0 = w0 + d*t[i]*x0;
        printf("\nw0=w0+d*t*x0=%d", w0);
        w1 = w1 + d*t[i]*x1[i];
        printf("\nw1=w1+d*t*x1=%d", w1);
        w2 = w2 + d*t[i]*x2[i];
        printf("\nw2=w2+d*t*x2=%d", w2);
        printf("\nNew matrix of weights is {%d %d %d}", w0, w1, w2);
        i++;
        if (i < 4) {
            yin = w0 + x1[i]*w1 + x2[i]*w2;
            yin = yin > 0 ? 1 : (yin == 0 ? 0 : -1);
            printf("\n yin=%d", yin);
        }
    }
    if (i >= 4 || yin != t[i]) {
        printf("\nPerceptron can't be trained\n");
    } else {
        printf(" t=%d", t[i]);
        printf("\n x1 \t x2 \t NET");
        for (i = 0; i <= 3; i++) {
            net = x0*w0 + x1[i]*w1 + x2[i]*w2;
            net = net < 0 ? 0 : 1;
            if (x1[i] == -1) x1[i] = 0;
            if (x2[i] == -1) x2[i] = 0;
            printf("\n %d \t %d \t %d", x1[i], x2[i], net);
        }
        printf("\nPerceptron is trained successfully\n");
    }
    return 0;
}

CREATING AND TRAINING PERCEPTRON WITH THE NNTOOL

We will now create and train a perceptron to recognize the following function:

Table 2 – Logic table

x1 x2 function (OR)

0 0 0

0 1 1

1 0 1

1 1 1

The match between input pattern and output pattern is given by the following:

Input pattern: [0 0 1 1; 0 1 0 1]
Matching output pattern: [0 1 1 1]


Our first job is to get the information into MATLAB via the command window.

Type

p = [0 0 1 1; 0 1 0 1]
t = [0 1 1 1]

in the command window. Open the Neural Network toolbox [matlab start > toolboxes > neural network > NNTool] and click on NNTool. You can now see a GUI which will allow you to set up a network – in our case a perceptron. We want to import data from the workspace, so click the import button in the Networks and data box. You are then given a choice of where to import from – we want the workspace. Select p and import these values as inputs. Now repeat to import t as targets. When done you return to the GUI and see p and t in the correct panes.

Click on “new network”. Leave the name as it is, but choose perceptron from the drop-down list as the network type and create a perceptron network. Get the input range from input p and leave the other values in the GUI as they are. Click “create”.

Back at the NNTool GUI, select your network in the network pane and then click on the “adapt” button in the networks box. You need to set the inputs and outputs in the window (p and t respectively) and then set the adapt parameters (how many passes through the data – leave at 1 for now). Then clicking “adapt network” will make the network follow the adapt rule when altering weights and biases.

Go back to the manager and view the output and error values to see if the training has worked. If the training has not worked, try “adapt network” again. How many times do you have to run through the data to get the network to recognize the patterns? By doing 3 passes at a time you could have shortened the process – remember this is a controllable parameter when you construct a network.
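The same steps can also be carried out directly from the command window; the following sketch uses the classic Neural Network Toolbox functions newp, adapt and sim (parameter values are illustrative):

% Command-line equivalent of the NNTool session above (a sketch)
p = [0 0 1 1; 0 1 0 1];        % input patterns
t = [0 1 1 1];                 % OR targets
net = newp(p,t);               % create a perceptron
net.adaptParam.passes = 3;     % passes through the data per call to adapt
net = adapt(net,p,t);          % adjust weights and biases with the adapt rule
y = sim(net,p)                 % compare against t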


Now that we have a network that works we might want to keep it – we don’t want to create it every time. How can we do this? Export it to the workspace as a first step. Highlight the network in the network manager and then click the “export” button – then highlight again and click “export” again.

Now that we have the network in the workspace we can check that it works directly by using the sim function. The sim function evaluates the effect of a network on a set of input data – we will use it frequently when we have a trained network to calculate the network output with new data.

Type sim(network1,p) in the command window and check the result against t.

Now if we save the workspace we keep the network – do this.

If you click the workspace tab in the top left hand pane of the MATLAB window you can see all your variables. These can be inspected by double clicking on them. Click on each in turn to make sure that they contain what you expect.

We can access the internal pieces of this network:

w = network1.IW{1,1}
b = network1.b{1}

Now we can compute with the matrix values that are

there – but first notice how many rows there are in w and compare to the number of neurons in the perceptron layer. Compare the number of columns in w with the input size.

Check that

hardlim(w*[1;2] + b)
sim(network1,[1;2])

have the same output as each other.


Let us get some more data into our workspace and create a new network to make sure we have the complete idea.

We will choose input vectors of size 3

p = [1 2 3 4 5; 2 3 4 5 6; -4 -4 -5 -10 7]

and outputs of size 2 (so we need a perceptron layer with two perceptrons when we create the network)

t = [0 1 1 0 1; 1 0 0 1 0]

Delete all the old information in the network manager and import the new p and t values. Create and train a neural network (network2) which learns this input-output pattern (remember to create a perceptron network and that you need 2 neurons).

View your network and look at the matrix values associated with it. Again check the relationship between the number of rows in w and the number of neurons in the network and the number of columns in w and the size of the input vector.

w = network2.IW{1,1}
b = network2.b{1}

Now compare

hardlim(w*p(:,1) + b)

with

sim(network2,p(:,1))

Check that network2 computes the same value on p(:,x) as hardlim(w*p(:,x) + b) does for each column of p.


USING MATLAB FOR CLASSIFICATION OF LINEARLY SEPARABLE DATA

CLASSIFICATION OF A 2-CLASS PROBLEM WITH A PERCEPTRON

Two clusters of data, belonging to two classes, are defined in a 2-dimensional input space. The classes are linearly separable. The task is to construct a perceptron for the classification of the data.

Define the input and output data:

% number of samples of each class
N = 20;
% define inputs and outputs
offset = 3;                         % offset for second class
x = [randn(2,N) randn(2,N)+offset]; % inputs
y = [zeros(1,N) ones(1,N)];         % outputs
% plot input samples with PLOTPV (plot perceptron input/target vectors)
figure(1)
plotpv(x,y);

Create and train the perceptron:

net = newp(x,y);
net = train(net,x,y);

Plot the decision boundary:

figure(1)
plotpc(net.IW{1},net.b{1});

The result is shown in Figure 2.


CLASSIFICATION OF A 4-CLASS PROBLEM WITH A PERCEPTRON

A perceptron network with 2 inputs and 2 outputs is trained to classify input vectors into 4 categories.

Define the data:

% number of samples of each class
K = 30;
% define classes
off = .7;                             % offset of classes
cl1 = [rand(1,K)-off; rand(1,K)+off];
cl2 = [rand(1,K)+off; rand(1,K)+off];
cl3 = [rand(1,K)+off; rand(1,K)-off];
cl4 = [rand(1,K)-off; rand(1,K)-off];
% plot classes
plot(cl1(1,:),cl1(2,:),'bs')
hold on
plot(cl2(1,:),cl2(2,:),'g+')
plot(cl3(1,:),cl3(2,:),'ro')
plot(cl4(1,:),cl4(2,:),'m*')
% text labels for classes
text(.5-off,.5+2*off,'Class 1')
text(.5+off,.5+2*off,'Class 2')
text(.5+off,.5-2*off,'Class 3')
text(.5-off,.5-2*off,'Class 4')
% define output coding for classes
class1 = [0 1]';
class2 = [1 1]';
class3 = [1 0]';
class4 = [0 0]';


PREPARE INPUTS & OUTPUTS FOR PERCEPTRON TRAINING

% define inputs (combine samples from all four classes)
P = [cl1 cl2 cl3 cl4];
% define targets
T = [repmat(class1,1,length(cl1)) repmat(class2,1,length(cl2)) ...
     repmat(class3,1,length(cl3)) repmat(class4,1,length(cl4))];

CREATION AND TRAINING PERCEPTRON

net = newp(P,T);

ADAPT returns a new network object that performs as a better classifier, the network output, and the error. The following loop lets the network adapt, plots the classification line, and continues until the error is zero (or 900 passes have been made):

E = 1;
net.adaptParam.passes = 1;
linehandle = plotpc(net.IW{1},net.b{1});
n = 0;
while (sse(E) && n < 900)
    n = n + 1;
    [net,Y,E] = adapt(net,P,T);
    linehandle = plotpc(net.IW{1},net.b{1},linehandle);
    drawnow;
end

Perceptron simulation experiment:

% for example, classify an input vector of [0.7; 1.2]
p = [0.7; 1.2]
y = sim(net,p)
% compare the response with the output coding


p =
    0.7000
    1.2000

y =
     1
     1

Figure 2 – Classification of a 2-class problem with a perceptron

Figure 3 – Classification of a 4-class problem with a perceptron


APPROXIMATION OF FUNCTIONS BY NEURAL NETWORKS

The so-called function approximation problem is to find a mapping $f_1$ satisfying $\|f_1(x) - f(x)\| < \varepsilon$, where $\varepsilon$ is the tolerance and $\|\cdot\|$ can be any error measure. In general, it is enough to have a single layer of nonlinear neurons in a neural network in order to approximate a nonlinear function. The goal of this work is to build a feedforward neural network that approximates the following function:

$$z = \cos(x + 2y) + x y^2.$$

DATA PREPARATION

For this function approximation problem, three kinds of data sets are prepared, namely the training set, the validation set and the test set. The training set is a set of value pairs which comprise information about the target function for training the network. The validation set is associated with the early stopping technique. During the training phase, the validation error is monitored in order to prevent the network from overfitting the training data. Normally, the test set is just used to evaluate the network performance afterwards. But, in this exercise the root mean-square error on the test set is used as the performance goal of the network training.

For the current problem, the training and the test data are taken from uniform grids (10 x 10 pairs of values for the training data, 9 x 9 pairs for the test data). So, it is not necessary to scale the target function. For the validation data, in order to make it a better representation of the original function, it is taken randomly from the function surface.


The function for generating the data:

function [train_input,train_target,test_input,test_target,val_input,val_target] = generate_data()
train_x = -1:2/9:1;
train_y = train_x;                  % training data
test_x = (-1+1/9):2/9:(1-1/9);      % test data [-1 1]
test_y = test_x;                    % test data
val_x = premnmx(rand(1,50));        % validation data [-1 1]
val_y = val_x;                      % validation data
[train_X, train_Y] = meshgrid(train_x, train_y);
[test_X, test_Y] = meshgrid(test_x, test_y);
[val_X, val_Y] = meshgrid(val_x, val_y);
% function output is within [-0.9 0.9], so there is no need to scale the function
train_Z = cos(train_X + 2*train_Y) + train_X.*train_Y.^2;   % training target
test_Z = cos(test_X + 2*test_Y) + test_X.*test_Y.^2;        % test target
val_Z = cos(val_X + 2*val_Y) + val_X.*val_Y.^2;             % validation target
% plot the function
[X,Y] = meshgrid(-1:.2:1,-1:.2:1);
Z = cos(X + 2*Y) + X.*Y.^2;
figure, subplot(1,2,1);
surfc(X,Y,Z);   % plot parametric surface
% return inputs [-1 1] and outputs [-0.8 0.8]
train_input = [train_X(:)'; train_Y(:)'];
train_target = train_Z(:)';
test_input = [test_X(:)'; test_Y(:)'];
test_target = test_Z(:)';
val_input = [val_X(:)'; val_Y(:)'];
val_target = val_Z(:)';


NETWORK DESIGN

Theoretical results indicate that given enough hidden units, a feedforward neural network can approximate any non-linear functions (with a finite number of discontinuities) to a required degree of accuracy. In other words, any non-linear function can be expressed as a linear combination of non-linear basis functions. Therefore, a two-layer feedforward neural network with one layer of non-linear hidden neurons and one linear output neuron seems a reasonable design for a function approximation task. The target function as defined above has two inputs (x, y), and one output (z = f(x,y)). Thus, the network solution consists of two inputs, one layer of tansig (Tan-Sigmoid transfer function) neurons and one purelin (linear transfer function) output neuron.

The number of the hidden neurons is an important design issue. On the one hand, having more hidden neurons allows the network to approximate functions of greater complexity. But, as a result of network’s high degree of freedom, it may overfit the training data while the unseen data will be poorly fit to the desired function. On the other hand, although a small network won’t have enough power to overfit the training data, it may be too small to adequately represent the target function.

The function for creating the network:

function net = create_network()
% getInput is an auxiliary routine (not listed in these instructions) that
% presumably prompts the user and returns the bracketed default when nothing is entered
num_h = getInput('Size of the hidden layer[8] -> ',8);
transFcn_h = getInput('Transfer function of the hidden layer[tansig]-> ','tansig','s');
transFcn_o = getInput('Transfer function of the output layer[purelin]-> ','purelin','s');
% create the network based on the user's choice
net = newff([-1 1; -1 1],[num_h 1],{transFcn_h,transFcn_o});


NETWORK TRAINING

In general, we can train a network in one of two styles: batch training or incremental training. In batch training, the weights and biases of the network are only updated after all of the inputs have been presented to the network, while in incremental (on-line) training the network parameters are updated each time an input is presented to it. Batch training is supposed to work faster and reasonably well on a static network.
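In the classic toolbox this distinction corresponds roughly to the functions train and adapt; a small sketch (network and data names are illustrative):

% Batch training: weights are updated once per pass over the whole data set
net = train(net, P, T);
% Incremental (on-line) training: weights are updated after each presented
% input when the data are passed as a sequence (cell arrays)
net = adapt(net, num2cell(P,1), num2cell(T,1));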

There are a number of batch training algorithms which can be used to train a network. In this exercise, the following four training algorithms are examined.

trainbfg implements the BFGS (Broyden-Fletcher-Goldfarb-Shanno) quasi-Newton algorithm, which is based on Newton's method. Generally, it converges in a few iterations. However, for very large networks trainbfg may not be a good choice because of its computation and memory overhead. For small networks, however, trainbfg can still be an efficient training function.

traingd implements a basic gradient descent algorithm. It updates weights and biases in the direction of the negative gradient of the performance function. The major drawback of traingd is that it is relatively slow (especially when the learning rate is small) and has a tendency to get trapped in local minima of the error surface (where the gradient is zero).

traingdm improves traingd by using momentum during the training. Momentum allows a network to ignore shallow local minima of the error surface. In addition, traingdm often provides faster convergence than traingd.

trainlm implements the Levenberg-Marquardt algorithm, which works in such a way that performance function will always be reduced at each iteration of the algorithm. This feature makes trainlm the fastest training algorithm for networks of moderate size. Similar to trainbfg, trainlm suffers from the


memory and computation overhead caused by the calculation of the approximated Hessian matrix and the gradient.

In order to examine the performance of the training functions mentioned above, they are applied to the two-layer feedforward network with the same performance goal (MSE = 0.02 for the training set), maximum number of epochs to train (100) and learning rate (0.02), without using early stopping. Within 100 epochs, trainbfg and trainlm achieve the performance goal while traingd and traingdm fail. As it turned out, trainbfg and trainlm spend more time in each epoch than the gradient descent algorithms, which is the result of their computation overhead. Although more time is spent in each epoch, the total time spent by trainbfg and trainlm to reach the goal is less.

If the size of the network is too large, it runs the risk of overfitting the training set and losing its generalization ability for unseen data. One method for improving network generalization is to use a network that is just large enough to provide an adequate fit to the target function. But sometimes it is hard to know beforehand how large a network should be for a specific application. One commonly used technique for improving network generalization is early stopping. This technique monitors the error on a subset of the data (the validation data) that does not actually take part in the training. The training stops when the error on the validation data increases for a certain number of iterations.
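A sketch of how early stopping can be switched on with the old train calling convention used later in this manual (the validation structure is passed as the sixth argument; names follow the functions below):

% Early stopping sketch: val.P / val.T hold the validation data
val.P = val_input;
val.T = val_target;
net.trainParam.max_fail = 10;                        % maximum validation failures
net = train(net, train_input, train_target, [], [], val);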

In order to examine the effect of early stopping on the training process, a randomly generated validation set is used during the trainlm training (maximum validation failures = 10, Erms = 0.02 for the test set). As it turned out, the early stopping mechanism is not triggered during the training, because the validation error keeps decreasing during the whole training process. Both networks (trained with and without early


stopping) work equally well on the current approximation problem.

Normally, the test set does not take part in the training process. However, in this exercise it is required that the network be trained until Erms = 0.02 on the test set. The training process then goes as follows. Initially, the performance goal for the training set is set to a relatively large value (MSE = 0.02). Then, after each training run, the network is simulated and Erms on the test set is monitored. If Erms is larger than 0.02, training is resumed with a lower performance goal for the training set (e. g. decreased by a factor of 0.5). Otherwise, the training stops. In the current Matlab program, the performance of the trained network is evaluated using the test set. Actually, this may introduce some bias into the result, because the test set is effectively used in the training phase. So it would be better if some other randomly generated data were used for testing the network performance.

The function for training the network

function [error,network_output] = train_network(net,train_input,train_target,test_input,test_target,val_input,val_target)
val.P = val_input;
val.T = val_target;
test.P = test_input;
test.T = test_target;
% ask the user for the training parameters
epoch = round(getInput('Maximum number of epochs to train [5000]: ', 5000)); % maximum number of epochs
Lr = getInput('Learning rate [.02]: ', .02);                     % learning rate
trainFcn = getInput('Training function [trainlm]-> ','trainlm','s'); % training function (Automated Regularization (trainbr))
net.trainFcn = trainFcn;
net.trainParam.lr = Lr;
net.trainParam.epochs = epoch;
net.trainParam.show = 40;       % epochs between displays
net.trainParam.goal = 0.02;     % mean-squared error goal
stop_crit = getInput('Use early stopping ? y/n [n]:', 'n', 's');
erms = 1;
% Training...
if (stop_crit == 'n')           % no stop criterion
    tic,                        % start a stopwatch timer
    while erms > 0.02
        net = train(net,train_input,train_target,[],[],[],test);
        network_output = sim(net,test_input);
        error = test_target - network_output;
        erms = sqrt(mse(error))   % root mean-square error
        net.trainParam.goal = net.trainParam.goal*0.5;
    end
    toc;                        % print the elapsed time since tic was used
else                            % use early stopping
    tic,
    net.trainParam.max_fail = getInput('Maximum validation failures [10]:', 10);
    while erms > 0.02
        net = train(net,train_input,train_target,[],[],val,test);
        network_output = sim(net,test_input);
        error = test_target - network_output;
        erms = sqrt(mse(error))   % root mean-square error
        net.trainParam.goal = net.trainParam.goal*0.5;
    end
    toc;
end


NETWORK TESTING

After the training process, the performance of the trained network is evaluated by applying unseen data to it and checking whether its outputs are still close to the targets. We can use the Matlab routine postreg to measure the network performance; it implements a regression analysis between the network response and the corresponding targets.
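A minimal usage sketch (variable names follow the functions listed in this section):

% Regression analysis of the trained network's response against the targets
network_output = sim(net, test_input);
[M, B, R] = postreg(network_output, test_target);   % slope, intercept, correlation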

The function to create displays of the function surface and level curves:

function plot_result(input,target,network_output,error)
X = reshape(input(1,:),9,9);
Y = reshape(input(2,:),9,9);
Z = reshape(target,9,9);
No = reshape(network_output,9,9);
E = reshape(error,9,9);
% plot function surface
figure, subplot(1,2,1);
surfc(X,Y,Z);
xlabel('X'); ylabel('Y'); zlabel('Z');
title('Target Function Surface');
subplot(1,2,2);
surfc(X,Y,No);
title('Approximated Function Surface');
% plot level curves
% create level curves of the error
figure,
[C,h] = contour(X, Y, E);
clabel(C,h);
title('level curve of the error')
figure,


[C,h1] = contour(X, Y, Z,'k');    % create level curve of the target
set(h1,'LineWidth',2);
% clabel(C,h);
hold on
[C,h2] = contour(X, Y, No,'m');   % create level curve of the approximation
set(h2,'LineWidth',2);
hold off
legend([h1(1);h2(1)],'target','approximation');
title('level curves of the target and approximation functions')
% M – slope of the best linear regression; M = 1 means a perfect fit
% B – Y intercept of the best linear regression; B = 0 means a perfect fit
% R – regression R-value; R = 1 means perfect correlation
figure,   % create a new figure for displaying the performance
[M,B,R] = postreg(network_output,target);   % check the quality of the network training
fprintf('\n\tThe slope of the best linear regression[1]: %6.5f\n',M);
fprintf('\tThe Y intercept of the best linear regression[0]: %6.5f\n',B);
fprintf('\tThe correlation between the network output and the target[1]: %6.5f\n',R);

The main script that ties the steps together:

[train_input,train_target,test_input,test_target,val_input,val_target] = generate_data;
net = create_network;
[error,network_output] = train_network(net,train_input,train_target,test_input,test_target,val_input,val_target);
plot_result(test_input,test_target,network_output,error);


CONCLUSION

A two-layer network with two inputs, eight tansig hidden units and one purelin output unit is built for the approximation problem mentioned above. The network is trained by trainlm until the performance goal Erms = 0.02 is achieved for the test set. No early stopping is used during the training. The maximum number of epochs to train and the learning rate are set to be 5000 and 0.02 respectively.


FUNCTION APPROXIMATION WITH RBFN

STRUCTURE OF RBF NEURAL NETWORKS

In multi-layer perceptrons, the hidden neurons are based on linear basis function (LBF) nodes. Another type of hidden neurons is the radial basis function (RBF) neurons, which is the building block of the RBF neural networks. In an RBF network, each neuron in the hidden layer is composed of a radial basis function that also serves as an activation function. The weighting parameters in an RBF network are the centres and the widths of these neurons. The output functions are the linear combination of these radial basis functions.

Figure 4 – Structure of RBF neural networks: a D-dimensional input vector x(t) feeds M radial basis functions φ1(x(t)), …, φM(x(t)), arranged in groups 1, …, K; their weighted sums form the outputs y1(x(t)), …, yK(x(t))


A more general form of the RBF networks is the elliptical basis function (EBF) networks where the hidden neurons compute the Mahalanobis distance between the centers and the input vectors. It has been shown that RBF networks have the same asymptotic approximation power as multi-layer perceptrons.

To apply RBF/EBF networks for pattern classification, each class is assigned a group of hidden units, and each group is trained independently using the data from the corresponding class. Figure 4 depicts the architecture of an RBF/EBF network with D inputs, M basis functions (hidden nodes), and K outputs. The input layer distributes the D-dimensional input patterns, xt, to the hidden layer. Each hidden unit is a Gaussian basis function of the form

$$\varphi_j(\mathbf{x}_t) = \exp\!\left(-\frac{1}{2\sigma_j^2}\,(\mathbf{x}_t-\boldsymbol{\mu}_j)^{T}\boldsymbol{\Sigma}_j^{-1}(\mathbf{x}_t-\boldsymbol{\mu}_j)\right), \qquad j = 1,\ldots,M,$$

where $\boldsymbol{\mu}_j$ and $\boldsymbol{\Sigma}_j$ are the mean vector and covariance matrix of the j-th basis function respectively, and $\sigma_j$ is a smoothing parameter controlling the spread of the j-th basis function. The k-th output is a linear weighted sum of the basis functions' outputs, i.e.

$$y_k(\mathbf{x}_t) = w_{k0} + \sum_{j=1}^{M} w_{kj}\,\varphi_j(\mathbf{x}_t), \qquad t = 1,\ldots,N \ \text{ and } \ k = 1,\ldots,K,$$

where $\mathbf{x}_t$ is the t-th input vector and $w_{k0}$ is a bias term.

In matrix form, the last equation can be written as

$$\mathbf{Y} = \boldsymbol{\Phi}\mathbf{W},$$

where Y is an $N \times K$ matrix, $\boldsymbol{\Phi}$ an $N \times (M+1)$ matrix, and W is an $(M+1) \times K$ matrix. The weight matrix W is the least


squares solution of the matrix equation

$$\boldsymbol{\Phi}\mathbf{W} = \mathbf{D},$$

where D is an $N \times K$ target matrix containing the desired output vectors in its rows. As $\boldsymbol{\Phi}$ is not a square matrix, one reliable way to solve the last equation is to use the technique of singular value decomposition. In this approach the matrix $\boldsymbol{\Phi}$ is decomposed into the product $\mathbf{U}\mathbf{S}\mathbf{V}^{T}$, where U is an $N \times (M+1)$ column-orthogonal matrix, S is an $(M+1) \times (M+1)$ diagonal matrix containing the singular values, and V is an $(M+1) \times (M+1)$ orthogonal matrix. The weight vectors $\{\mathbf{w}_k\}_{k=1}^{K}$ are given by

$$\mathbf{w}_k = \mathbf{V}\mathbf{S}^{-1}\mathbf{U}^{T}\mathbf{d}_k,$$

where $\mathbf{d}_k$ is the k-th column of D. For an over-determined system, singular value decomposition gives a solution that is the best approximation in the least squares sense.
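A minimal MATLAB sketch of this least-squares solution (illustrative; Phi is assumed to be the N-by-(M+1) design matrix of basis-function outputs with a bias column, D the N-by-K target matrix):

% Solve Phi*W = D in the least squares sense via the singular value decomposition
[U, S, V] = svd(Phi, 'econ');      % thin SVD of the design matrix
W = V*(S\(U'*D));                  % equivalent to pinv(Phi)*D for full-rank Phi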

EXAMPLE: APPROXIMATION WITH RBF

Create a function approximation model based on a measured data set. Apply various Neural Network architectures based on Radial Basis Functions.

% data generator
X = 0:.1:40;
%f = abs(besselj(2,X*7).*asind(X/2) + (X.^1.95)) + 2;
f = sin(X)/10 + 2*exp(-X);
fig = figure;
plot(X,f,'b-')


hold on
grid on
% available data points
Ytrain = f + (rand(1,length(f))-.5)/10;
Xtrain = X([20:100 200:300]);
Ytrain = Ytrain([20:100 200:300]);
plot(Xtrain,Ytrain,'kx')
xlabel('x')
ylabel('y')
ylim([-0.2 1])
%---------------------------------
% choose a spread constant
spread = .2;

Figure 5 – The results of computer simulations


% choose max number of neurons
K = 50;
% performance goal (SSE)
goal = 0;
% number of neurons to add between displays
Ki = 5;
% create a neural network
net = newrb(Xtrain,Ytrain,goal,spread,K,Ki);
%---------------------------------
% simulate the network over the complete input range
Y = sim(net,X);
% plot network response
figure(fig)
plot(X,Y,'r')
legend('original function','available data','RBFN')


PATTERN RECOGNITION WITH NEURAL NETWORKS

The task is to build a multilayer feedforward network for pattern recognition. The network is trained as a character classifier for a collection of characters given as 7×5 black-and-white pixel maps. Ideally, the trained network can recognize the characters it has learnt even when some of them are distorted.

DATA PREPARATION

In general, two kinds of data are prepared for training and testing the network. One is the collection of twenty-six 35-element input vectors, which represent the target patterns: the 26 capital characters. The other part of the data is obtained by randomly reversing three bits of the original characters. This time, instead of using early stopping to improve the generalization ability of the network, the network is trained on both parts of the data mentioned above, which enables it to respond correctly to both ideal and partially corrupted patterns.

The function for generating the characters:

function [alphabet,targets] = generate_chars()
% GENERATE_CHARS – Create target patterns
%
% Returns:
%   alphabet – 35x26 matrix of 5x7 bit maps, one column per letter
%   targets  – 26x26 matrix of target vectors
[alphabet,targets] = prprob;   % capital characters
targets = eye(26);
% show the image of the alphabet
figure;
for i = 1:size(alphabet,2)
    subplot(4,8,i);
    imagesc(reshape(alphabet(:,i),5,7)',[0,1]);
    axis off;
end

The target patterns:

The function for generating distorted characters:

function noisy_alphabet = generate_charsn(alphabet,noise_level)
% GENERATE_CHARSN – Create distorted patterns
%
% Arguments:
%   alphabet    – 35x26 matrix of 5x7 bit maps, one column per letter
%   noise_level – number of bits which will be changed
% Returns:
%   noisy_alphabet – alphabet with noise
% add noise to the original alphabet
noisy_alphabet = alphabet;
if noise_level ~= 0
    size_image = length(alphabet(:,1));
    % choose noise_level random positions for each letter matrix
    for i = 1:size(alphabet,2)
        R(i,:) = round(rand(1,noise_level)*(size_image-1)+1) + (i-1)*size_image;
        while length(unique(R(i,:))) < noise_level
            % prevent the same random positions from being generated twice
            R(i,:) = round(rand(1,noise_level)*(size_image-1)+1) + (i-1)*size_image;
        end
    end
    % randomly change noise_level bits in each letter image: 0->1 and 1->0
    noisy_alphabet(R) = imcomplement(alphabet(R));
end

The distorted patterns:


NETWORK DESIGN

In principle, two-layer networks with sigmoidal hidden units can approximate arbitrarily well any continuous functional mapping from one finite-dimensional space to another, provided the number of hidden units is sufficiently large. The target patterns here are relatively simple: each is defined by only 35 Boolean values. Therefore, a two-layer feedforward network should be powerful enough for this character recognition task. As the 26 target characters are represented by 35-element input vectors, the neural network needs 35 input and 26 output neurons. The network receives 35 Boolean values that represent one character. It is then required to identify the character by giving an output vector whose element with the highest value indicates the class of the input character. The logsig (Log-Sigmoid) transfer function is chosen for both the hidden and the output layer, because it has a suitable output range ([0 1]) for the current problem. The number of hidden neurons is set to 15.

Another important design issue is the choice of the initial weights and biases. In general, weights and biases should be initialized to small values so that the active region of each neuron is not close to the saturated part of the transfer function; otherwise the network will not be able to learn. When the Matlab routine newff is used to create a network, each layer's weights and biases are initialized automatically. In the program, the automatically created layer weights from the hidden layer to the output layer and the bias of the output layer are scaled down by a factor of 0.01.

The function for creating a feed-forward backpropagation network with one hidden layer:

function net = create_network(input,target)
% Arguments:
%   input  – network inputs
%   target – target values
% Returns:
%   net    – the created network object
[S2,Q] = size(target);
% create the network
net = newff(minmax(input),[15 26],{'logsig' 'logsig'});
% scale down weights and bias
net.LW{2,1} = net.LW{2,1}*0.01;
net.b{2} = net.b{2}*0.01;

NETWORK TRAINING

After the network is created, it is ready for training. A gradient descent training function with momentum and adaptive learning rate (traingdx) is chosen to train the network. For the pattern recognition task it is important that noisy patterns can still be correctly classified. Thus, in order to make the network insensitive to the presence of noise, it is trained not only on ideal patterns but also on noisy patterns. In the program, a three-step training process is implemented. In the first step, the network is trained on the ideal data for zero decision errors. In the second step, the network is trained on noisy data for several passes with a suitable performance goal (0.01 is used in the program). Unfortunately, after the network is trained to recognize noisy patterns, it will probably “forget” the noise-free patterns it has learnt before. Therefore, in order to remind the network of these non-distorted characters, in the final step it is trained again on just the ideal data for zero decision errors. The three-step training process mentioned above enables the trained


network to identify both noise-free and noisy characters (within certain error tolerance).

function [net,netn] = train_network(net,input,target)
% Arguments:
%   net    – neural network
%   input  – input matrix
%   target – desired output matrix
% Returns:
%   net    – network trained on the ideal input
%   netn   – network trained on noisy input
net.trainFcn = 'traingdx';
net.trainParam.epochs = 5000;
net.trainParam.show = 40;     % epochs between displays
net.trainParam.goal = 0;      % mean-squared error goal
net.trainParam.mc = 0.95;     % momentum constant
% Training...
% 1: train the network without noise
[net,tr] = train(net,input,target);
fprintf('Strike any key to train the network with noise...\n');
pause
% A copy of the network will now be made. This copy will
% be trained with noisy examples of letters of the alphabet.
netn = net;
% 2: train another network with noise
% netn will be trained on all sets of noisy letters
netn.trainParam.goal = 0.01;
for pass = 1:20
    % create noisy input by distorting 3 bits of every original character matrix
    noisy_input = generate_charsn(input,3);
    [netn,tr] = train(netn,noisy_input,target);
end
% 3: netn is now retrained without noise to
% ensure that it correctly categorizes non-noisy letters
netn.trainParam.goal = 0;
[netn,tr] = train(netn,input,target);

NETWORK TESTING

Once the network is trained, the test data which consist of

both noise-free and slightly distorted patterns are fed to the network to check the training result. Here, the average recognition error rate is used as the performance measure.

function [error,errorn,noise_range,noisy_input,outputn] = test_network(net,netn,alphabet,targets)
% TEST_NETWORK – Evaluate the performance of the trained networks
% by their average recognition errors.
%
% Arguments:
%   net      – network trained without noise
%   netn     – network trained with noise
%   alphabet – 35x26 matrix of 5x7 bit maps, one column per letter
%   targets  – target values
% Returns:
%   error       – average error of the network trained without noise
%   errorn      – average error of the network trained with noise
%   noise_range – noise levels
%   noisy_input – distorted patterns with the highest noise level
%   outputn     – output given by netn
% SET TESTING PARAMETERS
noise_range = 0:3;
max_test = 100;
error = [];
errorn = [];
T = targets;
% PERFORM THE TEST
for noise_level = noise_range
    fprintf('Testing networks with %d bits of noise\n',noise_level);
    e = 0; en = 0;
    for i = 1:max_test
        P = generate_charsn(alphabet,noise_level);
        noisy_input = P;
        % TEST THE NETWORK TRAINED WITHOUT NOISE
        A = sim(net,P);
        AA = compet(A);
        e = e + sum(sum(abs(AA-T)))/2;
        % TEST THE NETWORK TRAINED WITH NOISE
        An = sim(netn,P);
        AAn = compet(An);
        en = en + sum(sum(abs(AAn-T)))/2;
    end
    % AVERAGE ERRORS OVER max_test SETS OF ALL TARGET VECTORS
    error = [error e/size(T,2)/max_test]
    errorn = [errorn en/size(T,2)/max_test]
end
% output of netn when the input patterns are distorted with the highest noise_level
result = full(AAn);
outputn = alphabet;
for i = 1:size(result,2)
    index = find(result(:,i));
    outputn(:,i) = alphabet(:,index);
end


DRAWING THE RESULTS

function plot_result(error,errorn,noise_range,noisy_input,outputn)
% Arguments:
%   error       – average error of the network trained without noise
%   errorn      – average error of the network trained with noise
%   noise_range – noise levels
%   noisy_input – distorted patterns with the highest noise level
%   outputn     – output given by netn
% Plot the percentage of recognition errors of the two networks
% for varying levels of noise.
figure,
plot(noise_range,error*100,'--k',noise_range,errorn*100,'r','LineWidth',2);
xlabel('Noise Level');
ylabel('Percentage of Recognition Errors');
legend('trained without noise','trained with noise');
% plot the noisy inputs and the outputs given by the network trained on noisy data
figure,
for i = 1:size(noisy_input,2)
    subplot(4,8,i);
    colormap('summer');
    imagesc(reshape(noisy_input(:,i),5,7)',[0,1]);
    axis off;
end
figure
for i = 1:size(outputn,2)
    subplot(4,8,i);
    colormap('summer');
    imagesc(reshape(outputn(:,i),5,7)',[0,1]);
    axis off;
end


HOPFIELD NEURAL NETWORK WITH IMPLEMENTATION IN MATLAB AND C

THE HOPFIELD MODEL

Figure 6 – Structure of Hopfield neural network

– A fully connected network with binary (0/1, or +1/-1) inputs and outputs.
– Symmetrically weighted (wij = wji).
– Nodes perform a weighted sum with a hard limiting (step) transfer function.
– The output of each node is fed back to the others.
– The input is applied to all nodes simultaneously and the network is left to stabilize.
– The outputs from the nodes in the stable state form the output of the network.
– When presented with an input pattern, it outputs the stored pattern nearest to the presented pattern.
– Good as a content addressable memory and for solving optimization problems.

OPERATION OF THE HOPFIELD NETWORK

The Hopfield network has no learning algorithm as such. Patterns (or facts) are simply stored by setting weights to lower the network energy.

The teaching stage. The connection weights are set using the exemplar patterns from all classes according to the equation

$$w_{ij} = \begin{cases} \displaystyle\sum_{s=0}^{M-1} x_i^{s}\, x_j^{s}, & i \neq j, \\[2mm] 0, & i = j, \end{cases} \qquad 0 \le i, j \le N-1,$$

where $w_{ij}$ is the connection weight between node i and node j, $x_i^{s}$ (either +1 or -1) is element i of the exemplar pattern for class s, and M is the number of pattern classes.

The result of the teaching stage is the association of a pattern with itself.
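A compact MATLAB sketch of this teaching stage (illustrative; P is assumed to be an N-by-M matrix whose columns are the +1/-1 exemplar patterns):

% Hopfield weight matrix: sum of outer products of the exemplars, zero diagonal
W = P*P';                          % w_ij = sum over classes s of x_i^s * x_j^s
W(logical(eye(size(W)))) = 0;      % enforce w_ii = 0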

The recognition stage. The output of the net is forced to match that of the imposed unknown pattern:

$$\mu_i(0) = x_i, \qquad 0 \le i \le N-1,$$

where $\mu_i(t)$ is the output of node i at time t.


The net is then allowed to iterate freely in discrete time steps until it converges (the output no longer changes):

$$\mu_i(t+1) = f\!\left(\sum_{\substack{j=0 \\ j \neq i}}^{N-1} w_{ij}\,\mu_j(t)\right), \qquad 0 \le i \le N-1.$$

The transfer function f is the step function.

The autoassociation of patterns means that presentation of a corrupt or incomplete input pattern will result in the reproduction of the original pattern as output. The network thus works as a content addressable memory (CAM).
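A sketch of the recall iteration in MATLAB (illustrative; W is the weight matrix above, x0 the imposed +1/-1 pattern, and the step function is approximated by sign with ties broken upwards):

% Synchronous recall: iterate until the output no longer changes
mu = x0;
for t = 1:100
    mu_new = sign(W*mu);                 % weighted sums through the step function
    mu_new(mu_new == 0) = 1;             % tie-breaking assumption
    if isequal(mu_new, mu), break; end   % converged
    mu = mu_new;
end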

DESIGNING AND TRAINING THE HOPFIELD NET. IMPLEMENTATION IN C

The following program implements the above algorithm for designing and training the Hopfield net:

#include <stdio.h>

int main()
{
    int a[5][5], at[5][5], w[5][5], n, i, j, x[5][5], y[5][5], yin;
    n = 5;   /* the number of elements in the stored vector */
    printf("Enter the stored vector: ");
    for (i = 0; i < n; i++)
        scanf("%d", &a[0][i]);
    /* convert 0/1 input to the bipolar representation -1/+1 */
    for (i = 0; i < n; i++)
        if (a[0][i] == 0)
            a[0][i] = -1;
    for (i = 0; i < n; i++)
        at[i][0] = a[0][i];
    /* weight matrix as the outer product of the stored vector with itself */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            w[i][j] = at[i][0] * a[0][j];
    for (i = 0; i < n; i++)
        w[i][i] = 0;   /* zero the diagonal: w[i][i] = 0 */
    printf("The weight matrix is");
    for (i = 0; i < n; i++) {
        printf("\n");
        for (j = 0; j < n; j++)
            printf("\t%d", w[i][j]);
    }
    printf("\nEnter the new vector: ");
    for (i = 0; i < n; i++) {
        scanf("%d", &x[0][i]);
        y[0][i] = x[0][i];
    }
    /* recall: update each node from the weighted sum plus the external input */
    for (i = 0; i < n; i++) {
        yin = 0;
        for (j = 0; j < n; j++)
            yin += y[0][j] * w[j][i];
        yin += x[0][i];
        yin = yin <= 0 ? 0 : 1;
        y[0][i] = yin;
    }
    printf("The value of the new vector should be:");
    for (i = 0; i < n; i++)
        printf(" %d", y[0][i]);
    printf("\n");
    return 0;
}


DESIGNING OF A HOPFIELD NETWORK. IMPLEMENTATION IN MATLAB

A Hopfield network can be created in Matlab by using the function newhop(data). The network functioning is simulated using the function sim. There are two variants of calling the function sim:

result = sim(net,M,[],test)

or

result = sim(net,{M,iterations},{},test)

where M is the number of test data to be taken from the test matrix (specified as the last parameter). In the first variant the user does not control the number of iterations, while in the second variant he can do this.

The program to design a Hopfield network which stores 4 vectors is:

vectors = [-1  1 -1 -1  1 -1 -1  1 -1;
           -1 -1 -1  1  1  1 -1 -1 -1;
           -1 -1  1 -1  1 -1  1 -1 -1;
            1 -1 -1 -1  1 -1 -1 -1  1]';
net = newhop(vectors);
result = sim(net,4,[],vectors);
disp('Stored vectors:');
disp(vectors);
disp('Fixed points:');
disp(result);
% Test data
test = {[0.1; 0.8; -1; -0.7; 0.5; -1; -0.9; 0.85; -1]};
result = sim(net,{1,5},{},test);
% Network state after each iteration
for i = 1:5,
    disp(sprintf('Network state after %d iterations:',i));
    disp(result{i});
end


COMPETITIVE NETWORKS – THE KOHONEN SELF-ORGANISING MAP

ARCHITECTURE OF THE KOHONEN NETWORK

The Kohonen network consists of an input layer, which distributes the inputs to each node in a second layer, the so-called competitive layer. Each of the nodes on this layer acts as an output node. Each neuron in the competitive layer is connected to other neurons in its neighbourhood and feedback is restricted to neighbours through these lateral connections. Neurons in the competitive layer have excitatory connections to immediate neighbours and inhibitory connections to more distant neurons.

Figure 7 – Structure of Kohonen network


All neurons in the competitive layer receive a mixture of excitatory and inhibitory signals from the input layer neurons and from other competitive layer neurons.

THE KOHONEN NETWORK IN OPERATION

As an input pattern is presented, some of the neurons are sufficiently activated to produce outputs which are fed back to other neurons in their neighbourhoods. The node with the weight vector closest to the input pattern vector (the so-called “winning node”) produces the largest output. During training, input weights of the winning neuron and its neighbours are adjusted to make them resemble the input pattern even more closely. At the completion of training, the winning node ends up with its weight vector aligned with the input pattern and produces the strongest output whenever that particular pattern is presented. The nodes in the winning node’s neighbourhood also have their weights modified to settle down to an average representation of that pattern class. As a result, unseen patterns belonging to that class are also classified correctly (generalization). The m neighbourhoods, corresponding to the m possible pattern classes are said to form a topological map representing the patterns.

The initial size of the neighbourhood mentioned above and the fixed values of excitatory (positive) and inhibitory (negative) weights to neurons in the neighbourhood are among the design decisions to be made.

TRAINING THE KOHONEN NETWORK

1. Initialise weights. Initialise the weights from the N inputs to the nodes to small random values. Set the initial radius of the neighbourhood.


2. Present a new input $x_0(t), x_1(t), \ldots, x_{N-1}(t)$, where $x_i(t)$ is the input to node i at time t.

3. Compute distances to all nodes. Compute the distance $d_j$ between the input and each output node j using

$$d_j = \sum_{i=0}^{N-1}\bigl(x_i(t) - w_{ij}(t)\bigr)^2,$$

where $x_i(t)$ is the input to node i at time t and $w_{ij}(t)$ is the weight from input node i to output node j at time t.

4. Select the output node with minimum distance. Select output node j* as the output node with minimum $d_j$.

5. Update weights to node j* and its neighbours. Weights are updated for node j* and all nodes in the neighbourhood defined by $N_{j^*}$ (a compact sketch of steps 3–5 is given after this list). The new weights are

$$w_{ij}(t+1) = w_{ij}(t) + \eta(t)\bigl(x_i(t) - w_{ij}(t)\bigr), \qquad j \in N_{j^*},\ \ 0 \le i \le N-1.$$

The term $\eta(t)$ is a gain term with $0 < \eta(t) < 1$. Both $\eta$ and $N_{j^*}$ decrease with time.

6. Repeat by going to step 2.
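A compact MATLAB sketch of steps 3–5 for a single input vector (illustrative; W is an N-by-J matrix whose columns are the weight vectors of the J map nodes, x an N-by-1 input, eta the current gain; neighbours are updated with the same rule as the winner):

% One Kohonen update for input x
d = sum((W - repmat(x,1,size(W,2))).^2, 1);        % squared distances to every node (step 3)
[~, jstar] = min(d);                               % winning node (step 4)
W(:,jstar) = W(:,jstar) + eta*(x - W(:,jstar));    % step 5 update for the winner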

EXAMPLE: DATA CLUSTERING

clear all
num = 300;      % number of points to cluster
num_n = 3;      % num_n*num_n is the number of neurons (clusters)
p = -5:5;
for i = 1:11:num
    kx = rand(1,11)*0.2 - 0.1;
    ky = rand(1,11)*0.2 - 0.1;
    x(i+5+p) = rand + kx;
    y(i+5+p) = rand + ky;
end;
for j1 = 1:num_n
    for j2 = 1:num_n
        w1(j1,j2) = rand*0.05 + 0.48;
        w2(j1,j2) = rand*0.05 + 0.48;
    end
end
figure(1)
axis([0 1 0 1])
plot(x,y,'.r')
hold on
plot(w1,w2,'ob')
plot(w1,w2,'k','linewidth',2)
plot(w1',w2','k','linewidth',2)
hold off
drawnow
no = 1;
do = 5;
T = 300;
t = 1;
while (t <= T)
    n = no*(1 - t/T);
    d = round(do*(1 - t/T));
    % loop over the num input points
    for i = 1:num
        e_norm = (x(i) - w1).^2 + (y(i) - w2).^2;
        minj1 = 1; minj2 = 1;
        min_norm = e_norm(minj1,minj2);
        for j1 = 1:num_n
            for j2 = 1:num_n
                if e_norm(j1,j2) < min_norm
                    min_norm = e_norm(j1,j2);
                    minj1 = j1;
                    minj2 = j2;
                end
            end
        end
        j1star = minj1;
        j2star = minj2;
        % update the winning neuron
        w1(j1star,j2star) = w1(j1star,j2star) + n*(x(i) - w1(j1star,j2star));
        w2(j1star,j2star) = w2(j1star,j2star) + n*(y(i) - w2(j1star,j2star));
        % update the neighbouring neurons
        for dd = 1:1:d
            jj1 = j1star - dd; jj2 = j2star;
            if (jj1 >= 1)
                w1(jj1,jj2) = w1(jj1,jj2) + n*(x(i) - w1(jj1,jj2));
                w2(jj1,jj2) = w2(jj1,jj2) + n*(y(i) - w2(jj1,jj2));
            end
            jj1 = j1star + dd; jj2 = j2star;
            if (jj1 <= num_n)
                w1(jj1,jj2) = w1(jj1,jj2) + n*(x(i) - w1(jj1,jj2));
                w2(jj1,jj2) = w2(jj1,jj2) + n*(y(i) - w2(jj1,jj2));
            end
            jj1 = j1star; jj2 = j2star - dd;
            if (jj2 >= 1)
                w1(jj1,jj2) = w1(jj1,jj2) + n*(x(i) - w1(jj1,jj2));
                w2(jj1,jj2) = w2(jj1,jj2) + n*(y(i) - w2(jj1,jj2));
            end
            jj1 = j1star; jj2 = j2star + dd;
            if (jj2 <= num_n)
                w1(jj1,jj2) = w1(jj1,jj2) + n*(x(i) - w1(jj1,jj2));
                w2(jj1,jj2) = w2(jj1,jj2) + n*(y(i) - w2(jj1,jj2));
            end
        end
    end
    t = t + 1;
    figure(1)
    plot(x,y,'.r')
    hold on
    plot(w1,w2,'ob')
    plot(w1,w2,'k','linewidth',2)
    plot(w1',w2','k','linewidth',2)
    hold off
    title(['t=' num2str(t)]);
    drawnow
end

Figure 8 – The results of clustering



Educational publication

METHODOLOGICAL INSTRUCTIONS

for practical training

in the discipline “Modelling of Neural Networks”

for students of the speciality “Applied Mathematics”

(In English)

Responsible for the issue: O. V. Lysenko

Editor: L. V. Shtykhno

Computer layout: I. O. Knyaz

Format 60×84/16. Conventional printed sheets 3.26. Publisher's sheets 3.38.

Publisher and manufacturer:

Sumy State University,

2, Rymskoho-Korsakova St., Sumy, 40007

Certificate of the publishing entity ДК No. 3062 of 17.12.2007.