Ministry of Education and Science of Ukraine
Sumy State University
4324 METHODOLOGICAL INSTRUCTIONS
for practical training
in “Modelling of Neural Networks”
for students of the speciality
8.04030101 “Applied Mathematics”
Qualification Master level
Full-time training
Sumy
Sumy State University
2017
Methodological instructions for practical training in
“Modelling of Neural Networks” / compiler I. A. Knyaz’ – Sumy:
Sumy State University, 2017. – 54 p.
Department of Applied Mathematics and Complex Systems
Modelling
CONTENTS
DESIGNING AND TRAINING A PERCEPTRON
    The Perceptron Training Rule
    Gradient descent and the delta rule
    Creating and training a perceptron (C++)
    Creating and training perceptron with the NNTool
USING MATLAB FOR CLASSIFICATION OF LINEARLY SEPARABLE DATA
    Classification of a 2-class problem with a perceptron
    Classification of a 4-class problem with a perceptron
    Prepare inputs & outputs for perceptron training
    Creation and training perceptron
APPROXIMATION OF FUNCTIONS BY NEURAL NETWORKS
    Data Preparation
    Network Design
    Network Training
    Network Testing
    Conclusion
FUNCTION APPROXIMATION WITH RBFN
    Structure of RBF neural networks
    Example: APPROXIMATION WITH RBF
PATTERN RECOGNITION WITH NEURAL NETWORKS
    Data Preparation
    Network Design
    Network Training
    Network Testing
    Drawing the Results
HOPFIELD NEURAL NETWORK WITH IMPLEMENTATION IN MATLAB AND C
    The Hopfield Model
    Operation of the Hopfield Network
    Designing and training the Hopfield net. C++
    Designing of a Hopfield network. Matlab
COMPETITIVE NETWORKS - THE KOHONEN SELF-ORGANISING MAP
    Architecture of the Kohonen Network
    The Kohonen Network in Operation
    Training the Kohonen Network
    Example: DATA CLUSTERING
DESIGNING AND TRAINING A PERCEPTRON
A perceptron can use a wide range of activation functions.
For many problems it is convenient to choose the sign function as the activation function:
$$\operatorname{sgn}(y) = \begin{cases} 1, & y > 0 \\ -1, & \text{otherwise} \end{cases}$$
A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold and –1 otherwise.
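More precisely, given inputs x1 through xn, the output o(x1, …, xn) computed by the perceptron is
$$o(x_1, \dots, x_n) = \begin{cases} 1, & w_0 + w_1 x_1 + \dots + w_n x_n > 0 \\ -1, & \text{otherwise} \end{cases}$$
where each wi is a real-valued weight that determines the contribution of input xi to the perceptron output, and –w0 is the threshold that the weighted sum of the inputs must exceed for the perceptron to output 1.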
Learning a perceptron involves choosing values for the weights w0, ..., wn. Therefore, the hypothesis space in perceptron learning is the set of all possible real-valued weight vectors.
A single perceptron can be used to represent many Boolean functions. For example, if we assume Boolean values of 1 (true) and –1 (false), then one way to use a two-input perceptron to implement the AND function is to set the weights w0 = –0.8, and w1 = w2 = 0.5.
In fact, AND and OR can be viewed as special cases of m-of-n functions: that is, functions where at least m of the n inputs to the perceptron must be true. However, some Boolean functions cannot be represented by a single perceptron, such as the XOR function (Figure 1).
Figure 1 – The decision surface represented by a two-input
perceptron. x1 and x2 are the perceptron inputs. (a) A set of training examples and the decision surface of a perceptron that
classifies them correctly. (b) A set of training examples that is not linearly separable
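The AND example given earlier (w0 = –0.8, w1 = w2 = 0.5, inputs coded as –1 and 1) can be checked with a few lines of C++. This is only a minimal sketch: the function names perceptron_output and and_gate are illustrative and do not come from the original instructions.

```cpp
#include <cstddef>
#include <vector>

// Perceptron output: +1 if w0 + sum_i w[i]*x[i] > 0, otherwise -1.
// w0 is the bias weight; w and x are expected to have the same length.
int perceptron_output(double w0, const std::vector<double>& w,
                      const std::vector<double>& x)
{
    double sum = w0;
    for (std::size_t i = 0; i < w.size() && i < x.size(); ++i)
        sum += w[i] * x[i];
    return sum > 0.0 ? 1 : -1;
}

// Two-input AND with the weights quoted in the text.
// Inputs are coded as -1 (false) and +1 (true).
int and_gate(int x1, int x2)
{
    return perceptron_output(-0.8, {0.5, 0.5},
                             {static_cast<double>(x1), static_cast<double>(x2)});
}
```

Only the input pair (1, 1) gives a positive weighted sum (–0.8 + 0.5 + 0.5 = 0.2), so and_gate returns 1 for that pair and –1 for the other three.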
THE PERCEPTRON TRAINING RULE
The precise learning problem is to determine a weight vector that causes the perceptron to produce the correct +1, –1 output for each of the given training examples.
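One way to learn an acceptable weight vector is:
1) to begin with random weights;
2) to apply the perceptron iteratively to each training example;
3) to modify the perceptron weights whenever it misclassifies an example;
4) to repeat this process until the perceptron classifies all training examples correctly.
Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi:
$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta (t - o) x_i,$$
where t is the target output for the current training example, o is the output generated by the perceptron, and η is a positive constant called the learning rate.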
GRADIENT DESCENT AND THE DELTA RULE
Although the perceptron rule finds a successful weight vector when the training examples are linearly separable, it can fail to converge if the examples are not linearly separable.
The delta rule is designed to overcome this difficulty. It uses gradient descent to search the hypothesis space of possible weight vectors and finds the weights that best fit the training examples, even when the examples are not linearly separable.
The training error measures the difference between target and output over all training examples. For a linear unit with output o = w0 + w1x1 + … + wnxn it is defined as follows:
$$E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2,$$
where
D is the set of training examples;
td is the target output for the training example d;
od is the output of the linear unit for training example d.
Gradient descent algorithms search for the direction of steepest descent along the error surface. Gradient descent determines a weight vector that minimizes E by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps.
For linear units this error surface has a single global minimum. Given a sufficiently small learning rate, gradient descent continues the search until the global minimum error is reached.
Each training example is a pair of the form (x, t), where x is the vector of input values, and t is the target output value. Applying the gradient descent rule gives:
$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta \sum_{d \in D} (t_d - o_d)\, x_{id},$$
where η is the learning rate and xid is the i-th input of training example d.
Thus each weight wi is adjusted in proportion to the error between target and output and to the learning rate.
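The batch form of the delta rule for a linear unit can be sketched in C++ as follows. This is a hedged illustration rather than the code used in the course: the type Example and the functions linear_output, training_error and delta_rule_step are names chosen here, and the weight vector stores the bias weight w0 in position 0.

```cpp
#include <cstddef>
#include <vector>

struct Example {
    std::vector<double> x;  // input values x_1..x_n
    double t;               // target output
};

// Output of a linear unit: o = w[0] + sum_i w[i+1] * x[i].
double linear_output(const std::vector<double>& w, const std::vector<double>& x)
{
    double o = w[0];
    for (std::size_t i = 0; i < x.size(); ++i)
        o += w[i + 1] * x[i];
    return o;
}

// Training error E(w) = 1/2 * sum_d (t_d - o_d)^2.
double training_error(const std::vector<double>& w, const std::vector<Example>& data)
{
    double e = 0.0;
    for (const Example& d : data) {
        const double diff = d.t - linear_output(w, d.x);
        e += 0.5 * diff * diff;
    }
    return e;
}

// One batch step: w_i <- w_i + eta * sum_d (t_d - o_d) * x_id.
void delta_rule_step(std::vector<double>& w, const std::vector<Example>& data, double eta)
{
    std::vector<double> delta(w.size(), 0.0);
    for (const Example& d : data) {
        const double err = d.t - linear_output(w, d.x);
        delta[0] += eta * err;                      // the bias input is always 1
        for (std::size_t i = 0; i < d.x.size(); ++i)
            delta[i + 1] += eta * err * d.x[i];
    }
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] += delta[i];
}
```

Calling delta_rule_step repeatedly with a small eta moves the weights toward the minimum of training_error; a value of eta that is too large makes the steps overshoot and the error grow.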
CREATING AND TRAINING A PERCEPTRON (C++)
Let’s look at the logic table for x1 AND x2 (Table 1).
Table 1 – Logic table
x1    x2    x1 AND x2
0     0     0
0     1     0
1     0     0
1     1     1
We can see that the neuron output is equal to 1 only when both inputs are activated. Let’s use a threshold of 0 (simple and convenient!) and recode the values 0 and 1 as –1 and 1.
The main function (C++) for training of the perceptron:
#include <cstdio>

int main()
{
    const int x0 = 1;            // constant bias input
    const int d = 1;             // learning rate
    int w0 = 0, w1 = 0, w2 = 0;  // weights start at zero
    int x1[4], x2[4], t[4];

    printf("Enter The Truth Table For the AND Gate\n");
    printf("x1 \t x2 \t t \n");
    for (int j = 0; j < 4; j++)
        if (scanf("%d%d%d", &x1[j], &x2[j], &t[j]) != 3)
            return 1;            // malformed input

    // recode 0/1 as -1/+1
    for (int j = 0; j < 4; j++)
    {
        if (x1[j] == 0) x1[j] = -1;
        if (x2[j] == 0) x2[j] = -1;
        if (t[j] == 0) t[j] = -1;
    }

    // sweep over the examples until a full pass needs no correction
    // (at most 100 passes, so the program also stops on unlearnable tables)
    bool changed = true;
    for (int epoch = 0; changed && epoch < 100; epoch++)
    {
        changed = false;
        for (int i = 0; i < 4; i++)
        {
            int yin = w0 * x0 + x1[i] * w1 + x2[i] * w2;
            yin = yin > 0 ? 1 : (yin == 0 ? 0 : -1);
            printf("\n yin=%d t=%d", yin, t[i]);
            if (yin != t[i])
            {
                w0 = w0 + d * t[i] * x0;
                w1 = w1 + d * t[i] * x1[i];
                w2 = w2 + d * t[i] * x2[i];
                printf("\nNew Matrix of weights is {%d %d %d}", w0, w1, w2);
                changed = true;
            }
        }
    }

    if (changed)
        printf("\nPerceptron can't be trained");
    else
    {
        printf("\n x1 \t x2 \t NET");
        for (int i = 0; i <= 3; i++)
        {
            int net = (x0 * w0) + (x1[i] * w1) + (x2[i] * w2);
            net = net < 0 ? 0 : 1;
            if (x1[i] == -1) x1[i] = 0;
            if (x2[i] == -1) x2[i] = 0;
            printf("\n %d \t %d \t %d", x1[i], x2[i], net);
        }
        printf("\nPerceptron is trained successfully");
    }
    return 0;
}
CREATING AND TRAINING PERCEPTRON WITH THE NNTOOL
We will now create and train a perceptron to recognize the following function:
Table 2 – Logic table
x1 x2 function (OR)
0 0 0
0 1 1
1 0 1
1 1 1
The correspondence between the input pattern and the output pattern is given by the following:
Input pattern: [0 0 1 1; 0 1 0 1];
Matching output pattern: [0 1 1 1].
Our first job is to get the information into MATLAB via the command window.
Type
p = [0 0 1 1; 0 1 0 1]
t = [0 1 1 1]
in the command window. Open the Neural Network toolbox [matlab start > toolboxes > neural network > NNTool] and click on NNTool. You can now see a GUI which will allow you to set up a network – in our case a perceptron. We want to import data from the workspace so click the import button in the Networks and data box. You are then given a choice of where to import from – we want the workspace. Select p and import these values as inputs. Now repeat to import t as targets. When done you return to the GUI and see p and t in the correct panes.
Click on “new network”. Leave the name as it is but choose perceptron from the drop down list as the network type and create a perceptron network. Get the input range from input p and leave the other values in the GUI as they are. Click to “create”. Back at the NNTool GUI select your network in the network pane and then click on the “adapt” button in the networks box. You need to set the inputs and outputs in the window (p, t respectively) and then set the adapt parameters (how many passes through the data – leave at 1 for now). Then clicking “adapt network” will make the network follow the adapt rule when altering weights and biases. Go back to the manager and view the output and error values to see if the training has worked. If the training has not worked try "adapt network" again. How many times do you have to run through the data to get the network to recognize the patterns? By doing 3 passes at a time you could have shortened the process – remember this is a controllable parameter when you construct a network.
Now that we have a network that works we might want to keep it – we don’t want to create it every time. How can we do this? Export it to the workspace as a first step. Highlight the network in the network manager and then click the “export” button – then highlight again and click “export” again.
Now that we have the network in the workspace we can check that it works directly by using the sim function. The sim function evaluates the effect of a network on a set of input data – we will use it frequently when we have a trained network to calculate the network output with new data.
Type sim(network1,p) in the command window and check the result against t.
Now if we save the workspace we keep the network – do this.
If you click the workspace tab in the top left hand pane of the MATLAB window you can see all your variables. These can be inspected by double clicking on them. Click on each in turn to make sure that they contain what you expect.
We can access the internal pieces of this network:
w=network1.IW{1,1}
b=network1.b{1}
Now we can compute with the matrix values that are there – but first notice how many rows there are in w and compare to the number of neurons in the perceptron layer. Compare the number of columns in w with the input size.
Check that
hardlim(w*[1;2] +b)
and
sim(network1,[1;2])
have the same output as each other.
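To see what this comparison computes, the expression hardlim(w*p + b) can be written out by hand. The C++ sketch below is only an illustration: the alias Matrix and the function perceptron_layer are names invented here, while hardlim mirrors the MATLAB transfer function (1 for a non-negative argument, 0 otherwise).

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// hardlim transfer function: 1 if n >= 0, otherwise 0.
int hardlim(double n) { return n >= 0.0 ? 1 : 0; }

// Output of a perceptron layer: hardlim(W*p + b), one value per neuron.
// W has one row per neuron and one column per element of the input vector p.
std::vector<int> perceptron_layer(const Matrix& W, const std::vector<double>& b,
                                  const std::vector<double>& p)
{
    std::vector<int> out;
    for (std::size_t r = 0; r < W.size(); ++r) {
        double n = b[r];
        for (std::size_t c = 0; c < p.size(); ++c)
            n += W[r][c] * p[c];
        out.push_back(hardlim(n));
    }
    return out;
}
```

The number of rows of W equals the number of neurons and the number of columns equals the input size, which is the relationship the text asks you to verify on w.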
Let us get some more data into our workspace and create a new network to make sure we have the complete idea.
We will choose input vectors of size 3
p= [1 2 3 4 5; 2 3 4 5 6; -4 -4 -5 -10 7]
and outputs of size 2 (so we need a perceptron layer with two perceptrons when we create the network)
t= [0 1 1 0 1; 1 0 0 1 0]
Delete all the old information in the network manager and import the new p and t values. Create and train a neural network (network2) which learns this input-output pattern (remember to create a perceptron network and that you need 2 neurons).
View your network and look at the matrix values associated with it. Again check the relationship between the number of rows in w and the number of neurons in the network and the number of columns in w and the size of the input vector.
w=network2.IW{1,1}
b=network2.b{1}
Now compare
hardlim(w*p(:,1) +b)
with
sim(network2,p(:,1))
Check that network2 computes the same value on p(:,x) as hardlim(w*p(:,x) +b) does for each column x of p.
USING MATLAB FOR CLASSIFICATION OF LINEARLY SEPARABLE DATA
CLASSIFICATION OF A 2-CLASS PROBLEM WITH A PERCEPTRON
Two clusters of data, belonging to two classes, are defined in a 2-dimensional input space. The classes are linearly separable. The task is to construct a perceptron for the classification of the data.
Define input and output data:
% number of samples of each class
N = 20;
% define inputs and outputs
offset = 3; % offset for second class
x = [randn(2,N) randn(2,N)+offset]; % inputs
y = [zeros(1,N) ones(1,N)]; % outputs
% Plot input samples with PLOTPV
% (Plot perceptron input/target vectors)
figure(1)
plotpv(x,y);
Creation and training of the perceptron:
net = newp(x,y);
net = train(net,x,y);
Plot the decision boundary:
figure(1)
plotpc(net.IW{1},net.b{1});
The result is shown in Figure 2.
CLASSIFICATION OF A 4-CLASS PROBLEM WITH A PERCEPTRON
A perceptron network with 2 inputs and 2 outputs is trained to classify input vectors into 4 categories.
Data definition:
% number of samples of each class
K = 30;
% define classes
off = .7; % offset of classes
cl1 = [rand(1,K)-off; rand(1,K)+off];
cl2 = [rand(1,K)+off; rand(1,K)+off];
cl3 = [rand(1,K)+off; rand(1,K)-off];
cl4 = [rand(1,K)-off; rand(1,K)-off];
% plot classes
plot(cl1(1,:),cl1(2,:),'bs')
hold on
plot(cl2(1,:),cl2(2,:),'g+')
plot(cl3(1,:),cl3(2,:),'ro')
plot(cl4(1,:),cl4(2,:),'m*')
% text labels for classes
text(.5-off,.5+2*off,'Class 1')
text(.5+off,.5+2*off,'Class 2')
text(.5+off,.5-2*off,'Class 3')
text(.5-off,.5-2*off,'Class 4')
% define output coding for classes
class1 = [0 1]';
class2 = [1 1]';
class3 = [1 0]';
class4 = [0 0]';
PREPARE INPUTS & OUTPUTS FOR PERCEPTRON TRAINING
% define inputs (combine samples from all four classes)
P = [cl1 cl2 cl3 cl4];
% define targets
T = [repmat(class1,1,length(cl1)) repmat(class2,1,length(cl2)) ...
     repmat(class3,1,length(cl3)) repmat(class4,1,length(cl4)) ];
CREATION AND TRAINING PERCEPTRON
net = newp(P,T);
ADAPT returns a new network object that performs as a better classifier, the network output, and the error. The loop below lets the network adapt one pass at a time, plots the classification lines, and continues until the error is zero or 900 passes have been made.
E = 1;
net.adaptParam.passes = 1;
linehandle = plotpc(net.IW{1},net.b{1});
n = 0;
while (sse(E) && n<900)
    n = n+1;
    [net,Y,E] = adapt(net,P,T);
    linehandle = plotpc(net.IW{1},net.b{1},linehandle);
    drawnow;
end
Perceptron simulation experiment:
% For example, classify an input vector of [0.7; 1.2]
p = [0.7; 1.2]
y = sim(net,p)
% compare response with output coding
p =
    0.7000
    1.2000
y =
     1
     1
Figure 2 – Classification of a 2-class problem with a perceptron
Figure 3 – Classification of a 4-class problem with a perceptron
APPROXIMATION OF FUNCTIONS BY NEURAL NETWORKS
Function approximation is the task of finding a mapping f1 that satisfies ||f1(x) – f(x)|| < e, where e is the tolerance and ||·|| can be any error measure. In general, a single layer of nonlinear neurons is enough for a neural network to approximate a nonlinear function. The goal of this work is to build a feedforward neural network that approximates the following function:
$$z = \cos(x + 2y) + x y^2.$$
DATA PREPARATION
For this function approximation problem, three kinds of data sets are prepared, namely the training set, the validation set and the test set. The training set is a set of value pairs which comprise information about the target function for training the network. The validation set is associated with the early stopping technique. During the training phase, the validation error is monitored in order to prevent the network from overfitting the training data. Normally, the test set is just used to evaluate the network performance afterwards. But, in this exercise the root mean-square error on the test set is used as the performance goal of the network training.
For the current problem, the training and the test data are taken from uniform grids (10 x 10 pairs of values for the training data, 9 x 9 pairs for the test data). So, it is not necessary to scale the target function. For the validation data, in order to make it a better representation of the original function, it is taken randomly from the function surface.
The function for generating data:
function [train_input,train_target,test_input,test_target,val_input,val_target] = generate_data()
train_x = -1:2/9:1;
train_y = train_x; % training data
test_x = (-1+1/9):2/9:(1-1/9); % test data [-1 1]
test_y = test_x; % test data
val_x = premnmx(rand(1,50)); % validation data [-1 1]
val_y = val_x; % validation data
[train_X, train_Y] = meshgrid(train_x, train_y);
[test_X, test_Y] = meshgrid(test_x, test_y);
[val_X, val_Y] = meshgrid(val_x, val_y);
% on this domain the function output lies roughly within [-2 1.6];
% it is used without scaling
train_Z = cos(train_X + 2*train_Y) + train_X.*train_Y.^2; % training target
test_Z = cos(test_X + 2*test_Y) + test_X.*test_Y.^2; % test target
val_Z = cos(val_X + 2*val_Y) + val_X.*val_Y.^2; % validation target
% plot the function
[X,Y] = meshgrid(-1:.2:1,-1:.2:1);
Z = cos(X + 2*Y) + X.*Y.^2;
figure, subplot(1,2,1);
surfc(X,Y,Z); % plot parametric surface
% return inputs in [-1 1] and the unscaled targets
train_input = [train_X(:)'; train_Y(:)'];
train_target = train_Z(:)';
test_input = [test_X(:)'; test_Y(:)'];
test_target = test_Z(:)';
val_input = [val_X(:)'; val_Y(:)'];
val_target = val_Z(:)';
NETWORK DESIGN
Theoretical results indicate that given enough hidden units, a feedforward neural network can approximate any non-linear functions (with a finite number of discontinuities) to a required degree of accuracy. In other words, any non-linear function can be expressed as a linear combination of non-linear basis functions. Therefore, a two-layer feedforward neural network with one layer of non-linear hidden neurons and one linear output neuron seems a reasonable design for a function approximation task. The target function as defined above has two inputs (x, y), and one output (z = f(x,y)). Thus, the network solution consists of two inputs, one layer of tansig (Tan-Sigmoid transfer function) neurons and one purelin (linear transfer function) output neuron.
The number of the hidden neurons is an important design issue. On the one hand, having more hidden neurons allows the network to approximate functions of greater complexity. But, as a result of network’s high degree of freedom, it may overfit the training data while the unseen data will be poorly fit to the desired function. On the other hand, although a small network won’t have enough power to overfit the training data, it may be too small to adequately represent the target function.
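The forward pass of the proposed design (two inputs, one layer of tansig neurons, one purelin output) is short enough to write out directly. The following C++ sketch is an illustration under stated assumptions: the struct TwoLayerNet and the functions forward and parameter_count are hypothetical names, and tansig is evaluated with std::tanh, to which it is mathematically equal.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Parameters of a network with two inputs (x, y), H tansig hidden
// neurons and one purelin output neuron. All vectors have size H.
struct TwoLayerNet {
    std::vector<double> w_x;  // hidden-layer weights for input x
    std::vector<double> w_y;  // hidden-layer weights for input y
    std::vector<double> b_h;  // hidden-layer biases
    std::vector<double> w_o;  // output-layer weights
    double b_o;               // output-layer bias
};

// z = b_o + sum_j w_o[j] * tanh(w_x[j]*x + w_y[j]*y + b_h[j]).
double forward(const TwoLayerNet& net, double x, double y)
{
    double z = net.b_o;
    for (std::size_t j = 0; j < net.w_o.size(); ++j) {
        const double a = std::tanh(net.w_x[j] * x + net.w_y[j] * y + net.b_h[j]);
        z += net.w_o[j] * a;  // purelin: the output is the weighted sum itself
    }
    return z;
}

// Number of adjustable parameters for a given hidden-layer size:
// 2*H input weights + H hidden biases + H output weights + 1 output bias.
std::size_t parameter_count(std::size_t hidden)
{
    return 2 * hidden + hidden + hidden + 1;
}
```

With eight hidden neurons, the default size offered by the network creation function, parameter_count gives 33 adjustable parameters, which gives a feeling for the degree of freedom discussed above.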
The function for creation of the network:
function net = create_network()
num_h = getInput('Size of the hidden layer[8] -> ',8);
transFcn_h = getInput('Transfer function of the hidden layer[tansig]-> ','tansig','s');
transFcn_o = getInput('Transfer function of the output layer[purelin]-> ','purelin','s');
% create the network based on the user's choice
net=newff([-1 1; -1,1],[num_h 1],{transFcn_h,transFcn_o});
NETWORK TRAINING
In general, we can train a network in two kinds of styles: batch training or incremental training. In batch training, weights and biases of the network are only updated after all of the inputs are presented to the network, while in incremental (on-line) training the network parameters are updated each time an input is presented to it. The batch training is supposed to work faster and reasonably well on a static network.
There is a number of batch training algorithms which can be used to train a network. In this exercise, the following four training algorithms are examined.
trainbfg implements BFGS (Shanno) quasi-Newton algorithm, which is based on the Newton’s method. Generally, it converges in a few iterations. However for very large networks trainbfg may not be a good choice because of its computation and memory overhead. For small networks, however, trainbfg can still be an efficient training function.
traingd implements a basic gradient descent algorithm. It updates weights and biases in the direction of the negative gradient of the performance function. The major drawback of traingd is that it is relatively slow (especially when the learning rate is small) and has a tendency to get trapped in local minima of the error surface (where the gradient is zero).
traingdm improves traingd by using momentum during the training. Momentum allows a network to ignore the shallow local minimum of the error surface. In addition, traingdm often provides a faster convergence than traingd.
trainlm implements the Levenberg-Marquardt algorithm, which works in such a way that the performance function will always be reduced at each iteration of the algorithm. This feature makes trainlm the fastest training algorithm for networks of moderate size. Similar to trainbfg, trainlm suffers from the memory and computation overhead caused by the calculation of the approximated Hessian matrix and the gradient.
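The difference between plain gradient descent and gradient descent with momentum can be sketched as below. This is a textbook form of the two updates written in C++ for illustration; it is not claimed to reproduce the exact update formulas of traingd or traingdm, and the names gd_step and momentum_step as well as the parameters lr (learning rate) and mc (momentum constant) are assumptions of this sketch.

```cpp
#include <cstddef>
#include <vector>

// One plain gradient descent step: w <- w - lr * grad.
void gd_step(std::vector<double>& w, const std::vector<double>& grad, double lr)
{
    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] -= lr * grad[i];
}

// One step with momentum: a fraction mc of the previous weight change
// is added to the new gradient step. velocity must have the same size
// as w and should start at zero.
void momentum_step(std::vector<double>& w, std::vector<double>& velocity,
                   const std::vector<double>& grad, double lr, double mc)
{
    for (std::size_t i = 0; i < w.size(); ++i) {
        velocity[i] = mc * velocity[i] - lr * grad[i];
        w[i] += velocity[i];
    }
}
```

Because the previous change is carried over, a weight can keep moving through a region where the current gradient is small, which is the effect described above for shallow local minima. With mc = 0 the momentum step reduces to the plain step.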
In order to examine the performance of the training functions mentioned above, each of them is applied to the two-layer feedforward network with the same performance goal (MSE = 0.02 for the training set), maximum number of epochs to train (100) and learning rate (0.02), without using early stopping. Within 100 epochs, trainbfg and trainlm achieve the performance goal while traingd and traingdm fail. As it turned out, trainbfg and trainlm spend more time in each epoch than the gradient descent algorithms, which is the result of their computation overhead. Although more time is spent in each epoch, the total time spent by trainbfg and trainlm to reach the goal is less.
If the network is too large, it runs the risk of overfitting the training set and losing its generalization ability for unseen data. One method for improving network generalization ability is to use a network that is just large enough to provide an adequate fit to the target function. But sometimes it is hard to know beforehand how large a network should be for a specific application. One commonly used technique for improving network generalization is early stopping. This technique monitors the error on a subset of the data (validation data) that does not actually take part in the training. The training stops when the error on the validation data has increased for a certain number of iterations.
In order to examine the effect of early stopping on the training process, a randomly generated validation set is used during the trainlm training (maximum validation failures = 10, Erms = 0.02 for the test set). As it turned out, the early stopping mechanism is not triggered during the training. That is because the validation error keeps decreasing during the whole training process. Both networks (trained with and without early stopping) work equally well on the current approximation problem.
Normally, the test set doesn’t take part in the training process. However, in this exercise, it is required that the network be trained until Erms = 0.02 for the test set. The training process then proceeds as follows. Initially, the performance goal for the training set is set to a relatively large value (MSE = 0.02). Then, after each training run, the network is simulated and Erms on the test set is monitored. If Erms is larger than 0.02, the training is resumed with a lower performance goal for the training set (e.g. the goal is multiplied by a factor of 0.5). Otherwise, the training stops. In the current Matlab program, the performance of the trained network is evaluated by using the test set. This may introduce some bias into the result, because the test set is effectively used in the training phase. So it would be better to use some other randomly generated data for testing the network performance.
The function for training the network:
function [error,network_output] = train_network(net,train_input,train_target,test_input,test_target,val_input,val_target)
val.P = val_input;
val.T = val_target;
test.P = test_input;
test.T = test_target;
% ask the user for the training parameters
epoch = round(getInput('Maximum number of epochs to train [5000]: ', 5000)); % maximum number of epochs to train
Lr = getInput('Learning rate [.02]: ', .02); % learning rate
trainFcn = getInput('Training function [trainlm]-> ','trainlm','s'); % training function (Automated Regularization (trainbr))
net.trainFcn = trainFcn;
net.trainParam.lr = Lr;
net.trainParam.epochs = epoch;
net.trainParam.show = 40; % Epochs between displays
net.trainParam.goal = 0.02; % Mean-squared error goal
stop_crit = getInput('Use early stopping ? y/n [n]:', 'n', 's');
erms = 1;
% Training...
if(stop_crit=='n') % no stop criteria
    tic, % start a stopwatch timer
    while erms > 0.02
        net = train(net,train_input,train_target,[],[],[],test);
        network_output = sim(net,test_input);
        error = test_target - network_output;
        erms = sqrt(mse(error)) % root mean-square error
        net.trainParam.goal = net.trainParam.goal*0.5;
    end
    toc; % prints the elapsed time since tic was used
else % use early stopping
    tic,
    net.trainParam.max_fail = getInput('Maximum validation failures [10]:', 10);
    while erms > 0.02
        net = train(net,train_input,train_target,[],[],val,test);
        network_output = sim(net,test_input);
        error = test_target - network_output;
        erms = sqrt(mse(error)) % root mean-square error
        net.trainParam.goal = net.trainParam.goal*0.5;
    end
    toc;
end
NETWORK TESTING
After the training process, the performance of the trained network will be evaluated by applying unseen data to it and checking whether its outputs are still relevant to the targets. We can use Matlab routine postreg to measure the network performance, which implements a regression analysis between the network response and the corresponding targets.
The function to create displays of the function surface and level curves:
function plot_result(net,input,target,network_output,error)
X = reshape(input(1,:),9,9);
Y = reshape(input(2,:),9,9);
Z = reshape(target,9,9);
No = reshape(network_output,9,9);
E = reshape(error,9,9);
% plot function surface
figure,
subplot(1,2,1);
surfc(X,Y,Z);
xlabel('X'); ylabel('Y'); zlabel('Z');
title('Target Function Surface');
subplot(1,2,2);
surfc(X,Y,No);
title('Approximated Function Surface');
% plot level curves...
% create level curves of error
figure,
[C,h] = contour(X, Y, E);
clabel(C,h);
title('level curve of the error')
figure,
[C,h1] = contour(X, Y, Z,'k'); % create level curve of target
set(h1,'LineWidth',2);
% clabel(C,h);
hold on
[C,h2] = contour(X, Y, No,'m'); % create level curve of approximation
set(h2,'LineWidth',2);
hold off
legend([h1(1);h2(1)],'target','approximation');
title('level curves of the target and approximation functions')
% M - Slope of the best linear regression. M=1 means perfect fit.
% B - Y intercept of the best linear regression. B=0 means perfect fit.
% R - Regression R-value. R=1 means perfect correlation.
figure, % create a new figure for displaying the performance
[M,B,R] = postreg(network_output,target);
% check the quality of the network training
fprintf('\n\tThe slope of the best linear regression[1]: %6.5f\n',M);
fprintf('\tThe Y intercept of the best linear regression[0]: %6.5f\n',B);
fprintf('\tThe correlation between the network output and the target[1]: %6.5f\n',R);

% main script
[train_input,train_target,test_input,test_target,val_input,val_target] = generate_data;
net = create_network;
[error,network_output] = train_network(net,train_input,train_target,test_input,test_target,val_input,val_target);
plot_result(net,test_input,test_target,network_output,error);
CONCLUSION
A two-layer network with two inputs, eight tansig hidden units and one purelin output unit is built for the approximation problem mentioned above. The network is trained by trainlm until the performance goal Erms = 0.02 is achieved for the test set. No early stopping is used during the training. The maximum number of epochs to train and the learning rate are set to be 5000 and 0.02 respectively.
FUNCTION APPROXIMATION WITH RBFN
STRUCTURE OF RBF NEURAL NETWORKS
In multi-layer perceptrons, the hidden neurons are based on linear basis function (LBF) nodes. Another type of hidden neuron is the radial basis function (RBF) neuron, which is the building block of RBF neural networks. In an RBF network, each neuron in the hidden layer is composed of a radial basis function that also serves as an activation function. The weighting parameters in an RBF network are the centres and the widths of these neurons. The outputs are linear combinations of these radial basis functions.
Figure 4 – Structure of RBF neural networks (a D-dimensional input vector x(t) feeds M basis functions arranged in Group 1, …, Group K; their weighted sums form the outputs y1(x(t)), y2(x(t)), …)
A more general form of the RBF networks is the elliptical basis function (EBF) networks where the hidden neurons compute the Mahalanobis distance between the centers and the input vectors. It has been shown that RBF networks have the same asymptotic approximation power as multi-layer perceptrons.
To apply RBF/EBF networks for pattern classification, each class is assigned a group of hidden units, and each group is trained independently using the data from the corresponding class. Figure 4 depicts the architecture of an RBF/EBF network with D inputs, M basis functions (hidden nodes), and K outputs. The input layer distributes the D-dimensional input patterns, xt, to the hidden layer. Each hidden unit is a Gaussian basis function of the form
11( ) exp ( ) ( ) , ,...,
2
T
j t t j j t j
j
j 1 M
x x x ,
where j and j are the mean vector and covariance matrix of
the j-th basis function respectively, and j is a smoothing
parameter controlling the spread of the j-th basis function. The k-th output is a linear weighted sum of the basis functions’ output, i.e.
01
( ) ( )M
k t k kj j tj
y
x x , 1, ,t N and 1, ,k K ,
where tx is the tth input vector and 0k is a bias term.
In matrix form, the output equation can be written as $Y = \Phi W$, where $Y$ is an $N \times K$ matrix, $\Phi$ is an $N \times (M+1)$ matrix, and $W$ is an $(M+1) \times K$ matrix. The weight matrix $W$ is the least squares solution of the matrix equation
$$\Phi W = D,$$
where $D$ is an $N \times K$ target matrix containing the desired output vectors in its rows. As $\Phi$ is not a square matrix, one reliable way to solve this equation is to use singular value decomposition. In this approach, the matrix $\Phi$ is decomposed into the product $U \Lambda V^T$, where $U$ is an $N \times (M+1)$ column-orthogonal matrix, $\Lambda$ is an $(M+1) \times (M+1)$ diagonal matrix containing the singular values, and $V$ is an $(M+1) \times (M+1)$ orthogonal matrix. The weight vectors $\{w_k\}_{k=1}^{K}$ are given by
$$w_k = V \Lambda^{-1} U^T d_k,$$
where $d_k$ is the k-th column of $D$. For an over-determined system, singular value decomposition gives the solution that is the best approximation in the least squares sense.
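The least squares problem for one output can be illustrated with a small C sketch. Note that it deliberately swaps in a simpler technique: it solves the normal equations (ΦᵀΦ)w = Φᵀd by Gaussian elimination with partial pivoting instead of using singular value decomposition, which is acceptable for small, well-conditioned problems but less robust than the SVD route described above. The function name lsq_weights, the row-major layout of Φ and the size limit MAXC are assumptions made for this example only.

```c
#include <stddef.h>

#define MAXC 8   /* assumed upper bound on M+1 (columns of Phi) in this sketch */

/* Least squares weights for one target column d.
 * Phi is N x C, stored row by row; C = M+1 includes the bias column of ones.
 * Solves (Phi^T Phi) w = Phi^T d by Gaussian elimination with partial
 * pivoting. This is NOT the SVD method; it is a simpler stand-in.
 * Returns 0 on success, -1 if C is out of range or the system is singular. */
int lsq_weights(const double *Phi, const double *d, size_t N, size_t C, double *w)
{
    double A[MAXC][MAXC + 1];   /* augmented matrix [Phi^T Phi | Phi^T d] */
    if (C == 0 || C > MAXC) return -1;

    for (size_t i = 0; i < C; i++) {
        for (size_t j = 0; j < C; j++) {
            A[i][j] = 0.0;
            for (size_t t = 0; t < N; t++)
                A[i][j] += Phi[t * C + i] * Phi[t * C + j];
        }
        A[i][C] = 0.0;
        for (size_t t = 0; t < N; t++)
            A[i][C] += Phi[t * C + i] * d[t];
    }

    /* forward elimination */
    for (size_t col = 0; col < C; col++) {
        size_t piv = col;
        for (size_t r = col + 1; r < C; r++) {
            double a = A[r][col] < 0 ? -A[r][col] : A[r][col];
            double b = A[piv][col] < 0 ? -A[piv][col] : A[piv][col];
            if (a > b) piv = r;
        }
        double p = A[piv][col];
        if ((p < 0 ? -p : p) < 1e-12) return -1;
        if (piv != col)
            for (size_t j = 0; j <= C; j++) {
                double tmp = A[col][j];
                A[col][j] = A[piv][j];
                A[piv][j] = tmp;
            }
        for (size_t r = col + 1; r < C; r++) {
            double f = A[r][col] / A[col][col];
            for (size_t j = col; j <= C; j++)
                A[r][j] -= f * A[col][j];
        }
    }

    /* back substitution */
    for (size_t i = C; i-- > 0; ) {
        double s = A[i][C];
        for (size_t j = i + 1; j < C; j++)
            s -= A[i][j] * w[j];
        w[i] = s / A[i][i];
    }
    return 0;
}
```

For a network with K outputs the function would be called once per column d_k of D, each call filling one column w_k of W.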
EXAMPLE: APPROXIMATION WITH RBF
Create a function approximation model based on a measured data set. Apply various Neural Network architectures based on Radial Basis Functions.
% data generator
X = 0:.1:40;
%f = abs(besselj(2,X*7).*asind(X/2) + (X.^1.95)) + 2;
f = sin(X)/10 + 2*exp(-X);
fig = figure;
plot(X,f,'b-')
hold on
grid on
% available data points
Ytrain = f + (rand(1,length(f))-.5)/10;
Xtrain = X([20:100 200:300]);
Ytrain = Ytrain([20:100 200:300]);
plot(Xtrain,Ytrain,'kx')
xlabel('x')
ylabel('y')
ylim([-0.2 1])
%---------------------------------
% choose a spread constant
spread = .2;
% choose max number of neurons
K = 50;
% performance goal (mean squared error)
goal = 0;
% number of neurons to add between displays
Ki = 5;
% create a neural network
net = newrb(Xtrain,Ytrain,goal,spread,K,Ki);
%---------------------------------
% simulate the network over the complete input range
Y = sim(net,X);
% plot network response
figure(fig)
plot(X,Y,'r')
legend('original function','available data','RBFN')
Figure 5 – The results of computer simulations
PATTERN RECOGNITION WITH NEURAL NETWORKS
The task is to build a multilayer feedforward network for pattern recognition. The network is trained as a character classifier for a collection of characters given as 7 x 5 black-and-white pixel maps. Ideally, the trained network can recognize the characters it has learnt even when some of them are distorted.
DATA PREPARATION
Two kinds of data are prepared for training and testing the network. The first is the collection of twenty-six 35-element input vectors, which represent the target patterns: the 26 capital characters. The second is obtained by randomly reversing three bits of each original character. This time, instead of using early stopping to improve the generalization ability of the network, the network is trained on both parts of the data, which enables it to respond correctly to both ideal and partially corrupted patterns.
The function for generating the characters:
function [alphabet,targets] = generate_chars()
% GENERATE_CHARS - Create target patterns
%
% Returns:
% alphabet - 35x26 matrix of 5x7 bit maps for each letter.
% targets - 26x26 target vectors.
[alphabet,targets] = prprob;
% capital characters
targets = eye(26);
% show the image of the alphabet
figure;
for i=1:size(alphabet,2)
    subplot(4,8,i);
    imagesc(reshape(alphabet(:,i),5,7)',[0,1]);
    axis off;
end
The target patterns:
The function for generating the distorted characters:
function noisy_alphabet = generate_charsn(alphabet,noise_level)
% GENERATE_CHARSN - Create distorted patterns
%
% Arguments:
% alphabet - 35x26 matrix of 5x7 bit maps for each letter.
% noise_level - Number of bits which will be changed.
% Returns:
% noisy_alphabet - Alphabet with noise
% add noise to the original alphabet
noisy_alphabet = alphabet;
if noise_level~=0
    size_image = length(alphabet(:,1));
    R = zeros(size(alphabet,2),noise_level);
    % choose noise_level random positions for each letter matrix
    for i=1:size(alphabet,2)
        R(i,:) = round(rand(1,noise_level)*(size_image-1)+1)+(i-1)*(size_image);
        % redraw until the positions chosen for letter i are all different
        while length(unique(R(i,:))) < noise_level
            R(i,:) = round(rand(1,noise_level)*(size_image-1)+1)+(i-1)*(size_image);
        end
    end
    % flip noise_level bits in each letter image: 0->1 and 1->0
    % (1 - x is what imcomplement computes for 0/1 data, without the toolbox)
    noisy_alphabet(R) = 1 - alphabet(R);
end
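The same distortion step can be sketched in C for a single pattern. Instead of redrawing positions until they are all different, this version uses a partial Fisher-Yates shuffle of the index list, so exactly k distinct positions are flipped in one pass. The function name flip_random_bits is hypothetical, and rand() with the modulo operator is used only for brevity (it is slightly non-uniform).

```c
#include <stdlib.h>

/* Flip exactly k distinct, randomly chosen entries of a 0/1 pattern of
 * length n, in place. A partial Fisher-Yates shuffle of the index list
 * guarantees that no position is chosen twice.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int flip_random_bits(int *pattern, int n, int k)
{
    if (n <= 0 || k < 0 || k > n) return -1;

    int *idx = malloc((size_t)n * sizeof *idx);
    if (idx == NULL) return -1;
    for (int i = 0; i < n; i++) idx[i] = i;

    for (int i = 0; i < k; i++) {
        int j = i + rand() % (n - i);   /* pick from the not-yet-used tail */
        int tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
        pattern[idx[i]] = 1 - pattern[idx[i]];   /* 0 -> 1, 1 -> 0 */
    }

    free(idx);
    return 0;
}
```

Calling flip_random_bits(letter, 35, 3) on one 35-element character corresponds to noise_level = 3 in the Matlab function.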
The distorted patterns:
NETWORK DESIGN
In principle, two-layer networks with sigmoidal hidden units can approximate arbitrarily well any continuous functional mapping from one finite-dimensional space to another, provided the number of hidden units is sufficiently large. The target patterns are relatively simple, being defined by only 35 Boolean values, so a two-layer feedforward network should be powerful enough for this character recognition task. As the 26 target characters are represented by 35-element input vectors, the neural network needs 35 inputs and 26 output neurons. The network receives 35 Boolean values that represent one character. It is then required to identify the character by producing an output vector whose largest element indicates the class of the input character. The logsig (log-sigmoid) function is chosen as the transfer function for both the hidden and output layers because its output range ([0 1]) suits the current problem. The number of hidden neurons is set to 15.
Another important design issue is the choice of the initial weights and biases. In general, weights and biases should be initialized to small values so that the active region of each neuron is not close to the unresponsive (saturated) part of the transfer function; otherwise the network will not be able to learn. When the Matlab routine newff is used to create a network, each layer's weights and biases are initialized automatically. In the program, the automatically created weights from the hidden layer to the output layer and the biases of the output layer are then multiplied by 0.01.
The function for creating a feed-forward backpropagation network with one hidden layer:
function net = create_network(input,target)
% Arguments:
% input - Network inputs.
% target - Target value.
%
% Returns:
% net - Network object created
%
[S2,Q] = size(target);
% create the network: 15 hidden and 26 output neurons
net = newff(minmax(input),[15 26],{'logsig' 'logsig'});
% scale down weights and bias
net.LW{2,1} = net.LW{2,1}*0.01;
net.b{2} = net.b{2}*0.01;
NETWORK TRAINING
After the network is created, it is ready for training. A gradient descent training function with momentum and adaptive learning rate (traingdx) is chosen to train the network. For the pattern recognition task, it is important that noisy patterns can still be classified correctly. Thus, to make the network insensitive to the presence of noise, it is trained not only on ideal patterns but also on noisy patterns. In the program, a three-step training process is implemented. In the first step, the network is trained on the ideal data for zero decision errors. In the second step, the network is trained on noisy data for several passes with a suitable performance goal (0.01 is used in the program). Unfortunately, after the network has been trained to recognize noisy patterns, it will probably “forget” the noise-free patterns it learnt before. Therefore, to remind the network of these non-distorted characters, in the final step it is trained again on just the ideal data for zero decision errors. This three-step training process enables the trained network to identify both noise-free and noisy characters (within a certain error tolerance).
function [net,netn] = train_network(net,input,target)
% Arguments:
% net - Neural network.
% input - Input matrix.
% target - Desired output matrix.
% Returns:
% net - New network trained by input
% netn - New network trained by noisy_input
net.trainFcn = 'traingdx';
net.trainParam.epochs = 5000;
net.trainParam.show = 40;   % Epochs between displays
net.trainParam.goal = 0;    % Mean-squared error goal
net.trainParam.mc = 0.95;   % Momentum constant.
% Training...
% 1: train a network without noise
[net,tr] = train(net,input,target);
fprintf('Strike any key to train the network with noise...\n');
pause
% A copy of the network will now be made. This copy will
% be trained with noisy examples of letters of the alphabet.
netn = net;
% 2: train another network with noise
% netn will be trained on all sets of noisy letters
netn.trainParam.goal = 0.01;
for pass = 1:20
    % create noisy input by distorting 3 bits
    % of every original character matrix
    noisy_input = generate_charsn(input,3);
    [netn,tr] = train(netn,noisy_input,target);
end
% 3: netn is now retrained without noise to
% ensure that it correctly categorizes non-noisy letters.
netn.trainParam.goal = 0;
[netn,tr] = train(netn,input,target);
NETWORK TESTING
Once the network is trained, the test data, which consist of both noise-free and slightly distorted patterns, are fed to the network to check the training result. Here, the average recognition error rate is used as the performance measure.
function [error,errorn,noise_range,noisy_input,outputn] = test_network(net,netn,alphabet,targets)
% TEST_NETWORK - Evaluate the performance of the
% trained networks by average errors.
%
% Arguments:
% net - Network trained without noise
% netn - Network trained with noise
% alphabet - 35x26 matrix of 5x7 bit maps for each letter.
% targets - Target value
% Returns:
% error - Average error of the network trained without noise
% errorn - Average error of the network trained with noise
% noise_range - Noise levels
% noisy_input - Distorted patterns with the highest noise level
% outputn - Output given by netn
%
% SET TESTING PARAMETERS
noise_range = 0:3;
max_test = 100;
error = [];
errorn = [];
T = targets;
% PERFORM THE TEST
for noise_level = noise_range
    fprintf('Testing networks with %d bits of noise\n',noise_level);
    e = 0;
    en = 0;
    for i=1:max_test
        P = generate_charsn(alphabet,noise_level);
        noisy_input = P;
        % TEST NETWORK WITHOUT NOISE
        A = sim(net,P);
        AA = compet(A);
        e = e + sum(sum(abs(AA-T)))/2;
        % TEST NETWORK WITH NOISE
        An = sim(netn,P);
        AAn = compet(An);
        en = en + sum(sum(abs(AAn-T)))/2;
    end
    % AVERAGE ERRORS FOR max_test SETS OF ALL TARGET VECTORS.
    error = [error e/size(T,2)/max_test];
    errorn = [errorn en/size(T,2)/max_test];
end
% output of netn when input patterns are distorted with the
% highest noise_level
result = full(AAn);
outputn = alphabet;
for i = 1:size(result,2)
    index = find(result(:,i));
    outputn(:,i) = alphabet(:,index);
end
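The error measure used in test_network can be sketched in C. In the Matlab code, compet turns each output column into a one-hot vector marking the largest element, and sum(sum(abs(AA-T)))/2 counts the misclassified columns, because every wrong decision produces two differing entries. The sketch below computes the same count directly from the index of the largest output. The function names argmax and error_rate and the row-major layout (one row per pattern) are assumptions of this example.

```c
/* Index of the largest element of v[0..n-1] (the first one on ties). */
int argmax(const double *v, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (v[i] > v[best]) best = i;
    return best;
}

/* Fraction of misclassified patterns.
 * outputs: q rows of k network outputs, stored row by row (one row per pattern).
 * labels:  true class index of each pattern. */
double error_rate(const double *outputs, const int *labels, int q, int k)
{
    int errors = 0;
    for (int p = 0; p < q; p++)
        if (argmax(outputs + p * k, k) != labels[p])
            errors++;
    return q > 0 ? (double)errors / q : 0.0;
}
```

Averaging this value over max_test noisy copies of the alphabet gives one point of the error curves returned by test_network.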
DRAWING THE RESULTS
function plot_result(error,errorn,noise_range,noisy_input,outputn)
% Arguments:
% error - Average error of the network trained without noise
% errorn - Average error of the network trained with noise
% noise_range - Noise levels
% noisy_input - Distorted patterns with the highest noise level
% outputn - Output given by netn
% Here is a plot showing the percentage of errors for
% the two networks for varying levels of noise.
figure,
plot(noise_range,error*100,'--k',noise_range,errorn*100,'r','LineWidth',2);
xlabel('Noise Level');
ylabel('Percentage of Recognition Errors');
legend('trained without noise','trained with noise');
% give a plot of noisy inputs and outputs
% given by the network trained on noisy data
figure,
for i=1:size(noisy_input,2)
    subplot(4,8,i);
    colormap('summer');
    imagesc(reshape(noisy_input(:,i),5,7)',[0,1]);
    axis off;
end
figure
for i=1:size(outputn,2)
    subplot(4,8,i);
    colormap('summer');
    imagesc(reshape(outputn(:,i),5,7)',[0,1]);
    axis off;
end
HOPFIELD NEURAL NETWORK WITH IMPLEMENTATION IN MATLAB AND C
THE HOPFIELD MODEL
Figure 6 – Structure of Hopfield neural network
A fully connected network with binary (0/1, or +1/–1) inputs and outputs.
Symmetrically weighted (wij = wji).
Nodes perform weighted sum with a hard limiting (step) transfer function.
Output of each node fed back to the others.
Input applied to all nodes simultaneously and the network left to stabilize.
Outputs from the nodes in the stable state form the output of the network.
When presented with an input pattern, it outputs a stored pattern nearest to the presented pattern.
Good as content addressable memory and for solving optimization problems.
OPERATION OF THE HOPFIELD NETWORK
The Hopfield network has no learning algorithm as such. Patterns (or facts) are simply stored by setting weights to lower the network energy.
The teaching stage: The connection weights are set using the exemplar patterns from all classes according to the equation
$$w_{ij} = \begin{cases} \sum\limits_{s=0}^{M-1} x_i^s x_j^s, & i \neq j, \\ 0, & i = j, \end{cases} \qquad 0 \le i, j \le N-1,$$
where $w_{ij}$ is the connection weight between node i and node j, $x_i^s$ (either +1 or -1) is element i of the exemplar pattern for class s, and M is the number of pattern classes.
The result of the teaching stage is the association of a pattern with itself.
The recognition stage: The output of the net is forced to match that of the imposed unknown pattern:
$$\mu_i(0) = x_i, \quad 0 \le i \le N-1,$$
where $\mu_i(t)$ is the output of node i at time t.
The net is then allowed to iterate freely in discrete time steps until it converges (the output no longer changes):
$$\mu_i(t+1) = f\left(\sum_{j=0}^{N-1} w_{ij}\,\mu_j(t)\right), \quad i \neq j.$$
The transfer function f is the step function.
The autoassociation of patterns means that presentation of a corrupt or incomplete input pattern will result in the reproduction of the original pattern as output. The network thus works as a content addressable memory (CAM).
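The teaching and recognition stages can be sketched in C for several stored patterns. This is a minimal illustration under stated assumptions, not the program used in this manual: the names hopfield_store and hopfield_recall, the fixed size HN, the synchronous update of all nodes, and the convention that a zero net input leaves a node unchanged are all choices made for the example.

```c
#define HN 9   /* number of nodes; an arbitrary size chosen for this sketch */

/* Teaching stage: w[i][j] = sum over patterns s of x_i^s * x_j^s for i != j,
 * and w[i][i] = 0. patterns holds m bipolar (+1/-1) exemplars of length HN. */
void hopfield_store(int patterns[][HN], int m, int w[HN][HN])
{
    for (int i = 0; i < HN; i++)
        for (int j = 0; j < HN; j++) {
            w[i][j] = 0;
            if (i == j) continue;
            for (int s = 0; s < m; s++)
                w[i][j] += patterns[s][i] * patterns[s][j];
        }
}

/* Recognition stage: start from state[] and update all nodes together
 * until nothing changes or max_iter sweeps have been made.
 * A zero net input leaves the node unchanged (an assumed convention).
 * Returns the number of sweeps that changed the state. */
int hopfield_recall(int w[HN][HN], int state[HN], int max_iter)
{
    int it;
    for (it = 0; it < max_iter; it++) {
        int next[HN], changed = 0;
        for (int i = 0; i < HN; i++) {
            int net = 0;
            for (int j = 0; j < HN; j++)
                net += w[i][j] * state[j];
            next[i] = net > 0 ? 1 : (net < 0 ? -1 : state[i]);
            if (next[i] != state[i]) changed = 1;
        }
        for (int i = 0; i < HN; i++)
            state[i] = next[i];
        if (!changed) break;
    }
    return it;
}
```

Storing two 9-element patterns and presenting one of them with a single reversed bit returns the stored pattern after one sweep. Synchronous updating is the simplest to write, but it may oscillate for some inputs, which is why the sweep limit max_iter is included.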
DESIGNING AND TRAINING THE HOPFIELD NET. IMPLEMENTATION IN C
The above algorithm for designing and training the Hopfield net is used in the following program:
#include <stdio.h>

int main(void)
{
    int a[5], w[5][5], x[5], y[5];
    int n = 5;   /* number of elements in the stored vector */
    int i, j, yin;

    printf("Enter the stored vector: ");
    for (i = 0; i < n; i++)
        scanf("%d", &a[i]);

    /* convert the binary 0/1 input to bipolar -1/+1 */
    for (i = 0; i < n; i++)
        if (a[i] == 0)
            a[i] = -1;

    /* weight matrix: outer product of the stored vector with itself */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            w[i][j] = a[i] * a[j];

    for (i = 0; i < n; i++)
        w[i][i] = 0;   /* no self-connections */

    printf("The weight matrix is");
    for (i = 0; i < n; i++) {
        printf("\n");
        for (j = 0; j < n; j++)
            printf("\t%d", w[i][j]);
    }

    printf("\nEnter the new vector: ");
    for (i = 0; i < n; i++) {
        scanf("%d", &x[i]);
        y[i] = x[i];
    }

    /* update the nodes one at a time */
    for (i = 0; i < n; i++) {
        yin = x[i];   /* external input; the sum is reset for every node */
        for (j = 0; j < n; j++)
            yin += y[j] * w[j][i];
        y[i] = (yin <= 0) ? 0 : 1;
    }

    printf("The value of the new vector should be:");
    for (i = 0; i < n; i++)
        printf(" %d", y[i]);
    printf("\n");
    return 0;
}
DESIGNING OF A HOPFIELD NETWORK. IMPLEMENTATION IN MATLAB
A Hopfield network can be created in Matlab using the function newhop(data). The operation of the network is simulated using the function sim, which can be called in two ways:
result = sim(net,M,[],test)
or
result = sim(net,{M,iterations},{},test)
where M is the number of test vectors taken from the test matrix (specified as the last parameter). In the first variant the user does not control the number of iterations, while in the second the number of iterations is set explicitly.
The program to design a Hopfield network which stores 4 vectors is:
vectors=[-1  1 -1 -1  1 -1 -1  1 -1; ...
         -1 -1 -1  1  1  1 -1 -1 -1; ...
         -1 -1  1 -1  1 -1  1 -1 -1; ...
          1 -1 -1 -1  1 -1 -1 -1  1]';
net=newhop(vectors);
result=sim(net,4,[],vectors);
disp('Stored vectors:');
disp(vectors);
disp('Fixed points:');
disp(result);
% Test data
test={[0.1; 0.8; -1; -0.7; 0.5; -1; -0.9; 0.85; -1]};
result=sim(net,{1,5},{},test);
% Network state after each iteration
for i=1:5,
    disp(sprintf('Network state after %d iterations:',i));
    disp(result{i});
end
COMPETITIVE NETWORKS – THE KOHONEN SELF-ORGANISING MAP
ARCHITECTURE OF THE KOHONEN NETWORK
The Kohonen network consists of an input layer, which distributes the inputs to each node in a second layer, the so-called competitive layer. Each of the nodes on this layer acts as an output node. Each neuron in the competitive layer is connected to other neurons in its neighbourhood and feedback is restricted to neighbours through these lateral connections. Neurons in the competitive layer have excitatory connections to immediate neighbours and inhibitory connections to more distant neurons.
Figure 7 – Structure of Kohonen network
All neurons in the competitive layer receive a mixture of excitatory and inhibitory signals from the input layer neurons and from other competitive layer neurons.
THE KOHONEN NETWORK IN OPERATION
As an input pattern is presented, some of the neurons are sufficiently activated to produce outputs which are fed back to other neurons in their neighbourhoods. The node with the weight vector closest to the input pattern vector (the so-called “winning node”) produces the largest output. During training, input weights of the winning neuron and its neighbours are adjusted to make them resemble the input pattern even more closely. At the completion of training, the winning node ends up with its weight vector aligned with the input pattern and produces the strongest output whenever that particular pattern is presented. The nodes in the winning node’s neighbourhood also have their weights modified to settle down to an average representation of that pattern class. As a result, unseen patterns belonging to that class are also classified correctly (generalization). The m neighbourhoods, corresponding to the m possible pattern classes are said to form a topological map representing the patterns.
The initial size of the neighbourhood mentioned above and the fixed values of excitatory (positive) and inhibitory (negative) weights to neurons in the neighbourhood are among the design decisions to be made.
TRAINING THE KOHONEN NETWORK
1. Initialise weights.
Initialise the weights from the N inputs to the output nodes to small random values. Set the initial radius of the neighbourhood.
2. Present new input.
Present the input $x_0(t), x_1(t), x_2(t), \ldots, x_{N-1}(t)$, where $x_i(t)$ is the input to node i at time t.
3. Compute distances to all nodes.
Compute the distances $d_j$ between the input and each output node j using
$$d_j = \sum_{i=0}^{N-1} \left(x_i(t) - w_{ij}(t)\right)^2,$$
where $x_i(t)$ is the input to node i at time t and $w_{ij}(t)$ is the weight from input node i to output node j at time t.
4. Select the output node with minimum distance.
Select output node j* as the output node with minimum $d_j$.
5. Update weights to node j* and neighbours.
The weights are updated for node j* and all nodes in the neighbourhood defined by $N_{j^*}$. The new weights are
$$w_{ij}(t+1) = w_{ij}(t) + \eta(t)\left(x_i(t) - w_{ij}(t)\right),$$
for j in $N_{j^*}$, $0 \le i \le N-1$.
The term $\eta(t)$ is a gain term ($0 < \eta(t) < 1$). Both $\eta(t)$ and $N_{j^*}$ decrease with time.
6. Repeat by going to step 2.
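Steps 3 to 5 of the procedure above can be sketched in C for a one-dimensional map. The names som_winner and som_step, the sizes N_IN and N_OUT, and the neighbourhood defined as "all nodes within radius positions of the winner" are assumptions made for this illustration. The weights are stored as w[j][i], that is, indexed by output node first.

```c
#define N_IN  2   /* input dimension (assumed for this sketch) */
#define N_OUT 5   /* number of output nodes on a 1-D map (assumed) */

/* Steps 3-4: index j* of the node whose weight vector is closest to x. */
int som_winner(double w[N_OUT][N_IN], const double x[N_IN])
{
    int best = 0;
    double best_d = -1.0;
    for (int j = 0; j < N_OUT; j++) {
        double d = 0.0;
        for (int i = 0; i < N_IN; i++) {
            double diff = x[i] - w[j][i];
            d += diff * diff;   /* d_j = sum_i (x_i - w_ij)^2 */
        }
        if (best_d < 0.0 || d < best_d) {
            best_d = d;
            best = j;
        }
    }
    return best;
}

/* Step 5: move the winner and every node within `radius` map positions
 * of it toward x by the gain eta (0 < eta < 1). Returns the winner index. */
int som_step(double w[N_OUT][N_IN], const double x[N_IN], double eta, int radius)
{
    int win = som_winner(w, x);
    for (int j = 0; j < N_OUT; j++) {
        int dist = j > win ? j - win : win - j;
        if (dist > radius) continue;
        for (int i = 0; i < N_IN; i++)
            w[j][i] += eta * (x[i] - w[j][i]);
    }
    return win;
}
```

A training run would call som_step for every input in turn while gradually reducing both eta and radius, as step 5 requires.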
EXAMPLE: DATA CLUSTERING
clear all
num=300;    % number of points to categorize
num_n=3;    % num_n * num_n is the number of neurons (clusters)
p = -5:5;
% generate clusters of 11 points around random centres
for i=1:11:num
    kx=rand(1,11)*0.2-0.1;
    ky=rand(1,11)*0.2-0.1;
    x(i+5+p)=rand+kx;
    y(i+5+p)=rand+ky;
end;
% initial weights close to the centre of the unit square
for j1=1:num_n
    for j2=1:num_n
        w1(j1,j2)=rand*0.05+0.48;
        w2(j1,j2)=rand*0.05+0.48;
    end
end
figure(1)
axis([0 1 0 1])
plot(x,y,'.r')
hold on
plot(w1,w2,'ob')
plot(w1,w2,'k','linewidth',2)
plot(w1',w2','k','linewidth',2)
hold off
drawnow
no=1;    % initial gain
do=5;    % initial neighbourhood radius
T=300;
t=1;
while (t<=T)
    n=no*(1-t/T);           % gain decreases with time
    d=round(do*(1-t/T));    % neighbourhood radius decreases with time
    % loop over the num inputs
    for i=1:num
        e_norm=(x(i)-w1).^2+(y(i)-w2).^2;
        minj1=1;
        minj2=1;
        min_norm=e_norm(minj1,minj2);
        for j1=1:num_n
            for j2=1:num_n
                if e_norm(j1,j2)<min_norm
                    min_norm=e_norm(j1,j2);
                    minj1=j1;
                    minj2=j2;
                end
            end
        end
        j1star=minj1;
        j2star=minj2;
        % update the winning neuron
        w1(j1star,j2star)=w1(j1star,j2star)+n*(x(i)-w1(j1star,j2star));
        w2(j1star,j2star)=w2(j1star,j2star)+n*(y(i)-w2(j1star,j2star));
        % update the neighbour neurons along the rows and columns of the map
        for dd=1:1:d
            nb=[j1star-dd j2star; j1star+dd j2star; j1star j2star-dd; j1star j2star+dd];
            for k=1:4
                jj1=nb(k,1);
                jj2=nb(k,2);
                if (jj1>=1 && jj1<=num_n && jj2>=1 && jj2<=num_n)
                    w1(jj1,jj2)=w1(jj1,jj2)+n*(x(i)-w1(jj1,jj2));
                    w2(jj1,jj2)=w2(jj1,jj2)+n*(y(i)-w2(jj1,jj2));
                end
            end
        end
    end
    t=t+1;
    figure(1)
    plot(x,y,'.r')
    hold on
    plot(w1,w2,'ob')
    plot(w1,w2,'k','linewidth',2)
    plot(w1',w2','k','linewidth',2)
    hold off
    title(['t=' num2str(t)]);
    drawnow
end
Figure 8 – The results of clustering
REFERENCES
1. Haykin Simon. Neural Networks: A Comprehensive Foundation / Simon S. Haykin. – USA : Macmillan, 1994. – 696 p.
2. Fausett Laurene. Fundamentals of Neural Networks: Architectures, Algorithms, and Applications / Laurene Fausett. – USA : Prentice-Hall, 1994. – 461 p.
3. Hassoun Mohamad. Fundamentals of Artificial Neural Networks / Mohamad H. Hassoun. – USA : MIT Press, 1995. – 511 p.
4. McMahon David. MATLAB Demystified / David McMahon. – USA : The McGraw-Hill Companies, 2007. – 326 p.
5. Hunt Brian. A Guide to MATLAB for Beginners and Experienced Users / Brian R. Hunt, Ronald L. Lipsman, Jonathan M. Rosenberg. – USA : Cambridge University Press, New York, 2001. – 327 p.
6. Jin Yu. Artificial Neural Network / Yu Jin [Electronic resource]. – Access mode : https://users.cecs.anu.edu.au/~jinyu/.
7. Primoz Potocnik. Neural Networks course / Potocnik Primoz [Electronic resource]. – Access mode : https://www.neural.si.
Educational edition
METHODOLOGICAL INSTRUCTIONS
for practical training
in “Modelling of Neural Networks”
for students of the speciality “Applied Mathematics”
(In English)
Responsible for the issue: O. V. Lysenko
Editor: L. V. Shtykhno
Computer typesetting: I. A. Knyaz’
Format 60x84/16. Conventional printed sheets 3.26. Publisher's sheets 3.38.
Publisher and manufacturer
Sumy State University,
2 Rymskoho-Korsakova St., Sumy, 40007
Certificate of the publishing entity DK No. 3062 dated 17.12.2007.