project report -vaibhav


A Project Report On

SIMULATING A FEED FORWARD ARTIFICIAL NEURAL

NETWORK IN C++

Submitted in partial fulfilment of the requirements

For the award of degree

Of

INTEGRATED DUAL DEGREE

In

COMPUTER SCIENCE AND ENGINEERING

(With Specialization in Information Technology)

Submitted by

Vaibhav Dhattarwal

CSE-IDD

Enrolment No: 08211018

Under the guidance of

DR. DURGA TOSHINWAL

Professor

ELECTRONICS AND COMPUTER ENGINEERING DEPARTMENT

INDIAN INSTITUTE OF TECHNOLOGY ROORKEE

ROORKEE-247667

OCTOBER 2012


Abstract

This report presents an overview of how a feed forward artificial neural network was implemented in C++. An artificial neural network is a system composed of many simple processing elements operating in parallel, whose function is determined by the network structure, the connection strengths, and the processing performed at the computing elements or nodes. A neural network is a massively parallel distributed processor that has a natural inclination for storing experiential knowledge and making it available for use. This report also provides a brief overview of artificial neural networks and discusses their practical applicability. This is followed by a detailed explanation of the design and implementation of a three-layer feed forward neural network using the back propagation algorithm.


Table of Contents

Abstract

List of Figures

Chapter 1 Introduction

1.1 Objective of the Project

Chapter 2 Artificial Neural Network

2.1 Neural Network Definition

2.2 Neural Network Applications

2.3 Neural Network Categorization

2.4 Types of Neural Network

Chapter 3 Design

3.1 The Back Propagation Algorithm

3.2 Pseudo Code for One Layer

3.3 Pseudo Code for All Layers

Chapter 4 Implementation

4.1 Pseudo Code for Training Patterns

4.2 Pseudo Code for Minimizing Error

Chapter 5 Results


List of Figures

Figure 2.1 An Artificial Neural Network

Figure 3.1 The sigmoid curve

Figure 3.2 The design for calculating output activation

Figure 5.1 Output screenshot


1 Introduction

An Artificial Neural Network (ANN), usually called a neural network (NN), is a mathematical or computational model inspired by the structure and functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Modern neural networks are non-linear statistical data modelling tools. They are usually used to model complex relationships between inputs and outputs or to find patterns in data.

A neural network is a set of connected input/output units in which each connection has an associated weight. During the learning phase, the network learns by adjusting these weights so as to predict the correct class labels of the input tuples. Neural networks have the remarkable ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. They are well suited to continuous-valued inputs and outputs, and they are best at identifying patterns or trends in data, which makes them well suited to prediction and forecasting.

Because they can model complex relationships between inputs and outputs, neural networks can also be used to find patterns in data and to infer rules from them. They are useful in providing information on associations, classifications, clusters, and forecasting. Using neural networks as a tool, data warehousing firms can harvest information from datasets in the data mining process. Neural networks are programmed to store, recognize, and associatively retrieve patterns or database entries; to solve combinatorial optimization problems; to filter noise from measurement data; to control ill-defined problems; in summary, to estimate sampled functions when the form of the functions is not known. These two abilities, pattern recognition and function estimation, make neural networks a widely used tool in data mining. With their model-free estimators and their dual nature, neural networks serve data mining in a variety of ways.

Neural networks, depending on the architecture, provide associations, classifications, clusters, prediction and forecasting to the data mining industry. A neural network essentially comprises three pieces: the architecture or model, the learning algorithm, and the activation functions. Neural networks make it possible to mine valuable information from large volumes of historical data so that it can be used efficiently in financial areas; hence, applications of neural networks to financial forecasting have become very popular.

1.1 Objective of the Project

This project report presents an introduction to Artificial Neural Networks and a description of how they work. The objective of this project is to implement a Feed Forward Artificial Neural Network in C++ using the back propagation algorithm. The design of this simulation is discussed, followed by an explanation of the implementation of the network. The results of running the program are also included.


2 Artificial Neural Network

Figure 2.1 an Artificial Neural Network

First of all, when we are talking about a neural network, we should more properly say

"artificial neural network" (ANN), because that is what we mean most of the time in this

project. Biological neural networks are much more complicated than the mathematical

models we use for ANNs. But it is customary to be lazy and drop the "A" or the "artificial".

2.1 Neural Network Definition

There is no universally accepted definition of an NN. But perhaps most people in the field

would agree that an NN is a network of many simple processors ("units"), each possibly

having a small amount of local memory. The units are connected by communication channels

("connections") which usually carry numeric data, encoded by any of various means. The

units operate only on their local data and on the inputs they receive via the connections. The

restriction to local operations is often relaxed during training.


Some NNs are models of biological neural networks and some are not, but historically, much

of the inspiration for the field of NNs came from the desire to produce artificial systems

capable of sophisticated, perhaps "intelligent", computations similar to those that the human

brain routinely performs, and thereby possibly to enhance our understanding of the human

brain.

Most NNs have some sort of "training" rule whereby the weights of connections are adjusted

on the basis of data. In other words, NNs "learn" from examples, as children learn to

distinguish dogs from cats based on examples of dogs and cats. If trained carefully, NNs may

exhibit some capability for generalization beyond the training data, that is, to produce

approximately correct results for new cases that were not used for training.

NNs normally have great potential for parallelism, since the computations of the components

are largely independent of each other. Some people regard massive parallelism and high

connectivity to be defining characteristics of NNs, but such requirements rule out various

simple models, such as simple linear regression (a minimal feed forward net with only two

units plus bias), which are usefully regarded as special cases of NNs.

Some popular descriptive definitions of Neural Networks

A neural network is a system composed of many simple processing elements

operating in parallel whose function is determined by network structure, connection

strengths, and the processing performed at computing elements or nodes. A neural

network is a massively parallel distributed processor that has a natural propensity for

storing experiential knowledge and making it available for use. It resembles the brain

in two respects:

1. Knowledge is acquired by the network through a learning process.

2. Interneuron connection strengths known as synaptic weights are used to store the

knowledge.

A neural network is a circuit composed of a very large number of simple processing

elements that are neural based. Each element operates only on local information.

Furthermore, each element operates asynchronously; thus there is no overall system clock.


Artificial neural systems, or neural networks, are physical cellular systems which can

acquire, store, and utilize experiential knowledge.

2.2 Neural Network Applications

Practical applications of NNs most often employ supervised learning. For supervised

learning, you must provide training data that includes both the input and the desired result

(the target value). After successful training, you can present input data alone to the NN (that

is, input data without the desired result), and the NN will compute an output value that

approximates the desired result. However, for training to be successful, you may need lots of

training data and lots of computer time to do the training. In many applications, such as

image and text processing, you will have to do a lot of work to select appropriate input data

and to code the data as numeric values.

In practice, NNs are especially useful for classification and function approximation/mapping

problems which are tolerant of some imprecision, which have lots of training data available,

but to which hard and fast rules (such as those that might be used in an expert system) cannot

easily be applied. Almost any finite-dimensional vector function on a compact set can be

approximated to arbitrary precision by feed forward NNs (which are the type most often used

in practical applications) if you have enough data and enough computing resources.

In principle, NNs can compute any computable function, i.e., they can do everything a

normal digital computer can do, or perhaps even more, under some assumptions of doubtful

practicality.

Neural Networks are interesting for quite a lot of very different people:

Computer scientists want to find out about the properties of non-symbolic information

processing with neural nets and about learning systems in general.

Statisticians use neural nets as flexible, nonlinear regression and classification

models.

Engineers of many kinds exploit the capabilities of neural networks in many areas,

such as signal processing and automatic control.


Cognitive scientists view neural networks as a possible apparatus to describe models of thinking and consciousness (high-level brain function).

Neurophysiologists use neural networks to describe and explore medium-level brain

function (e.g. memory, sensory system, and motorics).

Physicists use neural networks to model phenomena in statistical mechanics and for a

lot of other tasks.

Biologists use Neural Networks to interpret nucleotide sequences.

Philosophers and some other people may also be interested in Neural Networks for

various reasons.

2.3 Neural Network Categorization

There are many kinds of NNs by now. Nobody knows exactly how many. New ones (or at

least variations of old ones) are invented every week. Below is a collection of some of the

most well known methods:

The two main kinds of learning algorithms are supervised and unsupervised.

In supervised learning, the correct results (target values, desired outputs) are known

and are given to the NN during training so that the NN can adjust its weights to try to match its outputs to the target values. After training, the NN is tested by giving it

only input values, not target values, and seeing how close it comes to outputting the

correct target values.

In unsupervised learning, the NN is not provided with the correct results during

training. Unsupervised NNs usually perform some kind of data compression, such as

dimensionality reduction or clustering.

The distinction between supervised and unsupervised methods is not always clear-cut. An unsupervised method can learn a summary of a probability distribution; that summarized distribution can then be used to make predictions. Furthermore, supervised methods come in two subvarieties: auto-associative and hetero-associative. In auto-associative learning, the target values are the same as the inputs, whereas in hetero-associative learning, the targets are generally different from the inputs. Many unsupervised methods are equivalent to auto-associative supervised methods.


Two major kinds of network topology are feed forward and feedback.

In a feed forward NN, the connections between units do not form cycles. Feed forward NNs usually produce a response to an input quickly. Most feed forward NNs can be trained using a wide variety of efficient conventional numerical methods in addition to algorithms invented by NN researchers.

In a feedback or recurrent NN, there are cycles in the connections. In some feedback NNs, each time an input is presented, the NN must iterate for a potentially long time before it produces a response. Feedback NNs are usually more difficult to train than feed forward NNs.

Some kinds of NNs can be implemented as either feed forward or feedback networks.

NNs also differ in the kinds of data they accept. Two major kinds of data are categorical and

quantitative.

Categorical variables take only a finite (technically, countable) number of possible

values, and there are usually several or more cases falling into each category.

Categorical variables may have symbolic values (e.g., "male" and "female", or "red",

"green" and "blue") that must be encoded into numbers before being given to the

network. Both supervised learning with categorical target values and unsupervised

learning with categorical outputs are called "classification."

Quantitative variables are numerical measurements of some attribute, such as length

in meters. The measurements must be made in such a way that at least some

arithmetic relations among the measurements reflect analogous relations among the

attributes of the objects that are measured. Supervised learning with quantitative

target values is called "regression."

Some variables can be treated as either categorical or quantitative, such as number of children

or any binary variable. Most regression algorithms can also be used for supervised

classification by encoding categorical target values as 0/1 binary variables and using those

binary variables as target values for the regression algorithm. The outputs of the network then approximate posterior probabilities when any of the most common training methods are used.
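For illustration, the 0/1 encoding described above (one output unit per category, often called one-hot encoding) can be sketched in C++ as follows. `oneHot` and `argMax` are hypothetical helper names, not part of the project's program; reading the predicted class as the index of the largest output is a common convention rather than something this report prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Encode a categorical label (0 .. numCategories-1) as a vector of 0/1 targets,
// one per output unit, so a regression-style network can be trained on it.
std::vector<double> oneHot(std::size_t category, std::size_t numCategories) {
    assert(category < numCategories);          // label must be in range
    std::vector<double> target(numCategories, 0.0);
    target[category] = 1.0;                     // only the matching unit is 1
    return target;
}

// Predicted class = index of the largest network output.
std::size_t argMax(const std::vector<double>& outputs) {
    assert(!outputs.empty());
    std::size_t best = 0;
    for (std::size_t k = 1; k < outputs.size(); ++k)
        if (outputs[k] > outputs[best]) best = k;
    return best;
}
```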


2.4 Types of Neural Network

Here are some well-known kinds of Neural Networks:

A. Supervised

1. Feed forward

Linear

Hebbian

Perceptron

Adaline

Higher Order

Functional Link

MLP: Multilayer perceptron

Backprop

Cascade Correlation

Quickprop

RPROP

RBF networks

OLS: Orthogonal Least Squares

CMAC: Cerebellar Model Articulation Controller

Classification only

LVQ: Learning Vector Quantization

PNN: Probabilistic Neural Network

Regression only

GRNN: General Regression Neural Network

2. Feedback

BAM: Bidirectional Associative Memory

Boltzmann Machine

Recurrent time series

Back propagation through time

Elman

FIR: Finite Impulse Response

Jordan

Real-time recurrent network


Recurrent back propagation

TDNN: Time Delay NN

3. Competitive

ARTMAP

Fuzzy ARTMAP

Gaussian ARTMAP

Counter propagation

Neocognitron

B. Unsupervised

1. Competitive

Vector Quantization

Grossberg

Kohonen

Conscience

Self-Organizing Map

Kohonen

GTM:

Local Linear

Adaptive resonance theory

ART 1

ART 2

ART 2-A

ART 3

Fuzzy ART

DCL: Differential Competitive Learning

2. Dimension Reduction

Hebbian

Oja

Sanger

Differential Hebbian

3. Auto association

Linear autoassociator

BSB: Brain State in a Box

Hopfield


3 Design

The simplified process for training a Feed Forward Neural Network is as follows:

1. Input data is presented to the network and propagated through the network until it reaches the output layer. This forward process produces a predicted output.

2. The predicted output is subtracted from the target (desired) output, and an error value for the network is calculated.

3. The neural network then uses supervised learning, which in most cases is back propagation, to train the network. Back propagation is a learning algorithm for adjusting the weights. It starts with the weights between the output layer processing elements (PEs) and the last hidden layer PEs and works backwards through the network.

4. Once back propagation has finished, the forward process starts again, and this cycle is continued until the error between the predicted and target outputs is sufficiently small.

3.1 The Back Propagation Algorithm

Back propagation, or propagation of error, is a common method of teaching artificial neural networks how to perform a given task: it trains the network so as to minimize an objective function (here, the error). The back propagation algorithm performs learning on layered feed forward ANNs. This means that the artificial neurons are organized in layers and send their signals “forward”, and then the errors are propagated backwards. The algorithm uses supervised learning, which means that we provide it with examples of the inputs and outputs we want the network to compute, and then the error (the difference between actual and expected results) is calculated. The idea of the back propagation algorithm is to reduce this error until the ANN learns the training data.

Algorithm for a 3-layer network:

1. Initialize the weights in the network.

2. Do

    a. For each example E in the training set:

        O = neural-net-output(network, E)    // forward pass
        T = teacher output for E
        Calculate the error (T - O) at the output units
        Compute ΔWho for all weights from the hidden layer to the output layer    // backward pass
        Compute ΔWih for all weights from the input layer to the hidden layer    // backward pass continued
        Update the weights in the network

3. Until all examples are classified correctly or a stopping criterion is satisfied

4. Return the network
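The outer Do ... Until structure of this algorithm can be sketched in C++ as below. This is only a skeleton under stated assumptions: `trainUntil` is a hypothetical name, and `runEpoch` stands for any function that performs step 2a for every training example and returns that epoch's total error. The stopping criterion used here (an error threshold plus an epoch limit) is one common choice.

```cpp
#include <cassert>
#include <functional>

// Repeat epochs until the error falls below a threshold or an epoch limit is hit.
// runEpoch is assumed to perform one forward/backward pass over every training
// example (updating the weights) and return the total error for that epoch.
int trainUntil(const std::function<double()>& runEpoch,
               double targetError, int maxEpochs) {
    assert(maxEpochs > 0);
    int epoch = 0;
    double error = 0.0;
    do {
        error = runEpoch();   // step 2a: forward pass, error, backward pass, update
        ++epoch;
    } while (error > targetError && epoch < maxEpochs);  // step 3: stopping criterion
    return epoch;             // number of epochs actually run
}
```

The epoch limit guards against networks that never reach the error threshold, for example when training stalls in a local minimum.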

The Back Propagation learning algorithm can be divided into two phases:

Phase 1: Propagation

This phase involves the following steps:

1. Forward propagation of a training pattern's input through the neural network to generate the output activations.

2. Backward propagation of the output activations through the neural network, using the training pattern's target, to generate the deltas (error terms) of all output and hidden neurons.

Phase 2: Weight update

For each weight (synapse), the following steps are used:

1. Multiply its output delta and input activation to get the gradient of the weight.

2. Move the weight in the opposite direction of the gradient by subtracting a fraction of the gradient (the learning rate) from the weight.

Repeat phases 1 and 2 until the performance of the network is satisfactory.
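For a single weight, Phase 2 reduces to one line of arithmetic. The C++ sketch below assumes the delta is defined as dError/dNetInput of the receiving unit and that the "fraction" is a learning rate; `updateWeight` is a hypothetical name.

```cpp
#include <cassert>

// Phase 2 for a single weight: gradient = (output delta) * (input activation),
// where the delta here is assumed to be dError/dNetInput of the receiving unit.
// The weight is then moved against the gradient by a fraction (learningRate) of it.
double updateWeight(double weight, double outputDelta,
                    double inputActivation, double learningRate) {
    assert(learningRate > 0.0);
    double gradient = outputDelta * inputActivation;  // dError/dWeight
    return weight - learningRate * gradient;          // step opposite the gradient
}
```

The pseudocode in Section 4.2 defines its deltas with the opposite sign, (Target - Output) rather than (Output - Target), which is why it adds the weight change instead of subtracting it; the two forms are equivalent.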

3.2 Pseudo Code for one Layer

A single neuron (i.e. processing element) takes in a total input PEinput and produces an output activation PEout. In this project, we take the activation function to be the sigmoid function, so PEout = Sigmoid(PEinput). The sigmoid function is the special case of the logistic function shown below, defined by the formula

$$S(t) = \frac{1}{1 + e^{-t}}$$

Figure 3.1 the sigmoid curve

Other activation functions (e.g. linear or hyperbolic tangent) are also often used. The sigmoid has the effect of squashing the infinite range of PEinput into the range 0 to 1. It also has the convenient property that its derivative takes the particularly simple form

$$\frac{dS}{dt} = S(t)\,\bigl(1 - S(t)\bigr)$$
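The derivative identity can be checked numerically. Below is a minimal C++ sketch; the function names `sigmoid` and `sigmoidDerivativeFromOutput` are illustrative choices, not names taken from the project's program.

```cpp
#include <cassert>
#include <cmath>

// Logistic sigmoid S(t) = 1 / (1 + e^(-t)); maps any real input into (0, 1).
double sigmoid(double t) {
    return 1.0 / (1.0 + std::exp(-t));
}

// Derivative expressed through the sigmoid's own output s = S(t): dS/dt = s(1 - s).
// Back propagation can therefore reuse the stored activations instead of
// recomputing exponentials.
double sigmoidDerivativeFromOutput(double s) {
    assert(s >= 0.0 && s <= 1.0);   // s is expected to be a sigmoid output
    return s * (1.0 - s);
}
```

Comparing `sigmoidDerivativeFromOutput(sigmoid(t))` with a central finite difference of `sigmoid` at the same `t` gives agreement to many decimal places.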

Typically, the input PEinput into a given neuron will be the weighted sum of output activations feeding in from a number of other neurons. It is convenient to think of the activations flowing through layers of neurons. So, if there are NumUnitLayer1 neurons in layer 1, the total activation flowing into our layer 2 neuron is the sum over i of the products OutputLayer1[i] * Wt[i], where Wt[i] is the strength/weight of the connection between PE[i] in layer 1 and our PE in layer 2. Each neuron will also have a bias, or resting state, that is added to the sum of inputs, and it is convenient to call this Wt[0]. We can then write

InputLayer2 = Wt[0] ;          // start with the resting-state (bias) weight
for ( i = 1 ; i <= NumUnitLayer1 ; i++ )
{
    InputLayer2 += OutputLayer1[i] * Wt[i] ;
}
OutputLayer2 = 1 / (1 + exp(-InputLayer2)) ;   // sigmoid gives the activation output
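As a sketch only, the single-neuron pseudocode above might translate to C++ as follows. The function name `neuronOutput` is hypothetical, and C++ vectors are 0-based, so `outputLayer1[i - 1]` plays the role of OutputLayer1[i]; the bias keeps index 0 in the weight vector as in the text.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Output of one layer-2 neuron. wt[0] is the bias (resting-state) weight and
// wt[i] (i = 1..n) weights outputLayer1[i-1]; the 1-based weight indexing mirrors
// the pseudocode, while the C++ input vector is 0-based.
double neuronOutput(const std::vector<double>& outputLayer1,
                    const std::vector<double>& wt) {
    assert(wt.size() == outputLayer1.size() + 1);
    double inputLayer2 = wt[0];                       // start from the bias weight
    for (std::size_t i = 1; i <= outputLayer1.size(); ++i)
        inputLayer2 += outputLayer1[i - 1] * wt[i];   // weighted sum of inputs
    return 1.0 / (1.0 + std::exp(-inputLayer2));      // sigmoid activation
}
```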

Layer 2 will have many processing elements as well, so it is appropriate to write the weights between PE[i] in layer 1 and PE[j] in layer 2 as a two-dimensional array Wt[i][j]. Thus, to get the output of PE[j] in layer 2, we have

InputLayer2[j] = Wt[0][j] ;
for ( i = 1 ; i <= NumUnitLayer1 ; i++ )
{
    InputLayer2[j] += OutputLayer1[i] * Wt[i][j] ;
}
OutputLayer2[j] = 1 / (1 + exp(-InputLayer2[j])) ;   // sigmoid activation output

Layer 2 has NumUnitLayer2 processing units, but the above code calculates the output of only one processing element, PE[j]. Since we require the outputs of all the processing elements in layer 2, we introduce another loop over j:

for ( j = 1 ; j <= NumUnitLayer2 ; j++ )
{
    InputLayer2[j] = Wt[0][j] ;
    for ( i = 1 ; i <= NumUnitLayer1 ; i++ )
    {
        InputLayer2[j] += OutputLayer1[i] * Wt[i][j] ;
    }
    OutputLayer2[j] = 1 / (1 + exp(-InputLayer2[j])) ;   // sigmoid activation output
}
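A possible C++ rendering of this layer loop, assuming the weights are stored as a (NumUnitLayer1 + 1) × NumUnitLayer2 matrix with the bias weights in row 0 (as Wt[0][j] above). `layerForward` and `Matrix` are hypothetical names, and the unit index j is 0-based in C++.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Forward pass through one fully connected sigmoid layer.
// wt has (numIn + 1) rows and numOut columns: row 0 holds the bias weights,
// row i (1..numIn) holds the weights from input unit i to every unit j.
std::vector<double> layerForward(const std::vector<double>& in, const Matrix& wt) {
    assert(wt.size() == in.size() + 1);
    const std::size_t numOut = wt[0].size();
    std::vector<double> out(numOut);
    for (std::size_t j = 0; j < numOut; ++j) {          // every unit in the layer
        double net = wt[0][j];                           // bias weight
        for (std::size_t i = 1; i <= in.size(); ++i)
            net += in[i - 1] * wt[i][j];                 // weighted inputs
        out[j] = 1.0 / (1.0 + std::exp(-net));           // sigmoid
    }
    return out;
}
```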

3.3 Pseudo Code for all Layers


Now that we have calculated the outputs of all the processing elements in one layer, we can write the code that calculates the outputs of all the layers in our network. Three-layer networks are necessary and sufficient for most purposes, so our layer 2 outputs feed into a third layer in the same way as above. The feed forward neural network chosen for this project has three layers, 1, 2 and 3, and the outputs of all three layers are calculated as follows:

for ( j = 1 ; j <= NumUnitLayer2 ; j++ )      // computes layer 2 outputs
{
    InputLayer2[j] = WtLayer1/Layer2[0][j] ;
    for ( i = 1 ; i <= NumUnitLayer1 ; i++ )
    {
        InputLayer2[j] += OutputLayer1[i] * WtLayer1/Layer2[i][j] ;
    }
    OutputLayer2[j] = 1 / (1 + exp(-InputLayer2[j])) ;
}
for ( k = 1 ; k <= NumUnitLayer3 ; k++ )      // computes layer 3 outputs
{
    InputLayer3[k] = WtLayer2/Layer3[0][k] ;
    for ( j = 1 ; j <= NumUnitLayer2 ; j++ )
    {
        InputLayer3[k] += OutputLayer2[j] * WtLayer2/Layer3[j][k] ;
    }
    OutputLayer3[k] = 1 / (1 + exp(-InputLayer3[k])) ;
}

To avoid confusion, the pseudo code uses a different index for each layer: i, j and k for layers 1, 2 and 3 respectively. The connection weights are also named differently to distinguish between the layers: WtLayer1/Layer2 and WtLayer2/Layer3. For three-layer networks it is traditional to call layer 1 the input layer, layer 2 the hidden layer, and layer 3 the output layer. The neural network in this project has a design similar to the figure shown below.


Figure 3.2 the design for calculating output activation

We can now denote layers 1, 2 and 3 as the input layer, hidden layer and output layer respectively, and name the connection weights accordingly. As shown in the above figure, the bias weights are included in the input to each layer and hence contribute to its output.

for ( j = 1 ; j <= NumUnitHidden ; j++ )      // computes hidden layer PE outputs
{
    InputHidden[j] = WtInput/Hidden[0][j] ;
    for ( i = 1 ; i <= NumUnitInput ; i++ )
    {
        InputHidden[j] += OutputInput[i] * WtInput/Hidden[i][j] ;
    }
    OutputHidden[j] = 1 / (1 + exp(-InputHidden[j])) ;
}
for ( k = 1 ; k <= NumUnitOutput ; k++ )      // computes output layer PE outputs
{
    InputOutput[k] = WtHidden/Output[0][k] ;
    for ( j = 1 ; j <= NumUnitHidden ; j++ )
    {
        InputOutput[k] += OutputHidden[j] * WtHidden/Output[j][k] ;
    }
    Output[k] = 1 / (1 + exp(-InputOutput[k])) ;
}
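Putting the two layers together, a minimal C++ sketch of the full forward pass might look like this. It assumes the input layer simply passes its values through (OutputInput[i] = Input[i]) and uses the same row-0-bias weight layout; the names `sigmoidLayer`, `feedForward` and `Activations` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One fully connected sigmoid layer; row 0 of wt holds the bias weights and
// row i (1..in.size()) the weights leaving input unit i.
static std::vector<double> sigmoidLayer(const std::vector<double>& in, const Matrix& wt) {
    assert(wt.size() == in.size() + 1);
    std::vector<double> out(wt[0].size());
    for (std::size_t j = 0; j < out.size(); ++j) {
        double net = wt[0][j];                                  // bias
        for (std::size_t i = 1; i <= in.size(); ++i) net += in[i - 1] * wt[i][j];
        out[j] = 1.0 / (1.0 + std::exp(-net));                  // sigmoid
    }
    return out;
}

struct Activations {
    std::vector<double> hidden;  // corresponds to OutputHidden[j]
    std::vector<double> output;  // corresponds to Output[k]
};

// Input layer -> hidden layer -> output layer. The input layer is assumed to
// pass its values through unchanged (OutputInput[i] = Input[i]).
Activations feedForward(const std::vector<double>& input,
                        const Matrix& wtInputHidden, const Matrix& wtHiddenOutput) {
    Activations a;
    a.hidden = sigmoidLayer(input, wtInputHidden);
    a.output = sigmoidLayer(a.hidden, wtHiddenOutput);
    return a;
}
```

The hidden activations are returned alongside the outputs because the back propagation step needs them.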

4 Implementation

4.1 Pseudo Code for training patterns

In this project there is a whole set of NumExamples training patterns, i.e. pairs of input and target output vectors,

Input[E][i] , Target[E][k]

labelled by the index E. The network learns by minimizing some measure of the error of the network's actual outputs compared with the target outputs. The sum squared error over all output units k and all training patterns E is given by

Error = 0.0 ;
for ( E = 1 ; E <= NumExamples ; E++ )
{
    for ( k = 1 ; k <= NumUnitOutput ; k++ )
    {
        Error += 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]) ;
    }
}
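The error sum translates directly into C++. This sketch assumes the targets and network outputs have already been collected into matrices with one row per pattern E and one column per output unit k (both 0-based); `sumSquaredError` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Sum squared error over all training patterns E and output units k:
// Error = sum_E sum_k 0.5 * (Target[E][k] - Output[E][k])^2.
// Rows are patterns, columns are output units.
double sumSquaredError(const Matrix& target, const Matrix& output) {
    assert(target.size() == output.size());
    double error = 0.0;
    for (std::size_t e = 0; e < target.size(); ++e) {
        assert(target[e].size() == output[e].size());
        for (std::size_t k = 0; k < target[e].size(); ++k) {
            const double diff = target[e][k] - output[e][k];
            error += 0.5 * diff * diff;
        }
    }
    return error;
}
```

Because of the 0.5 factor, the derivative of each term with respect to Output[E][k] is simply -(Target[E][k] - Output[E][k]), which is where the (Target - Output) factor in the weight updates of Section 4.2 comes from.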

The factor of 0.5 is conventionally included to simplify the algebra in deriving the learning algorithm: it cancels the factor of 2 produced when the squared term is differentiated. If we insert the above code for computing the network outputs into the E loop of this sum, we end up with

Error = 0.0 ;
for ( E = 1 ; E <= NumExamples ; E++ )      // computes for all training patterns E
{
    for ( j = 1 ; j <= NumUnitHidden ; j++ )
    {
        InputHidden[E][j] = WtInput/Hidden[0][j] ;
        for ( i = 1 ; i <= NumUnitInput ; i++ )
        {
            InputHidden[E][j] += OutputInput[E][i] * WtInput/Hidden[i][j] ;
        }
        OutputHidden[E][j] = 1 / (1 + exp(-InputHidden[E][j])) ;
    }
    for ( k = 1 ; k <= NumUnitOutput ; k++ )
    {
        InputOutput[E][k] = WtHidden/Output[0][k] ;
        for ( j = 1 ; j <= NumUnitHidden ; j++ )
        {
            InputOutput[E][k] += OutputHidden[E][j] * WtHidden/Output[j][k] ;
        }
        Output[E][k] = 1 / (1 + exp(-InputOutput[E][k])) ;
        Error += 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]) ;
    }
}

4.2 Pseudo Code for minimizing error

The next stage of the project involves iteratively adjusting the weights to minimize the network's error. The method adopted in this project is gradient descent on the error function: we compute how much the error changes for a small change in each weight (i.e. the partial derivatives dError/dWt) and shift the weights by a small amount in the direction that reduces the error. As stated before, these derivatives are computed with the back propagation algorithm. Building on the sum squared error calculation above, one iteration (or 'epoch') of the required weight changes ΔWho and ΔWih can be computed and applied using

Error = 0.0 ;
for ( E = 1 ; E <= NumExamples ; E++ )      // computes for all training patterns E
{
    for ( j = 1 ; j <= NumUnitHidden ; j++ )      // forward pass: hidden layer
    {
        InputHidden[E][j] = WtInput/Hidden[0][j] ;
        for ( i = 1 ; i <= NumUnitInput ; i++ )
        {
            InputHidden[E][j] += OutputInput[E][i] * WtInput/Hidden[i][j] ;
        }
        OutputHidden[E][j] = 1 / (1 + exp(-InputHidden[E][j])) ;
    }
    for ( k = 1 ; k <= NumUnitOutput ; k++ )      // forward pass: output layer
    {
        InputOutput[E][k] = WtHidden/Output[0][k] ;
        for ( j = 1 ; j <= NumUnitHidden ; j++ )
        {
            InputOutput[E][k] += OutputHidden[E][j] * WtHidden/Output[j][k] ;
        }
        Output[E][k] = 1 / (1 + exp(-InputOutput[E][k])) ;
        Error += 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]) ;
        ΔOutput[k] = (Target[E][k] - Output[E][k]) * Output[E][k] * (1.0 - Output[E][k]) ;   // Output * (1 - Output) is the sigmoid derivative
    }
    for ( j = 1 ; j <= NumUnitHidden ; j++ )      // back propagation of error to the hidden layer
    {
        SumΔOutput[j] = 0.0 ;
        for ( k = 1 ; k <= NumUnitOutput ; k++ )
        {
            SumΔOutput[j] += WtHidden/Output[j][k] * ΔOutput[k] ;
        }
        ΔH[j] = SumΔOutput[j] * OutputHidden[E][j] * (1.0 - OutputHidden[E][j]) ;   // sigmoid derivative
    }
    for ( j = 1 ; j <= NumUnitHidden ; j++ )      // updates the input-to-hidden weights
    {
        ΔWih[0][j] = β * ΔH[j] + α * ΔWih[0][j] ;
        WtInput/Hidden[0][j] += ΔWih[0][j] ;
        for ( i = 1 ; i <= NumUnitInput ; i++ )
        {
            ΔWih[i][j] = β * OutputInput[E][i] * ΔH[j] + α * ΔWih[i][j] ;
            WtInput/Hidden[i][j] += ΔWih[i][j] ;
        }
    }
    for ( k = 1 ; k <= NumUnitOutput ; k++ )      // updates the hidden-to-output weights
    {
        ΔWho[0][k] = β * ΔOutput[k] + α * ΔWho[0][k] ;
        WtHidden/Output[0][k] += ΔWho[0][k] ;
        for ( j = 1 ; j <= NumUnitHidden ; j++ )
        {
            ΔWho[j][k] = β * OutputHidden[E][j] * ΔOutput[k] + α * ΔWho[j][k] ;
            WtHidden/Output[j][k] += ΔWho[j][k] ;
        }
    }
}
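To check that the pseudocode above is self-consistent, here is a compact, self-contained C++ sketch of one training epoch with momentum. It is not the project's actual program: the names (`trainEpoch`, `initNetwork`, `smallRandom`), the random initialisation in [-0.5, 0.5), and the XOR data and values β = 0.5, α = 0.9 used when exercising it are all assumptions chosen for illustration. As in the pseudocode, β is the learning rate, α the momentum coefficient, row 0 of each weight matrix holds the bias weights, and the weights are updated after every pattern.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Small deterministic pseudo-random generator (any uniform source would do; this
// one keeps runs reproducible). Returns a value in [-0.5, 0.5).
static double smallRandom(std::uint32_t& state) {
    state = state * 1664525u + 1013904223u;           // linear congruential step
    return (state >> 8) / 16777216.0 - 0.5;            // top 24 bits -> [0,1) -> shift
}

static double sig(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One online back propagation epoch with momentum (beta = learning rate,
// alpha = momentum). Returns the sum squared error accumulated during the epoch.
double trainEpoch(const Matrix& input, const Matrix& target,
                  Matrix& wih, Matrix& who, Matrix& dwih, Matrix& dwho,
                  double beta, double alpha) {
    assert(input.size() == target.size() && !wih.empty() && !who.empty());
    const std::size_t nIn = wih.size() - 1, nHid = who.size() - 1, nOut = who[0].size();
    double error = 0.0;
    for (std::size_t e = 0; e < input.size(); ++e) {
        std::vector<double> hid(nHid), out(nOut), dOut(nOut), dHid(nHid);
        for (std::size_t j = 0; j < nHid; ++j) {           // forward: hidden layer
            double net = wih[0][j];
            for (std::size_t i = 0; i < nIn; ++i) net += input[e][i] * wih[i + 1][j];
            hid[j] = sig(net);
        }
        for (std::size_t k = 0; k < nOut; ++k) {           // forward: output layer
            double net = who[0][k];
            for (std::size_t j = 0; j < nHid; ++j) net += hid[j] * who[j + 1][k];
            out[k] = sig(net);
            const double diff = target[e][k] - out[k];
            error += 0.5 * diff * diff;
            dOut[k] = diff * out[k] * (1.0 - out[k]);      // ΔOutput[k]
        }
        for (std::size_t j = 0; j < nHid; ++j) {           // back-propagate to hidden
            double sum = 0.0;
            for (std::size_t k = 0; k < nOut; ++k) sum += who[j + 1][k] * dOut[k];
            dHid[j] = sum * hid[j] * (1.0 - hid[j]);       // ΔH[j]
        }
        for (std::size_t j = 0; j < nHid; ++j) {           // update input->hidden
            dwih[0][j] = beta * dHid[j] + alpha * dwih[0][j];
            wih[0][j] += dwih[0][j];
            for (std::size_t i = 0; i < nIn; ++i) {
                dwih[i + 1][j] = beta * input[e][i] * dHid[j] + alpha * dwih[i + 1][j];
                wih[i + 1][j] += dwih[i + 1][j];
            }
        }
        for (std::size_t k = 0; k < nOut; ++k) {           // update hidden->output
            dwho[0][k] = beta * dOut[k] + alpha * dwho[0][k];
            who[0][k] += dwho[0][k];
            for (std::size_t j = 0; j < nHid; ++j) {
                dwho[j + 1][k] = beta * hid[j] * dOut[k] + alpha * dwho[j + 1][k];
                who[j + 1][k] += dwho[j + 1][k];
            }
        }
    }
    return error;
}

// Random initial weights plus zeroed momentum buffers for an nIn-nHid-nOut network.
void initNetwork(std::size_t nIn, std::size_t nHid, std::size_t nOut, std::uint32_t seed,
                 Matrix& wih, Matrix& who, Matrix& dwih, Matrix& dwho) {
    wih.assign(nIn + 1, std::vector<double>(nHid));
    who.assign(nHid + 1, std::vector<double>(nOut));
    dwih.assign(nIn + 1, std::vector<double>(nHid, 0.0));
    dwho.assign(nHid + 1, std::vector<double>(nOut, 0.0));
    for (auto& row : wih) for (double& w : row) w = smallRandom(seed);
    for (auto& row : who) for (double& w : row) w = smallRandom(seed);
}
```

A driver would call `initNetwork` once and then call `trainEpoch` repeatedly, stopping when the returned error falls below a chosen threshold or after a maximum number of epochs.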

The weight changes ΔWih and ΔWho are each made up of two components. The first, the β component, is the gradient descent contribution, with β acting as the learning rate. The second, the α component, is a 'momentum' term which effectively keeps a moving average of the gradient descent weight change contributions, and thus smooths out the overall weight changes.
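The effect of the α term can be seen with a small numerical experiment. In this hypothetical C++ sketch (`momentumStep` and `deltaAfterSteps` are illustrative names), the same gradient descent term g is applied repeatedly: the weight change grows towards βg/(1 - α), so with α = 0.9 a consistent gradient direction produces steps about ten times larger than βg alone.

```cpp
#include <cassert>

// Momentum update as in Section 4.2: deltaW = beta * g + alpha * previousDeltaW,
// where g is the gradient descent term (e.g. ΔH[j] * input). Returns the new change.
double momentumStep(double g, double previousDeltaW, double beta, double alpha) {
    assert(alpha >= 0.0 && alpha < 1.0);   // alpha < 1 keeps the average bounded
    return beta * g + alpha * previousDeltaW;
}

// Applies the same gradient term repeatedly; the change approaches
// beta * g / (1 - alpha), i.e. momentum scales a consistent step by 1 / (1 - alpha).
double deltaAfterSteps(double g, double beta, double alpha, int steps) {
    double deltaW = 0.0;
    for (int t = 0; t < steps; ++t) deltaW = momentumStep(g, deltaW, beta, alpha);
    return deltaW;
}
```

When the gradient term changes sign from one step to the next, the α contributions partly cancel instead, which is the smoothing effect described above.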


The complete training process will consist of repeating the above weight updates for a

number of epochs until some error criterion is met.

5 Results

Figure 5.1 Output Screenshot

The program based on the design discussed in the previous chapters was executed successfully: the pseudo code was implemented, and the three-layer feed forward neural network was simulated using the back propagation algorithm.
