machine_learning_ann_ramin_shamshiri.pdf


Machine Learning and Neural Network

Ramin Shamshiri

Graduate Student of Biosystem Eng,

University of Florida, 30-Oct-2007

Introduction [2]

There are situations in which there is no known method for computing the desired output from a set of inputs. An alternative strategy for solving this type of problem is for the computer to attempt to learn the input/output functionality from examples. The approach of using examples to synthesize programs is known as the learning methodology, and in the particular case when the examples are input/output pairs it is called supervised learning. The examples of input/output functionality are referred to as the training data.

Some Terms and Definitions [2]

Target function: When an underlying function from inputs to outputs exists, it is referred to as the target function.

Solution: The estimate of the target function which is learnt or output by the learning algorithm is known as the solution of the learning problem.

Decision Function: In the case of classification this target function is sometimes referred to as the decision function.

Binary classification: A learning problem with binary outputs is referred to as a binary-classification problem.

Regression: For real valued outputs the problem becomes known as regression.

Unsupervised Learning: Consider the case where there are no output values and the learning task is to gain some understanding of the process that generates the data. This type of learning includes density estimation, learning the support of a distribution, and clustering.

Query Learning: A model of learning which considers more complex interaction between a learner and its environment. Query learning is the case when the learner is allowed to query the environment about the outputs associated with a particular input. The study of how this affects the learner's ability to learn different tasks is known as query learning.

Reinforcement Learning: A further level of interaction, where the learner has a range of actions at its disposal which it can take to attempt to move towards states where it can expect high rewards.

Batch Learning and on-line learning: In batch learning all the data are given to the learner at the start of learning, but in on-line learning the learner receives one example at a time and gives its estimate of the output before receiving the correct value. In on-line learning the learner updates its current hypothesis in response to each new example, and the quality of learning is assessed by the total number of mistakes made during learning.



Machine Learning

Machine Learning is a subfield of Artificial Intelligence which deals with algorithm development and programming techniques that allow computers to learn. The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods. Hence, machine learning is closely related to data mining, statistics and theoretical computer science.

Two types of learning are currently known: Inductive and Deductive.

Induction or inductive reasoning, sometimes called inductive logic, is a reasoning process in which the premises of an argument are believed to support the conclusion but do not ensure it. Inductive learning is used to assign properties or relations to types based on tokens, or to formulate laws based on limited observations of recurring phenomenal patterns. Inductive machine learning methods extract rules and patterns out of massive data sets. Induction is employed, for example, in using specific propositions such as:

This ice is cold ==> All ice is cold.

A billiard ball moves when struck with a cue ==> All billiard balls struck with a cue move.

Deductive reasoning, according to the Oxford, Cambridge and Merriam-Webster dictionaries, is the type of reasoning that proceeds from general principles or premises to derive particular information. Deductive reasoning is dependent on its premises. That is, a false premise can possibly lead to a false result, and inconclusive premises will also yield an inconclusive conclusion. [1] For example:

All apples are fruit.

All fruits grow on trees ==> Therefore all apples grow on trees.

Or

All apples are fruit.

Some apples are red ==> Therefore some fruit is red.

Both types of reasoning are routinely employed. One difference between them is that in deductive reasoning, the evidence provided must be a set about which everything is known before the conclusion can be drawn. Since it is difficult to know everything before drawing a conclusion, deductive reasoning has little use in the real world. This is where inductive reasoning steps in. Given a set of evidence, however incomplete the knowledge is, the conclusion is likely to follow, but one gives up the guarantee that the conclusion follows. However, it does provide the ability to learn new things that are not obvious from the evidence.

Learning and Generalization [2]

Early machine learning algorithms aimed to learn representations of simple symbolic functions that could be understood and verified by experts. Therefore the goal of learning in this paradigm was to output a hypothesis that performed the correct classification of the training data, and early learning algorithms were designed to find such an accurate fit to the data. Such a hypothesis is said to be consistent.



Some specific applications of Machine Learning include:

Natural Language Processing
Pattern Recognition
Search Engines
Medical Diagnosis
Bioinformatics
Cheminformatics
Credit card fraud detection
Stock Market analysis
DNA sequence classification
Speech Recognition
Handwriting Recognition
Object orientation in computer vision
Game Playing
Robot locomotion

Algorithm types

Machine learning algorithms are organized into different categories based on the desired outcome of the algorithm. Some of the most common types include:

Supervised Learning: The algorithm generates a function that maps inputs to desired outputs. One of the standard formulations of the supervised learning task is the classification problem, in which the learner is required to learn the behavior of a function that maps a vector [X1, X2, ..., XN] into one of several classes by looking at several input-output examples of the function.

Unsupervised Learning: A model is fit to observations (labeled examples are not available). It is distinguished from supervised learning by the fact that there is no a priori output. In unsupervised learning, a data set of input objects is gathered.

Semi-Supervised Learning: Combines both labeled and unlabeled examples to generate an appropriate function or classifier.

Reinforcement Learning: The algorithm learns a policy of how to act given an observation of the world. Every action has some impact in the environment, and the environment provides feedback that guides the learning algorithm.

Learning to Learn: the algorithm learns its own inductive bias based on previous experience.

Machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy. Learning capabilities can improve the performance of an intelligent system over time. The most popular approaches to machine learning are artificial neural networks and genetic algorithms.



Artificial Neural Network

A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons. [3] The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.

Artificial neural network (ANN), often just called a "neural network" (NN), is a mathematical model

or computational model based on biological neural networks. It consists of an interconnected group of

artificial neurons and processes information using a connectionist approach to computation. [4]

Image - Biological Neural Net - A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.

Biological Neural Network    Artificial Neural Network
Soma                         Neuron
Dendrite                     Input
Axon                         Output
Synapse                      Weight



An artificial neural network consists of a number of very simple processors, also called neurons,

which are analogous to the biological neurons in the brain. The neurons are connected by weighted

links passing signals from one neuron to another. The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit

the same signal. The outgoing branches terminate at the incoming connections of other neurons in the

network.[3]

In most cases an ANN is an adaptive system that changes its structure based on external or internal

information that flows through the network during the learning phase. In more practical terms neural networks are non-linear statistical data modeling tools. They can be used to model complex

relationships between inputs and outputs or to find patterns in data. [4]

Neuron, the simplest computation element

Neural Network is a network of simple processing elements (neurons), which can exhibit complex

global behavior, determined by the connections between the processing elements and element

parameters. [4]

An artificial neuron, also called semi-linear unit, Nv neuron, binary neuron or McCulloch-Pitts

neuron, is an abstraction of biological neurons and the basic unit in an artificial neural network. The Artificial Neuron receives one or more inputs (representing the one or more dendrites) and sums them to produce an output (synapse). Usually the sums of each node are weighted, and the sum is passed through a non-linear function known as an activation or transfer function. [4]

Figure: An artificial neuron.



Basic structure of Neuron:

For a given artificial neuron, let there be n + 1 inputs with signals $x_0$ through $x_n$ and weights $w_0$ through $w_n$. The output of the neuron is:

$$y = \varphi\left(\sum_{i=0}^{n} w_i x_i\right)$$

where $\varphi$ (Phi) is the transfer function.
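The output formula above can be sketched in a few lines of plain Python. This is only an illustration: the function name neuron_output and the example values are assumptions, not part of the original notes.

```python
def neuron_output(inputs, weights, transfer):
    """Weighted sum of the inputs, passed through a transfer function."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    net = sum(w * x for w, x in zip(weights, inputs))
    return transfer(net)

# Hypothetical example values: three inputs and the identity as transfer function.
example = neuron_output([1.0, 0.5, -1.0], [0.2, 0.4, 0.1], lambda v: v)
print(example)
```

Any of the activation functions of this section can be passed as the transfer argument.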

Activation functions of a neuron (Types of transfer functions)

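Here X is the net weighted input of the neuron and Y is its output.

Step Function: $Y = 1$ if $X \ge 0$, $Y = 0$ if $X < 0$

Sign Function: $Y = +1$ if $X \ge 0$, $Y = -1$ if $X < 0$

Sigmoid Function: $Y = \frac{1}{1 + e^{-X}}$

Linear Function: $Y = X$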
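The four activation functions can be written as a minimal Python sketch, assuming the usual definitions (step and sign switch at zero, and the sigmoid is the logistic function). The function names are chosen here for illustration.

```python
import math

def step(x):
    """Hard limiter: 1 for non-negative input, 0 otherwise."""
    return 1 if x >= 0 else 0

def sign(x):
    """Sign function: +1 for non-negative input, -1 otherwise."""
    return 1 if x >= 0 else -1

def sigmoid(x):
    """Logistic sigmoid; not guarded against overflow for very negative x."""
    return 1.0 / (1.0 + math.exp(-x))

def linear(x):
    """Linear function: the output equals the net input."""
    return x
```

The step and sign functions are the hard limiters used by the perceptron, while the sigmoid is the one used later in back-propagation networks.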

Perceptron, the simplest Neural Network

In 1958, Frank Rosenblatt introduced a training algorithm that provided the first procedure for training a simple ANN: a perceptron. The perceptron is the simplest form of a neural network. It consists of a single neuron with adjustable synaptic weights and a hard limiter.

The operation of Rosenblatt's perceptron is based on the McCulloch and Pitts neuron model. The model consists of a linear combiner followed by a hard limiter. The aim of the perceptron is to classify inputs, x1, x2, ..., xn, into one of two classes, say A1 and A2. In the case of an elementary perceptron, the n-dimensional space is divided by a hyper-plane into two decision regions. The hyper-plane is defined by the linearly separable function:

$$\sum_{i=1}^{n} x_i w_i - \theta = 0$$

where $\theta$ is the threshold of the hard limiter.



Linear separability in the perceptron:

Graphs showing linearly separable logic functions

(Fig. 4) Since it is impossible to draw a line to divide the regions containing either 1 or 0, the XOR function is not linearly separable.

In the above graphs, the two axes are the inputs which can take the value of either 0 or 1, and the

numbers on the graph are the expected output for a particular input. Using an appropriate weight

vector for each case, a single perceptron can perform all of these functions.

However, not all logic operators are linearly separable. For instance, the XOR operator is not linearly

separable and cannot be achieved by a single perceptron. Yet this problem could be overcome by using more than one perceptron arranged in feed-forward networks.
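As an informal check of this claim, the following Python sketch searches a small grid of weights and thresholds for a single two-input perceptron that reproduces a given truth table. The grid and the helper names are arbitrary choices made for this illustration. A failed search on a finite grid is not a proof, although for XOR it agrees with the known result.

```python
from itertools import product

def perceptron(x1, x2, w1, w2, theta):
    """Single perceptron with a step activation (hard limiter)."""
    return 1 if x1 * w1 + x2 * w2 - theta >= 0 else 0

def find_weights(truth_table, grid):
    """Return the first (w1, w2, theta) on the grid that reproduces the table, else None."""
    for w1, w2, theta in product(grid, repeat=3):
        if all(perceptron(x1, x2, w1, w2, theta) == y
               for (x1, x2), y in truth_table.items()):
            return w1, w2, theta
    return None

# Candidate values from -2.0 to 2.0 in steps of 0.25 (arbitrary, for illustration).
GRID = [i / 4 for i in range(-8, 9)]

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
OR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

for name, table in [("AND", AND), ("OR", OR), ("XOR", XOR)]:
    print(name, find_weights(table, GRID))
```

The search finds a weight vector for AND and for OR, and returns None for XOR.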

How does the perceptron learn its classification tasks?[3]

This is done by making small adjustments in the weights to reduce the difference between the actual and

desired outputs of the perceptron. The initial weights are randomly assigned, usually in the range [-0.5, 0.5], and then updated to obtain the output consistent with the training examples.

If at iteration p, the actual output is Y(p) and the desired output is Yd(p), then the error is given by:

$$e(p) = Y_d(p) - Y(p)$$

where p = 1, 2, 3, . . .

Iteration p here refers to the pth training example presented to the perceptron. If the error, e(p), is positive, we need to increase perceptron output Y(p), but if it is negative, we need to decrease Y(p).



The perceptron learning rule

The perceptron learning rule was first proposed by Rosenblatt in 1960. Using this rule we can derive the

perceptron training algorithm for classification tasks.

$$w_i(p+1) = w_i(p) + \alpha \cdot x_i(p) \cdot e(p)$$

where p = 1, 2, 3, . . . and $\alpha$ is the learning rate, a positive constant less than unity.

Perceptron's training algorithm:

Step 1: Initialization

Set initial weights w1, w2, ..., wn and threshold $\theta$ to random numbers in the range [-0.5, 0.5].

Step 2: Activation

Activate the perceptron by applying inputs x1(p), x2(p), ..., xn(p) and desired output Yd(p). Calculate the actual output at iteration p = 1:

$$Y(p) = \mathrm{step}\left[\sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta\right]$$

where n is the number of the perceptron inputs, and step is a step activation function.

Step 3: Weight training

Update the weights of the perceptron:

$$w_i(p+1) = w_i(p) + \Delta w_i(p)$$

where $\Delta w_i(p)$ is the weight correction at iteration p. The weight correction is computed by the delta rule:

$$\Delta w_i(p) = \alpha \cdot x_i(p) \cdot e(p)$$

Step 4: Iteration

Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
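The four steps above can be sketched in Python as follows. This is a minimal illustration under some assumptions that the notes do not state: the threshold is trained as a weight on a fixed input of -1, convergence means one full pass over the data without error, and the function names and the fixed random seed are arbitrary.

```python
import random

def step(x):
    return 1 if x >= 0 else 0

def predict(weights, theta, x):
    """Step 2: actual output for one input vector."""
    return step(sum(w * xi for w, xi in zip(weights, x)) - theta)

def train_perceptron(samples, alpha=0.1, max_epochs=1000, seed=0):
    rng = random.Random(seed)
    n = len(samples[0][0])
    # Step 1: random weights and threshold in [-0.5, 0.5].
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    theta = rng.uniform(-0.5, 0.5)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y_desired in samples:
            e = y_desired - predict(weights, theta, x)
            if e != 0:
                mistakes += 1
                # Step 3: delta rule, w_i(p+1) = w_i(p) + alpha * x_i(p) * e(p).
                weights = [w + alpha * xi * e for w, xi in zip(weights, x)]
                # Assumption: the threshold is a weight on a fixed input of -1.
                theta += alpha * (-1) * e
        if mistakes == 0:
            return weights, theta, epoch  # converged
    return weights, theta, None  # Step 4 stopped without convergence

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

w, theta, epochs = train_perceptron(AND)
print("AND converged after epoch:", epochs)
print("XOR converged after epoch:", train_perceptron(XOR)[2])
```

For AND the loop ends with an error-free pass. For XOR it runs until max_epochs and reports None, because no single perceptron can classify all four points correctly.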

Two-dimensional plots of basic logical operations

A perceptron can learn the operations AND and OR, but not Exclusive-OR.



Example:



Multilayer neural networks

A multilayer perceptron is a feed-forward neural network with one or more hidden layers. The network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons. The input signals are propagated in a forward direction on a layer-by-layer basis.

What does the middle layer hide?

A hidden layer hides its desired output. Neurons in the hidden layer cannot be observed through the input/output behavior of the network. There is no obvious way to know what the desired output of the hidden layer should be. Commercial ANNs incorporate three and sometimes four layers, including one or two hidden layers. Each layer can contain from 10 to 1000 neurons. Experimental neural networks may have five or even six layers, including three or four hidden layers, and utilize millions of neurons.



Back-propagation neural network

Learning in a multilayer network proceeds the same way as for a perceptron. A training set of input patterns is presented to the network. The network computes its output pattern, and if there is an error (in other words, a difference between actual and desired output patterns), the weights are adjusted to reduce this error. In a back-propagation neural network, the learning algorithm has two phases. First, a training input pattern is presented to the network input layer. The network propagates the input pattern from layer to layer until the output pattern is generated by the output layer. If this pattern is different from the desired output, an error is calculated and then propagated backwards through the network from the output layer to the input layer. The weights are modified as the error is propagated.

The back-propagation training algorithm

Step 1: Initialization

Set all the weights and threshold levels of the network to random numbers uniformly distributed inside a small range:

$$\left(-\frac{2.4}{F_i},\ +\frac{2.4}{F_i}\right)$$

where $F_i$ is the total number of inputs of neuron i in the network. The weight initialization is done on a neuron-by-neuron basis.
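A minimal Python sketch of this initialization step follows. It assumes that the range is ±2.4 divided by Fi, since the fraction is damaged in the source text. The helper name init_neuron is hypothetical.

```python
import random

def init_neuron(num_inputs, rng=random):
    """Draw the weights and threshold of one neuron uniformly from +/- 2.4 / Fi."""
    bound = 2.4 / num_inputs  # assumption: Fi divides 2.4, as reconstructed above
    weights = [rng.uniform(-bound, bound) for _ in range(num_inputs)]
    theta = rng.uniform(-bound, bound)
    return weights, theta

# A neuron with 4 inputs gets values between -0.6 and +0.6.
weights, theta = init_neuron(4, random.Random(1))
print(weights, theta)
```

Because the bound depends on the number of inputs of each neuron, the function is called once per neuron, which matches the neuron-by-neuron initialization described above.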



Step 2: Activation

Activate the back-propagation neural network by applying inputs x1(p), x2(p), ..., xn(p) and desired outputs yd,1(p), yd,2(p), ..., yd,n(p).

(a) Calculate the actual outputs of the neurons in the hidden layer:

$$y_j(p) = \mathrm{sigmoid}\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j\right]$$

where n is the number of inputs of neuron j in the hidden layer, and sigmoid is the sigmoid activation function.

(b) Calculate the actual outputs of the neurons in the output layer:

$$y_k(p) = \mathrm{sigmoid}\left[\sum_{j=1}^{m} x_{jk}(p)\, w_{jk}(p) - \theta_k\right]$$

where m is the number of inputs of neuron k in the output layer, and $x_{jk}(p) = y_j(p)$ is the output of hidden neuron j.

Step 3: Weight training

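Update the weights in the back-propagation network, propagating backward the errors associated with output neurons.

(a) Calculate the error gradient for the neurons in the output layer:

$$\delta_k(p) = y_k(p) \cdot [1 - y_k(p)] \cdot e_k(p)$$

where

$$e_k(p) = y_{d,k}(p) - y_k(p)$$

Calculate the weight corrections:

$$\Delta w_{jk}(p) = \alpha \cdot y_j(p) \cdot \delta_k(p)$$

Update the weights at the output neurons:

$$w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p)$$

(b) Calculate the error gradient for the neurons in the hidden layer:

$$\delta_j(p) = y_j(p) \cdot [1 - y_j(p)] \cdot \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p)$$

where l is the number of neurons in the output layer.

Calculate the weight corrections:

$$\Delta w_{ij}(p) = \alpha \cdot x_i(p) \cdot \delta_j(p)$$

Update the weights at the hidden neurons:

$$w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)$$

Step 4: Iteration

Increase iteration p by one, go back to Step 2 and repeat the process until the selected error criterion is satisfied.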



Example:

As an example, we may consider the three-layer back-propagation network. Suppose that the network is required to perform logical operation Exclusive-OR. Recall that a single-layer perceptron could not do this operation. Now we will apply the three-layer net.

Three-layer network for solving the Exclusive-OR operation

The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, $\theta$, connected to a fixed input equal to -1.

The initial weights and threshold levels are set randomly as follows:

w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, $\theta_3$ = 0.8, $\theta_4$ = -0.1 and $\theta_5$ = 0.3.
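One iteration of the back-propagation algorithm on this network can be sketched in Python as below. Several things are assumptions of this sketch rather than statements of the notes: the signs w35 = -1.2 and θ4 = -0.1 (the minus signs appear to have been lost in the source text), the learning rate 0.1, the choice of the training pattern (1, 1) with desired output 0, and all function and key names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(net, x1, x2):
    """Step 2: outputs of hidden neurons 3 and 4 and of output neuron 5."""
    y3 = sigmoid(x1 * net["w13"] + x2 * net["w23"] - net["t3"])
    y4 = sigmoid(x1 * net["w14"] + x2 * net["w24"] - net["t4"])
    y5 = sigmoid(y3 * net["w35"] + y4 * net["w45"] - net["t5"])
    return y3, y4, y5

def backprop_step(net, x1, x2, y_desired, alpha=0.1):
    """Step 3: one weight update; returns the new weights and the output gradient."""
    y3, y4, y5 = forward(net, x1, x2)
    e = y_desired - y5
    d5 = y5 * (1 - y5) * e                 # error gradient, output layer
    d3 = y3 * (1 - y3) * d5 * net["w35"]   # error gradients, hidden layer
    d4 = y4 * (1 - y4) * d5 * net["w45"]
    new = dict(net)
    new["w35"] += alpha * y3 * d5
    new["w45"] += alpha * y4 * d5
    new["t5"] += alpha * (-1) * d5         # threshold: weight on fixed input -1
    new["w13"] += alpha * x1 * d3
    new["w23"] += alpha * x2 * d3
    new["t3"] += alpha * (-1) * d3
    new["w14"] += alpha * x1 * d4
    new["w24"] += alpha * x2 * d4
    new["t4"] += alpha * (-1) * d4
    return new, d5

# Initial values from the text; the two negative signs are assumed.
NET = {"w13": 0.5, "w14": 0.9, "w23": 0.4, "w24": 1.0,
       "w35": -1.2, "w45": 1.1, "t3": 0.8, "t4": -0.1, "t5": 0.3}

y3, y4, y5 = forward(NET, 1, 1)
updated, delta5 = backprop_step(NET, 1, 1, 0)
print(round(y5, 4), round(delta5, 4))
```

Under these assumptions the first forward pass gives an output of about 0.5097 and an output-layer error gradient of about -0.1274. After the update the output for the same pattern is slightly closer to the desired value 0.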

Decision boundaries

(a) Decision boundary constructed by hidden neuron 3;

(b) Decision boundary constructed by hidden neuron 4;

(c) Decision boundaries constructed by the complete three-layer network



The Neural Network Toolbox in Matlab

The neural network toolbox makes it easier to use neural networks in Matlab. The toolbox consists of a set of functions and structures that handle neural networks, so we do not need to write code for all activation functions, training algorithms, etc. that we want to use!

The Neural Network Toolbox is contained in a directory called nnet. Type help nnet for a listing of help topics.

The Structure of the Neural Network Toolbox

The toolbox is based on the network object. This object contains information about everything that concerns the neural network, e.g. the number and structure of its layers, the connectivity between the layers, etc. Matlab provides high-level network creation functions, like newlin (create a linear layer), newp (create a perceptron) or newff (create a feed-forward backpropagation network) to allow an easy construction of networks. As an example we construct a perceptron with two inputs ranging from -2 to 2:

>> net = newp([-2 2;-2 2],1)

First the architecture parameters and the subobject structures

subobject structures:

inputs: {1x1 cell} of inputs

layers: {1x1 cell} of layers

outputs: {1x1 cell} containing 1 output

targets: {1x1 cell} containing 1 target

biases: {1x1 cell} containing 1 bias

inputWeights: {1x1 cell} containing 1 input weight

layerWeights: {1x1 cell} containing no layer weights

are shown. The latter contains information about the individual objects of the network. Each layer consists of neurons

with the same transfer function net.transferFcn and net input function net.netInputFcn, which are in the case of

perceptrons hardlim and netsum. If neurons should have different transfer functions then they have to be arranged in

different layers. The parameters net.inputWeights and net.layerWeights specify among other things the applied learning

functions and their parameters. The next paragraph contains the training, initialization and performance functions.

functions:

adaptFcn: 'trains'

initFcn: 'initlay'

performFcn: 'mae'

trainFcn: 'trainc'

The trainFcn and adaptFcn are used for the two different learning types batch learning and incremental or on-line

learning. By setting the trainFcn parameter you tell Matlab which training algorithm should be used, which is in our case the cyclical order incremental training/learning function trainc. The ANN toolbox includes almost 20 training functions.

The performance function is the function that determines how well the ANN is doing its task. For a perceptron it is the mean absolute error performance function mae. For linear regression usually the mean squared error performance function mse is used. The initFcn is the function that initializes the weights and biases of the network. To get a list of the functions that are available type help nnet. To change one of these functions to another one in the toolbox or one that you have created, just assign the name of the function to the parameter, e.g.

>> net.trainFcn = 'mytrainingfun';

The parameters that concern these functions are listed in the next paragraph.



parameters:

adaptParam: .passes

initParam: (none)

performParam: (none)

trainParam: .epochs, .goal, .show, .time

By changing these parameters you can change the default behavior of the functions mentioned above. The parameters you

will use the most are probably the components of trainParam. The most used of these are net.trainParam.epochs which

tells the algorithm the maximum number of epochs to train, and net.trainParam.show that tells the algorithm how many

epochs there should be between each presentation of the performance. Type help train for more information.

The weights and biases are also stored in the network structure:

weight and bias values:

IW: {1x1 cell} containing 1 input weight matrix

LW: {1x1 cell} containing no layer weight matrices

b: {1x1 cell} containing 1 bias vector

The .IW(i,j) component is a two dimensional cell matrix that holds the weights of the connection between the input j and the network layer i. The .LW(i,j) component holds the weight matrix for the connection from the network layer j to the layer i. The cell array b contains the bias vector for each layer.



A Classification Task

Figure 1: Data set X projected to two dimensions.

As an example our task is to create and train a perceptron that correctly classifies point sets belonging to three different classes. First we load the data from the file winedata.mat

>> load winedata X C

Each row of X represents a sample point whose class is specified by the corresponding element (row) in C. Further the

data is transformed into the input/output format used by the Neural Network Toolbox

>> P=X';

where P(:,i) is the ith point. Since we want to classify three different classes we use 3 perceptrons, each for the

classification of one class. The corresponding target function is generated by

>> T=ind2vec(C);

To create the perceptron layer with correct input range type

>> net=newp(minmax(P),size(T,1));
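To make the data preparation above more concrete, the following plain-Python stand-ins mimic what ind2vec and minmax are used for here. They are illustrative sketches, not the toolbox implementations: the real ind2vec returns a sparse matrix, and the small data values below are hypothetical.

```python
def ind2vec(indices):
    """One column per sample; row c holds a 1 where the sample has class c (1-based)."""
    n_classes = max(indices)
    return [[1 if c == row + 1 else 0 for c in indices] for row in range(n_classes)]

def minmax(P):
    """Minimum and maximum of each row (input dimension) of P."""
    return [[min(row), max(row)] for row in P]

C = [1, 3, 2]                          # class labels of three samples
T = ind2vec(C)                         # 3 x 3 target matrix, one perceptron per row
P = [[1.0, 5.0, 3.0], [-2.0, 0.0, 4.0]]  # 2 input dimensions, 3 samples
print(T)
print(minmax(P))
```

The number of rows of T is the number of classes, which is why size(T,1) is passed to newp as the number of perceptrons.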



The difference between train and adapt

Both functions, train and adapt, are used for training a neural network, and most of the time both can be used for the same

network. The most important difference has to do with incremental training (updating the weights after the presentation of

each single training sample) versus batch training (updating the weights after each presentation of the complete data set).

Adapt

First, set net.adaptFcn to the desired adaptation function. We'll use trains (sequential order incremental training), which updates each weight and bias according to its own learning function. Again, check the Matlab documentation for a complete overview of possible update algorithms.

>> net.adaptFcn = 'trains';

Next, since we're using trains, we'll have to set the learning function for all weights and biases:

>> net.inputWeights{1,1}.learnFcn = 'learnp';

>> net.biases{1}.learnFcn = 'learnp';

where learnp is the Perceptron learning rule. Finally, a useful parameter is net.adaptParam.passes, which is the

maximum number of times the complete training set may be used for updating the network:

>> net.adaptParam.passes = 1;

When using adapt, both incremental and batch training can be used. Which one is actually used depends on the format

of your training set. If it consists of two matrices of input and target vectors, like

>> [net,y,e] = adapt(net,P,T);

the network will be updated using batch training. Note that all elements of the matrix y are one, because the weights are not updated until all of the training set has been presented.

If the training set is given in the form of a cell array

>> for i = 1:size(P,2), P2{i} = P(:,i); T2{i} = T(:,i); end

>> net = init(net);

>> [net,y2,e2] = adapt(net,P2,T2);

then incremental training will be used. Notice that the weights had to be initialized before the network adaption was started. Since adapt takes a lot more time than train we continue our analysis with the second algorithm.

Train

When using train on the other hand, only batch training will be used, regardless of the format of the data (you can use

both). The advantage of train is that it provides a lot more choice in training functions (gradient descent, gradient descent w/ momentum, Levenberg-Marquardt, etc.) which are implemented very efficiently. So for static networks (no tapped

delay lines) usually train is the better choice.

We set



>> net.trainFcn = 'trainb';

for batch learning and

>> net.trainFcn = 'trainc';

for on-line learning. Which training parameters are present depends in general on your choice for the training function.

In our case two useful parameters are net.trainParam.epochs, which is the maximum number of times the complete data set may be used for training, and net.trainParam.show, which is the number of epochs between status reports of the training function. For example,

>> net.trainParam.epochs = 1000;

>> net.trainParam.show = 100;

We initialize and train the network with

>> net = init(net);

>> [net,tr] = train(net,P,T);

The training error is calculated with

>> Y=sim(net,P);

>> train_error=mae(Y-T)

train_error =

0.3801
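For readers without the toolbox, the mean absolute error used here can be sketched in plain Python. The matrices Y and T below are small hypothetical examples with three classes (rows) and two samples (columns). They are not the wine data.

```python
def mae(errors):
    """Mean of the absolute values of all entries of an error matrix (list of rows)."""
    values = [abs(e) for row in errors for e in row]
    return sum(values) / len(values)

# Hypothetical network outputs Y and targets T.
Y = [[1, 0], [0, 0], [0, 1]]
T = [[1, 0], [0, 1], [0, 0]]
errors = [[y - t for y, t in zip(y_row, t_row)] for y_row, t_row in zip(Y, T)]
train_error = mae(errors)
print(train_error)
```

Here two of the six entries are wrong, so the error is one third. A value of 0 would mean that every output matches its target.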

So we see that the three classes of the data set were not linearly separable. The best time to stop learning would have

been

>> [min_perf,min_epoch]=min(tr.perf)

min_perf =

0.1948

min_epoch =

703



Figure 2: Performance of the learning algorithm train over 1000 epochs.

A Simple logical problem

The task is to create and train a neural network that solves the XOR problem. XOR is a function that returns 1 when the

two inputs are not equal, and 0 otherwise.

Construct a Feed-Forward Network

To solve this we will need a feedforward neural network with two input neurons, and one output neuron. Because the problem is not linearly separable it will also need a hidden layer with two neurons. To create a new feed forward neural

network use the command newff. You have to enter the max and min of the input values, the number of neurons in each



layer and optionally the activation functions.

>> net = newff([0 1; 0 1],[2 1],{'logsig','logsig'});

The variable net will now contain an untrained feedforward neural network with two neurons in the input layer, two

neurons in the hidden layer and one output neuron, exactly as we want it. The [0 1; 0 1] tells Matlab that the input values range between 0 and 1. The 'logsig','logsig' tells Matlab that we want to use the logsig function as activation

function in all layers. The first parameter tells the network how many nodes there should be in the input layer, hence

you do not have to specify this in the second parameter. You have to specify at least as many transfer functions as there

are layers, not counting the input layer. If you do not specify any transfer function Matlab will use the default settings.

First we construct a matrix of the inputs. The input to the network is always in the columns of the matrix. To create a

matrix with the inputs "1 1", "1 0", "0 1" and "0 0" we enter:

>> input = [1 1 0 0; 1 0 1 0]

input =

1 1 0 0

1 0 1 0

Further we construct the target vector:

>> target = [0 1 1 0]

target =

0 1 1 0

Train the Network via Backpropagation

In this example we do not need all the information that the training algorithm shows, so we turn it off by entering:

>> net.trainParam.show=NaN;

Let us apply the default training algorithm Levenberg-Marquardt backpropagation trainlm to our network. An additional training parameter is .min_grad. If the gradient of the performance is less than .min_grad the training is ended. To train

the network enter:

>> net = train(net,input,target);

Because of the small size of the network, the training is done in only a second or two. Now we simulate the network, to

see how it reacts to the inputs:

>> output = sim(net,input)

output =

0.0000 1.0000 1.0000 0.0000

That was exactly what we wanted the network to output! Now examine the weights that the training algorithm has set

>> net.IW{1,1}

ans =

11.0358 -9.5595



16.8909 -17.5570

>> net.LW{2,1}

ans =

25.9797 -25.7624

Graphical User Interface

A graphical user interface has been added to the toolbox. This interface allows you to:

Create networks
Enter data into the GUI
Initialize, train, and simulate networks
Export the training results from the GUI to the command line workspace
Import data from the command line workspace to the GUI

To open the Network/Data Manager window type nntool.

References:

1- Brief Discussion on Inductive/Deductive Profiling (http://www.investigativepsych.com/inductive.htm)

2- An introduction to support vector machines and other kernel based methods

3- Introduction to Knowledge based intelligent system, Negnevitsky, Pearson Education, 2002

4- Wiki

5- http://cse.stanford.edu/class/sophomore-college/projects-00/neural-networks/Neuron/index.html
