neural networks. pattern recognition n humans are very good at recognition. it is easy for us to...

Neural Networks

Pattern Recognition

Humans are very good at recognition. It is easy for us to identify the Dalmatian dog in the image.

This recognition capability would be very difficult to implement in a program.

Biological Neurons

The human body is made up of trillions of cells. Cells of the nervous system, called nerve cells or neurons, are specialized to carry "messages" through an electrochemical process. The human brain has approximately 100 billion neurons.

http://faculty.washington.edu/chudler/cells.html

From brain to Neurons

Nerves Flash

http://www.learner.org/channel/courses/biology/video/hires/a_neuro1.c.synapse.mov

http://www.onintelligence.org/forum/viewtopic.php?t=173&sid=b0e0b92b35f74c1cdc21adbce6302b60

Communications between Neurons

A Tour of our Neural Circuit

Neurons come in many different shapes and sizes.

Some of the smallest neurons have cell bodies that are only 4 microns wide. Some of the biggest neurons have cell bodies that are 100 microns wide.

1 micron is equal to one thousandth of a millimeter!

http://actu.epfl.ch

ANN

Although neural networks are the natural form of information processing mechanism, each cell has very little processing power. They just accumulate information and pass it on.

The human body has in the order of 1010 neurons with 10100 connections between them. Their processing “cycle time” is in the order of 1 millisecond. Their power comes from the extent of the network and the fact that they are all operating in parallel.

In computer terms we can think of 10 billion simple CPUs processing 10 billion times 10 billion variables once every millisecond.

Modern computers and modern ANNs do not even begin to approach this level of complexity.

Blue Brain project

A detailed, functional artificial human brain can be built within the next 10 years, a leading scientist has claimed.

The Blue Brain project at Switzerland's EPFL (École Polytechnique Fédérale de Lausanne) was launched in 2005 and aims to reverse engineer the mammalian brain from laboratory data.

To make the model come alive, the team feeds the models and a few algorithms into a supercomputer.

They can show the brain a picture - say, of a flower - and follow the electrical activity in the machine.

http://bluebrain.epfl.ch/

http://bluebrain.epfl.ch/page-59952-en.html

Artificial Neural Networks

adaptive sets of interconnected simple biologically-inspired units which operate in some parallel and distributed mode to perform some common global task

Connectionism, PDP networks, Neural Computing, Empirical Learning Systems...

Neural nets are quantitative, numerical and don't require a knowledge engineer to extract expert information

Neural networks are inductive programs; they take in a great amount of information all at once and then draw a conclusion.

Artificial Neural Networks

NN Features

• Learning ability
• Inherent parallelism
• Distributed mode of operation
• Simplicity of units’ behavior
• Absence of centralized control

Components borrowed from the biological neuron

The computational architecture borrowed several components and functionalities from the biological neuron:

• Soma – cell body
• Axon – output link
• Dendrites – input link
• Synaptic Junction/Synapse – connects the axons of one neuron to various parts of other neurons
• Neurotransmitters – chemicals/substances released by the presynaptic cells to communicate with other neurons

A neuron could receive excitatory/inhibitory nerve impulses. Nerve impulses through these connecting neurons can result in local changes in the potential in the cell body of the receiving neuron.

– Excitatory – decreasing the polarization of the cell
– Inhibitory – increasing the polarization of the cell

The artificial neuron

[Figure: neuron j receives inputs x_i over connection weights w_ij and produces the output (activation) o_j = f(net_j).]

$$net_j = \sum_{i=1}^{n} x_i w_{ij}$$

Activation/Squashing/Transfer Function

$$o_j = f(net_j)$$

where

$$f(net_j) = \frac{1}{1 + \exp(-net_j)}$$

Activation Functions

Logistic function

Hyperbolic tangent, etc.

Neural Network Models

• Perceptron
• Hopfield Networks
• Bi-Directional Associative Memory
• Self-Organizing Maps
• Neocognitron
• Adaptive Resonance Theory
• Boltzmann Machine
• Radial Basis Function Networks
• Cascade-Correlation Networks
• Reduced-Coulomb Energy Networks
• Multi-layer Feed-forward Network

Various NN Architectures

Learning

Supervised – requires input-output pairs for training

Unsupervised – only inputs are given; it is able to organize itself in response to external stimuli

A Simple Kohonen Network

[Figure: input nodes carrying the input vector are connected through weight vectors to a 4x4 lattice of nodes.]

Neural Network Architecture with Unsupervised Learning

Input: 3D , Output: 2D Vector quantisation

Unsupervised learning

Reduces dimensionality of information

Clustering of data

Topological relationship between data is maintained

SOM for Color Clustering

Character Recognition: Pattern Classification Network

[Figure: feed-forward network with 100 input nodes, 16 hidden nodes and 5 output nodes.]

C:\NN\FFPR

Multi-layer Feed-forward Network

What are the components of a network?

How does a network respond to a stimulus?

How does a network learn or train?

Neural Network Architecture

[Figure: a feed-forward network drawn in four layers (Layer 1 to Layer 4) with input nodes, hidden nodes and an output node; each connection carries a weight.]

Multi-layer Feed-forward Network

e.g. temperature, pressure, color, age, valve status, etc.

Three-layered Network (2-1-1) for solving the XOR Problem

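Multi-layer Feed-forward Network Sample

[Figure: the 2-1-1 XOR network with input layer values x = 1 and y = 0. Hidden layer: unit h with activation 0.98. Output layer: unit z with activation 0.91. Weights: x to h 7.1, y to h 7.1, bias to h -2.76, h to z 10.9, x to z -4.95, y to z -4.95, bias to z -3.29. Both bias units have the value 1.]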

circles represent neurons or units or nodes that are extremely simple analog computing devices

numbers within the circles represent the activation values of the units

there are three layers, the input layer that contains the values for x and y, a hidden layer that contains one node h, and an output unit that gives the value of the output value, z



Components of a Network


There are two other units present called bias units whose values are always 1.0

The lines connecting the circles represent weights and the number beside a weight is the value of the weight

Much of the time back propagation networks only have connections within adjacent layers; however, this one has two extra connections that go directly from the input units to the output unit.

In some problems, like XOR, these extra input-output connections make training the network much faster.


Networks are usually just described by the number of units in each layer, so the network in the figure can be described as a 2-1-1 network with extra input-output connections, or 2-1-1-x.

To compute the value of the output unit, z, we place values for x and y on the input layer units, say x = 1.0 and y = 0.0, then propagate the signals up to the next succeeding layer.


For the hidden node h, find all lower level units connected to it. For each of the connections, multiply the weight attached to the link by the value of the unit and sum them all up. With x = 1.0, y = 0.0 and the bias unit at 1.0, this gives 1.0 × 7.1 + 0.0 × 7.1 + 1.0 × (-2.76) = 4.34.

Evaluating a Network

In some neural networks we might just leave the activation value of the unit to be 4.34. In this case we would say that we are using the linear activation function; however, backprop is at its best when this value is passed to certain types of nonlinear functions.


The most commonly used nonlinear function is:

Standard Sigmoid:

$$v = \frac{1}{1 + e^{-s}}$$

where s is the sum of the inputs to the neuron and v is the value of the neuron. Thus, with s = 4.34, v = 0.987.

Of course, 0.91 is not quite 1 but for this example it is close enough. When using this particular activation function for a problem where the output is supposed to be a 0 or 1, getting the output to within 0.1 of the target value is a very common standard.

With this particular activation function it is actually somewhat hard to get very close to 1 or 0 because the function only approaches 1 and 0 as the input to the function approaches ∞ and -∞.

The other values the network computes for the xor function are:



The formulas for computing the activation value for a neuron, j, can be written more concisely as follows:

Let the weight between neuron j and neuron i be w_ij. Let the net input to neuron j be net_j.

Let the activation value for neuron j be o_j. Let the activation function be the general function, f, then

$$o_j = f(net_j), \qquad net_j = \sum_{i=1}^{n} w_{ij}\, o_i$$

where n is the number of units feeding into unit j (for an input unit, o_i is simply the input value x_i) and

$$f(net_j) = \frac{1}{1 + \exp(-net_j)}$$


Computing the activation value for a Neuron

Now, let’s look more closely to see how a network is trained

Iterative minimization of error over training set

1. Put one of the training patterns to be learned on the input units.
2. Find the values for the hidden unit and output unit.
3. Find out how large the error is on the output unit.
4. Use one of the back-propagation formulas to adjust the weights leading into the output unit.
5. Use another formula to find out errors for the hidden layer unit.
6. Adjust the weights leading into the hidden layer unit via another formula.
7. Repeat steps 1 thru 6 for the second, third patterns,…

Backpropagation Training

We will now look at the formulas for adjusting the weights that lead into the output units of a back propagation network. The actual activation value of an output unit, k, will be o_k and the target for unit, k, will be t_k. First of all there is a term in the formula for δ_k, the error signal:

$$\delta_k = (t_k - o_k)\, f'(net_k)$$

where f’ is the derivative of the activation function, f. If we use the usual activation function:

$$f(net_k) = \frac{1}{1 + \exp(-net_k)}$$

the derivative term is:

$$f'(net_k) = o_k (1 - o_k)$$


The formula to change the weight, w_jk, between the output unit, k, and unit j is:

$$w_{jk} \leftarrow w_{jk} + \eta\, \delta_k\, o_j$$

where η is some relatively small positive constant called the learning rate. With the network given, assuming that all weights start with zero values, and with η = 0.1 we have:

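[Figure: the same network with all seven weights set to 0.0; the hidden unit h and the output unit z both have activation 0.5, and both bias units have the value 1.]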

The k subscript is for all the units in the output layer however in this example there is only one unit. In the example, then:




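The formula for computing the error δ_j for a hidden unit, j, is:

$$\delta_j = f'(net_j) \sum_k \delta_k\, w_{jk}$$

The weight change formula for a weight, w_ij, that goes between the hidden unit, j, and the input unit, i, is essentially the same as before, with the same learning rate η:

$$w_{ij} \leftarrow w_{ij} + \eta\, \delta_j\, o_i$$

The new weights will be: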

[Figure: the network after the weight update.]

The activation value for the output layer will now be 0.507031. If we now do the same for the other three patterns the output will be:



Sad to say but to get the outputs to within 0.1 requires 20,862 iterations, a very long time especially for such a short problem. Fortunately there are a large number of things that can be done to speed up the training and the time to do the XOR problem can be reduced to around 12-20 iterations or so. The very simplest thing to do is to increase the learning rate, η. The following table shows how many iterations are used for different values of η.

Another unfortunate problem with backprop is that when the learning rate is too large the training can fail, as it did in the case when η = 3.0. Here, after 10,000 iterations the results were:



where the output for the last pattern is 1 not 0. The geometric interpretation of this problem is that when the network tries to make the error go down the network may get stuck in a valley that is not the lowest possible valley.

When backprop starts at point A and tries to minimize the error, you hope the process will stop when it hits the low point at B however you could get unlucky and hit the not so low point at C. The low point is a global minimum and the not so low point is a local minimum.

Training Neural Nets

Given: Data set, desired outputs and a Neural Net with m weights.

Find a setting for the weights that will give good predictive performance on new data. Estimate expected performance on new data.

1. Split data set (randomly) into three subsets:
   Training set – used for picking weights
   Validation set – used to stop training
   Test set (if possible) – used to evaluate performance
2. Pick random, small weights as initial values.
3. Perform iterative minimization of error over training set.
4. Stop when error on validation set reaches a minimum (to avoid overfitting).
5. Repeat training (from Step 2) several times (to avoid local minima).
6. Use best weights to compute error on test set, which is the estimate of performance on new data. Do not repeat training to improve this.

Notes on Training

• Pick a set of random small weights as the initial values of the weights. This reduces the chance of saturating any of the units initially.

• We do not want to simply keep going until we reduce the training error to its minimum value. This is likely to overfit the training data.

• Stop when we get the best performance on the validation set.

Keeping the weights small is a strategy for reducing the size of the hypothesis space.

It also reduces the variance of the hypothesis since it limits the impact that any particular data point can have on the output.

Cross Validation

To evaluate performance of an algorithm as a whole (rather than a particular hypothesis)

Divide data into k subsets

k different times

- Train on k-1 of the subsets

- Test on the held-out subset

Return average test score over all k tests

Useful for deciding which class of algorithms to use on a particular data set.
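One way to organise the k train/test rounds is to compute, for each round, which index range is held out. The C sketch below assumes the data has already been shuffled and is split into k consecutive, nearly equal subsets; the function name `kfold_test_range` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end) of the held-out subset `fold` when n
   samples are divided into k nearly equal consecutive subsets. All samples
   outside the range form the k-1 training subsets for that round. */
void kfold_test_range(size_t n, size_t k, size_t fold,
                      size_t *begin, size_t *end) {
    assert(k > 0 && k <= n && fold < k);
    assert(begin != NULL && end != NULL);
    size_t base = n / k;   /* minimum subset size */
    size_t extra = n % k;  /* the first `extra` subsets get one more sample */
    *begin = fold * base + (fold < extra ? fold : extra);
    *end = *begin + base + (fold < extra ? 1 : 0);
}
```

Calling it for fold = 0 .. k-1, training on everything outside the returned range and averaging the k test scores gives the cross-validation estimate described above.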

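[Figure: notation. Input units i with outputs O_i, hidden units j with outputs O_j and output units k with outputs O_k; weights W_ij (input to hidden), W_jk (hidden to output) and W_ik (input to output).]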


A summary of all the formulas can be viewed in Backprop Formulas.

Duration of training

1 training cycle = (feedforward propagation + retropropagation)

Each training cycle is repeated for each training pattern (e.g. aggtccattacgctatatgcgacttc)

1 Epoch = all training patterns have been subjected to one training cycle each

Neural Network training usually takes many training cycles (until Sum of Squared Errors is at an acceptable level)

(NOTE: Sum of Squared Errors is used to gauge the accuracy of the constructed Neural Network)

Sum of Squared Errors

This error is minimised during training.

$$E = \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{m} (T_k - O_k)^2$$

where
T_k is the target output;
O_k is the actual output of the network;
m is the total number of output units;
n is the number of patterns in the validation set;
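The sum of squared errors translates directly into a double loop, flattened here into one pass over all n × m values. This is a minimal sketch; the function name and the row-major array layout are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* E = 1/2 * sum over n patterns and m output units of (T_k - O_k)^2.
   target and output each hold n * m values, one row of m outputs per
   pattern (layout assumed for this sketch). */
double sum_squared_error(const double *target, const double *output,
                         size_t n, size_t m) {
    assert(target != NULL && output != NULL);
    double e = 0.0;
    for (size_t i = 0; i < n * m; ++i) {
        double diff = target[i] - output[i];
        e += diff * diff;
    }
    return 0.5 * e;
}
```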

Root mean squared error

n is the number of patterns in the validation set
m is the number of components in the output vector
o is the output of a single neuron j
t is the target for the single neuron j
i denotes the input pattern

Mean squared error


Mean absolute error


Pattern Classification Error Metric

SOURCE: Artificial neural networks for civil engineers: fundamentals and applications By Nabil Kartam, Ian Flood, James H. Garrett, American Society of Civil Engineers. Expert Systems and Artificial Intelligence Committee

Notes on the General Error Metrics

For analog or continuous output targets, MAE or RMSE is desirable. MSE is essentially analogous to RMSE.

Simulation

Let’s see a working model of our Neural Network

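Exercise

[Figure: exercise network with an input layer (x and y, shown with the values 1 and 0), a hidden layer (h) and an output layer (z), plus bias units bh and bz with the value 1. All weights are 0.1: Whx = 0.1, Why = 0.1, Whbh = 0.1, Wzx = 0.1, Wzy = 0.1, Wzh = 0.1, Wzbz = 0.1.]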

Design Issues

Number of Nodes

Connection

Learning Paradigm

Adjustment of weights

Design Issues:

Input/Output Nodes – easy to determine

Hidden Nodes?

Number of Hidden Layers?

Bias Units?

Error Signal Formulas


1. Standard Backpropagation ei = [di - ai] * [( 1 - ai ) * ai ]

2. Cross entropy error formula ei = log2 * [(di / ai) – ((1 - di )/(1 - ai))] * [( 1 - ai ) * ai ]

3. Normalized Exponential Error Formula ei = [a i - di]

where ai is the actual output signal di is the desired output signal

ei - error signal propagating back from each output unit

Multi-Layer Feed-Forward Neural Network

Why do we need BIAS UNITS?

Apart from improving the speed of learning for some problems (XOR problem), bias units or threshold nodes are required for universal approximation. Without them, the feedforward network always assigns 0 output to 0 input. Without thresholds, it would be impossible to approximate functions which assign nonzero output to zero input. Threshold nodes are needed in much the same way that the constant polynomial ‘1’ is required for approximation by polynomials.


Sigmoid Unit

Data Sets

• Split data set (randomly) into three subsets:

1. Training set – used for picking weights

2. Validation set – used to stop training

3. Test set – used to evaluate performance

2 inputs, 1 output unit, 2 hidden units

Training without a validation set

• Choose the number of units and the stopping point (stop training before going too far astray) based on the network’s performance on validation data

Notes on Training

Input Representation

• All the signals in a neural net are in [0, 1]. Input values should also be scaled to this range (or approximately so) so as to speed up training.

• If the input values are discrete, e.g. {A,B,C,D} or {1,2,3,4}, they need to be coded in unary form.
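Both input conventions are easy to sketch in C: linear scaling of a continuous value into [0, 1], and unary coding of a discrete value as one input per category. The function names below are invented for this example, and the scaling range [lo, hi] is assumed to be known from the data.

```c
#include <assert.h>
#include <stddef.h>

/* Linearly rescales v from the range [lo, hi] to [0, 1]. */
double scale_to_unit(double v, double lo, double hi) {
    assert(hi > lo);
    return (v - lo) / (hi - lo);
}

/* Unary coding: writes 1.0 at position `category` and 0.0 elsewhere,
   so {A,B,C,D} becomes four separate network inputs. */
void unary_encode(size_t category, size_t n_categories, double *out) {
    assert(out != NULL && category < n_categories);
    for (size_t i = 0; i < n_categories; ++i) {
        out[i] = (i == category) ? 1.0 : 0.0;
    }
}
```

For instance, category C out of {A,B,C,D} (index 2) would be presented to the network as 0, 0, 1, 0.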

Output Representation

• A neural net with a single sigmoid output is aimed at binary classification.

Class is 0 if y < 0.5 and 1 otherwise.

• For multi-class problems
  • Can use one output per class (unary encoding)
  • There may be confusing outputs (two outputs > 0.5 in unary encoding)
  • More sophisticated method is to use special softmax units, which force outputs to sum to 1.

Target Value

• During training it is impossible for the outputs to reach 0 or 1 (with finite weights)

• Customary to use 0.1 and 0.9 as targets

• But, most termination criteria, e.g. small change in training or validation error will stop training before targets are reached.

Parameters

Parameter | Models | Function
Learning rate | All | Controls the step size for weight adjustments. Decreases over time for some types of NN.
Momentum | Back propagation | Smooths out the effect of weight adjustments over time.
Error tolerance | Back propagation | Specifies how close the output value must be to the desired value before the error is considered
Activation function | All | The function used at each neural processing unit to generate the output signal from the weighted sum of inputs. Most common is the sigmoid function.

Character Recognition

Pattern Classification Network

[Figure: feed-forward network with 100 input nodes, 16 hidden nodes and 5 output nodes.]

C:\NN\FFPR

Training the Network

Tips for Training

Training Tip #1

Start with a relatively large error tolerance, and incrementally lower it to the desired level as training is achieved at each succeeding level. This usually results in fewer training iterations than starting out with the desired final error tolerance.

Training Tip #2

If the network fails to train at a certain error tolerance, try successively lowering the learning rate.

Character Recognition: Learning Curve for Training the Pattern Classification Network

The chart depicts the “learning curve”, as the error tolerance was successively lowered through the values 0.01, 0.005, 0.0025, 0.001, 0.0005, 0.00025, and finally 0.0001.

The vertical axis shows the cumulative iterations needed to achieve the perfect “5 out of 5” training performance at each level of error tolerance.

[Figure: The Learning Curve for Training the Pattern Classification Network. Horizontal axis: Error Tolerance (0 to 0.01). Vertical axis: Cumulative Iterations (0 to 1200).]

Training the Network

Tips for Training

In a system to be used for a real-world application, such as character recognition, you would want the network to be able to handle not only pixel noise, but also size variance (slightly smaller letters), some rotation, as well as some variations in font style.

Training Tip #3

To produce such a robust network classifier, you would need to add representative samples to the training set. For example, samples of rotated versions of a character could be added to the training set, etc.

The Momentum Term

Smooth out the effect of weight adjustments over time.

Momentum term can be disabled by setting it to zero.

In general, the formula is:

Weight Change = learning rate * input * error output + momentum_parameter * previous_weight_change

More accurately,

$$W_{new} = W_{old} + (\eta \cdot (1 - \alpha) \cdot \Delta W) + (\alpha \cdot \Delta W_{old})$$

where η is the learning rate and α is the momentum parameter.

Warning! Setting the momentum term and learning rate too large can overshoot a good minimum as it takes large steps!
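The general weight-change rule (learning rate × input × error, plus momentum × previous change) can be sketched as a small C helper. The function name and the choice to return the change, so the caller can feed it back in on the next step, are assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

/* Weight change = learning_rate * input * error
                   + momentum * previous_change.
   Adds the change to *weight and returns it so that it can be passed back
   as previous_change on the next update. A momentum of 0 disables the term. */
double momentum_update(double *weight, double input, double error,
                       double learning_rate, double momentum,
                       double previous_change) {
    assert(weight != NULL);
    double change = learning_rate * input * error + momentum * previous_change;
    *weight += change;
    return change;
}
```

When successive error signals point the same way the steps grow, which is also why a large momentum combined with a large learning rate can overshoot a good minimum.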

Using Neural Networks in a Control Problem

Inverted Pendulum Problem

See FFBRM.EXE

Input: x, v, theta, angular velocity

Output: Angle

[Figure: network with 5 input nodes, 20 hidden nodes and 1 output node.]

Using Neural Networks in a Control Problem

Inverted Pendulum Problem (Dynamics of the Problem)

See FFBRM.EXE

Input: x, v, theta, angular velocity

Output: Angle

Formulas:

$$\theta''(t) = \frac{m g \sin\theta(t) - \cos\theta(t)\,[F(t) + m_b l\,\theta'(t)^2 \sin\theta(t)]}{(4/3)\, m l - m_b l \cos^2\theta(t)}$$

$$x''(t) = \frac{F(t) + m_b l\,[\theta'(t)^2 \sin\theta(t) - \theta''(t) \cos\theta(t)]}{m}$$

θ(t) = broom angle (with vertical) at time t (in radians)

θ'(t) = angular velocity

x(t) = cart position at time t (in meters)

x’(t) = cart velocity

F(t) = force applied to cart at time t (Newtons)

m = combined mass of cart and broom (1.1 kg)

mb = mass of broom (0.1 kg)

l = length of broom (pivot point to center of mass, 0.5 meters)

g = acceleration due to gravity

The equations we’ve seen ignore the effects of friction. Control system failure occurs when the cart hits either end of the track (at x = ±2.4 meters), or the angle θ reaches ±π radians (±180 degrees).

For practical purposes, though, unrecoverable failure occurs when the angle θ reaches ±12 degrees in magnitude, so the Neural Network training will be restricted to this range.

State of the Cart-Broom System:

x, x’, theta θ, angular velocity θ-dot

Inverted Pendulum Problem (Limits of Network Training)


SIMULATION ISSUES

How can we simulate the complete behaviour of the system now that we know how to derive all the state variables describing the system? What is the cart-broom state given any time t?

State of the Cart-Broom System:

x, x’, theta θ, angular velocity θ-dot

Euler’s Method

Euler’s method relates a variable and its derivative via the simple approximation equation:

$$x(t + h) = x(t) + h\, x'(t)$$

Euler’s method has only its simplicity to recommend it, and is normally not used when any amount of accuracy is required of a numerical solution. However, it is adequate for our needs.
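Before applying it to the cart-broom equations, the Euler update can be tried on a much simpler system. The C sketch below integrates motion under a constant acceleration, a toy problem chosen only for this illustration (it is not part of the slides), and shows the accuracy limitation: after one simulated second with a = 2 and h = 0.02 the velocity is right, but the position comes out as 0.98 instead of the exact 1.0.

```c
#include <assert.h>
#include <stddef.h>

/* One Euler step: x(t + h) = x(t) + h * x'(t). */
double euler_step(double x, double x_dot, double h) {
    assert(h > 0.0);
    return x + h * x_dot;
}

/* Integrates x' = v, v' = a (constant acceleration a) for `steps` steps
   of size h. Both updates use the values from the start of the step. */
void euler_constant_accel(double *x, double *v, double a, double h, int steps) {
    assert(x != NULL && v != NULL && steps >= 0);
    for (int i = 0; i < steps; ++i) {
        double x_next = euler_step(*x, *v, h);
        double v_next = euler_step(*v, a, h);
        *x = x_next;
        *v = v_next;
    }
}
```

A smaller step size h reduces the error at the cost of more iterations.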


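SIMULATION ISSUES

Euler’s Method

Here’s an application of Euler’s method

#include <math.h>

/* Physical constants: m, mb and l are the values given above; g is an assumed value. */
static const float m = 1.1f;    /* combined mass of cart and broom (kg) */
static const float mb = 0.1f;   /* mass of broom (kg) */
static const float l = 0.5f;    /* pivot point to center of mass (m) */
static const float g = 9.8f;    /* acceleration due to gravity (m/s^2), assumed */
static const float four_thirds = 4.0f / 3.0f;

/* Angular acceleration theta''(t) from the force, the angle th1 and the angular velocity th2. */
float f_theta (float frce, float th1, float th2) {
    float denom, numer, cost, sint;
    cost = cos (th1);
    sint = sin (th1);
    denom = four_thirds * m * l - mb * l * cost * cost; /* Always > 0 */
    numer = m * g * sint - cost * (frce + mb * l * th2 * th2 * sint);
    return numer / denom;
}

/* Cart acceleration x''(t); th3 is the angular acceleration theta''(t). */
float f_x (float frce, float th1, float th2, float th3) {
    float cost, sint, term;
    cost = cos (th1);
    sint = sin (th1);
    term = mb * l * (th2 * th2 * sint - th3 * cost);
    return (frce + term) / m;
}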



/* State of the cart-broom system. The field layout is assumed from its use below. */
typedef struct {
    float theta;      /* broom angle (radians) */
    float theta_dot;  /* angular velocity */
    float x_pos;      /* cart position (meters) */
    float x_dot;      /* cart velocity */
} state_rec;

void new_broom_state (float frce, state_rec old_state, state_rec *new_state) {
    const float hh = 0.02f; /* time step in seconds */
    float th3;
    /* Euler's method applied to system of equations */
    /* (not known for accuracy, but good enough here !) */
    new_state->theta = old_state.theta + hh * old_state.theta_dot;
    th3 = f_theta (frce, old_state.theta, old_state.theta_dot);
    new_state->theta_dot = old_state.theta_dot + hh * th3;
    new_state->x_pos = old_state.x_pos + hh * old_state.x_dot;
    new_state->x_dot = old_state.x_dot + hh * f_x (frce, old_state.theta, old_state.theta_dot, th3);
    return;
}


Feed-Forward Controller System

Analogy to a Human Controller

Suppose that a human controller has no idea how the cart-broom system is going to respond to input forces. By randomly applying a force and observing whether or not that force helps in balancing the broom, eventually, the controller may notice that pushing the cart in the same direction that the broom is leaning will slide the cart back under the broom and tends to restore a vertical angle.

With enough repetitions, the controller learns how much force to supply and how often. An expert controller can anticipate which way the broom is going to fall and can apply a corrective force before the broom goes far in that direction.

How can we teach the Network to learn broom balancing?


Feed-Forward Controller System

Training the Network

The approach in teaching the network is similar to the process of teaching a human controller.

The networks will learn the dynamics of the cart-broom system by observing numerous repetitions of random-force applications at different broom states.

The trained network will then be a model of the cart-broom dynamics. Like the human expert, the network can predict the next broom state, and with this knowledge the correct force can be applied.

Run the cart-broom system through 100 random force applications and collect max-min data.

Collection of Training Data

All input data should be normalized. Each parameter will have its own [min, max] values.


Feed-Forward Controller System (BANG-BANG Control)

Running the Network

The controller operates by looking ahead, like an experienced controller would. The trained network emulates the broom dynamics, so that the controller can ask: “What will happen if I do nothing (zero force)?”

The trained network answers this question by supplying the broom angle that would result on the next iteration if zero force were applied. Once the angle is predicted, the appropriate action can be taken.

Normalisation of Input Parameters

1. From the training data, calculate max, min values of each input parameter: [min, max] for each variable (x, x', θ, θ')

2. xFactor

$$xFactor = \frac{1}{x_{max} - x_{min}}$$

3. xDotFactor

$$delta = xDot_{max} - xDot_{min}$$

if (delta < 0.000001) {error!}

$$xDotFactor = \frac{1}{delta}$$

4. thetaFactor

$$thetaFactor = \frac{1}{theta_{max} - theta_{min}}$$

5. thetaDotFactor

$$delta = thetaDot_{max} - thetaDot_{min}$$

if (delta < 0.000001) {error!}

$$thetaDotFactor = \frac{1}{delta}$$

6. forceFactor

$$forceFactor = \frac{1}{2 \cdot forceMagnitude}$$

Normalisation of the x position input:

$$xPosNormalised = (xpos - x_{min}) \cdot xFactor \cdot normalisedInputRange + normalisedInputMin$$

Calculating the actual state

Given: Normalised x position input, calculate the actual state

$$xPos = xPos_{normalised} \cdot (xPos_{max} - xPos_{min}) + xPos_{min}$$
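The normalisation and its inverse can be sketched as a pair of C functions. The names are invented for this example, the degenerate-range guard mirrors the delta check in the steps above (with the comparison assumed to be "less than"), and the inverse assumes the value was normalised to the range [0, 1].

```c
#include <assert.h>

/* Maps value from [min, max] into [out_min, out_min + out_range], following
   (value - min) * factor * normalisedInputRange + normalisedInputMin. */
double normalise(double value, double min, double max,
                 double out_min, double out_range) {
    double delta = max - min;
    assert(delta >= 0.000001);  /* guard against a degenerate range */
    double factor = 1.0 / delta;
    return (value - min) * factor * out_range + out_min;
}

/* Inverse mapping for a value that was normalised to [0, 1]. */
double denormalise(double normalised, double min, double max) {
    return normalised * (max - min) + min;
}
```

For the cart position, which is limited to ±2.4 meters, the track centre x = 0 maps to 0.5 and back again.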

Running the FF-Network Controller (Bang-bang)

outAngle = (networkOutput - 0.5) * 2 * thetaFactor;

if (fabs(outAngle) < zeroTheta) { return 0; }
if (outAngle > zeroTheta) { return systemForceIncrement; }
return -systemForceIncrement;


Slight variation to the Inverted Pendulum Problem

Input: x, v, theta, angular velocity, random Force Output: Angle

You could check all three possibilities with the look-ahead network: zero, positive and negative force. Then, you could pick the force input that resulted in the smallest output angle.

[Figure: network with 5 input nodes, 20 hidden nodes and 1 output node.]

This is more computationally expensive, and in tests it did not do any better than the “zero” look ahead strategy.


Alternative Solution to the Inverted Pendulum Problem

Input: x, v, theta, angular velocity, random Force

Output: Force, direction

[Figure: network with 4 input nodes, 16 (4*4) hidden nodes and 2 output nodes (force magnitude and direction).]

Noise factor could be added during training (e.g. 0.05) to make the network more robust.

Advantages of Neural Nets

• Learning is inherent
• Uses examples for training
• Broad response capability
• Very powerful in handling noise and uncertainty
• Once the network is trained, execution is very fast.
• Easy to manage and maintain

Limitations of Neural Nets

• Functions like a black box (no explanation facility): Input → ? → Output
• Learns solely from examples
• Paucity of available examples could deter accuracy of learning

Numerous Applications

• Character recognition
• Detection of stock market patterns that presage interesting moves
• Classification of objects that cause SONAR returns
• Classifications of bioelectric signals (EKG & EEG) into normal or pathological conditions
• Classification of lesions based on photomicrographs

Let’s see the NN solving the character recognition problem

Some observations (2004, MIT)

Although Neural Nets kicked off the current phase of interest in machine learning, they are extremely problematic:
– Too many parameters (weights, learning rate, momentum, etc.)
– Hard to choose the architecture
– Very slow to train
– Easy to get stuck in local minima

Interest has shifted to other methods, such as support vector machines, which can be viewed as variants of perceptrons (with a twist or two).

HOAP-2 is a fast learner because of how it was designed. HOAP, or Humanoid for Open Architecture Platform, represents a fundamentally different approach to creating humanoid robots. Instead of using the popular model-based approach to robot motion control, it harnesses the power of a neural network, processors that emulate the human brain and how it learns, to tackle movements and other tasks.

This dynamically reconfigurable neural network, the first of its kind developed by Fujitsu for humanoid robots, speeds up and simplifies the huge computational task of motion generation. The neural network can also be expanded with little effort and requires minimal software to run.

Feb 4th, 2005

Fujitsu Robot Project: HOAP Series

http://www.fujitsu.com/nz/about/rd/200506hoap-series.html

Activation Functions

[Figure: Sigmoid, Tanh Non-linearities. Curves for Sigmoid, Tanh and Linear plotted for inputs from -5 to 5, with outputs from -1 to 1.]

The sigmoid function varies between zero and one. This is inconvenient in calculations that need negative values, which it cannot produce. It is more convenient to use the tanh() function, which has the same S shape but swings between -1 and +1.
