TRANSCRIPT
Human Brain
• The brain is a highly complex, non-linear, and parallel computer, composed of some 10^11 neurons that are densely connected (~10^4 connections per neuron). We have just begun to understand how the brain works...
• A neuron is much slower (10^-3 sec) than a silicon logic gate (10^-9 sec); however, the massive interconnection between neurons makes up for the comparatively slow rate.
Human Brain
• Plasticity: Some of the neural structure of the brain is present at birth, while other parts are developed through learning, especially in early stages of life, to adapt to the environment (new inputs).
Biological Neuron
– dendrites: nerve fibres carrying electrical signals to the cell
– cell body: computes a non-linear function of its inputs
– axon: single long fiber that carries the electrical signal from the cell body to other neurons
– synapse: the point of contact between the axon of one cell and the dendrite of another, regulating a chemical connection whose strength affects the input to the cell.
Inspiration from Neurobiology
• A neuron: many-inputs / one-output unit
• output can be excited or not excited
• incoming signals from other neurons determine if the neuron shall excite ("fire")
• Output subject to attenuation in the synapses, which are junction parts of the neuron
[Figures: a neuron on a microchip; a biological neuron; a photomicrograph of neurons; a biological neuro-signal]
Neural networks
• Neural network: information processing paradigm inspired by biological nervous systems, such as our brain
• Structure: large number of highly interconnected processing elements (neurons) working together
• Like people, they learn from experience (by example)
Neural networks
• Neural networks are configured for a specific application, such as pattern recognition or data classification, through a learning process
• In a biological system, learning involves adjustments to the synaptic connections between neurons
• The same holds for artificial neural networks (ANNs)
Where can neural network systems help
• when we can't formulate an algorithmic solution.
• when we can get lots of examples of the behavior we require.
‘learning from experience’
• when we need to pick out the structure from existing data.
Complicated Example: Categorising Vehicles
• Input to the function: pixel data from vehicle images
• Output: a number: 1 for a car; 2 for a bus; 3 for a tank
[Figure: four example vehicle images with outputs 3, 2, 1, 1]
Artificial Neuron – Feed Forward
[Figure: neuron i with inputs x1, ..., xj, ..., xn and weights wi1, ..., wij, ..., win]
(1) Summation: Ii = Σj wij xj
(2) Transfer: yi = f(Ii)
Transfer Functions
Artificial Neuron – Error Backward
[Figure: neuron i as above; the output error E is propagated backward through yi to adjust the weights wij]
Perceptron
[Figure: input layer with units X1, X2, X3 feeding a single output unit Y in the output layer]
Perceptron (cont.)
• The Perceptron was introduced by Rosenblatt in 1957
• He introduced the idea of training
• A Perceptron is a linear threshold gate
Given a classification problem, try to find a perceptron to fit it:
Find a vector of weights wi and a threshold θ such that:
output = 1 if Σ wi xi ≥ θ
output = 0 otherwise
Perceptron – Feed Forward
[Figure: neuron i with inputs x1, ..., xj, ..., xn and weights wi1, ..., wij, ..., win]
(1) Summation: Ii = Σj wij xj
(2) Transfer: yi = f(Ii)
Perceptron – Error Backward
[Figure: neuron i as above; the output error E is propagated backward through yi to adjust the weights wij]
Perceptron : Weight Adjustment
The Perceptron learning rule:
If the perceptron gives the correct answer, do nothing
If the perceptron gives the wrong answer, adjust the weights and threshold “in the right direction”, so that it eventually gives the right answer.
Perceptron : Training Algorithm
1- Initial weights: w = (w0, w1, w2, ..., wn), chosen arbitrarily; each input is extended to (1, x1, x2, ..., xn) so that w0 plays the role of the threshold
2- While there is a sample (x1, x2, ..., xn) that is not correctly classified:
   Update the weights: wi ← wi + η [ d(x1, ..., xn) - a(x1, ..., xn) ] xi
   where a(x1, ..., xn) is the perceptron's actual output and d(x1, ..., xn) is the desired output
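The training loop can be sketched as follows. The tiny AND-style dataset, the learning rate, and the epoch limit are illustrative assumptions; the first component of each input is the clamped x0 = 1 so that the threshold is learned as w0:

```python
# Perceptron training rule: w_i <- w_i + eta * (d - a) * x_i
def train_perceptron(samples, eta=0.1, epochs=100):
    n = len(samples[0][0])
    w = [0.0] * n  # initial weights (here simply zero)
    for _ in range(epochs):
        misclassified = False
        for x, d in samples:
            # Actual output a: linear threshold at 0 (threshold folded into w0).
            a = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            if a != d:
                misclassified = True
                w = [wi + eta * (d - a) * xi for wi, xi in zip(w, x)]
        if not misclassified:
            break  # every sample classified correctly
    return w

# AND function; leading 1 in each input is the fixed bias input x0.
data = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = train_perceptron(data)
print(w)
```

Because AND is linearly separable, the loop terminates with weights that classify all four samples correctly.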
Perceptron : Error value
d(x1, ..., xn) - a(x1, ..., xn) =
   0  when the correct answer is given
   1  when w1x1 + w2x2 + ... + wnxn < θ but (x1, ..., xn) is in the set
  -1  when w1x1 + w2x2 + ... + wnxn ≥ θ but (x1, ..., xn) is not in the set
Perceptron : Learning Rate
• η is the rate at which the training rule converges toward the correct solution.
• Typically η <= 1
• Too small an η produces slow convergence.
• Too large an η can cause oscillations in the process.
Multi-layer Perceptron
Output layer
Hidden layer
Input layer
[Figure: successive layers labelled Layer S-1, Layer S, Layer S+1]
Function Learning
• Map categorisation learning to a numerical problem
  – Each category is given a number
  – Or a range of real-valued numbers (e.g., 0.5 - 0.9)
• Function learning examples
  – Input = 1, 2, 3, 4; Output = 1, 4, 9, 16
  – Here the concept to learn is squaring integers
  – Input = [1,2,3], [2,3,4], [3,4,5], [4,5,6]; Output = 1, 5, 11, 19
  – Here the concept is: [a,b,c] -> a*c - b
• The calculation is more complicated than in the first example
• Neural networks:
  – The calculation is much more complicated in general
  – But it is still just a numerical calculation
Example Perceptron
• Categorisation of 2x2 pixel black & white images
– Into “bright” and “dark”
• Representation of this rule:
– If it contains 2, 3 or 4 white pixels, it is “bright”
– If it contains 0 or 1 white pixels, it is “dark”
• Perceptron architecture:
– Four input units, one for each pixel
– Each input unit takes the value +1 for a white pixel and -1 for a black pixel
Example Perceptron
• Example calculation: x1 = -1, x2 = 1, x3 = 1, x4 = -1
  – S = 0.25*(-1) + 0.25*(1) + 0.25*(1) + 0.25*(-1) = 0
• 0 > -0.1, so the output from the ANN is +1
  – So the image is categorised as "bright"
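This example calculation can be reproduced directly; the weights of 0.25 and the threshold of -0.1 are the values used on the slide:

```python
# Bright/dark perceptron: four pixel inputs (+1 white, -1 black),
# all weights 0.25, threshold -0.1 (values from the slide).
def classify(pixels, weights=(0.25, 0.25, 0.25, 0.25), threshold=-0.1):
    s = sum(w * x for w, x in zip(weights, pixels))
    return +1 if s > threshold else -1  # +1 = "bright", -1 = "dark"

print(classify([-1, 1, 1, -1]))  # S = 0 > -0.1, so +1 ("bright")
```

An all-black image gives S = -1 < -0.1 and is classified as "dark", matching the stated rule (two or more white pixels means "bright").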
Worked Example
• Return to the "bright" and "dark" example
• Use a learning rate of η = 0.1
• Suppose we have set random weights: w0 = -0.5, w1 = 0.7, w2 = -0.2, w3 = 0.1, w4 = 0.9
Worked Example
• Use this training example, E, to update weights:
• Here, x1 = -1, x2 = 1, x3 = 1, x4 = -1 as before
• Propagate this information through the network:
– S = (-0.5 * 1) + (0.7 * -1) + (-0.2 * +1) + (0.1 * +1) + (0.9 * -1) = -2.2
• Hence the network outputs o(E) = -1
• But this should have been “bright”=+1
– So t(E) = +1
Calculating the Error Values
• Δ0 = η(t(E)-o(E))x0
= 0.1 * (1 - (-1)) * (1) = 0.1 * (2) = 0.2
• Δ1 = η(t(E)-o(E))x1
= 0.1 * (1 - (-1)) * (-1) = 0.1 * (-2) = -0.2
• Δ2 = η(t(E)-o(E))x2
= 0.1 * (1 - (-1)) * (1) = 0.1 * (2) = 0.2
• Δ3 = η(t(E)-o(E))x3
= 0.1 * (1 - (-1)) * (1) = 0.1 * (2) = 0.2
• Δ4 = η(t(E)-o(E))x4
= 0.1 * (1 - (-1)) * (-1) = 0.1 * (-2) = -0.2
Calculating the New Weights
• w’0 = -0.5 + Δ0 = -0.5 + 0.2 = -0.3
• w’1 = 0.7 + Δ1 = 0.7 + -0.2 = 0.5
• w’2 = -0.2 + Δ2 = -0.2 + 0.2 = 0
• w’3= 0.1 + Δ3 = 0.1 + 0.2 = 0.3
• w’4 = 0.9 + Δ4 = 0.9 - 0.2 = 0.7
New Look Perceptron
• Calculate for the example, E, again:
  – S = (-0.3 * 1) + (0.5 * -1) + (0 * +1) + (0.3 * +1) + (0.7 * -1) = -1.2
• Still gets the wrong categorisation
  – But the value is closer to zero (from -2.2 to -1.2)
  – In a few epochs' time, this example will be correctly categorised
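The arithmetic of this worked example can be checked in a few lines; the weights, inputs, target and learning rate are those from the slides:

```python
# Worked example: one update with the perceptron rule
# delta_i = eta * (t - o) * x_i, then recompute the weighted sum.
eta = 0.1
w = [-0.5, 0.7, -0.2, 0.1, 0.9]   # w0..w4 from the slide
x = [1, -1, 1, 1, -1]             # x0 = 1 (bias input), then x1..x4

s = sum(wi * xi for wi, xi in zip(w, x))      # -2.2
o, t = (1 if s > 0 else -1), 1                # network outputs -1, target is +1

w_new = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
s_new = sum(wi * xi for wi, xi in zip(w_new, x))   # -1.2: closer to correct
print(w_new, s_new)
```

The new weights come out as (-0.3, 0.5, 0, 0.3, 0.7), in agreement with the hand calculation above.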
Boolean Functions
• Take in two inputs (-1 or +1)
• Produce one output (-1 or +1)
• In other contexts, 0 and 1 are used instead
• Example: AND function
  – Produces +1 only if both inputs are +1
• Example: OR function
  – Produces +1 if either input is +1
• Related to the logical connectives from F.O.L.
Boolean Functions as Perceptrons
• Problem: XOR boolean function
– Produces +1 only if inputs are different
– Cannot be represented as a perceptron
– Because it is not linearly separable
Linearly Separable Boolean Functions
• Linearly separable:
  – Can use a line (dotted) to separate +1 and -1
• Think of the line as representing the threshold
  – Angle of the line is determined by the two weights in the perceptron
  – Y-axis crossing is determined by the threshold
Linearly Separable Functions
• Result extends to functions taking many inputs
– And outputting +1 and –1
• Also extends to higher dimensions for outputs
Typical Activation Functions
• F(x) = 1 / (1 + e^(-k Σ wixi))
• Shown for k = 0.5, 1 and 10
• Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions
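The activation above (called `sigmoid` in the sketch below) can be evaluated for the three slopes shown on the slide:

```python
import math

# Sigmoid activation F(s) = 1 / (1 + e^(-k * s)), where s = sum(w_i * x_i).
def sigmoid(s, k=1.0):
    return 1.0 / (1.0 + math.exp(-k * s))

for k in (0.5, 1, 10):
    # Larger k makes the sigmoid approximate a hard threshold at s = 0.
    print(k, [round(sigmoid(s, k), 3) for s in (-2, 0, 2)])
```

At k = 10 the output is already very close to 0 or 1 away from the origin, illustrating how the smooth function approximates the linear threshold.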
Learning performance
• Network architecture
• Learning method:
– Unsupervised
– Reinforcement learning
– Backpropagation
Unsupervised learning
• No help from the outside
• No training data, no information available on the desired output
• Learning by doing
• Used to pick out structure in the input:
• Clustering
• Reduction of dimensionality (compression)
• Example: Kohonen’s Learning Law
Competitive learning: example
• Example: Kohonen network
• Winner takes all: only the weights of the winning neuron are updated
• Network topology
• Training patterns
• Activation rule
• Neighborhood
• Learning
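A minimal winner-takes-all update can be sketched as follows; the two-neuron layout, the sample input, and the learning rate are illustrative assumptions, and the neighborhood function of a full Kohonen network is omitted:

```python
# Competitive (winner-takes-all) learning: only the neuron whose weight
# vector is closest to the input is updated, moving it toward the input.
def winner(weights, x):
    # Index of the weight vector closest to x (squared Euclidean distance).
    return min(range(len(weights)),
               key=lambda i: sum((wi - xi) ** 2 for wi, xi in zip(weights[i], x)))

def update(weights, x, eta=0.5):
    i = winner(weights, x)
    # Move the winner's weights a fraction eta of the way toward the input.
    weights[i] = [wi + eta * (xi - wi) for wi, xi in zip(weights[i], x)]
    return i

W = [[0.0, 0.0], [1.0, 1.0]]     # two neurons with 2-D weight vectors
update(W, [0.9, 0.8])            # neuron 1 wins and moves toward the input
print(W)
```

Repeated over many inputs, weight vectors drift toward the centres of input clusters, which is how the network "picks out structure" without any desired outputs.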
Reinforcement learning
• Teacher: training data
• The teacher scores the performance of the training examples
• Use performance score to shuffle weights ‘randomly’
• Relatively slow learning due to ‘randomness’
Back propagation
• Desired output of the training examples
• Error = difference between actual & desired output
• Change weight relative to error size
• Calculate the output layer error, then propagate it back to the previous layer
• Improved performance, very common!
Applications
• Prediction: learning from past experience
– pick the best stocks in the market
– predict weather
– identify people with cancer risk
• Classification
– Image processing
– Predict bankruptcy for credit card companies
– Risk assessment
Applications
• Recognition
– Pattern recognition: SNOOPE (bomb detector in U.S. airports)
– Character recognition
– Handwriting: processing checks
• Data association
– Not only identify the characters that were scanned but identify when the scanner is not working properly
Applications
• Data Conceptualization
– infer grouping relationships, e.g. extract from a database the names of those most likely to buy a particular product
• Data Filtering
– e.g. take the noise out of a telephone signal, signal smoothing
• Planning
– Unknown environments
– Sensor data is noisy
– Fairly new approach to planning
Strengths of a Neural Network
• Power: Model complex functions, nonlinearity built into the network
• Ease of use:
  – Learn by example
  – Very little user domain-specific expertise needed
• Intuitively appealing: based on model of biology, will it lead to genuinely intelligent computers/robots?
Neural networks cannot do anything that cannot be done using traditional computing techniques, BUT they can do some things which would otherwise be very difficult.
General Advantages
• Advantages
  – Adapt to unknown situations
  – Robustness: fault tolerance due to network redundancy
  – Autonomous learning and generalization
• Disadvantages
  – Not exact
  – Large complexity of the network structure
• For motion planning?
Applications
• Aerospace
  – High performance aircraft autopilots, flight path simulations, aircraft control systems, autopilot enhancements, aircraft component simulations, aircraft component fault detectors
• Automotive
  – Automobile automatic guidance systems, warranty activity analyzers
• Banking
  – Check and other document readers, credit application evaluators
• Defense
  – Weapon steering, target tracking, object discrimination, facial recognition, new kinds of sensors, sonar, radar and image signal processing including data compression, feature extraction and noise suppression, signal/image identification
• Electronics
  – Code sequence prediction, integrated circuit chip layout, process control, chip failure analysis, machine vision, voice synthesis, nonlinear modeling
Applications
• Financial
– Real estate appraisal, loan advisor, mortgage screening, corporate bond rating, credit line use analysis, portfolio trading program, corporate financial analysis, currency price prediction
• Manufacturing
– Manufacturing process control, product design and analysis, process and machine diagnosis, real-time particle identification, visual quality inspection systems, beer testing, welding quality analysis, paper quality prediction, computer chip quality analysis, analysis of grinding operations, chemical product design analysis, machine maintenance analysis, project bidding, planning and management, dynamic modeling of chemical process systems
• Medical
– Breast cancer cell analysis, EEG and ECG analysis, prosthesis design, optimization of transplant times, hospital expense reduction, hospital quality improvement, emergency room test advisement
Applications
• Robotics
  – Trajectory control, forklift robot, manipulator controllers, vision systems
• Speech
  – Speech recognition, speech compression, vowel classification, text-to-speech synthesis
• Securities
  – Market analysis, automatic bond rating, stock trading advisory systems
• Telecommunications
  – Image and data compression, automated information services, real-time translation of spoken language, customer payment processing systems
• Transportation
  – Truck brake diagnosis systems, vehicle scheduling, routing systems
Properties of ANNs
• Learning from examples
– labeled or unlabeled
• Adaptivity
– changing the connection strengths to learn things
• Non-linearity
– the non-linear activation functions are essential
• Fault tolerance
– if one of the neurons or connections is damaged, the whole network still works quite well
Artificial Neuron Model
[Figure: neuron i with inputs x0 = +1, x1, x2, x3, ..., xm, synaptic weights wi1, ..., wim, bias bi, activation function f, and output ai]
Bias
• ai = f(ni) = f( Σ(j=1..n) wij xj + bi )
• An artificial neuron:
  – computes the weighted sum of its inputs, and
  – if that value exceeds its "bias" (threshold),
  – it "fires" (i.e. becomes active)
Bias
• Bias can be incorporated as another weight clamped to a fixed input of +1.0
• This extra free variable (bias) makes the neuron more powerful.
• ai = f(ni) = f( Σ(j=0..n) wij xj ), with x0 = +1 and wi0 = bi
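Folding the bias into the weights as described gives exactly the same activation; a quick numerical check, with illustrative weights:

```python
# Bias folded into the weights: clamp x0 = +1 and set w_i0 = b_i, so that
# f(sum_{j=1..n} w_ij x_j + b_i) == f(sum_{j=0..n} w_ij x_j).
def net_with_bias(w, x, b):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def net_bias_as_weight(w, x, b):
    # Prepend the bias as weight 0 and a clamped input of +1.
    return sum(wj * xj for wj, xj in zip([b] + w, [1.0] + x))

w, x, b = [0.4, -0.3], [1.0, 2.0], 0.25
print(net_with_bias(w, x, b), net_bias_as_weight(w, x, b))  # identical sums
```

Since the two net inputs are equal, applying any activation f to them gives the same output; this is why the bias can be treated as just another trainable weight.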
Other Activation Functions
Different Network Topologies
• Single layer feed-forward networks
– Input layer projecting into the output layer
[Figure: single layer network, with the input layer projecting directly into the output layer]
Different Network Topologies
• Multi-layer feed-forward networks
– One or more hidden layers; each layer receives input only from the previous layer.
[Figure: 2-layer (1-hidden-layer) fully connected network: input layer, hidden layer, output layer]
Different Network Topologies
• Recurrent networks
– A network with feedback, where some of its inputs are connected to some of its outputs (discrete time).
[Figure: recurrent network with input and output layers, where some outputs are fed back to the inputs]
How to Decide on a Network Topology?
– # of input nodes?
• Number of features
– # of output nodes?
• Suitable to encode the output representation
– transfer function?
• Suitable to the problem
– # of hidden nodes?
• Not exactly known
Examples:
[Figure: network with input units x1 ... x6, hidden units h1 ... h3, and output unit y1; weights wji connect input unit i to hidden unit j, and weights wkj connect hidden unit j to output unit k. Autoassociation / Heteroassociation.]
• hj = g( Σi wji xi ),  y1 = g( Σj wkj hj ),  where g(x) = 1 / (1 + e^(-x))
• g is the sigmoid: it rises from 0 to 1, with g(0) = 1/2
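The forward computation hj = g(Σ wji xi), y1 = g(Σ wkj hj) can be sketched directly; the specific weight values and the 6-3-1 shape below are illustrative assumptions matching the figure's layout:

```python
import math

# Forward pass of a 6-3-1 network with sigmoid g(x) = 1 / (1 + e^(-x)).
def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(weights, inputs):
    # weights: one weight vector per unit in this layer.
    return [g(sum(w * v for w, v in zip(row, inputs))) for row in weights]

W_ji = [[0.1] * 6, [-0.2] * 6, [0.3] * 6]  # hidden layer: 3 units, 6 inputs each
W_kj = [[0.5, -0.5, 0.5]]                  # output layer: 1 unit, 3 hidden inputs

x = [1, 0, 1, 0, 1, 0]
h = layer(W_ji, x)        # hidden activations h1, h2, h3
y = layer(W_kj, h)        # output y1, always in (0, 1)
print(h, y)
```

Because g squashes every net input into (0, 1), the output y1 can be read against the convention on the next slide (near 1 for positive examples, near 0 for negative ones).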
How is a function computed by a Multilayer Neural Network?
Typically, y1 = 1 for a positive example and y1 = 0 for a negative example
Learning in Multilayer Neural Networks
• Learning consists of searching through the space of all possible matrices of weight values for a combination of weights that satisfies a database of positive and negative examples (multi-class as well as regression problems are possible).
• Note that a Neural Network model with a set of adjustable weights defines a restricted hypothesis space corresponding to a family of functions. The size of this hypothesis space can be increased or decreased by increasing or decreasing the number of hidden units present in the network.
The Perceptron Training Rule
One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed until the perceptron classifies all training examples correctly. Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi as wi ← wi + η(t - o)xi, where t is the target output, o is the perceptron's output, and η is the learning rate.
Gradient Descent and Delta Rule
The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output o is given by o = Σi wixi.
In order to derive a weight learning rule for linear units, let us begin by specifying a measure for the training error of a hypothesis (weight vector), relative to the training examples: E(w) = 1/2 Σd (td - od)², where d ranges over the training examples, td is the target output, and od is the linear unit's output for example d.
BACKPROPAGATION Algorithm
EECP0720 Expert Systems – Artificial Neural Networks
Error Function
The Backpropagation algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections. It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for those outputs. We begin by redefining E to sum the errors over all of the network output units:
E(w) = 1/2 Σd Σ(k ∈ outputs) (tkd - okd)²
where outputs is the set of output units in the network, and tkd and okd are the target and output values associated with the kth output unit and training example d.
Architecture of Backpropagation
Backpropagation Learning Algorithm
Backpropagation Learning Algorithm (cont.)
Backpropagation Learning Algorithm (cont.)
Backpropagation Learning Algorithm (cont.)
Backpropagation Learning Algorithm (cont.)
Output
• The response function is normally nonlinear
• Samples include:
  – Sigmoid: f(x) = 1 / (1 + e^(-x))
  – Piecewise linear: f(x) = x if x ≥ θ, f(x) = 0 otherwise
Backpropagation Preparation
• Training Set: a collection of input-output patterns that are used to train the network
• Testing Set: a collection of input-output patterns that are used to assess network performance
• Learning Rate η: a scalar parameter, analogous to step size in numerical integration, used to set the rate of adjustments
Network Error
• Total-Sum-Squared-Error (TSSE):
  TSSE = 1/2 Σ(patterns) Σ(outputs) (desired - actual)²
• Root-Mean-Squared-Error (RMSE):
  RMSE = sqrt( 2 · TSSE / (#patterns · #outputs) )
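Taking TSSE = 1/2 Σ(desired - actual)² over all patterns and outputs, and RMSE = sqrt(2·TSSE / (#patterns · #outputs)), a sketch with illustrative data:

```python
import math

# TSSE = 1/2 * sum over patterns and outputs of (desired - actual)^2
def tsse(desired, actual):
    return 0.5 * sum((d - a) ** 2
                     for dp, ap in zip(desired, actual)
                     for d, a in zip(dp, ap))

# RMSE = sqrt(2 * TSSE / (#patterns * #outputs))
def rmse(desired, actual):
    n_patterns, n_outputs = len(desired), len(desired[0])
    return math.sqrt(2 * tsse(desired, actual) / (n_patterns * n_outputs))

desired = [[1.0, 0.0], [0.0, 1.0]]   # 2 patterns, 2 outputs each
actual  = [[0.8, 0.1], [0.2, 0.9]]
print(tsse(desired, actual), rmse(desired, actual))
```

The factor of 2 inside the RMSE cancels the 1/2 in TSSE, so RMSE is the square root of the mean squared error per output value.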
Face Detection using Neural Networks
[Figure: a neural network is trained on images from a face database and a non-face database]
• Training process: output = 1 for the face database, output = 0 for the non-face database
• Testing process: the trained network classifies a test image as "Face or Non-Face?"
Backpropagation Using Gradient Descent
• Advantages
  – Relatively simple implementation
  – Standard method and generally works well
• Disadvantages
  – Slow and inefficient
  – Can get stuck in local minima, resulting in sub-optimal solutions
Local Minima
[Figure: error surface showing a local minimum and the global minimum]
Other Ways To Minimize Error
• Varying training data
  – Cycle through input classes
  – Randomly select from input classes
• Add noise to training data
  – Randomly change the value of an input node (with low probability)
• Retrain with expected inputs after initial training
  – E.g. speech recognition
Other Ways To Minimize Error
• Adding and removing neurons from layers
  – Adding neurons speeds up learning but may cause loss in generalization
  – Removing neurons has the opposite effect